AP · Inference for a Mean Difference with Paired Data · 14 min read · Updated 2026-05-10

Inference for a Mean Difference with Paired Data — AP Statistics Study Guide

For: Students preparing for the AP Statistics exam.

Covers: Paired study design, matched pairs design, conditions for inference, paired t-tests for a population mean difference, and paired t-confidence intervals for a population mean difference.

You should already know: One-sample t-procedures for a single population mean, conditions for inference for means, basics of hypothesis testing and confidence interval interpretation.

A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official mark schemes for grading conventions.

1. What Is Inference for a Mean Difference with Paired Data?

Paired data inference is a type of inference for quantitative data used when we have two linked (dependent) measurements on the same or matched units, and we want to draw conclusions about the true mean difference between these measurements. Unlike inference for two independent samples, we reduce the two sets of measurements to a single set of differences, so paired inference is just a one-sample t-procedure applied to these differences.
By convention, the notation for this topic is: $d_{i} = x_{1i} - x_{2i}$ is the difference between the two measurements on the $i$-th pair, $\bar{d}$ is the sample mean of the differences, $s_{d}$ is the sample standard deviation of the differences, $n$ is the number of pairs, and $μ_{d}$ is the true population mean difference that is the target of inference.
This topic is part of Unit 7, which makes up 10-18% of the AP Statistics exam, and paired inference accounts for roughly one-third of that unit’s content. It appears in both multiple choice (as a procedure identification question) and free response (as a full inference question worth 3-4 points). Synonyms for this topic include matched pairs inference, paired t-inference, and inference for dependent means.
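The notation above can be made concrete with a short sketch. The two lists are hypothetical values invented for illustration (they do not come from this guide); the point is only that every later formula works from the single list of differences.

```python
from statistics import mean, stdev

# Hypothetical paired measurements (illustrative values only):
# the same five units measured under condition 1 and condition 2.
x1 = [72, 80, 65, 90, 77]
x2 = [70, 75, 66, 84, 73]

# One difference per pair, always in the same order (x1 minus x2).
d = [a - b for a, b in zip(x1, x2)]

n = len(d)        # number of pairs
d_bar = mean(d)   # sample mean of the differences
s_d = stdev(d)    # sample standard deviation of the differences (n - 1 divisor)

print(d, n, d_bar, round(s_d, 3))
```

Note that `stdev` uses the $n - 1$ divisor, which matches the sample standard deviation $s_{d}$ used in t-procedures.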

2. Paired Data and Study Design

Data is classified as paired when there is a natural one-to-one matching between observations in the two groups we are comparing. There are two common scenarios that produce paired data:

Repeated measures on the same unit: The same individual or experimental unit is measured twice, once for each condition (e.g., heart rate before and after exercise, test scores on two versions of an exam for the same student).
Matched pairs experimental design: Units are matched into pairs based on shared confounding variables (e.g., matching two patients of the same age, gender, and health status, then assigning one to treatment and one to control). Each pair contributes one difference.

The purpose of pairing is to remove between-pair variability from the comparison, which typically reduces the standard error of the mean difference and makes inference more powerful than it would be with independent samples. If pairing is present but ignored, the independence assumption of the two-sample procedure is violated, and you will usually get an unnecessarily large standard error and a less powerful test, so a real difference can go undetected.
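A small numerical sketch can show why pairing matters. The readings below are hypothetical and were chosen so that subjects differ a lot from one another but change by similar amounts; with real data the size of the gap will vary.

```python
import math
from statistics import mean, stdev

# Hypothetical before/after readings for six subjects (illustrative only).
before = [120, 150, 135, 160, 110, 145]
after = [115, 147, 130, 154, 108, 140]
n = len(before)

# Paired approach: one difference per subject, then a one-sample standard error.
diffs = [b - a for b, a in zip(before, after)]
se_paired = stdev(diffs) / math.sqrt(n)

# Ignoring the pairing: two-sample standard error built from each group's spread.
se_two_sample = math.sqrt(stdev(before) ** 2 / n + stdev(after) ** 2 / n)

# The point estimate is the same either way; only the standard error changes.
same_center = math.isclose(mean(diffs), mean(before) - mean(after))

print(round(se_paired, 2), round(se_two_sample, 2), same_center)
```

In this made-up data set the subject-to-subject spread dominates the two-sample standard error, while the differences cancel it out, so the paired standard error is far smaller.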

Worked Example

A consumer researcher wants to compare the price of 20 popular grocery items at a local supermarket vs. an online delivery service. For each item, she looks up the price at both retailers, recording two prices per item. Is this paired data, and why?

Check for matching: There is one price for each item at each retailer. Every item in the supermarket price list has exactly one matching price for the same item in the online delivery list.
The pairing reduces variability: differences in price can be attributed to the retailer, not to differences between products.
Conclusion: This is paired data, because each item creates a natural matched pair of prices across the two retailers. It would be incorrect to treat this as two independent samples of 20 prices each.

Exam tip: When in doubt, ask: "Does every observation in the first group have exactly one unique, logically connected observation in the second group?" If yes, it is paired; if no, it is independent.

3. Conditions for Paired t-Inference

All paired t-procedures (confidence intervals and hypothesis tests) require three conditions to be met, analogous to one-sample t-procedures, but applied to the differences rather than the original measurements:

Random: The pairs must be either randomly selected from the population of pairs (for observational studies) or treatments must be randomly assigned within pairs (for experiments). Random selection allows you to generalize results to the whole population, while random assignment allows you to draw causal conclusions.
Normal/Large Sample: The sampling distribution of $\bar{d}$ is approximately normal if either: (a) the number of pairs satisfies $n \geq 30$ , or (b) for $n < 30$ , a plot of the sample differences shows no strong skewness or extreme outliers.
Independent: The differences must be independent of each other. If sampling without replacement from a finite population, the 10% condition applies: the population of pairs must be at least 10 times the number of pairs sampled ( $N \geq 10n$ ). For experiments, random assignment of treatments within pairs takes the place of the 10% condition.
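The two conditions that can be checked from numbers can be sketched as a helper. Everything here is an assumption made for illustration: the function name `check_paired_conditions`, the sample values, and the use of the 1.5 × IQR rule as the outlier screen (a graph of the differences is still the expected evidence on the exam). The Random condition depends on how the data were collected and cannot be checked from the data.

```python
from statistics import quantiles

def check_paired_conditions(diffs, population_pairs=None):
    """Summarize the data-checkable conditions for a list of paired differences."""
    n = len(diffs)
    # Quartiles of the differences (Python's default method; textbook
    # quartile conventions can give slightly different values).
    q1, _, q3 = quantiles(diffs, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [d for d in diffs if d < low_fence or d > high_fence]
    return {
        "n_pairs": n,
        "large_sample": n >= 30,
        "outliers": outliers,
        # 10% condition: population of pairs at least 10 times the sample of pairs.
        "ten_percent_ok": None if population_pairs is None else population_pairs >= 10 * n,
    }

# Hypothetical differences with one extreme value.
report = check_paired_conditions([2, 3, 1, 4, 2, 3, 40, 2], population_pairs=100)
print(report)
```

With a small sample and a flagged outlier such as 40, the Normal/Large Sample condition would be in doubt, and a paired t-procedure should not be used without further justification.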

Worked Example

A physical therapist wants to test whether a new stretching routine reduces lower back pain. He recruits 18 patients with chronic lower back pain, measures their pain score before and after 4 weeks of the stretching routine, so he has 18 differences (before pain - after pain). Check the conditions for paired t-inference.

Random: If the 18 patients were randomly selected from the population of chronic lower back pain patients, the random condition is satisfied. If they were a convenience sample, we note that generalizability is limited, but we can still proceed for the sample.
Independent (10% condition): It is reasonable to assume that there are more than $10 \times 18 = 180$ chronic lower back pain patients in the population, so the independence condition is satisfied.
Normal/Large Sample: $n = 18 < 30$ , so we need to plot the 18 sample differences to check for strong skewness or outliers. If no extreme departures from normality are present, the condition is satisfied.

Exam tip: Always check conditions on the differences, not the original two groups of measurements. The AP exam deducts points for checking conditions on unpaired original data.

4. Paired t-Test for a Population Mean Difference

A paired t-test is used to test a claim about the true population mean difference $μ_{d}$ . In almost all cases, the null hypothesis is $H_{0} : μ_{d} = 0$ , because we are testing whether there is any difference between the two paired measurements. The alternative hypothesis is one-sided ( $H_{a} : μ_{d} < 0$ or $H_{a} : μ_{d} > 0$ ) or two-sided ( $H_{a} : μ_{d} \neq 0$ ) depending on the research question.

The test statistic for a paired t-test is: $t = \frac{\bar{d} - μ_{d0}}{s_{d} / \sqrt{n}}$ where $μ_{d0}$ is the hypothesized mean difference from the null hypothesis (almost always 0). When the conditions are met and $H_{0}$ is true, this statistic follows a t-distribution with $df = n - 1$ degrees of freedom. We use the t-statistic to calculate a p-value, which we compare to our significance level $α$ to draw a conclusion.
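As a minimal sketch, the test statistic can be computed from summary statistics alone. The function name `paired_t_statistic` and the numbers in the usage line are hypothetical; the standard error uses the square root of the number of pairs.

```python
import math

def paired_t_statistic(d_bar, s_d, n, mu_d0=0.0):
    """Return (t, df) for a paired t-test from summary statistics of the differences."""
    standard_error = s_d / math.sqrt(n)   # standard error of the mean difference
    t = (d_bar - mu_d0) / standard_error
    return t, n - 1

# Hypothetical summary statistics: 16 pairs, mean difference 2.5, SD of differences 4.0.
t, df = paired_t_statistic(d_bar=2.5, s_d=4.0, n=16)
print(t, df)
```

Here the standard error is 4.0 / 4 = 1.0, so the statistic is simply 2.5 with 15 degrees of freedom.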

Worked Example

A coffee roaster tests whether a new roasting method increases the caffeine content of coffee beans. He takes 10 batches of beans, splits each batch into two equal portions, roasts one portion with the old method and the other with the new method, then measures the caffeine content per 100g for each portion. He calculates differences $d_{i} = \text{new caffeine} - \text{old caffeine}$ , getting $\bar{d} = 12$ mg/100g and $s_{d} = 20$ mg/100g. Conduct a paired t-test at $α = 0.05$ .

Hypotheses: Define $μ_{d} =$ true mean difference in caffeine content (new minus old). $H_{0} : μ_{d} = 0$ (no difference in mean caffeine), $H_{a} : μ_{d} > 0$ (new method has higher mean caffeine).
Conditions are confirmed to be met, so calculate the test statistic: $t = \frac{12 - 0}{20 / \sqrt{10}} \approx \frac{12}{6.32} \approx 1.90$ .
Degrees of freedom: $df = 10 - 1 = 9$ . For a one-sided test, the t-table places the p-value between 0.025 and 0.05 ( $t = 1.833$ gives $p = 0.05$ and $t = 2.262$ gives $p = 0.025$ ); technology gives $p \approx 0.045$ .
Conclusion: Since $p < 0.05$ , we reject the null hypothesis. There is convincing evidence at the $α = 0.05$ level that the new roasting method increases mean caffeine content.
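The p-value in this example can be checked without a t-table. The sketch below is one possible approach, not the method a calculator uses: it integrates the t-density numerically with Simpson's rule using only the standard library. The helper names `t_pdf` and `upper_tail_p` are made up for this illustration.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p(t, df, steps=20000):
    """Approximate P(T >= t) for t >= 0 using Simpson's rule on [0, t]."""
    if steps % 2:
        steps += 1                      # Simpson's rule needs an even number of steps
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 0.5 - area_zero_to_t         # the density is symmetric about 0

# Coffee roaster example: d_bar = 12, s_d = 20, n = 10, H_a: mu_d > 0.
t_stat = (12 - 0) / (20 / math.sqrt(10))
p_value = upper_tail_p(t_stat, df=9)
print(round(t_stat, 2), round(p_value, 3))
```

The result lands between the table bounds of 0.025 and 0.05, which is consistent with rejecting $H_{0}$ at $α = 0.05$ for the one-sided alternative.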

Exam tip: Always define $μ_{d}$ in context (specifying which measurement minus which) before writing hypotheses. AP scoring requires this to earn full points for hypotheses.

5. Paired t-Confidence Interval for a Population Mean Difference

A paired t-confidence interval estimates the true value of $μ_{d}$ when we want to quantify the size of the mean difference, not just test whether it is non-zero. The formula for a $C\%$ confidence interval for $μ_{d}$ is: $\bar{d} \pm t^{*} \times \frac{s_{d}}{\sqrt{n}}$ where $t^{*}$ is the critical t-value for confidence level $C$ with $df = n - 1$ degrees of freedom, and the term $t^{*} \times \frac{s_{d}}{\sqrt{n}}$ is the margin of error. If the confidence interval does not contain 0, we reject $H_{0} : μ_{d} = 0$ in a two-sided test at significance level $α = 1 - C$ (with $C$ written as a proportion, so a 95% interval corresponds to $α = 0.05$ ).
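The link between the interval and the two-sided test can be spelled out in a few lines. This is a sketch of the algebra for the usual case of a hypothesized difference of 0, using only the symbols already defined in this guide:

```latex
\begin{align*}
0 \notin \left(\bar{d} - t^{*}\frac{s_d}{\sqrt{n}},\ \bar{d} + t^{*}\frac{s_d}{\sqrt{n}}\right)
  &\iff |\bar{d}| > t^{*}\,\frac{s_d}{\sqrt{n}} \\
  &\iff \left|\frac{\bar{d} - 0}{s_d/\sqrt{n}}\right| > t^{*} \\
  &\iff |t| > t^{*}
\end{align*}
```

Since $|t| > t^{*}$ is the two-sided rejection rule at the matching significance level, the interval excludes 0 exactly when the two-sided test rejects.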

Worked Example

For the coffee roaster example above, construct and interpret a 95% confidence interval for $μ_{d}$ .

We already confirmed conditions are met. $df = 9$ , so for 95% confidence, $t^{*} = 2.262$ .
Calculate margin of error: $ME = 2.262 \times \frac{20}{\sqrt{10}} \approx 2.262 \times 6.32 \approx 14.3$ .
Calculate interval: $12 \pm 14.3 = (- 2.3, 26.3)$ mg/100g.
Interpretation: We are 95% confident that the true mean difference in caffeine content (new method minus old method) is between -2.3 mg/100g (a 2.3 mg decrease) and 26.3 mg/100g (a 26.3 mg increase). Because 0 is inside the interval, we would fail to reject $H_{0} : μ_{d} = 0$ for a two-sided test at $α = 0.05$ . This differs from the one-sided test in Section 4, which rejected $H_{0}$ at $α = 0.05$ : a 95% interval corresponds to a two-sided test, whose p-value here is about double the one-sided p-value and therefore exceeds 0.05.
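One way to see how the two-sided interval relates to the one-sided test from Section 4 is to compare the same t-statistic against both critical values. The sketch below uses the coffee example's summary statistics and the df = 9 table values quoted in this guide; the variable names are illustrative.

```python
import math

d_bar, s_d, n = 12.0, 20.0, 10
t_star_two_sided = 2.262   # df = 9, 95% confidence (two-sided alpha = 0.05)
t_crit_one_sided = 1.833   # df = 9, one-sided alpha = 0.05

standard_error = s_d / math.sqrt(n)
t = (d_bar - 0) / standard_error

# 95% confidence interval for the mean difference (new minus old).
margin = t_star_two_sided * standard_error
interval = (d_bar - margin, d_bar + margin)

rejects_one_sided = t > t_crit_one_sided        # H_a: mu_d > 0
rejects_two_sided = abs(t) > t_star_two_sided   # H_a: mu_d != 0

print([round(x, 1) for x in interval], rejects_one_sided, rejects_two_sided)
```

The statistic falls between the two critical values, so the one-sided test rejects while the 95% interval still contains 0.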

Exam tip: When interpreting the interval, always specify the direction of the difference (which group minus which) in context. Ambiguous interpretations lose points on the AP exam.

6. Common Pitfalls (and how to avoid them)

Wrong move: Treating paired data as two independent samples and using a two-sample t-test instead of a paired t-test. Why: Students see two groups of measurements and automatically reach for the two-sample procedure, without checking for pairing. Correct move: Always check for matching between observations first; if every observation in group 1 has one matching observation in group 2, use paired t-procedures on differences.
Wrong move: Checking the normality condition on the original two groups instead of the distribution of differences. Why: Students forget inference is on the differences, not the original measurements. Correct move: Always check for skewness/outliers on the sample differences, not the original two datasets.
Wrong move: Calculating the standard error from the standard deviations of the two original samples instead of from $s_{d}$ , the standard deviation of the individual differences. Why: Students know that $\bar{x}_{1} - \bar{x}_{2} = \bar{d}$ (which is true), so they assume the standard deviations of the two samples can be combined the same way to get the standard error, which is wrong. Correct move: Always calculate $d_{i}$ for each pair first, then calculate $\bar{d}$ and $s_{d}$ directly from the set of differences.
Wrong move: Using z-procedures instead of t-procedures for paired data even when $n$ is large. Why: Students confuse the t/z distinction, thinking large n justifies z. Correct move: Always use t-procedures for paired inference, because the population standard deviation of differences $σ_{d}$ is almost never known.
Wrong move: Interpreting a confidence interval as "95% of individual differences are between the two bounds". Why: Students confuse the distribution of individual differences with the confidence interval for the mean difference. Correct move: Always interpret the interval as a confidence interval for the population mean difference, not individual differences.
Wrong move: Forgetting to state a conclusion in context. Why: Students stop after rejecting/failing to reject $H_{0}$ and do not connect the result back to the problem scenario. Correct move: Always end your hypothesis test conclusion with a sentence that answers the original research question in the context of the problem.

7. Practice Questions (AP Statistics Style)

Question 1 (Multiple Choice)

A researcher wants to test whether a new noise-canceling headphone reduces the number of distractions a worker experiences in an open office. The researcher recruits 25 office workers, has each worker work one day with the new headphones and one day with their original headphones, with the order of days randomized. Which of the following is the correct inference procedure for this study?
A) Two-sample t-test for a difference in means, because we are comparing two groups
B) Paired t-test, because each worker is measured twice under two conditions
C) Two-sample t-test for a difference in means, because we have 25 measurements for each headphone type
D) Paired t-test, because the sample size is less than 30

Worked Solution: First, we check for pairing: each worker contributes two measurements (one for each headphone type), so there is a natural one-to-one matching between the two sets of measurements. This is repeated measures on the same unit, so paired data requires a paired t-test. Option A incorrectly uses two-sample, option C incorrectly uses two-sample, and option D gives the wrong reason for using a paired t-test (sample size does not determine if a procedure is paired). The correct answer is B.

Question 2 (Free Response)

A bakery wants to know if a new type of yeast reduces the average baking time of sourdough loaves. The baker tests 10 different sourdough recipes, baking one loaf with the old yeast and one loaf with the new yeast for each recipe. She calculates differences $d_{i} = \text{old baking time} - \text{new baking time}$ (in minutes), getting $\bar{d} = 3.8$ minutes and $s_{d} = 4.1$ minutes.
(a) Identify the appropriate inference procedure and explain why it is appropriate for this study.
(b) Construct a 95% confidence interval for the true mean difference in baking time.
(c) Based on your interval, is there convincing evidence at the $α = 0.05$ level that the new yeast reduces mean baking time? Justify your answer.

Worked Solution:
(a) A paired t-confidence interval for the population mean difference $μ_{d}$ is appropriate. Each recipe has two matched loaves (one with each yeast type), so the data are paired by recipe, which removes variability from recipe-to-recipe differences, justifying paired inference.
(b) Degrees of freedom $df = n - 1 = 10 - 1 = 9$ . For 95% confidence, $t^{*} = 2.262$ . The interval is: $\bar{d} \pm t^{*} \frac{s_{d}}{\sqrt{n}} = 3.8 \pm 2.262 \times \frac{4.1}{\sqrt{10}} = 3.8 \pm 2.93 = (0.87, 6.73)$ minutes.
(c) For a two-sided test at $α = 0.05$ , a 95% confidence interval that does not contain 0 provides convincing evidence to reject $H_{0} : μ_{d} = 0$ . Our interval (0.87, 6.73) contains only positive values (old time minus new time is positive, so new time is shorter on average) and does not contain 0. So there is convincing evidence at the $α = 0.05$ level that the new yeast reduces mean baking time.
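The arithmetic in part (b) can be checked with a short sketch. The function name `paired_t_interval` is hypothetical; the critical value 2.262 is the table value for df = 9 at 95% confidence used in the solution.

```python
import math

def paired_t_interval(d_bar, s_d, n, t_star):
    """Return (lower, upper) for d_bar ± t* · s_d / sqrt(n)."""
    margin = t_star * s_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin

# Question 2 summary statistics (old baking time minus new baking time, in minutes).
lower, upper = paired_t_interval(d_bar=3.8, s_d=4.1, n=10, t_star=2.262)

# Part (c): does the interval exclude 0?
excludes_zero = lower > 0 or upper < 0

print(round(lower, 2), round(upper, 2), excludes_zero)
```

Because the whole interval sits above 0, the data point toward a positive mean difference, meaning shorter baking times with the new yeast.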

Question 3 (Application / Real-World Style)

An environmental scientist wants to estimate the difference in indoor air pollution (PM2.5 concentration) between homes with gas stoves and homes with electric stoves. She matches 20 homes by square footage, location, and ventilation rate, pairing one home with a gas stove and one with an electric stove per pair. She calculates differences (gas PM2.5 minus electric PM2.5), getting $\bar{d} = 4.5 \text{ μg/m}^{3}$ and $s_{d} = 7.2 \text{ μg/m}^{3}$ . Construct a 90% confidence interval for the true mean difference in PM2.5 concentration and interpret it in context.

Worked Solution: We use a paired t-interval because the data are matched pairs by home characteristics. Degrees of freedom $df = 20 - 1 = 19$ , so $t^{*} = 1.729$ for 90% confidence. Margin of error is: $ME = 1.729 \times \frac{7.2}{\sqrt{20}} \approx 1.729 \times 1.61 \approx 2.78$ . The interval is $4.5 \pm 2.78 = (1.72, 7.28) \text{ μg/m}^{3}$ . Interpretation: We are 90% confident that the true mean PM2.5 concentration is 1.72 to 7.28 micrograms per cubic meter higher in homes with gas stoves than in matched homes with electric stoves.

8. Quick Reference Cheatsheet

Category	Formula / Rule	Notes
Individual difference	$d_{i} = x_{1i} - x_{2i}$	Calculate for each pair first; $\bar{d}$ and $s_{d}$ are calculated from the $d_{i}$ values
Population parameter	$μ_{d}$	True mean of all differences in the population; $H_{0} : μ_{d} = 0$ for most tests
Paired t-test statistic	$t = \frac{\bar{d} - μ_{d0}}{s_{d} / \sqrt{n}}$	Degrees of freedom $df = n - 1$ ; $μ_{d0} = 0$ for nearly all tests
Paired t-confidence interval	$\bar{d} \pm t^{*} \times \frac{s_{d}}{\sqrt{n}}$	$t^{*}$ = critical t-value for confidence level C, $df = n - 1$
Random condition	Pairs randomly sampled OR treatments randomly assigned within pairs	Required for generalizability (sample) or causal inference (experiment)
Normal/Large Sample condition	$n \geq 30$ OR differences have no strong skewness/outliers	Check on differences, not original measurements
Independence condition	$N \geq 10 n$ for sampling without replacement	10% condition applies to number of pairs, not individual measurements
Confidence interval interpretation	"We are C% confident the true mean difference [context] is between L and U"	Never interpret as C% of individual differences fall in the interval

9. What's Next

This topic is a core part of Unit 7 inference for means, and AP exam writers frequently test your ability to distinguish between paired and independent mean inference, so mastery here is critical for scoring well. This topic is a prerequisite for the upcoming topic of inference for a difference in means between two independent samples, where you will practice identifying the correct procedure for any given study design, a skill tested heavily in both MCQ and FRQ. Across the AP course, paired inference also reinforces the benefit of blocking in experiments (matching is a form of blocking), a key experimental design concept from Unit 3 on collecting data. Without mastering this chapter, you will frequently misidentify the correct inference procedure, leading to unnecessary point deductions on exam day.
