AP · Inference for the Difference in Two Population Means · 14 min read · Updated 2026-05-10

Inference for the Difference in Two Population Means — AP Statistics Study Guide

For: Students preparing for the AP Statistics exam.

Covers: Conditions for two-sample t-inference, confidence intervals for the difference of two independent population means, two-sample t-hypothesis tests, pooled vs. unpooled variance, and distinguishing independent two-sample inference from matched pairs inference.

You should already know: Confidence intervals and hypothesis testing for a single population mean; properties of the t-distribution; basics of random sampling and experimental design.

A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official mark schemes for grading conventions.

1. What Is Inference for the Difference in Two Population Means?

Inference for the difference in two population means (often called two-sample t-inference for means) is a set of statistical methods to compare the true mean values of a quantitative variable across two distinct populations. This topic makes up approximately 4-5% of the total AP Statistics exam score, as part of Unit 7 (Inference for Quantitative Data: Means), which accounts for 12-15% of total exam score. It appears regularly in both multiple-choice (MCQ) and free-response (FRQ) sections, and often forms the core of a 3-5 point FRQ question.

Standard notation is used consistently on the AP exam: $μ_{1}$ and $μ_{2}$ are the unknown true population means for the two groups, $\bar{x}_{1}$ and $\bar{x}_{2}$ are the sample means calculated from observed data, $s_{1}$ and $s_{2}$ are the sample standard deviations, and $n_{1}$ and $n_{2}$ are the sample sizes for each group. We almost always focus on inference for $μ_{1} - μ_{2}$, the true difference between the two population means. This topic only applies when the two samples are independent, meaning no observation in one group is linked to a specific observation in the other group.

2. Conditions for Two-Sample Inference

Before conducting any inference for the difference in two population means, you must verify three core conditions to ensure your sampling distribution is well-behaved and your results are valid. The three conditions are:

Random: Both groups must be independent random samples from their respective populations, or come from a randomized comparative experiment with two treatment groups. This ensures the sampling distribution is unbiased.
Independence: Individual observations within each sample must be independent of one another. For sampling without replacement, this means the 10% condition holds: the sample size for each group is less than 10% of the total population size.
Normal/Large Sample: The sampling distribution of $\bar{x}_{1} - \bar{x}_{2}$ must be approximately normal. This is satisfied if either: (a) both sample sizes are at least 30 (by the Central Limit Theorem), or (b) for smaller sample sizes, the distribution of each sample has no extreme outliers or strong skew, so the sampling distribution of the sample mean will still be approximately normal.

AP exam questions almost always require you to name and verify each condition to earn full credit, so it is critical you do not skip this step.

Worked Example

Problem: A high school counselor wants to compare the mean number of hours students spend on extracurricular activities per week between students who live on campus and students who commute. She obtains a random sample of 27 boarding students and 32 commuter students from the school’s large student body. The distribution of weekly extracurricular hours for boarding students is approximately symmetric with no outliers; the distribution for commuters is also roughly symmetric with no outliers. Are all conditions for inference met?

Random Condition: The problem explicitly states the samples are random, so the random condition is satisfied. This ensures no selection bias.
Independence Condition: The school's student body is described as large, so it is reasonable to assume there are more than $10 \times 27 = 270$ boarding students and $10 \times 32 = 320$ commuter students, which satisfies the 10% condition. Observations within each group can be treated as independent.
Normal Condition: The commuter sample is large enough for the Central Limit Theorem to apply ($n_{2} = 32 \geq 30$). The boarding sample is smaller ($n_{1} = 27 < 30$), but its distribution is approximately symmetric with no outliers, so the normality condition holds for both groups.

All conditions for inference are met.
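The two numeric checks in this example can be sketched in a few lines of Python. This is only an illustration: the function name `check_conditions` and the population sizes (400 boarders, 500 commuters) are assumptions made up for the sketch, since the problem only says the student body is large. The Random condition cannot be checked numerically; it has to come from how the data were collected.

```python
def check_conditions(n1, n2, pop1, pop2, symmetric_no_outliers):
    """Check the numeric conditions for two-sample t-inference.

    pop1 and pop2 are population sizes; symmetric_no_outliers is your own
    judgement from plots of the sample data (True or False).
    """
    # 10% condition: each sample is less than 10% of its population
    ten_percent = n1 < 0.10 * pop1 and n2 < 0.10 * pop2
    # Large-sample guideline: both samples have at least 30 observations
    large_samples = n1 >= 30 and n2 >= 30
    # Normal condition: large samples, or small samples that look well-behaved
    normal = large_samples or symmetric_no_outliers
    return {"ten_percent": ten_percent,
            "large_samples": large_samples,
            "normal": normal}

# Hypothetical population sizes, NOT given in the problem
result = check_conditions(27, 32, pop1=400, pop2=500,
                          symmetric_no_outliers=True)
print(result)
```

Here the large-sample guideline fails because 27 is below 30, so the Normal condition rests entirely on the stated shape of the sample distributions.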

Exam tip: Never just write "conditions are met" on an FRQ. You must explicitly verify each condition with reference to the problem context to earn the point for each condition.

3. Confidence Intervals for $μ_{1} - μ_{2}$ (Independent Samples)

A confidence interval for the difference in two population means gives a range of plausible values for $μ_{1} - μ_{2}$, the true difference between the two population means. The general structure of a confidence interval follows the same form as all confidence intervals: $\text{point estimate} \pm \text{margin of error}$.

The point estimate for $μ_{1} - μ_{2}$ is simply $\bar{x}_{1} - \bar{x}_{2}$, the difference between the two sample means. The standard error (the estimated standard deviation of the sampling distribution of the point estimate) is: $SE = \sqrt{\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}}$. This formula comes from the property that the variance of the difference of two independent random variables equals the sum of their variances, which is why we add the two variance terms (never subtract them, a common mistake). This is the unpooled standard error, which is the default you will always use on the AP exam unless explicitly told to pool variances.

For degrees of freedom, AP accepts two methods: the conservative method ($df = \min(n_{1} - 1, n_{2} - 1)$, which is easy to calculate by hand) or the Welch-Satterthwaite approximation used by calculators, which gives a larger df and therefore a narrower interval. Both are accepted for full credit.
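To see how far apart the two degrees-of-freedom methods can be, here is a minimal Python sketch. The function names are made up for illustration, and the summary statistics are illustrative values rather than data from a real study. The Welch-Satterthwaite formula used is the standard one: the squared sum of the two variance terms, divided by the sum of each squared term over its own $n - 1$.

```python
def conservative_df(n1, n2):
    """Conservative degrees of freedom: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom."""
    v1 = s1**2 / n1  # variance term for group 1
    v2 = s2**2 / n2  # variance term for group 2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Illustrative summary statistics (assumed values)
s1, n1 = 0.7, 15
s2, n2 = 0.6, 12

df_conservative = conservative_df(n1, n2)
df_welch = welch_df(s1, n1, s2, n2)
print(df_conservative, round(df_welch, 1))
```

For these values the conservative method gives 11 while the Welch approximation gives about 24.8, which is why calculator intervals are usually a little narrower than hand-computed ones.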

The final confidence interval is: $(\bar{x}_{1} - \bar{x}_{2}) \pm t^{*} \times SE$, where $t^{*}$ is the critical t-value for the desired confidence level and your degrees of freedom.

Worked Example

Problem: A coffee shop owner wants to estimate the difference in mean wait time for orders between the morning and afternoon rushes. A random sample of 15 morning orders has a mean wait time of 2.8 minutes with standard deviation 0.7 minutes. A random sample of 12 afternoon orders has a mean wait time of 2.1 minutes with standard deviation 0.6 minutes. Construct a 95% confidence interval for the difference in mean wait time ( $μ_{morning} - μ_{afternoon}$ ).

Assume the conditions for inference have been checked and are met. The point estimate is $\bar{x}_{1} - \bar{x}_{2} = 2.8 - 2.1 = 0.7$ minutes.
Calculate the standard error: $SE = \sqrt{\frac{0.7^{2}}{15} + \frac{0.6^{2}}{12}} = \sqrt{\frac{0.49}{15} + \frac{0.36}{12}} = \sqrt{0.0327 + 0.03} = \sqrt{0.0627} \approx 0.250$
Degrees of freedom (conservative method) = $\min(15 - 1, 12 - 1) = 11$. The critical $t^{*}$ for 95% confidence and $df = 11$ is 2.201.
Margin of error = $2.201 \times 0.250 \approx 0.55$. The confidence interval is $0.7 \pm 0.55 = (0.15, 1.25)$.

We are 95% confident that the true difference in mean wait time between morning and afternoon is between 0.15 and 1.25 minutes.
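The arithmetic in this worked example can be checked with a short Python sketch. The function name `two_sample_ci` is an assumption for illustration, and the critical value $t^{*} = 2.201$ is passed in by hand from a t-table rather than computed, so the code uses only the standard library.

```python
import math

def two_sample_ci(xbar1, s1, n1, xbar2, s2, n2, t_star):
    """Unpooled confidence interval for mu1 - mu2 from summary statistics.

    t_star is the critical value looked up for the chosen confidence
    level and degrees of freedom.
    """
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # add the variance terms
    diff = xbar1 - xbar2                     # point estimate
    margin = t_star * se                     # margin of error
    return diff - margin, diff + margin, se

# Coffee shop example: morning is group 1, afternoon is group 2
low, high, se = two_sample_ci(2.8, 0.7, 15, 2.1, 0.6, 12, t_star=2.201)
print(round(se, 3), round(low, 2), round(high, 2))
```

Rounded to two decimal places, the endpoints agree with the interval $(0.15, 1.25)$ found by hand.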

Exam tip: If 0 is not in your confidence interval, you can reject the null hypothesis of no difference in a two-sided test at significance level $α = 1 - \text{confidence level}$. This connection is commonly tested on multi-part FRQs.

4. Hypothesis Testing for the Difference in Two Population Means

Two-sample t-tests are used to test a claim about the difference between two population means, almost always testing the null hypothesis that there is no difference between the two means ($H_{0} : μ_{1} - μ_{2} = 0$). The alternative hypothesis can be two-sided ($H_{a} : μ_{1} - μ_{2} \neq 0$), left-sided, or right-sided, depending on the research question.

The test statistic follows the same general form for hypothesis tests: $t = \frac{(\bar{x}_{1} - \bar{x}_{2}) - Δ_{0}}{SE}$, where $Δ_{0}$ is the hypothesized difference from the null hypothesis, which is almost always 0. The standard error $SE$ is the same as for confidence intervals, and degrees of freedom follow the same rules. We calculate the p-value (the probability of observing a difference as extreme or more extreme than our sample difference if the null hypothesis is true) and compare it to our significance level $α$ to make a conclusion.

As with confidence intervals, you always use the unpooled version of the test unless explicitly told to assume equal population variances and pool.

Worked Example

Problem: A researcher claims that adults who regularly practice meditation have a lower mean resting heart rate than adults who do not. A random sample of 35 meditators has a mean resting heart rate of 68 bpm with standard deviation 5 bpm. A random sample of 42 non-meditators has a mean of 72 bpm with standard deviation 6 bpm. Test the researcher’s claim at $α = 0.05$ .

Define parameters and state hypotheses: Let $μ_{1}$ = mean resting heart rate for meditators, $μ_{2}$ = mean for non-meditators. $H_{0} : μ_{1} - μ_{2} = 0$ , $H_{a} : μ_{1} - μ_{2} < 0$ , $α = 0.05$ .
Check conditions: Random samples stated, 10% condition holds for both populations, both $n_{1} \geq 30$ and $n_{2} \geq 30$ , so all conditions are met.
Calculate test statistic: Point estimate = $68 - 72 = -4$. $SE = \sqrt{\frac{5^{2}}{35} + \frac{6^{2}}{42}} = \sqrt{0.714 + 0.857} \approx 1.253$. $t = \frac{-4 - 0}{1.253} \approx -3.19$.
Find p-value: $df = \min(34, 41) = 34$, one-tailed p-value for $t = -3.19$ is approximately 0.0015.
Conclusion: Since $0.0015 < 0.05$ , we reject $H_{0}$ . There is convincing evidence at the 0.05 significance level that meditating adults have a lower mean resting heart rate than non-meditating adults.
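The test statistic and p-value in this example can be reproduced with the standard library alone. The sketch below is one possible approach, not the method a calculator uses: it gets the left-tail area by numerically integrating the t density with Simpson's rule. All function names are assumptions for illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=20000):
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry about 0.

    steps must be even for Simpson's rule.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """Unpooled two-sample t statistic with conservative degrees of freedom."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    t = ((xbar1 - xbar2) - delta0) / se
    df = min(n1 - 1, n2 - 1)
    return t, df

# Meditation example: meditators are group 1, non-meditators are group 2
t, df = two_sample_t(68, 5, 35, 72, 6, 42)
p_left = t_cdf(t, df)  # left-tailed because H_a is mu1 - mu2 < 0
print(round(t, 2), df, p_left)
```

The left-tail area should come out close to the 0.0015 quoted in the worked solution, well below $α = 0.05$.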

Exam tip: Always define your parameters in words when writing hypotheses for FRQs. AP exam graders require this to earn full credit for the hypotheses step.

5. Pooled vs Unpooled, and Independent vs Matched Pairs

Two key distinctions that are commonly tested on the AP exam are (1) when to use pooled vs unpooled inference, and (2) when to use two-sample independent inference vs matched pairs inference.

For pooled vs unpooled: Pooled t-procedures assume that the two populations have equal variance, which is almost never known to be true in practice. The AP exam never requires you to use pooled procedures unless explicitly told to do so. Unpooled is always the default, and you should only use pooled if the problem explicitly states that the population variances are equal and you are instructed to pool.
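For comparison, here is a small Python sketch of the two standard errors side by side. The pooled version uses the usual pooled variance $s_{p}^{2} = \frac{(n_{1} - 1)s_{1}^{2} + (n_{2} - 1)s_{2}^{2}}{n_{1} + n_{2} - 2}$ with $df = n_{1} + n_{2} - 2$. The function names and the summary statistics are assumptions chosen for illustration.

```python
import math

def unpooled_se(s1, n1, s2, n2):
    """Default AP standard error: each group keeps its own variance."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def pooled_se(s1, n1, s2, n2):
    """Pooled standard error: assumes the two population variances are equal."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Illustrative summary statistics (assumed values)
s1, n1, s2, n2 = 5.0, 35, 6.0, 42

se_unpooled = unpooled_se(s1, n1, s2, n2)
se_pooled = pooled_se(s1, n1, s2, n2)
print(round(se_unpooled, 3), round(se_pooled, 3))
```

When the sample standard deviations and sample sizes are similar, as here, the two standard errors are close, so little is lost by using the unpooled default.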

For independent vs matched pairs: Two-sample inference for the difference in two population means is only for independent groups, where there is no link between individual observations in the two groups. Matched pairs data occurs when observations are paired: for example, the same subject measured before and after a treatment, or pairs of subjects matched on confounding variables like age and gender, with one in each group. For matched pairs, you should not use two-sample inference—instead, you calculate the difference within each pair and do a one-sample t-procedure on the mean difference.

Worked Example

Problem: For each scenario, state whether you should use two-sample t-inference for the difference in two population means, or a matched pairs t-procedure: (a) A physical therapist measures flexibility for 20 patients before and after a 4-week stretching program, to test if the program increases flexibility. (b) A physical therapist randomly assigns 20 patients to do either the stretching program or a control program, then compares mean flexibility after 4 weeks between the two groups.

For (a): Each patient has two measurements (before and after), so the two samples are dependent, linked by patient. This is matched pairs data, so we use a matched pairs t-procedure (one-sample t on the within-patient differences).
For (b): The two groups (treatment and control) are independently randomly assigned, with no linkage between individual patients in the two groups. This is independent groups, so we use two-sample t-inference for the difference in two population means.
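To see why the choice of procedure matters, the sketch below runs both calculations on the same small before/after data set. The flexibility scores are entirely hypothetical, invented for this illustration, and only five patients are used to keep the arithmetic visible.

```python
import math
import statistics

# Hypothetical flexibility scores (cm) for 5 patients, invented for illustration
before = [20.0, 22.5, 18.0, 25.0, 21.0]
after = [22.0, 23.5, 19.5, 27.5, 22.0]
n = len(before)

# Correct for scenario (a): one-sample t on the within-patient differences
diffs = [a - b for a, b in zip(after, before)]
mean_diff = statistics.mean(diffs)
se_paired = statistics.stdev(diffs) / math.sqrt(n)
t_paired = mean_diff / se_paired

# Incorrect for paired data: treating the columns as independent samples
se_two_sample = math.sqrt(statistics.variance(after) / n
                          + statistics.variance(before) / n)
t_two_sample = (statistics.mean(after) - statistics.mean(before)) / se_two_sample

print(round(t_paired, 2), round(t_two_sample, 2))
```

Both approaches use the same mean improvement of 1.6 cm, but the paired analysis removes patient-to-patient variation, so its t statistic is far larger than the one from the two-sample calculation.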

Exam tip: AP exam questions often include a distractor that looks like two-sample data but is actually matched pairs. Always check if there is a pairing structure first before choosing your procedure.

6. Common Pitfalls (and how to avoid them)

Wrong move: Writing the standard error as $SE = \sqrt{\frac{s_{1}^{2}}{n_{1}} - \frac{s_{2}^{2}}{n_{2}}}$, subtracting the variances instead of adding. Why: Students confuse the difference in means with a difference in variances, forgetting that variances add for independent random variables. Correct move: Remember $\operatorname{Var}(X - Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$ for independent $X$ and $Y$, so always add the two variance terms under the square root. Double-check this step before calculating.
Wrong move: Using a two-sample t-procedure for matched pairs data. Why: Students default to two-sample because there are two groups, and miss the pairing structure that makes the groups dependent. Correct move: Always check first if observations are linked between groups (same subject, matched subjects) — if yes, use matched pairs, not two-sample.
Wrong move: Saying the normality condition is satisfied when one sample is small and skewed, just because the other sample is large. Why: Students forget that both sampling distributions need to be approximately normal, not just one. Correct move: If one sample size is below 30, explicitly confirm it has no outliers or strong skew to satisfy the normality condition.
Wrong move: Using z-procedures instead of t-procedures just because both sample sizes are large. Why: Students know t approximates z for large n, but forget that population standard deviations are still unknown. Correct move: Always use t-procedures for inference on means, regardless of sample size, on the AP exam.
Wrong move: Interpreting a confidence interval as "there is a 95% chance the true difference is between $a$ and $b$ ". Why: Students repeat a common misinterpretation of confidence intervals. Correct move: State "we are 95% confident that the true difference in population means is between $a$ and $b$ " — the confidence is in the method, not the specific interval.
Wrong move: Using pooled t-procedures by default. Why: Some introductory courses teach pooled first, leading students to assume it is standard. Correct move: Only use pooled if the problem explicitly tells you to assume equal population variances; otherwise, always use unpooled.

7. Practice Questions (AP Statistics Style)

Question 1 (Multiple Choice)

A sociologist studying differences in hourly wages between rural and urban workers in a region obtains independent random samples of 25 rural workers and 30 urban workers. The sample mean wage for rural workers is \$18.50 per hour with standard deviation \$2.50, and for urban workers it is \$21.20 per hour with standard deviation \$3.20. What is the standard error of the difference in sample means (urban minus rural)? A) 0.62 B) 0.76 C) 1.35 D) 2.15

Worked Solution: The standard error for the difference of two independent sample means is given by the unpooled formula $SE = \sqrt{\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}}$, where we add the two variance terms. Plugging in $s_{1} = 3.20$, $n_{1} = 30$, $s_{2} = 2.50$, $n_{2} = 25$: $SE = \sqrt{\frac{3.2^{2}}{30} + \frac{2.5^{2}}{25}} = \sqrt{0.341 + 0.25} = \sqrt{0.591} \approx 0.769$, so the closest choice is B (0.76). The remaining choices do not result from this formula. Common errors to watch for include subtracting the variance terms and forgetting the square root, which would give 0.591. The correct answer is B.

Question 2 (Free Response)

A bakery wants to compare the mean rise of two different brands of yeast for sourdough bread. They bake 25 loaves with brand A and 20 loaves with brand B, getting the following summary statistics for height rise (in cm):

Brand	Mean Rise	Standard Deviation
A	4.2	0.8
B	3.7	0.7

(a) Check all conditions for inference for the difference in mean rise between the two brands. (b) Construct and interpret a 90% confidence interval for $μ_{A} - μ_{B}$ . (c) Based on your interval, is there statistically significant evidence of a difference in mean rise at the $α = 0.10$ level? Justify.

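Worked Solution: (a) 1. Random: The problem does not state how loaves were assigned to each brand. We assume the loaves were randomly assigned to a yeast brand, which makes this a randomized experiment and satisfies the random condition. 2. Independence: Loaves are baked independently of one another, and 25 and 20 loaves are far less than 10% of all loaves that could be baked with each brand, so independence is met. 3. Normality: Both sample sizes are below 30 (25 and 20), so the Central Limit Theorem guideline does not apply. No outliers or strong skew are reported, so we proceed on the assumption that each distribution of rise is roughly symmetric; with the raw data you would confirm this with a plot of each sample. Under these assumptions, all conditions are met.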

(b) Point estimate: $4.2 - 3.7 = 0.5$ cm. Standard error: $SE = \sqrt{\frac{0.8^{2}}{25} + \frac{0.7^{2}}{20}} = \sqrt{0.0256 + 0.0245} = \sqrt{0.0501} \approx 0.224$. Degrees of freedom = $\min(24, 19) = 19$, $t^{*} = 1.729$ for 90% confidence. Margin of error: $1.729 \times 0.224 \approx 0.39$. Confidence interval: $0.5 \pm 0.39 = (0.11, 0.89)$. Interpretation: We are 90% confident that the true difference in mean rise between brand A and brand B yeast (A minus B) is between 0.11 cm and 0.89 cm.

(c) For $α = 0.10$ , a 90% confidence interval that does not contain 0 means we reject the null hypothesis of no difference. Since 0 is not in the interval $(0.11, 0.89)$ , there is convincing evidence at the 0.10 level that the mean rise differs between the two brands.

Question 3 (Application / Real-World Style)

An agricultural researcher tests whether a new fertilizer increases the mean yield of corn (measured in bushels per acre) compared to the standard fertilizer. She randomly assigns 30 one-acre plots to get the new fertilizer, giving a mean yield of 162 bushels per acre with standard deviation 12 bushels. She assigns 25 plots to get the standard fertilizer, giving a mean yield of 153 bushels per acre with standard deviation 10 bushels. Conduct a test at $α = 0.05$ to test the claim that the new fertilizer increases mean yield, and interpret your result in context.

Worked Solution: Let $μ_{1}$ = mean yield for new fertilizer, $μ_{2}$ = mean yield for standard fertilizer. Hypotheses: $H_{0} : μ_{1} - μ_{2} = 0$, $H_{a} : μ_{1} - μ_{2} > 0$, $α = 0.05$. Conditions: the plots were randomly assigned to a fertilizer, the plots are independent of one another, and with 30 and 25 plots and no reported outliers or strong skew the normality condition is reasonable. Test statistic: Point estimate = $162 - 153 = 9$. $SE = \sqrt{\frac{12^{2}}{30} + \frac{10^{2}}{25}} = \sqrt{4.8 + 4} = \sqrt{8.8} \approx 2.97$. $t = \frac{9}{2.97} \approx 3.03$. $df = \min(29, 24) = 24$, one-tailed p-value ≈ 0.003. Since 0.003 < 0.05, we reject $H_{0}$. There is convincing evidence at the 0.05 significance level that the new fertilizer increases the mean corn yield per acre compared to the standard fertilizer.

8. Quick Reference Cheatsheet

Category	Formula / Rule	Notes
Parameter of Interest	$μ_{1} - μ_{2}$	True difference between population 1 mean and population 2 mean, for independent samples
Point Estimate	$\bar{x}_{1} - \bar{x}_{2}$	Unbiased estimate of the true difference
Unpooled Standard Error	$SE = \sqrt{\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}}$	Default for all inference; always use unless told to pool
$(1 - α)$ Confidence Interval	$(\bar{x}_{1} - \bar{x}_{2}) \pm t^{*} \times SE$	$df = \min(n_{1} - 1, n_{2} - 1)$ for conservative calculation
Two-Sample t Test Statistic	$t = \frac{(\bar{x}_{1} - \bar{x}_{2}) - Δ_{0}}{SE}$	$Δ_{0}$ is almost always 0 (null of no difference)
Conditions for Inference	Random, Independence, Normal/Large Sample	Random: independent random samples or randomized experiment; Independence: 10% condition for sampling without replacement; Normal: both $n \geq 30$ OR approx normal with no outliers
Matched Pairs vs Two-Sample	Matched: dependent paired data; Two-sample: independent groups	Two-sample only for independent groups; matched pairs uses one-sample t on within-pair differences
Pooled Inference	Only use when explicitly required	Never default to pooled; only use when told population variances are equal

9. What's Next

Inference for the difference in two independent population means is the foundation for all comparative methods for quantitative data in statistics. Most real-world experiments and observational studies compare two groups, so this is one of the most widely used inference methods in practice. Next in the AP Statistics Unit 7 syllabus, you will study inference for matched pairs means, which extends the logic of comparative inference to dependent paired data. Later, you will learn one-way ANOVA, which extends this framework to compare means across more than two populations. Mastering the conditions, standard error calculation, and context interpretation in this chapter is critical to avoiding mistakes in these more advanced topics, as well as for earning full credit on FRQs that ask for comparative inference.
