AP · Hypothesis Tests for a Population Mean · 14 min read · Updated 2026-05-10

Hypothesis Tests for a Population Mean — AP Statistics Study Guide

For: Candidates preparing for the AP Statistics exam.

Covers: One-sample t-tests, matched pairs t-tests, conditions for inference, p-value and critical value approaches, connection between confidence intervals and two-sided hypothesis tests, and conclusion-writing for population mean significance testing.

You should already know: How to state null and alternative hypotheses for significance tests. How to calculate and interpret confidence intervals for a population mean. Key properties of the t-distribution and degrees of freedom.

A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official mark schemes for grading conventions.


1. What Are Hypothesis Tests for a Population Mean?

A hypothesis test for a population mean is a statistical inference method that uses sample mean data to evaluate a claim about the true, unknown mean of an entire population. According to the AP Statistics Course and Exam Description (CED), Unit 7 (Inference for Quantitative Data: Means) makes up 12-15% of the total AP exam score, and this topic is the largest component of that unit, appearing regularly in both multiple-choice (MCQ) and free-response (FRQ) sections — it is common for a full 4-5 point FRQ question to be dedicated entirely to this topic.

Our parameter of interest is the population mean $\mu$, and we test it against a hypothesized value $\mu_0$, with the null hypothesis always written as $H_0: \mu = \mu_0$ and the alternative hypothesis as one-sided ($H_a: \mu < \mu_0$ or $H_a: \mu > \mu_0$) or two-sided ($H_a: \mu \neq \mu_0$) depending on the research question. Because we almost never know the true population standard deviation $\sigma$ in real-world problems, we almost always use a t-test (rather than a z-test) for this inference, so this topic is often referred to as the one-sample t-test for a population mean.

2. Conditions for Inference for a Population Mean Hypothesis Test

Before running any hypothesis test, you must verify three core conditions to ensure your inference is valid: Random, Independent, and Normal/Large Sample (often abbreviated RIN).

  1. Random: Your data must come from either a random sample from the population of interest or a randomized comparative experiment. This condition ensures no systematic selection or assignment bias that would skew your results. If the sample is not random, you can only proceed if you explicitly assume the sample is representative of the population, and you must note this as a potential source of bias.
  2. Independent: Individual observations must be independent of one another. When sampling without replacement from a finite population, we use the 10% condition: the sample size must be less than 10% of the total population size. This ensures our standard error calculation remains accurate, even without adjusting for a finite population.
  3. Normal/Large Sample: The sampling distribution of the sample mean must be approximately normally distributed. This is satisfied if either the sample size is $n \geq 30$ (by the Central Limit Theorem), or if $n < 30$ and the sample data has no strong skewness or extreme outliers (which suggests the underlying population is approximately normal).

Worked Example

A researcher wants to test the claim that the mean resting heart rate for regular long-distance runners is less than 50 bpm. They recruit 25 runners from a local running club who volunteered for the study, and the total population of regular long-distance runners in the local area is 1200. The sample data is roughly symmetric with no outliers. Check the conditions for inference.

  1. Random Condition: The sample is a convenience sample of volunteers, not a random sample from the population. We must assume the sample is representative of all long-distance runners to proceed; volunteer bias is a potential risk here.
  2. Independence Condition: The 10% condition requires $n < 10\%$ of the population: 10% of 1200 = 120, and $25 < 120$, so this condition is satisfied.
  3. Normal/Large Sample Condition: $n = 25 < 30$, but the sample is roughly symmetric with no outliers, so there is no evidence of strong skewness. The Normal condition is satisfied.

Exam tip: AP graders require you to explicitly connect each condition to the context and numbers of the problem, not just list "RIN". You will lose points for just naming conditions without verifying them for your specific problem.
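The three checks above can be sketched as a small helper. This is only an illustration: the function name `check_conditions` and its arguments are hypothetical, and the two boolean flags stand in for judgments you make from the study design and a plot of the data.

```python
def check_conditions(n, population_size, is_random, roughly_symmetric_no_outliers):
    """Return a dict of Random / Independent / Normal checks for a t-test.

    All names here are illustrative. The boolean arguments are analyst
    judgments, not something the code can verify on its own.
    """
    ten_percent_cap = 0.10 * population_size
    return {
        "random": is_random,
        "independent": n < ten_percent_cap,
        # n >= 30 leans on the Central Limit Theorem; otherwise rely on shape.
        "normal": n >= 30 or roughly_symmetric_no_outliers,
    }


# Heart-rate example: 25 volunteer runners out of about 1200.
result = check_conditions(n=25, population_size=1200, is_random=False,
                          roughly_symmetric_no_outliers=True)
for name, ok in result.items():
    print(f"{name}: {'met' if ok else 'NOT met (state an assumption)'}")
```

A failed check does not always stop the analysis. As in the worked example, it flags an assumption you must state explicitly in your answer.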

3. One-Sample t-Test for a Population Mean

Once conditions are verified, we calculate the t-test statistic, which measures how far our sample mean is from the hypothesized mean, in standard error units. The formula for the t-test statistic is $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$, where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, $n$ is the sample size, and $\mu_0$ is the hypothesized population mean from the null hypothesis. Degrees of freedom for the t-distribution are $df = n - 1$.

We use the p-value approach (most common on AP) to make a decision: the p-value is the probability of observing a t-statistic as extreme or more extreme than the one we calculated, assuming the null hypothesis is true. For a one-sided test, the p-value is the area in the tail beyond our test statistic; for a two-sided test, it is twice the area of the tail beyond the absolute value of our test statistic. The decision rule is simple: if the p-value is less than your pre-specified significance level (usually 0.05, unless stated otherwise), you reject the null hypothesis; otherwise, you fail to reject the null hypothesis.
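As a rough sketch of the p-value approach, the code below computes the t-statistic from summary statistics and approximates the tail area by numerically integrating the t density with Simpson's rule. The function names and the sample numbers in the usage lines are hypothetical. On the exam you would use a t-table or calculator rather than this integration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))


def upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_test_from_summary(xbar, s, n, mu0, alternative="two-sided"):
    """Return (t, df, p_value) for a one-sample t-test."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    df = n - 1
    tail = upper_tail(abs(t), df)  # area beyond |t| in one tail
    if alternative == "two-sided":
        p = 2 * tail
    elif alternative == "greater":
        p = tail if t >= 0 else 1 - tail
    else:  # "less"
        p = tail if t <= 0 else 1 - tail
    return t, df, p


# Hypothetical summary statistics, invented for illustration only.
t, df, p = t_test_from_summary(xbar=52.3, s=4.1, n=16, mu0=50)
print(f"t = {t:.3f}, df = {df}, two-sided p = {p:.4f}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```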

Worked Example

A local coffee shop claims that their medium lattes have a mean caffeine content of 150 mg. A consumer advocacy group suspects the true mean is different from 150 mg. They take a random sample of 12 medium lattes, and find a sample mean of 154 mg with a sample standard deviation of 7.2 mg. Conduct a hypothesis test at $\alpha = 0.05$.

  1. State Hypotheses: Let $\mu$ = the true mean caffeine content of all medium lattes from this shop. $H_0: \mu = 150$, $H_a: \mu \neq 150$.
  2. Conditions: All conditions are verified (random sample, $12 < 10\%$ of all lattes, no outliers in sample).
  3. Calculate Test Statistic: $t = \dfrac{154 - 150}{7.2 / \sqrt{12}} \approx 1.925$, $df = 12 - 1 = 11$.
  4. Find p-value: For a two-sided test, the p-value is between 0.05 and 0.10 (from the t-table for $df = 11$, 1.925 falls between $t^* = 1.796$ (one-sided p=0.05) and $t^* = 2.201$ (one-sided p=0.025)).
  5. Conclusion: p-value > 0.05, so we fail to reject . There is not sufficient evidence at the 0.05 significance level to conclude that the mean caffeine content of the shop's medium lattes differs from 150 mg.

Exam tip: Never write "we accept the null hypothesis" — we only fail to find enough evidence to reject it, we do not prove it is true. Always use "fail to reject" for a non-significant result.
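The table lookup in step 4 can be mimicked in a few lines. This sketch hard-codes a handful of standard critical values for df = 11. The helper name `bracket_one_sided_p` is hypothetical, and only a subset of the table row is included.

```python
# Selected (upper-tail probability, critical value) pairs for df = 11,
# as they appear in a standard t-table row.
T_ROW_DF11 = [(0.10, 1.363), (0.05, 1.796), (0.025, 2.201),
              (0.01, 2.718), (0.005, 3.106)]


def bracket_one_sided_p(t_abs, row):
    """Return (low, high) bounds on the one-tail area beyond t_abs >= 0."""
    low, high = 0.0, 0.5
    for tail_prob, crit in row:
        if t_abs >= crit:
            high = tail_prob      # tail area is at most this probability
        else:
            low = tail_prob       # tail area is larger than this probability
            break
    return low, high


low, high = bracket_one_sided_p(1.925, T_ROW_DF11)
# Double both bounds for a two-sided alternative.
print(f"two-sided p-value is between {2 * low} and {2 * high}")
```

For the latte statistic this gives one-tail bounds of 0.025 and 0.05, which double to the 0.05 to 0.10 range quoted in the solution.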

4. Matched Pairs t-Test for a Mean Difference

A matched pairs t-test is used for paired data, where we have two measurements on the same individual, or two matched individuals with similar characteristics assigned to different treatments. Common examples include before-and-after studies of a treatment on the same subject, or matched block experiments.

For matched pairs, we are interested in the true mean difference $\mu_d$, where $d_i$ is the difference between the two measurements for the $i$-th pair. A matched pairs test is just a one-sample t-test conducted on the sample of differences, not a separate test. We do not use a two-sample t-test for paired data because the two measurements are not independent, and pairing reduces variability from individual differences, making the test more powerful.

Worked Example

A physical therapist wants to test if a new 4-week stretching routine increases mean hamstring flexibility. She recruits 10 volunteers, measures flexibility before and after the routine, and calculates differences $d = \text{after} - \text{before}$. The mean difference is $\bar{x}_d$ cm, with a sample standard deviation of $s_d$ cm. Test the claim at $\alpha = 0.05$.

  1. State Hypotheses: Let $\mu_d$ = the true mean difference (after minus before) in flexibility for all people who complete the routine. $H_0: \mu_d = 0$, $H_a: \mu_d > 0$.
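  2. Conditions: The participants are volunteers rather than a random sample, so we must assume they are representative of people who would complete the routine. $10 < 10\%$ of all potential participants, and the differences show no outliers, so with that assumption noted we proceed.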
  3. Calculate Test Statistic: $t = \dfrac{\bar{x}_d - 0}{s_d / \sqrt{10}}$, $df = 10 - 1 = 9$.
  4. Find p-value: For a one-sided test, $t$ is larger than the critical value $t^* = 3.250$ for $df = 9$ (which corresponds to p=0.005), so p-value < 0.005.
  5. Conclusion: p-value < 0.05, so we reject . There is sufficient evidence at the 0.05 significance level to conclude that the new stretching routine increases mean hamstring flexibility.

Exam tip: Always explicitly define the order of your differences when doing a matched pairs test. Mixing up the order of subtraction will flip the sign of your alternative hypothesis, leading to an incorrect conclusion.
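To make the "one-sample test on the differences" idea concrete, here is a minimal sketch. The before/after measurements are invented for illustration and are not the data from the worked example. The critical value 1.833 is the one-sided 0.05 value for df = 9 from a standard t-table.

```python
import math
import statistics

# Hypothetical before/after flexibility measurements (cm) for 10 subjects;
# these numbers are invented for illustration only.
before = [30.1, 28.4, 33.0, 25.7, 29.9, 31.2, 27.5, 26.8, 32.4, 28.0]
after = [32.0, 29.1, 35.2, 27.9, 30.5, 33.8, 28.1, 29.0, 33.9, 30.2]

# Fix the order of subtraction once: after minus before.
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)            # sample SD (n - 1 denominator)
t = (d_bar - 0) / (s_d / math.sqrt(n))   # H0: mu_d = 0
df = n - 1

T_CRIT_05_DF9 = 1.833                    # one-sided 0.05 critical value, df = 9
print(f"mean diff = {d_bar:.2f} cm, s_d = {s_d:.2f} cm, t = {t:.2f}, df = {df}")
print("reject H0" if t > T_CRIT_05_DF9 else "fail to reject H0")
```

Only the list of differences enters the calculation after the subtraction step, which is why the degrees of freedom are the number of pairs minus one.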

5. Connection Between Hypothesis Tests and Confidence Intervals

For a two-sided hypothesis test with significance level $\alpha$, a $(1 - \alpha) \times 100\%$ confidence interval for $\mu$ will give the same conclusion as a full t-test. The rule is: if the hypothesized value $\mu_0$ falls inside the confidence interval, we fail to reject $H_0$ at level $\alpha$; if $\mu_0$ falls outside the interval, we reject $H_0$ at level $\alpha$.

This works because a confidence interval contains all values of $\mu_0$ that would not be rejected by a two-sided test at the corresponding significance level. This connection is frequently tested on AP FRQs, where you may be given a confidence interval and asked to test a claim instead of running a full t-test. Note that this only works for two-sided tests: a standard two-sided confidence interval at confidence level $1 - \alpha$ cannot be used directly to draw conclusions for a one-sided hypothesis test at level $\alpha$.
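The agreement between the two approaches can be checked numerically. The sketch below reuses the summary statistics from the latte example in Section 3 and the table value 2.201 (two-sided 95%, df = 11). The variable names are illustrative.

```python
import math

# Summary statistics from the latte example; t* = 2.201 is the two-sided
# 95% critical value for df = 11 taken from a t-table.
xbar, s, n, mu0 = 154.0, 7.2, 12, 150.0
t_star = 2.201

se = s / math.sqrt(n)                        # standard error of the mean
ci = (xbar - t_star * se, xbar + t_star * se)
t = (xbar - mu0) / se

inside_ci = ci[0] <= mu0 <= ci[1]            # confidence-interval decision
fail_to_reject = abs(t) < t_star             # critical-value decision

print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}), t = {t:.3f}")
print("CI says:", "fail to reject" if inside_ci else "reject")
print("t-test says:", "fail to reject" if fail_to_reject else "reject")
```

Both decisions rest on the same inequality, $|\bar{x} - \mu_0| < t^* \cdot s/\sqrt{n}$, so they cannot disagree for a two-sided test at the matching level.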

Worked Example

A 95% confidence interval for the mean birth weight of kittens at a city shelter is (92 g, 118 g). A veterinarian claims the true mean birth weight is 125 g. Use the confidence interval to test this claim at $\alpha = 0.05$.

  1. State Hypotheses: $H_0: \mu = 125$, $H_a: \mu \neq 125$, where $\mu$ is the true mean birth weight of kittens at the shelter.
  2. Apply the connection rule: A 95% confidence interval matches the $\alpha = 0.05$ significance level for a two-sided test. The hypothesized value 125 g does not fall inside the interval (92, 118).
  3. Conclusion: We reject the null hypothesis. There is sufficient evidence at the 0.05 significance level to conclude that the true mean birth weight of kittens at this shelter differs from 125 g.

Exam tip: Never use this connection for one-sided tests. If the question asks for a test of a one-sided claim, you must run a full t-test, even if you have a confidence interval.

6. Common Pitfalls (and how to avoid them)

  • Wrong move: Using a z-test instead of a t-test because you confuse mean tests with proportion tests. Why: Early textbook examples sometimes assume $\sigma$ is known for teaching purposes, but this is almost never true on the AP exam. Correct move: Always use a t-test for a population mean unless the problem explicitly states you know the population standard deviation $\sigma$.
  • Wrong move: Doing a two-sample t-test for matched pairs data instead of a one-sample test on differences. Why: You confuse paired (dependent) data with two independent samples. Correct move: If you have two measurements per individual or matched pairs, always compute differences and run a one-sample t-test on the differences.
  • Wrong move: Claiming the Normal condition is met for $n < 30$ without checking for skewness/outliers. Why: You memorize "n≥30 = Normal" but forget to apply the rule for small samples. Correct move: For $n < 30$, always explicitly state that the sample has no strong skewness or outliers to confirm the Normal condition.
  • Wrong move: Stating conclusions as "we accept the null hypothesis" when the p-value $> \alpha$. Why: You think failing to reject proves the null is true. Correct move: Always write "we fail to reject the null hypothesis" because we only have insufficient evidence to reject it, not proof of correctness.
  • Wrong move: Using a two-sided confidence interval to test a one-sided hypothesis test. Why: You think the confidence interval trick works for any test. Correct move: Only use the confidence interval method for two-sided tests; run a full t-test for one-sided alternatives.
  • Wrong move: Just listing condition names without connecting them to the problem context. Why: You think naming is enough for points. Correct move: For each condition, add a context-specific verification (e.g., "n=22 < 10% of 500 total customers, so the 10% condition is satisfied").
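To see why the first pitfall matters, this short sketch treats the latte statistic from Section 3 as if it were a z-score and compares Normal and t critical values. The value 2.201 is the df = 11 table value. The variable names are illustrative.

```python
import math
from statistics import NormalDist

# Latte example from Section 3: sample mean 154, s = 7.2, n = 12, mu0 = 150.
stat = (154 - 150) / (7.2 / math.sqrt(12))

# Treating the statistic as a z-score (the "wrong move") gives this
# two-sided p-value from the standard Normal curve.
p_if_z = 2 * (1 - NormalDist().cdf(abs(stat)))

# Two-sided 5% critical values: Normal vs. t with df = 11 (from a t-table).
z_star = NormalDist().inv_cdf(0.975)
t_star_df11 = 2.201

print(f"statistic = {stat:.3f}")
print(f"z-based p = {p_if_z:.4f}; z* = {z_star:.3f}, t*(df=11) = {t_star_df11}")
```

The Normal curve has thinner tails than a t-distribution with few degrees of freedom, so the z-based p-value comes out smaller than the t-based one. With a small sample, that can tip a borderline result toward a rejection the data do not support.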

7. Practice Questions (AP Statistics Style)

Question 1 (Multiple Choice)

A biologist tests whether the mean pH of rainwater in an industrial forest differs from the natural baseline of 5.6. She takes a random sample of 20 rainwater samples and calculates a t-statistic of 2.12. What is the range of the p-value for this test? A) B) C) D)

Worked Solution: The test is two-sided because we are testing for a difference from the baseline, so we double the one-sided p-value. Degrees of freedom are $df = 20 - 1 = 19$. From the t-table for df=19, our t-statistic 2.12 falls between 2.093 (which gives a two-sided p of 0.05) and 2.539 (which gives a two-sided p of 0.02). This means the p-value is between 0.02 and 0.05. The correct answer is B.


Question 2 (Free Response)

A bakery claims that the mean sodium content in their standard sourdough loaf is at most 280 mg. A dietician suspects the true mean is higher than 280 mg. She takes a random sample of 15 loaves, gets a sample mean of 288 mg, and a sample standard deviation of 16 mg. The sample data is roughly symmetric with no outliers. (a) State the appropriate hypotheses for this test, and define the parameter of interest. (b) Check all conditions for inference. (c) Carry out the test and state your conclusion in context at $\alpha = 0.05$.

Worked Solution:

  (a) Let $\mu$ = the true mean sodium content of all standard sourdough loaves from this bakery. The hypotheses are $H_0: \mu = 280$, $H_a: \mu > 280$.
  (b) 1. Random: The problem states the sample is random, so the random condition is satisfied. 2. Independence: We can assume the bakery produces more than 150 loaves, so $15 < 10\%$ of the population, so independence is satisfied. 3. Normal/Large Sample: $n = 15 < 30$, but the sample is roughly symmetric with no outliers, so the Normal condition is satisfied. All conditions are met.
  (c) Test statistic: $t = \dfrac{288 - 280}{16 / \sqrt{15}} \approx 1.94$, $df = 15 - 1 = 14$. For a one-sided test, p-value is between 0.025 and 0.05. Since p-value < 0.05, we reject $H_0$. There is sufficient evidence at the 0.05 significance level to conclude that the true mean sodium content of the bakery's sourdough loaves is higher than 280 mg, supporting the dietician's suspicion.


Question 3 (Application / Real-World Style)

A battery company claims that the mean lifespan of their 9-volt batteries is 50 hours. An independent consumer testing lab tests 40 random batteries from the production line, and finds a sample mean lifespan of 47.5 hours with a sample standard deviation of 6.1 hours. Is there significant evidence at the $\alpha = 0.01$ level to conclude the company's claim is overstated?

Worked Solution: Let $\mu$ = the true mean lifespan of the company's 9-volt batteries. Hypotheses: $H_0: \mu = 50$, $H_a: \mu < 50$. Conditions: A random sample is given; $n = 40 \geq 30$, so the Normal/Large Sample condition is satisfied; $40 < 10\%$ of all batteries produced, so independence holds. Test statistic: $t = \dfrac{47.5 - 50}{6.1 / \sqrt{40}} \approx -2.59$, $df = 40 - 1 = 39$. For a one-sided (lower-tail) test, p-value < 0.01 (since $|t| = 2.59 > 2.426$, the one-sided 0.01 critical value for df=39). Since p-value < 0.01, we reject $H_0$. In context: There is significant evidence at the 0.01 level to conclude that the company's claim of a 50-hour mean lifespan is overstated.
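The arithmetic in Questions 2 and 3 can be double-checked with a few lines of code. The helper `t_statistic` is a hypothetical name. The inputs are the summary statistics stated in the two questions.

```python
import math


def t_statistic(xbar, s, n, mu0):
    """One-sample t statistic and degrees of freedom from summary stats."""
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1


# (label, xbar, s, n, mu0) taken from the two questions above.
problems = [
    ("Q2 sourdough sodium", 288, 16, 15, 280),
    ("Q3 battery lifespan", 47.5, 6.1, 40, 50),
]
for label, xbar, s, n, mu0 in problems:
    t, df = t_statistic(xbar, s, n, mu0)
    print(f"{label}: t = {t:.2f}, df = {df}")
# Q2 sourdough sodium: t = 1.94, df = 14
# Q3 battery lifespan: t = -2.59, df = 39
```

The negative sign for Question 3 reflects a sample mean below the hypothesized 50 hours, which is the direction the lower-tail alternative is looking for.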

8. Quick Reference Cheatsheet

| Category | Formula | Notes |
|---|---|---|
| Hypotheses | $H_0: \mu = \mu_0$; $H_a: \mu < \mu_0$, $\mu > \mu_0$, or $\mu \neq \mu_0$ | Use $\mu_d$ for matched pairs differences; $\mu_0$ is the hypothesized population mean |
| One-Sample t-test Statistic | $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$ | Use when population standard deviation $\sigma$ is unknown (almost always true for AP) |
| Degrees of Freedom | $df = n - 1$ | $n$ = sample size (number of pairs for matched pairs) |
| Conditions for Inference | 1. Random Sample/Experiment; 2. 10% Condition for Independence; 3. Normal/Large Sample | Must verify each in context for full AP credit |
| Matched Pairs Test | One-sample t-test on sample differences | For paired (dependent) data; never use two-sample t-test here |
| CI/Test Connection | $\mu_0$ inside CI: Fail to reject $H_0$; $\mu_0$ outside CI: Reject $H_0$ | Only works for two-sided hypothesis tests at significance level $\alpha$ |
| Decision Rule | Reject $H_0$ if p-value $< \alpha$ | $\alpha = 0.05$ is standard unless stated otherwise |

9. What's Next

Hypothesis tests for a population mean lay the foundational inference skills you need for the next topic in Unit 7: comparing two population means with two-sample t-tests. Without mastering the conditions for t-tests, how to calculate the t-test statistic, and how to write context-rich conclusions, you will struggle not only with two-sample tests but also with inference for regression slope in Unit 9, which uses the same t-distribution logic you learned here. The habits you build here — checking conditions, avoiding common misstatements of conclusions, and grounding all results in context — translate to every inference topic on the AP exam, from proportions to chi-square tests.

Follow-on topics: Two-Sample t-Tests for Difference in Population Means; Confidence Intervals for a Population Mean; Inference for Slope of a Regression Line
