Inference for Categorical Data: Proportions — AP Statistics Unit Overview

For: candidates sitting the AP Statistics exam.

Covers: All core subtopics of this unit, including introducing and constructing confidence intervals for one proportion, introducing and carrying out hypothesis tests for one proportion, and inference and confidence intervals for a difference in two population proportions.

You should already know: Properties of categorical data and sample proportions, the sampling distribution of a sample proportion, conditions for normality of sampling distributions.

A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official mark schemes for grading conventions.

1. Why This Unit Matters

This is the first full unit of inferential statistics that most AP Statistics students encounter after learning about sampling distributions, and it lays the foundational logic of inference that every later topic relies on. According to the AP Statistics Course and Exam Description (CED), this unit makes up 12-15% of your exam score. It appears in both the multiple-choice and free-response sections, frequently as a free-response question that tests basic inference conventions. Inference for proportions applies directly to real-world contexts you will see on the exam: election polling, public health surveys, marketing A/B testing, and clinical trial analysis all rely on these methods. Because proportions use the normal approximation you already learned with sampling distributions (no t-distribution here), this unit lets you concentrate on the core inferential skills before moving on to more complex inference: stating parameters, checking conditions, calculating test statistics or intervals, and writing conclusions in context. Weaknesses in these skills carry over into every later inference topic.

2. Unit Concept Map

This unit is built incrementally, with each subtopic adding complexity while building on the logic you learn early on. The order of subtopics reflects the natural progression of learning:

Introducing Confidence Intervals for Proportions: Starts with the core logic of interval estimation, connecting what you already know about sampling variability to explain why we need confidence intervals, what a margin of error represents, and what it means to be "95% confident."
Constructing a Confidence Interval for a Proportion: Translates that conceptual logic into a step-by-step process for calculating and interpreting confidence intervals for a single population proportion, including condition checks, sample size calculations, and common interpretation mistakes.
Introducing Hypothesis Tests for Proportions: Shifts from estimation to answering yes/no claims about population proportions, introducing core concepts like null and alternative hypotheses, p-values, and significance levels, using the simple proportion context to avoid overwhelming you with new math.
Carrying Out a Hypothesis Test for a Proportion: Applies that conceptual logic to a step-by-step process for testing claims about a single population proportion.
Inference for a Difference in Two Proportions: Extends everything you learned about single-proportion inference to compare proportions from two independent random samples or from two treatment groups in a randomized experiment, the typical design of clinical trials and A/B tests.
Confidence Intervals for the Difference in Two Proportions: Adds interval estimation for the difference between two proportions, letting you estimate the size of a difference (not just whether one exists).

Every subtopic builds on the previous one: the conditions, logic, and interpretation conventions you learn for one-proportion confidence intervals transfer directly to all other inference topics in this unit and beyond.

3. A Guided Tour of a Full Exam-Style Problem

To see how the subtopics connect in a single problem, let's walk through a common multi-part exam question: A beverage company has developed a new cherry soda and wants to test whether more than 60% of teenagers prefer it over the leading competitor. In a random sample of 120 teenagers, 81 prefer the new soda. The company also wants to estimate how much higher the proportion of teenagers who prefer the new soda is than the proportion of adults, using an independent random sample of 150 adults in which 78 prefer the new soda.

This problem touches three of the unit's most central subtopics in sequence:

First, the question asks you to test the claim that more than 60% of teenagers prefer the new soda, which uses the subtopic Carrying Out a Hypothesis Test for a Proportion. Define the parameter $p$ as the true proportion of all teenagers who prefer the new soda, and state the hypotheses $H_{0}: p = 0.6$ and $H_{a}: p > 0.6$. Check conditions: random (given); 10% condition (the population of teenagers is larger than $10 \times 120 = 1200$); normality ($n p_{0} = 72 \geq 10$ and $n(1 - p_{0}) = 48 \geq 10$). With $\hat{p} = 81/120 = 0.675$, the test statistic is $z = \frac{0.675 - 0.6}{\sqrt{0.6(0.4)/120}} \approx 1.68$, giving a p-value of about 0.047. Because the p-value is below the conventional significance level $\alpha = 0.05$, we reject $H_{0}$: there is convincing evidence that more than 60% of all teenagers prefer the new soda. All of this draws directly on the skills you learn in this subtopic.
Next, the question asks you to estimate the difference in preference between teenagers and adults, which draws on both Inference for a Difference in Two Proportions and Confidence Intervals for the Difference in Two Proportions. Define $p_{1}$ as the true proportion of all teenagers and $p_{2}$ as the true proportion of all adults who prefer the new soda, and check the conditions for two independent random samples, including at least 10 successes and 10 failures in each sample (81 and 39 for teenagers, 78 and 72 for adults). The point estimate is $\hat{p}_{1} - \hat{p}_{2} = 0.675 - 0.52 = 0.155$. Calculate the unpooled standard error $SE = \sqrt{\frac{0.675(0.325)}{120} + \frac{0.52(0.48)}{150}} \approx 0.059$, multiply by the critical value $z^{*} = 1.96$ to get the margin of error (about 0.116), and obtain the 95% interval $(0.039, 0.271)$. Interpret the interval in context: we are 95% confident that the interval from 0.039 to 0.271 captures the true difference, that is, the proportion of teenagers who prefer the new soda is between 3.9 and 27.1 percentage points higher than the proportion of adults. Because 0 is not in the interval, we have convincing evidence that the preference rate is higher for teenagers. This uses the same logic of interval estimation you learned for one proportion, adjusted for two samples.
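If you want to check the arithmetic in both parts, here is a minimal Python sketch that uses only the standard library: `statistics.NormalDist` supplies normal-curve probabilities and the critical value $z^{*}$. The variable names are illustrative, and the counts are the ones from the soda problem. Software keeps more decimal places than a z-table, so expect the p-value to differ from a table lookup in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution (mean 0, sd 1)

# Part 1: one-proportion z-test of H0: p = 0.6 against Ha: p > 0.6
n_teen, x_teen = 120, 81
p0 = 0.6
p_hat_teen = x_teen / n_teen                 # 0.675
sd_null = sqrt(p0 * (1 - p0) / n_teen)       # test SD uses p0, not p-hat
z = (p_hat_teen - p0) / sd_null
p_value = 1 - Z.cdf(z)                       # right-tailed p-value

# Part 2: 95% two-proportion z-interval for p1 - p2 (unpooled SE)
n_adult, x_adult = 150, 78
p_hat_adult = x_adult / n_adult              # 0.52
diff = p_hat_teen - p_hat_adult              # point estimate, 0.155
se_diff = sqrt(p_hat_teen * (1 - p_hat_teen) / n_teen
               + p_hat_adult * (1 - p_hat_adult) / n_adult)
z_star = Z.inv_cdf(0.975)                    # about 1.96 for 95% confidence
lower = diff - z_star * se_diff
upper = diff + z_star * se_diff

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print(f"95% CI for p1 - p2: ({lower:.3f}, {upper:.3f})")
```

Note the two different spreads: the test uses $p_{0}$ in its standard deviation, while the interval uses the two sample proportions in an unpooled standard error.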

This guided tour shows that the entire unit builds on the same core inferential framework, with only small changes to formulas and conditions as you move from one to two samples, and from estimation to testing.

4. Common Cross-Cutting Pitfalls (and how to avoid them)

These are the most frequent mistakes that span all subtopics in this unit, rooted in common confusions between related methods:

Wrong move: Using the sample proportion $\hat{p}$ to calculate the standard deviation for a hypothesis test for a single proportion. Why: Students confuse the standard error formula for confidence intervals (which uses $\hat{p}$) with the formula for hypothesis tests, which uses the null hypothesized value $p_{0}$. Correct move: Always check whether you are doing a confidence interval or a hypothesis test. For tests, use $p_{0}$ from $H_{0}$ to calculate the standard deviation; for intervals, use $\hat{p}$ to calculate the standard error.
Wrong move: Failing to define population parameters in context when setting up inference. Why: Students rush to calculate and skip this critical first step, which is required for full credit on AP FRQs. Correct move: Always start any inference problem by explicitly writing out what your parameter ($p$, $p_{1} - p_{2}$, etc.) represents, including the population(s) of interest.
Wrong move: Using the pooled proportion for a confidence interval for the difference in two proportions. Why: Students remember pooling for two-proportion hypothesis tests and incorrectly apply it to confidence intervals. Correct move: Only use a pooled proportion when doing a two-proportion hypothesis test where $H_{0} : p_{1} = p_{2}$ . Never pool for any confidence interval for a difference in proportions.
Wrong move: Writing hypotheses in terms of the sample proportion $\hat{p}$ instead of the population proportion $p$. Why: Students mix up known sample statistics and unknown population parameters. Correct move: Always write hypotheses in terms of unknown population parameters, never in terms of observed sample statistics.
Wrong move: Interpreting a confidence interval as "there is a 95% chance that the true proportion is in this interval." Why: Students forget that the true proportion is a fixed (not random) value, so any one calculated interval either captures it or does not. Correct move: Attach the 95% to the method, not to a single interval. In repeated random sampling, about 95% of intervals constructed with this method capture the true population proportion, so we say we are 95% confident that our calculated interval captures the true proportion.
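The method-based interpretation can be seen directly in a small simulation. This sketch assumes a hypothetical true proportion $p = 0.59$ and samples of size 200, chosen only for illustration (in a real study $p$ is unknown). It builds a 95% one-proportion z-interval from each simulated sample and reports the share of intervals that capture $p$; the function name `coverage_rate` is our own.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage_rate(p_true, n, conf=0.95, reps=10_000, seed=1):
    """Fraction of simulated one-proportion z-intervals that capture p_true."""
    rng = random.Random(seed)                      # seeded so results repeat
    z_star = NormalDist().inv_cdf((1 + conf) / 2)  # about 1.96 for 95%
    hits = 0
    for _ in range(reps):
        # Draw one random sample of size n and count the successes
        x = sum(rng.random() < p_true for _ in range(n))
        p_hat = x / n
        me = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - me <= p_true <= p_hat + me:
            hits += 1
    return hits / reps

rate = coverage_rate(p_true=0.59, n=200)
print(f"Share of intervals capturing p: {rate:.3f}")
```

Expect a value close to, but not exactly, 0.95. Each individual interval either captures $p$ or misses it; the 95% describes how often the method succeeds over many samples.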

5. Quick Check: When To Use Which Subtopic

Test your understanding by matching each scenario to the correct subtopic from this unit:

1. A pollster wants to estimate what percentage of registered voters support a new ballot measure, based on a random sample of 500 voters.
2. A researcher wants to test whether a new vaccine has a higher infection prevention rate than a placebo, with 300 participants in each group.
3. A snack company claims that 95% of their products are correctly labeled. A regulator wants to test if the actual proportion of correctly labeled products is lower than 95%.
4. A researcher wants to estimate how much higher the infection prevention rate of the new vaccine is compared to placebo.
5. Calculate the margin of error for a poll estimating the percentage of voters supporting a ballot measure, with 95% confidence.

Answers:

1. Constructing a Confidence Interval for a Proportion
2. Inference for a Difference in Two Proportions (hypothesis test)
3. Carrying Out a Hypothesis Test for a Proportion
4. Confidence Intervals for the Difference in Two Proportions
5. Constructing a Confidence Interval for a Proportion

6. Practice Questions (AP Statistics Style)

Question 1 (Multiple Choice)

A researcher wants to test whether the proportion of left-handed people differs between men and women. Which of the following is the correct inference method for this analysis?
A) A one-proportion z-confidence interval
B) A two-proportion z-test for the difference in proportions
C) A one-proportion z-test for a population proportion
D) A two-proportion z-confidence interval for the difference in proportions

Worked Solution: The goal of this analysis is to test a claim about a difference between two population proportions, not to estimate a value. This eliminates the confidence interval methods, so A and D are incorrect. A one-proportion z-test analyzes only a single population proportion, so it cannot compare two groups, eliminating C. Only a two-proportion z-test for a difference in proportions matches the research goal. Correct answer: B.

Question 2 (Free Response)

A city planner wants to estimate the proportion of city residents who support building a new public park. They take a random sample of 200 residents, and 118 indicate they support the park. (a) Identify the appropriate inference method for this analysis, and check all conditions required for inference. (b) Construct and interpret a 90% confidence interval for the true proportion of residents who support the park. (c) If the city council requires a margin of error no larger than 3% for this estimate, what is the minimum sample size needed to achieve this margin of error with 95% confidence? Use the sample result as a prior estimate for $p^{*}$ .

Worked Solution: (a) The appropriate method is a one-proportion z-interval (a confidence interval for a single population proportion). Conditions: 1) Random: the problem states the sample is random, so this condition is satisfied. 2) Independence (10% condition): the population of city residents must be more than $10 \times 200 = 2000$, which is reasonable for a city, so this condition is satisfied. 3) Normality: $n \hat{p} = 118 \geq 10$ and $n(1 - \hat{p}) = 82 \geq 10$, so the sampling distribution of $\hat{p}$ is approximately normal. All conditions are met. (b) First calculate $\hat{p} = \frac{118}{200} = 0.59$. The standard error is $SE = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.59 \times 0.41}{200}} \approx 0.0348$. For 90% confidence, the critical value is $z^{*} = 1.645$, so the margin of error is $ME = 1.645 \times 0.0348 \approx 0.057$. The interval is $0.59 \pm 0.057 = (0.533, 0.647)$. Interpretation: We are 90% confident that the true proportion of all city residents who support the new park is between 53.3% and 64.7%. (c) Use the sample size formula for a proportion: $n = \frac{(z^{*})^{2} p^{*}(1 - p^{*})}{ME^{2}}$. Plugging in $z^{*} = 1.96$ for 95% confidence, $p^{*} = 0.59$, and $ME = 0.03$: $n = \frac{(1.96)^{2}(0.59)(0.41)}{(0.03)^{2}} \approx 1032.5$. We always round up for sample size, so the minimum required sample size is 1033.
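Parts (b) and (c) can be reproduced with the following sketch. The helper names `one_prop_interval` and `min_sample_size` are hypothetical, and the code uses exact critical values from `statistics.NormalDist().inv_cdf` rather than the rounded table values 1.645 and 1.96.

```python
from math import ceil, sqrt
from statistics import NormalDist

def one_prop_interval(x, n, conf):
    """One-proportion z-interval: p_hat +/- z* * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = x / n
    z_star = NormalDist().inv_cdf((1 + conf) / 2)
    me = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me, p_hat + me

def min_sample_size(me, conf, p_star=0.5):
    """Smallest whole n with z* * sqrt(p*(1 - p*)/n) <= me (always rounds up)."""
    z_star = NormalDist().inv_cdf((1 + conf) / 2)
    return ceil(z_star**2 * p_star * (1 - p_star) / me**2)

low, high = one_prop_interval(118, 200, conf=0.90)
n_needed = min_sample_size(me=0.03, conf=0.95, p_star=0.59)
print(f"90% CI: ({low:.3f}, {high:.3f})")
print(f"Minimum n: {n_needed}")
```

With either the exact or the rounded $z^{*}$, the formula gives about 1032.5, and `math.ceil` rounds this up to 1033. Rounding down to 1032 would leave the margin of error slightly above 3%.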

Question 3 (Application)

A botanist develops a new variety of sunflower seed, and claims it has a higher germination rate than the standard 75% germination rate for commercial seeds. She tests a random sample of 180 new seeds, and 148 germinate. Carry out an appropriate significance test at the $α = 0.05$ level to evaluate the botanist's claim.

Worked Solution: This is a one-proportion z-test for a population proportion. First, state hypotheses: $H_{0}: p = 0.75$, $H_{a}: p > 0.75$, where $p$ is the true germination rate of the new sunflower seed variety. Check conditions: random sample (given); the 10% condition holds (the population of seeds is far larger than $10 \times 180 = 1800$); normality: $n p_{0} = 180 \times 0.75 = 135 \geq 10$ and $n(1 - p_{0}) = 45 \geq 10$, so all conditions are met. Calculate the test statistic: $\hat{p} = 148/180 \approx 0.8222$, so $z = \frac{0.8222 - 0.75}{\sqrt{\frac{0.75(0.25)}{180}}} \approx 2.24$. The p-value for a one-sided test is $P(Z > 2.24) \approx 0.0125$. Since $0.0125 < 0.05$, we reject the null hypothesis. In context: There is statistically significant evidence at the $\alpha = 0.05$ level that the new seed variety has a germination rate higher than the standard 75%.
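A reusable version of this test is sketched below. The function `one_prop_z_test` and its `alternative` argument are hypothetical names chosen for illustration, not a library API. Because the code does not round $\hat{p}$ or $z$ along the way, its p-value (about 0.0126) differs slightly from the table-based 0.0125; the decision is the same.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(x, n, p0, alternative="greater"):
    """One-proportion z-test; the standard deviation uses p0 from H0, not p_hat."""
    p_hat = x / n
    sd = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd
    Z = NormalDist()
    if alternative == "greater":      # Ha: p > p0
        p_value = 1 - Z.cdf(z)
    elif alternative == "less":       # Ha: p < p0
        p_value = Z.cdf(z)
    else:                             # two-sided, Ha: p != p0
        p_value = 2 * (1 - Z.cdf(abs(z)))
    return z, p_value

z, p_value = one_prop_z_test(148, 180, p0=0.75, alternative="greater")
alpha = 0.05
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```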

7. Quick Reference Cheatsheet

Category	Formula	Notes
One-proportion z-test statistic	$z = \frac{\hat{p} - p_{0}}{\sqrt{\frac{p_{0}(1 - p_{0})}{n}}}$	Uses $p_{0}$ from $H_{0}$, not $\hat{p}$; only for one-proportion hypothesis tests
One-proportion confidence interval	$\hat{p} \pm z^{*} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$	Uses $\hat{p}$ for standard error; $z^{*}$ is the critical value for your confidence level
Two-proportion difference point estimate	$\hat{p}_{1} - \hat{p}_{2}$	$\hat{p}_{1}, \hat{p}_{2}$ are the sample proportions that estimate the true population proportions $p_{1}, p_{2}$ for groups 1 and 2
Pooled proportion for two-proportion test	$\hat{p}_{pooled} = \frac{x_{1} + x_{2}}{n_{1} + n_{2}}$	Only used when $H_{0}: p_{1} = p_{2}$; never use for confidence intervals
Two-proportion z-test statistic	$z = \frac{(\hat{p}_{1} - \hat{p}_{2}) - 0}{\sqrt{\hat{p}_{pooled}(1 - \hat{p}_{pooled})\left(\frac{1}{n_{1}} + \frac{1}{n_{2}}\right)}}$	Null difference is 0 when testing for equal population proportions
Two-proportion confidence interval SE	$SE = \sqrt{\frac{\hat{p}_{1}(1 - \hat{p}_{1})}{n_{1}} + \frac{\hat{p}_{2}(1 - \hat{p}_{2})}{n_{2}}}$	Do not pool here; always use the individual sample proportions
Two-proportion confidence interval	$(\hat{p}_{1} - \hat{p}_{2}) \pm z^{*} \cdot SE$	Same $z^{*}$ critical value logic as one-proportion intervals
Minimum sample size for desired ME	$n = \frac{(z^{*})^{2} p^{*}(1 - p^{*})}{ME^{2}}$	Use $p^{*} = 0.5$ if no prior estimate is available; always round up to the next whole number
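The pooled two-proportion z-test is the one cheatsheet formula that none of the worked examples computes. Here is a minimal sketch; the function name `two_prop_z_test` is hypothetical, it hard-codes the one-sided alternative $H_{a}: p_{1} > p_{2}$, and it reuses the soda counts from Section 3 purely as illustrative input (the guided tour built an interval, not this test).

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Two-proportion z-test of H0: p1 = p2 against Ha: p1 > p2, pooled SE."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)    # pooling is justified only under H0: p1 = p2
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = ((p1_hat - p2_hat) - 0) / se  # null difference is 0
    p_value = 1 - NormalDist().cdf(z) # one-sided p-value for Ha: p1 > p2
    return z, p_value

z, p_value = two_prop_z_test(81, 120, 78, 150)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```

For those counts it gives $z \approx 2.57$. Compare the pooled standard error here with the unpooled one in the confidence interval row: pooling is justified only because the test assumes $p_{1} = p_{2}$ under $H_{0}$.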