Sampling Distributions — AP Statistics Unit Overview
For: Students preparing for the AP Statistics exam.
Covers: Full unit overview of AP Statistics Sampling Distributions, including core concept relationships and guidance on when to apply each of the three unit sub-topics: biased/unbiased estimators, sampling distribution of a sample proportion, and sampling distribution of a sample mean.
You should already know: Basic probability rules for independent random events. How to calculate sample proportions and sample means from raw data. How to calculate probabilities for normal distributions.
A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official mark schemes for grading conventions.
1. Why This Unit Matters
Sampling Distributions is the bridge unit between the earlier units (exploring data, collecting data, probability, and random variables) and the inferential statistics that make up a large share of the AP Statistics exam. Per the AP Statistics Course and Exam Description (CED), this unit carries an exam weighting of 7–12% on the multiple-choice section, and concepts from this unit appear on both multiple-choice (MCQ) and free-response (FRQ) sections. They are often embedded in multi-part inference questions even when not asked about directly.
The core goal of this unit is to quantify how much random variation we expect when we use a sample statistic (like a sample proportion or sample mean) to estimate an unknown population parameter. Before you can answer a question like, “Does our sample give us good evidence that more than half of voters support this policy?” you first have to answer: how often would we get a sample result this extreme just by random chance, even if the true level of support was 50%? That question can only be answered with an understanding of sampling distributions, making this unit non-negotiable for every other inferential topic you will learn.
2. Unit Concept Map
This unit builds sequentially from qualitative foundational checks to quantitative descriptions of the most common sampling distributions, aligned to how we actually approach inference problems on the exam:
- First: Biased and Unbiased Estimators (foundational gate check): Before we do any calculations, we first confirm whether a sample statistic systematically overestimates or underestimates the true population parameter on average. If a statistic is biased, it is not usable for inference regardless of other properties. This establishes the ground rule for what statistics we can trust to draw conclusions about a population.
- Second: Sampling Distribution of a Sample Proportion: The most common statistic for categorical data (the proportion of individuals with a trait of interest) gets its own set of rules for center, spread, and shape, accounting for sample size and the underlying population proportion.
- Third: Sampling Distribution of a Sample Mean: The most common statistic for quantitative data (the average value of a measurement) has its own distinct rules, including the Central Limit Theorem that lets us use normal probability calculations even when the original population distribution is not normal.
Each concept builds directly on the prior: you cannot correctly describe the spread of a sampling distribution if you do not first confirm the estimator is unbiased, and you need separate formulas and conditions for proportions vs. means because they describe fundamentally different data types.
3. A Guided Tour of a Unit-Wide Exam Problem
We will walk through a single multi-part exam-style problem to show how all three core sub-topics apply in sequence, just as they would on a real AP question:
Problem context: A coffee chain produces 200,000 bags of whole bean coffee annually. 10% of bags are supposed to be their limited-edition holiday blend, and the mean net weight of all bags is 12 oz with population standard deviation 0.5 oz. A regional manager takes a random sample of 150 bags from the annual production run to audit the process.
First, apply Biased and Unbiased Estimators: The manager will use the sample proportion of holiday blend bags $\hat{p}$ to estimate the true population proportion $p$, and the sample mean weight of bags $\bar{x}$ to estimate the true population mean $\mu$. Are both statistics appropriate for this? We confirm first: $\hat{p}$ from a random sample is an unbiased estimator of $p$, and $\bar{x}$ is an unbiased estimator of $\mu$, so both will center on the true population value on average. A biased estimator (like the sample minimum weight used to estimate the population minimum weight) would not be acceptable here; if we found ourselves working with one, we would stop and choose a different statistic.
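The bias check can be illustrated with a small simulation. This is a minimal sketch, not part of the original problem: it assumes a hypothetical population of 20,000 bag weights that is roughly normal with mean 12 oz and standard deviation 0.5 oz (the problem does not state the population shape), and the names `population` and `simulate` are illustrative only.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is repeatable

# Assumed population (hypothetical): 20,000 weights, roughly normal, mean 12, SD 0.5.
population = [random.gauss(12, 0.5) for _ in range(20_000)]
true_mean = statistics.fmean(population)
true_min = min(population)

def simulate(statistic, n=150, reps=2_000):
    """Average value of `statistic` across many random samples of size n."""
    values = [statistic(random.sample(population, n)) for _ in range(reps)]
    return statistics.fmean(values)

avg_sample_mean = simulate(statistics.fmean)  # should land near true_mean
avg_sample_min = simulate(min)                # lands well above true_min

print(f"population mean {true_mean:.3f} | average sample mean {avg_sample_mean:.3f}")
print(f"population min  {true_min:.3f} | average sample min  {avg_sample_min:.3f}")
```

Under these assumptions the average of the sample means sits essentially on the population mean, while the average of the sample minimums sits noticeably above the population minimum, which is what "systematically overestimates" means for an estimator.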
Next, apply Sampling Distribution of a Sample Proportion: The manager wants to find the probability that the sample proportion of holiday blend bags is less than 7%, to test if the production line is underproducing the holiday blend. We first check two conditions. The 10% condition (sample size 150 < 10% of 200,000 = 20,000, satisfied) justifies the standard deviation formula, and the success-failure condition ($np = 150(0.10) = 15$, $n(1-p) = 150(0.90) = 135$, both ≥ 10, satisfied) justifies treating the shape as approximately normal. The sampling distribution has center $\mu_{\hat{p}} = p = 0.10$ and spread $\sigma_{\hat{p}} = \sqrt{p(1-p)/n} = \sqrt{(0.10)(0.90)/150} \approx 0.0245$, which lets us calculate the desired probability using normal CDF.
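The proportion step can be sketched in a few lines of Python. This is an illustrative calculation using only the standard library, with the values taken from the problem context (N = 200,000, n = 150, p = 0.10); the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

N = 200_000  # population size (bags produced per year)
n = 150      # sample size
p = 0.10     # stated population proportion of holiday blend bags

# Condition checks
ten_percent_ok = n < 0.10 * N                           # supports the SD formula
success_failure_ok = n * p >= 10 and n * (1 - p) >= 10  # supports normal shape

# Center and spread of the sampling distribution of p-hat
mu_p_hat = p
sigma_p_hat = sqrt(p * (1 - p) / n)

# P(p-hat < 0.07) under the normal approximation
z = (0.07 - mu_p_hat) / sigma_p_hat
prob_below_7pct = NormalDist().cdf(z)

print(f"sigma = {sigma_p_hat:.4f}, z = {z:.2f}, P(p-hat < 0.07) = {prob_below_7pct:.3f}")
```

The z-score comes out near -1.22, so the probability is roughly 0.11: a sample proportion below 7% would not be especially surprising if the true proportion really were 10%.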
Finally, apply Sampling Distribution of a Sample Mean: The manager now wants the probability the sample mean weight is less than 11.9 oz, to check for underfilling of bags. We confirm properties of the sampling distribution: the 10% condition is satisfied for the large population (150 < 20,000). Even if the population distribution of bag weights is slightly skewed, the Central Limit Theorem tells us that for $n = 150 \geq 30$, the sampling distribution of $\bar{x}$ is approximately normal. Center is $\mu_{\bar{x}} = \mu = 12$ oz, spread is $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 0.5/\sqrt{150} \approx 0.0408$ oz. We can now calculate the z-score and probability for the observed sample mean weight.
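The sample mean step follows the same pattern. The sketch below uses the values from the problem context (mean 12 oz, standard deviation 0.5 oz, n = 150) and the standard-library normal CDF; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 12.0    # population mean weight (oz)
sigma = 0.5  # population standard deviation (oz)
n = 150      # sample size

clt_applies = n >= 30  # large-sample guideline for approximate normality

# Center and spread of the sampling distribution of x-bar
mu_x_bar = mu
sigma_x_bar = sigma / sqrt(n)

# P(x-bar < 11.9) under the normal approximation
z = (11.9 - mu_x_bar) / sigma_x_bar
prob_below_11_9 = NormalDist().cdf(z)

print(f"sigma_x_bar = {sigma_x_bar:.4f}, z = {z:.2f}, P(x-bar < 11.9) = {prob_below_11_9:.4f}")
```

Here the z-score is near -2.45 and the probability is roughly 0.007, so a sample mean below 11.9 oz would be unusual if the process mean were truly 12 oz. Note that a shortfall of 0.1 oz is small relative to individual bags (σ = 0.5) but large relative to sample means (about 0.04).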
This sequence is exactly how any inference problem on the AP exam uses concepts from this unit: check estimator bias, then describe the sampling distribution of your statistic, then use that to answer your research question.
4. Common Cross-Cutting Pitfalls (and how to avoid them)
- Wrong move: Confusing the standard deviation of the population with the standard deviation of the sampling distribution (using $\sigma$ instead of $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ or $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$). Why: Students mix up the spread of individual values in the population with the spread of sample statistics, because both are called "standard deviation" and often written with sigma. Correct move: On every problem, label your standard deviation explicitly as either population standard deviation or standard deviation of the sampling distribution before you use it in a calculation.
- Wrong move: Forgetting to check the 10% condition when calculating the standard deviation of a sampling distribution. Why: Students think the 10% condition is just a trivial check, but it is required for the independence assumption that our standard deviation formula relies on (sampling without replacement changes the probability of draws when the sample is more than 10% of the population). Correct move: Always list the 10% condition check when describing a sampling distribution, even if the population is stated to be "very large" (just confirm it is satisfied in that case).
- Wrong move: Claiming that an individual sample proportion or mean is unbiased, or that a single sampling distribution is biased/unbiased. Why: Students confuse the property of the estimator (the method of calculating the statistic) with the result from one sample. Correct move: Remember that bias/unbiasedness is a property of the estimator (the process) across all possible samples, not a property of a single sample or a single parameter.
- Wrong move: Applying the Central Limit Theorem to the shape of the population distribution instead of the shape of the sampling distribution. Why: Students misremember what CLT says, mixing up the population shape with the sampling distribution shape. Correct move: Always pair the Central Limit Theorem with the phrase "the sampling distribution of the sample mean is approximately normal" — never use CLT to describe the population.
- Wrong move: Using the standard deviation formula for a sampling distribution with the sample proportion $\hat{p}$ instead of the population proportion $p$ when describing the sampling distribution. Why: Students confuse later inference's standard error (which uses $\hat{p}$ when $p$ is unknown) with the sampling distribution description in this unit, where we condition on a known population parameter. Correct move: When describing the sampling distribution for a given known population parameter, use the population parameter $p$ or $\sigma$ to calculate spread, not the sample statistic.
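Two of the pitfalls above (population spread vs. sampling distribution spread, and what the Central Limit Theorem actually describes) can be seen in one simulation. This is a minimal sketch under an assumed setup that is not from the unit itself: a hypothetical right-skewed population modeled as an exponential distribution with mean 1 and σ = 1, with independent draws so the 10% condition is not a concern.

```python
import random
import statistics
from math import sqrt

random.seed(7)  # fixed seed so the illustration is repeatable

sigma = 1.0   # SD of the assumed exponential(1) population
n = 50        # sample size
reps = 5_000  # number of simulated samples

# Each entry is the mean of one random sample of size n.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

simulated_sd = statistics.stdev(sample_means)  # spread of the sample means
theoretical_sd = sigma / sqrt(n)               # sigma / sqrt(n)

print(f"population SD           = {sigma:.3f}")
print(f"SD of sample means      = {simulated_sd:.3f}")
print(f"sigma / sqrt(n) formula = {theoretical_sd:.3f}")
```

The sample means vary far less than individual values do, and their spread matches σ/√n rather than σ. The population stays skewed no matter how large n is; only the distribution of the sample means becomes approximately normal.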
5. Quick Check: When To Use Which Sub-Topic
Test your understanding by matching each scenario to the correct core sub-topic:
- You want to know if the sample median systematically overestimates the true population median for a skewed population.
- You want to calculate the probability that between 18% and 22% of a random sample of 100 voters support a local bond measure, when the true population support is 20%.
- You want to find the standard deviation of the distribution of average heights of random samples of 50 high school students, when the population standard deviation of heights is 2.5 inches.
- You want to confirm that your statistic will on average equal the parameter you are trying to estimate before you do inference.
- You need to confirm that the distribution of your sample average is approximately normal when the population distribution of your variable is strongly skewed.
Answers:
- Biased and Unbiased Estimators
- Sampling Distribution of a Sample Proportion
- Sampling Distribution of a Sample Mean
- Biased and Unbiased Estimators
- Sampling Distribution of a Sample Mean
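For the two scenarios above that involve a calculation (the bond measure and the average heights), the numbers can be worked out as follows. This is an illustrative sketch using the values stated in those scenarios; the normal approximation is assumed to be appropriate for the bond measure because np = 20 and n(1 - p) = 80 are both at least 10.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal, mean 0 and SD 1

# Scenario 2: P(0.18 < p-hat < 0.22) with n = 100 voters and p = 0.20
n_voters, p = 100, 0.20
sigma_p_hat = sqrt(p * (1 - p) / n_voters)
z_low = (0.18 - p) / sigma_p_hat
z_high = (0.22 - p) / sigma_p_hat
prob_between = std_normal.cdf(z_high) - std_normal.cdf(z_low)

# Scenario 3: SD of the sample mean height with n = 50 and sigma = 2.5 inches
n_students, sigma_height = 50, 2.5
sigma_x_bar = sigma_height / sqrt(n_students)

print(f"sigma_p_hat = {sigma_p_hat:.3f}, P(0.18 < p-hat < 0.22) = {prob_between:.3f}")
print(f"sigma_x_bar = {sigma_x_bar:.3f} inches")
```

The bond measure probability comes out to roughly 0.38 (z-scores of ±0.5), and the standard deviation of the sample mean height is roughly 0.35 inches.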
6. Quick Reference Cheatsheet
| Category | Formula / Rule | Notes |
|---|---|---|
| Unbiased Estimator | Mean of the sampling distribution = true population parameter | Bias/unbiasedness is a property of the estimator (process), not a single sample. |
| Center of $\hat{p}$ Sampling Distribution | $\mu_{\hat{p}} = p$ | Holds for random sampling; $\hat{p}$ is an unbiased estimator of $p$. |
| Spread of $\hat{p}$ Sampling Distribution | $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$ | Requires 10% condition (sample < 10% of population) for independence. |
| Normality for $\hat{p}$ | $np \geq 10$ and $n(1-p) \geq 10$ | Both conditions must be satisfied for approximate normality. |
| Center of $\bar{x}$ Sampling Distribution | $\mu_{\bar{x}} = \mu$ | $\bar{x}$ is unbiased for the population mean $\mu$ for random samples. |
| Spread of $\bar{x}$ Sampling Distribution | $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ | Also requires the 10% condition for sampling without replacement. |
| Normality for $\bar{x}$ | Either (1) population is normal, or (2) $n \geq 30$ (Central Limit Theorem) | CLT only applies to the sampling distribution of $\bar{x}$, not the population distribution. |
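The cheatsheet rules can also be collected into two small helper functions. This is a sketch for self-checking homework answers, not an official tool: the function names, the dictionary keys, and the choice to treat an unspecified population size (`N=None`) as effectively infinite are all our own assumptions.

```python
from math import sqrt

def p_hat_distribution(p, n, N=None):
    """Center, spread, and condition checks for the sampling distribution of p-hat."""
    return {
        "center": p,
        "spread": sqrt(p * (1 - p) / n),
        # Assumption: N=None means the population is treated as very large.
        "ten_percent_ok": N is None or n < 0.10 * N,
        "approx_normal": n * p >= 10 and n * (1 - p) >= 10,
    }

def x_bar_distribution(mu, sigma, n, N=None, population_normal=False):
    """Center, spread, and condition checks for the sampling distribution of x-bar."""
    return {
        "center": mu,
        "spread": sigma / sqrt(n),
        "ten_percent_ok": N is None or n < 0.10 * N,
        "approx_normal": population_normal or n >= 30,
    }

# Example with made-up values: p = 0.5, n = 100, population of 5,000
print(p_hat_distribution(0.5, 100, N=5_000))
# Example with made-up values: mu = 10, sigma = 2, n = 16, population not known to be normal
print(x_bar_distribution(10, 2, 16))
```

In the second example the `approx_normal` flag is false, since the sample is smaller than 30 and the population is not stated to be normal; the center and spread formulas still apply, but normal probability calculations would not be justified.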
7. What's Next (Core Sub-Topics)
This unit is the essential prerequisite for all of inferential statistics, which makes up a large share of the AP Statistics exam. Every confidence interval and hypothesis test you will construct in subsequent units relies on the foundation you build here: you must know the shape, center, and spread of your statistic's sampling distribution to calculate margin of error and p-values and to draw valid conclusions about unknown population parameters. Without understanding bias, sampling distribution spread, and normality conditions, you cannot justify an inference result, which costs points on FRQ inference questions. Immediately after this overview, you will dive into each core sub-topic in detail, mastering the specific formulas and conditions for each.
The core sub-topics in this unit are linked below: