Biased and Unbiased Estimators — AP Statistics Study Guide
For: Students preparing for the AP Statistics exam.
Covers: Definition of an estimator, calculation of estimator bias, identification of biased and unbiased estimators for common population parameters, use of expected value to confirm bias status, and interpretation of bias in context for AP exam questions.
You should already know: What population parameters and sample statistics are. The definition of the expected value of a random variable. The basics of sampling distributions for sample means and proportions.
A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official mark schemes for grading conventions.
1. What Are Biased and Unbiased Estimators?
An estimator is a sample statistic used to estimate an unknown population parameter. For example, we use the sample proportion $\hat{p}$ to estimate the population proportion $p$, and the sample mean $\bar{x}$ to estimate the population mean $\mu$. The bias of an estimator describes the difference between the expected value of the estimator and the true value of the population parameter it estimates.
This topic is weighted approximately 10-15% of Unit 5 (Sampling Distributions) on the AP Statistics CED, and regularly appears in both multiple-choice (MCQ) and free-response (FRQ) sections of the exam. AP questions on this topic typically ask you to identify whether a given estimator is biased or unbiased, explain why a sampling method produces biased results, or justify that a statistic is an unbiased estimator.
A common misconception is that bias refers to how far a single estimate is from the true parameter; actually, bias is a property of the center of the estimator's sampling distribution, not the spread or the error of one estimate. The key question is: if we took infinitely many random samples and calculated the estimator for each, would the average of all those estimators equal the true population parameter? If yes, it is unbiased; if no, it is biased.
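The "many random samples" thought experiment can be approximated with a short simulation. This is a minimal sketch under stated assumptions: the population of 1,000 student ages is made up, and the helper name `average_of_sample_means` is hypothetical. It averages the sample mean over 20,000 simple random samples and compares the result to the true population mean.

```python
import random
import statistics

def average_of_sample_means(population, sample_size, num_samples, seed=0):
    """Average the sample mean over many simple random samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]
    return statistics.fmean(means)

# Hypothetical population: ages of 1,000 students (made-up values).
pop_rng = random.Random(42)
population = [pop_rng.randint(14, 18) for _ in range(1000)]

true_mean = statistics.fmean(population)
long_run_average = average_of_sample_means(
    population, sample_size=10, num_samples=20000
)

print(f"true mean: {true_mean:.3f}")
print(f"average of sample means: {long_run_average:.3f}")
```

Individual sample means in such a run still vary from sample to sample; it is only their long-run average that settles near the true mean, which is what unbiasedness describes.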
2. Formal Definition and Calculation of Bias
If $\hat{\theta}$ is an estimator for the population parameter $\theta$, the bias of $\hat{\theta}$ is formally defined as:
$$\text{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$$
By this definition, an estimator is unbiased if and only if $\text{Bias}(\hat{\theta}) = 0$, which simplifies to $E(\hat{\theta}) = \theta$. That means the center (mean) of the sampling distribution of $\hat{\theta}$ is exactly at the true value of $\theta$. Intuition: If you repeatedly sample from the population and calculate your estimator every time, on average you hit the true parameter exactly.
Bias does not mean any one estimate is wrong; every estimate has sampling error. It means the entire distribution of estimates is shifted left or right of the true value. A positively biased estimator will, on average, give an estimate higher than the true parameter, while a negatively biased estimator will on average give an estimate lower than the true parameter. To calculate bias, you only need to find the expected value of the estimator and subtract the true population parameter.
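For a very small population, the expected value of an estimator can be computed exactly by listing every possible sample, so the bias follows directly from the definition. The sketch below assumes a made-up five-value population and simple random samples of size 3; the helper names `exact_bias` and `sample_mean` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def exact_bias(population, sample_size, estimator, parameter):
    """Return E(estimator) - parameter over all equally likely SRS samples."""
    samples = list(combinations(population, sample_size))
    expected_value = sum(Fraction(estimator(s)) for s in samples) / len(samples)
    return expected_value - Fraction(parameter)

def sample_mean(sample):
    return Fraction(sum(sample), len(sample))

# Hypothetical population of five values (made up for illustration).
population = [2, 4, 4, 7, 13]
mu = Fraction(sum(population), len(population))

# Sample mean as an estimator of the population mean.
bias_mean = exact_bias(population, 3, sample_mean, mu)
# Sample maximum as an estimator of the population maximum.
bias_max = exact_bias(population, 3, max, max(population))

print("bias of sample mean:", bias_mean)
print("bias of sample maximum:", bias_max)
```

In this toy population the sample mean has bias 0, while the sample maximum has a negative bias because most samples miss the largest value.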
Worked Example
Suppose we want to estimate the mean age $\mu$ of students at a large high school. A researcher uses the following estimator $\hat{\mu}$ for a sample of 2 students: , where $X_1$ and $X_2$ are the ages of the two sampled students. Confirm whether this estimator is biased or unbiased, and calculate the bias if it exists.
- For any randomly selected student, $E(X_i) = \mu$ for both $X_1$ and $X_2$, since each observation is drawn from a population with true mean $\mu$.
- Use linearity of expectation to find $E(\hat{\mu})$:
- Calculate bias using the formal definition: $\text{Bias}(\hat{\mu}) = E(\hat{\mu}) - \mu = \mu - \mu = 0$.
- Conclusion: The estimator is unbiased, because its expected value equals the true population parameter.
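The linearity step can be written out for a general weighted combination of two observations. This is an illustrative sketch rather than the worked example's own estimator: the weights $a$ and $b$ are hypothetical symbols introduced here, and $X_1$, $X_2$ denote the two sampled ages, each with expected value $\mu$.

```latex
\begin{aligned}
E(aX_1 + bX_2) &= a\,E(X_1) + b\,E(X_2) = (a + b)\,\mu \\
\text{Bias}(aX_1 + bX_2) &= (a + b)\,\mu - \mu = (a + b - 1)\,\mu
\end{aligned}
```

For $\mu \neq 0$, an estimator of this form is unbiased exactly when $a + b = 1$; the ordinary two-observation average ($a = b = \tfrac{1}{2}$) is one such case.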
Exam tip: When asked to justify that an estimator is unbiased on an FRQ, you must explicitly state that the expected value of the estimator equals the true population parameter; simply saying "it's unbiased" is not enough for full credit.
3. Common Biased and Unbiased Estimators
The AP exam expects you to quickly identify the bias status of common estimators, without requiring a full expected value calculation for every scenario. The most frequently tested common estimators are summarized below:
Unbiased estimators:
- Sample mean $\bar{x}$ for population mean $\mu$
- Sample proportion $\hat{p}$ for population proportion $p$
- Sample variance $s^2$ (calculated with division by $n-1$) for population variance $\sigma^2$
Biased estimators:
- Sample range for population range
- Sample standard deviation $s$ for population standard deviation $\sigma$ (the square root of an unbiased estimator is not unbiased, due to non-linearity of the square root function)
- Maximum of a sample for population maximum
- Sample variance calculated with division by $n$ instead of $n-1$ for $\sigma^2$ (this estimator has a negative bias of $-\sigma^2/n$, meaning it underestimates the true variance on average)
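The two variance entries in these lists can be checked exactly on a toy case. The sketch below assumes a made-up four-value population and independent draws with replacement (so every ordered sample of size 2 is equally likely); the helper name `variance` is hypothetical. It compares the expected value of each version of the sample variance with the population variance.

```python
from fractions import Fraction
from itertools import product

def variance(sample, denominator):
    """Sum of squared deviations from the sample mean, divided by `denominator`."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / denominator

population = [1, 2, 3, 6]  # hypothetical values
n = 2                      # sample size
mu = Fraction(sum(population), len(population))
sigma_sq = sum((x - mu) ** 2 for x in population) / len(population)

# Every ordered sample of size n drawn with replacement is equally likely.
samples = list(product(population, repeat=n))

expected_s2 = sum(variance(s, n - 1) for s in samples) / len(samples)
expected_vn = sum(variance(s, n) for s in samples) / len(samples)

print("population variance:", sigma_sq)
print("E(variance with n-1 denominator):", expected_s2)
print("E(variance with n denominator):", expected_vn)
print("bias of n-denominator version:", expected_vn - sigma_sq)
```

In this example the $n-1$ version averages out to the population variance exactly, while the $n$ version falls short by the population variance divided by $n$.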
Worked Example
Which of the following statistics is an unbiased estimator for its corresponding population parameter?
A) Sample range for population range
B) Sample proportion $\hat{p}$ for population proportion $p$
C) Sample standard deviation $s$ for population standard deviation $\sigma$
D) Sample variance with denominator $n$ for population variance $\sigma^2$
Identify the correct answer and justify.
- Evaluate each option: A) The sample range can never exceed the population range, so it underestimates the true range on average and is biased. Eliminate A.
- B) For a random sample, the expected value of the sample proportion $\hat{p}$ is equal to the population proportion $p$, so $\hat{p}$ is unbiased. Keep B.
- C) While $s^2$ (sample variance) is unbiased for $\sigma^2$, the square root transformation introduces bias, so $s$ is biased for $\sigma$. Eliminate C.
- D) Sample variance with denominator $n$ has a negative bias and underestimates $\sigma^2$, so it is biased. Eliminate D.
- Conclusion: The correct answer is B.
Exam tip: On MCQ questions asking to identify an unbiased estimator, the most common distractor is "sample standard deviation is unbiased". Always remember only sample variance $s^2$ (with $n-1$ denominator) is unbiased for $\sigma^2$; the standard deviation itself is biased.
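A short derivation shows why the square root breaks unbiasedness. It is a sketch that assumes only that $E(s^2) = \sigma^2$ and that the variance of any random variable is non-negative.

```latex
\begin{aligned}
\text{Var}(s) &= E(s^2) - [E(s)]^2 \ge 0 \\
[E(s)]^2 &\le E(s^2) = \sigma^2 \\
E(s) &\le \sigma
\end{aligned}
```

Equality would require $\text{Var}(s) = 0$, meaning $s$ takes the same value in every sample. Whenever $s$ varies from sample to sample, $E(s) < \sigma$, so $s$ underestimates $\sigma$ on average.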
4. Bias in Sampling Methods
Bias can arise from two sources: inherent bias in the estimator even with random sampling, or bias from a non-random or flawed sampling method. A biased sampling method will almost always produce a biased estimator for the population parameter of interest, even if the estimator would be unbiased for a simple random sample.
For example, if you want to estimate the average income of a city's residents and only survey people leaving a luxury shopping mall, your sampling method overrepresents high-income people. This shifts the center of the sampling distribution of the sample mean to the right of the true population mean, resulting in a positively biased estimator.
AP FRQs regularly ask you to identify bias from a flawed sampling method and describe the direction of bias in context, so it is critical to tie your conclusion to the specific scenario, not just give a generic statement that the method is biased.
Worked Example
A high school student council wants to estimate the proportion of students who support a new $5 increase in student activity fees to fund new gym equipment. They survey every 3rd student leaving the gym after school hours. Explain whether the resulting estimator is biased or unbiased, and describe the direction of bias.
- The population of interest is all students at the school, and the parameter is the proportion of all students who support the fee increase for gym equipment.
- The sampling frame only includes students who are at the gym after school, who are far more likely to use the gym equipment and support the fee increase than the average student. Students who do not use the gym after school (the majority of students in most cases) are underrepresented in the sample.
- The expected value of the sample proportion $\hat{p}$ from this method will be higher than the true population proportion $p$, because the sample overrepresents supporters.
- Conclusion: This estimator is biased, and it overestimates the true proportion of students who support the fee increase.
Exam tip: When asked to describe bias direction in context, always state whether the estimator overestimates or underestimates the true parameter; only saying "positive bias" or "negative bias" will not earn full credit on AP FRQs.
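The effect of a flawed sampling frame can be illustrated with a simulation. Everything in this sketch is an assumption made up for illustration: a school of 1,000 students, 200 of them gym users, with 80% support among gym users and 30% among everyone else. The helper name `average_p_hat` is hypothetical. It compares the long-run average of the sample proportion under a simple random sample of all students with the same statistic computed from gym users only.

```python
import random
import statistics

# Hypothetical school (all numbers made up for illustration):
# 200 gym users, 160 of whom support the fee; 800 non-users, 240 of whom do.
gym_users = [1] * 160 + [0] * 40   # 1 = supports, 0 = does not
non_users = [1] * 240 + [0] * 560
all_students = gym_users + non_users

true_p = statistics.fmean(all_students)  # 400 supporters out of 1000

def average_p_hat(frame, sample_size=50, num_samples=5000, seed=1):
    """Average the sample proportion over many random samples from `frame`."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    p_hats = [
        statistics.fmean(rng.sample(frame, sample_size))
        for _ in range(num_samples)
    ]
    return statistics.fmean(p_hats)

avg_srs = average_p_hat(all_students)  # sampling frame = whole school
avg_gym = average_p_hat(gym_users)     # sampling frame = gym users only

print(f"true p: {true_p:.3f}")
print(f"average p-hat, SRS of all students: {avg_srs:.3f}")
print(f"average p-hat, gym users only: {avg_gym:.3f}")
```

The same statistic is used in both cases; only the sampling frame changes, and that alone moves the center of the sampling distribution above the true proportion.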
5. Common Pitfalls (and how to avoid them)
- Wrong move: Claiming that sample standard deviation $s$ is an unbiased estimator for population standard deviation $\sigma$, because $s^2$ is unbiased for $\sigma^2$. Why: Students assume that if a transformation of an estimator is unbiased, the estimator itself is also unbiased, forgetting that expected value does not pass through non-linear transformations like the square root. Correct move: Memorize that only the sample variance $s^2$ (with denominator $n-1$) is unbiased for $\sigma^2$; $s$ is biased for $\sigma$.
- Wrong move: Saying that an individual estimate from an unbiased estimator is always equal to the true population parameter. Why: Students confuse the property of the sampling distribution's center with the outcome of a single sample. Correct move: Always frame unbiasedness as a property of the sampling distribution: on average across many samples, the estimator equals the true parameter; any single estimate can still be far off.
- Wrong move: Claiming that a biased sampling method is biased because it produces a spread-out sampling distribution. Why: Students confuse bias (center of the sampling distribution) with variability (spread of the sampling distribution). Correct move: When justifying bias, always talk about the shift in the center relative to the true parameter, not how spread out the distribution is.
- Wrong move: Calculating bias as $\theta - E(\hat{\theta})$ instead of $E(\hat{\theta}) - \theta$, leading to the wrong sign on the bias. Why: Students mix up the order of subtraction in the definition. Correct move: Always memorize the order as estimator expected value minus true parameter to avoid sign errors.
- Wrong move: When describing bias from a sampling method, saying "this is convenience sampling so it is biased" without linking to the direction of bias in context. Why: Students memorize that convenience sampling is biased, but forget AP FRQs require context-specific description of bias direction. Correct move: After stating the method is biased, always explain whether it overestimates or underestimates the true parameter, and why that direction makes sense for the scenario.
6. Practice Questions (AP Statistics Style)
Question 1 (Multiple Choice)
A wildlife biologist wants to estimate the maximum weight of adult black bears in a large national park. They take 100 random samples of 8 bears each, and calculate the maximum weight for each sample. If the true maximum weight of adult black bears in the park is 650 pounds, what is the expected value of the sample maximum, and what is the bias status?
A) Expected value = 650 pounds, unbiased estimator
B) Expected value < 650 pounds, biased estimator
C) Expected value > 650 pounds, biased estimator
D) Expected value < 650 pounds, unbiased estimator
Worked Solution: The maximum of any random sample can never exceed the true population maximum, because all individuals in the sample are drawn from the population. Most random samples will not include the very largest bear in the entire park, so the sample maximum will almost always be smaller than the true population maximum. The expected value of the sample maximum is therefore less than 650 pounds, and the bias is non-zero, so it is a biased estimator. This matches option B. Correct answer: B.
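The argument can be written as one inequality. This sketch assumes a finite population; $M$ (the population maximum, 650 pounds here) and $T$ (the sample maximum) are symbols introduced for illustration.

```latex
E(T) = \sum_{t} t \, P(T = t)
     = M \, P(T = M) + \sum_{t < M} t \, P(T = t)
     < M \, P(T = M) + M \, P(T < M) = M
```

The strict inequality holds whenever $P(T < M) > 0$, that is, whenever some possible sample misses the heaviest bear.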
Question 2 (Free Response)
A bakery owner wants to estimate the average order value $\mu$ for all customer orders at the bakery. They consider two estimators based on a random sample of $n$ orders, with order values $X_1, X_2, \dots, X_n$: Estimator 1: Estimator 2:
(a) Calculate the bias of Estimator 1 and the bias of Estimator 2. (b) Identify whether each estimator is biased or unbiased. (c) Suppose the owner takes a sample of size , and the true population variance of order values is . Which estimator has smaller variance? (Hint: $\text{Var}(cX) = c^2\,\text{Var}(X)$ for constant $c$)
Worked Solution: (a) We know $E(X_i) = \mu$ for all $i$. For Estimator 1: For Estimator 2:
(b) Both biases are non-zero for any , so both estimators are biased. Estimator 1 overestimates $\mu$ on average (positive bias), and Estimator 2 underestimates $\mu$ on average (negative bias).
(c) Calculate variance for each: has smaller variance.
Question 3 (Application / Real-World Style)
An economist studying household energy use in a rural county wants to estimate the true mean annual electric bill, which is . The economist uses a sampling method that only reaches households with electric heating, and historical data shows the expected value of the sample mean from this method is . Calculate the bias of this estimator, and interpret the result in context.
Worked Solution: Bias is calculated as . The bias is positive, meaning this sampling method on average overestimates the true mean annual electric bill for all households in the county by . This makes sense because households with electric heating have higher annual electric use than households that use other heating sources, leading to the overestimate.
7. Quick Reference Cheatsheet
| Category | Formula | Notes |
|---|---|---|
| Bias of an Estimator | $\text{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$ | $\hat{\theta}$ = estimator, $\theta$ = true population parameter |
| Unbiased Estimator Criterion | $E(\hat{\theta}) = \theta$ | Bias equals 0; refers to center of the sampling distribution |
| Unbiased: Sample Mean | $E(\bar{x}) = \mu$ | Unbiased for population mean $\mu$ |
| Unbiased: Sample Proportion | $E(\hat{p}) = p$ | Unbiased for population proportion $p$ |
| Unbiased: Sample Variance | $E(s^2) = \sigma^2$ | Unbiased for population variance $\sigma^2$; requires $n-1$ denominator |
| Biased: Sample Standard Deviation | $E(s) \neq \sigma$ | Biased for population standard deviation $\sigma$, even though $s^2$ is unbiased |
| Biased: Sample Maximum/Range | Sample max / sample range | Always biased, underestimates true population value on average |
| Biased: $n$-denominator Variance | $E\left(\frac{1}{n}\sum (x_i - \bar{x})^2\right) = \frac{n-1}{n}\sigma^2$ | Negative bias, underestimates $\sigma^2$ |
8. What's Next
This topic is the foundational prerequisite for confidence intervals and hypothesis testing, the core topics of Units 6 and 7 of AP Statistics. When you construct a confidence interval for a population parameter, you rely on the fact that your point estimator is unbiased to center the interval at the correct expected value. Without understanding bias, you cannot properly interpret the results of a confidence interval or hypothesis test, and you will not be able to identify flaws in study design that invalidate a researcher's conclusions. This topic also connects directly to sampling methods and study design from Unit 3 (Collecting Data), as bias in estimation most often originates from bias in sampling.
Next topics you will study after this:
- Confidence Intervals for a Population Proportion
- Confidence Intervals for a Population Mean
- Hypothesis Testing for a Population Mean
- Sampling Bias in Study Design