Sources of Bias in Sampling — AP Statistics Study Guide
For: students preparing for the AP Statistics exam.
Covers: identification and description of common sampling biases including selection bias, undercoverage, voluntary response bias, nonresponse bias, and response bias, plus explaining how each bias impacts study results for AP exam assessment.
You should already know: Basic probability sampling methods, The difference between a population and a sample, The distinction between a parameter and a sample statistic.
A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practice the technique; cross-check with official mark schemes for grading conventions.
1. What Are Sources of Bias in Sampling?
Bias in sampling is a systematic error that causes a sample statistic to consistently overestimate or underestimate the true population parameter. This is different from random sampling error, which fluctuates around the true value with no consistent direction. The AP Statistics Course and Exam Description (CED) allocates 12-15% of the total exam weight to Unit 3 (Collecting Data), and sources of bias is one of the most frequently tested subtopics in that unit, appearing in both the multiple-choice (MCQ) and free-response (FRQ) sections. On MCQs, you will almost always be asked to identify the type of bias present in a described study design. On FRQs, you will typically need to name a source of bias in the given design, explain why it arises, and describe how it skews the resulting estimate, a task worth 1-2 points on most such questions. A core defining feature of sampling bias is that it is caused by a flaw in study design, not by random chance: increasing the sample size will never reduce sampling bias, a common misconception tested repeatedly on the exam.
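The contrast between systematic bias and random sampling error can be sketched with a small simulation. Everything below (the income levels, the share of the population that the flawed frame can reach) is invented purely for illustration: the point is that the biased estimate stays far from the truth no matter how large the sample gets, while the simple random sample's error shrinks.

```python
import random

random.seed(1)

# Hypothetical population of 100,000 incomes: a 30% subgroup earns more
# on average, and a flawed sampling frame can only ever reach that subgroup.
population = ([random.gauss(80_000, 10_000) for _ in range(30_000)] +   # reachable subgroup
              [random.gauss(50_000, 10_000) for _ in range(70_000)])    # never in the frame
true_mean = sum(population) / len(population)

reachable = population[:30_000]  # biased frame: excludes 70% of the population

for n in (100, 10_000):
    biased = sum(random.sample(reachable, n)) / n     # sample from the flawed frame
    srs = sum(random.sample(population, n)) / n       # simple random sample of everyone
    print(f"n={n}: biased estimate {biased:,.0f}, SRS estimate {srs:,.0f}, truth {true_mean:,.0f}")
```

Increasing n from 100 to 10,000 pulls the SRS estimate tightly around the true mean, but the biased estimate remains centered on the reachable subgroup's mean regardless of sample size.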
2. Selection Bias (Undercoverage and Voluntary Response)
Selection bias occurs when some members of the population of interest are systematically more or less likely to be selected into the sample than others. This flaw stems from a problem with the sampling frame (the full list of population members you draw your sample from), creating a sample that does not accurately reflect the full population. Two of the most common and heavily tested subtypes of selection bias are undercoverage bias and voluntary response bias.
Undercoverage bias occurs when the sampling frame leaves out entire groups of the population, so those groups have no chance of being selected for the sample. For example, if you study average income of all city residents by only sampling people at a downtown grocery store on weekday mornings, you miss full-time workers who cannot shop during those hours and low-income residents without reliable transportation to downtown, leading to an overestimate of average city income.
Voluntary response bias is a subtype where the entire sample consists only of people who choose to volunteer to participate. People with strong, often negative, opinions are much more likely to participate, so the sample consistently overrepresents extreme views. This is one of the most biased sampling designs possible, and it appears frequently on AP exams.
Worked Example
A city council wants to estimate the percentage of city residents who support a new property tax increase to fund local parks. The council posts a survey link on the city’s website and asks residents to complete the survey. They receive 1,243 responses from residents. What type of bias is most likely present here, and how will the bias impact the result?
- First, identify how the sample was selected: participation is entirely voluntary, so residents choose whether to respond or not, with no selection by the researcher.
- This matches the definition of voluntary response bias, a subtype of selection bias.
- Residents who strongly support the tax (because they use parks often) and residents who strongly oppose the tax (because they do not want higher payments) are both more motivated to respond than residents who are neutral on the issue. In most cases, opposition is more strongly motivating than support, so opposition is overrepresented.
- The resulting estimate of support for the tax will be lower than the true percentage of support in the full population of city residents.
Exam tip: On AP FRQs, you will never get full credit for only naming the type of bias. You must always connect the bias to the specific context of the problem to earn all points.
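The worked example above can be simulated directly. The true support rate and the two response rates below are made-up numbers, chosen only to show the mechanism by which self-selection distorts the estimate:

```python
import random

random.seed(2)

# Hypothetical city: 60% of residents truly support the park tax.
residents = [True] * 60_000 + [False] * 40_000

# Assumed (made-up) response behavior: opponents are three times as likely
# to volunteer a response as supporters (30% vs 10% response rates).
responses = [supports for supports in residents
             if random.random() < (0.10 if supports else 0.30)]

estimate = sum(responses) / len(responses)
print(f"Voluntary-response estimate of support: {estimate:.1%} (truth: 60.0%)")
```

Even though a clear majority of residents support the tax, the self-selected sample reports support well below 50%, because the more motivated opponents dominate the responses.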
3. Nonresponse Bias
Nonresponse bias occurs when selected individuals who are part of the properly drawn sample refuse to participate, cannot be reached, or fail to respond, and the people who do not respond differ systematically from those who do respond. This is a common point of confusion with selection bias subtypes: in nonresponse bias, you start with a properly selected random sample, but some selected people do not participate after selection. This differs from undercoverage, where groups are never given a chance to be selected at all, and voluntary response, where the entire sample is self-selected from the start.
For example, if you send a mail survey about how often people eat fast food to a random sample of 1000 households, people who eat fast food very frequently may be embarrassed to admit it and throw away the survey. Your final sample will overrepresent people who eat fast food less often, leading to an underestimate of average weekly fast food consumption. The key distinction is that all households were selected and given a chance to respond; the bias comes from systematic differences between responders and nonresponders.
Worked Example
A researcher selects a simple random sample of 500 registered voters from a state’s voter database to study voter approval of the governor’s new public health policy. The researcher calls each selected voter during work hours (9am to 5pm on weekdays) to ask their opinion. 120 voters answer the phone and respond, while 380 do not answer. What type of bias is most likely here, and how will it skew results?
- The sample starts as a properly selected random sample from the target population of registered voters, so there is no undercoverage at the selection stage.
- Voters who work full time during 9am-5pm are much less likely to answer the phone than voters who do not work full time (e.g., retirees, unemployed voters), so nonresponse is concentrated in working-age voters.
- This matches the definition of nonresponse bias: selected voters who differ systematically in their opinions do not respond, so the final sample overrepresents non-working voters.
- If retirees are more likely to support the public health policy than working-age voters, the resulting estimate of approval will be higher than the true approval rate in the full population of registered voters.
Exam tip: Always ask: were missing groups excluded at the selection stage, or were they selected but did not respond? Exclusion at selection = undercoverage; selected but missing = nonresponse.
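The phone-survey mechanism above can be sketched as a quick simulation. The group sizes, approval rates, and daytime answer rates are all invented, and the sample is drawn larger than the 500 in the worked example so the illustration is stable:

```python
import random

random.seed(3)

# Hypothetical electorate: 35% retirees (75% approve the policy),
# 65% working-age voters (45% approve). All numbers are made up.
voters = ([("retiree", random.random() < 0.75) for _ in range(35_000)] +
          [("worker",  random.random() < 0.45) for _ in range(65_000)])
true_approval = sum(approves for _, approves in voters) / len(voters)

sample = random.sample(voters, 5_000)   # proper SRS: no selection bias here

# Assumed (made-up) daytime answer rates: retirees 60%, workers 15%.
answered = [approves for group, approves in sample
            if random.random() < (0.60 if group == "retiree" else 0.15)]

estimate = sum(answered) / len(answered)
print(f"Phone-survey estimate: {estimate:.1%}  true approval: {true_approval:.1%}")
```

The sample was selected correctly, yet the final estimate runs well above the truth because the more-approving retirees are heavily overrepresented among those who actually answer.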
4. Response Bias
Response bias (also called measurement bias) occurs when the method of collecting data causes respondents to give inaccurate or untruthful answers, leading to a systematic error in the measured value. This differs from all previous biases, which relate to selection of the sample; response bias occurs after the sample is selected, during the data collection step.
Common causes of response bias tested on the AP exam include leading questions (wording that pushes respondents to answer a certain way), social desirability bias (people give answers that make them look good, even if untrue), and recall bias (people cannot accurately remember past behavior). For example, a survey that asks "Do you support the job-destroying new minimum wage increase that will raise business costs?" uses leading wording that will inflate the share of "no" responses compared to the true opinion. Similarly, asking people how many times they have driven over the speed limit in the last month will lead to underreporting, because people do not want to admit to illegal behavior.
Worked Example
A school administrator wants to estimate the percentage of high school students who have vaped on school property in the last month. The administrator asks each student in a random sample of students to raise their hand if they have vaped on campus in the last month, in a group assembly with their principal present. What type of bias is most likely here, and how will it impact the result?
- The sample of students is randomly selected from the target population, so there is no selection bias or nonresponse bias in this case (all selected students are present and can respond).
- Vaping on campus is against school rules, and students are asked to respond publicly in front of their principal and peers. This creates pressure to give a socially acceptable answer rather than the truth.
- This matches the definition of response bias, specifically social desirability bias, where respondents give inaccurate answers to avoid judgment or punishment.
- Most students who have vaped will not raise their hand, so the resulting estimate of the percentage of students who have vaped on campus will be much lower than the true population percentage.
Exam tip: If the question asks about the wording of a survey question, it is almost always response bias, not a selection bias. Always state whether the parameter will be over- or underestimated in context to earn full credit.
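The hand-raising scenario can also be sketched in code. The true vaping rate and the honesty rate under public questioning are invented for illustration; the mechanism is that the sample is fine but the measured responses are not truthful:

```python
import random

random.seed(4)

# Hypothetical school: 20% of sampled students have truly vaped on campus.
students = [random.random() < 0.20 for _ in range(2_000)]
true_rate = sum(students) / len(students)

# Assumed (made-up) honesty model: with the principal watching, only 10%
# of students who vaped raise a hand; non-vapers all truthfully keep hands down.
hands_up = sum(1 for vaped in students if vaped and random.random() < 0.10)

estimate = hands_up / len(students)
print(f"Hand-raise estimate: {estimate:.1%}  true rate: {true_rate:.1%}")
```

Every sampled student responds, so there is no nonresponse; the estimate is still badly low because the public setting makes the responses themselves inaccurate.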
5. Common Pitfalls (and how to avoid them)
- Wrong move: Calling voluntary response bias nonresponse bias, because "people choose not to respond". Why: Students confuse the order of selection: in nonresponse, people are selected first then don't respond; in voluntary response, the entire sample is self-selected from the start. Correct move: Ask "was the person selected into the sample by the researcher, or did they volunteer to be in the sample?" If they volunteered, it's voluntary response bias; if selected then didn't respond, it's nonresponse.
- Wrong move: Stating that increasing sample size will reduce sampling bias. Why: Students confuse random sampling error (which decreases with larger sample size) with systematic bias (which is a design flaw). Correct move: Remember that increasing sample size only reduces random variation, never fixes sampling bias. If asked what reduces bias, the answer must involve changing the sampling method, not increasing sample size.
- Wrong move: Mixing up undercoverage bias and nonresponse bias. Why: Both result in groups being missing from the final sample, but the stage at which they are excluded is different. Correct move: Ask "was the group never given a chance to be selected in the first place?" If yes, it's undercoverage; if they were selected but didn't participate, it's nonresponse.
- Wrong move: On FRQs, only naming the type of bias and not explaining how it impacts the result in context. Why: Students memorize bias names but forget AP requires context-specific reasoning for points. Correct move: After naming the bias, always add one sentence explaining who is over/underrepresented and whether the parameter will be over/underestimated in this specific context.
- Wrong move: Calling response bias a type of selection bias. Why: Students group all biases together, but response bias occurs after the sample is selected, during data collection. Correct move: Check the source of the error: if the error comes from inaccurate responses, it's response bias, regardless of whether the sample was selected correctly.
6. Practice Questions (AP Statistics Style)
Question 1 (Multiple Choice)
A researcher wants to estimate the average amount of money full-time undergraduate students at a large public university spend on textbook materials per semester. The researcher obtains a list of all full-time undergraduates and selects a simple random sample of 200 students. The researcher emails the survey to selected students, and 72 students respond. The average reported spending from the responses is $485. Which of the following best describes the most likely bias in this result?
A) Voluntary response bias, because students with high textbook costs are more likely to respond, leading to an overestimate of average spending
B) Nonresponse bias, because students who have higher textbook costs are often taking more classes and are too busy to respond, leading to an underestimate of average spending
C) Response bias, because students will overreport their textbook spending to look more studious, leading to an overestimate of average spending
D) Undercoverage bias, because the researcher only selected full-time undergraduates, leading to an underestimate of average spending
Worked Solution: First, the sample starts as a simple random sample of the target population (full-time undergraduates), so undercoverage of the target population (option D) is not a problem here. Voluntary response bias (option A) only occurs when the entire sample is made of volunteers, not when some pre-selected students do not respond, so A is incorrect. Response bias (option C) refers to inaccurate answers, and there is no inherent reason students would overreport textbook costs in an anonymous email survey, so C is incorrect. The bias here is nonresponse: students with higher textbook costs are more likely to be taking more classes, have busier schedules, and are less likely to respond, so they are underrepresented in the final sample, leading to an average that is lower than the true population average. Correct answer is B.
Question 2 (Free Response)
A national coffee shop chain wants to estimate the proportion of its customers who support adding a permanent plant-based menu section to all locations. The chain places a paper survey on every table in one randomly selected store, asking customers to fill it out and leave it at the register when they leave. (a) Identify the sampling method used here, and name the most likely source of bias. (b) Explain why this bias occurs in this specific context. (c) The regional manager says "If we get more than 1000 responses across all our stores, the result will be accurate enough to use for our business decision." Do you agree with the manager? Explain why or why not.
Worked Solution: (a) This is a convenience sample, because the chain only surveys customers who choose to fill out and return the survey. The most likely source of bias is voluntary response bias. (b) Voluntary response bias occurs here because only customers who have strong opinions about a plant-based menu (either strongly for or strongly against) will choose to take the time to fill out the survey. Customers who are neutral about the change will almost always ignore the survey, so the sample overrepresents extreme views. If customers who support plant-based options are more motivated to respond, the estimate of support will be higher than the true proportion of all chain customers who support the change. (c) I do not agree with the manager. Voluntary response bias is a systematic flaw in the sampling design, not random error. Increasing the sample size only reduces random sampling error (fluctuation from chance), it cannot fix systematic bias caused by a flawed sampling method. Even with 1000 responses, the results will still overrepresent extreme opinions, so the result will not be accurate for the full population of customers.
Question 3 (Application / Real-World Style)
A public health researcher wants to estimate the proportion of adults over 50 in a rural county who have had at least one dose of the shingles vaccine, which is recommended for all adults in this age group. The researcher selects a random sample of 300 adults over 50 from the county’s public health patient list and calls them at home between 2pm and 4pm on weekdays to ask whether they have received the vaccine. The researcher finds that 62% of respondents have had the vaccine and concludes that 62% of all adults over 50 in the county have been vaccinated. What source of bias is most likely present here, and do you think the 62% estimate is higher or lower than the true population value? Explain your reasoning.
Worked Solution: The most likely source of bias here is nonresponse bias. The researcher calls during weekday afternoons, when most working adults aged 50 to 64 are still at work and will not answer the phone. Retirees aged 65 and older are much more likely to be home to answer the call, and vaccine uptake is consistently higher among adults over 65 than among working adults aged 50 to 64. Because the final sample overrepresents older retirees (who are more likely to be vaccinated) and underrepresents working adults aged 50 to 64 (who are less likely to be vaccinated), the 62% estimate is higher than the true proportion of all adults over 50 in the county who have received the shingles vaccine.
7. Quick Reference Cheatsheet
| Category | Notes |
|---|---|
| Sampling Bias | Systematic error that consistently skews a statistic; never reduced by increasing sample size |
| Selection Bias | Occurs when some groups are systematically more likely to be selected into the sample |
| Undercoverage Bias | Subtype of selection bias; groups are excluded from the sampling frame and never have a chance to be selected |
| Voluntary Response Bias | Subtype of selection bias; entire sample is self-selected volunteers; overrepresents extreme opinions; highly biased |
| Nonresponse Bias | Occurs when selected participants do not respond; nonrespondents differ systematically from respondents |
| Response Bias | Occurs during data collection; leads to inaccurate/untruthful responses; caused by leading questions or social desirability |
| Random Sampling Error | Random fluctuation of a statistic around the true parameter; reduced by increasing sample size |
8. What's Next
Sources of bias in sampling is a foundational prerequisite for designing studies and interpreting research results, the core topics that come next in Unit 3 (Collecting Data). Without the ability to identify sources of bias, you cannot evaluate whether a study’s conclusions are valid, a core skill tested on every AP Statistics exam. This topic also feeds directly into statistical inference for proportions and means later in the course: confidence intervals and hypothesis tests assume the sample is free of systematic bias, so if bias is present, any inference from the sample will be unreliable. Next, you will learn to identify bias in experiments, distinguish between observational studies and experiments, and use random selection and random assignment to reduce bias.