College Board · cb-statistics · AP Statistics · Collecting Data · 18 min read · Updated 2026-05-07

Collecting Data — AP Statistics Stats Study Guide

For: candidates preparing for the AP Statistics exam.

Covers: all Unit 3 Collecting Data content per the 2020+ AP Statistics CED, including probability sampling methods, sampling bias, core experimental design principles, confounding vs lurking variables, and rules for drawing valid inferences from data.

You should already know: Algebra 2, basic probability intuition.

A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official College Board mark schemes for grading conventions.


1. What Is Collecting Data?

Collecting Data is the process of gathering representative, reliable observations to answer statistical questions, and it forms the foundation of all inferential statistics in the AP Statistics curriculum. Poor data collection leads to invalid conclusions even with perfect analysis. Per the CED, this unit accounts for 12-15% of your AP Statistics exam score, appearing in both multiple choice and free response questions, often as part of the investigative task (Question 6) at the end of the exam. The core goal of rigorous data collection is to minimise systematic error, so that any observed patterns in your sample can be reliably attributed to the effect you are studying, not to flaws in your study design.

2. Sampling methods — SRS, stratified, cluster, systematic

Sampling refers to selecting a subset (sample) of a larger population to measure, since measuring every member of the population (a census) is often impractical, costly, or impossible. Below are the four probability sampling methods tested on the AP exam, each with specific use cases, advantages, and disadvantages:

Simple Random Sampling (SRS)

Every possible sample of size n from the population has an equal probability of being selected. To implement SRS: assign every population member a unique identifier, then use a random number generator or random digit table to select identifiers without replacement.

  • Worked example: For a population of 100 10th graders, assign numbers 00-99, use random digits to pick 20 unique numbers: this forms an SRS of size 20.
  • Advantages: Unbiased, simple for small populations with a complete sampling frame (list of population members).
  • Disadvantages: Not efficient for large, geographically spread populations, may underrepresent small subgroups.
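
The SRS recipe above can be sketched in a few lines of Python (the student IDs and seed below are illustrative, not part of any official procedure):

```python
import random

def simple_random_sample(population_ids, n, seed=None):
    """Draw an SRS of size n: every subset of size n is equally likely."""
    rng = random.Random(seed)
    # random.sample selects n unique members without replacement
    return rng.sample(population_ids, n)

# Hypothetical population of 100 10th graders labelled 0-99
students = list(range(100))
srs = simple_random_sample(students, 20, seed=1)
print(len(srs), len(set(srs)))  # 20 students, all distinct
```

In practice a random digit table does the same job as `random.sample`; the point is that selection is without replacement and every member has the same chance.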

Stratified Random Sampling

Divide the population into mutually exclusive, homogeneous groups called strata (singular: stratum) based on a characteristic related to the variable you are measuring, then take an SRS from each stratum. Strata are homogeneous within groups and different between groups.

  • Worked example: If you are measuring average AP exam score across a high school, create strata for freshmen, sophomores, juniors, and seniors, then take an SRS of 15 students from each class.
  • Advantages: Ensures representation of small subgroups, reduces sampling variability.
  • Disadvantages: Requires prior knowledge of stratum membership for every population member.
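
A minimal sketch of the stratified procedure, assuming a hypothetical school with one ID list per class year (the stratum sizes are made up):

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Take an independent SRS of size per_stratum from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical strata: one list of student IDs per class year
strata = {
    "freshmen":   list(range(0, 250)),
    "sophomores": list(range(250, 500)),
    "juniors":    list(range(500, 740)),
    "seniors":    list(range(740, 960)),
}
picked = stratified_sample(strata, 15, seed=7)
# Every stratum contributes exactly 15 students
print({name: len(ids) for name, ids in picked.items()})
```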

Cluster Sampling

Divide the population into mutually exclusive, heterogeneous groups called clusters that each mirror the diversity of the full population, then randomly select a subset of clusters and measure every member of the selected clusters. Clusters are heterogeneous within groups and similar between groups.

  • Worked example: For a survey of students in a large district with 50 identical elementary schools, treat each school as a cluster, randomly select 8 schools, and survey every student in those 8 schools.
  • Advantages: Low cost, efficient for geographically spread populations, no need for a full population-level sampling frame.
  • Disadvantages: Higher sampling variability than SRS/stratified sampling if clusters are not truly representative of the full population.
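
The cluster procedure can be sketched the same way; the 50-school district below is hypothetical. Note that every member of each chosen cluster is surveyed:

```python
import random

def cluster_sample(clusters, k, seed=None):
    """Randomly select k whole clusters, then survey EVERY member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), k)
    # No within-cluster sampling: all members of each chosen cluster are included
    return [member for name in chosen for member in clusters[name]]

# Hypothetical district: 50 schools of 40 students each
schools = {f"school_{i}": [f"s{i}_{j}" for j in range(40)] for i in range(50)}
surveyed = cluster_sample(schools, 8, seed=3)
print(len(surveyed))  # 8 schools x 40 students = 320
```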

Exam tip: Examiners frequently ask you to distinguish stratified vs cluster sampling: if you take SRS from every group, it is stratified; if you select entire groups to measure, it is cluster.

Systematic Sampling

Select every k-th member of the population after a random starting point between 1 and k, where k = N/n (N is the population size, n the desired sample size).

  • Worked example: For a population of 5000 customers and desired sample size 200, k = 5000/200 = 25. Pick a random start at 7, then select customers 7, 32, 57, ... until you reach 200 respondents.
  • Advantages: Easy to implement for ordered populations (e.g. people entering a store, customer transaction lists), no need for a full sampling frame.
  • Disadvantages: Biased if there is a repeating pattern in the population order that matches the interval k (e.g. surveying every 10th mall visitor when every 10th visitor is part of a sports team arriving for a group event).
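
A short sketch of the systematic rule, using the 5000-customer example (customer IDs and seed are illustrative):

```python
import random

def systematic_sample(population, n, seed=None):
    """Select every k-th member after a random start, where k = N // n."""
    rng = random.Random(seed)
    k = len(population) // n           # sampling interval: 5000 // 200 = 25
    start = rng.randrange(1, k + 1)    # random starting position in 1..k
    # 1-based positions start, start + k, start + 2k, ... (n of them)
    return [population[start - 1 + i * k] for i in range(n)]

customers = list(range(1, 5001))       # hypothetical customer IDs 1..5000
sample = systematic_sample(customers, 200, seed=42)
print(len(sample), sample[1] - sample[0])  # 200 customers, interval 25
```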

3. Sampling bias and undercoverage

Sampling bias occurs when some members of the population are systematically more likely to be selected in the sample than others, leading to sample results that are not representative of the population. This is distinct from random sampling error, which is natural variation between random samples and cannot be eliminated, only reduced by increasing sample size.

Undercoverage (most frequently tested on AP exams)

Undercoverage occurs when some groups in the population are inadequately represented in the sampling frame (the list of population members you are selecting from).

  • Worked example: If you conduct a survey of town residents by calling landlines only, people who only use cell phones (disproportionately young and low-income people) are excluded from the sampling frame, leading to undercoverage of these groups. If you are measuring support for a new youth center, your results will overestimate opposition, because older people are overrepresented in your sample.

Other common bias types

  1. Nonresponse bias: When selected individuals refuse to participate, and non-participants differ systematically from participants. For example, an email survey about job satisfaction will have higher response rates from people who are very happy or very unhappy with their job, leading to biased results.
  2. Response bias: When respondents give inaccurate answers, usually due to leading questions, social desirability bias, or confusing wording. For example, the question "Do you support the harmful new tax increase that will raise grocery prices for working families?" is a leading question that will produce more negative responses than a neutrally worded alternative.
  3. Voluntary response bias: When respondents self-select to participate (e.g. online polls, call-in radio surveys). People with strong opinions are far more likely to respond, so results are almost always biased.

Exam tip: To get full marks on bias questions, follow the 3-step rule: 1) Name the bias, 2) Explain how it applies to the scenario, 3) State the direction of the bias (overestimate/underestimate of the measured value).

4. Experiments — control, randomisation, replication

First, distinguish the two core study types:

  • Observational study: You measure variables without interfering with subjects (you only observe). You can only find associations, not causation.
  • Experiment: You deliberately impose a treatment on subjects to measure their response. Well-designed experiments allow you to establish causal relationships.

Below are the three core principles of experimental design, tested heavily on both multiple choice and free response questions:

1. Control

Hold all other variables that could affect the response constant across treatment groups, so that the only difference between groups is the treatment you are testing. Common control measures include:

  • A control group: a group that receives no treatment, a placebo, or the current standard treatment, to compare results to.
  • Blinding: Hiding which group subjects are in to avoid the placebo effect (the phenomenon where people experience a response just because they think they received treatment).
  • Double-blinding: Neither the subjects nor the researchers interacting with them know which group is which, to avoid researcher bias in measuring responses.

2. Randomisation

Randomly assign subjects to treatment groups, so that any variables you did not control for are evenly spread across groups, on average. This eliminates systematic differences between groups before the treatment is applied. Key distinction: random sampling (selecting subjects from the population) is different from random assignment (assigning selected subjects to treatment groups). Random sampling allows you to generalise to the population; random assignment allows you to draw causal conclusions.
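
Mechanically, random assignment is just a shuffle followed by an even split; a minimal sketch (the subject labels and group names are made up):

```python
import random

def random_assignment(subjects, groups=("treatment", "control"), seed=None):
    """Shuffle the subjects, then deal them into equal-sized groups."""
    rng = random.Random(seed)
    order = list(subjects)
    rng.shuffle(order)                         # randomise the order
    size = len(order) // len(groups)
    return {g: order[i * size:(i + 1) * size] for i, g in enumerate(groups)}

volunteers = [f"subject_{i}" for i in range(40)]
arms = random_assignment(volunteers, seed=11)
print(len(arms["treatment"]), len(arms["control"]))  # 20 and 20
```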

3. Replication

Use a sufficiently large number of subjects in each group, so that chance differences between groups are unlikely to explain differences in response. You can also replicate the entire experiment multiple times with different groups to confirm results. Larger sample sizes per group increase the power of the study, reducing the probability of a Type II error (failing to detect a real treatment effect).

Worked example: A teacher wants to test if a new study app improves test scores. They give the app to all students in their 1st period class, and do not give it to students in their 5th period class. They then compare test scores between the two classes. Two flaws in this design are: 1) No random assignment: students are assigned to classes based on schedule, so 1st period students may be more alert or have higher prior achievement than 5th period students, 2) No control: the teacher did not control for variables like time spent studying or test difficulty across the two classes.

5. Confounding and lurking variables

Both types of variables prevent you from drawing causal conclusions in observational studies:

  • Lurking variable: A variable that is not measured in the study, but affects both the explanatory variable and the response variable.
  • Confounding variable: A variable that is measured, but differs systematically between treatment groups, so you cannot tell if the difference in response is due to the treatment or the confounding variable.

Worked example: A study finds that people who drink coffee every day have a lower risk of heart disease than people who do not drink coffee. A lurking variable here is physical activity: coffee drinkers may be more likely to exercise regularly, which reduces heart disease risk, so the lower risk could be due to exercise, not coffee.

Exam tip: To get full marks when identifying a confounding variable, follow the 3-step rule: 1) Name the variable, 2) Explain how it differs between groups, 3) Explain how it affects the response variable. For example: "Prior GPA is a confounding variable: 1st period students likely have higher average prior GPA than 5th period students, and higher prior GPA is associated with higher test scores, so we cannot tell if higher scores are due to the app or higher prior GPA."

In well-designed experiments, randomisation eliminates confounding variables by evenly distributing them across treatment groups, on average.

6. Scope of inference

The scope of inference refers to the valid conclusions you can draw from a study, based entirely on how the data was collected. There are two core rules tested every year on the AP exam:

  1. If the sample was selected using random sampling from a defined population, you can generalise the results of the study to that population. If the sample was not randomly selected (e.g. convenience sample, voluntary response), you can only generalise to the group that was studied, not a larger population.
  2. If the study used random assignment of subjects to treatment groups, you can draw causal conclusions about the effect of the treatment on the response. If there was no random assignment (observational study), you can only conclude there is an association between the variables, not that one causes the other.

The four possible scenarios are summarised below:

Random Sampling? | Random Assignment? | Scope of Inference
Yes | Yes | Can generalise to population, can draw causal conclusions
Yes | No | Can generalise to population, only association, no causation
No | Yes | Cannot generalise beyond study sample, can draw causal conclusions for the sample
No | No | Cannot generalise, only association for the sample
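
The two rules above reduce to a simple lookup; a tiny illustrative helper (the return strings are our shorthand, not CED wording):

```python
def scope_of_inference(random_sampling, random_assignment):
    """Map the two design questions to the valid conclusions."""
    generalise = ("can generalise to population" if random_sampling
                  else "sample only")
    causal = ("causal conclusion allowed" if random_assignment
              else "association only")
    return generalise, causal

print(scope_of_inference(True, False))
# ('can generalise to population', 'association only')
```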

Worked example: A research team randomly selects 200 adults from a city, and randomly assigns 100 to follow a low-carb diet and 100 to follow a low-fat diet for 6 months. They find that the low-carb group lost an average of 3kg more than the low-fat group, with a statistically significant difference. They used random sampling, so they can generalise results to all adults in the city, and random assignment, so they can conclude the low-carb diet caused the greater weight loss.

7. Common Pitfalls (and how to avoid them)

  • Wrong move: Mixing up stratified and cluster sampling when explaining study designs. Why you do it: Both involve grouping the population before sampling, so it is easy to confuse the two. Correct move: Remember the difference in group characteristics: strata are homogeneous (members of the same stratum share the key characteristic you are controlling for), clusters are heterogeneous (each cluster looks like a mini version of the full population). If you are taking SRS from every group, it is stratified; if you are randomly selecting entire groups to measure, it is cluster.
  • Wrong move: Identifying a bias in a free response question but not explaining its direction or effect on the result. Why you do it: You think naming the bias is enough to get marks, but AP graders require context. Correct move: Follow the 3-step rule for bias questions: 1) Name the bias, 2) Explain how it applies to the specific scenario, 3) State whether it will lead to an overestimate or underestimate of the value you are measuring.
  • Wrong move: Claiming causation from an observational study, or generalising results from a convenience sample to a full population. Why you do it: It is intuitive to assume correlation = causation, or that the people you studied are representative of everyone. Correct move: Always check for two things before drawing conclusions: was there random sampling (for generalisation) and random assignment (for causation)? If either is missing, you cannot make that claim.
  • Wrong move: Confusing random sampling and random assignment, or using the terms interchangeably. Why you do it: Both use randomness, but for very different purposes. Correct move: Random sampling = selecting who is in the study from the population, for generalisation. Random assignment = sorting selected study participants into treatment groups, for causal inference. Explicitly use the correct term in your answers to avoid losing marks.
  • Wrong move: Ignoring the placebo effect as a confounding factor in experiment design questions. Why you do it: You forget that people's beliefs about treatment affect their response. Correct move: If an experiment uses human subjects and no blinding or placebo control, always mention the placebo effect as a potential flaw, and recommend double-blinding and a placebo control group as a fix.

8. Practice Questions (AP Statistics Style)

Question 1

A local council wants to survey 500 residents of a city with 4 distinct neighborhoods to measure support for a new public park. Neighborhood A has 15,000 residents, Neighborhood B has 10,000, Neighborhood C has 12,000, and Neighborhood D has 13,000 residents. a) Describe how to select a sample of 500 residents using stratified random sampling. b) Describe how to select a sample of 500 residents using cluster sampling, using neighborhoods as clusters. c) Which sampling method is more appropriate for this study? Justify your answer.

Solution 1

a) First, calculate the total population: 15,000 + 10,000 + 12,000 + 13,000 = 50,000 residents. The sampling fraction is 500/50,000 = 1%, so we select 1% of residents from each neighborhood. Assign every resident in Neighborhood A a unique number 00001 to 15000, use a random number generator to select 150 unique numbers (1% of 15,000), and survey those residents. Repeat for the other neighborhoods: select 100 residents from B, 120 from C, and 130 from D, using an SRS within each stratum. b) Treat each of the 4 neighborhoods as a cluster: assign each a number 1 to 4 and use a random number generator to select 1 cluster. Strict cluster sampling would then survey every resident of the selected neighborhood, which is far more than 500 here, so a practical variant (technically a multistage design) subdivides the chosen neighborhood into smaller clusters such as city blocks, randomly selects blocks, and surveys every resident of the chosen blocks until roughly 500 residents are reached. c) Stratified random sampling is more appropriate. The 4 neighborhoods likely have different levels of support for the park: for example, a neighborhood close to the proposed park site will have higher support than a neighborhood far away. Stratified sampling ensures we get representation from all 4 neighborhoods, so our results will be more accurate and have lower sampling variability than cluster sampling, which might overrepresent or underrepresent neighborhoods when only a subset of clusters is selected.
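
The proportional allocation in part a) can be checked with a few lines of Python (a sketch using the neighborhood sizes from the question):

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Allocate a stratified sample in proportion to stratum size."""
    population = sum(stratum_sizes.values())
    # round() may need small adjustments in general to hit the exact total;
    # here every share divides evenly
    return {name: round(total_sample * size / population)
            for name, size in stratum_sizes.items()}

neighborhoods = {"A": 15000, "B": 10000, "C": 12000, "D": 13000}
print(proportional_allocation(neighborhoods, 500))
# {'A': 150, 'B': 100, 'C': 120, 'D': 130}
```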


Question 2

A fitness company claims that their new 30-day workout program leads to an average weight loss of 5kg. They tested the program on 100 volunteers who signed up for the program via a social media ad. At the end of 30 days, the average weight loss for the group was 5.2kg, with a statistically significant difference from 0. a) Identify one source of bias in this study, and describe its effect on the estimated weight loss. b) Can the company conclude that the workout program caused the weight loss? Justify your answer. c) Can the company generalise these results to all adults who are trying to lose weight? Justify your answer.

Solution 2

a) Voluntary response bias: The volunteers signed up for the program via an ad, so they are more motivated to lose weight than the general population, and more likely to follow the program strictly. This will lead to an overestimate of the average weight loss the program would produce for typical users. b) No, they cannot conclude causation. Although a treatment was imposed, there was no control group and no random assignment: there is no group of similar people who did not do the program to compare to. The weight loss could be due to other factors like changes in diet, increased water intake, or the placebo effect, not the workout program itself. c) No, they cannot generalise to all adults trying to lose weight. The sample was a self-selected group of volunteers, not a random sample of adults trying to lose weight. The volunteers are more motivated, so their results are not representative of the broader population.


Question 3

A research team is testing if a new sunscreen is more effective at preventing sunburn than the leading brand. They recruit 200 volunteers for the study. a) Describe a completely randomised design for this experiment. b) Describe a matched pairs design for this experiment, using each volunteer as their own control. c) What is one advantage of the matched pairs design over the completely randomised design for this study?

Solution 3

a) Assign each of the 200 volunteers a unique number from 001 to 200. Use a random number generator to assign 100 volunteers to the treatment group (new sunscreen) and 100 to the control group (leading brand). Ask all volunteers to apply the assigned sunscreen to all exposed skin before spending 2 hours outside on a sunny day. Measure the number of volunteers in each group who develop a sunburn, and compare the proportions. Use double-blinding: make both sunscreens look identical, and do not tell the volunteers or the researchers grading the sunburns which group each volunteer is in, to avoid bias. b) For each volunteer, randomly assign one half of their back to receive the new sunscreen, and the other half to receive the leading brand. Ask all volunteers to spend 2 hours outside with their back exposed. After 2 hours, compare the severity of sunburn on each half of each volunteer's back. Use blinding: make both sunscreens look identical, and do not tell the researchers grading the sunburns which side got which sunscreen. c) The matched pairs design controls for individual differences between volunteers that affect sunburn risk, like skin tone, age, and sun sensitivity. By using each volunteer as their own control, we eliminate these variables as confounding factors, so we can more accurately measure the difference in effectiveness between the two sunscreens, with lower sampling variability.
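
The matched pairs randomisation in part b) amounts to an independent coin flip per volunteer; a minimal sketch with made-up volunteer labels:

```python
import random

def matched_pairs_assignment(subjects, seed=None):
    """For each subject, randomly choose which half gets the new sunscreen."""
    rng = random.Random(seed)
    # Each subject serves as their own control: one side per treatment
    return {s: rng.choice(("left_half_new", "right_half_new")) for s in subjects}

volunteers = [f"v{i}" for i in range(200)]
sides = matched_pairs_assignment(volunteers, seed=5)
print(len(sides))  # every volunteer gets exactly one randomised side
```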

9. Quick Reference Cheatsheet

Core Sampling Methods

Method | Description | Key Use Case
SRS | Every sample of size n has equal selection probability | Small populations with complete sampling frame
Stratified | Divide into homogeneous strata, SRS from each | Need representation of small subgroups, lower variability
Cluster | Divide into heterogeneous clusters, sample entire clusters | Large, geographically spread populations, low cost
Systematic | Select every k-th member after random start | Ordered populations, no full sampling frame

Key Bias Types

  • Undercoverage: Groups excluded from sampling frame, systematic underrepresentation
  • Nonresponse: Selected subjects don't participate, differ from respondents
  • Response: Respondents give inaccurate answers (leading questions, social desirability)
  • Voluntary response: Self-selected sample, biased toward strong opinions

Experimental Design Principles

  1. Control: Hold non-treatment variables constant, use control groups, blinding, placebos
  2. Randomisation: Randomly assign subjects to groups to eliminate confounding
  3. Replication: Large sample size per group to reduce chance variation

Scope of Inference Rules

Random Sampling? | Random Assignment? | Valid Conclusions
Yes | Yes | Generalise to population, causal conclusions
Yes | No | Generalise to population, only association
No | Yes | Causal conclusions only for study sample
No | No | Only association for study sample

10. What's Next

Collecting Data is the foundation for all inferential statistics you will learn later in the AP Statistics curriculum. The validity of every confidence interval, hypothesis test, and regression analysis you perform in later units depends entirely on the quality of the data collection method used: if your data is biased or your experiment is poorly designed, even the most sophisticated statistical analysis will produce meaningless results. You will see this topic integrated into free response questions about hypothesis testing for proportions and means, chi-square tests for association, and linear regression, where you will be asked to assess whether the conclusions of a study are valid based on its data collection design.

If you have any questions about sampling methods, experimental design, or scope of inference, you can ask Ollie, our AI tutor, for personalised explanations, extra practice questions, or feedback on your free response answers at any time. Be sure to practise official College Board past paper questions to get familiar with the exact wording and grading conventions used on the AP Statistics exam, and check out our other study guides for Unit 4 Probability, Random Variables, and Probability Distributions next to build on the skills you learned here.
