AP · Collecting Data · 16 min read · Updated 2026-05-10

Collecting Data — AP Statistics Study Guide

For: Students preparing for the AP Statistics exam.

Covers: Full unit overview of AP Statistics Collecting Data, including study planning, random sampling methods, identification of sampling bias, experimental design fundamentals, principles of good experimentation, and rules for inference from studies.

You should already know:

Difference between populations, parameters, samples, and statistics

Basic descriptive statistics for categorical and quantitative data

Introductory probability concepts for random processes

A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official mark schemes for grading conventions.

1. Why This Unit Matters

Collecting data is the foundational first step of all statistical practice. Per the AP Statistics Course and Exam Description (CED), this unit carries an exam weighting of 12-15% of the multiple-choice (MCQ) section, and its content is also tested in the free-response (FRQ) section. You can typically expect MCQ items or an FRQ part focused on identifying bias, describing experimental design, or justifying what inferences can be drawn from a study.

The entire rest of the AP Statistics course—from sampling distributions to confidence intervals to hypothesis testing—relies on data collected correctly. The core principle of statistics is "garbage in, garbage out": even the most sophisticated statistical analysis cannot fix systematic errors from bad data collection. This unit also emphasizes conceptual thinking over calculation, which aligns with the AP exam’s shift toward testing your ability to reason about statistical design rather than just compute values. Mastery of this unit ensures you can recognize good studies, critique bad ones, and correctly interpret the results of any statistical analysis you encounter later in the course.

2. Unit Concept Map

The 6 sub-topics of Collecting Data build sequentially from first principles to final inference, in the same order you would work through when running a real study:

Introduction to Planning a Study: The foundational starting point. This sub-topic teaches you to define your population of interest, distinguish between parameters and statistics, and classify a study as observational or experimental, setting the scope for all subsequent steps.
Random Sampling and Data Collection: Next, you learn how to select a sample that represents your population of interest, covering common sampling methods from simple random sampling to stratified, cluster, and systematic sampling. This builds directly on the population/sample distinction from the first sub-topic.
Sources of Bias in Sampling: After learning how to sample correctly, this sub-topic teaches you to recognize and avoid common systematic errors that lead to unrepresentative samples, including selection bias, nonresponse bias, and response bias.
Designing an Experiment: Moving from observational studies to experimental research, this sub-topic introduces core experimental terminology: treatments, response variables, control groups, and random assignment. It builds on the study type distinction you learned in the first sub-topic.
How to Experiment Well: This sub-topic expands on basic experimental design to cover the three core principles of good experiments (control, randomization, replication), the problem of confounding, and design techniques that reduce unwanted variation, such as blocking and matched pairs design.
Inference and Experiments: The capstone of the unit, this sub-topic unifies all previous content to answer the core question: what inferences can you draw from a well-designed vs poorly designed study? It clarifies when you can generalize results to a broader population and when you can conclude causation.
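
The four sampling methods named in the second sub-topic can be compared side by side in a few lines of Python. This is a minimal sketch using only the standard library; the function names, the 12-student population, and the homeroom groupings are all hypothetical and chosen purely for illustration.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every possible group of n units is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Stratified: take a separate SRS from EVERY stratum."""
    return [unit
            for members in strata.values()
            for unit in rng.sample(members, n_per_stratum)]

def cluster_sample(clusters, n_clusters, rng):
    """Cluster: randomly pick whole clusters, then keep everyone in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def systematic_sample(population, k, rng):
    """Systematic: random start among the first k units, then every k-th unit."""
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 12 students in 3 homerooms of 4.
homerooms = {
    "A": ["A1", "A2", "A3", "A4"],
    "B": ["B1", "B2", "B3", "B4"],
    "C": ["C1", "C2", "C3", "C4"],
}
everyone = [s for room in homerooms.values() for s in room]

srs = simple_random_sample(everyone, 4, rng)
strat = stratified_sample(homerooms, 2, rng)   # 2 students from each homeroom
clus = cluster_sample(homerooms, 1, rng)       # 1 whole homeroom
syst = systematic_sample(everyone, 3, rng)     # every 3rd student on the list
```

Note the structural difference the code makes visible: the stratified sample always contains students from all three homerooms, while the cluster sample contains students from only the homeroom that was selected.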

3. A Guided Tour of an Exam-Scale Problem

To see how these sub-topics work together on a single exam problem, let’s walk through a common research question step by step, highlighting which sub-topic applies at each stage:

A nutrition researcher wants to test whether eating one serving of blueberries daily reduces resting systolic blood pressure in adults over 50 in the United States.

First, plan the study (Introduction to Planning a Study): We start by defining our population of interest (all U.S. adults over 50) and parameter of interest (the true difference in mean systolic blood pressure between daily blueberry consumers and non-consumers). Since our research question asks about causation, we choose to run an experiment rather than an observational study, because observational studies cannot rule out confounding from variables like exercise, income, or overall diet. This entire first step relies on the first unit sub-topic.
Second, sample participants and check for bias (Random Sampling and Sources of Bias): Ideally, we use a random sampling method to obtain a representative sample, which is what allows generalization to the full population. Suppose instead that the researcher recruits participants only from a local senior center. This introduces undercoverage bias: adults who don't attend that senior center (e.g., those with mobility issues, low-income seniors) have no chance of being selected, so the sample is not representative of all U.S. adults over 50. One way to correct this is stratified random sampling: divide the U.S. into regions and take a random sample from each region, so that every region is represented.
Third, draw valid conclusions (Inference and Experiments): After designing the experiment with random assignment of participants to blueberry vs. control (no daily blueberry) groups and random sampling from the full population of U.S. adults over 50, we can draw two valid inferences: (1) we can generalize results to all U.S. adults over 50 because we used random sampling, and (2) any statistically significant difference in blood pressure is caused by daily blueberry consumption because we used random assignment.
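
The random assignment step in the blueberry study can be sketched in Python as follows. This is only an illustration: the participant IDs, the group size of 40, and the helper name `randomly_assign` are assumptions, and the participants are taken to have already been selected by random sampling in the previous step.

```python
import random

def randomly_assign(participants, groups, rng):
    """Shuffle participants, then deal them into groups of (nearly) equal size."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    assignment = {g: [] for g in groups}
    for i, person in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(person)
    return assignment

rng = random.Random(2026)  # fixed seed so the sketch is reproducible

# Hypothetical IDs for 40 already-sampled participants.
volunteers = [f"P{i:02d}" for i in range(1, 41)]
design = randomly_assign(volunteers, ["blueberry", "control"], rng)
```

Because chance alone decides who eats blueberries, variables such as exercise, income, and overall diet should be roughly balanced across the two groups, which is what supports a causal conclusion.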

4. Common Cross-Cutting Pitfalls (and how to avoid them)

Wrong move: Confusing random sampling with random assignment when stating what inferences can be drawn. Why: Students learn both random processes in different sub-topics and mix up their distinct purposes. Correct move: Always remember the rule: random sampling (how you select participants from the population) lets you generalize results to the full population; random assignment (how you assign treatments to participants) lets you conclude causation.
Wrong move: Calling any variation in study results "bias". Why: Students confuse random sampling variability with systematic bias, which is a consistent error that favors one outcome. Correct move: Only label an error bias if it is a systematic, repeatable error that shifts results away from the true value in a consistent direction; random fluctuation between samples is sampling variability, not bias.
Wrong move: Claiming causation from an observational study based on a strong correlation. Why: Students forget that observational studies cannot control for all confounding variables, even if the sample is large. Correct move: Only conclude that a relationship is causal if the study uses random assignment of treatments in a well-designed experiment.
Wrong move: Confusing stratified sampling with cluster sampling when identifying a sampling method. Why: Both methods divide the population into groups, so students mix up the purpose and implementation of each. Correct move: For any grouped sampling: if you take a random sample from every group, it is stratified (groups differ from each other, but units within a group are similar); if you randomly select whole groups and sample everyone in the selected groups, it is cluster (each group ideally resembles a small version of the population).
Wrong move: Omitting control of confounding variables when describing an experiment. Why: Students focus on remembering random assignment and replication, and skip the need to control other variables that could affect the response. Correct move: Always include a step in your description to hold all variables other than the treatment constant across groups, or use blocking to account for known sources of variation.
Wrong move: Assuming that an unbiased sample gives a perfectly accurate estimate of the population parameter. Why: Students think eliminating bias removes all error from sampling. Correct move: Recognize that even unbiased samples have random sampling variability, so your sample estimate will almost never equal the true population value exactly.
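
The difference between sampling variability and bias can be seen in a small simulation. The sketch below is an illustration under stated assumptions: the population is 10,000 made-up values (think of them as systolic blood pressures), and the biased method uses a hypothetical sampling frame that omits the highest 30% of values, mimicking undercoverage.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 values (e.g., systolic blood pressure).
population = [rng.gauss(130, 15) for _ in range(10_000)]
true_mean = statistics.mean(population)

# Undercoverage: the sampling frame silently omits the highest 30% of values.
frame = sorted(population)[:7_000]

def sample_means(source, n, reps):
    """Draw `reps` simple random samples of size n and return each sample mean."""
    return [statistics.mean(rng.sample(source, n)) for _ in range(reps)]

unbiased = sample_means(population, 50, 500)
biased = sample_means(frame, 50, 500)

# Average error of each method across many repeated samples.
unbiased_error = statistics.mean(unbiased) - true_mean  # centered near 0
biased_error = statistics.mean(biased) - true_mean      # consistently negative

# Individual unbiased samples still differ from one another.
spread = statistics.stdev(unbiased)                     # sampling variability
```

The unbiased method misses the true mean on any single sample (that is `spread`), but its errors average out to roughly zero. The biased method misses in the same direction every time, and taking more samples does not fix it.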

5. Quick Check: When Do You Use Which Sub-Topic?

For each of the following scenarios, identify which sub-topic of Collecting Data you would use to address the situation:

1. You need to select 200 employees from a company of 2,000 to ask about their satisfaction with remote work policies.
2. 70 of the 200 selected employees do not respond to your survey. You need to identify what problem this creates for your results.
3. You want to test whether a new onboarding program reduces new employee turnover, and need to decide whether to run an observational study or an experiment.
4. You know that turnover differs between entry-level and executive employees, and need to adjust your experimental design to account for this difference.
5. You ran your experiment with random sampling of all new employees and random assignment to the new vs old onboarding program. You need to state what conclusions you can draw from your results.

Answers

1. Random Sampling and Data Collection: to select a representative sample from the company employee population.
2. Sources of Bias in Sampling: this is nonresponse bias, a systematic error that skews results.
3. Introduction to Planning a Study: this step involves classifying study type based on your research question.
4. How to Experiment Well: you will use blocking to account for the known difference between job levels, improving your design.
5. Inference and Experiments: this sub-topic gives you the rules to state what inferences are valid.
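
The blocking idea from scenario 4 can be sketched as a randomized block design in Python. Everything concrete here is hypothetical: the helper name `block_then_assign`, the 12 entry-level and 4 executive new hires, and the treatment labels are illustrative assumptions only.

```python
import random

def block_then_assign(units, block_of, treatments, rng):
    """Randomized block design: randomize separately within each block."""
    # Group units by their block label (here, job level).
    blocks = {}
    for unit in units:
        blocks.setdefault(block_of[unit], []).append(unit)

    # Within each block, shuffle and deal units across the treatments.
    assignment = {}
    for label in sorted(blocks):
        members = blocks[label][:]
        rng.shuffle(members)
        for i, unit in enumerate(members):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

rng = random.Random(5)  # fixed seed so the sketch is reproducible

# Hypothetical new hires: 12 entry-level and 4 executive.
level = {f"E{i}": "entry" for i in range(12)}
level.update({f"X{i}": "executive" for i in range(4)})

plan = block_then_assign(list(level), level, ["new program", "old program"], rng)
```

Because assignment happens inside each job level, both onboarding programs receive the same mix of entry-level and executive employees, so job level cannot be confounded with the treatment.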

