AP · Designing an Experiment · 14 min read · Updated 2026-05-10

Designing an Experiment — AP Statistics Study Guide

For: students preparing for the AP Statistics exam.

Covers: distinctions between experimental and observational studies, core principles of experimental design, three common design types, blinding, placebo effect, confounding, and identifying design flaws.

You should already know: Difference between population and sample, the definition of a random sample, basic descriptive statistics for quantitative and categorical variables.

A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board exam papers and may differ in wording, numerical values, or context. Use them to practice the technique; cross-check with official scoring guidelines for grading conventions.


1. What Is Designing an Experiment?

Designing an experiment is the process of planning a controlled study to isolate the causal effect of one or more explanatory variables (called factors) on a measured response variable. This differs from observational studies, where researchers only measure existing variables without imposing any treatments on study units.

According to the AP Statistics Course and Exam Description (CED), this topic makes up roughly 8-12% of Unit 3 (Collecting Data), translating to 2-4% of the total AP exam score. It appears in both the multiple-choice (MCQ) and free-response (FRQ) sections of the exam: MCQs typically ask you to identify appropriate design types or common sources of bias, while FRQs often ask you to describe an appropriate design for a given scenario or to critique a poorly designed experiment.

Standard terminology used on the exam: factors are the manipulated explanatory variables, levels are the distinct values of a factor, treatments are the specific combinations of factor levels applied to units, and experimental units are the individual objects or people studied. The core goal of good experimental design is to eliminate confounding, which occurs when the effects of two variables on the response cannot be distinguished from one another.

2. Core Principles of Experimental Design

All valid experiments are built on four core principles that work together to reduce bias and eliminate confounding, allowing researchers to draw causal conclusions.

  1. Control: Researchers include a control group (a group that receives no active treatment, a placebo, or the existing standard treatment) to compare against the treatment groups of interest. A control group provides a baseline for comparison and accounts for effects such as the placebo effect, where participants experience a change in response just from knowing they received a treatment.
  2. Randomization: Experimental units are randomly assigned to treatments, rather than letting participants choose their treatment or assigning based on researcher preference. Randomization balances out the effects of both known and unknown lurking variables across treatment groups on average, so any observed difference in response can be attributed to the treatment rather than confounding.
  3. Replication: Each treatment is applied to multiple independent experimental units. Replication reduces the impact of random variation between individual units, making it easier to detect a true treatment effect that is not just due to chance.
  4. Blocking: Researchers group experimental units into blocks (groups of units that are similar on a known lurking variable that is expected to affect the response), then randomize treatments within each block. Blocking removes variability from the known lurking variable from the treatment comparison, leading to a more precise, sensitive test of the treatment effect.
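
The random-assignment step behind these principles can be sketched in a few lines of Python. This is a minimal illustrative sketch, not an AP requirement; the function name `randomly_assign` and the participant labels are hypothetical.

```python
import random

def randomly_assign(units, treatments, seed=None):
    """Completely randomized assignment: shuffle the units, then deal
    them out into equal-sized treatment groups. If len(units) is not a
    multiple of len(treatments), the leftover units are not assigned."""
    rng = random.Random(seed)
    shuffled = list(units)        # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    group_size = len(shuffled) // len(treatments)
    assignment = {}
    for i, treatment in enumerate(treatments):
        for unit in shuffled[i * group_size:(i + 1) * group_size]:
            assignment[unit] = treatment
    return assignment

# 20 hypothetical participants split between a new medication and a control
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = randomly_assign(participants, ["new_med", "control_med"], seed=42)
```

Because every unit has the same chance of landing in each group, known and unknown lurking variables are balanced across groups on average, which is exactly what the randomization principle demands.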

Worked Example

A researcher wants to test whether a new allergy medication reduces symptom severity more than the current leading medication. She knows that pollen exposure (which varies by where participants live) has a large effect on allergy symptoms. What core principles must she include in her experiment, and what purpose does each serve?

  1. Control: She will use the current leading medication as a control group to compare the new medication against. This controls for the placebo effect and natural variation in allergy symptoms over the study period, eliminating confounding from the act of receiving treatment.
  2. Randomization: She will randomly assign participants to either the new medication or the control medication. This balances out unknown lurking variables like pre-existing health conditions across the two groups, preventing confounding from these variables.
  3. Replication: She will assign dozens of participants to each group, rather than just one or two. This reduces the impact of random individual variation, making it more likely that a true difference between medications will be detected.
  4. Blocking: She will block participants by their residential pollen level (high, medium, low). By randomizing within each pollen block, she removes pollen-related variation from the comparison of medication effects, leading to a more sensitive test.

Exam tip: On AP FRQs, always explain why each principle or design choice is needed in the specific context of the problem — scoring guidelines for design questions typically reward context-specific justification, not just naming the principle.

3. Common Experimental Design Types

The four core principles are combined into three standard design types that are tested repeatedly on the AP exam, each suited for different scenarios.

  1. Completely Randomized Design (CRD): All experimental units are randomly assigned directly to treatments, with no blocking. This is the simplest design, used when there are no large, known lurking variables that need to be controlled.
  2. Randomized Block Design (RBD): Units are first grouped into blocks based on a known lurking variable expected to affect the response, then all treatments are randomly assigned within each block. Blocking reduces variability from the block variable, making it easier to detect a treatment effect. There can be any number of blocks and any number of units per block.
  3. Matched Pairs Design: A special case of randomized block design where each block has exactly two units. The two units in each block are matched on all major lurking variables expected to affect the response, then one unit in the pair is randomly assigned to each treatment. A common variation uses the same unit as both members of the pair: the unit receives both treatments in random order, with a washout period between treatments to avoid carryover effect. Matched pairs removes almost all between-unit variation, making it the most sensitive design for studies with two treatments.
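
A randomized block design differs from a completely randomized design only in where the randomization happens: within each block rather than across all units. A minimal Python sketch, using hypothetical pollen-level blocks like the allergy example above (the function name `block_randomize` and the unit labels are our own):

```python
import random

def block_randomize(blocks, treatments, seed=None):
    """Randomized block design: shuffle the units within each block,
    then deal them out to the treatments in round-robin order, so each
    treatment appears (almost) equally often inside every block."""
    rng = random.Random(seed)
    assignment = {}
    for units in blocks.values():
        shuffled = list(units)
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical pollen-level blocks of four participants each
blocks = {
    "high":   ["H1", "H2", "H3", "H4"],
    "medium": ["M1", "M2", "M3", "M4"],
    "low":    ["L1", "L2", "L3", "L4"],
}
plan = block_randomize(blocks, ["new_med", "control_med"], seed=7)
```

Every block ends up with both treatments represented equally, so the block variable (pollen level) cannot be confounded with the treatment.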

Worked Example

A market researcher wants to test whether a new package design for a soda increases consumer rating of the product compared to the original package. 40 consumers have agreed to participate in the study. Describe an appropriate design for this study.

  1. Choose matched pairs design: Each consumer’s rating is highly personal, so using the same consumer for both packages removes between-consumer variation in preferences, leading to a more sensitive comparison.
  2. Randomize order of presentation: Use a random number generator to assign half of the consumers to see the new package first and rate it, then see the original package and rate it. The other half see the original first, then the new. Randomizing order eliminates bias from the order of presentation.
  3. Use a short break between ratings to avoid carryover effect (where the rating of the first package affects the rating of the second).
  4. Calculate the difference in ratings (new minus original) for each consumer, then analyze the distribution of differences to test for an effect of the new package.
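
The order randomization (step 2) and the difference calculation (step 4) can be sketched in Python. The consumer labels, ratings, and function names here are hypothetical, chosen only to illustrate the within-subject matched pairs procedure:

```python
import random

def matched_pairs_order(consumers, seed=None):
    """Randomly choose, for each consumer, which package they rate
    first (the order randomization in step 2)."""
    rng = random.Random(seed)
    return {c: rng.choice(["new_first", "original_first"]) for c in consumers}

def rating_differences(ratings):
    """ratings maps consumer -> (new_rating, original_rating); returns
    the list of new-minus-original differences to analyze (step 4)."""
    return [new - original for new, original in ratings.values()]

# Hypothetical ratings for four consumers on a 1-10 scale
order = matched_pairs_order(["C1", "C2", "C3", "C4"], seed=3)
ratings = {"C1": (8, 6), "C2": (7, 7), "C3": (9, 5), "C4": (6, 7)}
diffs = rating_differences(ratings)   # [2, 0, 4, -1]
```

Analyzing the single list of differences, rather than two separate lists of ratings, is what removes between-consumer variation from the comparison.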

Exam tip: If a question asks for a matched pairs design, always mention randomization (of order for within-subject pairs, or of treatment within the pair for matched subjects) — forgetting randomization is the most common point deduction on this topic.

4. Blinding and Confounding

Blinding and control for confounding are key concepts that appear frequently in both MCQ and FRQ questions. Blinding is the practice of withholding information about which treatment a unit received in order to prevent response bias. In single-blind studies, the experimental units do not know which treatment they received, which prevents the placebo effect. In double-blind studies, neither the units nor the researchers measuring the response know which treatment was assigned, which also prevents experimenter bias (where researchers unconsciously measure the response differently based on which treatment they think a unit received).

Confounding occurs when a lurking variable is systematically associated with the treatment and also affects the response variable, so researchers cannot distinguish whether the observed response is due to the treatment or the lurking variable. Confounding is the most common flaw in poorly designed experiments, especially when random assignment is not used.

Worked Example

A college professor wants to test whether attending optional weekly review sessions improves final exam scores. He lets students choose whether to attend the review sessions, then compares the average scores of attendees vs non-attendees. He finds attendees scored 18 points higher on average than non-attendees. Name the source of bias here, explain the confounding, and describe how to fix it.

  1. The source of bias is confounding by student motivation: More motivated students are far more likely to choose to attend optional review sessions.
  2. The confounding: More motivated students also study more outside of review sessions and tend to score higher regardless of the review sessions themselves. We cannot distinguish whether the higher scores are caused by the review sessions or by the higher motivation of attendees, so the result is confounded.
  3. The fix: Use random assignment. Label all students in the class with unique numbers, randomly select half to be required to attend the review sessions, and require the other half not to attend.
  4. To prevent grading bias, blind the professor grading the exams to which group each student is in, so grading does not get skewed by expectations.

Exam tip: When asked to identify confounding, always explicitly link the lurking variable to both the treatment assignment and the response — AP exam graders require this explicit link to award points.

5. Common Pitfalls (and how to avoid them)

  • Wrong move: Calling an observational study an experiment just because it uses random sampling. Why: Students confuse random sampling (used in both observational and experimental studies) with random assignment (the defining feature of an experiment). Correct move: Always check if the researcher imposed the treatment on the units — if yes, it’s an experiment; if not, it’s observational, regardless of random sampling.
  • Wrong move: Stating that blocking eliminates confounding from unknown lurking variables. Why: Students mix up the purpose of randomization (balances known and unknown lurking variables) and blocking (controls variation from known lurking variables only). Correct move: Always explicitly state blocking reduces variation from a known lurking variable; randomization balances unknown lurking variables.
  • Wrong move: Forgetting to mention randomization when describing a matched pairs design. Why: Students think matching replaces randomization, but randomization within the pair is still required to avoid bias. Correct move: Always add "randomly assign one member of the pair to the treatment (or randomize the order of treatments for within-subject pairs)" when describing a matched pairs design.
  • Wrong move: Claiming a completely randomized experiment can never have confounding. Why: Students think randomization eliminates all confounding, but small sample sizes can still lead to large imbalance in lurking variables. Correct move: If asked whether confounding is possible, state that randomization reduces the chance of confounding but it is still possible with small replication.
  • Wrong move: Calling a matched pairs design a completely randomized design. Why: Students forget that matched pairs is a type of block design, with each pair acting as a block of size 2. Correct move: If units are grouped into pairs before randomization, it is a matched pairs (block) design, not completely randomized.

6. Practice Questions (AP Statistics Style)

Question 1 (Multiple Choice)

An agricultural scientist wants to test the effect of four different pesticides on the yield of apple trees. The orchard where the study will be held has sloped terrain, and soil moisture is significantly lower at the top of the slope than at the bottom. Which of the following is the most appropriate experimental design for this study?

A) Completely randomized design, because we have four treatments
B) Randomized block design, with blocks defined by slope position (top vs bottom)
C) Matched pairs design, with one tree of each pesticide in each block
D) Observational study, because the slope already exists before the study

Worked Solution: The scientist is imposing the treatment (pesticide application), so this is an experiment, eliminating D. Slope position is a known source of variation in yield that needs to be controlled, so we need to block on slope position to remove this variation from the treatment comparison, so A (which has no blocking) is incorrect. Matched pairs design is only used for two treatments, so C is incorrect. The correct answer is B.


Question 2 (Free Response)

A physical education researcher wants to test whether a 5-minute daily stretching routine improves 1-mile run times in high school cross country runners. 80 runners volunteer for the study.

(a) Describe how to implement a completely randomized design for this study.
(b) Explain why the researcher might choose to use a randomized block design with blocking by sex. What is the benefit of blocking here?
(c) Would a matched pairs design be appropriate for this study? If yes, describe how to implement it.

Worked Solution:

(a) First, label each of the 80 volunteers with a unique number from 1 to 80. Use a random number generator to select 40 unique numbers; the runners with these numbers are assigned to the 5-minute daily stretching routine. The remaining 40 runners do not change their current routine (control group). After 8 weeks, record the 1-mile run time for each runner and compare the average run time between the two groups.

(b) On average, male high school runners have faster 1-mile run times than female high school runners of the same age. If we do not block on sex, sex-related variation adds to the overall variability in run times, making it harder to detect a true effect of stretching. Blocking on sex removes this sex-related variation from the comparison of stretching vs control, leading to a more precise test of the stretching effect.

(c) Yes, a matched pairs design is appropriate. One valid implementation is to match pairs of runners by their baseline 1-mile run time and weekly training volume, which are both known to affect final run time. Within each pair, randomly assign one runner to the stretching routine and the other to the control group. After 8 weeks, compare the difference in run time within each pair. This removes between-runner variation in baseline ability, making the test more sensitive than a completely randomized design.
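
The labeling-and-selecting procedure in part (a) can be sketched directly in Python. The seed value here is arbitrary and hypothetical; it only makes the sketch reproducible:

```python
import random

# Label the 80 volunteer runners 1-80, then randomly select 40 unique
# labels for the stretching group; the remaining runners form the
# control group.
rng = random.Random(2024)
labels = list(range(1, 81))
stretching = set(rng.sample(labels, 40))   # 40 distinct labels, no repeats
control = set(labels) - stretching
```

Using `sample` (rather than repeated random picks) guarantees the 40 selected labels are unique, which matches the "select 40 unique numbers" wording in the solution.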


Question 3 (Application / Real-World Style)

A national retail chain wants to test whether changing the background music in stores to a slower tempo increases average daily revenue compared with the current faster tempo. The chain has 36 stores: 12 large urban stores, 12 medium suburban stores, and 12 small rural stores. Average daily revenue is consistently highest in urban stores, followed by suburban, then rural. Propose and describe an appropriate experimental design for this study, run over 4 weeks.

Worked Solution: The appropriate design is a randomized block design, blocking by store type (urban, suburban, rural), since store type is a known source of variation in average revenue.

  1. Separate the 36 stores into three blocks: 12 urban, 12 suburban, 12 rural.
  2. Within each block, use a random number generator to select 6 stores to switch to slower tempo music (treatment group); the other 6 stores keep the current faster tempo (control group).
  3. After 4 weeks, calculate the average daily revenue for each store.
  4. Compare the difference in average revenue between treatment and control within each block, then combine results across blocks to estimate the overall effect of slower music.

Interpretation: This design removes revenue variation caused by store size and location, making it much more likely that the experiment will detect a true effect of slower tempo music if it exists.

7. Quick Reference Cheatsheet

  • Experiment vs observational: Experiment = researcher imposes treatment; random assignment is the defining feature of an experiment.
  • Control: Include a control group to compare against treatment; controls for placebo effect and natural variation.
  • Randomization: Randomly assign units to treatments; balances both known and unknown lurking variables across groups; reduces confounding.
  • Replication: Apply each treatment to multiple units; reduces random variation; improves ability to detect true treatment effects.
  • Blocking: Group units by a known lurking variable that affects the response; randomize within blocks; reduces variability from the known variable.
  • Completely randomized design (CRD): All units randomized directly to treatments; no blocking; used when no major known lurking variables exist.
  • Randomized block design (RBD): Units grouped into blocks by a known lurking variable; randomize treatments within blocks; more precise than CRD when the blocking variable is relevant.
  • Matched pairs design: Special case of RBD with 2 units per block (or 1 unit receiving both treatments); most sensitive design for two treatments.

8. What's Next

Designing experiments is the foundation for all statistical inference about causation, the core goal of most applied statistical studies in AP Statistics. Next in Unit 3, you will learn to generalize results from experiments and samples to broader populations, and distinguish between bias and random variation — skills that rely entirely on mastering the design principles covered here. Without correctly identifying design flaws or the correct design type, you cannot draw valid causal conclusions or correctly interpret experimental results. This topic also feeds directly into inference for means and proportions, where you will use the experiment design to choose the correct inference procedure (e.g., a matched pairs t-procedure vs a two-sample t-procedure).

