Inference and Experiments — AP Statistics Study Guide
For: students preparing for the AP Statistics exam.
Covers: Distinguishing associative vs. causal inference, randomized comparative experiments, core principles of experimental design (control, replication, randomization, blocking), confounding variables, and evaluating the scope of inference for results from experiments per the AP Statistics CED Unit 3.
You should already know: the difference between an experiment and an observational study, basic sampling methods for population inference, the definition of confounding variables.
A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practice the technique; cross-check with official mark schemes for grading conventions.
1. What Is Inference and Experiments?
Inference is the process of drawing general conclusions about treatment effects or population characteristics that go beyond the raw observed data collected in a study. For experiments, inference has two distinct, commonly tested goals: generalizing results to a broader population, and establishing that a treatment causes a change in the measured response variable. This topic is part of AP Statistics CED Unit 3 (Collecting Data), accounting for 10–15% of total exam weight, and it appears in both multiple-choice (MCQ) and free-response (FRQ) sections, most often as a conceptual short FRQ or a set of 2–3 MCQs. Unlike inference from observational studies, inference from experiments relies on random assignment of treatments to subjects, rather than only random sampling from a population. This key difference changes what types of inference are valid: random assignment allows causal inference, while random sampling allows generalization to a broader population. The AP exam heavily tests your ability to identify which inferences are appropriate based on study design, not just calculation.
2. Causal vs Associative Inference
Inference from any study falls into one of two categories, and the AP exam constantly tests your ability to distinguish between them. Associative (or correlational) inference concludes that an association exists between two variables in a population, but does not claim one variable causes changes in the other. Causal inference concludes that changes in the explanatory (treatment) variable directly cause changes in the response variable. The core rule tested on the exam is: only studies that use random assignment of treatments to experimental units can support causal inference. Observational studies, even well-designed ones, can only support associative inference because of the persistent risk of confounding, where an unmeasured variable creates a spurious association between the explanatory and response variable. Random assignment balances all confounding variables (measured and unmeasured) across treatment groups on average, so any remaining difference between groups can be attributed to the treatment. It is important to note that causal inference and generalization are independent: a study can have one, both, or neither, depending on design choices.
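The claim that random assignment balances confounding variables "on average" can be illustrated with a small simulation. This is a hedged sketch with made-up numbers (a hypothetical lurking variable, baseline anxiety on a 0–100 scale), not part of any AP problem: we repeatedly split the same 60 subjects into two random groups and track how far apart the groups' mean baseline anxiety falls.

```python
import random
import statistics

random.seed(1)

# Hypothetical subjects with a lurking variable: baseline anxiety (0-100 scale).
baseline = [random.gauss(50, 10) for _ in range(60)]

# Randomly assign 30 subjects to treatment and 30 to control, many times,
# and record the gap in mean baseline anxiety between the two groups.
gaps = []
for _ in range(1000):
    shuffled = baseline[:]
    random.shuffle(shuffled)               # the random assignment step
    treat, control = shuffled[:30], shuffled[30:]
    gaps.append(statistics.mean(treat) - statistics.mean(control))

# Across many assignments the average gap is near zero: the lurking
# variable is balanced between groups on average.
print(round(statistics.mean(gaps), 2))
```

Any single assignment can leave a small chance imbalance, but no systematic one, which is exactly why the remaining group difference can be attributed to the treatment.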
Worked Example
A researcher tests whether daily 10-minute meditation reduces self-reported anxiety in high school seniors. She recruits 60 volunteers from a local high school and randomly assigns 30 to meditate daily for 8 weeks and 30 to a control group that does 10 minutes of daily quiet reading. At the end of the study, the meditation group has a statistically significantly lower average anxiety score. Can the researcher make a causal inference that meditation reduces anxiety? Can she generalize this result to all high school seniors in the U.S.? Justify your answer.
Solution:
- First, confirm random assignment: The researcher randomly assigned treatments (meditation vs quiet reading) to volunteers. This balances all confounding variables (like baseline anxiety, study habits, stress from college applications) across the two groups on average.
- Because random assignment was used, the difference in average anxiety can be attributed to the meditation treatment, so a causal inference is valid for this study.
- Next, check for random sampling: The sample consists of 60 volunteers from a single local high school, not a random sample of all U.S. high school seniors. The sample is not representative of the broader population, so generalization to all U.S. high school seniors is not valid.
Exam tip: When asked about inference scope, always address both causation and generalization explicitly, even if the question only asks one. AP exam graders expect you to demonstrate you know the difference between the two requirements, so stating both will help you earn full credit.
3. Principles of Experimental Design for Valid Inference
For inference from an experiment to be valid, the experiment must follow four core design principles: control, replication, randomization, and blocking (when appropriate). Each principle addresses a different threat to valid inference:
- Control: You need to compare the treatment group to a control group that receives no treatment, a placebo, or the current standard treatment. This accounts for outside effects like the placebo effect or natural changes in the response over time, allowing you to isolate the treatment effect.
- Replication: You apply each treatment to multiple independent experimental units (subjects, plants, test plots, etc.). Replication reduces the impact of random individual variation on your results, making it easier to detect a true treatment effect and leading to more precise inference.
- Randomization: You randomly assign treatments to experimental units. As noted earlier, this balances measured and unmeasured confounding variables across groups, enabling causal inference.
- Blocking: You group experimental units that are similar on a known confounding variable (e.g., age, gender, pre-existing health status) into blocks, then randomly assign treatments within each block. Blocking removes variability from the known confounding variable from your error term, making it easier to detect a true treatment effect.
Worked Example
An agricultural researcher wants to test whether a new fertilizer increases corn yield compared to a standard fertilizer. She knows that corn yield is affected by how much sunlight a plot receives, and her 20 test plots are split between a forest edge (lower sunlight) and an open field (higher sunlight). Design an appropriate experiment to test the new fertilizer, naming the principles you use.
Solution:
- Blocking: First, split the 20 plots into two blocks: 10 plots in the low-sunlight forest edge block, and 10 plots in the high-sunlight open field block. This accounts for the known effect of sunlight on yield, so differences in sunlight are not mixed up with differences between fertilizers.
- Randomization: Within each block, randomly assign 5 plots to get the new fertilizer and 5 plots to get the standard fertilizer. Randomization balances unmeasured confounding variables (like soil nutrient variation) across the two fertilizer groups.
- Replication: We have 5 plots per fertilizer in each block, for 10 total plots per fertilizer across all blocks. This gives us enough replication to reduce the impact of random plot-to-plot variation.
- Control: We use the standard fertilizer as a control, so we can compare the yield of the new fertilizer to the existing standard to measure the treatment effect.
Exam tip: If the study groups units by a pre-existing variable before randomizing treatments, that is blocking, not confounding. Confounding is for uncontrolled variables; blocking is an intentional technique to improve inference.
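The block-then-randomize procedure from the solution can be sketched in a few lines of Python. The plot labels (FE1–FE10, OF1–OF10) are invented for illustration; the key point is that randomization happens separately within each block, so every block contributes 5 plots to each fertilizer.

```python
import random

random.seed(7)

# 10 hypothetical plots in each sunlight block.
blocks = {
    "forest_edge": [f"FE{i}" for i in range(1, 11)],
    "open_field": [f"OF{i}" for i in range(1, 11)],
}

assignment = {}
for block_name, plots in blocks.items():
    shuffled = plots[:]
    random.shuffle(shuffled)               # randomization WITHIN the block
    for plot in shuffled[:5]:
        assignment[plot] = "new fertilizer"
    for plot in shuffled[5:]:
        assignment[plot] = "standard fertilizer"

# Blocking + replication: each block has exactly 5 plots per fertilizer.
for name, plots in blocks.items():
    new_count = sum(assignment[p] == "new fertilizer" for p in plots)
    print(name, new_count)
```

By construction, sunlight level can never be confounded with fertilizer type: both fertilizers appear equally often in each sunlight condition.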
4. Confounding and Threats to Valid Inference
Confounding occurs when the effect of the treatment on the response is mixed up with the effect of another uncontrolled variable, so you cannot tell which variable caused the observed change in the response. Confounding is the primary reason causal inference is not valid for observational studies, but it can also occur in poorly designed experiments. Common threats to valid inference in experiments include: selection bias (when subjects choose their own treatment, leading to systematic differences between groups), lack of blinding (when researchers or subjects know which treatment they get, leading to biased response measurement), and lurking variables (unmeasured variables that are associated with treatment assignment). AP questions frequently ask you to identify a possible confounding variable in a poorly designed experiment and explain how it threatens causal inference, so it is important to practice this skill. To be a valid confounding variable, the variable must be associated with both the treatment and the response.
Worked Example
A restaurant chain wants to test whether adding a new appetizer to the menu increases total monthly revenue. They add the new appetizer to 10 randomly selected locations, and leave the menu unchanged at 10 other locations. After 3 months, the locations with the new appetizer have 12% higher average revenue than the locations without. A manager claims the new appetizer caused the revenue increase. Identify a possible confounding variable and explain why it threatens the causal inference.
Solution:
- One possible confounding variable is location size: with only 10 locations per group, random selection may not have balanced size well, and the new-appetizer group could, by chance, contain larger, higher-revenue locations.
- This variable is confounded with the new appetizer because location size is associated with the treatment (new appetizer is more likely to be added to larger locations) and associated with the response (larger locations naturally have higher total revenue).
- We cannot separate the effect of the new appetizer from the effect of location size, so the manager's claim that the new appetizer caused the revenue increase is not justified by this study.
- Other valid examples include regional marketing campaigns that ran at the same time the new appetizer was added, or different average foot traffic between the two groups of locations.
Exam tip: When asked to identify a confounding variable on an FRQ, you must explicitly explain how it is associated with both the treatment and the response to earn full credit. Naming the variable alone is not enough.
5. Common Pitfalls (and how to avoid them)
- Wrong move: Claiming causal inference from an observational study, or claiming generalization to a population from a non-random sample of volunteers. Why: Students confuse random sampling and random assignment, mixing up which type of inference each supports. Correct move: Always check for random assignment first for causation, random sampling second for generalization; explicitly reference the presence/absence of each in your justification.
- Wrong move: Calling a pre-grouped blocking variable a confounding variable. Why: Students confuse controlled design choices with uncontrolled sources of bias. Correct move: Remember blocking is an intentional technique to reduce variability, so blocked variables are not confounders. Only uncontrolled variables that are mixed with treatment are confounding.
- Wrong move: Claiming that replication means repeating the entire experiment multiple times, rather than having multiple units per treatment. Why: General science texts sometimes mention repeating experiments to confirm results, so students misapply the definition in AP Statistics experimental design. Correct move: In AP Statistics experimental design, replication means having multiple independent experimental units per treatment group.
- Wrong move: Forgetting that you can have causal inference without generalization, or generalization without causal inference. Why: Students assume that one implies the other, but the two are independent based on different study design choices. Correct move: Answer each inference question separately: if you have random assignment but no random sampling, causation is okay, generalization is not, and vice versa.
- Wrong move: Naming a confounding variable but not explaining how it is mixed with the treatment effect. Why: Students think just naming the variable is enough for full credit on FRQ. Correct move: Always add one sentence explaining that the confounding variable is associated with both the treatment and the response, so its effect cannot be separated from the treatment's effect.
- Wrong move: Claiming that a completely randomized design is inferior to a blocked design in all cases. Why: Students learn blocking reduces variability, so they assume blocking is always better. Correct move: Blocking is only better when the blocking variable is related to the response; blocking on an unrelated variable wastes degrees of freedom and reduces power for inference.
6. Practice Questions (AP Statistics Style)
Question 1 (Multiple Choice)
A researcher studying the effect of a new allergy drug recruits 120 adults from a local clinic who volunteer for the study. The researcher randomly assigns 60 adults to get the new drug and 60 to get a placebo. After 8 weeks, the new drug group has a statistically significant reduction in allergy symptoms compared to the placebo group. Which of the following conclusions is valid? A) The new drug causes reduced allergy symptoms in all adults at the local clinic. B) The new drug causes reduced allergy symptoms in the study volunteers, but we cannot generalize to all adults at the local clinic. C) The new drug causes reduced allergy symptoms in the volunteers, but we cannot conclude causation because the sample is not random. D) We cannot conclude causation or generalize because the study uses volunteers.
Worked Solution: First, we check for random assignment of treatments: the researcher randomly assigned the new drug and placebo to volunteers, so we can conclude causation. This eliminates options C and D, which incorrectly state we cannot conclude causation. Next, we check for random sampling: the sample consists only of volunteers from the clinic, not a random sample of all adults at the clinic, so we cannot generalize the result to all adults at the clinic. This eliminates option A. The only valid conclusion is option B. Correct answer: B
Question 2 (Free Response)
A coffee shop chain wants to test whether adding a splash of oat milk to their dark roast improves customer satisfaction compared to their traditional black dark roast. They recruit 80 regular customers who agree to participate in the study. Half of the customers prefer sweet coffee, and half prefer unsweetened coffee, since preference for sweetness is known to affect satisfaction with added milk. (a) Describe an appropriate blocked design for this study. (b) Explain why blocking is preferred over a completely randomized design here. (c) The study finds that the oat milk roast has a significantly higher average satisfaction score. Can the researchers conclude that adding oat milk caused the higher satisfaction? Justify.
Worked Solution: (a) First, separate the 80 participants into two blocks: one block of 40 customers who prefer sweet coffee, and one block of 40 customers who prefer unsweetened coffee. Within each block, randomly assign half of the customers to taste the oat milk dark roast and the other half to taste the traditional black dark roast. After tasting, collect each customer's satisfaction score on a standard 1–10 scale. (b) Blocking by sweetness preference accounts for the variability in satisfaction that is caused by pre-existing sweetness preference. By blocking, we remove this variability from the error term we use to test the treatment effect, which makes our test more powerful and our inference more precise. A completely randomized design would leave this variability in the error term, making it harder to detect a true treatment effect if it exists. (c) Yes, we can conclude causation. The study used random assignment of treatments to customers within each block, which balances all other confounding variables (like how often the customer visits the shop, preference for other coffee drinks) between the two treatment groups on average. Therefore, the difference in average satisfaction can be attributed to the addition of oat milk, so causal inference is valid.
Question 3 (Application / Real-World Style)
A marine biologist wants to test whether increased ocean acidification reduces growth rate of coral larvae. She has 40 lab-raised coral larvae, and randomly assigns 20 larvae to water with the current average ocean acidification level (control) and 20 to water with the acidification level predicted for 2100 (treatment). After 6 months, the treatment group has an average mass 15% lower than the control group, a difference that is statistically significant at the α = 0.05 level. Can the biologist make a causal inference that increased acidification reduces coral growth? Can she generalize this result to wild coral populations? Justify your answer in context.
Worked Solution: First, the biologist randomly assigned larvae to the two acidification treatments, which balances other variables that affect growth (like initial larval size, genetic variation) across groups. This means the observed difference in average mass can be attributed to the difference in acidification levels. Therefore, a causal inference that increased ocean acidification reduces coral larvae growth in this lab setting is valid. For generalization, the study uses a convenience sample of lab-raised larvae grown in controlled lab conditions, not a random sample of wild coral populations. Wild corals are exposed to other variables (like temperature fluctuations, predation, variable nutrient availability) that could interact with acidification to change growth rates. Therefore, we cannot automatically generalize this result to all wild coral populations without additional field studies. In context: This experiment provides strong causal evidence that increased acidification reduces coral growth in controlled lab conditions, which supports the hypothesis that rising ocean acidification will harm wild coral reefs, but additional research in natural settings is needed to confirm the effect in wild populations.
7. Quick Reference Cheatsheet
| Category | Rule | Notes |
|---|---|---|
| Causal Inference | Allowed if and only if treatments are randomly assigned to experimental units | Random assignment balances unobserved confounders across groups on average |
| Generalizable Inference | Allowed if and only if units are randomly sampled from the population of interest | Non-random samples (volunteers, convenience samples) do not support generalization |
| Core Experimental Design Principles | 1. Control 2. Replication 3. Randomization 4. Blocking (when needed) | Control = comparison to a control/placebo group; replication = multiple units per treatment |
| Confounding Variable | An uncontrolled variable associated with both treatment and response, whose effect is mixed with treatment effect | Only uncontrolled variables are confounders; blocked variables are controlled, not confounded |
| Blocked Design | Group units by a known pre-treatment variable related to response, then randomize within blocks | Reduces error variability, improves power to detect treatment effects |
| Completely Randomized Design | Randomize all treatments to all units with no pre-grouping | Used when no known confounding variable is measured before the experiment |
| Observational Study Inference | Only associative inference is allowed; no causal inference | Can generalize if random sampling is used, but still cannot conclude causation |
8. What's Next
This topic is the foundation for all causal inference you will do later in AP Statistics, from hypothesis testing for treatment effects to advanced observational study designs. Mastering the distinction between random assignment and random sampling, and between causal and associative inference, is critical for the experimental design question that almost always appears on the AP exam. Next, you will move into probability and sampling distributions, where you will learn how to quantify the uncertainty of inferences from studies, then later apply that to hypothesis testing for treatment effects in experiments. Without understanding what types of inferences are valid from which study designs, any statistical calculation you do will be meaningless because you cannot correctly interpret your results in context.
Related topics:
- Random Sampling and Study Types
- Principles of Experimental Design
- Causal Inference for Observational Studies
- Hypothesis Testing for Two Treatment Means