Introduction to Planning a Study — AP Statistics Study Guide
For: AP Statistics candidates sitting the AP Statistics exam.
Covers: Distinguishing populations from samples, parameters from statistics, censuses from sampling, observational studies from experiments, and identifying confounding variables in study planning.
You should already know: Basic descriptive statistics for categorical and quantitative data, basic definitions of statistical variables, how to distinguish between variable types.
A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official mark schemes for grading conventions.
1. What Is Introduction to Planning a Study?
Introduction to Planning a Study is the opening topic of Unit 3: Collecting Data, which accounts for 12-15% of the AP Statistics exam weight according to the College Board Course and Exam Description (CED). The topic covers the core definitions and preliminary decisions researchers make before collecting any data. It answers three questions: what group do we want to learn about, what characteristic are we measuring, and what approach will we use to get reliable data? The topic appears in both the multiple-choice (MCQ) and free-response (FRQ) sections of the exam. Expect MCQ questions that test these definitions directly, and FRQ parts that ask you to describe or evaluate a study. The standard notation used throughout this topic (and the rest of the course) is $N$ for population size, $n$ for sample size, mostly Greek letters for population parameters, and Latin letters or hats for sample statistics. Synonyms for this topic include introductory study design and study planning basics.
2. Populations, Samples, Parameters, and Statistics
The most fundamental distinction in study planning is between the group you want to learn about and the group you actually measure. A population is the entire group of interest that you want to draw conclusions about and generalize results to. It is not just the group that is easy to reach; it can include people, animals, objects, or events depending on the research question. A sample is a subset of the population that you actually collect data from, with the goal of using sample results to make inferences about the whole population.
A parameter is a numerical value that describes a specific characteristic of the entire population. Parameters are almost always unknown, because it is rarely feasible to measure every individual in a large population. A statistic is a numerical value calculated from sample data and used to estimate the unknown population parameter. The notation follows a consistent pattern. Population parameters are usually written with Greek letters, and the population proportion $p$ is the main exception. Sample statistics use Latin letters or hats. Population and sample sizes are distinguished by case. For example:
- Population mean: $\mu$; Sample mean: $\bar{x}$
- Population proportion: $p$; Sample proportion: $\hat{p}$
- Population size: $N$; Sample size: $n$
This distinction is critical because all statistical inference later in the course relies on using a sample statistic to estimate a population parameter. Mixing the two up is one of the most common errors on introductory study design questions.
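To make the parameter/statistic distinction concrete, here is a minimal Python sketch built on a made-up population. The 10,000 values, the mean of 170, and the sample size of 50 are all hypothetical. In a real study the population values would be unknown. Generating them here means the parameter $\mu$ can be computed and compared with the statistic $\bar{x}$, which uses only the sampled values.

```python
import random
import statistics

# Hypothetical population: 10,000 made-up values (e.g., heights in cm).
# In a real study these would be unknown; we generate them only so the
# parameter can be compared with the statistic.
rng = random.Random(2024)  # fixed seed so the sketch is reproducible
population = [rng.gauss(170, 10) for _ in range(10_000)]

N = len(population)                # population size
mu = statistics.mean(population)   # parameter: describes the whole population

n = 50                              # sample size
sample = rng.sample(population, n)  # simple random sample, no replacement
x_bar = statistics.mean(sample)     # statistic: computed from the sample only

print(f"N = {N}, mu = {mu:.2f}")
print(f"n = {n}, x_bar = {x_bar:.2f}")
```

In practice only the second line of output would be available. The statistic $\bar{x}$ is what you report, and it serves as an estimate of the unknown parameter $\mu$.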
Worked Example
A city public health department wants to know what proportion of all city residents have received a flu shot this season. They randomly dial 400 residential phone numbers and find that 42% of respondents have received a flu shot. Identify the population, sample, parameter, and statistic in this context, with correct notation.
- Population: The population is all residents of the city; this is the entire group the health department wants to draw conclusions about.
- Sample: The sample is the 400 randomly selected residents who responded to the phone survey; this is the subset of the population that the department actually collected data from.
- Parameter: The parameter of interest is $p$, the true proportion of all city residents who received a flu shot this season.
- Statistic: The statistic calculated from the sample is $\hat{p} = 0.42$, the proportion of sampled residents who received a flu shot.
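The statistic in this example can be computed directly from the sample data. The sketch below assumes the responses are coded 1 ("received a flu shot") or 0 ("did not"). That coding and the helper name `sample_proportion` are hypothetical. Under this coding, 42% of 400 respondents corresponds to $0.42 \times 400 = 168$ "yes" answers.

```python
def sample_proportion(responses):
    """Return p-hat: the fraction of responses coded 1 ("yes")."""
    return sum(responses) / len(responses)

# Hypothetical coding of the survey: 1 = received a flu shot, 0 = did not.
# 42% of 400 respondents corresponds to 0.42 * 400 = 168 "yes" answers.
responses = [1] * 168 + [0] * (400 - 168)

n = len(responses)
p_hat = sample_proportion(responses)
print(n, p_hat)  # 400 0.42
```

The code has no way to compute $p$, because the department only has data from the sample. That gap is exactly why $\hat{p}$ is used as an estimate of $p$.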
Exam tip: Always define these terms in the context of the problem, not just with generic definitions. AP exam readers expect answers in context, and a generic definition with no link to the problem will typically not earn full credit.
3. Census vs. Sampling
Once you define your population, the next decision is whether to collect data from the entire population or just a sample. A census is a study that collects data from every individual in the entire population of interest. Sampling is the practice of collecting data only from a subset (sample) of the population.
Censuses are only appropriate in specific scenarios: when the population is small and easy to access, or when you need exact data for every individual (e.g., counting the votes in a local election). Sampling is used far more often, for four key reasons:
- Time: measuring every individual in a large population is too slow.
- Cost: sampling is far cheaper than measuring every individual.
- Destructive testing: if measuring an item destroys it (e.g., testing the lifespan of a battery or the strength of a bridge beam), you cannot test every item.
- Accuracy: a well-designed random sample can produce very accurate estimates of population parameters, even for very large populations.
A common misconception is that a census is always more accurate than a sample. In practice, a census of a large population can suffer from enough nonresponse or measurement error to make it less accurate than a well-designed sample.
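The claim that a random sample stays accurate even for very large populations can be checked with a small simulation. The sketch below is built on hypothetical values: a true proportion of 0.3, a sample size of 1,000, and 500 repeated samples. It draws many simple random samples from populations of different sizes and measures how much the sample proportions vary.

```python
import random
import statistics

def spread_of_sample_proportions(pop_size, n, true_p=0.3, reps=500, seed=1):
    """Hypothetical simulation: draw `reps` simple random samples of size n
    from a population of pop_size individuals (a fraction true_p coded 1)
    and return the standard deviation of the resulting sample proportions."""
    rng = random.Random(seed)
    ones = int(pop_size * true_p)
    population = [1] * ones + [0] * (pop_size - ones)
    p_hats = [sum(rng.sample(population, n)) / n for _ in range(reps)]
    return statistics.stdev(p_hats)

for N in (10_000, 100_000, 1_000_000):
    print(N, round(spread_of_sample_proportions(N, n=1000), 4))
```

The spread of $\hat{p}$ should come out roughly the same for all three population sizes. Precision depends mainly on the sample size $n$, not on the population size $N$, as long as the sample is a small fraction of the population.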
Worked Example
A bakery wants to estimate the average sugar content of the chocolate chip cookies it bakes in a day. The bakery bakes 240 cookies per day. The head baker suggests testing every cookie to get the most accurate result. Is this a census or a sample? Is this plan appropriate? Justify your answer.
- This plan tests every cookie (the entire population of cookies baked that day), so it is a census.
- Testing sugar content requires destroying the cookie to send it to a lab for analysis, so this plan would leave no cookies available to sell to customers.
- Losing the entire day's batch of 240 cookies, and all of its revenue, just to measure sugar content is not practical.
- The appropriate plan is to take a random sample of 5-10 cookies, test those, and use the sample average to estimate the average sugar content of all cookies.
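Here is one way to draw that random sample. The sketch assumes a hypothetical labelling scheme in which the day's cookies are numbered 1 to 240, and it picks 8 cookies, which falls within the 5-10 range above. The fixed seed only makes the example reproducible.

```python
import random

# Hypothetical labelling scheme: number the day's 240 cookies 1 to 240.
cookie_ids = list(range(1, 241))

# Fixed seed only so the example is reproducible; in practice you would
# make a fresh random draw each day.
rng = random.Random(7)

# Simple random sample of 8 cookies, drawn without replacement
# so that no cookie is chosen twice.
chosen = sorted(rng.sample(cookie_ids, 8))
print(chosen)
```

Only the cookies with these labels would be sent to the lab. Their mean sugar content is the statistic $\bar{x}$, which is used to estimate $\mu$, the mean sugar content of all 240 cookies.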
Exam tip: If an AP question asks whether a census is appropriate, always check for destructive testing first. It is one of the most common reasons a census is impossible, and it is a frequently tested scenario for this concept.
4. Observational Studies vs. Experiments
The next core distinction in study planning is between two broad types of studies, based on whether the researcher imposes an intervention. An observational study is a study where the researcher only observes individuals and measures variables of interest, without imposing any treatment or intervention on the subjects. The goal of an observational study is usually to describe a population or find an association between two variables. An experiment is a study where the researcher intentionally imposes a treatment (a specific condition or change) on subjects, to measure the effect of the treatment on a response variable.
The key difference between the two is that experiments assign treatments, while observational studies just observe existing behavior or characteristics. A critical consequence of this difference is that only well-designed experiments can support causal (cause-and-effect) conclusions. Observational studies can only show association, because of the risk of confounding variables. A confounding variable is a variable that is related to both the explanatory variable (the variable you think affects the response) and the response variable, making it impossible to tell which variable is actually causing changes in the response.
Worked Example
A researcher wants to test whether daily meditation reduces self-reported stress levels in working adults. The researcher recruits 300 working adults, asks them whether they already meditate regularly or not, then compares the average stress levels of the two groups. Is this an observational study or an experiment? Can the researcher conclude that meditation causes reduced stress? Explain.
- The researcher did not assign the treatment (daily meditation) to participants; participants already chose whether they meditate or not, and the researcher only observed their existing behavior and stress levels.
- Therefore, this is an observational study, not an experiment.
- This study is at risk of confounding variables: for example, adults who choose to meditate regularly may also have more flexible work schedules or higher incomes, both of which could reduce stress levels independent of meditation.
- Because this is an observational study, the researcher cannot conclude that meditation causes reduced stress; they can only conclude that there is an association between meditation and lower stress, if any is observed.
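A short simulation shows how a confounding variable can create an association with no causation behind it. The model below is entirely hypothetical. A flexible work schedule makes an adult more likely to meditate and also lowers their stress score. Meditation itself has no effect on stress anywhere in the code. The probabilities (0.7 and 0.2) and the stress means (40 and 60) are invented for illustration.

```python
import random
import statistics

# Hypothetical model, invented only to illustrate confounding:
#   - flexible_schedule is the confounding variable.
#   - It makes an adult more likely to meditate AND lowers their stress score.
#   - Meditation has NO effect on stress anywhere in this model.
rng = random.Random(42)
meditators, non_meditators = [], []

for _ in range(300):  # 300 adults, matching the study size above
    flexible_schedule = rng.random() < 0.5
    meditates = rng.random() < (0.7 if flexible_schedule else 0.2)
    # Stress depends only on the schedule, never on meditation.
    stress = rng.gauss(40 if flexible_schedule else 60, 8)
    if meditates:
        meditators.append(stress)
    else:
        non_meditators.append(stress)

print(f"mean stress, meditators:     {statistics.mean(meditators):.1f}")
print(f"mean stress, non-meditators: {statistics.mean(non_meditators):.1f}")
```

The meditators should show a clearly lower average stress score, even though meditation does nothing in this model. The flexible schedules drive both variables, and an observational comparison alone cannot tell this explanation apart from a real meditation effect.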
Exam tip: If an AP question asks whether a study can support a causal conclusion, the answer is "yes" only if the study is a well-designed experiment with random assignment of treatments. Observational studies, including censuses and sample surveys, cannot support causal conclusions.
5. Common Pitfalls (and how to avoid them)
- Wrong move: Calling a population parameter a statistic, or vice versa (e.g., saying $\bar{x}$ is the population mean instead of the sample mean). Why: Students mix up Greek vs Latin notation, or forget that parameters describe the whole population, not the sample. Correct move: When asked to identify either, always ask: "Is this describing the entire group I care about, or just the subset I measured?" If it's the entire group, it's a parameter; if it's the sample, it's a statistic.
- Wrong move: Claiming a census is always the better study design. Why: Students assume more data is always better, without considering practical constraints. Correct move: When evaluating whether a census is appropriate, always check for destructive testing, cost, time, and population size before concluding it's better.
- Wrong move: Calling an observational study an experiment because the researcher split the sample into two groups. Why: Students think any grouping means an experiment, but grouping based on existing characteristics is not treatment assignment. Correct move: Always check: did the researcher assign the treatment to the subjects, or did subjects choose their own group? If subjects self-select, it's observational, not experimental.
- Wrong move: Concluding causation from an observational study. Why: Students forget that association does not equal causation, and think any study with a comparison group can show causation. Correct move: Only allow a causal conclusion if the study is explicitly described as a randomized experiment with random assignment of treatments to subjects.
- Wrong move: Defining the population as "the people we surveyed" instead of the entire group of interest. Why: Students confuse the sample (which you actually measure) with the population (which you want to learn about). Correct move: Always define the population first, before the sample, when answering identification questions; the population is the group you want to generalize to, not the group you measured.
6. Practice Questions (AP Statistics Style)
Question 1 (Multiple Choice)
A botanist wants to estimate the average height of mature giant sequoia trees growing in Yosemite National Park. She randomly selects 32 mature giant sequoia trees from across the park and measures their height, then calculates the average height of the 32 trees. Which of the following correctly identifies the population and statistic in this study?
A) Population: all mature trees in Yosemite National Park; Statistic: the average height of all mature giant sequoia trees
B) Population: all mature giant sequoia trees in Yosemite National Park; Statistic: the average height of the 32 selected trees
C) Population: all giant sequoia trees in California; Statistic: the average height of the 32 selected trees
D) Population: the 32 selected giant sequoia trees; Statistic: the average height of the 32 selected trees
Worked Solution: First, the botanist is interested specifically in mature giant sequoia trees in Yosemite National Park, so the population cannot include all mature trees, all sequoias in California, or just the 32 selected trees. This eliminates options A, C, and D. The average height of the 32 selected trees is calculated from sample data, so it is a statistic, and the population matches the description in B. The correct answer is B.
Question 2 (Free Response)
A university wants to know what proportion of its undergraduate students support a new increase in student activity fees to fund a new gym. Answer the following:
(a) Identify the population and parameter of interest in this study, in context.
(b) The university president suggests surveying every single undergraduate student to get the most accurate result. What type of study is this, and give one reason this plan may not be practical.
(c) A researcher compares the support for the fee increase between first-year students and fourth-year students, based on their existing class year. Is this an observational study or an experiment? Can the researcher conclude that class year causes a difference in support for the fee increase? Explain.
Worked Solution:
(a) Population: all undergraduate students currently enrolled at the university. Parameter of interest: $p$, the true proportion of all undergraduate students who support the new student activity fee increase.
(b) This is a census, because it attempts to collect data from every individual in the population of interest. One reason it may not be practical is nonresponse. Many students will not respond to the survey, and the resulting nonresponse bias can make the results less accurate than those from a well-designed random sample. Alternatively, contacting and following up with every undergraduate requires a large amount of staff time and money, which is unnecessary for an accurate estimate.
(c) This is an observational study, because the researcher did not assign students to be first-year or fourth-year. Class year is an existing characteristic that the researcher only observes. Because this is an observational study, confounding variables are possible. For example, fourth-year students are likely to graduate before the gym is completed, so any difference in support could be driven by whether a student expects to use the gym rather than by class year itself. The researcher therefore cannot conclude that class year causes a difference in support. Only a randomized experiment can support a causal conclusion.
Question 3 (Application / Real-World Style)
A fireworks manufacturer wants to estimate the average height that its 5-inch consumer aerial fireworks reach when launched. The manufacturer produced 120,000 of these fireworks this year. Explain why a census is inappropriate for this study, what type of study should be used instead, and whether this requires an experiment or an observational study.
Worked Solution: A census would require launching and measuring the height of every one of the 120,000 fireworks produced this year. Launching a firework destroys it, so none of the tested fireworks could be sold and the manufacturer would have nothing left to sell. This makes a census completely inappropriate. Instead, the manufacturer should take a random sample of 100-200 fireworks from the production run, launch them, and use their mean height to estimate the mean height of all the fireworks. This is best described as an observational study (a sample with destructive measurement), not an experiment. Launching a firework is simply how its height is measured, and the goal is to estimate a population mean, not to compare the effects of different treatments. In context: a random sample will give the manufacturer an accurate estimate of the average launch height for all of its fireworks at a tiny fraction of the cost of a census.
7. Quick Reference Cheatsheet
| Category | Definition/Notation | Notes |
|---|---|---|
| Population | Entire group of interest | The group you want to generalize conclusions to, not just the group you measure |
| Sample | Subset of the population you actually collect data from | Used to make inferences about the population |
| Population Parameter | Number describing a population characteristic | Notation: $N$ (size), $\mu$ (mean), $p$ (proportion); almost always unknown |
| Sample Statistic | Number calculated from sample data | Notation: $n$ (size), $\bar{x}$ (mean), $\hat{p}$ (proportion); used to estimate unknown parameters |
| Census | Study that measures every individual in the population | Only appropriate for small, accessible populations where measurement is not destructive; rarely used for large populations |
| Observational Study | Study where researchers measure variables without imposing treatments | Can only show association between variables, not causation |
| Experiment | Study where researchers intentionally assign treatments to subjects | Only well-designed randomized experiments can support causal conclusions |
8. What's Next
Introduction to Planning a Study is the foundational prerequisite for all other topics in Unit 3 Collecting Data, and for all statistical inference for the rest of the AP Statistics course. If you cannot correctly distinguish parameters from statistics, or observational studies from experiments, you will not be able to evaluate study designs or interpret inference results on later topics. Immediately after this topic, you will apply the core definitions from this chapter to evaluate different sampling methods and identify sources of sampling bias, then learn to design full randomized experiments. This topic feeds into the core AP Statistics big idea that the quality of your study design determines whether your conclusions are valid, regardless of how sophisticated your statistical analysis is.
Follow-on topics: Sampling Methods and Bias; Experimental Design; Introduction to Statistical Inference