Skills Focus: Selecting an Inference Procedure — AP Statistics Study Guide
For: Candidates preparing for the AP Statistics exam.
Covers: Matching research questions to inference for regression slopes, distinguishing slope inference from other t-procedures, checking conditions for inference on slopes, selecting between confidence intervals and hypothesis tests for slopes.
You should already know: Least squares regression line calculation and interpretation. Fundamentals of confidence intervals and hypothesis testing. Conditions for inference in regression.
A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official mark schemes for grading conventions.
1. What Is Skills Focus: Selecting an Inference Procedure?
This skill requires you to identify, justify, and select the correct inference method for a given research question involving the slope of a least squares regression line, rather than just calculating results for a pre-specified procedure. Per the AP Statistics Course and Exam Description (CED), Unit 9 (Inference for Quantitative Data: Slopes) carries an exam weighting of 2-5% of the multiple-choice section, and selecting an inference procedure is tested on both multiple-choice (MCQ) and free-response (FRQ) sections. On MCQs, it typically appears as a standalone question asking which procedure is appropriate for a given context. On FRQs, it is almost always the first part of a multi-part regression question, requiring you to name and justify your procedure before completing calculations. Standard notation for this topic uses $\beta$ for the true unknown population slope and $b$ for the sample slope calculated from observed data. This skill is prioritized by the AP because it assesses whether you understand when to use regression inference, rather than just memorizing calculation steps.
2. Matching Research Goals to Inference Type for Slopes
The first step in selecting an inference procedure is to identify your parameter of interest and your research goal. For regression contexts with two quantitative variables measured on the same observational unit, the parameter of interest is almost always the true population slope $\beta$, which represents the average change in the response variable $y$ for every one-unit increase in the explanatory variable $x$. There are two core types of inference for $\beta$:
- Hypothesis test for slope: Used when you want to test a claim about the value of $\beta$, most commonly testing whether there is any statistically significant linear relationship between $x$ and $y$. The default null hypothesis for this test is $H_0: \beta = 0$, since a slope of 0 means no linear relationship.
- Confidence interval for slope: Used when you want to estimate the true value of $\beta$ with a range of plausible values, rather than testing a specific claim. If the question uses words like "estimate", "approximate", or "give a range for", it is almost always asking for a confidence interval.
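For reference, the two procedures can be written in the usual textbook form. Here $\beta$ is the true population slope, and $b$ (sample slope), $SE_b$ (standard error of the slope), $t^*$ (critical value), and $n$ (number of observations) are notation assumed for this summary rather than defined in the text above.

```latex
\[
\text{Hypothesis test: } H_0: \beta = 0, \qquad t = \frac{b - 0}{SE_b}, \qquad df = n - 2
\]
\[
\text{Confidence interval: } b \pm t^{*} \cdot SE_b, \qquad df = n - 2
\]
```

Both procedures use the same statistic, standard error, and degrees of freedom; only the research goal (testing a claim versus estimating a value) decides which one to report.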
Worked Example
A marine biologist wants to determine whether there is a statistically significant linear relationship between ocean surface temperature (°C, x) and coral growth rate (cm/year, y). They collect a random sample of 28 coral colonies across the Great Barrier Reef and fit a least squares regression line. Which inference procedure is appropriate for this question?
- Identify the parameter of interest: The true population slope relating ocean temperature to coral growth rate, since we have two quantitative variables measured on each coral colony.
- Identify the research goal: The question asks to test for the existence of a statistically significant linear relationship, which is a claim about $\beta$, not an estimation of the value of $\beta$.
- Eliminate inappropriate procedures: Inference for means, proportions, or other parameters does not address a slope, so these procedures are ruled out. A confidence interval would be used for estimation, not hypothesis testing.
- Conclusion: The appropriate procedure is a t-test for the slope of a regression line.
Exam tip: Always identify the parameter of interest first. If the parameter is a slope, you can immediately eliminate all non-regression inference options on multiple-choice questions.
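Once a t-test for the slope has been selected, the test statistic comes from the sample slope and its standard error. The sketch below shows one way to compute it with the standard library only. The function name `slope_t_statistic` and the five data points are hypothetical and invented for illustration; they are not data from the coral example.

```python
import math

def slope_t_statistic(x, y):
    """Return (b, se_b, t, df) for testing H0: beta = 0 in simple linear regression."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least 3 observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                     # sample slope
    a = y_bar - b * x_bar             # sample intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2                        # degrees of freedom for slope inference
    s = math.sqrt(sse / df)           # standard deviation of the residuals
    se_b = s / math.sqrt(sxx)         # standard error of the slope
    t = b / se_b                      # test statistic under H0: beta = 0
    return b, se_b, t, df

# Hypothetical data, made up for illustration only
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]
b, se_b, t, df = slope_t_statistic(x_values, y_values)
print(f"b = {b:.3f}, SE_b = {se_b:.3f}, t = {t:.3f}, df = {df}")
```

The p-value would then be read from a t-distribution with `df` degrees of freedom, using a table or calculator.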
3. Distinguishing Slope Inference from Other Inference Procedures
A common source of error on the AP exam is confusing slope inference with other similar inference procedures that also use t-tests. It is critical to distinguish these based on context:
- Slope vs. two-sample difference of means: A two-sample t-procedure is used when you have one categorical explanatory variable (two groups) and one quantitative response. Slope inference is used when you have two quantitative variables, measuring the change in y per unit change in x.
- Slope vs. confidence interval for mean response: A confidence interval for the mean response estimates the average value of y at a specific fixed value of x, while a confidence interval for the slope estimates the change in y per unit change in x.
- Slope vs. z-procedures: All inference for slopes uses t-procedures, because the standard deviation of the sampling distribution of the slope depends on the unknown population standard deviation of the residuals, so it must be estimated from sample data as a standard error, just like with inference for means.
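The distinctions above can be sketched as a small decision helper. This is an illustrative simplification, not an official AP rubric: the function name `select_procedure`, its arguments, and the returned labels are all hypothetical, and a categorical explanatory variable is assumed to define exactly two groups.

```python
def select_procedure(explanatory, response, goal, at_specific_x=False):
    """Suggest an inference procedure for a quantitative response.

    explanatory, response: "quantitative" or "categorical"
    goal: "test" or "estimate"
    at_specific_x: True when the question asks about mean y at one fixed x
    """
    if goal not in ("test", "estimate"):
        raise ValueError("goal must be 'test' or 'estimate'")
    if response != "quantitative":
        return "not covered by this sketch"
    if explanatory == "categorical":
        # Two groups compared on a quantitative response
        if goal == "test":
            return "two-sample t-test for a difference in means"
        return "two-sample t-interval for a difference in means"
    if explanatory == "quantitative":
        if at_specific_x:
            # Average y at one x value, not change in y per unit of x
            return "t-interval for the mean response at the given x"
        if goal == "test":
            return "t-test for the slope of a regression line"
        return "t-confidence interval for the slope of a regression line"
    return "not covered by this sketch"

print(select_procedure("quantitative", "quantitative", "estimate"))
print(select_procedure("categorical", "quantitative", "test"))
```

Note that every branch returns a t-procedure, which mirrors the last bullet: none of these settings calls for a z-procedure.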
Worked Example
A real estate analyst collects data on 45 randomly selected single-family homes for sale in a city, recording square footage (x) and listing price (y, thousands of dollars). She wants to estimate the average increase in listing price for each additional 100 square feet of space. Which inference procedure is appropriate?
- Parameter of interest: The average change in listing price per 100 additional square feet. With x measured in square feet, this is 100 times the population slope of the regression of price on square footage, so estimating it is a question about the slope.
- Eliminate wrong procedures: A two-sample t-interval for difference of means would compare mean price for small vs. large homes, not estimate the change per unit size. A t-interval for mean response would estimate the average price for a home of a specific size, not the change per square foot.
- Confirm research goal: The question asks to estimate the value of the slope, which requires a confidence interval rather than a hypothesis test.
- Conclusion: The appropriate procedure is a t-confidence interval for the slope of a regression line.
Exam tip: If you have two quantitative variables measured on the same observational unit, you are almost certainly dealing with slope inference, so you can immediately eliminate any procedures for comparing means across groups.
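A confidence interval for a slope has the form $b \pm t^* \cdot SE_b$, and an interval for the change per 100 units of x is the same interval with both endpoints multiplied by 100. The sketch below illustrates this for a sample of 45 homes. The values of `b` and `se_b` are hypothetical regression output invented for illustration; `t_star` is the 95% critical value for 43 degrees of freedom taken from a t-table.

```python
def slope_interval(b, se_b, t_star, scale=1.0):
    """Return (lower, upper) for scale * beta, based on b ± t* · SE_b."""
    margin = t_star * se_b
    ends = (scale * (b - margin), scale * (b + margin))
    return min(ends), max(ends)   # keeps the order correct if scale is negative

# Hypothetical regression output, made up for illustration only
b = 0.15         # sample slope: thousands of dollars per square foot
se_b = 0.02      # standard error of the slope
t_star = 2.017   # 95% critical value, df = 45 - 2 = 43

per_sqft = slope_interval(b, se_b, t_star)
per_100_sqft = slope_interval(b, se_b, t_star, scale=100)
print(f"Per square foot: ({per_sqft[0]:.3f}, {per_sqft[1]:.3f}) thousand dollars")
print(f"Per 100 sq ft:   ({per_100_sqft[0]:.2f}, {per_100_sqft[1]:.2f}) thousand dollars")
```

Rescaling changes the units of the interval but not the procedure: it is still a t-confidence interval for the slope.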
4. Verifying Conditions to Justify Procedure Selection
Selecting an inference procedure on the AP exam is not just naming the correct type—you must also confirm that the conditions for that procedure are met to earn full credit. For all inference on slopes, the conditions are remembered by the acronym LINE:
- Linear: The true relationship between x and y is linear. Check this with a residual plot; if there is no curved pattern, the condition is met.
- Independent: Observations are independent of each other. Check this by confirming random sampling/assignment and the 10% condition if sampling without replacement.
- Normal: The residuals are approximately normally distributed around the regression line. Check this with a normal probability plot of residuals, or rely on the Central Limit Theorem for large samples.
- Equal Variance: The spread of residuals is constant across all values of x. Check this with a residual plot; if there is no fan shape (increasing or decreasing spread), the condition is met.
If any condition is severely violated, you cannot justify using the selected inference procedure, even if it matches the research goal.
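On the exam, the Equal Variance condition is judged by eye from a residual plot. As a rough numerical companion, the sketch below compares the spread of residuals in the lower and upper halves of the x values. The helper names `residuals` and `spread_ratio` and the eight data points are hypothetical, and the ratio is an informal heuristic assumed here for illustration, not an AP-required test.

```python
import statistics

def residuals(x, y):
    """Residuals from the least squares line of y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def spread_ratio(x, y):
    """Residual SD in the upper half of x divided by residual SD in the lower half."""
    if len(x) < 4:
        raise ValueError("need at least 4 observations")
    pairs = sorted(zip(x, residuals(x, y)))   # order residuals by x
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return statistics.pstdev(high) / statistics.pstdev(low)

# Hypothetical data with spread that grows as x increases
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.8, 6.0, 5.0, 9.0, 6.0]
ratio = spread_ratio(x, y)
print(f"Upper-half / lower-half residual spread: {ratio:.1f}")
```

A ratio far from 1 suggests a fan shape worth inspecting on the residual plot; the plot itself remains the evidence to cite in a written answer.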
Worked Example
A business owner collects 30 consecutive months of data on monthly customer foot traffic (x) and monthly profit (y, thousands of dollars). They fit a regression line, and when plotting residuals against foot traffic, they notice the residuals get much more spread out as foot traffic increases. All other conditions (linearity, independence, normality) are met. Can the owner justify using a t-test for the slope to test for a linear relationship between foot traffic and profit?
- Recall the four conditions for inference on a slope: Linear, Independent, Normal, Equal Variance.
- The problem states that three of the four conditions are satisfied, but the residual plot shows increasing spread of residuals as x increases.
- This pattern directly violates the equal variance condition. When equal variance is violated, the standard error of the slope is biased, leading to unreliable p-values and inference.
- Conclusion: The owner cannot justify selecting the t-test for slope inference in this context.
Exam tip: On FRQs, always explicitly state that you have checked the relevant conditions and they are satisfied before confirming your procedure choice—this is almost always required for full credit.
5. Common Pitfalls (and how to avoid them)
- Wrong move: Selecting a two-sample t-procedure for difference in means when asked about the relationship between two quantitative variables measured on the same unit. Why: Students confuse comparing two groups defined by a categorical x with measuring a linear relationship between a continuous x and continuous y. Correct move: Count variables per unit: if you have one quantitative x and one quantitative y per unit, the parameter is a slope, so use slope inference.
- Wrong move: Selecting a hypothesis test for slope when the question asks to estimate the value of the true slope. Why: Students mix up the goal of testing (assess evidence for a relationship) and estimation (get a range for the slope size). Correct move: Circle the verb in the question: "test", "evidence" = hypothesis test; "estimate", "range" = confidence interval.
- Wrong move: Confusing a confidence interval for the slope with a confidence interval for the mean response at a given x. Why: Both use regression output, so students mix up what parameter is being estimated. Correct move: Ask: are we estimating a change in y per change in x (slope) or the value of y at a specific x (mean response)?
- Wrong move: Ignoring a violated condition when selecting an inference procedure, just picking the right type regardless of conditions. Why: Students think selecting a procedure is only naming it, not justifying it, which is required on AP FRQs. Correct move: Always check all four LINE conditions before confirming your procedure choice, and reject the procedure if any condition is severely violated.
- Wrong move: Using z-procedures for inference on a slope because the sample size is large. Why: Students default to z for large samples, but the population standard deviation of the slope is always unknown. Correct move: All inference for slopes uses t-procedures with $n - 2$ degrees of freedom, never z-procedures, regardless of sample size.
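To see what the last pitfall costs in practice, the sketch below compares 95% t critical values with the z critical value of 1.960. The critical values are copied from a standard t-table; the names `Z_STAR_95`, `T_STAR_95`, and `extra_width_percent` are hypothetical and chosen for this illustration.

```python
Z_STAR_95 = 1.960  # 95% critical value from the standard normal distribution

# 95% two-sided t critical values from a standard t-table, keyed by df
T_STAR_95 = {5: 2.571, 10: 2.228, 17: 2.110, 30: 2.042, 60: 2.000, 100: 1.984}

def extra_width_percent(df):
    """Percent by which a 95% t-interval is wider than a z-interval with the same SE."""
    return 100 * (T_STAR_95[df] / Z_STAR_95 - 1)

for df, t_star in T_STAR_95.items():
    print(f"df = {df:>3}: t* = {t_star:.3f} "
          f"({extra_width_percent(df):.1f}% wider than z* = {Z_STAR_95:.3f})")
```

The gap shrinks as the degrees of freedom grow but never reaches zero, so using z in place of t always understates the margin of error for a slope.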
6. Practice Questions (AP Statistics Style)
Question 1 (Multiple Choice)
A fitness researcher studies the relationship between weekly time spent doing cardio (hours, x) and body fat percentage (y) for 62 randomly selected adults. She wants to estimate how much body fat percentage changes, on average, for each additional hour of weekly cardio. Which of the following is the most appropriate inference procedure? A) One-sample t-test for a population mean B) Two-sample t-test for the difference in mean body fat between two groups C) t-test for the slope of a regression line D) t-confidence interval for the slope of a regression line
Worked Solution: First, we have two quantitative variables (cardio time and body fat percentage) measured on each adult, so the parameter of interest is the population slope $\beta$, eliminating options A and B, which are for inference on means, not slopes. Next, the research goal is to estimate the size of the average change in body fat per hour of cardio, which requires an interval estimate, not a hypothesis test. This eliminates option C. The correct answer is D.
Question 2 (Free Response)
An agricultural scientist studies the relationship between annual fertilizer application (pounds per acre, x) and corn yield (bushels per acre, y) on 22 randomly selected test plots in Iowa. (a) The scientist wants to test whether there is a positive linear relationship between fertilizer application and corn yield. What inference procedure is appropriate? Justify your choice. (b) Before conducting inference, the scientist checks a residual plot, finding no curved pattern and no change in residual spread across all fertilizer levels. A normal probability plot of residuals is roughly linear, and plots are independent of each other. What does this tell you about the appropriateness of your selected procedure? (c) State the null and alternative hypotheses for this test, using correct statistical notation.
Worked Solution: (a) We have two quantitative variables (fertilizer and yield) measured on each test plot, so the parameter of interest is the true population slope $\beta$ relating fertilizer to yield. The research goal is to test a claim about whether the slope is positive, which requires a hypothesis test. The appropriate procedure is a t-test for the slope of a least squares regression line. (b) All four LINE conditions for inference on a slope (Linear, Independent, Normal, Equal Variance) are satisfied. This means the selected t-test procedure is appropriate and reliable for inference. (c) The null hypothesis is that there is no linear relationship, and the alternative is that the slope is positive, matching the research question: $H_0: \beta = 0$ versus $H_a: \beta > 0$, where $\beta$ is the true slope of the population regression line relating fertilizer application to corn yield.
Question 3 (Application / Real-World Style)
A high school counselor studies the relationship between average weekly hours of after-school tutoring (x) and final AP Statistics exam score (y, 1-5 scale) for 19 randomly selected students who took the exam. She wants to know if there is statistically significant evidence that more tutoring is associated with a higher exam score, on average. Residual analysis confirms all LINE conditions are met, and students were randomly selected with independent observations. Select the appropriate inference procedure and explain why it fits this context.
Worked Solution: We have two quantitative variables (weekly tutoring hours and exam score) measured on each student, so the parameter of interest is the true population slope $\beta$ relating tutoring hours to exam score. The research goal is to test for a significant positive linear relationship, which requires a hypothesis test rather than an interval estimate. All conditions for inference on a slope are satisfied per the problem description. The appropriate procedure is a one-sided t-test for the slope of a regression line with $n - 2 = 19 - 2 = 17$ degrees of freedom. This test will allow the counselor to determine if the observed positive slope in the sample is statistically significant evidence of a true positive relationship in the population of similar students.
7. Quick Reference Cheatsheet
| Category | Formula / Notation | Notes |
|---|---|---|
| True population slope | $\beta$ | Average change in response per 1-unit increase in explanatory |
| Sample slope statistic | $b$ | Calculated from sample least squares regression |
| Degrees of freedom for slope inference | $df = n - 2$ | Used for all t-procedures for slopes |
| Hypotheses for test of no linear relationship | $H_0: \beta = 0$ vs. $H_a: \beta \neq 0$ (or one-sided per question) | Used when testing for a significant linear relationship |
| Confidence interval for slope | $b \pm t^* \cdot SE_b$ | Used when estimating the true value of the population slope |
| Conditions for slope inference | LINE | Linear: no curved pattern in residuals; Independent: random sampling, 10% condition; Normal: residuals approximately normal; Equal Variance: constant residual spread |
| t-test statistic for slope | $t = \dfrac{b - \beta_0}{SE_b}$ | $\beta_0 = 0$ for most tests of no linear relationship |
| Inference distribution | t-distribution | Always use t, never z, for slope inference |
8. What's Next
This chapter gives you the foundation to select and justify the correct inference procedure for regression slopes, the core assessed skill for Unit 9 on the AP exam. Next, you will deepen your understanding by calculating confidence intervals and conducting hypothesis tests for regression slopes, applying the procedures you learned to select here. Without being able to correctly match the research question to the right inference procedure, you cannot earn full credit on regression inference FRQs, which account for most of the Unit 9 exam score. This skill also connects to the broader AP Statistics goal of matching analytical methods to research questions, a skill tested across all units of the exam.