Confidence Intervals for the Slope of a Regression Model — AP Statistics Study Guide
For: students preparing for the AP Statistics exam.
Covers: Conditions for inference for a regression slope, point estimation of the true population slope, standard error of the slope, confidence interval construction, interval interpretation in context, and exam-specific problem-solving strategies for both MCQ and FRQ.
You should already know: Simple linear regression least squares estimation, basic confidence interval construction for means/proportions, properties of the t-distribution.
A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official mark schemes for grading conventions.
1. What Is a Confidence Interval for the Slope of a Regression Model?
When we fit a least squares regression line to a random sample of bivariate quantitative data, we calculate a sample slope $b$. This is a point estimate of the true, unknown population slope $\beta$ (Greek beta), which describes the linear relationship between the explanatory variable $x$ and the response variable $y$ for the entire population. Because of sampling variability, $b$ will almost never equal $\beta$ exactly. We therefore construct a confidence interval to quantify this uncertainty: the interval gives a range of plausible values for the true population slope $\beta$. In the AP Statistics Course and Exam Description (CED), this topic belongs to Unit 9 (Inference for Quantitative Data: Slopes), which accounts for roughly 2 to 5% of the exam score. It appears on both the multiple-choice (MCQ) and free-response (FRQ) sections. It is most commonly tested in one of two ways: as a conceptual multiple-choice question about interpretation or conditions, or as a 2-3 point part of a multi-part FRQ. Common alternative names include "confidence interval for the population regression slope" and "interval estimate for a linear regression slope".
2. Conditions for Inference for Regression Slope
Before constructing a confidence interval for the slope, you must verify four core conditions for inference, commonly remembered by the acronym L.I.N.E.:
- Linear: The true relationship between $x$ and $y$ is linear. This means the mean of $y$ at any value of $x$ falls on the population regression line $\mu_y = \alpha + \beta x$, with no underlying curvature. You check this by looking for no visible curvature in a scatterplot of the raw data or in a plot of residuals versus $x$.
- Independent: Individual observations are independent of one another. This is confirmed by study design: you must have a random sample from the population or a randomized experiment. If sampling without replacement, the 10% condition applies: the sample size must be less than 10% of the total population size.
- Normal: The residuals (the differences between observed and predicted $y$) are normally distributed with a mean of 0 at every value of $x$. For small samples, you check this with a roughly linear normal probability plot of the residuals. For large samples, the Central Limit Theorem reduces concerns about non-normality, but extreme outliers or strong skewness still violate the condition.
- Equal Variance (Homoscedasticity): The standard deviation of the residuals is constant across all values of $x$. You check this by confirming that the spread of residuals in a residual plot is consistent, with no fanning in or out.
Worked Example
An urban planner studies the relationship between the number of bike racks per city block (x) and the number of cyclists parked per day on the block (y). She randomly selects 28 blocks from a city with 420 total blocks. A residual plot shows residuals scattered randomly around 0, with no curvature and consistent spread across all values of x. A normal probability plot of residuals is roughly linear, with no extreme deviations. Are the conditions for a confidence interval for the slope met? Justify your answer.
- Linear Condition: The residual plot shows no visible curvature, so the linear condition is satisfied for this study.
- Independent Condition: The sample is random, and 28 blocks is less than 10% of 420 total blocks, so the independence condition is satisfied.
- Normal Condition: The normal probability plot of residuals is roughly linear, so the normality of residuals condition is satisfied.
- Equal Variance Condition: The residual plot has consistent spread across all x values, with no fanning, so the equal variance condition is satisfied. All conditions for constructing a confidence interval for the slope are met.
Exam tip: On the AP exam, you must name and check every condition in the context of the problem to earn full credit. If you only list the acronym L.I.N.E. without context-specific checks, you will typically earn no credit for the condition check.
3. Calculating the Confidence Interval for the Slope
The confidence interval for the true population slope follows the same general structure as any confidence interval: point estimate ± critical value × standard error of the estimate. For regression slopes, we use the t-distribution for the critical value because we never know the population standard deviation of the residuals, so we estimate it from sample data.
The formula for the confidence interval is $b \pm t^* \cdot SE_b$, where:
- $b$ = the sample slope from the least squares regression fit
- $t^*$ = the critical t-value for the desired confidence level, with $df = n - 2$ degrees of freedom (we subtract 2 because we estimate two population parameters: the intercept and the slope $\beta$)
- $SE_b$ = the standard error of the sample slope, which measures how much the sample slope varies from sample to sample. On the vast majority of AP exam questions, $SE_b$ is given directly in regression output (the "SE Coef" entry in the slope row), so you will rarely need to calculate it by hand. If you do, the formula is $SE_b = \dfrac{s}{s_x\sqrt{n-1}}$, where $s$ is the standard deviation of the residuals and $s_x$ is the standard deviation of the explanatory variable $x$.
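If you ever have raw data instead of regression output, the hand formula for $SE_b$ can be traced step by step. The minimal sketch below does this with the standard library only. It computes $s$ from the residual sum of squares divided by $n - 2$, and $s_x$ as the usual sample standard deviation. The function name `slope_and_se` and the data are hypothetical.

```python
import math
from statistics import mean, stdev

def slope_and_se(xs, ys):
    """Return (b, SE_b) for the least squares line, computed from raw data."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points so that df = n - 2 is positive")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar                        # intercept of the fitted line
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))                 # standard deviation of residuals
    s_x = stdev(xs)                              # sample SD of x (divides by n - 1)
    se_b = s / (s_x * math.sqrt(n - 1))
    return b, se_b

xs = [1, 2, 3, 4, 5]                 # hypothetical data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b, se_b = slope_and_se(xs, ys)
print(f"b = {b:.3f}, SE_b = {se_b:.4f}")  # b = 1.990, SE_b = 0.0597
```

Note the two different divisors. Residuals use $n - 2$ because two parameters are estimated, while $s_x$ uses the ordinary $n - 1$.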
Worked Example
A fitness researcher fits a regression model to predict maximum oxygen uptake (y, in mL/kg/min) from resting heart rate (x, in beats per minute) for 25 randomly selected adults. Partial regression output from the fit is below:
| Predictor | Coef | SE Coef |
|---|---|---|
| Intercept | 60.2 | 5.1 |
| Resting Heart Rate | -0.28 | 0.08 |

Construct a 90% confidence interval for the true slope of the regression model.
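- Identify key values: $b = -0.28$, $SE_b = 0.08$, $df = 25 - 2 = 23$.
- Find the critical t-value: For 90% confidence and $df = 23$, $t^* = 1.714$ from the t-table.
- Calculate the margin of error: $t^* \cdot SE_b = 1.714 \times 0.08 \approx 0.137$.
- Construct the interval: $-0.28 \pm 0.137$. The 90% confidence interval for the true slope is $(-0.417, -0.143)$.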
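On the exam you read $t^*$ from a table or a calculator's inverse-t function. With SciPy installed, `scipy.stats.t.ppf(0.95, 23)` gives the same value. To keep the sketch below standard-library only, it finds $t^*$ numerically instead. It integrates the t density with Simpson's rule and then uses bisection to find the cutoff. This numerical approach is our own illustrative choice, not an AP method. All function names are hypothetical. The last lines reproduce the fitness example.

```python
import math

def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t, via Simpson's rule on [0, |t|] and symmetry."""
    if t == 0:
        return 0.5
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    power = -(df + 1) / 2

    def density(u):
        return coef * (1 + u * u / df) ** power

    upper = abs(t)
    h = upper / steps                         # steps is even, as Simpson's rule requires
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3                      # area under the density from 0 to |t|
    return 0.5 + area if t > 0 else 0.5 - area

def t_critical(confidence, df):
    """t* such that the middle `confidence` area lies between -t* and t*."""
    target = 0.5 + confidence / 2             # cumulative area up to t*
    lo, hi = 0.0, 100.0
    for _ in range(60):                       # bisection search
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def slope_interval(b, se_b, n, confidence):
    """Return (t*, margin of error, (lower, upper)) for b +/- t* * SE_b, df = n - 2."""
    t_star = t_critical(confidence, n - 2)
    me = t_star * se_b
    return t_star, me, (b - me, b + me)

# Fitness example: b = -0.28, SE_b = 0.08, n = 25, 90% confidence
t_star, me, (lower, upper) = slope_interval(-0.28, 0.08, 25, 0.90)
print(f"t* = {t_star:.3f}, ME = {me:.3f}, interval = ({lower:.3f}, {upper:.3f})")
# t* = 1.714, ME = 0.137, interval = (-0.417, -0.143)
```

Note that `slope_interval` takes $n$ and subtracts 2 itself, so the degrees of freedom step cannot be skipped by accident.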
Exam tip: When reading regression output, always mark the slope row for your explanatory variable immediately after reading the question. Many students accidentally use the standard error of the intercept instead of the slope, costing easy points on the exam.
4. Interpreting a Confidence Interval for the Slope
Interpretation of a confidence interval for the slope is one of the most frequently tested skills on the AP exam, and requires two key components for full credit: a correct statement of confidence, and context-specific description of what the slope means.
The standard correct template for interpretation is: "We are [C]% confident that the interval from [lower bound] to [upper bound] captures the true change in the mean [response variable, with units] per one-unit increase in [explanatory variable, with units]."
A key inference connection: If the confidence interval does not contain 0, then 0 is not a plausible value for the true slope $\beta$. For a two-sided test of $H_0: \beta = 0$ at significance level $\alpha = 1 - C$ (e.g., $\alpha = 0.05$ for 95% confidence), this means we have statistically significant evidence of a linear relationship between $x$ and $y$. If the interval does contain 0, then 0 is a plausible value for $\beta$, so we do not have significant evidence of a linear relationship.
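The interval-to-test connection is a single comparison with 0. The sketch below shows it. The hypothetical helper `interval_conclusion` takes the interval endpoints and the confidence level $C$, and reports the matching two-sided conclusion at $\alpha = 1 - C$. In a real answer, you would still write the conclusion in the context of the problem.

```python
def interval_conclusion(lower, upper, confidence):
    """Return (significant, message) for a two-sided test of H0: beta = 0 at alpha = 1 - C."""
    alpha = round(1 - confidence, 10)          # round away floating-point noise
    if lower <= 0 <= upper:
        return False, (f"0 is a plausible value of beta: no significant evidence "
                       f"of a linear relationship at alpha = {alpha}")
    direction = "positive" if lower > 0 else "negative"
    return True, (f"0 is not plausible: significant evidence of a {direction} "
                  f"linear relationship at alpha = {alpha}")

# Fitness example interval from Section 3
print(interval_conclusion(-0.417, -0.143, 0.90))
```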
Worked Example
Interpret the 90% confidence interval from the previous fitness example, where x is resting heart rate (beats per minute) and y is maximum oxygen uptake (mL/kg/min). Then explain what the interval suggests about the relationship between the two variables.
- Interpretation: We are 90% confident that for each 1 beat per minute increase in resting heart rate, the mean maximum oxygen uptake decreases by between 0.14 mL/kg/min and 0.42 mL/kg/min.
- Inference conclusion: 0 is not contained in this 90% confidence interval, so 0 is not a plausible value for the true slope. This means we have statistically significant evidence at the $\alpha = 0.10$ level of a negative linear relationship between resting heart rate and maximum oxygen uptake.
Exam tip: Never use the wording "there is a 95% chance the true slope is in the interval". The true slope is a fixed constant: it is either in the interval or not. The 95% refers to the long-run percentage of all random samples whose intervals would capture the true slope. This common mistake will cost you points on the AP exam.
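To see the "percentage of all random samples" meaning directly, you can simulate it. The sketch below assumes a hypothetical population in which every L.I.N.E. condition holds exactly. The true line is $y = 5 + 2x$, the errors are normal with $\sigma = 3$, and each sample has the same $x$ values 1 to 20. The sketch draws many samples and builds a 95% slope interval from each one, using $t^* = 2.101$ for $df = 18$ from a t-table. It then reports the fraction of intervals that capture $\beta = 2$. All parameter values are illustrative assumptions.

```python
import math
import random

def simulate_coverage(beta=2.0, intercept=5.0, sigma=3.0, n=20,
                      t_star=2.101, reps=2000, seed=1):
    """Fraction of simulated slope intervals that capture the true beta.

    t_star = 2.101 is the 95% critical value for df = n - 2 = 18; change it if n changes.
    """
    rng = random.Random(seed)
    xs = [float(i) for i in range(1, n + 1)]      # same x values in every sample
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    hits = 0
    for _ in range(reps):
        ys = [intercept + beta * x + rng.gauss(0.0, sigma) for x in xs]
        y_bar = sum(ys) / n
        b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
        a = y_bar - b * x_bar
        sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
        se_b = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)  # equals s / (s_x * sqrt(n - 1))
        if b - t_star * se_b <= beta <= b + t_star * se_b:
            hits += 1
    return hits / reps

coverage = simulate_coverage()
print(f"Fraction of intervals capturing the true slope: {coverage:.3f}")
```

Any single interval either captures $\beta$ or it does not. Only the long-run fraction across samples is close to 0.95.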
5. Common Pitfalls (and how to avoid them)
- Wrong move: Using $df = n - 1$ instead of $df = n - 2$, or using a z critical value instead of a t critical value. Why: Students confuse regression inference with one-sample t-intervals for means, where $df = n - 1$, or default to z out of habit from proportion inference. Correct move: Always use t for slope confidence intervals, and always subtract 2 from $n$ for degrees of freedom, because we estimate two population parameters (intercept and slope) from the sample.
- Wrong move: Taking the standard error of the intercept from regression output instead of the standard error of the slope. Why: Output lists a standard error for both terms, and students often scan the wrong row when working quickly on the exam. Correct move: Immediately circle the row for your explanatory variable (the slope row) when you get the output, so you never grab the wrong standard error.
- Wrong move: Interpreting the interval as "we are 95% confident the sample slope is between a and b". Why: Students confuse the known sample slope with the unknown population slope we are trying to estimate. Correct move: Always explicitly reference the true population slope or the true change in mean response in your interpretation, never the sample slope.
- Wrong move: Forgetting to include the word "mean" when describing the change in the response variable. Why: Students remember the slope describes change in y, but forget that regression models the mean y at each value of x. Correct move: Always include "mean" before the response variable in your interpretation, e.g., "the mean maximum oxygen uptake decreases by...".
- Wrong move: Claiming a confidence interval containing 0 proves the true slope is 0. Why: Students confuse "no evidence of a relationship" with "evidence of no relationship". Correct move: If 0 is in the interval, only state that 0 is a plausible value for the true slope, and we do not have statistically significant evidence of a linear relationship. We can never prove the true slope is exactly 0.
- Wrong move: Only listing the L.I.N.E. acronym when checking conditions, with no context. Why: Students memorize the acronym and forget that AP requires context-specific checking to earn credit. Correct move: For each condition, explicitly connect your check to the problem's context, e.g., "the residual plot shows no curvature, so the linear condition is met for this study of resting heart rate and oxygen uptake".
6. Practice Questions (AP Statistics Style)
Question 1 (Multiple Choice)
An agricultural scientist studies the relationship between annual rainfall (x, in inches) and corn yield (y, in bushels per acre) for farms in a region. A random sample of 31 farms gives a 95% confidence interval for the slope of (1.8, 4.2). Which of the following is the correct interpretation of this interval?
A) 95% of all farms have a corn yield increase between 1.8 and 4.2 bushels per inch of rainfall.
B) We are 95% confident that for each additional inch of annual rainfall, the mean corn yield increases by between 1.8 and 4.2 bushels per acre.
C) There is a 95% probability that the true slope is between 1.8 and 4.2 bushels per acre per inch.
D) We are 95% confident that the slope of the sample regression line is between 1.8 and 4.2.
Worked Solution: Eliminate incorrect options one by one. Option A describes the distribution of individual yields, not the slope of the regression relationship between rainfall and mean yield, so it is wrong. Option C uses incorrect probability wording: the true slope is a fixed constant, not a random variable, so it cannot have a probability of being in the interval. Option D refers to the sample slope, which we know exactly from the sample data, so it is wrong. Only Option B follows the correct interpretation template, with the right confidence statement and context. The correct answer is B.
Question 2 (Free Response)
A movie theater owner studies the relationship between the ticket price (x, in dollars) and the number of tickets sold for a Saturday evening show (y). He collects data from 15 randomly selected Saturday evening shows over the past year. Partial regression output is below:
| Predictor | Coef | SE Coef |
|---|---|---|
| Intercept | 412 | 35 |
| Ticket Price | -28.4 | 8.1 |
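(a) State the conditions that need to be checked in this context, and check whether each one is satisfied. The theater has 52 Saturday evening shows per year, and the residual plots and normal probability plots show no violations. (b) Assume all conditions for inference are met. Construct a 95% confidence interval for the true slope. (c) Interpret your interval in context, and explain what conclusion you can draw about the relationship between ticket price and number of tickets sold.
Worked Solution: (a) The four required conditions are:
1. Linear: There is no curvature in the residual plot, so this condition is satisfied.
2. Independent: The 15 shows were randomly selected. However, 15 is about 29% of the 52 Saturday evening shows in a year, which is more than 10%. The 10% condition is therefore not met, and independence of the observations is questionable.
3. Normal: The normal probability plot of residuals is roughly linear, so this condition is satisfied.
4. Equal Variance: The residuals have constant spread, so this condition is satisfied.
Because the 10% condition fails, any inference should be treated with caution. For part (b), we follow the instruction to assume the conditions are met. (b) $b = -28.4$, $SE_b = 8.1$, $df = 15 - 2 = 13$, and $t^* = 2.160$ for 95% confidence. Margin of error: $2.160 \times 8.1 \approx 17.5$. Confidence interval: $-28.4 \pm 17.5 = (-45.9, -10.9)$. (c) Interpretation: We are 95% confident that for each one-dollar increase in ticket price, the mean number of tickets sold for a Saturday evening show decreases by between 10.9 and 45.9 tickets. Because 0 is not in the interval, we have statistically significant evidence at the $\alpha = 0.05$ level of a negative linear relationship between ticket price and number of tickets sold.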
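The part (b) arithmetic can be checked in a few lines, and the same check shows why the z shortcut from the pitfalls list loses credit. The sketch below computes the interval twice from the Question 2 output. The first version uses the correct $t^* = 2.160$ for $df = 13$, read from a t-table. The second uses $z^* = 1.960$, which is the wrong critical value here. The helper name `margin_and_interval` is hypothetical.

```python
def margin_and_interval(b, se_b, critical):
    """Return (margin of error, (lower, upper)) for b +/- critical * SE_b."""
    me = critical * se_b
    return me, (b - me, b + me)

b, se_b, n = -28.4, 8.1, 15     # Question 2 regression output
df = n - 2                      # 13 degrees of freedom
t_star = 2.160                  # 95% confidence, df = 13, from a t-table
z_star = 1.960                  # the WRONG critical value for a slope interval

me_t, ci_t = margin_and_interval(b, se_b, t_star)
me_z, ci_z = margin_and_interval(b, se_b, z_star)
print(f"t interval: ({ci_t[0]:.1f}, {ci_t[1]:.1f})")   # (-45.9, -10.9)
print(f"z interval: ({ci_z[0]:.1f}, {ci_z[1]:.1f})")   # (-44.3, -12.5), too narrow
```

With only 13 degrees of freedom, the z interval is noticeably too narrow. It understates the uncertainty that comes from estimating the residual standard deviation.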
Question 3 (Application / Real-World Style)
A marine biologist studies the relationship between water temperature (x, in °C) at 100 meters depth and the number of coral colonies (y) observed in 1-square-meter quadrats on a reef. She collects data from 27 randomly selected quadrats on the Great Barrier Reef. She calculates a sample slope of $b = -1.8$ with a standard error of the slope of $SE_b = 0.70$. Construct a 99% confidence interval for the true slope, and interpret it in context.
Worked Solution: Degrees of freedom: $df = 27 - 2 = 25$. For 99% confidence, $t^* = 2.787$ for $df = 25$. Margin of error: $2.787 \times 0.70 \approx 1.95$. Confidence interval: $-1.8 \pm 1.95 = (-3.75, 0.15)$. Interpretation in context: We are 99% confident that for each 1°C increase in water temperature at 100 meters depth, the mean number of coral colonies per square meter changes by between -3.75 and 0.15 colonies. Because 0 is in the interval, we do not have statistically significant evidence at the $\alpha = 0.01$ level of a linear relationship between water temperature and coral colony count in this study.
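The sketch below is a quick check of the Question 3 arithmetic. It uses $b = -1.8$ and $SE_b = 0.70$, which are the values implied by the interval endpoints -3.75 and 0.15 when $t^* = 2.787$. It also checks whether 0 falls inside the interval, which is what drives the "no significant evidence" conclusion. The critical value is read from a t-table, not computed.

```python
b, se_b, n = -1.8, 0.70, 27     # Question 3 sample slope, standard error, sample size
df = n - 2                      # 25 degrees of freedom
t_star = 2.787                  # 99% confidence, df = 25, from a t-table

me = t_star * se_b
lower, upper = b - me, b + me
contains_zero = lower <= 0 <= upper   # True means 0 is a plausible value of beta

print(f"df = {df}, ME = {me:.2f}, interval = ({lower:.2f}, {upper:.2f})")
print("0 in interval:", contains_zero)
# df = 25, ME = 1.95, interval = (-3.75, 0.15)
# 0 in interval: True
```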
7. Quick Reference Cheatsheet
| Category | Formula | Notes |
|---|---|---|
| Inference Conditions | L.I.N.E. | Linear (no curvature), Independent (random sample, 10% condition), Normal (residuals normal), Equal Variance (constant residual spread) |
| Degrees of Freedom | $df = n - 2$ | Subtract 2 for the two estimated parameters (intercept and slope) |
| Confidence Interval for Slope | $b \pm t^* \cdot SE_b$ | $b$ = sample slope, $t^*$ = critical t-value, $SE_b$ = standard error of the slope (usually given in output) |
| Hand-Calculated $SE_b$ | $SE_b = \dfrac{s}{s_x\sqrt{n-1}}$ | $s$ = standard deviation of residuals, $s_x$ = standard deviation of $x$ |
| Interpretation Template | "We are $C$% confident that for each 1-unit increase in [x, units], the mean [y, units] changes by between [lower bound] and [upper bound] [y units]." | Always reference the true population slope, never the sample slope |
| Significance Check | 0 not in interval → significant linear relationship at $\alpha = 1 - C$ | 0 in interval → 0 is plausible, no significant evidence of a linear relationship |
| Notation | $b$ = sample slope, $\beta$ = true population slope | Never confuse sample estimates with population parameters |
8. What's Next
This chapter gives you the foundation for all inference for regression slopes, the core of Unit 9 in the AP Statistics CED. Immediately after mastering confidence intervals for the slope, you will move on to hypothesis tests for the slope of a regression model, which use the same conditions, standard error, and degrees of freedom you learned here. Without understanding how to construct and interpret a confidence interval for the slope, you will not be able to connect interval estimation to significance testing for regression, a common multi-point task on AP FRQs. This topic also extends your earlier understanding of confidence intervals from univariate to bivariate quantitative data, building intuition for inference that applies across all statistical contexts.