College Board · AP Statistics · Inference for Quantitative Data: Slopes · Updated 2026-05-07

Inference for Quantitative Data: Slopes (AP Statistics Study Guide)

For: students preparing for the AP Statistics exam.

Covers: the sampling distribution of the sample slope $b$, confidence intervals for the true population slope $β$, hypothesis tests for slope significance, and the four LINE conditions required for valid regression inference.

You should already know: Algebra 2, basic probability intuition.

A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board exam questions and may differ in wording, numerical values, or context. Use them to practice the technique, and cross-check with the official College Board scoring guidelines for grading conventions.

1. What Is Inference for Quantitative Data: Slopes?

Inference for regression slopes is the set of statistical methods used to draw conclusions about the true linear relationship between two quantitative variables in a population, using data from a random sample. The core idea is that the slope of the sample least squares regression line ($b$, our point estimate) can be used to make claims about the unknown slope of the population regression line ($β$).

This topic is Unit 9 of the AP Statistics Course and Exam Description (CED), which assigns it an exam weighting of 2–5%. It can appear in both the multiple-choice and free-response sections, including as part of the investigative task (FRQ 6). Other names for this topic include regression slope inference and inference for bivariate least squares regression.

2. Sampling distribution of slope $b$

When you collect repeated random samples of the same size from a population, fit a least squares regression line to each sample, and record the slope of each line, the distribution of these sample slopes is called the sampling distribution of $b$. If the inference conditions are met, this distribution has three key properties:

Unbiased center: The mean of the sampling distribution is $μ_{b} = β$, meaning the sample slope is an unbiased estimator of the true population slope.
Standard deviation: The true standard deviation of the sampling distribution is $σ_{b} = \frac{σ}{σ_{x}\sqrt{n}}$, where $σ$ is the population residual standard deviation, $σ_{x}$ is the population standard deviation of the explanatory variable $x$ (computed with divisor $n$), and $n$ is the sample size.
Shape: The sampling distribution is approximately normally distributed, provided the normality condition for inference is satisfied.
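A quick way to see these three properties is to simulate them. The Python sketch below (standard library only) assumes a hypothetical population model in which $y = 1 + 2x$ plus normal error with mean 0 and standard deviation 3, with the $x$-values fixed at 1 through 20; these numbers are illustrative choices, not data from this guide. It draws 5,000 samples, records each least squares slope, and compares the simulated mean and standard deviation of $b$ with $β$ and $σ/(σ_{x}\sqrt{n})$.

```python
import math
import random
import statistics

def least_squares_slope(x, y):
    """Least squares slope b = Sxy / Sxx."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def simulate_slopes(x, alpha, beta, sigma, reps, seed=1):
    """Slopes from `reps` samples of y = alpha + beta*x + normal error (SD sigma)."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    slopes = []
    for _ in range(reps):
        y = [alpha + beta * xi + rng.gauss(0, sigma) for xi in x]
        slopes.append(least_squares_slope(x, y))
    return slopes

# Hypothetical population: x fixed at 1..20, true slope beta = 2, residual SD sigma = 3.
x = list(range(1, 21))
beta, sigma = 2.0, 3.0
slopes = simulate_slopes(x, alpha=1.0, beta=beta, sigma=sigma, reps=5000)

sigma_x = statistics.pstdev(x)                   # population SD of x (divisor n)
sigma_b = sigma / (sigma_x * math.sqrt(len(x)))  # theoretical SD of b

print(f"mean of simulated b: {statistics.mean(slopes):.3f}  (beta = {beta})")
print(f"SD of simulated b:   {statistics.stdev(slopes):.4f}  (sigma_b = {sigma_b:.4f})")
```

With these settings the simulated mean should land very close to 2 and the simulated standard deviation close to $σ_{b} \approx 0.116$.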

In practice, we almost never know the population parameters $σ$ and $σ_{x}$, so we use sample statistics to calculate the standard error of the slope, which estimates the spread of the sampling distribution: $SE_{b} = \frac{s}{s_{x}\sqrt{n - 1}}$, where $s = \sqrt{\frac{\sum (y_{i} - \hat{y}_{i})^{2}}{n - 2}}$ is the sample residual standard error and $s_{x}$ is the sample standard deviation of the explanatory variable $x$. The degrees of freedom for all slope inference are $df = n - 2$, because fitting a regression line requires estimating two parameters (the intercept and the slope), which costs two degrees of freedom.
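To connect the formula to raw data, here is a minimal Python sketch (standard library only) that fits the least squares line, computes $s$ with divisor $n - 2$ and $s_{x}$ with divisor $n - 1$, and returns $SE_{b}$. The function name and the five data points are hypothetical, chosen so the arithmetic is easy to check by hand.

```python
import math
import statistics

def slope_and_se(x, y):
    """Fit the least squares line and return (b, SE_b), using df = n - 2 for s."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    # Residual standard error s: sum of squared residuals divided by n - 2.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    s_x = statistics.stdev(x)  # sample SD of x (divisor n - 1)
    se_b = s / (s_x * math.sqrt(n - 1))
    return b, se_b

# Hypothetical toy data, for illustration only.
b, se_b = slope_and_se([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b, 3), round(se_b, 4))  # 0.6 0.2828
```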

Worked Example

A sample of 18 students is used to fit a regression line linking the number of practice tests taken ($x$) to AP Statistics exam score ($y$). You are given $s = 4.2$ and $s_{x} = 1.5$. Calculate the standard error of the slope.

First, calculate $\sqrt{n - 1} = \sqrt{17} \approx 4.123$
Denominator: $s_{x}\sqrt{n - 1} = 1.5 \times 4.123 \approx 6.185$
$SE_{b} = \frac{4.2}{6.185} \approx 0.679$
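The same arithmetic from summary statistics, as a short Python check (the function name is our own, for illustration):

```python
import math

def se_slope_from_summary(s, s_x, n):
    """Standard error of the slope: SE_b = s / (s_x * sqrt(n - 1))."""
    return s / (s_x * math.sqrt(n - 1))

se_b = se_slope_from_summary(s=4.2, s_x=1.5, n=18)
print(round(se_b, 3))  # 0.679
```

Carrying full precision instead of rounding $\sqrt{17}$ to 4.123 gives the same answer to three decimal places.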

3. Confidence interval for true slope $β$

A confidence interval for the true population slope $β$ gives a range of plausible values for the average change in the response variable $y$ for each 1-unit increase in the explanatory variable $x$. We use a $t$-distribution for this interval because we use the standard error $SE_{b}$ in place of the true standard deviation $σ_{b}$.

The formula for a $C\%$ confidence interval for $β$ is $b \pm t^{*} \times SE_{b}$, where $t^{*}$ is the critical $t$-value for your chosen confidence level with $df = n - 2$.
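In class you would read $t^{*}$ from a table or a calculator (for example, the invT function on TI-84 calculators, or `scipy.stats.t.ppf` in Python). As a self-contained alternative, here is a standard-library Python sketch that approximates $t^{*}$ by numerically integrating the $t$ density with Simpson's rule and solving for the cutoff by bisection. The numerical method and the function names are our own illustrative choices, not a standard routine.

```python
import math

def t_pdf(u, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma avoids overflow in the normalizing constant for large df.
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + u * u / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry about 0."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def t_critical(confidence, df):
    """Critical value t* with central area `confidence`, found by bisection."""
    target = 1 - (1 - confidence) / 2  # e.g. 0.975 for 95% confidence
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"{t_critical(0.95, 16):.3f}")  # 2.120
print(f"{t_critical(0.99, 23):.3f}")  # 2.807
```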

Key Interpretation Rule

When interpreting the interval, always use context: "We are [C]% confident that the true average change in [response variable] for each 1-unit increase in [explanatory variable] is between [lower bound] and [upper bound] [units of the response variable]." Responses that omit the context of the variables, or that interpret the interval as describing individual observations rather than the average trend in the population, are unlikely to earn full credit.

Worked Example

The sample regression line for the 18 students from the earlier example has slope $b = 2.7$ and $SE_{b} = 0.679$. Calculate and interpret a 95% confidence interval for the true slope.

Calculate degrees of freedom: $df = 18 - 2 = 16$
Find $t^{*}$ for 95% confidence and $df = 16$: 2.120 (from a $t$-table or calculator)
Calculate the margin of error: $2.120 \times 0.679 \approx 1.44$
Final interval: $2.7 \pm 1.44 = (1.26, 4.14)$
Interpretation: We are 95% confident that each additional practice test taken is associated with an average increase in AP Statistics exam score between 1.26 and 4.14 points.
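The interval arithmetic itself is short. This Python sketch (the helper name is hypothetical) takes $t^{*}$ as an input, read from a table or calculator as in the steps above:

```python
def slope_confidence_interval(b, se_b, t_star):
    """Return (lower, upper) for b ± t* × SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin

# Worked example: b = 2.7, SE_b = 0.679, t* = 2.120 (95% confidence, df = 16).
lower, upper = slope_confidence_interval(2.7, 0.679, 2.120)
print(f"({lower:.2f}, {upper:.2f})")  # (1.26, 4.14)
```

Plugging in Question 1's values, `slope_confidence_interval(0.43, 0.09, 2.807)` returns (0.177, 0.683) to three decimals.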

4. Hypothesis test for slope

Hypothesis tests for slope are most often used to evaluate whether there is a statistically significant linear relationship between the two variables. The most common test uses the following hypotheses:

Null hypothesis $H_{0}: β = 0$: There is no linear relationship between $x$ and $y$ in the population.
Alternative hypothesis $H_{a}: β \neq 0$ (two-tailed, the default): There is a linear relationship between $x$ and $y$ in the population. Use a one-tailed alternative ($H_{a}: β > 0$ or $H_{a}: β < 0$) only if the problem explicitly states a direction for the expected relationship.

The test statistic is a $t$-score, calculated as $t = \frac{b - β_{0}}{SE_{b}}$, where $β_{0}$ is the hypothesized value of the slope (almost always 0). The $p$-value is the probability, assuming the null hypothesis is true, of obtaining a $t$-statistic at least as extreme as the observed value, computed from a $t$-distribution with $df = n - 2$.

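If the $p$-value is less than your significance level $α$ (usually 0.05), you reject the null hypothesis and conclude there is convincing evidence of a linear relationship between the variables. Otherwise, you fail to reject $H_{0}$: the data do not provide convincing evidence of a linear relationship.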

Worked Example

Use the sample data from the previous example ($n = 18$, $b = 2.7$, $SE_{b} = 0.679$) to test whether there is a positive linear relationship between number of practice tests and exam score at the $α = 0.01$ significance level.

State hypotheses: $H_{0}: β = 0$, $H_{a}: β > 0$
Calculate the test statistic: $t = \frac{2.7 - 0}{0.679} \approx 3.98$
With $df = 16$, the one-tailed $p$-value for $t = 3.98$ is approximately 0.0005, which is less than 0.01.
Conclusion: Since $p < 0.01$, we reject the null hypothesis. There is convincing evidence at the 0.01 significance level that more practice tests are associated with higher AP Statistics exam scores.
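To check the $p$-value without a table, here is a standard-library Python sketch that computes the slope $t$-statistic and a $p$-value for a chosen alternative. The upper-tail probability is approximated by integrating the $t$ density with Simpson's rule, a numerical method we chose only to keep the example self-contained; a calculator's tcdf or `scipy.stats.t.sf` would normally be used instead. The function names are hypothetical.

```python
import math

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for Student's t with df degrees of freedom (Simpson's rule)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3  # area under the density between 0 and |t|
    return 0.5 - area if t >= 0 else 0.5 + area

def slope_t_test(b, se_b, n, alternative="two-sided", beta0=0.0):
    """Return (t, df, p) for H0: beta = beta0; alternative is 'greater', 'less' or 'two-sided'."""
    t = (b - beta0) / se_b
    df = n - 2
    upper = t_upper_tail(t, df)  # P(T >= t)
    if alternative == "greater":
        p = upper
    elif alternative == "less":
        p = 1 - upper
    else:
        p = 2 * min(upper, 1 - upper)
    return t, df, p

# Worked example: b = 2.7, SE_b = 0.679, n = 18, Ha: beta > 0, alpha = 0.01.
t, df, p = slope_t_test(2.7, 0.679, 18, alternative="greater")
print(f"t = {t:.2f}, df = {df}, p = {p:.5f}, reject H0: {p < 0.01}")
# t = 3.98 and df = 16; p lands between 0.0005 and 0.001
```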

5. Conditions — linearity, independence, normality, equal variance

Slope inference is valid only if the four LINE conditions are satisfied. To earn full credit on free-response questions, check each condition explicitly and cite supporting evidence (usually from plots):

Linearity: The true relationship between $x$ and $y$ in the population is linear. How to check: Look at a plot of residuals vs. $x$ (or residuals vs. predicted values $\hat{y}$). The plot should show random scatter around the 0 line, with no clear curved pattern.
Independence: Individual observations are independent of each other. How to check: Confirm data was collected via random sampling or random assignment. If sampling without replacement, verify the population is at least 10 times larger than the sample (10% condition). For time-series data, check that residuals show no pattern over time.
Normality: Residuals are normally distributed around a mean of 0. How to check: A normal probability plot of residuals should be roughly linear, or a histogram of residuals should be roughly symmetric, unimodal, and free of extreme outliers.
Equal variance (homoscedasticity): The spread of residuals is constant across all values of $x$. How to check: The residual plot should have consistent vertical spread across all values of $x$, with no fanning (wider spread as $x$ increases) or funnel (narrower spread as $x$ increases) patterns.
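These checks are visual, but the numbers behind a residual plot are easy to compute. A minimal Python sketch, assuming a small hypothetical data set: it returns the residuals you would plot against $x$ and encodes the 10% condition as a simple inequality. The function names are our own, and the code does not replace actually looking at the plots.

```python
import statistics

def residuals(x, y):
    """Residuals y_i - y_hat_i from the least squares line."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def ten_percent_condition(n, population_size):
    """True if the sample is at most 10% of the population (sampling without replacement)."""
    return 10 * n <= population_size

x = [1, 2, 3, 4, 5]  # hypothetical data, for illustration only
y = [2, 4, 5, 4, 5]
for xi, ri in zip(x, residuals(x, y)):
    print(xi, round(ri, 2))  # the (x, residual) pairs a residual plot would show
print(ten_percent_condition(n=12, population_size=150))  # True: 120 <= 150, as in Question 3
```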

6. Common Pitfalls (and how to avoid them)

Wrong move: Using a $z$-distribution instead of a $t$-distribution for slope intervals or tests. Why students do it: They confuse slope inference with proportion inference, which uses $z$-scores. Correct move: Always use a $t$-distribution with $df = n - 2$ for all slope inference, since we estimate the standard error of the slope from sample data.
Wrong move: Interpreting a slope confidence interval as a prediction for an individual observation. Why students do it: They mix up slope intervals and prediction intervals. Correct move: Slope intervals describe the average change in $y$ per unit $x$ for the population; prediction intervals estimate a single $y$ value for one specific $x$ value.
Wrong move: Listing LINE conditions without justifying them with plot evidence. Why students do it: They forget the exam requires you to show you know how to verify conditions, not just memorize them. Correct move: For every condition, state what plot you use and what pattern you look for (e.g. "Linearity is satisfied because the residual plot shows no curved trend, only random scatter around 0").
Wrong move: Claiming causation from a significant slope for observational data. Why students do it: They confuse correlation and causation. Correct move: Only state causation if the data comes from a randomized controlled experiment. For observational data, only state that there is a statistically significant association between the variables.
Wrong move: Using $df = n - 1$ instead of $df = n - 2$ for slope inference. Why students do it: They carry over the degrees of freedom rule from one-sample mean inference. Correct move: Slope inference uses $df = n - 2$ because you estimate two parameters (intercept and slope) when fitting the regression line, losing two degrees of freedom.

7. Practice Questions (AP Statistics Style)

Question 1

A biologist studies the relationship between tree age in years ($x$) and trunk diameter in inches ($y$) for 25 randomly selected oak trees. The least squares regression line is $\hat{y} = 1.2 + 0.43x$, with $SE_{b} = 0.09$. Calculate a 99% confidence interval for the true slope, and interpret it in context.

Solution

$df = 25 - 2 = 23$. The critical $t^{*}$ value for 99% confidence and $df = 23$ is 2.807.
Margin of error: $2.807 \times 0.09 \approx 0.253$
Interval: $0.43 \pm 0.253 = (0.177, 0.683)$
Interpretation: We are 99% confident that each additional year of age is associated with an average increase in oak tree trunk diameter between 0.177 and 0.683 inches.

Question 2

A teacher tests whether there is a linear relationship between number of absences ($x$) and final course grade ($y$) for 31 students. The sample slope is $b = -1.8$ with $SE_{b} = 0.62$. Conduct a two-tailed hypothesis test at the $α = 0.05$ significance level, and state your conclusion.

Solution

Hypotheses: $H_{0}: β = 0$, $H_{a}: β \neq 0$
Test statistic: $t = \frac{-1.8 - 0}{0.62} \approx -2.90$
$df = 31 - 2 = 29$. The two-tailed $p$-value for $t = -2.90$ is approximately 0.007, which is less than 0.05.
Conclusion: Reject the null hypothesis. There is convincing evidence at the 0.05 significance level that there is a linear relationship between number of absences and final course grade.
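The same decision can be reached with the critical-value approach, which is equivalent to comparing $p$ with $α$. This Python sketch assumes the two-tailed 5% critical value for $df = 29$, $t^{*} = 2.045$, read from a standard $t$-table; that value is our addition, not part of the question.

```python
# Question 2 via the critical-value approach (equivalent to comparing p with alpha).
b, se_b, n = -1.8, 0.62, 31
t = (b - 0) / se_b  # test statistic, with df = n - 2 = 29
t_star = 2.045      # two-tailed 5% critical value for df = 29, from a t-table (assumed)
reject = abs(t) > t_star
print(f"t = {t:.2f}, reject H0: {reject}")  # t = -2.90, reject H0: True
```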

Question 3

A student fits a regression line for 12 data points measuring time spent scrolling social media per day vs. self-reported happiness score, and produces the following diagnostic plots:

Residual vs. $x$ plot has a clear downward curved pattern
Normal probability plot of residuals is roughly linear
Residual vs. $x$ plot has consistent vertical spread across all $x$ values
Data was collected via random sampling from a population of 150 students

Which inference condition is violated? Justify your answer, and explain the impact of this violation on your results.

Solution

The linearity condition is violated. The residual plot shows a clear curved pattern, which means the true relationship between time spent scrolling social media and happiness score is not linear, so a linear regression model is not appropriate for this data. Any confidence intervals or hypothesis tests calculated from this model will be unreliable, as they assume a linear population relationship.

8. Quick Reference Cheatsheet

Core Formulas

Quantity	Formula	Notes
Standard Error of Slope	$SE_{b} = \frac{s}{s_{x}\sqrt{n - 1}}$	$s$ = residual standard error, $s_{x}$ = sample std dev of $x$
Confidence Interval for $β$	$b \pm t^{*} \times SE_{b}$	$df = n - 2$, $t^{*}$ from $t$-distribution
Slope Hypothesis Test $t$-statistic	$t = \frac{b - β_{0}}{SE_{b}}$	$β_{0}$ is almost always 0, $df = n - 2$

LINE Conditions Check List

Condition	Verification Step
Linearity	Residual plot has no curved pattern, random scatter around 0
Independence	Random sample/assignment, 10% condition met, no time-series autocorrelation
Normality	Normal probability plot of residuals is linear, no extreme outliers
Equal Variance	Residual plot has constant vertical spread, no fanning/funnel pattern

Exam Reminders

Never use $z$-scores for slope inference
Interpret all intervals and conclusions in context
Do not claim causation for observational data
You must justify all conditions with plot evidence to earn full marks

9. What's Next

Slope inference builds directly on the bivariate regression and correlation skills from Unit 2 of the AP Statistics course, and exam questions on it can be combined with exploratory data analysis. Mastering this topic also lays the foundation for more advanced statistical methods you may encounter in college, including multiple linear regression, logistic regression, and causal inference modeling. When slope inference appears in a free-response question, be ready to carry out the full workflow, from checking conditions to interpreting your final conclusion in context.
