AP · Introducing Inference for Slope · 14 min read · Updated 2026-05-10

Introducing Inference for Slope — AP Statistics Study Guide

For: students preparing for the AP Statistics exam.

Covers: Sampling distribution of the sample slope, t-test for a population slope, confidence intervals for a population slope, conditions for inference for regression slope, and context interpretation of slope inference results.

You should already know: Least-squares regression line calculation and interpretation, sampling distribution basics for other inference procedures, how to check conditions for inference.

A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official mark schemes for grading conventions.


1. What Is Inference for Slope?

In linear regression, we almost always calculate a least-squares regression line (LSRL) from a random sample of pairs, not from the entire population of interest. The sample slope $b$ is a statistic that estimates the unknown population slope $\beta$ (Greek beta), the true parameter of interest. Inference for slope gives us formal tools for using $b$ to draw conclusions about the true linear relationship between two quantitative variables. In the AP Statistics Course and Exam Description (CED), this material forms Unit 9 (Inference for Quantitative Data: Slopes), which carries an exam weighting of 2-5%. It can appear in both the multiple-choice (MCQ) and free-response (FRQ) sections. Expect a small number of MCQ on the topic. When it appears in an FRQ, the parts typically test conditions, calculations, and interpretation. This topic differs from inference for means or proportions because it answers questions about the direction and rate of change of a linear association between two variables. Inference for means or proportions, by contrast, concerns a single parameter of a categorical or one-variable quantitative distribution.

2. Conditions for Inference for Slope

Any inference procedure requires specific conditions to be met for the sampling distribution of the statistic to match the theoretical distribution we use for p-values and critical values. For inference for slope, the four conditions are commonly remembered by the acronym LINE:

  1. Linear: The true relationship between $x$ (the explanatory variable) and $y$ (the response variable) is linear, meaning the population regression line is straight, not curved.
  2. Independent: Individual observations are independent of one another. For sampling without replacement, this requires the 10% condition: the sample size $n$ is less than 10% of the total population size.
  3. Normal: For every value of $x$, the response $y$ is normally distributed around the population regression line. In practice, this means the residuals are approximately normally distributed.
  4. Equal Variance (homoscedasticity): The standard deviation of $y$ (and of the residuals) around the regression line is the same for all values of $x$. The spread of points does not widen or narrow as $x$ increases.

To check each condition:
- Linear: confirm with a scatterplot of the raw data and a residual plot showing no curved pattern.
- Independent: confirm through random sampling or random assignment in the study design, plus the 10% condition when sampling without replacement.
- Normal: check with a normal probability plot or histogram of the residuals, looking for no strong skewness or outliers.
- Equal Variance: check with the residual plot, confirming that the vertical spread of the residuals is consistent across all $x$ values.
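The residual plot used for the Linear and Equal Variance checks is built from the residuals $y - \hat{y}$. Below is a minimal Python sketch of how those residuals, and the 10% condition, could be computed. The data values and function names are hypothetical. Actually drawing the residual plot or a normal probability plot would need a plotting library, which is left out here.

```python
from statistics import mean


def lsrl(xs, ys):
    """Return (a, b), the intercept and slope of the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def residuals(xs, ys):
    """Residuals y - y-hat; plot these against x to look for curvature or changing spread."""
    a, b = lsrl(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def ten_percent_condition(n, population_size):
    """True when the sample is less than 10% of the population (sampling without replacement)."""
    return n < 0.10 * population_size


# Hypothetical rainfall (cm) and growth (cm/month) values, for illustration only.
rain = [4.0, 6.5, 8.0, 9.5, 12.0, 14.5]
growth = [1.2, 1.9, 2.1, 2.8, 3.3, 4.1]
for x, e in zip(rain, residuals(rain, growth)):
    print(f"x = {x:5.1f}  residual = {e:+.3f}")

print(ten_percent_condition(38, 6000))  # botanist example: 38 < 600, prints True
```

Least-squares residuals always sum to zero, so what the plot reveals is their pattern against $x$, not their total.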

Worked Example

A botanist wants to study the relationship between average monthly rainfall (x, in cm) and the growth of a species of wildflower (y, in cm per month). She collects a random sample of 38 flower patches from a region with 6000 total patches. She produces the following plots: (1) A scatterplot of growth vs rainfall shows a roughly straight positive trend, (2) A residual plot shows residuals scattered evenly around 0, no curved pattern, and consistent vertical spread across all rainfall values, (3) A normal probability plot of residuals is very close to a straight 45-degree line. State whether each condition for inference for slope is met, with justification.

  1. Linear Condition: Met. Justification: The scatterplot of raw data shows a straight trend, and the residual plot has no curved pattern, so the true relationship is approximately linear.
  2. Independent Condition: Met. Justification: The sample is random, and the sample size 38 is less than 10% of 6000 (600), so the 10% condition is satisfied and observations are independent.
  3. Normal Condition: Met. Justification: The straight normal probability plot confirms residuals are approximately normally distributed.
  4. Equal Variance Condition: Met. Justification: The residual plot shows consistent spread of residuals across all rainfall values, so homoscedasticity holds.

Exam tip: On FRQ, you must explicitly link your check of each condition to the study design or plot provided; you will lose points if you just list conditions without justifying each one in context.

3. Sampling Distribution of the Sample Slope

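Suppose we take repeated random samples of the same size from the population of pairs and calculate the LSRL slope $b$ for each sample. The distribution of these $b$ values is the sampling distribution of the sample slope. If all four LINE conditions are met, this sampling distribution is approximately normal and centered at the true population slope $\beta$. Its standard deviation is $\sigma_b = \frac{\sigma}{\sigma_x\sqrt{n}}$, where $\sigma$ is the standard deviation of $y$ around the population regression line and $\sigma_x$ is the standard deviation of $x$. Because $\sigma$ is unknown, we estimate $\sigma_b$ with the standard error of the slope:

$SE_b = \frac{s}{s_x\sqrt{n-1}}$

In this formula:
- $s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}}$ is the standard error of the estimate, the typical vertical distance of points from the LSRL.
- $s_x$ is the sample standard deviation of the explanatory variable $x$.
- $n$ is the number of pairs.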
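A small simulation makes the repeated-sampling idea concrete. The sketch below is an illustration under stated assumptions, not a standard routine. It invents a population that satisfies LINE exactly: a true line $y = 5 + 2x$, normal errors with $\sigma = 3$, and a fixed set of 25 $x$ values. It then draws many samples of $y$ values and records each sample slope $b$. The mean of the simulated slopes should land close to $\beta = 2$. Their standard deviation should land close to $\sigma_b = \frac{\sigma}{\sigma_x\sqrt{n}}$, where $\sigma_x$ is the standard deviation of the fixed $x$ values computed with divisor $n$. All parameter values are hypothetical.

```python
import math
import random
from statistics import mean, pstdev, stdev


def sample_slope(xs, ys):
    """Least-squares slope b for one sample of (x, y) pairs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def simulate_slopes(alpha=5.0, beta=2.0, sigma=3.0, n=25, reps=4000, seed=1):
    """Simulate the sampling distribution of b for a hypothetical LINE population."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]  # x values held fixed across samples
    slopes = []
    for _ in range(reps):
        # Fresh normal errors with the same SD sigma at every x (Normal + Equal Variance).
        ys = [alpha + beta * x + rng.gauss(0, sigma) for x in xs]
        slopes.append(sample_slope(xs, ys))
    return xs, slopes


xs, slopes = simulate_slopes()
sigma_b = 3.0 / (pstdev(xs) * math.sqrt(len(xs)))  # theoretical SD of b
print(f"mean of simulated b: {mean(slopes):.3f} (beta = 2)")
print(f"SD of simulated b:   {stdev(slopes):.3f} (sigma_b = {sigma_b:.3f})")
```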

We do not know the true standard deviation $\sigma$ and must estimate it with $s$. For that reason, inference uses the t-distribution with $df = n - 2$ degrees of freedom instead of the z-distribution. We lose 2 degrees of freedom because we estimate two parameters (the intercept and the slope) from the sample.

Intuition for the formula: If the spread of points around the line ($s$) is larger, $SE_b$ is larger, so the slope estimate is less precise. If the spread of the $x$ values ($s_x$) or the sample size $n$ is larger, $SE_b$ is smaller, so the slope estimate is more precise. This makes sense: $x$ values that are more numerous and more spread out pin down the slope better than $x$ values that are clustered together.
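You can check the effect of each part of the formula $SE_b = \frac{s}{s_x\sqrt{n-1}}$ numerically. In this sketch the baseline values ($s = 10$, $s_x = 5$, $n = 26$) are made up so the arithmetic is easy to follow:
- doubling $s$ doubles $SE_b$;
- doubling $s_x$ halves $SE_b$;
- quadrupling $n - 1$ halves $SE_b$.

```python
import math


def se_slope(s, s_x, n):
    """Standard error of the sample slope: SE_b = s / (s_x * sqrt(n - 1))."""
    return s / (s_x * math.sqrt(n - 1))


base = se_slope(s=10.0, s_x=5.0, n=26)          # 10 / (5 * 5) = 0.4
more_scatter = se_slope(s=20.0, s_x=5.0, n=26)  # larger s   -> larger SE_b (0.8)
wider_x = se_slope(s=10.0, s_x=10.0, n=26)      # larger s_x -> smaller SE_b (0.2)
bigger_n = se_slope(s=10.0, s_x=5.0, n=101)     # larger n   -> smaller SE_b (0.2)

print(base, more_scatter, wider_x, bigger_n)
print("df =", 26 - 2)  # degrees of freedom for slope inference: n - 2
```

Because $n$ enters through a square root, cutting $SE_b$ in half takes roughly four times as many observations, not twice as many.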

Worked Example

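A sociologist studies the relationship between neighborhood median income (x, in thousands of dollars) and average home price per square foot (y, in dollars). He collects a sample of $n$ neighborhoods. From the regression he calculates the standard error of the estimate $s$ (in dollars per square foot) and the standard deviation of median income $s_x$ (in thousand dollars). Calculate $SE_b$ and the correct degrees of freedom for inference.

  1. Recall the formula $SE_b = \frac{s}{s_x\sqrt{n-1}}$ and substitute the sample values of $s$, $s_x$, and $n$.
  2. Calculate the denominator: $s_x\sqrt{n-1}$.
  3. Divide $s$ by the denominator to get $SE_b$.
  4. Calculate degrees of freedom: $df = n - 2$.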
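On the exam, $s$ and $s_x$ are usually given or read from output, but it helps to see where they come from. The Python sketch below computes $b$, $s$, $s_x$, $SE_b$, and $df$ from raw paired data. The five data pairs are hypothetical and are not the sociologist's data. Note the two different divisors:
- $s$ divides the sum of squared residuals by $n - 2$;
- $s_x$ is the ordinary sample standard deviation, with divisor $n - 1$.

The resulting $SE_b$ also equals $s / \sqrt{\sum (x_i - \bar{x})^2}$.

```python
import math
from statistics import stdev


def slope_summary(xs, ys):
    """Compute the pieces needed for slope inference from raw (x, y) pairs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))       # standard error of the estimate (divisor n - 2)
    s_x = stdev(xs)                    # sample SD of x (divisor n - 1)
    se_b = s / (s_x * math.sqrt(n - 1))
    return {"b": b, "s": s, "s_x": s_x, "se_b": se_b, "df": n - 2}


# Hypothetical data, chosen only to keep the arithmetic small.
summary = slope_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(summary)  # expect b ≈ 0.6, s ≈ 0.894, SE_b ≈ 0.283, df = 3
```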

Exam tip: If $SE_b$ is given directly in computer output on the exam, you do not need to calculate it from scratch. It is the SE Coef entry in the row for the explanatory variable. The formula is most often tested on MCQ asking how changing $s$, $s_x$, or $n$ affects $SE_b$.

4. Confidence Intervals for the Population Slope

A confidence interval for the population slope gives a range of plausible values for the unknown true population slope $\beta$. The general formula is

$b \pm t^* \cdot SE_b$

where:
- $b$ is the sample slope from the LSRL;
- $t^*$ is the critical t-value for the desired confidence level with $df = n - 2$;
- $SE_b$ is the standard error of the slope.

To interpret a confidence interval for slope in context, use this template: "We are [C]% confident that the true [mean change in y per 1-unit increase in x] is between [lower bound] and [upper bound] [units]."

A key check follows from the interval. If it does not contain 0, we have statistically significant evidence of a non-zero linear relationship between $x$ and $y$ at significance level $\alpha = 1 - C$ for a two-sided test. For example, a 95% interval corresponds to $\alpha = 0.05$. If the interval contains 0, we do not have significant evidence.
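The interval recipe $b \pm t^* \cdot SE_b$ and the contains-zero check fit in a few lines. In this sketch $t^*$ is passed in from a t-table rather than computed, which matches how the exam supplies it. The example numbers are hypothetical: $b = 1.5$, $SE_b = 0.3$, and $n = 30$, so $df = 28$ and $t^* = 2.048$ for 95% confidence.

```python
def slope_confidence_interval(b, se_b, t_star):
    """Return (lower, upper) for b +/- t* * SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin


def contains_zero(interval):
    """True if 0 is a plausible value for the population slope."""
    lower, upper = interval
    return lower <= 0 <= upper


ci = slope_confidence_interval(b=1.5, se_b=0.3, t_star=2.048)
print(ci)                 # roughly (0.886, 2.114)
print(contains_zero(ci))  # False: significant at alpha = 0.05 (two-sided)
```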

Worked Example

Return to the neighborhood income and home price example above. The sample slope $b$ is measured in dollars per square foot per thousand dollars of median income. Calculate and interpret a 95% confidence interval for the true population slope. Use the critical value $t^*$ for 95% confidence with $df = n - 2$ and the $SE_b$ found in the previous example.

  1. Identify all values: $b$, $t^*$, and $SE_b$.
  2. Calculate the margin of error: $ME = t^* \cdot SE_b$.
  3. Calculate the interval: $b \pm ME$, that is, from $b - ME$ to $b + ME$.
  4. Interpret: We are 95% confident that the true mean increase in home price per square foot is between $b - ME$ and $b + ME = 2.12$ dollars for each 1,000-dollar increase in neighborhood median income.

Exam tip: Always check whether 0 is inside or outside your interval. This tells you whether the slope is statistically significant, which many exam questions ask about implicitly.

5. Hypothesis Test for the Population Slope

The most common hypothesis test for slope asks whether there is any linear relationship between $x$ and $y$ in the population.
- The default null hypothesis is $H_0: \beta = 0$, which means no linear relationship.
- The alternative is usually two-sided, $H_a: \beta \neq 0$, meaning a non-zero linear relationship.
- The alternative can instead be one-sided ($H_a: \beta > 0$ or $H_a: \beta < 0$) if a direction is specified before data collection.

The test statistic for the t-test for slope is

$t = \frac{b - \beta_0}{SE_b}$

The hypothesized slope $\beta_0$ is almost always 0, so the formula simplifies to $t = \frac{b}{SE_b}$. We compare this test statistic to the t-distribution with $df = n - 2$ to get a p-value.
- If the p-value is less than $\alpha$, we reject $H_0$ and conclude there is convincing evidence of a non-zero linear relationship.
- If the p-value is greater than or equal to $\alpha$, we fail to reject $H_0$ and do not have convincing evidence.
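You can compute the test statistic and a two-sided p-value without a statistics library by integrating the t density numerically. The sketch below uses Simpson's rule. This is a swapped-in numerical method for illustration and is not necessarily how calculators or software do it. In practice you would read the p-value from a graphing calculator or statistical software. The example inputs come from the Temperature row of the regression output in Question 2: $b = 3.12$, $SE_b = 0.95$, $n = 20$.

```python
import math


def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0: 0.5 minus the area from 0 to t (Simpson's rule, steps even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def slope_t_test(b, se_b, n, beta_0=0.0):
    """t statistic, df, and two-sided p-value for H0: beta = beta_0."""
    t = (b - beta_0) / se_b
    df = n - 2
    p_two_sided = 2 * t_upper_tail(abs(t), df)
    return t, df, p_two_sided


t, df, p = slope_t_test(b=3.12, se_b=0.95, n=20)
print(f"t = {t:.2f}, df = {df}, two-sided p = {p:.4f}")
```

The result should agree with the P column of the output to the precision shown there.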

Worked Example

For the neighborhood income and home price example, we have the sample slope $b$, its standard error $SE_b$, the sample size $n$, and the significance level $\alpha = 0.05$. Test whether there is a positive linear relationship between neighborhood median income and home price per square foot.

  1. State hypotheses: $H_0: \beta = 0$ and $H_a: \beta > 0$, where $\beta$ is the true population slope of home price per square foot on neighborhood median income (in thousands of dollars).
  2. Confirm conditions: Assume all LINE conditions have been checked and are satisfied, so we proceed with the t-test.
  3. Calculate the test statistic: $t = \frac{b}{SE_b}$, with $df = n - 2$.
  4. Find the p-value: For this one-sided test, the p-value is $P(T \geq t)$ for a t-distribution with $df = n - 2$.
  5. Conclusion: Since the p-value is less than $\alpha = 0.05$, we reject $H_0$. There is convincing evidence at the 0.05 significance level of a positive linear relationship between neighborhood median income and home price per square foot.

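Exam tip: Unless the problem specifies a direction, use a two-sided alternative. Regression output, including the output shown on the AP exam, usually reports a two-sided p-value for the slope, so you can use it directly for a two-sided test. For a one-sided test, halve the reported p-value when the sign of $b$ agrees with $H_a$.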
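Converting a two-sided p-value from output into a one-sided one depends on whether the sample slope points in the direction of $H_a$. Here is a minimal sketch. It assumes $\beta_0 = 0$, so the sign of $t$ is the sign of $b$. The function name and the "greater"/"less" labels are illustrative choices, not standard notation.

```python
def one_sided_p(p_two_sided, b, direction):
    """One-sided p-value from a two-sided p-value, assuming H0: beta = 0.

    direction="greater" means Ha: beta > 0; direction="less" means Ha: beta < 0.
    """
    if direction not in ("greater", "less"):
        raise ValueError("direction must be 'greater' or 'less'")
    agrees = (b > 0) if direction == "greater" else (b < 0)
    # If b points the way Ha predicts, the one-sided tail is half the two-sided area;
    # otherwise the one-sided p-value is the large, complementary area.
    return p_two_sided / 2 if agrees else 1 - p_two_sided / 2


print(one_sided_p(0.004, b=3.12, direction="greater"))  # 0.002
print(one_sided_p(0.004, b=3.12, direction="less"))     # about 0.998
```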

6. Common Pitfalls (and how to avoid them)

  • Wrong move: Claiming the linear condition is met just because the correlation coefficient $r$ is close to 1 or -1. Why: Students confuse the strength of a linear relationship with linearity. A strong curved relationship can also have a high $|r|$. Correct move: Always check a residual plot for curved patterns to confirm the linear condition, regardless of the value of $r$.
  • Wrong move: Using $df = n - 1$ for inference for slope instead of $df = n - 2$. Why: Students remember the degrees of freedom for one-sample t-procedures and incorrectly reuse them. Correct move: For any inference for slope, subtract 2 from the sample size to get degrees of freedom, because we estimate both the slope and the intercept from the sample.
  • Wrong move: Interpreting a confidence interval for slope as "we are 95% confident that the slope of our sample is between...". Why: We know the sample slope exactly, because it is calculated from our data. Inference is about the unknown population parameter. Correct move: Always state that the interval estimates the true population slope, and include units and context for x and y.
  • Wrong move: Concluding that because we fail to reject $H_0$, there is no relationship between x and y at all. Why: Students forget that this procedure only tests for a linear relationship. Correct move: If you fail to reject $H_0$, conclude that there is no convincing evidence of a linear relationship between x and y. This leaves open the possibility of a non-linear relationship.
  • Wrong move: Claiming that increasing the spread of x values (increasing $s_x$) will increase $SE_b$, making the slope estimate less precise. Why: Students misremember the formula for $SE_b$ and mix up the position of $s_x$ in the fraction. Correct move: Remember that $s_x$ is in the denominator of $SE_b$, so a larger $s_x$ gives a smaller $SE_b$ and a more precise slope estimate.
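The first pitfall can be seen numerically with a perfectly curved (quadratic) relationship. This hypothetical example, $y = x^2$ for $x = 1, \dots, 10$, has $r \approx 0.97$. Yet its residuals run positive, then negative, then positive again. That U-shape is exactly what a residual plot would reveal.

```python
import math


def correlation_and_residuals(xs, ys):
    """Return the correlation r and the LSRL residuals for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    b = sxy / sxx
    a = y_bar - b * x_bar
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    return r, resid


xs = list(range(1, 11))
ys = [x ** 2 for x in xs]  # exactly curved relationship, no random error
r, resid = correlation_and_residuals(xs, ys)
print(round(r, 3))                  # 0.975: "strong", but the relationship is not linear
print([round(e, 2) for e in resid])  # positive, then negative, then positive: a U-shape
```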

7. Practice Questions (AP Statistics Style)

Question 1 (Multiple Choice)

A researcher studies the relationship between the number of semesters a college student has completed (x) and their cumulative GPA (y). From a random sample of 40 students, the 95% confidence interval for the population slope is calculated as (0.12, 0.28). Which of the following conclusions is most appropriate at the $\alpha = 0.05$ significance level?
A) There is not convincing evidence of a linear relationship between semesters completed and cumulative GPA, because the interval contains only positive values.
B) There is convincing evidence of a linear relationship between semesters completed and cumulative GPA, because the interval does not contain 0.
C) We are 95% confident that the sample slope is between 0.12 and 0.28.
D) The probability that the true slope is between 0.12 and 0.28 is 0.95.

Worked Solution: A confidence interval for the population slope contains the plausible values of the true slope $\beta$ at the given confidence level. Because 0 is not in the interval, 0 is not a plausible value for $\beta$. So we reject $H_0: \beta = 0$ at $\alpha = 0.05$ and conclude there is convincing evidence of a linear relationship.
- Option A reverses the conclusion.
- Option C is incorrect because confidence intervals estimate the population slope, not the known sample slope.
- Option D is incorrect because the 95% confidence level describes the long-run success rate of the method used to construct intervals. It is not the probability that one specific interval contains $\beta$.

The correct answer is B.


Question 2 (Free Response)

A café owner studies the relationship between average daily temperature (x, in degrees Fahrenheit) and daily iced coffee sales (y, in dollars). A random sample of 20 days gives the following regression output:

Predictor     Coef    SE Coef    T       P
Constant      42.8    18.3       2.34    0.03
Temperature   3.12    0.95       3.28    0.004

(a) State the appropriate hypotheses for a test to determine whether there is a linear relationship between temperature and iced coffee sales. Define the parameter of interest in context.
(b) Calculate the test statistic for this test and state the correct degrees of freedom.
(c) Using $\alpha = 0.05$, state your conclusion in context.

Worked Solution:
(a) The parameter of interest is $\beta$, the true population slope of daily iced coffee sales (in dollars) on average daily temperature (in °F). Hypotheses: $H_0: \beta = 0$; $H_a: \beta \neq 0$.
(b) The Temperature row of the output gives $b = 3.12$ and $SE_b = 0.95$, so the test statistic is $t = \frac{3.12}{0.95} \approx 3.28$ (also given directly in the output). Degrees of freedom: $df = 20 - 2 = 18$.
(c) The p-value is 0.004, which is less than $\alpha = 0.05$, so we reject $H_0$. There is convincing evidence at the 0.05 significance level of a non-zero linear relationship between average daily temperature and daily iced coffee sales.
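The arithmetic in parts (b) and (c) can be double-checked in a few lines. The values are copied from the Temperature row of the output, and the variable names are illustrative.

```python
coef, se_coef, n = 3.12, 0.95, 20   # Temperature row of the output; sample size 20
alpha = 0.05
p_value = 0.004                     # two-sided P from the output

t_stat = coef / se_coef             # t = b / SE_b, since beta_0 = 0
df = n - 2                          # slope inference uses n - 2
reject_h0 = p_value < alpha

print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject_h0}")
# t = 3.28, df = 18, reject H0: True
```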


Question 3 (Application / Real-World Style)

An environmental scientist studies the relationship between distance from a factory (x, in kilometers) and the concentration of fine particulate matter in the air (y, in micrograms per cubic meter). She collects a random sample of 15 locations around the factory. She calculates a sample slope of $b = -1.20$ μg/m³ per km, with $SE_b = 0.42$. Construct and interpret a 90% confidence interval for the true population slope, using $t^* = 1.771$ for 90% confidence and $df = 13$.

Worked Solution: The formula for the confidence interval is $b \pm t^* \cdot SE_b$.
1. Calculate the margin of error: $1.771 \times 0.42 \approx 0.74$.
2. Calculate the interval: $-1.20 \pm 0.74 = (-1.94, -0.46)$.
3. Interpret in context: We are 90% confident that the true mean concentration of fine particulate matter decreases by between 0.46 and 1.94 micrograms per cubic meter for each additional kilometer away from the factory.

Since 0 is not in the interval, there is statistically significant evidence at the $\alpha = 0.10$ significance level of a negative linear relationship between distance from the factory and particulate concentration.
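Below is a quick numeric check of the Question 3 interval. It uses three values:
- $b = -1.20$, the midpoint of the interval;
- $t^* = 1.771$, the table value for 90% confidence with $df = 13$;
- $SE_b = 0.42$, consistent with the interval's half-width of 0.74 divided by $t^*$.

```python
b, se_b, t_star = -1.20, 0.42, 1.771   # values from the Question 3 set-up

margin = t_star * se_b                  # about 0.744
lower, upper = b - margin, b + margin
print(f"90% CI: ({lower:.2f}, {upper:.2f})")  # 90% CI: (-1.94, -0.46)
print("0 in interval:", lower <= 0 <= upper)   # 0 in interval: False
```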

8. Quick Reference Cheatsheet

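  • Population slope parameter: $\beta$. The unknown true population slope, the parameter of interest.
  • Sample slope statistic: $b$. Calculated from the LSRL; estimates $\beta$.
  • Standard error of the estimate: $s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}}$. Measures the typical spread of points around the LSRL.
  • Standard error of the slope: $SE_b = \frac{s}{s_x\sqrt{n-1}}$, where $s_x$ is the sample standard deviation of $x$.
  • Degrees of freedom for slope inference: $df = n - 2$. Subtract 2 for the estimated intercept and slope.
  • Confidence interval for $\beta$: $b \pm t^* \cdot SE_b$. A range of plausible values for $\beta$.
  • t-test statistic for $H_0: \beta = \beta_0$: $t = \frac{b - \beta_0}{SE_b}$. $\beta_0$ is almost always 0, so $t = \frac{b}{SE_b}$.
  • Default null hypothesis: $H_0: \beta = 0$. The null states there is no linear relationship in the population.
  • Conditions for inference (LINE): Linear, Independent, Normal, Equal Variance.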

9. What's Next

This chapter lays the foundation for all inference for linear regression, which you will apply immediately to more complex scenarios involving interpreting computer output, justifying statistical significance, and connecting slope inference to correlation. Immediately after this introduction, you will learn how to connect confidence interval conclusions to hypothesis test results, and how to analyze full regression output from statistical software — one of the most common FRQ tasks on the AP exam. Without mastering the conditions, sampling distribution structure, and basic calculation of intervals and tests for slope here, you will not be able to correctly interpret results or earn full points on more complex questions. This topic builds on all previous inference concepts and feeds into the core AP Statistics idea of drawing conclusions about population parameters from sample statistics.
