AP · Correlation · 14 min read · Updated 2026-05-10

Correlation — AP Statistics Study Guide

For: Students preparing for the AP Statistics exam.

Covers: Pearson product-moment correlation coefficient ( $r$ ), its z-score and deviation formulas, hand calculation, properties of $r$ , interpretation of strength and direction of linear association, and common exam pitfalls including correlation vs causation.

You should already know: Scatterplots and description of two-variable association, z-scores for standardization, calculation of means and sample standard deviations for quantitative variables.

A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official mark schemes for grading conventions.

1. What Is Correlation?

Correlation measures the strength and direction of the linear relationship between two quantitative variables. It is a core topic in Unit 2: Exploring Two-Variable Data, which accounts for 5-7% of the total AP Statistics exam weight per the official College Board CED. Correlation appears in both multiple-choice (MCQ) and free-response (FRQ) sections of the exam: it is often tested as a standalone MCQ on properties or interpretation, or as a foundational component of longer FRQs on linear regression.

The most widely used correlation measure tested on the AP exam is the Pearson product-moment correlation coefficient, denoted $r$ for a sample and $\rho$ (rho) for a full population. The CED almost exclusively focuses on sample $r$ for this topic, so we will center our discussion on that. Unlike regression slope, $r$ has no units, so it is unaffected by unit changes or positive rescaling of either variable. It always lies between -1 and 1, inclusive. A positive $r$ indicates positive linear association (as $x$ increases, $y$ tends to increase), while a negative $r$ indicates negative linear association (as $x$ increases, $y$ tends to decrease). A value of $r = 0$ means no linear association between the two variables.

2. Calculating the Correlation Coefficient

There are two equivalent common forms of the correlation formula, both of which you may need to use on the AP exam. The z-score form gives clear intuition: correlation is essentially the average of the products of the two variables' z-scores, with $n - 1$ rather than $n$ in the denominator. The formula for sample $r$ is: $r = \frac{1}{n - 1} \sum_{i = 1}^{n} z_{x_{i}} z_{y_{i}}$ where $n$ is the number of observations, $z_{x_{i}}$ is the z-score of the $i$ -th $x$ -value, and $z_{y_{i}}$ is the z-score of the $i$ -th $y$ -value. The equivalent deviation form, which is easier for hand calculation, is: $r = \frac{\sum (x_{i} - \bar{x})(y_{i} - \bar{y})}{(n - 1) s_{x} s_{y}}$ where $\bar{x}, \bar{y}$ are sample means, and $s_{x}, s_{y}$ are sample standard deviations. The numerator captures whether $x$ and $y$ deviate from their means in the same direction: for each observation, if both values are above or both below their means, the product is positive, pulling $r$ up; if one is above and the other below, the product is negative, pulling $r$ down.
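To see why the two forms agree, substitute the definition of each z-score into the z-score form. The short derivation below uses only the symbols already defined in this section:

```latex
\begin{align*}
r &= \frac{1}{n-1}\sum_{i=1}^{n} z_{x_i}\, z_{y_i}
   = \frac{1}{n-1}\sum_{i=1}^{n}
     \left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right) \\
  &= \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{(n-1)\,s_x\,s_y}
\end{align*}
```

The second line follows because $s_{x}$ and $s_{y}$ are constants that can be factored out of the sum.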

Worked Example

A tutor collects 4 pairs of data on number of practice problems completed (x) and quiz score out of 10 (y): $(1, 3), (2, 5), (3, 6), (4, 8)$ . Calculate the sample correlation coefficient $r$ .

Calculate sample means: $\bar{x} = \frac{1 + 2 + 3 + 4}{4} = 2.5$ , $\bar{y} = \frac{3 + 5 + 6 + 8}{4} = 5.5$
Calculate the sum of cross-products of deviations from the mean:

$x_{i}$	$y_{i}$	$x_{i} - \bar{x}$	$y_{i} - \bar{y}$	Product
1	3	-1.5	-2.5	3.75
2	5	-0.5	-0.5	0.25
3	6	0.5	0.5	0.25
4	8	1.5	2.5	3.75

Sum of products = $3.75 + 0.25 + 0.25 + 3.75 = 8$
Calculate sample standard deviations: $s_{x} = \sqrt{\frac{(-1.5)^{2} + (-0.5)^{2} + 0.5^{2} + 1.5^{2}}{3}} = \sqrt{\frac{5}{3}} \approx 1.291$ , $s_{y} = \sqrt{\frac{(-2.5)^{2} + (-0.5)^{2} + 0.5^{2} + 2.5^{2}}{3}} = \sqrt{\frac{13}{3}} \approx 2.082$
Plug into the formula: $r = \frac{8}{(3)(1.291)(2.082)} \approx \frac{8}{8.064} \approx 0.992$
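The hand calculation can be checked with a short script. This is a minimal sketch using only Python's standard library; the variable names (`problems`, `scores`, `sum_cross`) are illustrative choices, not part of the original example.

```python
from math import sqrt
from statistics import mean, stdev

problems = [1, 2, 3, 4]   # x: practice problems completed
scores = [3, 5, 6, 8]     # y: quiz score out of 10
n = len(problems)

# Step 1: sample means
x_bar, y_bar = mean(problems), mean(scores)

# Step 2: cross-products of deviations from the means
cross_products = [(x - x_bar) * (y - y_bar) for x, y in zip(problems, scores)]
sum_cross = sum(cross_products)

# Step 3: sample standard deviations (n - 1 in the denominator)
s_x, s_y = stdev(problems), stdev(scores)

# Step 4: deviation form of r
r = sum_cross / ((n - 1) * s_x * s_y)

# Exact value for comparison: (n - 1) * s_x * s_y = sqrt(5 * 13) = sqrt(65)
exact = 8 / sqrt(65)

print(x_bar, y_bar)       # 2.5 5.5
print(cross_products)     # [3.75, 0.25, 0.25, 3.75]
print(round(r, 3))        # 0.992
```

Carrying full precision through every step avoids the small rounding drift that appears when $s_{x}$ and $s_{y}$ are rounded before dividing.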

Exam tip: On FRQs requiring hand calculation, AP readers accept both simplified decimal (accurate to 2+ decimal places) and unsimplified fraction forms of $r$ , so you do not need to do extra work to simplify if you are comfortable leaving it in fraction form.

3. Key Properties of the Correlation Coefficient

Most MCQ questions on correlation test knowledge of core properties of $r$ , so understanding these is critical for earning full points. The key properties tested on the AP exam are:

Bounds: $r$ is always between -1 and 1 ( $- 1 \leq r \leq 1$ ). $r = 1$ is a perfect positive linear relationship, $r = - 1$ is a perfect negative linear relationship, and $r = 0$ means no linear association.
No units: $r$ is unaffected by linear transformations (adding a constant or multiplying by a positive constant) of one or both variables. Changing units from inches to centimeters or pounds to kilograms will never change the value of $r$ .
Symmetry: The correlation between $x$ and $y$ is identical to the correlation between $y$ and $x$ . Swapping the two variables does not change $r$ , unlike the slope of a regression line.
Linear only: $r$ only measures linear association. It can be close to 0 even if there is a strong non-linear relationship between the two variables.
Sensitivity to outliers: $r$ is very sensitive to extreme outliers. A single outlier can drastically shift $r$ toward or away from 0.
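Three of these properties (symmetry, linear only, and sensitivity to outliers) can be seen numerically. The sketch below is illustrative only: the helper `pearson_r` and the small data sets, including the added point $(10, 0)$, are made up for demonstration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation r, written as Sxy / sqrt(Sxx * Syy).

    This is algebraically the same as the deviation form, because
    (n - 1) * s_x * s_y equals sqrt(Sxx * Syy).
    """
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4]
ys = [3, 5, 6, 8]

# Symmetry: swapping the roles of x and y gives the same r.
r_xy = pearson_r(xs, ys)
r_yx = pearson_r(ys, xs)

# Linear only: y = x^2 is a perfect (non-linear) relationship, yet r = 0.
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x ** 2 for x in curve_x]
r_curve = pearson_r(curve_x, curve_y)

# Sensitivity to outliers: one extra point far from the pattern.
r_with_outlier = pearson_r(xs + [10], ys + [0])

print(round(r_xy, 3), round(r_yx, 3))   # 0.992 0.992
print(r_curve)                           # 0.0
print(round(r_with_outlier, 2))          # -0.58
```

In this made-up data, a single unusual point is enough to turn a strong positive correlation into a moderate negative one, which is why the scatterplot should always be checked alongside $r$ .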

Worked Example

The correlation between distance commuted to work (in miles) and monthly gas spending (in dollars) is 0.81 for a sample of workers. If distance commuted is converted to kilometers (1 mile ≈ 1.61 km), what is the new correlation?

Recall that unit conversion is a linear transformation that multiplies all $x$ -values by a positive constant.
When every value is multiplied by a positive constant $k$ , the mean and the standard deviation are both multiplied by $k$ , so the z-scores of the transformed values are identical to the original z-scores: $z_{k x_{i}} = \frac{k x_{i} - k \bar{x}}{k s_{x}} = \frac{k (x_{i} - \bar{x})}{k s_{x}} = z_{x_{i}}$ .
Since $r$ is the average product of z-scores, $r$ does not change. The new correlation is still 0.81.
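The same argument can be checked numerically with the z-score form. The commute data below are hypothetical (they are not the sample behind the 0.81 figure), and the helper names `z_scores` and `pearson_r` are assumptions made for this sketch.

```python
from math import isclose
from statistics import mean, stdev

def z_scores(values):
    """Standardize a list using the sample mean and sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def pearson_r(xs, ys):
    """z-score form: sum of z_x * z_y, divided by n - 1."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Hypothetical data: commute distance in miles and monthly gas spending in dollars.
miles = [4, 9, 15, 22, 30]
gas_dollars = [35, 60, 80, 130, 150]

km = [1.61 * m for m in miles]     # multiply by a positive constant
shifted = [m + 5 for m in miles]   # add a constant
negated = [-m for m in miles]      # multiply by a negative constant

r_miles = pearson_r(miles, gas_dollars)
r_km = pearson_r(km, gas_dollars)
r_shifted = pearson_r(shifted, gas_dollars)
r_negated = pearson_r(negated, gas_dollars)

print(isclose(r_miles, r_km))        # True: unit conversion leaves r unchanged
print(isclose(r_miles, r_shifted))   # True: adding a constant leaves r unchanged
print(isclose(r_negated, -r_miles))  # True: a negative multiplier flips the sign
```

The last line is a reminder that only multiplication by a positive constant preserves $r$ exactly; a negative multiplier keeps the magnitude but reverses the direction.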

Exam tip: If an MCQ asks how a unit conversion or adding a constant to all values affects $r$ , the answer is always that $r$ does not change; this is one of the most frequently tested properties of correlation.

4. Interpreting Correlation in Context

AP FRQs almost always require you to interpret the value of $r$ in the context of the problem, and the grading rubric has strict requirements for full credit. To earn full points, your interpretation must include three core elements: (1) the direction of the relationship (positive/negative), (2) the strength of the linear relationship (strong/moderate/weak, matched to the magnitude of $r$ ), (3) context that names both variables. A common convention for strength that is accepted on the exam is: $|r| \geq 0.7$ = strong, $0.3 \leq |r| < 0.7$ = moderate, $|r| < 0.3$ = weak. Treat these cutoffs as rough guides rather than sharp boundaries. You must also explicitly mention that $r$ measures linear association to avoid losing points for vague interpretation.

Worked Example

A study of 30 coffee shops finds a correlation of -0.42 between distance from the nearest downtown subway stop and daily number of customers. Interpret this correlation in context.

Identify direction: $r$ is negative, so as distance from the subway increases, daily customers tend to decrease.
Identify strength: $|r| = 0.42$ , which is a moderate linear relationship.
Combine into a full contextual interpretation: "There is a moderate negative linear relationship between distance from the nearest subway stop and daily customer count for these coffee shops: coffee shops located farther from a subway tend to have fewer daily customers."
This interpretation includes all required elements: direction, strength, explicit linear reference, and context, so it would earn full credit.
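The checklist of direction, strength, "linear", and context can be sketched as a small helper. The function name `describe_r` is hypothetical, and assigning the boundary values 0.3 and 0.7 to the stronger category is an assumption of this sketch rather than an official rule.

```python
def describe_r(r, x_name, y_name):
    """Build a skeleton interpretation of r using the strength convention above.

    Assumption: boundary values (exactly 0.3 or 0.7) go to the stronger label.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return f"There is no linear relationship between {x_name} and {y_name}."

    size = abs(r)
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"

    direction = "positive" if r > 0 else "negative"
    return (f"There is a {strength} {direction} linear relationship "
            f"between {x_name} and {y_name}.")

sentence = describe_r(-0.42,
                      "distance from the nearest subway stop",
                      "daily customer count")
print(sentence)
# There is a moderate negative linear relationship between distance from
# the nearest subway stop and daily customer count.
```

The helper only produces the skeleton sentence; a full-credit answer still adds the plain-language "tends to" statement in context, as in the interpretation above.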

Exam tip: Always include the word "linear" in your interpretation and explicitly name both variables; generic interpretations like "there is a moderate negative correlation" will lose points even if the direction and strength are correct.

5. Common Pitfalls (and how to avoid them)

Wrong move: Interpreting $r = 0$ to mean there is no relationship of any kind between the two variables. Why: Students confuse "no linear association" with "no association at all". Strong non-linear relationships can still have $r = 0$ . Correct move: Always state $r = 0$ means there is no linear relationship between the variables, not no relationship at all.
Wrong move: Claiming correlation proves causation, e.g., "a positive correlation between number of firefighters at a fire and damage means firefighters cause damage". Why: Students forget that correlation from observational data can be explained by lurking third variables. Correct move: Always state that correlation alone does not provide evidence of a causal relationship between two variables.
Wrong move: Calculating or interpreting correlation between two categorical variables, e.g., correlation between gender and major. Why: Students confuse correlation with measures of association for categorical data. Correlation is only defined for quantitative variables. Correct move: Confirm both variables are quantitative before using or interpreting correlation; if one or both are categorical, correlation is meaningless here.
Wrong move: Claiming $r$ changes when you swap the explanatory and response variables. Why: Students confuse correlation with regression slope, which does change when you swap variables. Correct move: Remember $r$ is symmetric: swapping $x$ and $y$ leaves $r$ exactly unchanged.
Wrong move: Stating a correlation of 0.8 is twice as strong as a correlation of 0.4. Why: Correlation is not a linear scale of strength; the distance from 0 to 0.4 is not equivalent to the distance from 0.4 to 0.8. Correct move: Only describe strength relative to how close $r$ is to -1, 0, or 1; never describe strength as a ratio of correlation values.
Wrong move: Ignoring outliers when interpreting $r$ , assuming the calculated $r$ represents the relationship of all observations. Why: Students forget that $r$ is highly sensitive to extreme outliers, which can drastically change its value. Correct move: Always check the scatterplot for outliers before interpreting $r$ , and note if an outlier is influencing the correlation value.

6. Practice Questions (AP Statistics Style)

Question 1 (Multiple Choice)

Which of the following statements about the correlation coefficient $r$ is true?
A) The correlation between tree diameter and tree height will be negative, because larger trees have larger diameters.
B) If the correlation between two variables is 0, there is no relationship between them.
C) The correlation between $x$ and $y$ equals the correlation between $y$ and $x$ .
D) A correlation of 0.6 indicates a stronger relationship than a correlation of -0.7.

Worked Solution: Evaluate each option: Option A is incorrect because larger diameter should be associated with larger height, so the correlation would be positive, not negative. Option B is incorrect because $r = 0$ means no linear relationship, not no relationship of any kind. Option C is correct: correlation is symmetric, so swapping $x$ and $y$ does not change $r$ . Option D is incorrect because strength depends on the absolute value of $r$ ; $|-0.7| = 0.7 > 0.6$ , so -0.7 indicates a stronger relationship. The correct answer is C.

Question 2 (Free Response)

A small bakery collects 5 pairs of data on average daily temperature ( $x$ , in °F) and number of ice cream cones sold per day ( $y$ ): $(70, 22), (75, 30), (80, 34), (85, 42), (90, 48)$ . (a) Calculate the sample correlation coefficient $r$ for these data. (b) Interpret your calculated $r$ in the context of this problem. (c) How would $r$ change if we added 5 degrees to every temperature measurement to account for a measurement error? Justify your answer.

Worked Solution:
(a) Calculate means: $\bar{x} = 80$ , $\bar{y} = 35.2$ . Sum of cross-products of deviations: $(-10)(-13.2) + (-5)(-5.2) + 0(-1.2) + 5(6.8) + 10(12.8) = 132 + 26 + 0 + 34 + 128 = 320$ . Sample standard deviations: $s_{x} = \sqrt{\frac{10^{2} + 5^{2} + 0^{2} + 5^{2} + 10^{2}}{4}} = \sqrt{62.5} \approx 7.906$ , $s_{y} = \sqrt{\frac{13.2^{2} + 5.2^{2} + 1.2^{2} + 6.8^{2} + 12.8^{2}}{4}} = \sqrt{103.2} \approx 10.159$ . Then $r = \frac{320}{(4)(7.906)(10.159)} \approx \frac{320}{321.3} \approx 0.996$ .
(b) There is a very strong positive linear relationship between average daily temperature and number of ice cream cones sold at this bakery: on days with higher temperatures, the bakery tends to sell more ice cream cones.
(c) $r$ will remain unchanged at 0.996. Adding a constant to all values of $x$ shifts every value and the mean by the same amount, so deviations from the mean, the standard deviation, and the z-scores all remain unchanged. This means the correlation does not change.
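Parts (a) and (c) can be recomputed with a few lines of Python as a check on the arithmetic. This is a sketch using the standard library only; `sample_r` is a hypothetical helper name implementing the deviation form.

```python
from statistics import mean, stdev

def sample_r(xs, ys):
    """Deviation form: sum of cross-products over (n - 1) * s_x * s_y."""
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((len(xs) - 1) * stdev(xs) * stdev(ys))

temps = [70, 75, 80, 85, 90]   # x: average daily temperature (°F)
cones = [22, 30, 34, 42, 48]   # y: ice cream cones sold

s_x, s_y = stdev(temps), stdev(cones)
r = sample_r(temps, cones)

# Part (c): add 5 degrees to every temperature and recompute.
r_shifted = sample_r([t + 5 for t in temps], cones)

print(round(s_x, 3), round(s_y, 3))   # 7.906 10.159
print(round(r, 3))                    # 0.996
print(round(r_shifted, 3))            # 0.996
```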

Question 3 (Application / Real-World Style)

A public health researcher finds a correlation of 0.65 between annual alcohol consumption and annual healthcare spending for a sample of 1000 adults. A policy analyst argues that this correlation proves alcohol consumption causes higher healthcare costs, so policymakers should tax alcohol to reduce healthcare spending. Is the analyst's argument supported by the correlation? Explain what the correlation actually tells us.

Worked Solution: The analyst's argument that correlation proves causation is not supported by the available data. The correlation of 0.65 tells us that there is a moderate positive linear relationship between annual alcohol consumption and annual healthcare spending: adults who consume more alcohol per year tend to have higher annual healthcare spending. However, this correlation could be explained by lurking variables such as age, smoking status, or income that affect both alcohol consumption and healthcare spending. Since this is an observational study, the correlation alone does not prove that alcohol consumption causes higher healthcare costs, so the analyst's argument is not valid.

7. Quick Reference Cheatsheet

Category	Formula	Notes
Correlation (z-score form)	$r = \frac{1}{n - 1} \sum_{i = 1}^{n} z_{x_{i}} z_{y_{i}}$	For sample correlation; $z_{x}, z_{y}$ are standard scores
Correlation (deviation form)	$r = \frac{\sum (x_{i} - \bar{x})(y_{i} - \bar{y})}{(n - 1) s_{x} s_{y}}$	Easier for hand calculation on FRQs
Bounds of $r$	$- 1 \leq r \leq 1$	Any value outside this range indicates a calculation error
Symmetry	$r (x, y) = r (y, x)$	Swapping variables does not change $r$ , unlike regression slope
Linear transformation effect	$r(a + bx, c + dy) = \operatorname{sign}(bd) \cdot r(x, y)$	For $b, d \neq 0$ : same-sign $b, d$ leave $r$ unchanged; opposite signs flip the sign of $r$
What $r$ measures	Strength and direction of linear association	Does not measure non-linear association
Causation rule	Correlation $\neq$ Causation	Only randomized experiments can prove causation
Sensitivity	$r$ is sensitive to outliers	One extreme outlier can drastically change $r$

8. What's Next

Correlation is the foundational prerequisite for linear regression, the next major topic in Unit 2: Exploring Two-Variable Data. The correlation coefficient, together with the two standard deviations, determines the slope of the least-squares regression line ( $b = r \cdot \frac{s_{y}}{s_{x}}$ ), and the square of $r$ (the coefficient of determination) measures the proportion of variation in the response variable that is explained by the linear model. Without understanding the properties and interpretation of $r$ , you will not be able to correctly interpret regression output or answer FRQ questions about model fit, which appear regularly on the AP exam. Correlation also plays a key role in later topics such as inference for regression, where you test whether the slope of the population regression line (and hence the population correlation) differs from zero.
