Representing Two-Variable Quantitative Data — AP Statistics Study Guide
For: Candidates sitting the AP Statistics exam.
Covers: Classification of explanatory vs response variables, constructing and interpreting scatterplots, describing direction, shape, strength, and outliers in bivariate associations, and distinguishing linear from non-linear relationships for two quantitative variables.
You should already know: How to distinguish quantitative vs categorical variables. How to construct and interpret univariate quantitative data distributions. Basic coordinate plane plotting conventions.
A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official mark schemes for grading conventions.
1. What Is Representing Two-Variable Quantitative Data?
Two-variable (bivariate) quantitative data consists of paired measurements of two different quantitative variables collected on the same individual or observational unit. The core goal of representing this data graphically is to visualize any association between the two variables: do changes in one variable tend to correspond to predictable changes in the other? According to the AP Statistics Course and Exam Description (CED), this foundational topic is part of Unit 2, which accounts for 5-7% of the multiple-choice section of the AP exam, and the same skills also appear in free-response questions. You will often answer multiple-choice questions about interpreting scatterplots or variable roles, and you may be asked to construct or describe a scatterplot in the early parts of a free-response question.
Standard notation uses x for the explanatory variable (predictor/independent variable) and y for the response variable (outcome/dependent variable). Synonyms you may see on the exam include bivariate quantitative visualization, scatterplot analysis, and association plotting. Unlike univariate data, which focuses on the distribution of a single variable, bivariate representation focuses on the relationship between two variables, making it the foundation for all subsequent topics in Unit 2.
2. Explanatory vs Response Variables
The first critical step in representing bivariate quantitative data is correctly classifying the two variables by their role in the research question. An explanatory variable is the variable that is used to predict or explain changes in the other variable. A response variable is the outcome variable that is measured to see if it changes in response to changes in the explanatory variable. Plotting convention places the explanatory variable on the horizontal x-axis and the response variable on the vertical y-axis; swapping the axes changes which variable the plot presents as the predictor, so the graph no longer matches the research question.
Roles are determined by the research question, not just the variables themselves. For example, if you are studying how hours of study predict test score, hours of study is explanatory and test score is response. If you are studying how test score predicts college first-year GPA, test score becomes the explanatory variable. In rare cases where you are only testing for any association (no prediction goal), roles do not need to be assigned, but you will almost always be given a clear research question that defines roles on the AP exam.
Worked Example
A public health researcher studies 30 urban neighborhoods to examine how the number of fast-food restaurants per square mile predicts the rate of adult obesity (as a percent of the population). Identify which variable is explanatory, which is response, and state the correct axis for each in a scatterplot.
- The research goal explicitly states that number of fast-food restaurants is used to predict obesity rate, so the variable used for prediction is the explanatory variable by definition.
- Explanatory variable = number of fast-food restaurants per square mile.
- The variable being predicted is the response variable: obesity rate (percent of adult population).
- By plotting convention, explanatory variables go on the horizontal x-axis and response variables go on the vertical y-axis.
Final answer: Explanatory = number of fast-food restaurants (x-axis), Response = obesity rate (y-axis)
Exam tip: If you are unsure of roles, look for phrasing like "use A to predict B" — A is always explanatory, B is always response. This is the most common phrasing for this question type on the AP exam.
3. Constructing and Interpreting Scatterplots
A scatterplot is the standard graphical representation for two-variable quantitative data. Each observational unit is represented by a single point placed at the intersection of its x (explanatory) and y (response) values. To construct a scatterplot for full credit on an AP FRQ, you must: 1) label both axes with the variable name and its units, 2) use a consistent, appropriate scale that fits all data points, 3) plot each point accurately, and 4) never connect points with lines (connecting is only done for time series plots, not standard scatterplots of independent observational units).
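The construction rules above can be illustrated with a minimal text-only sketch. The function name `text_scatterplot`, the grid size, and the five (hours of study, test score) pairs are all hypothetical and chosen only for illustration; on the exam you would draw the plot by hand. The sketch assumes the x values are not all equal and the y values are not all equal.

```python
def text_scatterplot(points, x_label, y_label, width=40, height=12):
    """Return a text scatterplot with one '*' per (x, y) pair and no connecting lines."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    # Blank grid; row 0 is the top of the plot.
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each value onto the grid so every point fits on the axes.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"

    lines = [f"{y_label}: {y_min} to {y_max} (vertical axis)"]
    lines += ["|" + "".join(row) for row in grid]
    lines.append("+" + "-" * width)
    lines.append(f"{x_label}: {x_min} to {x_max} (horizontal axis)")
    return "\n".join(lines)


# Hypothetical data: explanatory variable first, response variable second.
study_data = [(1, 55), (2, 62), (3, 70), (4, 74), (5, 85)]
plot = text_scatterplot(study_data, "Hours of study (hours)", "Test score (points)")
print(plot)
```

Each observational unit appears as exactly one unconnected point, and both axis labels carry the variable name and its units, matching requirements 1 to 4.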
When interpreting a scatterplot, the AP exam requires you to describe four key features of the association, which can be remembered with the mnemonic DSSO: Direction, Shape, Strength, Outliers. Definitions of each feature:
- Direction: Positive = as x increases, y tends to increase; Negative = as x increases, y tends to decrease; No direction = no clear association.
- Shape: Most commonly linear or non-linear (curved).
- Strength: Strong = points lie close to the overall pattern; Weak = points are widely spread from the pattern.
- Outliers: Any point that falls far outside the overall pattern of the association.
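As a rough numerical companion to the definition of direction, the sketch below compares every pair of points and counts how often the point with the larger x also has the larger y. The function name `direction` and the simple majority rule are assumptions made for illustration, not an AP-required method; the data are the advertising and revenue values from Practice Question 3 in this guide. A count like this says nothing about shape, strength, or outliers, so it does not replace looking at the scatterplot.

```python
from itertools import combinations


def direction(points):
    """Label direction by counting pairs of points that move together or apart."""
    concordant = 0  # larger x goes with larger y
    discordant = 0  # larger x goes with smaller y
    for (x1, y1), (x2, y2) in combinations(points, 2):
        product = (x2 - x1) * (y2 - y1)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant > discordant:
        return "positive"
    if discordant > concordant:
        return "negative"
    return "no clear direction"


# Data from Practice Question 3: advertising ($100s) and revenue ($1000s).
advertising = [1, 2, 3, 4, 5, 6, 7, 8]
revenue = [12, 18, 22, 25, 27, 28, 28.5, 28.8]
ad_direction = direction(list(zip(advertising, revenue)))
print(ad_direction)
```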
Worked Example
A real estate agent collects data on 12 recently sold homes, measuring square footage of the home (x, in hundreds of square feet) and sale price (y, in thousands of USD). The resulting scatterplot shows that sale price tends to increase as square footage increases, points lie close to a straight line, and one small 1,000 square foot home sold for $800,000, far above the trend of other points of the same size. Describe all four key features of the association.
- Direction: Sale price tends to increase as square footage increases, so the direction is positive.
- Shape: Points follow a straight-line trend, so the shape is linear.
- Strength: Points lie close to the linear trend, so the association is strong.
- Outliers: There is one clear outlier: a small home that sold for a much higher price than the overall pattern predicts.
Exam tip: Even if there are no outliers, you must explicitly state "there are no clear outliers" to get full credit on an AP FRQ description question. Skipping this step will cost you a point.
4. Distinguishing Linear vs Non-Linear Associations
One of the most important tasks when representing bivariate data is distinguishing between linear and non-linear associations, because all simple linear regression methods you will learn later only produce valid results for linear associations. A linear association is one where the overall pattern of points follows a straight line, meaning the rate of change of y with respect to x is roughly constant across all values of x. A non-linear association has an overall pattern that follows a curve, meaning the rate of change of y with respect to x changes as x increases.
Common non-linear patterns you may see on the exam include: increasing at an increasing rate (concave up, e.g., bacterial population growth over time), increasing at a decreasing rate (concave down, e.g., crop yield increasing with fertilizer use that levels off at high fertilizer amounts), and U-shaped or inverted U-shaped curves (e.g., test performance vs anxiety, which peaks at moderate anxiety). It is critical to note that non-linear association is not the same as no association: non-linear associations have a clear pattern, just not a straight one.
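The idea of a constant versus changing rate of change can be sketched by computing the slope between each pair of neighboring points. The helper `successive_slopes` and both data sets below are hypothetical and chosen only to illustrate the contrast; the sketch assumes no two points share the same x value. Real data are noisy, so neighboring slopes will jump around, and this is an illustration of the definition rather than a formal test of linearity.

```python
def successive_slopes(points):
    """Slope between each pair of neighboring points, ordered by x."""
    pts = sorted(points)
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pts, pts[1:])]


# Hypothetical roughly linear data: slopes stay close to one value.
linear_like = [(1, 3.1), (2, 5.0), (4, 9.2), (5, 10.9)]
linear_slopes = successive_slopes(linear_like)

# Hypothetical inverted-U data: slopes start positive and end negative.
inverted_u = [(500, 4), (1000, 9), (1500, 12), (2000, 8), (2500, 3)]
curved_slopes = successive_slopes(inverted_u)

print(linear_slopes)
print(curved_slopes)
```

In the first data set every slope is positive and of similar size, which is what a straight-line pattern looks like numerically. In the second the slopes change sign, so the pattern cannot be a single straight line even though it is clearly not "no association".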
Worked Example
An ecologist studies the relationship between elevation (in meters above sea level) and the number of native plant species found per 100 square meter plot, across 18 plots in a mountain range. The scatterplot shows that the number of species is low at low elevation, increases to a maximum at middle elevation, then decreases again at high elevation. Is this association linear or non-linear? Justify your answer.
- A linear association requires the rate of change of species count with respect to elevation to be constant across all elevation values.
- In this case, species count first increases with elevation, then decreases, so the rate of change changes from positive to negative as elevation increases, meaning it is not constant.
- The overall pattern is a curved, inverted U-shape, not a straight line.
- Therefore, the association is non-linear.
Exam tip: Don't assume an association is linear just because it is positive. Always check if the trend is straight, not just increasing or decreasing.
5. Common Pitfalls (and how to avoid them)
- Wrong move: Swapping the x and y axes when plotting explanatory and response variables for a prediction question. Why: Students assume any variable can go on any axis, and do not tie axis assignment to the research question. Correct move: Always check the goal first: the variable being predicted is always response on the y-axis, the predictor is always explanatory on the x-axis.
- Wrong move: Connecting points with line segments when constructing a scatterplot for independent observational units. Why: Students confuse scatterplots with algebra class line graphs or time series plots. Correct move: Only plot individual points; never connect them unless the problem explicitly says the data is a time sequence.
- Wrong move: Forgetting to mention one of the four DSSO features when describing a scatterplot on an FRQ. Why: Students remember direction and strength, but forget to address shape or explicitly note that there are no outliers. Correct move: Use the DSSO mnemonic to check off all four features before moving on.
- Wrong move: Calling any point with an extreme x-value an outlier. Why: Students confuse extreme values with outliers from the association pattern. Correct move: A point is only an outlier if it falls far away from the overall relationship between x and y, regardless of its x-position.
- Wrong move: Calling a non-linear association "no association". Why: Students confuse "not linear" with "no relationship between variables". Correct move: If there is a clear curved pattern, describe it as a non-linear association, not no association.
- Wrong move: Forgetting to include units when labeling scatterplot axes on an FRQ. Why: Students label variables but omit units, which is required for full credit. Correct move: Always add units in parentheses after the variable name, e.g., "Elevation (meters)".
6. Practice Questions (AP Statistics Style)
Question 1 (Multiple Choice)
A psychologist studies the relationship between the number of hours of sleep participants get per night, and their score on a memory test out of 20 points. She wants to use sleep time to predict test performance. Which of the following correctly describes variable placement on a scatterplot? A) Hours of sleep is explanatory, plotted on the x-axis; memory test score is response, plotted on the y-axis B) Hours of sleep is explanatory, plotted on the y-axis; memory test score is response, plotted on the x-axis C) Memory test score is explanatory, plotted on the x-axis; hours of sleep is response, plotted on the y-axis D) Because the study is observational, there is no explanatory or response variable, so any placement is acceptable.
Worked Solution: First, recall that the variable used for prediction is the explanatory variable, which goes on the x-axis, and the variable being predicted is the response variable, which goes on the y-axis. In this problem, sleep hours is used to predict memory test score, so sleep hours is explanatory on the x-axis and test score is response on the y-axis. Option B swaps the axes, C swaps the variable roles, and D is incorrect because even observational studies have variable roles defined by the research question. The correct answer is A.
Question 2 (Free Response)
An agricultural researcher studies the relationship between the amount of fertilizer applied (in kg per hectare) and wheat yield (in tonnes per hectare) for 20 test plots. (a) Identify the explanatory and response variables if the goal is to predict wheat yield from fertilizer amount. (b) The scatterplot of the data shows that wheat yield tends to increase as fertilizer increases, points follow a straight line closely for fertilizer amounts 0-100 kg per hectare, and there are no points far from the pattern. Describe all four key features of the association for this range of data. (c) A student concludes that the scatterplot proves that more fertilizer causes higher yield. Is this conclusion justified? Explain.
Worked Solution: (a) Fertilizer amount (kg per hectare) is the explanatory variable, since it is used to predict yield. Wheat yield (tonnes per hectare) is the response variable, since it is the outcome being predicted. (b) Using the DSSO framework: 1) Direction: Positive association, because yield increases as fertilizer amount increases. 2) Shape: Linear, since the trend follows a straight line. 3) Strength: Strong, because points lie close to the linear trend. 4) Outliers: There are no clear outliers, since no points fall far from the overall pattern. (c) The conclusion is not justified. A scatterplot only shows an association between two variables, not causation. Confounding variables (e.g., higher water availability on plots that got more fertilizer) could explain the relationship even if fertilizer itself does not cause higher yield.
Question 3 (Application / Real-World Style)
A small business owner collects data on 8 months of operation, measuring the number of dollars spent on social media advertising per month (x, in hundreds of dollars) and the total monthly revenue (y, in thousands of dollars). The data is:
| Advertising ($100s) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Revenue ($1000s) | 12 | 18 | 22 | 25 | 27 | 28 | 28.5 | 28.8 |
Describe the association between advertising spend and monthly revenue, including direction, shape, strength, and outliers. Interpret your conclusion in context.
Worked Solution:
- Direction: As advertising spend increases, monthly revenue increases, so the association is positive.
- Shape: Revenue increases quickly when advertising is low, but the rate of increase slows and levels off as advertising increases, so the pattern is curved, meaning the association is non-linear.
- Strength: All points follow the curved pattern very closely, so the association is strong.
- Outliers: No points fall far from the overall pattern, so there are no outliers.
In context: Increasing social media advertising spend is associated with higher monthly revenue, but the gains in revenue get smaller as you spend more, with revenue leveling off after ~$700 in advertising per month.
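The claim that the revenue gains shrink can be checked directly from the table. Because advertising rises by exactly one unit ($100) between consecutive months in this data set, the change in revenue between neighboring rows is the rate of change per extra $100 spent. The variable names below are illustrative; the numbers are the ones given in the question.

```python
# Data from the question: advertising in $100s, revenue in $1000s.
advertising = [1, 2, 3, 4, 5, 6, 7, 8]
revenue = [12, 18, 22, 25, 27, 28, 28.5, 28.8]

# Revenue gained for each additional $100 of advertising.
increments = [round(b - a, 1) for a, b in zip(revenue, revenue[1:])]
print(increments)

# Every gain is positive (positive direction) but smaller than the one before (curved shape).
all_positive = all(step > 0 for step in increments)
shrinking = all(later < earlier for earlier, later in zip(increments, increments[1:]))
print(all_positive, shrinking)
```

The gains fall from 6 (thousand dollars) for the second $100 of advertising to 0.3 for the eighth, which is the numerical signature of an association that increases at a decreasing rate.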
7. Quick Reference Cheatsheet
| Category | Rule/Definition | Notes |
|---|---|---|
| Explanatory Variable | Variable used to predict or explain changes in another variable | Always plotted on the horizontal x-axis |
| Response Variable | Variable that is measured or predicted from the explanatory variable | Always plotted on the vertical y-axis |
| Scatterplot Construction | Each observational unit is a single point at (x, y) | Never connect points unless it is a time series plot |
| Direction of Association | Positive = as x increases, y tends to increase; Negative = as x increases, y tends to decrease | No pattern = no association |
| Shape of Association | Linear = points follow a straight line; Non-linear = points follow a curve | Linear association is required for simple linear regression |
| Strength of Association | Strong = points lie close to the pattern; Weak = points are spread far from the pattern | Strength does not depend on direction |
| Outlier | A point that falls far outside the overall pattern of association | Only an outlier if it deviates from the relationship, not just an extreme x or y value |
| AP Description Requirement | Always describe DSSO: Direction, Shape, Strength, Outliers | Full credit requires all four components, even if no outliers exist |
8. What's Next
This chapter is the foundational first step for all of Unit 2: Exploring Two-Variable Data. The patterns you identify in a scatterplot are the basis for all subsequent analysis: next you will measure the strength of linear associations using correlation, which relies on correctly classifying association direction and shape from your scatterplot. After correlation, you will learn to fit least-squares regression lines to linear associations, which only produces valid results if you have already confirmed that the relationship is linear from your scatterplot. Without correctly identifying variables, describing association features, and spotting outliers in this step, all later correlation and regression results will be misinterpreted, which is a common source of lost points on the AP exam.
Follow-on topics to study next: