Representing a Quantitative Variable with Graphs — AP Statistics Study Guide
For: Candidates sitting the AP Statistics exam.
Covers: Dotplots, stemplot (stem-and-leaf) displays, histograms, and boxplots for one-variable quantitative data, how to construct and interpret each graph, and how to describe a full distribution using shape, center, spread, and outlier identification.
You should already know: Difference between quantitative and categorical variables, basic frequency counting, definitions of center and spread.
A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board exam questions and may differ in wording, numerical values, or context. Use them to practice the technique; cross-check with official scoring guidelines for grading conventions.
1. What Is Representing a Quantitative Variable with Graphs?
A quantitative variable is any variable that records a numerical measurement where arithmetic operations (like averaging) make sense. Representing a quantitative variable with graphs means building and interpreting visual displays that show the distribution of the variable: what values the variable takes, and how often it takes those values. According to the official AP Statistics Course and Exam Description (CED), this topic falls within Unit 1: Exploring One-Variable Data, which carries a 15-23% weighting on the multiple-choice section of the AP exam. Content from this topic appears on both multiple-choice questions (MCQ) and free-response questions (FRQ): you can expect 1-2 MCQ testing graph interpretation or construction rules, and this topic is almost always tested as part of the first or second FRQ, which requires describing a distribution in context. The usual convention for dotplots and histograms places the values of the variable on the horizontal (x) axis; for histograms, the count, relative frequency, or density of observations in each bin goes on the vertical (y) axis. Common synonyms in AP materials include “distribution plots for quantitative data” or “one-variable quantitative graphs.”
2. Dotplots and Stemplots
Dotplots and stemplots (also called stem-and-leaf displays) are the simplest graphs for quantitative data, designed for small-to-moderate sized datasets where retaining individual data points is useful. A dotplot plots each individual observation as a dot along a horizontal numerical axis matching the variable’s scale; dots that share the same value are stacked vertically. Dotplots make it very easy to pick out the mode, center, gaps, clusters, and outliers by eye. A stemplot separates each data value into two parts: the stem (all leading digits of the value) and the leaf (the last trailing digit, usually one digit). Like dotplots, stemplots retain all original data values, so you can reconstruct the full dataset from the graph, which is a key advantage over histograms. Stemplots work for datasets up to ~200 observations, after which they become too cluttered. Key construction rules for stemplots (tested often on AP): always order leaves from smallest to largest, do not omit stems that have no leaves (omission distorts the overall shape of the distribution), and split stems if more than ~10 leaves fall on a single stem to improve readability.
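The stacking idea behind a dotplot can be sketched in a few lines of Python. This is only an illustration: the function name `text_dotplot` and the sample scores are made up for this example, and the plot is rotated so that values run down the left side instead of along a horizontal axis.

```python
from collections import Counter

def text_dotplot(values):
    """Return a text dotplot: one row per integer value, one dot per observation."""
    counts = Counter(values)
    lines = []
    # Include every integer from min to max so gaps in the data stay visible.
    for v in range(min(values), max(values) + 1):
        lines.append(f"{v:>3} | " + "." * counts[v])
    return "\n".join(lines)

# Hypothetical quiz scores, used only for illustration.
scores = [7, 8, 8, 9, 9, 9, 10, 4]
print(text_dotplot(scores))
```

In the printed result, the rows for 5 and 6 have no dots, which makes the gap between the low score of 4 and the main cluster easy to see.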
Worked Example
Problem: The study times (in minutes) of 15 high school students preparing for a 10-point statistics quiz are: 12, 18, 20, 15, 32, 18, 25, 22, 17, 20, 24, 16, 18, 28, 15. Construct a correctly formatted stemplot for this data, using the tens digit as the stem. Solution:
- First, identify the range of stems: the smallest observation is 12 (tens digit 1) and the largest is 32 (tens digit 3), so we include all stems from 1 to 3 (no stems skipped).
- Extract the leaf (the ones digit, the trailing digit) for each observation, grouping leaves by their stem: Stem 1 gets [2,8,5,8,7,6,8,5], Stem 2 gets [0,5,2,0,4,8], Stem 3 gets [2].
- Sort each group of leaves in ascending order, left to right.
- Add a required key to explain how to read the stemplot, with units. The final correctly formatted stemplot is:
1 | 2 5 5 6 7 8 8 8
2 | 0 0 2 4 5 8
3 | 2
Key: 1|2 = 12 minutes of study time
Exam tip: Always add a key to a stemplot with correct units, and never skip empty stems. AP exam graders require both to get full credit, and 10% of students lose points for omitting one or both.
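The construction steps for a stemplot can be sketched in Python as follows. This is a minimal sketch that assumes non-negative whole-number data with the tens digit as the stem; the function name `stemplot` is illustrative, not a standard library routine.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows for non-negative integers (stem = tens, leaf = ones)."""
    leaves = defaultdict(list)
    for v in sorted(values):          # sorting first keeps each row's leaves in order
        leaves[v // 10].append(v % 10)
    # Walk every stem from smallest to largest so empty stems are not skipped.
    stems = range(min(values) // 10, max(values) // 10 + 1)
    return [f"{s} | " + " ".join(str(d) for d in leaves[s]) for s in stems]

study_minutes = [12, 18, 20, 15, 32, 18, 25, 22, 17, 20, 24, 16, 18, 28, 15]
for row in stemplot(study_minutes):
    print(row)
print("Key: 1|2 = 12 minutes of study time")
```

Because the stems come from a full range and not from the data values themselves, a stem with no observations still prints as an empty row.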
3. Histograms
Histograms are the most common graph for large quantitative datasets, where displaying every individual data point is impractical. A histogram divides the full range of the quantitative variable into a set of contiguous, equal-width bins (intervals) on the horizontal x-axis. The height of each rectangular bar corresponds to the number (frequency), proportion (relative frequency), or density of observations that fall into that bin. A critical distinction that AP tests repeatedly: histograms are for quantitative variables, while bar charts are for categorical variables. Histograms never have gaps between bars (unless a bin has zero observations, which leaves an empty space), because the x-axis is a continuous numerical scale, unlike bar charts which have gaps between categorical labels. When constructing a histogram, aim for 5 to 15 bins: too few bins hide important features like multiple modes, while too many bins leave too much empty space and make shape hard to see. Relative frequency histograms use proportions instead of counts on the y-axis, making it easy to compare distributions of different sizes, but they are interpreted the same way as frequency histograms. Shape features like skewness and modality are easiest to see in histograms for large datasets.
Worked Example
Problem: The 29 observations below are the heights (in inches) of randomly selected 10th grade boys: 61, 62, 63, 63, 64, 65, 65, 65, 66, 66, 67, 67, 67, 67, 68, 68, 68, 69, 69, 70, 70, 70, 71, 71, 72, 72, 73, 74, 75. Construct a frequency histogram with 5-inch width bins starting at 60. Solution:
- First, define the bins according to the requirements: starting at 60, a 5-inch width gives four bins: $[60, 65)$, $[65, 70)$, $[70, 75)$, $[75, 80)$. Each bin includes its left endpoint and excludes its right endpoint, so a height of exactly 65 inches belongs to the second bin.
- Count the frequency of observations in each bin: 60-65: 5 observations, 65-70: 14 observations, 70-75: 9 observations, 75-80: 1 observation (the single height of 75 inches). The counts sum to 29, matching the number of observations.
- Label the horizontal axis Height (inches) and the vertical axis Frequency, with scales matching the bin ranges and frequency counts.
- Draw rectangular bars for each bin: the width of each bar matches the bin width, the height matches the bin frequency. No gaps are left between bars. The resulting histogram is unimodal with its peak in the 65-70 bin, roughly symmetric, and centered near 68 inches.
Exam tip: Never confuse a histogram with a bar chart for categorical data. If a question asks you to identify the correct graph for a quantitative variable, the option with gaps between bars is almost always wrong.
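Counting observations into bins by hand is where most histogram errors creep in, so a short Python sketch can serve as a check. It assumes left-closed bins (a value on a boundary, such as 65 or 75, is counted in the bin that starts there); the function name `bin_counts` is illustrative, and the data are the heights exactly as listed in the worked example.

```python
def bin_counts(values, start, width, n_bins):
    """Count observations in left-closed bins [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)   # index of the bin containing v
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

heights = [61, 62, 63, 63, 64, 65, 65, 65, 66, 66, 67, 67, 67, 67, 68,
           68, 68, 69, 69, 70, 70, 70, 71, 71, 72, 72, 73, 74, 75]
counts = bin_counts(heights, start=60, width=5, n_bins=4)
for k, c in enumerate(counts):
    lo = 60 + 5 * k
    print(f"[{lo}, {lo + 5}): " + "#" * c)   # sideways bars, one # per observation
print("total observations:", sum(counts))
```

Comparing the printed total with the number of data values is a quick way to confirm that no observation was dropped or counted twice.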
4. Boxplots (Box-and-Whisker Plots)
Boxplots (also called box-and-whisker plots) are compact graphical displays of quantitative data based on the five-number summary: minimum, first quartile ($Q_1$), median, third quartile ($Q_3$), and maximum. They are particularly useful for comparing multiple distributions side-by-side, as they clearly show differences in center, spread, and skewness without extra clutter. Boxplots incorporate the $1.5 \times IQR$ rule for outlier identification: the interquartile range is $IQR = Q_3 - Q_1$, so any observation below $Q_1 - 1.5 \times IQR$ or above $Q_3 + 1.5 \times IQR$ is classified as an outlier, plotted as an individual point separate from the whiskers. Whiskers extend from the box to the farthest non-outlier observation in each direction, not to the extreme minimum/maximum if there are outliers. Boxplots can be drawn horizontally or vertically, but are usually drawn horizontally for one-variable data. A key limitation often tested on AP: boxplots do not show modality (e.g., you cannot tell if a distribution is bimodal from a boxplot), unlike histograms or dotplots.
Worked Example
Problem: Use the sorted study time dataset from the earlier stemplot example: [12, 15, 15, 16, 17, 18, 18, 18, 20, 20, 22, 24, 25, 28, 32]. Construct a correctly annotated boxplot and identify any outliers using the $1.5 \times IQR$ rule. Solution:
- Calculate the five-number summary for sorted data: minimum = 12, first quartile $Q_1 = 16$ (median of the first 7 observations), median = 18 (8th observation), third quartile $Q_3 = 24$ (median of the last 7 observations), maximum = 32.
- Calculate $IQR = Q_3 - Q_1 = 24 - 16 = 8$, then calculate the outlier fences: lower fence $= 16 - 1.5(8) = 4$ and upper fence $= 24 + 1.5(8) = 36$.
- Check all observations against the fences: all values fall between 4 and 36, so there are no outliers in this dataset.
- Draw the boxplot: draw a box from $Q_1 = 16$ to $Q_3 = 24$ on a numerical x-axis labeled "Study Time (minutes)", draw a vertical line through the box at the median 18, draw whiskers from the box edges to the minimum (12) and maximum (32), with no separate outlier points.
Exam tip: Always remember that whiskers on a boxplot extend to the farthest non-outlier, not to the absolute minimum and maximum if outliers are present. Drawing whiskers past outliers is a common point deduction on FRQ.
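The boxplot calculations above can be sketched in Python. This sketch assumes the median-of-halves convention for quartiles (the median itself is left out of both halves when the number of observations is odd); other software may use slightly different quartile rules. The function names are illustrative, and the extra 60-minute value in the last line is hypothetical, added only to show how whiskers behave when an outlier is present.

```python
from statistics import median

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves quartile convention."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]            # values below the median position
    upper = data[(n + 1) // 2 :]      # values above the median position
    return min(data), median(lower), median(data), median(upper), max(data)

def boxplot_parts(values):
    """Return fences, outliers, and whisker endpoints under the 1.5 x IQR rule."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    # Whiskers stop at the farthest observations that are NOT outliers.
    return (low_fence, high_fence), outliers, (min(inside), max(inside))

study_minutes = [12, 15, 15, 16, 17, 18, 18, 18, 20, 20, 22, 24, 25, 28, 32]
print(five_number_summary(study_minutes))
print(boxplot_parts(study_minutes))
# Hypothetical extra value: the upper whisker should stay at 32, not stretch to 60.
print(boxplot_parts(study_minutes + [60]))
```

The last call illustrates the whisker rule: the added value is reported as an outlier, while the whisker endpoints remain at the most extreme observations inside the fences.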
5. Describing Distributions from Graphs
The most common FRQ task for this topic is describing the distribution of a quantitative variable from a graph. AP requires you to always address four key features in context, using the common mnemonic SOCS: Shape, Outliers, Center, Spread.
- Shape: Describe the modality (unimodal = one clear peak, bimodal = two clear peaks, uniform = roughly the same frequency across all values) and symmetry. If the distribution is not symmetric, identify the direction of skewness: a right-skewed distribution has a long tail extending to higher numerical values, while a left-skewed distribution has a long tail extending to lower numerical values.
- Outliers: Name any values that fall far outside the overall pattern of the data, and note their approximate value if possible.
- Center: Give an approximate value for the center (the middle of the distribution, usually the median) in context.
- Spread: Describe how much the values vary, by giving the approximate range of typical values, or the full range of the data, in context.
The AP grading rubric always requires all four features to be addressed in the context of the problem to earn full credit; omitting context is a common reason for lost points.
Worked Example
Problem: Describe the distribution of student study times from the stemplot constructed earlier, following AP expectations. Solution:
- Shape: The distribution of study times is unimodal (one clear peak at 18 minutes) and moderately right-skewed, with a long tail extending toward higher study time values.
- Outliers: There are no outliers in this distribution; no values fall far outside the overall pattern.
- Center: The center of the distribution (median) is 18 minutes, meaning about half of the 15 students studied 18 minutes or less for the quiz, and about half studied 18 minutes or more.
- Spread: Study times range from 12 minutes to 32 minutes, with most study times falling between 12 and 28 minutes.
All four features are addressed in context, so this answer would earn full credit on an AP FRQ.
Exam tip: Never forget to describe all four SOCS features, and always include units and context. 1 in 3 students lose at least one point on a describe question for missing context, even if the numerical values are correct.
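A few quick numbers can back up a SOCS description before you write it out. The sketch below uses Python's standard `statistics` module on the study time data; treating a mean larger than the median as a sign of right skew is a common rule of thumb, not a guarantee, so the graph should still drive the shape description.

```python
from statistics import mean, median, multimode

study_minutes = [12, 15, 15, 16, 17, 18, 18, 18, 20, 20, 22, 24, 25, 28, 32]

center = median(study_minutes)
peaks = multimode(study_minutes)          # most frequent value(s)
low, high = min(study_minutes), max(study_minutes)
# Positive gap: the mean is pulled above the median, as a high tail would do.
gap = mean(study_minutes) - center

print(f"Shape: most common value(s) {peaks} minutes; mean minus median = {gap} minutes")
print(f"Center: median study time is {center} minutes")
print(f"Spread: study times range from {low} to {high} minutes (range {high - low} minutes)")
```

Each printed line names the variable and its units, which mirrors the context requirement for a written description.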
6. Common Pitfalls (and how to avoid them)
- Wrong move: Confusing a histogram with a bar chart, or using a bar chart for a quantitative variable. Why: Students learn bar charts first for categorical data, and assume all bar graphs are interchangeable, forgetting the gap rule and variable type difference. Correct move: On any graph question, first confirm variable type: for quantitative variables, the correct graph will not have arbitrary gaps between bars, and the x-axis will be a continuous numerical scale.
- Wrong move: Skipping empty stems in a stemplot or empty bins in a histogram to "save space". Why: Students think empty stems/bins add no information, so omitting them makes the graph cleaner. Correct move: Always include all stems from the minimum to maximum value, and all bins across the full range of the data, to avoid distorting the shape of the distribution.
- Wrong move: Drawing boxplot whiskers all the way to the absolute minimum and maximum when outliers are present. Why: Students memorize the five-number summary as min, Q1, median, Q3, max, so they automatically extend whiskers to those extremes regardless of outliers. Correct move: After identifying outliers with the $1.5 \times IQR$ rule, extend whiskers only to the farthest observation that is not an outlier, and plot outliers as separate points.
- Wrong move: Describing a distribution without including context (e.g., saying "center is 18" instead of "median study time is 18 minutes"). Why: Students rush through FRQs and forget to connect their answer to the problem's scenario. Correct move: After writing any description, check that you mentioned the variable name and units for every feature you describe.
- Wrong move: Claiming a distribution is skewed right because the peak is on the right, instead of the tail. Why: Students confuse the position of the peak with the direction of the skew. Correct move: Always remember: "The skew is the tail" — if the long tail points to higher values, it's right-skewed, and vice versa.
- Wrong move: Claiming you can identify a bimodal distribution from a boxplot. Why: Students think boxplots show all shape features, but they aggregate all data in the box and whiskers. Correct move: Only describe modality from a dotplot, stemplot, or histogram; never state modality based on a boxplot.
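The last pitfall can be made concrete with two small made-up datasets that share the same five-number summary, and would therefore produce identical boxplots, even though one has a single peak and the other has two. The data and the function name are illustrative assumptions, and quartiles use the median-of-halves convention.

```python
from collections import Counter
from statistics import median

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves quartile convention."""
    data = sorted(values)
    n = len(data)
    lower, upper = data[: n // 2], data[(n + 1) // 2 :]
    return min(data), median(lower), median(data), median(upper), max(data)

# Two hypothetical datasets, built only to illustrate the boxplot limitation.
one_peak = [0, 1, 2, 4, 5, 5, 5, 6, 8, 9, 10]
two_peaks = [0, 2, 2, 2, 2, 5, 8, 8, 8, 8, 10]

print(five_number_summary(one_peak))
print(five_number_summary(two_peaks))
# The two most frequent values in each dataset, with their counts.
print(Counter(one_peak).most_common(2))
print(Counter(two_peaks).most_common(2))
```

The summaries match, but the frequency counts show one dataset piling up at a single value and the other at two separate values, which only a dotplot, stemplot, or histogram would reveal.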
7. Practice Questions (AP Statistics Style)
Question 1 (Multiple Choice)
A researcher studies the distribution of annual personal income in a small town, which includes a small number of multi-millionaires. Which of the following options correctly identifies the best graph to display the shape of the full distribution, and the most likely shape?
A) Bar chart, skewed left
B) Histogram, skewed right
C) Boxplot, symmetric
D) Stemplot, bimodal
Worked Solution: First, annual income is a quantitative variable, so bar charts (for categorical variables) are immediately eliminated, ruling out A. The small number of very high incomes (multi-millionaires) creates a long tail extending toward higher values, so the distribution is skewed right, not symmetric. A boxplot is useful for comparing distributions but shows overall shape less clearly than a histogram for a dataset this large, so C is wrong. A stemplot would be far too cluttered for a dataset of all incomes in a town, and the distribution is unimodal (one peak at low/moderate incomes), not bimodal, so D is wrong. The correct answer is B.
Question 2 (Free Response)
A community garden association records the yield (in pounds of tomatoes) from 20 plots in their garden, sorted as follows: 8, 10, 12, 12, 14, 15, 15, 16, 17, 18, 19, 20, 21, 22, 22, 23, 24, 25, 26, 28. (a) Construct a stemplot for this yield data, using the tens digit as the stem. (b) Identify any outliers in the data using the $1.5 \times IQR$ rule. (c) Describe the distribution of tomato yields using SOCS, following AP requirements.
Worked Solution: (a) We include all stems from 0 (yields < 10 pounds) to 2 (yields 20-29 pounds), sort leaves, and add a key:
0 | 8
1 | 0 2 2 4 5 5 6 7 8 9
2 | 0 1 2 2 3 4 5 6 8
Key: 1|0 = 10 pounds of tomatoes
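(b) For the 20 sorted observations, median $= \frac{18 + 19}{2} = 18.5$, $Q_1 = \frac{14 + 15}{2} = 14.5$ (median of the lower 10 values), $Q_3 = \frac{22 + 23}{2} = 22.5$ (median of the upper 10 values). $IQR = 22.5 - 14.5 = 8$. Outlier fences: $14.5 - 1.5(8) = 2.5$ and $22.5 + 1.5(8) = 34.5$. All observations fall between 2.5 and 34.5, so there are no outliers.
(c) Full description: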
- Shape: The distribution of tomato yields is unimodal and roughly symmetric, with a single broad cluster of yields spanning the 10s and 20s of pounds.
- Outliers: There are no outliers in the dataset.
- Center: The median yield is 18.5 pounds, meaning half the plots yielded less than 18.5 pounds of tomatoes, and half yielded more.
- Spread: Yields range from 8 pounds to 28 pounds, with most yields falling between 10 and 26 pounds.
Question 3 (Application / Real-World Style)
A wildlife biologist measures the body mass (in kilograms) of 25 adult wild moose in a national park, resulting in the following five-number summary: 210, 290, 360, 410, 580. Construct a correctly annotated boxplot for this data, identify any outliers, and interpret the boxplot in context.
Worked Solution:
- Calculate $IQR = Q_3 - Q_1 = 410 - 290 = 120$. Calculate outlier fences: Lower fence = $290 - 1.5(120) = 110$, Upper fence = $410 + 1.5(120) = 590$.
- Check extreme values: the minimum 210 kg is above 110 kg, and the maximum 580 kg is below 590 kg, so there are no outliers.
- Boxplot construction: Draw a box from 290 kg to 410 kg on a numerical x-axis labeled "Moose Body Mass (kg)", add a line at the median 360 kg, draw whiskers from the box edges to the minimum 210 kg and maximum 580 kg, with no outlier points.
- Interpretation: The middle half of the adult moose in the sample have a body mass between 290 kg and 410 kg, with a median mass of 360 kg. Body masses range from 210 kg to 580 kg, and the $1.5 \times IQR$ rule flags no unusually large or small moose in the sample.
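When only a five-number summary is given, as in Question 3, the outlier check reduces to comparing the minimum and maximum with the fences. The Python sketch below shows that check; the function name `check_extremes` and the returned dictionary keys are illustrative choices. Comparing whisker lengths as a hint about skew is a rough heuristic, and it only makes sense here because neither extreme is an outlier.

```python
def check_extremes(minimum, q1, med, q3, maximum):
    """Apply the 1.5 x IQR rule when only the five-number summary is known."""
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "median": med,
        "iqr": iqr,
        "fences": (low_fence, high_fence),
        "min_is_outlier": minimum < low_fence,
        "max_is_outlier": maximum > high_fence,
        # Distances from the box to each extreme; a longer upper distance hints at a right tail.
        "whisker_lengths": (q1 - minimum, maximum - q3),
    }

moose = check_extremes(210, 290, 360, 410, 580)
for name, value in moose.items():
    print(f"{name}: {value}")
```

Note that a five-number summary can only tell you whether the most extreme value on each side is an outlier; identifying every outlier requires the raw data.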
8. Quick Reference Cheatsheet
| Category | Formula / Rule | Notes |
|---|---|---|
| Stemplot Construction | Stem = leading digits, Leaf = trailing digit | Always add a key with units, do not skip empty stems |
| 1.5×IQR Outlier Rule | Lower = $Q_1 - 1.5 \times IQR$, Upper = $Q_3 + 1.5 \times IQR$ | Values outside the fences are classified as outliers |
| Histogram vs Bar Chart | Histogram: no gaps between bars, x-axis = numerical | Bar chart: gaps between bars, x-axis = categorical labels |
| Boxplot Whisker Rule | Whiskers extend to farthest non-outlier | Do not extend whiskers past outliers to absolute min/max |
| Describe Distribution Mnemonic | SOCS = Shape, Outliers, Center, Spread | All four features + context required for full credit |
| Skewness Direction | Skew direction = direction of the long tail | Right skew = tail to high values, left skew = tail to low values |
| Boxplot Limitation | Cannot identify modality from a boxplot | Use dotplots, stemplots, or histograms to assess modality |
9. What's Next
After mastering representing quantitative variables with graphs, the next step in Unit 1: Exploring One-Variable Data is calculating formal summary statistics for the center and spread of quantitative distributions, including mean, median, standard deviation, and interquartile range. This chapter is a critical prerequisite: you need to be able to visualize a distribution first to identify skewness and outliers, which determine which summary statistics are appropriate to report. For example, a strongly skewed distribution with outliers should use median and IQR for summary, while a symmetric distribution with no outliers uses mean and standard deviation. This topic also feeds into every later section of the AP course: comparing distributions, exploring two-variable data, and checking normality assumptions for inference all rely on correctly reading and describing graphs of quantitative variables.