Representing a Categorical Variable with Tables — AP Statistics Study Guide

For: Students preparing for the AP Statistics exam.

Covers: Frequency tables, relative frequency tables, and cumulative frequency tables for categorical variables, including calculation of frequencies, relative frequencies, proportions, and percents, plus mode identification and interpretation of table values for one-variable data.

You should already know: The difference between categorical and quantitative variables, basic percentage arithmetic, the definition of a variable in a statistical context.

A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official mark schemes for grading conventions.

1. What Is Representing a Categorical Variable with Tables?

A categorical variable places each individual into a distinct group or category rather than recording a numerical measurement, and the distribution of a variable describes what values it takes and how often each value occurs. The most common starting point for summarizing the distribution of a single categorical variable is a table, which is the focus of this topic. Per the AP Statistics Course and Exam Description (CED), this topic is part of Unit 1: Exploring One-Variable Data, which the CED weights at 15-23% of the multiple-choice section of the exam. Content from this topic appears in multiple-choice questions (MCQ) and as the routine opening step of free-response questions (FRQ) focused on data analysis or inference. Standard notation used across this topic: $n$ is the total number of observations in the dataset, and $f_{i}$ is the raw count (frequency) of observations falling into the $i$th category. This topic is the foundation for all later work with categorical data, including graphical displays, two-way tables, and inference for proportions, so being able to construct and interpret tables reliably is essential for the exam.

2. Frequency Tables

A frequency table is the most basic table representation for a categorical variable. It lists every distinct category of the variable in one column and the raw frequency (count) of observations that fall into each category in a second column. The core rule for all valid frequency tables is that the sum of all frequencies equals the total sample size $n$, which can be written formally as $\sum_{i=1}^{k} f_{i} = n$, where $k$ is the number of distinct categories. The intuition for this rule is simple: every observation in your dataset belongs to exactly one category, so adding all individual category counts must give you the total number of observations you started with. Frequency tables are most useful when you need to know the actual number of observations per category, for example when planning how many textbooks to order for each class section based on student enrollment. Unlike graphs, they give exact numerical values for counts, which is why they are often the first step in any categorical data analysis on the AP exam.

Worked Example

A high school teacher recorded the final letter grade for 30 students in their AP Statistics class. The raw grades are: A, A, B, C, A, B, B, D, A, C, B, A, A, B, C, A, F, B, A, C, B, B, A, C, A, D, B, A, B, A. Construct a frequency table for this data, and find how many students earned a B or higher grade.

First, list all distinct categories in logical order (from highest to lowest grade): A, B, C, D, F.
Count the frequency of each category: A = 12, B = 10, C = 5, D = 2, F = 1.
Confirm the total: $12 + 10 + 5 + 2 + 1 = 30$, which matches the total number of students, so the counts are correct.
Build the frequency table:

Grade	Frequency
A	12
B	10
C	5
D	2
F	1
Total	30

Add the frequencies for B or higher: $12 + 10 = 22$. So 22 students earned a B or higher.

Exam tip: Always add a total row to your frequency table when constructing it for an FRQ. This lets you immediately check whether your counts add up to $n$, catching counting errors before you move on to further calculations.
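The tally-and-check routine from this worked example can be sketched in Python. This is only an illustration under stated assumptions: the `grades` list is rebuilt from the counts in the frequency table (so its order differs from the teacher's record), and all variable names are our own.

```python
from collections import Counter

# Assumed data: one letter grade per student, rebuilt from the counts in the
# frequency table above (not in the original recording order).
grades = ["A"] * 12 + ["B"] * 10 + ["C"] * 5 + ["D"] * 2 + ["F"] * 1

# Tally each category; Counter maps category -> frequency f_i.
frequency = Counter(grades)

# Total-row check: the frequencies must sum to n.
n = len(grades)
assert sum(frequency.values()) == n

# Print the table in grade order, ending with the total row.
order = ["A", "B", "C", "D", "F"]
print("Grade\tFrequency")
for grade in order:
    print(f"{grade}\t{frequency[grade]}")
print(f"Total\t{n}")

# Students with a B or higher: add the A and B frequencies.
b_or_higher = frequency["A"] + frequency["B"]
print(b_or_higher)
```

The built-in check plays the same role as the total row: if a grade were miscounted by hand, the sum would not match $n$.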

3. Relative Frequency Tables

A relative frequency table replaces raw frequency counts with the proportion (or percentage) of observations that fall into each category. Relative frequency standardizes counts to the total sample size, making it easy to compare distributions across different-sized samples, which is why it is the most commonly used table on the AP exam. The formula for the relative frequency of the $i$th category is $p_{i} = \frac{f_{i}}{n}$. To get a percentage, multiply by 100: $\text{Percent} = p_{i} \times 100\%$. The sum of all relative frequencies should equal 1 (or 100% for percentages), within a small margin of error from rounding individual values. Intuition: If you have 10 people with red hair in a sample of 20, that is 50% of the sample; if you have 10 people with red hair in a sample of 100, that is only 10%. Raw counts do not tell you how common a category is, but relative frequency does.

Worked Example

Using the same grade distribution from the frequency table above ($n = 30$, $f(A) = 12$, $f(B) = 10$, $f(C) = 5$, $f(D) = 2$, $f(F) = 1$), construct a relative frequency table with proportions rounded to two decimal places, then find what percentage of the class earned a grade below a C.

Recall $n = 30$, so calculate each $p_{i} = \frac{f_{i}}{30}$:
- A: $\frac{12}{30} = 0.40$
- B: $\frac{10}{30} \approx 0.33$
- C: $\frac{5}{30} \approx 0.17$
- D: $\frac{2}{30} \approx 0.07$
- F: $\frac{1}{30} \approx 0.03$
Check the total: $0.40 + 0.33 + 0.17 + 0.07 + 0.03 = 1.00$, so the calculations are correct.
Build the relative frequency table:

Grade	Relative Frequency (Proportion)
A	0.40
B	0.33
C	0.17
D	0.07
F	0.03
Total	1.00

Grades below C are D and F, so add their relative frequencies: $0.07 + 0.03 = 0.10$, or 10%. So 10% of the class earned a grade below a C.

Exam tip: When asked for relative frequency, always check if the question asks for proportion or percent. A proportion of 0.1 is not the same answer as 10% on the AP exam, and mixing these up will cost you a point.
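As a rough sketch of the calculation $p_{i} = f_{i}/n$ for the grade data, the snippet below builds both the proportion and the percent columns. Using exact fractions until the display step is our own design choice, not an AP requirement, and the variable names are hypothetical.

```python
from fractions import Fraction

# Frequencies from the grade table; n is their total.
frequency = {"A": 12, "B": 10, "C": 5, "D": 2, "F": 1}
n = sum(frequency.values())

# Relative frequency p_i = f_i / n, kept exact as a fraction until display.
proportion = {grade: Fraction(f, n) for grade, f in frequency.items()}

# Show each value both as a proportion (2 d.p.) and as a percent.
print("Grade\tProportion\tPercent")
for grade, p in proportion.items():
    print(f"{grade}\t{float(p):.2f}\t{float(p) * 100:.0f}%")

# Below a C means D or F; adding exact fractions avoids rounding drift.
below_c = proportion["D"] + proportion["F"]
print(float(below_c))
```

Printing the proportion and the percent side by side makes the distinction in the exam tip concrete: they describe the same share of the class but are different numbers.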

4. Cumulative Frequency Tables and Mode Identification

For ordered categorical (ordinal) variables (e.g., grades, satisfaction levels, income brackets), we can construct cumulative frequency tables to simplify answering questions about the number of observations at or below (or at or above) a given category. Cumulative frequency for the $i$th category is the sum of frequencies for all categories up to and including the $i$th category. Cumulative relative frequency is the cumulative sum of relative frequencies, giving the proportion of observations in that range. Any frequency or relative frequency table can also be used to find the mode, the only valid measure of center for a categorical distribution: the mode is the category with the highest frequency (or relative frequency). If two categories tie for the highest frequency, the distribution is bimodal.

Worked Example

Using the AP Statistics grade distribution (ordered from highest to lowest grade: A, B, C, D, F; $n = 30$, $f(A) = 12$, $f(B) = 10$, $f(C) = 5$, $f(D) = 2$, $f(F) = 1$), (a) construct a cumulative frequency table, (b) identify the mode, (c) use the table to find how many students earned a B or higher.

(a) Calculate cumulative frequency starting from the highest grade:
- A: Cumulative frequency = $12$ (only A counts)
- B: Cumulative frequency = $12 + 10 = 22$ (A and B)
- C: Cumulative frequency = $22 + 5 = 27$ (A, B, C)
- D: Cumulative frequency = $27 + 2 = 29$
- F: Cumulative frequency = $29 + 1 = 30$

The final table:

Grade	Frequency	Cumulative Frequency
A	12	12
B	10	22
C	5	27
D	2	29
F	1	30

(b) The highest frequency is 12 for grade A, so the mode of the distribution is A.
(c) The cumulative frequency for B is 22, which is exactly the number of students who earned a B or higher, matching our earlier calculation.

Exam tip: Cumulative frequency only makes sense for ordered categorical variables. If your variable is unordered (nominal, like favorite coffee drink), a cumulative frequency table has no meaningful interpretation, so you should not expect an exam question to ask for one.
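The running-total idea behind cumulative frequency, together with mode identification, can be sketched as follows. This is a minimal illustration assuming the categories are supplied in the order used in the worked example (highest grade first); the names `order`, `cumulative`, and `modes` are our own.

```python
from itertools import accumulate

# Categories in the order used in the worked example (highest grade first).
order = ["A", "B", "C", "D", "F"]
frequency = {"A": 12, "B": 10, "C": 5, "D": 2, "F": 1}

# Running total of frequencies in table order: CF_i = f_1 + ... + f_i.
counts = [frequency[g] for g in order]
cumulative = dict(zip(order, accumulate(counts)))

# Mode(s): every category whose frequency equals the maximum.
# A list with two entries would indicate a bimodal distribution.
top = max(counts)
modes = [g for g in order if frequency[g] == top]

print(cumulative)
print(modes)
```

Because the running total follows `order`, reversing that list (F first) would change every cumulative value except the last, which is why the category order has to be stated before a cumulative value is interpreted.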

5. Common Pitfalls (and how to avoid them)

Wrong move: Calculating relative frequency by dividing $n$ by the category frequency, e.g. writing 30/12 = 2.5 as the relative frequency for an A. Why: Students often mix up numerator and denominator when rushing, especially for percentage questions. Correct move: Always write the formula $p_{i} = f_{i} / n$ at the top of your work before starting calculations to anchor yourself.
Wrong move: Forgetting cumulative frequency depends on category order, e.g. for grades ordered A (highest) to F (lowest), reporting cumulative frequency for B as the number of students who earned B or lower instead of B or higher. Why: Students assume cumulative frequency always goes from lowest to highest, regardless of context. Correct move: Always note the order of categories in the table, and confirm what the cumulative value measures before answering.
Wrong move: Calling a distribution bimodal because two categories have similar frequencies close to the highest. Why: Students confuse "similar frequency" with "equal highest frequency". Correct move: Only call a distribution bimodal if two distinct categories have the same highest frequency (or both are clearly, meaningfully higher than all other categories per the question context).
Wrong move: Concluding a relative frequency table is wrong because the sum of rounded proportions is 0.98 or 1.02 instead of exactly 1. Why: Students do not account for rounding error when reporting values to two decimal places. Correct move: If the sum is within 0.02 of 1 (or 2% for percentages), this is acceptable rounding error and does not mean your calculations are wrong.
Wrong move: Calculating the mean of a categorical variable from a frequency table. Why: Students are used to quantitative data and automatically try to calculate a numerical center. Correct move: For categorical data, the only measure of center you can report from a table is the mode.
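The rounding pitfall above is easy to reproduce. The sketch below uses a small hypothetical dataset (three equally common categories, labels invented for illustration) to show rounded proportions that sum to 0.99 even though nothing was miscalculated.

```python
# Hypothetical counts chosen so that rounding matters: three equal categories.
frequency = {"X": 1, "Y": 1, "Z": 1}
n = sum(frequency.values())

# Round each proportion to two decimal places, as in a reported table.
rounded = {k: round(f / n, 2) for k, f in frequency.items()}
rounded_total = round(sum(rounded.values()), 2)

# The unrounded proportions still sum to 1 (up to floating-point error).
exact_total = sum(f / n for f in frequency.values())

print(rounded)
print(rounded_total)
print(abs(exact_total - 1) < 1e-9)
```

When a total looks off, redoing the sum with unrounded values is a quick way to tell rounding error apart from a genuine counting mistake.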

6. Practice Questions (AP Statistics Style)

Question 1 (Multiple Choice)

A coffee shop records the size of drink ordered by 120 customers on a Saturday. The relative frequency of a small drink is 0.25, the relative frequency of a medium is 0.4, and the relative frequency of a large is 0.35. How many more medium drinks were ordered than small drinks?
A) 15
B) 18
C) 20
D) 25

Worked Solution: To find the number of observations in a category from relative frequency, we use the rearranged formula $f_{i} = p_{i} \times n$, where $n = 120$ total customers. The number of small drinks is $0.25 \times 120 = 30$, and the number of medium drinks is $0.4 \times 120 = 48$. The difference is $48 - 30 = 18$. This question tests the common AP skill of converting between relative frequency and raw counts. The correct answer is B.
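The conversion $f_{i} = p_{i} \times n$ used in this solution can be checked with a short sketch. The dictionary keys and the use of `round()` to absorb floating-point noise are our own choices for illustration.

```python
# Relative frequencies given in Question 1 and the total number of customers.
n = 120
proportion = {"small": 0.25, "medium": 0.40, "large": 0.35}

# f_i = p_i * n; round() guards against floating-point noise in the product.
count = {size: round(p * n) for size, p in proportion.items()}

# The recovered counts must add back up to n.
assert sum(count.values()) == n

# How many more medium drinks than small drinks were ordered.
difference = count["medium"] - count["small"]
print(count)
print(difference)
```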

Question 2 (Free Response)

A survey of 50 first-year college students asked: "What is your preferred primary social media platform?" The results are: TikTok: 22, Instagram: 18, X (Twitter): 5, Facebook: 3, Other: 2. (a) Construct a frequency and relative frequency table for this data. (b) What proportion of students prefer a platform other than TikTok or Instagram? (c) Identify the mode of this distribution and explain what it means in context.

Worked Solution: (a) We calculate relative frequency as $p_{i} = f_{i}/50$ for each category:

Platform	Frequency	Relative Frequency
TikTok	22	0.44
Instagram	18	0.36
X	5	0.10
Facebook	3	0.06
Other	2	0.04
Total	50	1.00

Total frequency confirms $22 + 18 + 5 + 3 + 2 = 50 = n$, and total relative frequency is 1.00, so the calculations are correct.

(b) The combined proportion for TikTok and Instagram is $0.44 + 0.36 = 0.80$ . The proportion of students who prefer other platforms is $1 - 0.80 = 0.20$ .

(c) The mode is TikTok, since it has the highest frequency (22) and highest relative frequency (0.44). In context, this means TikTok is the most commonly preferred primary social media platform among the surveyed first-year college students.

Question 3 (Application / Real-World Style)

A biologist studying fur color in a population of 200 wild gray squirrels collects the following counts: Black fur: 38, Gray fur: 142, Brown fur: 20. (a) Convert this data to a relative frequency table. (b) Based on this empirical distribution, what is the probability a randomly caught squirrel from this population has gray fur? Interpret your result in context.

Worked Solution: (a) Total sample size $n = 200$. We calculate relative frequency for each category:

Fur Color	Relative Frequency
Black	$38/200 = 0.19$
Gray	$142/200 = 0.71$
Brown	$20/200 = 0.10$
Total	1.00

(b) The empirical probability of a randomly selected squirrel having gray fur equals the relative frequency of gray fur, which is 0.71. In context, this means 71% of the squirrels in the studied population have gray fur, so there is a 71% chance that a randomly caught individual from this population will have gray fur.

7. Quick Reference Cheatsheet

Category	Formula	Notes
Total Frequency Check	$\sum f_{i} = n$	Always confirm counts add up to total sample size to catch counting errors
Relative Frequency	$p_{i} = \frac{f_{i}}{n}$	Proportion between 0 and 1; multiply by 100 to get percent
Frequency from Relative Frequency	$f_{i} = p_{i} \times n$	Used to convert proportions back to raw counts
Cumulative Frequency (Ordered Categories)	$CF_{i} = \sum_{j = 1}^{i} f_{j}$	Only valid for ordinal (ordered) categorical variables; depends on category order
Cumulative Relative Frequency	$CRF_{i} = \sum_{j = 1}^{i} p_{j}$	Equals 1 (within rounding error) for the last category
Mode	Category with maximum $f_{i}$ (or $p_{i}$)	Only valid measure of center for categorical data; bimodal if two categories tie for maximum
Total Relative Frequency Check	$\sum p_{i} = 1$	Acceptable to have 0.98 - 1.02 due to rounding of individual proportions
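For quick self-checking, the cheatsheet formulas can be bundled into one helper. The function `summarize` below is hypothetical (it is not from any library) and assumes the categories are passed in their intended order; its cumulative columns are only meaningful for ordinal variables.

```python
from fractions import Fraction
from itertools import accumulate

def summarize(frequency):
    """Return rows (category, f, p, CF, CRF) in the dict's insertion order.

    Hypothetical helper: the cumulative columns are only meaningful
    when the categories are ordinal and listed in their natural order.
    """
    n = sum(frequency.values())
    cum = list(accumulate(frequency.values()))
    return [
        (cat, f, Fraction(f, n), cf, Fraction(cf, n))
        for (cat, f), cf in zip(frequency.items(), cum)
    ]

# Grade data from the worked examples, highest grade first.
rows = summarize({"A": 12, "B": 10, "C": 5, "D": 2, "F": 1})
for cat, f, p, cf, crf in rows:
    print(f"{cat}\t{f}\t{float(p):.2f}\t{cf}\t{float(crf):.2f}")
```

The last row doubles as both total checks: its cumulative frequency should equal $n$ and its cumulative relative frequency should equal 1.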

8. What's Next

This topic is the foundational first step for all work with categorical data in AP Statistics. Immediately after this, you will learn to represent categorical variables with graphs (bar charts and pie charts), which rely on the frequency and relative frequency values you calculate in tables to be constructed correctly. Without mastering how to count, calculate, and interpret frequencies from tables, you will not be able to correctly interpret graphical displays or answer later questions about comparing categorical distributions. This topic also feeds directly into two-way tables for two categorical variables (Unit 2), which in turn are the basis for inference for proportions and chi-square tests later in the course.
