Chi-Squared Tests — AP Statistics Stats Study Guide
For: students preparing for the AP Statistics exam.
Covers: Goodness-of-fit tests, tests of independence and homogeneity, validity conditions for chi-squared inference, and interpretation of test statistics and p-values for AP exam questions.
You should already know: Algebra 2, basic probability intuition.
A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official College Board mark schemes for grading conventions.
1. What Are Chi-Squared Tests?
Chi-squared tests are non-parametric hypothesis tests for categorical (count) data rather than numerical measurements. They compare observed counts to the counts expected under a null hypothesis. The test statistic approximately follows the chi-squared ($\chi^2$, pronounced "kai-squared") probability distribution, a right-skewed distribution defined by its degrees of freedom. Common synonyms include chi-square tests and $\chi^2$ tests. This topic makes up all of Unit 8 in the AP Statistics Course and Exam Description (CED), is worth 2-5% of your total exam score, and appears in both the multiple-choice and free-response sections.
2. Goodness-of-fit test
The chi-squared goodness-of-fit test is used to determine if the distribution of a single categorical variable matches a pre-specified hypothesized distribution. For example, you might test if a 6-sided die is fair, or if the racial demographics of a local school match state-wide demographic proportions.
Key definitions and formula
- Observed counts ($O_i$): The actual number of observations recorded in category $i$ of your variable.
- Expected counts ($E_i$): The number of observations you would expect in category $i$ if the null hypothesis is true, calculated as $E_i = n p_i$, where $n$ is the total sample size and $p_i$ is the hypothesized proportion for category $i$ from the null hypothesis.
- Hypotheses:
- $H_0$: The distribution of [variable name] matches the hypothesized distribution.
- $H_a$: The distribution of [variable name] does not match the hypothesized distribution.
- Test statistic: $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$
- Degrees of freedom (df): $df = k - 1$, where $k$ is the number of categories for your variable.
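The formulas above translate directly into a few lines of Python. This is a minimal sketch for checking hand calculations, not something you can use on the exam; the function name `gof_statistic` is our own, and it assumes the hypothesized proportions sum to 1. It returns the expected counts, the $\chi^2$ statistic, and df.

```python
def gof_statistic(observed, proportions):
    """Goodness-of-fit sketch: return (expected counts, chi-squared, df).

    observed    -- observed counts O_i, one per category
    proportions -- hypothesized proportions p_i from H0 (assumed to sum to 1)
    """
    if len(observed) != len(proportions):
        raise ValueError("need one hypothesized proportion per category")
    n = sum(observed)                          # total sample size
    expected = [n * p for p in proportions]    # E_i = n * p_i
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                     # k categories -> k - 1
    return expected, chi2, df


# Die data from the worked example: 120 rolls, H0 says each face has p = 1/6.
expected, chi2, df = gof_statistic([25, 17, 19, 23, 16, 20], [1 / 6] * 6)
print(expected, round(chi2, 3), df)
```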
Worked example
You roll a 6-sided die 120 times and record the observed count for each face: 1: 25, 2: 17, 3: 19, 4: 23, 5: 16, 6: 20. Test whether the die is fair at the $\alpha = 0.05$ significance level.
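- $H_0$: Each face has a probability of $1/6$ of landing up; $H_a$: At least one face has a probability not equal to $1/6$.
- Expected count for each face: $E_i = 120 \times \frac{1}{6} = 20$, so all $E_i = 20$.
- Calculate test statistic: $\chi^2 = \frac{(25-20)^2}{20} + \frac{(17-20)^2}{20} + \frac{(19-20)^2}{20} + \frac{(23-20)^2}{20} + \frac{(16-20)^2}{20} + \frac{(20-20)^2}{20} = \frac{25 + 9 + 1 + 9 + 16 + 0}{20} = 3$, with $df = 6 - 1 = 5$.
- The p-value for $\chi^2 = 3$ with $df = 5$ is about 0.70, which is greater than 0.05, so we fail to reject $H_0$: there is not convincing evidence that the die is unfair.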
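On the exam you would read the p-value from a calculator's $\chi^2$ cdf command or from a table. To double-check such a value, the sketch below computes the right-tail area of a $\chi^2$ distribution for whole-number df using two standard closed forms: a Poisson-type sum when df is even, and a normal-tail term plus a finite series when df is odd. The function name `chi2_sf` is our own (hypothetical), and it uses only Python's standard `math` module.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail probability P(X >= x) for a chi-squared variable with whole-number df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = math.exp(-x / 2)
        total = term
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum of x^(r-1) / (1*3*...*(2r-1))
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


# Die example: chi-squared = 3 with df = 5
print(round(chi2_sf(3.0, 5), 3))
```

Calling `chi2_sf(3.0, 5)` gives about 0.70, matching the die example. For non-integer df or serious work, a library routine such as SciPy's `scipy.stats.chi2.sf` is the usual choice.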
Exam tip: Always label expected counts clearly in free response answers; examiners regularly dock marks for unlabeled expected value calculations.
3. Test of independence and homogeneity
These two tests use identical mathematical calculations, but differ in study design and hypothesis framing, so it is critical to distinguish them when writing conclusions.
Test of independence
Use this test when you have one random sample measured on two categorical variables, to test if the two variables are associated (not independent) in the population. For example: Is there a relationship between gender and preferred ice cream flavor among high school students?
- Hypotheses:
- $H_0$: [Variable 1] and [Variable 2] are independent (no association) in the population.
- $H_a$: [Variable 1] and [Variable 2] are not independent (there is an association) in the population.
Test of homogeneity
Use this test when you have two or more independent random samples from different populations, to test if the distribution of a single categorical variable is identical across all populations. For example: Is the distribution of favorite ice cream flavor the same for 9th graders and 12th graders?
- Hypotheses:
- $H_0$: The distribution of [variable name] is the same across all populations.
- $H_a$: The distribution of [variable name] is not the same across all populations.
Shared calculations
Both tests use two-way contingency tables to organize observed counts. The expected count for each cell in the table is $E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$, and the degrees of freedom for both tests are $df = (r - 1)(c - 1)$, where $r$ is the number of rows and $c$ is the number of columns in the contingency table. The test statistic formula is identical to the one for the goodness-of-fit test.
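A short sketch of the shared calculations, assuming the two-way table is stored as a list of rows of observed counts (a representation chosen for illustration; `two_way_expected` is a hypothetical name):

```python
def two_way_expected(table):
    """Return (expected table, df) for a two-way table of observed counts.

    table -- list of rows, each row a list of observed counts
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # E = (row total * column total) / grand total, cell by cell
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    df = (len(table) - 1) * (len(table[0]) - 1)   # (r - 1)(c - 1)
    return expected, df


# Gender (rows) by flavor (columns) from the ice cream worked example
expected, df = two_way_expected([[50, 30, 20], [30, 40, 30]])
print(expected, df)
```

The same function serves both the independence and the homogeneity test, since only the study design and the wording of the hypotheses differ.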
Worked example
200 students are surveyed about their gender (male/female) and ice cream preference (chocolate/vanilla/strawberry). Observed counts are given in the table below:
|  | Chocolate | Vanilla | Strawberry | Row total |
|---|---|---|---|---|
| Male | 50 | 30 | 20 | 100 |
| Female | 30 | 40 | 30 | 100 |
| Col total | 80 | 70 | 50 | 200 |

Test for an association between gender and ice cream preference at $\alpha = 0.05$.
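- $H_0$: Gender and ice cream preference are independent; $H_a$: Gender and ice cream preference are associated.
- Expected counts: Male + Chocolate = $\frac{100 \times 80}{200} = 40$, Male + Vanilla = 35, Male + Strawberry = 25, Female + Chocolate = 40, Female + Vanilla = 35, Female + Strawberry = 25.
- Calculate test statistic: $\chi^2 = \frac{(50-40)^2}{40} + \frac{(30-35)^2}{35} + \frac{(20-25)^2}{25} + \frac{(30-40)^2}{40} + \frac{(40-35)^2}{35} + \frac{(30-25)^2}{25} \approx 8.43$, with $df = (2-1)(3-1) = 2$.
- The p-value is about 0.0148 < 0.05, so we reject $H_0$: there is sufficient evidence of an association between gender and ice cream preference.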
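To check the ice cream arithmetic, the sketch below recomputes each cell's contribution $(O - E)^2/E$ and the p-value. It relies on one exact fact: when $df = 2$, the right-tail area of the $\chi^2$ distribution is $e^{-\chi^2/2}$, so no table is needed. The expected counts are typed in from the expected-count step rather than recomputed.

```python
import math

observed = [[50, 30, 20],   # Male: chocolate, vanilla, strawberry
            [30, 40, 30]]   # Female
expected = [[40, 35, 25],   # (row total x column total) / 200
            [40, 35, 25]]

# Each cell contributes (O - E)^2 / E; the test statistic is their sum
contributions = [[(o - e) ** 2 / e for o, e in zip(orow, erow)]
                 for orow, erow in zip(observed, expected)]
chi2 = sum(sum(row) for row in contributions)

df = (2 - 1) * (3 - 1)          # 2 rows, 3 columns
p_value = math.exp(-chi2 / 2)   # exact right-tail area when df == 2

print(round(chi2, 2), df, round(p_value, 4))
```

Printing `contributions` shows that the two chocolate cells contribute 2.5 each, the largest share of the 8.43 total, so most of the evidence of association comes from the chocolate column.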
4. Conditions for chi-squared
All three chi-squared tests require the same three conditions to be met for inference to be valid. You must explicitly check all three conditions on every free response chi-squared question to earn full marks:
- Random: The data comes from a random sample from the population of interest, or from a randomized experiment. For tests of homogeneity, each sample must be randomly selected from its respective population.
- Independent: Individual observations are independent of each other. When sampling without replacement, the population must be at least 10 times the sample size (the 10% condition) so that the observations can be treated as approximately independent. Paired or matched data are not suitable for chi-squared tests.
- Large Counts: All expected counts ($E_i$) are at least 5. The chi-squared distribution is only a valid approximation of the sampling distribution if the expected counts are sufficiently large. If any expected count is less than 5, combine adjacent, logically related categories until all expected counts are ≥5, or use Fisher's exact test (combining categories is the response expected in AP Statistics).
Exam tip: A common trap is checking the observed counts instead of the expected counts for the Large Counts condition. Always verify that the expected counts, not the observed counts, are at least 5.
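The Large Counts check is easy to automate, and doing so makes the observed-versus-expected trap concrete: the function only ever looks at expected counts. A minimal sketch; `large_counts_ok` is a hypothetical helper name, and the 24-roll case is a made-up variation on the die example used only to show a failure.

```python
def large_counts_ok(expected, minimum=5):
    """Large Counts condition: every EXPECTED count (never observed) is >= minimum.

    Accepts a flat list (goodness-of-fit) or a list of rows (two-way table).
    """
    if isinstance(expected[0], list):
        cells = [e for row in expected for e in row]   # flatten a two-way table
    else:
        cells = expected
    return all(e >= minimum for e in cells)


# Die example: 120 rolls gives E = 20 per face, so the condition holds.
print(large_counts_ok([120 / 6] * 6))   # → True
# Hypothetical smaller study: 24 rolls gives E = 4 per face, so it fails.
print(large_counts_ok([24 / 6] * 6))    # → False
```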
5. Interpreting test statistic and p-value
Test statistic interpretation
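The $\chi^2$ test statistic is always non-negative, because it is a sum of squared differences divided by positive expected counts. Larger values indicate larger gaps between observed and expected counts, meaning stronger evidence against the null hypothesis. What counts as "large" depends on the degrees of freedom: a $\chi^2$ distribution with df degrees of freedom has mean equal to df, so values near df are unremarkable. For example, for a test with only a few degrees of freedom, a $\chi^2$ value of 2 indicates very small differences between observed and expected counts, while a value of 20 indicates very large differences and strong evidence against $H_0$.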
P-value interpretation
The p-value is the probability of observing a test statistic as large as or larger than your calculated value, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true; that common misinterpretation costs marks on the exam.
- Correct interpretation for the die example: "The p-value of about 0.70 means that if the die is fair, there is about a 70% chance of observing a test statistic of 3 or larger purely by random chance. Since this is greater than our significance level of 0.05, we do not have sufficient evidence to conclude the die is unfair."
- Correct interpretation for the ice cream example: "The p-value of 0.0148 means that if gender and ice cream preference are independent, there is only a 1.48% chance of observing an association as strong as or stronger than the one in our sample purely by chance. Since this is less than our significance level of 0.05, we have sufficient evidence to conclude there is an association between gender and ice cream preference."
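The "assuming $H_0$ is true" part of the p-value can be seen directly by simulation. The sketch below rolls a fair die 120 times, computes $\chi^2$, repeats this many times, and reports the fraction of simulated statistics at least as large as the observed 3. This Monte Carlo illustration is our own addition, not the method AP expects; the function name, number of trials, and seed are arbitrary choices. The estimate should land close to 0.70, but not exactly, because of simulation noise and because the smooth $\chi^2$ curve only approximates the distribution of statistics built from whole-number counts.

```python
import random


def simulate_die_p_value(observed_chi2=3.0, rolls=120, trials=10_000, seed=1):
    """Estimate P(chi-squared >= observed) by simulating a FAIR die (H0 true)."""
    rng = random.Random(seed)          # fixed seed so reruns give the same estimate
    expected = rolls / 6               # 20 per face under H0
    at_least_as_large = 0
    for _ in range(trials):
        counts = [0] * 6
        for face in rng.choices(range(6), k=rolls):
            counts[face] += 1
        # Sum the squared gaps first, then divide once, to avoid rounding ties at 3.0
        chi2 = sum((c - expected) ** 2 for c in counts) / expected
        if chi2 >= observed_chi2:
            at_least_as_large += 1
    return at_least_as_large / trials


print(simulate_die_p_value())
```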
Exam tip: Always tie your interpretation back to the specific context of the problem. Generic interpretations without reference to the variables being studied will not earn full marks on free response.
6. Common Pitfalls (and how to avoid them)
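- Wrong move: Using proportions instead of counts to calculate expected values, e.g., using $p_i = 0.3$ instead of $E_i = 100 \times 0.3 = 30$ for a sample of 100 with hypothesized proportion 0.3. Why it happens: Students confuse sample size and proportion when setting up calculations. Correct move: Always confirm that expected counts are on the count scale (they should add up to the sample size, and they do not need to be whole numbers), calculated as $E_i = n p_i$ for goodness-of-fit or $E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$ for two-way tests.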
- Wrong move: Using the wrong degrees of freedom, e.g., using $df = 6 - 1 = 5$ (one less than the number of cells) for a 2x3 contingency table instead of $df = (2-1)(3-1) = 2$. Why it happens: Students mix up goodness-of-fit df ($k - 1$) and two-way test df ($(r-1)(c-1)$). Correct move: Explicitly note which test you are running before calculating df, and label your df calculation to show your work.
- Wrong move: Stating a causal conclusion from a test of independence, e.g., claiming gender causes ice cream preference because the variables are associated. Why it happens: Students confuse association and causation, a recurring AP Stats exam theme. Correct move: Always note that a significant chi-squared result shows association, not causation, unless the data come from a randomized experiment.
- Wrong move: Writing a one-sided alternative hypothesis, e.g., $H_a$: $p_1 > 1/6$ for the die goodness-of-fit test. Why it happens: Students carry over habits from z-tests or t-tests for means/proportions. Correct move: The alternative in a chi-squared test is always non-directional, because squaring each deviation means departures from $H_0$ in either direction increase $\chi^2$; state it as "does not match", "not independent", or "not homogeneous". (The p-value is still the right-tail area of the $\chi^2$ distribution.)
- Wrong move: Skipping condition checks on free response questions. Why it happens: Students assume conditions are met and skip writing them out. Correct move: Explicitly list all three conditions and verify each one with reference to the problem context; a missing condition check typically costs credit on that question.
7. Practice Questions (AP Statistics Style)
Question 1 (Goodness-of-fit)
A coffee shop claims 40% of customers order lattes, 30% order drip coffee, 20% order cold brew, and 10% order tea. In a random sample of 200 customers, 75 ordered lattes, 65 ordered drip, 35 ordered cold brew, and 25 ordered tea. Perform a chi-squared goodness-of-fit test at $\alpha = 0.05$ to test whether the shop's claim is accurate.
Solution
- Hypotheses: $H_0$: The distribution of drink orders matches the shop's claimed distribution; $H_a$: The distribution does not match the claimed distribution.
- Check conditions: Random sample is given; it is reasonable to assume the shop has more than 2000 customers (10 × 200), so the 10% condition is met; expected counts: latte = $200 \times 0.40 = 80$, drip = 60, cold brew = 40, tea = 20, all ≥5, so the Large Counts condition is met.
- Calculate test statistic: $\chi^2 = \frac{(75-80)^2}{80} + \frac{(65-60)^2}{60} + \frac{(35-40)^2}{40} + \frac{(25-20)^2}{20} \approx 2.60$, with $df = 4 - 1 = 3$.
- p-value ≈ 0.457. Since 0.457 > 0.05, we fail to reject $H_0$: there is insufficient evidence to dispute the coffee shop's claim.
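A quick arithmetic check of Question 1 in plain Python. For $df = 3$ the right-tail area has the exact form $\operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\,e^{-x/2}$, which the sketch evaluates with the standard library's `math.erfc`; the variable names are our own.

```python
import math

observed = [75, 65, 35, 25]            # latte, drip, cold brew, tea
claimed = [0.40, 0.30, 0.20, 0.10]     # shop's claimed proportions (H0)
n = sum(observed)                      # 200 customers
expected = [n * p for p in claimed]    # 80, 60, 40, 20

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                 # 4 categories -> df = 3

# Exact right-tail area for df = 3
p_value = (math.erfc(math.sqrt(chi2 / 2))
           + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2))

print(round(chi2, 3), df, round(p_value, 3))
```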
Question 2 (Test of Independence)
A researcher surveys a random sample of 150 high school students, asking whether they play a school sport and whether they have a part-time job. Observed counts: 40 play a sport and have a job, 35 play a sport and have no job, 30 don't play a sport and have a job, 45 don't play a sport and have no job. Test whether playing a sport and having a part-time job are independent at $\alpha = 0.01$.
Solution
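- Hypotheses: $H_0$: Playing a sport and having a part-time job are independent; $H_a$: The two variables are associated.
- Check conditions: Random sample is given; it is reasonable to assume the population contains more than 1500 high school students (10 × 150), so the 10% condition is met; expected counts: sport/job = $\frac{75 \times 70}{150} = 35$, sport/no job = 40, no sport/job = 35, no sport/no job = 40, all ≥5, so the Large Counts condition is met.
- Calculate test statistic: $\chi^2 = \frac{(40-35)^2}{35} + \frac{(35-40)^2}{40} + \frac{(30-35)^2}{35} + \frac{(45-40)^2}{40} \approx 2.68$, with $df = (2-1)(2-1) = 1$.
- p-value ≈ 0.102. Since 0.102 > 0.01, we fail to reject $H_0$: there is not convincing evidence of an association between playing a sport and having a part-time job.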
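The same kind of check for Question 2. With $df = 1$ the right-tail area is exactly $\operatorname{erfc}(\sqrt{x/2})$, which equals twice the upper normal tail at $\sqrt{x}$. The sketch computes the plain, uncorrected statistic used in AP Statistics. Some software applies a continuity correction to 2×2 tables by default (for example, SciPy's `scipy.stats.chi2_contingency` has `correction=True` as its default), which would report a smaller statistic and a larger p-value.

```python
import math

# Rows: plays a sport / no sport; columns: has a job / no job
observed = [[40, 35],
            [30, 45]]
row_totals = [sum(r) for r in observed]          # [75, 75]
col_totals = [sum(c) for c in zip(*observed)]    # [70, 80]
n = sum(row_totals)                              # 150

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n    # expected count for this cell
        chi2 += (o - e) ** 2 / e

df = (2 - 1) * (2 - 1)                           # 2x2 table -> df = 1
p_value = math.erfc(math.sqrt(chi2 / 2))         # exact right tail when df == 1

print(round(chi2, 3), df, round(p_value, 3))
```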
Question 3 (Test of Homogeneity)
A school district tests whether the distribution of AP exam scores (1-5) is the same for in-person and online learners. Random samples of 100 in-person and 100 online learners are selected, with observed counts: In-person: 5:20, 4:30, 3:30, 2:15, 1:5; Online: 5:15, 4:25, 3:25, 2:20, 1:15. Test at $\alpha = 0.05$.
Solution
- Hypotheses: $H_0$: The AP score distribution is identical for in-person and online learners; $H_a$: The distributions are not identical.
- Check conditions: Independent random samples are given; it is reasonable to assume each population of learners exceeds 1000 (10 × 100), so the 10% condition is met; all expected counts are at least 10 (the smallest, for a score of 1, is $\frac{100 \times 20}{200} = 10$), so the Large Counts condition is met.
- Calculate test statistic: $\chi^2 \approx 7.34$, $df = (2-1)(5-1) = 4$, p-value ≈ 0.119. Since 0.119 > 0.05, we fail to reject $H_0$: there is not convincing evidence of a difference in AP score distributions between in-person and online learners.
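A check of Question 3. For $df = 4$ the right-tail area is exactly $e^{-x/2}(1 + x/2)$. Recomputing from the observed counts in the question, the sketch gives $\chi^2 \approx 7.34$ and a p-value of about 0.119; the variable names are our own.

```python
import math

# Columns are AP scores 5, 4, 3, 2, 1
in_person = [20, 30, 30, 15, 5]
online = [15, 25, 25, 20, 15]
observed = [in_person, online]

col_totals = [a + b for a, b in zip(in_person, online)]   # [35, 55, 55, 35, 20]
row_totals = [sum(in_person), sum(online)]               # [100, 100]
n = sum(row_totals)                                      # 200

chi2 = 0.0
for i in range(2):
    for j in range(5):
        e = row_totals[i] * col_totals[j] / n            # expected count for this cell
        chi2 += (observed[i][j] - e) ** 2 / e

df = (2 - 1) * (5 - 1)                                   # = 4
x = chi2 / 2
p_value = math.exp(-x) * (1 + x)                         # exact right tail when df == 4

print(round(chi2, 3), df, round(p_value, 3))
```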
8. Quick Reference Cheatsheet
| Test Type | Use Case | Core Hypotheses | Expected Count Formula | Degrees of Freedom |
|---|---|---|---|---|
| Goodness-of-fit | 1 sample, 1 categorical variable, test against hypothesized distribution | $H_0$: Distribution matches claim; $H_a$: Distribution does not match claim | $E_i = n p_i$ | $k - 1$ (k = number of categories) |
| Independence | 1 sample, 2 categorical variables, test for association | $H_0$: Variables are independent; $H_a$: Variables are associated | $E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$ | $(r-1)(c-1)$ (r = rows, c = columns) |
| Homogeneity | ≥2 independent samples, 1 categorical variable, test for equal distributions | $H_0$: Distributions are identical across groups; $H_a$: Distributions are not identical | Same as independence | Same as independence |
Universal Rules
- Test statistic for all tests: $\chi^2 = \sum \frac{(O - E)^2}{E}$
- Conditions for all tests: Random sample/assignment, independent observations (10% condition for sampling without replacement), all expected counts ≥5
- Larger $\chi^2$ = stronger evidence against $H_0$
- P-value = probability of a $\chi^2$ value as large as or larger than the observed one if $H_0$ is true
9. What's Next
Chi-squared tests come near the end of the AP Statistics course (Unit 8, just before inference for regression slopes in Unit 9), building on your prior knowledge of hypothesis testing for proportions and two-way tables to analyze categorical data. You will see this topic combined with exploratory data analysis (interpreting contingency tables) and probability concepts on both multiple choice and free response sections, and it is often paired with questions about the scope of inference (causation vs. association, generalizability of results). Mastery of chi-squared tests also prepares you for more advanced statistics courses in college, where you will extend these methods to more complex categorical data structures.