Chi-Square Test for Goodness of Fit — AP Statistics Study Guide
For: Candidates sitting the AP Statistics exam.
Covers: Null and alternative hypothesis setup, conditions for inference, chi-square test statistic calculation, degrees of freedom calculation, p-value interpretation, and contextual conclusion for chi-square goodness of fit tests.
You should already know: Basic hypothesis testing terminology (null, alternative, p-value, significance level). Calculating expected values for categorical probability distributions. Properties of the chi-square distribution shape.
A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official mark schemes for grading conventions.
1. What Is Chi-Square Test for Goodness of Fit?
The Chi-Square Test for Goodness of Fit (often abbreviated GOF) is a hypothesis test used to determine whether the observed frequency distribution of a single categorical variable matches the distribution claimed by the null hypothesis. This test is the first of three chi-square inference procedures covered in AP Statistics Unit 8, which the College Board Course and Exam Description weights at 2-5% of the overall AP exam score, so goodness of fit on its own accounts for only a small share of total exam points. It appears in both multiple-choice (MCQ) and free-response (FRQ) sections, most often as a 3-4 point standalone FRQ question or a concept check in MCQ.
Unlike other chi-square tests that analyze relationships between two categorical variables, goodness of fit focuses on one categorical variable with two or more categories. For example, you might use it to test if the distribution of candy color counts matches a company’s advertised proportions, or if genetic ratio predictions are supported by experimental data. Because any deviation (positive or negative from the expected count) increases the test statistic, all goodness of fit tests are inherently right-tailed. Synonyms for this test include chi-square goodness-of-fit test and GOF test.
2. Hypothesis Setup and Conditions for Inference
The first step of any chi-square GOF test is defining parameters, stating hypotheses, and verifying conditions for inference.
For a categorical variable with $k$ categories, let $p_i$ be the true population proportion of observations in category $i$, and $p_{i0}$ be the hypothesized proportion for category $i$ given by the null hypothesis. The null hypothesis is always that the true distribution matches the hypothesized distribution exactly: $H_0: p_1 = p_{10}, p_2 = p_{20}, \ldots, p_k = p_{k0}$. The alternative hypothesis is always that at least one of the true proportions differs from the hypothesized value: $H_a$: at least one $p_i \neq p_{i0}$. It is incorrect to state $H_a$ as all proportions differing from the null: $H_0$ requires every proportion to be correct, so a single incorrect proportion is enough to make $H_0$ false.
Next, three conditions must be verified for inference:
- Random: The data comes from a random sample or randomized experiment.
- Independence: Individual observations are independent; when sampling without replacement, the sample size is less than 10% of the total population (the 10% condition).
- Expected Count: All expected counts are at least 5. The expected count for category $i$ is calculated as $E_i = n p_{i0}$, where $n$ is the total sample size.
Worked Example
A snack company claims that 20% of its snack packs contain extra chips, 30% contain extra cookies, and 50% contain extra candy. A consumer group takes a random sample of 80 snack packs to test the company’s claim. State the appropriate hypotheses and check all conditions for inference.
- Define parameters: Let $p_1$ = true proportion of packs with extra chips, $p_2$ = true proportion with extra cookies, $p_3$ = true proportion with extra candy.
- State hypotheses: $H_0: p_1 = 0.20, p_2 = 0.30, p_3 = 0.50$. $H_a$: At least one proportion differs from the claimed values.
- Calculate expected counts: $n = 80$, so $E_1 = 80(0.20) = 16$, $E_2 = 80(0.30) = 24$, $E_3 = 80(0.50) = 40$.
- Check conditions: Random (stated in problem, so condition met). 10% condition: The population of all snack packs is far larger than $10 \times 80 = 800$, so independence holds. Expected counts: 16, 24, 40, all ≥ 5, so this condition is met.
Exam tip: On the AP exam, you must explicitly name and check all three conditions, and reference numerical values for expected counts (do not just say “all conditions are met”) to earn full credit on FRQ.
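The expected count arithmetic in the worked example can be sketched in a few lines of Python. This is a minimal illustration, not an AP-required method: the function name `expected_counts` and the dictionary layout are hypothetical choices made for this sketch, and the Random condition cannot be checked by code because it depends on how the data were collected.

```python
def expected_counts(n, hypothesized):
    """Return expected counts E_i = n * p_i0 for each category."""
    total = sum(hypothesized.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    return {category: n * p for category, p in hypothesized.items()}


# Snack pack example: company's claimed proportions and sample size.
claimed = {"chips": 0.20, "cookies": 0.30, "candy": 0.50}
n = 80

expected = expected_counts(n, claimed)

# Expected Count condition: every expected count must be at least 5.
all_at_least_5 = all(e >= 5 for e in expected.values())

# 10% condition: the population must contain at least 10 * n individuals.
min_population = 10 * n

print(expected)
print("All expected counts >= 5:", all_at_least_5)
print("Population must be at least:", min_population)
```

The condition check uses expected counts, never observed counts, which mirrors the rule stated above.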
3. Test Statistic, Degrees of Freedom, and p-Value
Once hypotheses are set and conditions are verified, the next step is calculating the test statistic and finding the p-value. The chi-square test statistic formula measures the total squared deviation between observed and expected counts, adjusted for the size of the expected count:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
Where $O_i$ is the observed count for category $i$, $E_i$ is the expected count, and $k$ is the number of categories. Squaring the deviation ensures every term is non-negative (we care about any deviation from expected, regardless of direction), and dividing by $E_i$ adjusts for the fact that larger expected counts naturally have more sampling variation. A larger $\chi^2$ value means more deviation from the null distribution, so more evidence against $H_0$.
Degrees of freedom ($df$) for a chi-square GOF test is always $df = k - 1$, where $k$ is the number of categories. We lose 1 degree of freedom because the total sample size is fixed: once you know the counts for $k - 1$ categories, the $k$-th count is determined by the total $n$. Because only large $\chi^2$ values give evidence against $H_0$, the p-value is the probability of getting a $\chi^2$ at least as large as your calculated value, from a chi-square distribution with $df = k - 1$. This means all GOF p-values are for a right-tailed test.
Worked Example
Continuing the snack pack example from the previous section, the consumer group gets the following observed counts: 16 packs with extra chips, 14 packs with extra cookies, 50 packs with extra candy. Calculate the chi-square test statistic, degrees of freedom, and p-value.
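- List $O$ and $E$ for each category: Chips: $O = 16, E = 16$; Cookies: $O = 14, E = 24$; Candy: $O = 50, E = 40$.
- Calculate each term of the sum: Chips: $\frac{(16 - 16)^2}{16} = 0$; Cookies: $\frac{(14 - 24)^2}{24} \approx 4.167$; Candy: $\frac{(50 - 40)^2}{40} = 2.5$.
- Sum to get $\chi^2 \approx 0 + 4.167 + 2.5 = 6.667$.
- Degrees of freedom: $k = 3$ categories, so $df = 3 - 1 = 2$.
- p-value: $P(\chi^2 \geq 6.667)$ with $df = 2$ is approximately 0.036.
Exam tip: On FRQ, you must show at least one calculation of $\frac{(O - E)^2}{E}$ to earn the test statistic calculation point; writing only the final $\chi^2$ value will not earn full credit.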
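To check the snack pack calculation without a graphing calculator, the steps above can be sketched in Python using only the standard library. The helper names `chi_square_statistic` and `chi_square_sf` are hypothetical, introduced for this sketch. The right-tail probability uses the known closed-form series for the chi-square distribution with positive integer degrees of freedom; on the exam you would normally use a calculator's chi-square cdf or GOF test function instead.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories (counts, not proportions)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_sf(x, df):
    """Right-tail probability P(chi-square >= x) for positive integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum times exp(-x/2).
        terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: complementary error function plus a finite correction sum.
    tail = math.erfc(math.sqrt(half))
    series = sum(
        half ** (j - 0.5) / math.gamma(j + 0.5)
        for j in range(1, (df - 1) // 2 + 1)
    )
    return tail + math.exp(-half) * series


observed = [16, 14, 50]   # chips, cookies, candy
expected = [16, 24, 40]   # from n = 80 and the claimed proportions

chi_sq = chi_square_statistic(observed, expected)
df = len(observed) - 1    # number of categories minus 1
p_value = chi_square_sf(chi_sq, df)

print(round(chi_sq, 3), df, round(p_value, 3))
```

Only the right tail is computed, which matches the point that every GOF test is right-tailed.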
4. Conclusion in Context
The final step of any hypothesis test, including chi-square GOF, is drawing a conclusion that is linked to your p-value, significance level, and stated in the context of the problem. The same logic used for all other hypothesis tests applies here:
- If the p-value is less than your significance level $\alpha$ (almost always $\alpha = 0.05$ unless stated otherwise), you reject the null hypothesis. There is convincing evidence that the true distribution differs from the hypothesized distribution.
- If the p-value is greater than or equal to $\alpha$, you fail to reject the null hypothesis. There is not convincing evidence that the true distribution differs from the hypothesized distribution.
Critical rules for AP credit: 1) Never say you “accept $H_0$”: we cannot prove the null is true, only fail to find evidence against it. 2) Always state your conclusion in the context of the problem, not just in terms of $H_0$. 3) If no significance level is given, explicitly state you are using the standard $\alpha = 0.05$.
Worked Example
Continuing the snack pack example, we got a p-value of 0.036. Use $\alpha = 0.05$ to write the appropriate conclusion.
- Compare p-value to $\alpha$: $0.036 < 0.05$.
- Conclusion for the null hypothesis: Reject the null hypothesis.
- Contextual conclusion: There is convincing evidence at the $\alpha = 0.05$ level that the distribution of extra snack types differs from the company’s claimed distribution.
If our p-value had been 0.06 instead, the conclusion would be: $0.06 > 0.05$, so we fail to reject the null hypothesis. There is not convincing evidence at the $\alpha = 0.05$ level that the distribution of extra snack types differs from the company’s claimed distribution.
Exam tip: If the problem gives a specific significance level, you must reference it explicitly in your conclusion to earn full credit; omitting this step will cost you a point.
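The decision rule in this section is simple enough to write as a short Python sketch. The function name `gof_conclusion` and its `context` parameter are hypothetical, introduced only to illustrate the logic; the wording it produces is a template, not official scoring language.

```python
def gof_conclusion(p_value, alpha=0.05, context="the distribution"):
    """Return a decision and a contextual sentence for a GOF test."""
    if p_value < alpha:
        decision = "reject H0"
        evidence = "There is convincing evidence"
    else:
        # Includes the boundary case p_value == alpha.
        decision = "fail to reject H0"
        evidence = "There is not convincing evidence"
    sentence = (
        f"{evidence} at the alpha = {alpha} level that {context} "
        "differs from the claimed distribution."
    )
    return decision, sentence


snack = "the distribution of extra snack types"
print(gof_conclusion(0.036, 0.05, snack))
print(gof_conclusion(0.06, 0.05, snack))
```

The function never returns "accept H0", and a p-value exactly equal to the significance level falls in the "fail to reject" branch, as in the rule above.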
Common Pitfalls (and how to avoid them)
- Wrong move: Stating the alternative hypothesis as “all proportions differ from the claimed values”. Why: Students copy the null structure and assume Ha reverses all claims, but the null only requires all proportions to be correct, so Ha only needs at least one incorrect. Correct move: Always write Ha as “at least one proportion differs from the hypothesized value”.
- Wrong move: Using proportions instead of counts for O and E in the chi-square formula. Why: Students confuse the proportion of observations with the count, so they plug 0.28 and 0.20 into the formula instead of 28 and 20. Correct move: Always confirm O and E are counts (whole numbers summing to n) before plugging into the chi-square formula.
- Wrong move: Calculating degrees of freedom as $n - 1$ instead of $k - 1$, where n is sample size and k is number of categories. Why: Students confuse degrees of freedom for t-tests with chi-square GOF df. Correct move: Always count the number of categories k, then subtract 1 to get df for GOF.
- Wrong move: Calculating a two-tailed or left-tailed p-value. Why: Students think deviations can go either way so the test should be two-tailed. Correct move: Remember that any deviation (positive or negative from expected) increases the χ² test statistic, so all GOF tests use a right-tailed p-value.
- Wrong move: Claiming “we accept H0” when p-value > α. Why: Hypothesis testing can never prove the null is true, only fail to find evidence against it. Correct move: Always use the phrase “fail to reject the null hypothesis” when p-value is greater than α.
- Wrong move: Saying observed counts need to be ≥ 5, or forgetting to check the 10% condition for independence. Why: Students mix up conditions and misremember which requirement applies to which. Correct move: Explicitly check three conditions every time: random sample/assignment, 10% for independence, all expected counts ≥ 5.
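To see why the counts-versus-proportions pitfall matters, the following Python sketch computes the statistic both ways for the snack pack data (observed counts 16, 14, 50 from a sample of 80). The helper name `chi_square_statistic` is hypothetical. The point it illustrates is that plugging in proportions shrinks the statistic by a factor of the sample size, so the test would almost never reject.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


n = 80
observed_counts = [16, 14, 50]          # chips, cookies, candy
hypothesized = [0.20, 0.30, 0.50]       # company's claimed proportions

# Correct: compare observed counts with expected counts.
expected_counts = [n * p for p in hypothesized]
correct = chi_square_statistic(observed_counts, expected_counts)

# Wrong: compare observed proportions with hypothesized proportions.
observed_props = [o / n for o in observed_counts]
wrong = chi_square_statistic(observed_props, hypothesized)

print("Using counts:     ", round(correct, 3))
print("Using proportions:", round(wrong, 4))
print("Ratio correct / wrong:", round(correct / wrong, 1))
```

The ratio of the two values equals the sample size, which is why the proportion-based number cannot be compared against a chi-square distribution.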
Practice Questions (AP Statistics Style)
Question 1 (Multiple Choice)
A casino claims that its roulette wheel is fair, meaning the probability of landing on red, black, or green is equal to the manufacturer’s specifications. A regulator wants to test this claim with a chi-square goodness of fit test. If roulette has 3 outcome categories (red, black, green), what is the correct degrees of freedom for this test? A) 1 B) 2 C) 3 D) Cannot be determined without knowing the sample size
Worked Solution: For a chi-square goodness of fit test, degrees of freedom equals the number of categories minus 1. This problem has 3 categories, so df = 3 - 1 = 2. Option A is incorrect; it would be correct for 2 categories. Option C is incorrect because it does not subtract 1 for the fixed total sample size. Option D is incorrect because df for GOF depends only on the number of categories, not on sample size. The correct answer is B.
Question 2 (Free Response)
A university registrar claims that the distribution of class standings for undergraduate students is: 35% first-year, 28% second-year, 22% third-year, 15% fourth-year. A student government advisor takes a random sample of 200 undergraduate students to test if the distribution of enrolled students differs from the registrar’s claim. (a) State the null and alternative hypotheses for this test, and calculate all expected counts. (b) The advisor calculates a chi-square test statistic of 9.12. Find the degrees of freedom and p-value for this test. (c) Using α = 0.05, state the conclusion of the test in context.
Worked Solution: (a) Let $p_1$ = true proportion of first-years, $p_2$ = true proportion of second-years, $p_3$ = true proportion of third-years, $p_4$ = true proportion of fourth-years. $H_0: p_1 = 0.35, p_2 = 0.28, p_3 = 0.22, p_4 = 0.15$. $H_a$: At least one proportion differs from the claimed values. Expected counts ($E_i = n p_{i0}$ with $n = 200$): $E_1 = 200(0.35) = 70$, $E_2 = 200(0.28) = 56$, $E_3 = 200(0.22) = 44$, $E_4 = 200(0.15) = 30$. (b) Number of categories $k = 4$, so $df = 4 - 1 = 3$. The p-value is $P(\chi^2 \geq 9.12)$ with $df = 3$, which is approximately 0.028. (c) Since $0.028 < 0.05$, we reject the null hypothesis. There is convincing evidence at the α = 0.05 level that the distribution of class standings for enrolled undergraduate students differs from the registrar’s claimed distribution.
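The numbers in this solution can be double-checked with a short Python sketch. The function name `chi_square_sf_df3` is hypothetical, and the formula inside it is the closed-form right-tail probability that applies only when df = 3; it is shown here as a verification aid, not as something you would write on the exam.

```python
import math

n = 200
claimed = [0.35, 0.28, 0.22, 0.15]   # first- through fourth-year

# Part (a): expected counts E_i = n * p_i0.
expected = [n * p for p in claimed]

# Part (b): degrees of freedom and right-tailed p-value.
df = len(claimed) - 1
chi_sq = 9.12                        # test statistic given in the question


def chi_square_sf_df3(x):
    """P(chi-square >= x) for df = 3 only (closed form)."""
    return (
        math.erfc(math.sqrt(x / 2))
        + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    )


p_value = chi_square_sf_df3(chi_sq)

print([round(e, 1) for e in expected], df, round(p_value, 3))
```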
Question 3 (Application / Real-World Style)
A plant geneticist tests a theory that crossing two heterozygous tall pea plants will produce offspring in a 3 tall : 1 short height ratio. The geneticist grows 160 offspring from the cross, and observes 126 tall plants and 34 short plants. Conduct a full chi-square test for goodness of fit at α = 0.05 to evaluate if the observed data matches the genetic theory.
Worked Solution:
- Hypotheses: Let $p_t$ = true proportion of tall offspring, $p_s$ = true proportion of short offspring. $H_0: p_t = 0.75, p_s = 0.25$. $H_a$: At least one proportion differs from the theory’s prediction.
- Conditions: Random sampling of offspring is assumed, population of potential offspring is far more than $10 \times 160 = 1600$, expected counts are $E_t = 160 \times 0.75 = 120$ and $E_s = 160 \times 0.25 = 40$, both ≥ 5, so all conditions are met.
- Test statistic: $\chi^2 = \frac{(126 - 120)^2}{120} + \frac{(34 - 40)^2}{40} = 0.3 + 0.9 = 1.2$.
- df = 2 - 1 = 1, p-value = $P(\chi^2 \geq 1.2) \approx 0.273$.
- Conclusion: Since 0.273 > 0.05, we fail to reject the null hypothesis. In context: There is no convincing evidence at the α = 0.05 level that the observed distribution of plant heights deviates from the 3:1 ratio predicted by genetic theory.
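The full test for the pea plant data can be sketched end to end in Python. The variable names are illustrative choices for this sketch. The p-value line relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its right-tail probability can be written with the complementary error function; that shortcut is valid only for df = 1, and the code guards against other values.

```python
import math

observed = {"tall": 126, "short": 34}
theory = {"tall": 0.75, "short": 0.25}   # 3 tall : 1 short
alpha = 0.05

n = sum(observed.values())
expected = {c: n * p for c, p in theory.items()}

# Expected Count condition.
if any(e < 5 for e in expected.values()):
    raise ValueError("expected count below 5; GOF test not appropriate")

chi_sq = sum(
    (observed[c] - expected[c]) ** 2 / expected[c] for c in observed
)
df = len(observed) - 1
if df != 1:
    raise ValueError("the erfc shortcut below is valid only for df = 1")

# Right-tail probability for df = 1: P(chi-square >= x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_sq / 2))

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(n, round(chi_sq, 3), df, round(p_value, 3), decision)
```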
Quick Reference Cheatsheet
| Category | Formula / Rule | Notes |
|---|---|---|
| GOF Hypotheses | $H_0$: All $p_i = p_{i0}$; $H_a$: At least one $p_i \neq p_{i0}$ | Sum of all hypothesized $p_{i0}$ = 1; never state $H_a$ as all proportions differ |
| Expected Count | $E_i = n p_{i0}$ | $n$ = total sample size; always use counts, not proportions |
| Chi-Square Test Statistic | $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$ | Larger $\chi^2$ = more evidence against $H_0$; all terms are non-negative |
| GOF Degrees of Freedom | $df = k - 1$ | $k$ = number of categories; do NOT use $n - 1$ (that is for t-tests) |
| Inference Conditions | 1. Random sample/experiment; 2. $n < 10\%$ of population; 3. All $E_i \geq 5$ | All three must be checked explicitly on AP FRQ |
| p-value for GOF | $P(\chi^2 \geq \chi^2_{\text{calculated}})$ with $df = k - 1$ | All GOF tests are right-tailed; never use two-tailed p-values |
| Conclusion Rule | If $p < \alpha$: Reject $H_0$, convincing evidence of deviation. If $p \geq \alpha$: Fail to reject $H_0$, no convincing evidence of deviation. | Never write "accept $H_0$"; always state conclusion in problem context |
What's Next
The chi-square test for goodness of fit is the foundation for all other chi-square inference procedures in AP Statistics Unit 8. It introduces the core logic of chi-square testing (comparing observed counts to expected counts, calculating the same test statistic, verifying the same conditions) that carries over to the other chi-square tests. Without mastering GOF, you will struggle to distinguish between the three chi-square procedures and correctly calculate degrees of freedom for the other tests. Goodness of fit also reinforces core hypothesis testing concepts you learned earlier for one and two proportion tests, extending that logic to a single categorical variable with multiple categories. This connects to the big idea of inference that unites the entire AP Statistics course: we use sample data to draw conclusions about population parameters.
Next topics to study: Chi-Square Test for Independence; Chi-Square Test for Homogeneity; Inference for a Single Proportion.