AP · Chi-Square Test for Independence · 14 min read · Updated 2026-05-10

Chi-Square Test for Independence — AP Statistics Study Guide

For: Students preparing for the AP Statistics exam.

Covers: Hypotheses formulation, contingency table structure, expected count calculation, test statistic calculation, degrees of freedom, inference conditions, p-value interpretation, and contextual conclusion writing for the chi-square test for independence.

You should already know: Basics of hypothesis testing for population inference, how to construct contingency tables for two categorical variables, how to calculate p-values from chi-square distributions.

A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official mark schemes for grading conventions.

1. What Is the Chi-Square Test for Independence?

The chi-square test for independence is a hypothesis testing procedure that tests whether two categorical variables measured on the same sample of individuals are associated (dependent) or independent in the larger population. It is one of three chi-square procedures in Unit 8 (Inference for Categorical Data: Chi-Square), which the College Board's Course and Exam Description weights at 2-5% of the AP Statistics exam score. The topic appears in both the multiple-choice (MCQ) and free-response (FRQ) sections of the exam and can be the focus of an entire FRQ.

You may also see this test called the chi-square test of association, because rejecting the null hypothesis of independence means you have found evidence of an association between the two variables. Standard notation for this test is: $O$ for the observed count of individuals in a contingency table cell, $E$ for the expected count under the null hypothesis, $χ^{2}$ for the chi-square test statistic, and $df$ for degrees of freedom. The chi-square goodness-of-fit test checks a hypothesized distribution for a single categorical variable; this test instead asks whether two categorical variables are associated, which makes it a common procedure in the social and life sciences.

2. Hypotheses, Conditions, and Contingency Table Setup

Before calculating any test statistics, you must correctly formulate your hypotheses, set up your contingency table of observed counts, and verify all conditions for inference. On free-response questions, hypotheses and conditions are scored alongside the calculation and the conclusion, so it is critical to get this step right.

For any chi-square test for independence:

Null Hypothesis ( $H_{0}$ ): The two categorical variables are independent (no association) in the population of interest. Independent means that the probability of falling into a category for one variable does not depend on the category of the other variable.
Alternative Hypothesis ( $H_{a}$ ): The two categorical variables are dependent (there is an association between them) in the population of interest.

Three conditions must be satisfied for inference to be valid:

Random: The data comes from a random sample from the population of interest, or a randomized experiment.
Independence: Individual observations are independent of each other. When sampling without replacement, this requires the 10% condition: the sample size is less than 10% of the total population size.
Large Counts: All expected cell counts are at least 5. Check the expected counts, not the observed counts. (Some textbooks use a looser rule, such as allowing up to 20% of expected counts to fall below 5 as long as none is below 1, but on the AP exam you should require all $E \geq 5$.)
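If you want to check the two numerical conditions outside a calculator, here is a minimal Python sketch. The function names are hypothetical helpers (not from any library), the expected counts are assumed to be already computed, and the population size is an assumed input; on the exam you usually just argue that the population is large enough.

```python
def large_counts_ok(expected: list[list[float]], minimum: float = 5.0) -> bool:
    """Large Counts: every EXPECTED cell count (not observed) is at least `minimum`."""
    return all(e >= minimum for row in expected for e in row)


def ten_percent_ok(sample_size: int, population_size: int) -> bool:
    """10% condition: the sample is less than 10% of the population."""
    return 10 * sample_size < population_size


# Expected counts from Practice Question 2: every cell is at least 5
print(large_counts_ok([[95, 95], [105, 105]]))       # True
# A made-up table with one expected count below 5
print(large_counts_ok([[12.5, 4.2], [30.1, 8.0]]))   # False
# Hypothetical population of 30,000 students for a sample of 180
print(ten_percent_ok(180, 30_000))                   # True
```

The Random condition is deliberately left out: it depends on how the data were collected, which only the problem context can tell you.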

Worked Example

A student government researcher surveys 180 randomly selected students at a large state university to test whether preference for in-person vs. online classes is independent of whether the student lives on campus or off campus. The smallest expected count for this study is calculated to be 7.1. State the correct hypotheses and check all conditions for inference.

Solution:

Hypotheses: $H_{0}$ : Class format preference and living location are independent (no association) in the population of all students at this university. $H_{a}$ : Class format preference and living location are associated in the population of all students at this university.
Random: The problem explicitly states 180 randomly selected students, so this condition is satisfied.
Independence: It is reasonable to assume a large state university has more than $10 \times 180 = 1800$ students, so the 10% condition is satisfied and observations can be treated as independent.
Large Counts: The smallest expected count is 7.1, which is ≥ 5, so this condition is satisfied.

Exam tip: Always state hypotheses in the context of your specific problem and reference the population of interest. AP scoring guidelines typically withhold credit for generic hypotheses that do not mention the population or the specific variables being tested.

3. Expected Counts and Chi-Square Test Statistic Calculation

Once your hypotheses and conditions are confirmed, the next step is to calculate expected cell counts and the chi-square test statistic. An expected count is the number of observations you would expect to see in a given contingency table cell if the null hypothesis of independence were true.

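If the null hypothesis is true, each row category should make up the same proportion of every column as it does of the whole table. For example, a row that contains two-thirds of all observations should also contain about two-thirds of each column's observations. This gives the expected count formula: $E = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}}$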
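A short derivation sketch shows where the formula comes from. Write $R$ for the row total, $C$ for the column total, and $n$ for the grand total (shorthand introduced here). Independence means $P(\text{row and column}) = P(\text{row}) \times P(\text{column})$, and each probability is estimated by its observed proportion in the table:

$$
\begin{aligned}
E &= n \times P(\text{row}) \times P(\text{column}) \\
  &\approx n \times \frac{R}{n} \times \frac{C}{n} \\
  &= \frac{R \times C}{n}
\end{aligned}
$$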

The chi-square test statistic measures how far your observed counts are from the expected counts. Any deviation from expected (whether observed is higher or lower than expected) increases the test statistic, because we square the difference. Dividing by $E$ puts each squared deviation on a relative scale, so a gap of 5 counts matters more in a cell where 10 are expected than in one where 100 are expected. The formula is: $χ^{2} = \sum \frac{(O - E)^{2}}{E}$ where the sum is over all cells in the contingency table.

Degrees of freedom for the test are calculated as: $df = (r - 1)(c - 1)$ where $r$ is the number of rows in the table and $c$ is the number of columns. Once you fill in $(r - 1)(c - 1)$ cells, the remaining cells are fixed by the row and column totals, so that is how many independent pieces of information the table contains.
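The expected-count, test-statistic, and degrees-of-freedom formulas can be sketched together in Python as below. The function name `chi_square_independence` is hypothetical, the table is assumed to be a list of rows of observed counts, and this is meant for checking your understanding rather than replacing your calculator on the exam.

```python
def chi_square_independence(
    observed: list[list[float]],
) -> tuple[list[list[float]], float, int]:
    """Return (expected counts, chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected count for each cell: (row total) * (column total) / (grand total)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum (O - E)^2 / E over every cell; squaring makes every deviation add to the total
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # df = (r - 1)(c - 1)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return expected, statistic, df


# Observed counts from Practice Question 2 (rows: rents, owns; columns: voted, did not vote)
expected, stat, df = chi_square_independence([[78, 112], [122, 88]])
print(expected)            # [[95.0, 95.0], [105.0, 105.0]]
print(round(stat, 2), df)  # 11.59 1
```

Computing the expected counts from the margins first, rather than cell by cell, mirrors the formula exactly and guarantees that the expected counts share the observed table's row and column totals.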

Worked Example

A 2×3 contingency table (2 class standing categories: undergraduate, graduate; 3 preference categories: in-person, hybrid, online) has the following row and column totals: Row totals: 120 (undergraduate), 60 (graduate); Column totals: 55 (in-person), 65 (hybrid), 60 (online); Grand total: 180. Calculate the expected count for the (undergraduate, hybrid) cell, then find the degrees of freedom for this test.

Solution:

Identify the row total for undergraduate (120), column total for hybrid (65), and grand total (180).
Calculate expected count: $E = \frac{( 120 ) ( 65 )}{180} = \frac{7800}{180} \approx 43.33$ .
Calculate degrees of freedom: $r = 2$ , $c = 3$ , so $df = (2 - 1) (3 - 1) = (1) (2) = 2$ .

Exam tip: After calculating all expected counts, add them up to confirm they match the original grand total of observed counts. This is a 10-second check that catches arithmetic errors from misadding row or column totals, which are a common source of lost points.
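The worked example and the exam tip can be reproduced in a short Python sketch that builds every expected count from the margins alone and then adds them back up. The helper name `expected_from_margins` is hypothetical; the margins are the ones given in the 2×3 example.

```python
def expected_from_margins(row_totals: list[float], col_totals: list[float]) -> list[list[float]]:
    """Expected counts under H0, computed only from the row and column totals."""
    grand_total = sum(row_totals)
    # Both sets of margins must add to the same grand total, or a total was misadded
    if abs(sum(col_totals) - grand_total) > 1e-9:
        raise ValueError("row and column totals do not sum to the same grand total")
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Margins from the 2x3 worked example
# (rows: undergraduate, graduate; columns: in-person, hybrid, online)
expected = expected_from_margins([120, 60], [55, 65, 60])
print(round(expected[0][1], 2))  # 43.33 (undergraduate, hybrid)

# Exam-tip check: the expected counts should add back up to the grand total
print(round(sum(sum(row) for row in expected), 6))  # 180.0
```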

4. P-Value Calculation and Contextual Conclusion

The final step of the test is calculating the p-value and writing a valid conclusion in context. Because any deviation from expected counts increases the $χ^{2}$ test statistic, chi-square tests for independence are always right-tailed. The p-value is the probability of observing a $χ^{2}$ statistic as large as or larger than the one you calculated, assuming the null hypothesis is true.

On the AP exam, you can calculate the p-value either by using a chi-square distribution table (which gives you a range for the p-value) or by using your calculator's χ²cdf function (which gives an exact numerical p-value). To make a conclusion, compare the p-value to your significance level $α$ (use 0.05 unless the problem states otherwise):

If $p < α$ : Reject $H_{0}$ , there is sufficient evidence of an association.
If $p \geq α$ : Fail to reject $H_{0}$ , there is not sufficient evidence of an association.

Worked Example

For the class preference and living location study, you calculate a chi-square test statistic of 4.23 with 1 degree of freedom, and use a significance level of $α = 0.05$ . Find the p-value and write a complete conclusion.

Solution:

Confirm the test is right-tailed: All chi-square tests for independence are right-tailed, so we calculate $P (χ^{2} \geq 4.23)$ with $df = 1$ .
Calculate p-value: Using a chi-square table, 4.23 falls between 3.841 ( $p = 0.05$ ) and 5.024 ( $p = 0.025$ ), so $0.025 < p < 0.05$ . Using a calculator, χ²cdf(4.23, 1e99, 1) ≈ 0.04.
Compare to $α$ : $0.04 < 0.05$ , so we reject the null hypothesis.
Conclusion: There is sufficient evidence at the $α = 0.05$ significance level to conclude that class format preference and living location are associated in the population of students at this university.
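As a supplementary check on the calculator result: when $df = 1$ only, the chi-square statistic has the same distribution as $Z^{2}$ for a standard normal $Z$, so $P(χ^{2} \geq x) = P(|Z| \geq \sqrt{x})$, which Python's standard library can evaluate with `math.erfc`. The sketch below uses a hypothetical helper name and handles df = 1 only; for other degrees of freedom, SciPy's `scipy.stats.chi2.sf(x, df)` computes the same right-tail probability.

```python
import math


def chi2_pvalue_df1(statistic: float) -> float:
    """Right-tail p-value P(chi^2 >= statistic) when df = 1.

    With one degree of freedom chi^2 = Z^2, so the right tail equals
    P(|Z| >= sqrt(statistic)) = erfc(sqrt(statistic / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2))


alpha = 0.05
p = chi2_pvalue_df1(4.23)
print(round(p, 4))                                        # 0.0397
print("Reject H0" if p < alpha else "Fail to reject H0")  # Reject H0
```

The value agrees with the table bracket $0.025 < p < 0.05$ found above.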

Exam tip: Never write "we accept the null hypothesis" on the AP exam. You cannot prove that two variables are independent, only that you do not have enough evidence to conclude they are associated. Always use "fail to reject the null hypothesis" to avoid losing points.

Common Pitfalls (and how to avoid them)

Wrong move: Stating hypotheses about the sample, not the population. For example: "H₀: Study location and class year are independent in our sample." Why: Students forget that inference generalizes results from the sample to the larger population; we already know the counts in the sample. Correct move: Always add "in the population of [your context]" when writing hypotheses.
Wrong move: Calculating degrees of freedom as $n - 1$ instead of $(r - 1)(c - 1)$. Why: Students confuse the chi-square independence formula with the one-sample t-test formula ($n - 1$) or the goodness-of-fit formula (number of categories minus 1). Correct move: Always write down the number of rows $r$ and columns $c$ first, then explicitly calculate $(r - 1)(c - 1)$ before finding the p-value.
Wrong move: Forgetting to check the 10% condition for independence and only checking large counts. Why: Students focus on the Large Counts condition that is specific to chi-square tests and miss the general 10% condition required for all inference based on sampling without replacement. Correct move: Use the fixed checklist: Random → 10% Independence → Large Counts, and check every box every time.
Wrong move: Using a two-tailed p-value for the test. Why: Students carry over two-sided alternatives from z- and t-tests, forgetting that squaring makes the chi-square statistic grow whether observed counts fall above or below expected counts. Correct move: Remember that any deviation from expected increases $χ^{2}$, so all chi-square tests for independence are right-tailed.
Wrong move: Interpreting a significant result as evidence that one variable causes the other. Why: Students confuse association and causation, and most chi-square tests for independence use observational data. Correct move: Never state causation in your conclusion for an observational study, only that an association exists between the two variables.

Practice Questions (AP Statistics Style)

Question 1 (Multiple Choice)

A grocery store manager wants to test whether customer preference for paper bags, plastic bags, or no bags is independent of customer age group (under 30, 30-50, over 50). He collects data from a random sample of 200 customers and calculates a chi-square test statistic of 9.1. Using α = 0.05, what is the correct conclusion?
A) Reject H₀, there is sufficient evidence that bag preference is independent of age
B) Reject H₀, there is sufficient evidence that bag preference is associated with age
C) Fail to reject H₀, there is not sufficient evidence that bag preference is associated with age
D) Fail to reject H₀, there is sufficient evidence that bag preference is independent of age

Worked Solution: First, find degrees of freedom: 3 bag preference categories, 3 age groups, so $df = (3 - 1) (3 - 1) = 4$ . The critical value for χ² with df=4 and α=0.05 is 9.488. Our test statistic is 9.1 < 9.488, so p-value > 0.05, meaning we fail to reject H₀. Option D is incorrect because we cannot conclude independence is true, only that we lack evidence of association. The correct answer is C.
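When df is even, the chi-square right tail has a closed form, $P(χ^{2} \geq x) = e^{-x/2} \sum_{k=0}^{df/2 - 1} \frac{(x/2)^{k}}{k!}$, so the p-value in Question 1 can be checked with the standard library alone. This is a supplementary sketch with a hypothetical helper name, and it only handles even df; the table comparison above is what the exam expects.

```python
import math


def chi2_pvalue_even_df(statistic: float, df: int) -> float:
    """Right-tail p-value P(chi^2 >= statistic) for a positive, even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive, even df")
    half = statistic / 2
    # e^{-x/2} * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


df = (3 - 1) * (3 - 1)  # 3 bag preferences x 3 age groups
p = chi2_pvalue_even_df(9.1, df)
print(round(p, 3))                                # 0.059, so p > 0.05
print(round(chi2_pvalue_even_df(9.488, df), 3))   # 0.05 at the table's critical value
```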

Question 2 (Free Response)

A researcher studies whether voting in a local election is associated with whether a registered voter rents or owns their home. They randomly sample 400 registered voters from a city, with the following contingency table:

	Voted	Did Not Vote	Row Total
Rents	78	112	190
Owns	122	88	210
Column Total	200	200	400

(a) State the appropriate hypotheses for this test and check all conditions for inference. (b) Calculate the chi-square test statistic and degrees of freedom. (c) The p-value for this test is < 0.001. Using α = 0.05, state your conclusion in context.

Worked Solution: (a) $H_{0}$: Voting status and housing type (rent vs own) are independent in the population of registered voters in this city. $H_{a}$: Voting status and housing type are associated in the population of registered voters in this city. Conditions: A random sample is stated, so the random condition is satisfied. It is reasonable to assume the city has more than $10 \times 400 = 4000$ registered voters, so the 10% condition is satisfied. The smallest expected count is $\frac{(190)(200)}{400} = 95 \geq 5$, so the large counts condition is satisfied.
(b) Expected counts: Rents/Voted = 95, Rents/Did Not Vote = 95, Owns/Voted = 105, Owns/Did Not Vote = 105. $χ^{2} = \frac{(78 - 95)^{2}}{95} + \frac{(112 - 95)^{2}}{95} + \frac{(122 - 105)^{2}}{105} + \frac{(88 - 105)^{2}}{105} \approx 3.04 + 3.04 + 2.75 + 2.75 \approx 11.59$. Degrees of freedom: $(2 - 1)(2 - 1) = 1$.
(c) Since p-value < 0.001 < 0.05, we reject the null hypothesis. There is sufficient evidence at the 0.05 significance level to conclude that voting participation is associated with housing type (rent vs own) among registered voters in this city.
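As an optional cross-check on the arithmetic in part (b), a 2×2 table with cells $a, b$ (first row) and $c, d$ (second row) has a well-known shortcut form, $χ^{2} = \frac{n(ad - bc)^{2}}{(a + b)(c + d)(a + c)(b + d)}$, which is algebraically equal to summing $\frac{(O - E)^{2}}{E}$ over the four cells (without any continuity correction). It is not required on the AP exam; the sketch below, with a hypothetical function name, only confirms the value found above.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    # Product of the two row totals and the two column totals
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Practice Question 2: rents (78 voted, 112 did not), owns (122 voted, 88 did not)
stat = chi_square_2x2(78, 112, 122, 88)
print(round(stat, 2))  # 11.59
```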

Question 3 (Application / Real-World Style)

A public health researcher surveys 320 randomly selected adults to test whether vaccination status against a seasonal flu (fully vaccinated, unvaccinated) is associated with infection status (infected, not infected) during one season. She calculates a chi-square test statistic of 7.8 with 1 degree of freedom. At α = 0.05, what conclusion should she draw, and what does this mean for public health messaging?

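Worked Solution: For df = 1 and α = 0.05, the critical chi-square value is 3.841. The calculated test statistic 7.8 is greater than 3.841, so p-value < 0.05, and we reject the null hypothesis of independence. In context, there is sufficient evidence that vaccination status and infection status are associated among adults in this population. The chi-square statistic alone does not show the direction of the association; to say whether vaccinated adults were less likely to be infected, compare the conditional proportions of infection in the two vaccination groups. Because this is an observational survey, the result also cannot establish that vaccination causes lower infection risk. Public health messaging can cite the association, but causal claims about the vaccine's effect need support from randomized experiments.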

Quick Reference Cheatsheet

Category	Formula / Statement	Notes
Null Hypothesis	$H_{0} :$ Two categorical variables are independent (no association)	Always state in terms of the population, not the sample
Alternative Hypothesis	$H_{a} :$ Two categorical variables are dependent (there is an association)
Expected Cell Count	$E = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}}$	Calculated under the assumption $H_{0}$ is true
Chi-Square Test Statistic	$χ^{2} = \sum \frac{( O - E ) ^{2}}{E}$	$O$ = observed count; larger values = stronger evidence against $H_{0}$
Degrees of Freedom	$df = (r - 1) (c - 1)$	$r$ = number of rows, $c$ = number of columns
Conditions for Inference	1. Random sample/random assignment; 2. 10% of population for independence; 3. All $E \geq 5$	All three conditions must be checked for a valid test
P-Value Definition	$P(χ^{2} \geq \text{calculated } χ^{2} \mid H_{0} \text{ true})$	All chi-square tests for independence are right-tailed

What's Next

The chi-square test for independence shares most calculation steps with the next core topic in Unit 8: the chi-square test for homogeneity, which tests whether the distribution of one categorical variable is the same across multiple populations. Checking conditions, calculating expected counts, and finding degrees of freedom all work the same way in the homogeneity test, so mastering this topic first lets you focus on what differs between the two: how the data were collected and how the hypotheses are stated. This topic also reinforces core hypothesis testing skills that underpin every inference procedure on the AP exam, from t-tests to regression inference.
