Hypothesis Testing — A-Level Mathematics Stats Study Guide
For: A-Level Mathematics candidates sitting Paper 5 (Probability & Statistics 1).
Covers: Null and alternative hypotheses, one- and two-tailed tests, critical regions and significance levels, Type I and Type II errors, and hypothesis tests for binomial proportions and normal means, aligned to the latest A-Level Mathematics syllabus.
You should already know: Basic probability, summation, integration (Pure 1 calculus).
A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the A-Level Mathematics style for educational use. They are not reproductions of past Cambridge International examination papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official Cambridge mark schemes for grading conventions.
1. What Is Hypothesis Testing?
Hypothesis testing is a statistical framework to evaluate whether a claim about a population parameter is supported by sample data, using probability to quantify the risk of drawing an incorrect conclusion. It is a high-weight topic in A-Level Mathematics Paper 5, accounting for 10-15% of total marks on most exam papers, and forms the foundation for all advanced statistical analysis in further mathematics syllabi. Common synonyms include significance testing and statistical hypothesis testing.
2. Null and alternative hypotheses
Every hypothesis test starts with two competing statements about a population parameter, never about a sample statistic.
- The null hypothesis ($H_0$) is the default, widely accepted claim, assuming no effect, no difference, or no change from a known reference value. It always includes an equality sign for the population parameter (e.g., $H_0: \mu = 120$, $H_0: p = 0.5$).
- The alternative hypothesis ($H_1$) is the competing claim you are testing against $H_0$, assuming there is a measurable effect, difference, or change from the reference value. It uses inequality signs ($<$, $>$, or $\neq$) depending on the direction of the claim.
Worked Example
A mobile network claims its average 5G download speed is 120 Mbps. A customer suspects the speed is slower than advertised. Let $\mu$ be the population mean download speed in Mbps. The hypotheses are:
$H_0: \mu = 120$
$H_1: \mu < 120$
Exam tip: Examiners deduct 1 mark if you write hypotheses using sample statistics (e.g., $\bar{x} = 120$) instead of population parameters, so double-check this before moving on.
3. One-tailed and two-tailed tests
The "tail" of a test is determined by the inequality in the alternative hypothesis, and dictates how you calculate significance thresholds.
- One-tailed test: $H_1$ uses a single directional inequality ($<$ or $>$), so you are only testing for an effect in one specific direction. Left-tailed tests use $<$, right-tailed tests use $>$.
- Two-tailed test: $H_1$ uses $\neq$, so you are testing for any difference from the reference value, regardless of direction. The total significance level is split equally between the upper and lower tails of the distribution.
Worked Example
Using the mobile network scenario:
- If you only suspect speeds are slower, $H_1: \mu < 120$ Mbps, so this is a left-tailed one-tailed test.
- If you suspect speeds are either faster or slower than advertised (no directional claim), $H_1: \mu \neq 120$ Mbps, so this is a two-tailed test.
Exam tip: Context clues in the question will tell you which test to use: phrases like "increased", "reduced", or "biased towards" indicate a one-tailed test, while "different from" or "changed" indicate a two-tailed test.
4. Critical region and significance level
The significance level and critical region set the threshold for how unlikely a sample result has to be before you reject the null hypothesis.
- The significance level ($\alpha$) is the maximum acceptable probability of rejecting $H_0$ when it is actually true. Common values used in A-Level exams are 1%, 5%, and 10%.
- The critical region (or rejection region) is the set of values of the test statistic for which you reject $H_0$. The boundary of this region is called the critical value.
Worked Example
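You run a right-tailed test for a binomial proportion, with $H_0: p = 0.5$, $n = 20$, and $\alpha = 0.05$. First, calculate cumulative binomial probabilities under $X \sim B(20, 0.5)$:
$P(X \ge 14) = 0.0577$
$P(X \ge 15) = 0.0207$
Since 0.0207 < 0.05 and 0.0577 > 0.05, the critical region is $X \ge 15$.
For a two-tailed test at 5% significance, you would split $\alpha$ into 2.5% per tail, finding lower and upper critical values $c_1$ and $c_2$ where $P(X \le c_1) \le 0.025$ and $P(X \ge c_2) \le 0.025$.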
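The search for a right-tailed critical value can be sketched in a few lines of Python. This is only an illustration: it assumes $n = 20$, $H_0: p = 0.5$ and a 5% level (the values consistent with the probabilities quoted above), and the function names are hypothetical helpers, not part of any syllabus or library.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def upper_critical_value(n: int, p0: float, alpha: float) -> int:
    """Return the smallest c with P(X >= c) <= alpha under H0: p = p0.

    A result of n + 1 means the critical region is empty.
    """
    for c in range(n + 2):
        if upper_tail(c, n, p0) <= alpha:
            return c
    raise ValueError("alpha must be non-negative")


c = upper_critical_value(n=20, p0=0.5, alpha=0.05)
print(f"Critical region: X >= {c}")
print(f"P(X >= {c - 1}) = {upper_tail(c - 1, 20, 0.5):.4f}")
print(f"P(X >= {c}) = {upper_tail(c, 20, 0.5):.4f}")
```

In an exam you would read these probabilities from tables or a calculator; the loop simply automates the "try the next value" step.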
5. Type I and Type II errors
No hypothesis test is 100% accurate. There are two possible errors you can make when drawing a conclusion:
- Type I error: Reject $H_0$ when $H_0$ is actually true (a "false positive"). For a continuous test statistic, the probability of a Type I error equals the significance level $\alpha$. For a discrete one such as the binomial, it is the exact probability of the test statistic falling in the critical region under $H_0$, which is at most $\alpha$.
- Type II error: Fail to reject $H_0$ when $H_0$ is actually false (a "false negative"). The probability of a Type II error is denoted $\beta$, and can only be calculated if you are given the true value of the population parameter.
Worked Example
For a test with $H_0: p = 0.5$, $H_1: p > 0.5$, $n = 20$, and critical region $X \ge 15$:
- Probability of Type I error = $P(X \ge 15 \mid p = 0.5) = 0.0207$ (~2.1%), which is below the 5% significance level.
- If the true value of $p$ is 0.7, probability of Type II error = $P(X \le 14 \mid p = 0.7) = 0.5836$ (~58.4%).
Exam tip: Use the mnemonic to remember: Type I = Innocent person convicted, Type II = Guilty person acquitted.
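Both error probabilities in this example can be checked numerically. The sketch below assumes $n = 20$, a critical region of $X \ge 15$, a null value of $p = 0.5$ and a true value of $p = 0.7$; the variable names are illustrative only.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n = 20          # number of trials
c = 15          # critical region is X >= c
p_null = 0.5    # value of p assumed under H0
p_true = 0.7    # true value of p supplied by the question

# Type I: land in the critical region although H0 is true.
type_1 = 1 - binom_cdf(c - 1, n, p_null)

# Type II: stay outside the critical region although p is really p_true.
type_2 = binom_cdf(c - 1, n, p_true)

print(f"P(Type I)  = {type_1:.4f}")
print(f"P(Type II) = {type_2:.4f}")
```

Note that the two probabilities use different values of $p$: the null value for Type I and the true value for Type II.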
6. Test for a binomial proportion or normal mean
These are the two hypothesis test variants you will be asked to conduct in A-Level Mathematics Paper 5.
Test for a binomial proportion
Used for discrete count data with fixed independent trials, two outcomes, and constant success probability $p$. The steps are:
- State $H_0$ and $H_1$ as appropriate.
- Define the test statistic $X$, the number of successes in $n$ trials, which follows $X \sim B(n, p_0)$ under $H_0$, where $p_0$ is the value of $p$ stated in $H_0$.
- Calculate the p-value (probability of observing a result as extreme or more extreme than the sample value under $H_0$) or compare to the critical region.
- Conclude: reject $H_0$ if the p-value < $\alpha$ or the observed value of $X$ is in the critical region, else fail to reject $H_0$.
Worked Example
A coin is tossed 20 times, landing heads 14 times. Test at 5% significance if the coin is biased towards heads:
$H_0: p = 0.5$, $H_1: p > 0.5$
Under $H_0$, $X \sim B(20, 0.5)$.
$P(X \ge 14) = 0.0577$
Since 0.0577 > 0.05, fail to reject $H_0$: there is insufficient evidence at the 5% level to conclude the coin is biased towards heads.
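The p-value route for a one-tailed binomial test can be sketched as follows. The function `binomial_p_value` and its `tail` argument are hypothetical names chosen for this illustration; the numbers are the coin example (20 tosses, 14 heads, 5% level).

```python
from math import comb


def binomial_p_value(x: int, n: int, p0: float, tail: str) -> float:
    """One-tailed p-value for observing x successes when X ~ B(n, p0).

    tail="upper" gives P(X >= x); tail="lower" gives P(X <= x).
    """
    def pmf(k: int) -> float:
        return comb(n, k) * p0**k * (1 - p0) ** (n - k)

    if tail == "upper":
        return sum(pmf(k) for k in range(x, n + 1))
    if tail == "lower":
        return sum(pmf(k) for k in range(0, x + 1))
    raise ValueError("tail must be 'upper' or 'lower'")


alpha = 0.05
p_value = binomial_p_value(x=14, n=20, p0=0.5, tail="upper")
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p-value = {p_value:.4f}: {decision}")
```

The same function with `tail="lower"` covers tests where the suspected proportion is smaller than the value in the null hypothesis.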
Test for a normal mean
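Used for continuous data where the population is normally distributed with known variance $\sigma^2$. The test statistic is the z-score:
$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
where $\bar{x}$ is the sample mean, $\mu_0$ is the value of $\mu$ stated in $H_0$, and $n$ is the sample size. This follows a standard normal distribution under $H_0$.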
Worked Example
The average test score for a national exam is 65, with standard deviation 10. A class of 25 students has an average score of 69. Test at 1% significance if the class score is different from the national average:
$H_0: \mu = 65$, $H_1: \mu \neq 65$
$z = \frac{69 - 65}{10 / \sqrt{25}} = \frac{4}{2} = 2$
The two-tailed 1% critical values are $\pm 2.576$. Since $|z| = 2 < 2.576$, fail to reject $H_0$: there is insufficient evidence at the 1% level to conclude the class score is different from the national average.
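A z-test for a mean with known standard deviation can be sketched with Python's standard `statistics.NormalDist`. The function name `z_test` and its arguments are assumptions made for this illustration; the call at the end uses the exam-score example (sample mean 69, null mean 65, standard deviation 10, sample size 25, 1% two-tailed).

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean: float, mu0: float, sigma: float, n: int,
           alpha: float, tail: str = "two") -> tuple[float, float, bool]:
    """Return (z, critical value, reject H0?) for a known-sigma test of a mean."""
    standard_error = sigma / sqrt(n)        # standard deviation of the sample mean
    z = (sample_mean - mu0) / standard_error
    std_normal = NormalDist()               # mean 0, standard deviation 1

    if tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)   # alpha split across tails
        return z, critical, abs(z) > critical
    if tail == "upper":
        critical = std_normal.inv_cdf(1 - alpha)
        return z, critical, z > critical
    if tail == "lower":
        critical = std_normal.inv_cdf(alpha)
        return z, critical, z < critical
    raise ValueError("tail must be 'two', 'upper' or 'lower'")


z, critical, reject = z_test(69, 65, 10, 25, alpha=0.01)
print(f"z = {z:.3f}, critical value = ±{critical:.3f}, reject H0: {reject}")
```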
7. Common Pitfalls (and how to avoid them)
- Wrong move: Writing hypotheses using sample statistics (e.g., $\bar{x}$) instead of population parameters. Why students do it: They confuse sample results with the population value being tested. Correct move: Always use population parameters ($\mu$, $p$) for $H_0$ and $H_1$; sample values are only used to calculate test statistics.
- Wrong move: Using the full significance level for both tails in a two-tailed test. Why students do it: They forget that two-tailed tests split $\alpha$ equally between upper and lower tails. Correct move: For a 5% two-tailed test, use 2.5% in each tail when calculating critical values or p-values.
- Wrong move: Writing "accept $H_0$" when the test statistic is not in the critical region. Why students do it: They assume no evidence against $H_0$ means $H_0$ is proven true. Correct move: Always write "fail to reject $H_0$" or "there is insufficient evidence to reject $H_0$", as you cannot prove the null hypothesis is true, only that you lack data to disprove it.
- Wrong move: Calculating Type II error probability using the $H_0$ parameter value instead of the given true parameter value. Why students do it: They mix up the conditions for Type I and Type II errors. Correct move: Type I error is calculated under $H_0$ being true; Type II error is calculated under the given true value of the parameter, which is always different from the $H_0$ value.
- Wrong move: Using the population standard deviation instead of the standard error of the mean for normal tests. Why students do it: They forget that the variance of the sample mean is $\sigma^2 / n$, not $\sigma^2$. Correct move: Always use $\sigma / \sqrt{n}$ as the denominator in the z-score formula for mean tests.
8. Practice Questions (A-Level Mathematics Paper 5 Style)
Question 1
A café claims that 70% of customers rate their service as "excellent". A new manager suspects the rating is lower, so she surveys a random sample of 12 customers. 5 of them rate the service as excellent. (a) State suitable null and alternative hypotheses for the test. [2 marks] (b) Test the manager’s claim at the 10% significance level. [5 marks]
Solution
(a) Let $p$ = proportion of customers who rate service as excellent.
$H_0: p = 0.7$, $H_1: p < 0.7$
(b) Let $X$ = number of customers who rate service as excellent, so under $H_0$, $X \sim B(12, 0.7)$.
$P(X \le 5) = 0.0386$
Since 0.0386 < 0.1, the observed value falls inside the critical region. We reject $H_0$ at the 10% significance level: there is sufficient evidence to support the manager’s claim that the proportion of customers rating service as excellent is lower than 70%.
Question 2
The mass of a standard bag of flour is normally distributed with mean 1 kg and standard deviation 40 g. A consumer group suspects bags are underfilled, so they sample 16 bags and find their mean mass is 980 g. (a) Find the critical region for a one-tailed test at the 5% significance level. [3 marks] (b) State the conclusion of the test. [2 marks]
Solution
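(a) Working in grams: $H_0: \mu = 1000$, $H_1: \mu < 1000$. The test statistic is:
$z = \frac{\bar{x} - 1000}{40 / \sqrt{16}} = \frac{\bar{x} - 1000}{10}$
The 5% left-tailed critical z-value is -1.645. Solve for $\bar{x}$:
$\bar{x} = 1000 - 1.645 \times 10 = 983.55$
The critical region is $\bar{x} < 983.55$.
(b) The sample mean is 980 g, which falls inside the critical region. We reject $H_0$ at the 5% significance level: there is evidence that bags are being underfilled.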
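The critical region for the sample mean can also be found directly from the sampling distribution of the mean, without passing through a z-score. The sketch below assumes the Question 2 values in grams (null mean 1000, standard deviation 40, sample size 16, 5% left tail); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 1000.0      # mean under H0, in grams
sigma = 40.0      # population standard deviation, in grams
n = 16            # sample size
alpha = 0.05      # left-tailed significance level

# Under H0 the sample mean is normal with mean mu0 and sd sigma / sqrt(n).
sample_mean_dist = NormalDist(mu=mu0, sigma=sigma / sqrt(n))

# The boundary is the value with probability alpha below it.
boundary = sample_mean_dist.inv_cdf(alpha)
print(f"Critical region: sample mean < {boundary:.2f} g")

observed_mean = 980.0
print("Reject H0" if observed_mean < boundary else "Fail to reject H0")
```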
Question 3
For the test in Question 2: (a) Calculate the probability of a Type I error. [1 mark] (b) If the true mean mass of the flour bags is 975g, calculate the probability of a Type II error. [4 marks]
Solution
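(a) Probability of Type I error = significance level = 0.05.
(b) Type II error is failing to reject $H_0$ when $\mu = 975$. We calculate $P(\bar{X} \ge 983.55 \mid \mu = 975)$:
$P\left(Z \ge \frac{983.55 - 975}{10}\right) = P(Z \ge 0.855) = 1 - \Phi(0.855) = 0.196$
The probability of a Type II error is 0.196 (19.6%).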
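As a numerical check on part (b), the Type II probability is the chance that the sample mean stays outside the critical region when the true mean applies. This sketch assumes a boundary of 983.55 g, a true mean of 975 g, and a standard error of 10 g (from a standard deviation of 40 g and 16 bags); the names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

boundary = 983.55                 # H0 is rejected when the sample mean is below this
true_mean = 975.0                 # true population mean given in the question
standard_error = 40.0 / sqrt(16)  # sd of the sample mean

# Distribution of the sample mean under the TRUE mean, not the H0 mean.
true_dist = NormalDist(mu=true_mean, sigma=standard_error)

# Type II error: the sample mean lands at or above the boundary.
beta = 1 - true_dist.cdf(boundary)
print(f"P(Type II) = {beta:.3f}")
```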
9. Quick Reference Cheatsheet
| Concept | Rule/Formula |
|---|---|
| Hypotheses | $H_0$: population parameter = reference value; $H_1$: parameter <, >, or ≠ reference value |
| Test Type | One-tailed if $H_1$ has < or >; two-tailed if $H_1$ has ≠, split $\alpha$ equally between tails |
| Significance Level | $\alpha$ = max P(Type I error); common values: 5% = 0.05, 1% = 0.01 |
| Critical Values (z-test) | 5% one-tailed: ±1.645; 5% two-tailed: ±1.96; 1% two-tailed: ±2.576 |
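| Errors | P(Type I) = $\alpha$ (under $H_0$); P(Type II) = P(not in critical region, given the true parameter value) |
| Binomial Test | $X \sim B(n, p)$ under $H_0$, compare p-value to $\alpha$ |
| Normal Mean Test | $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$ with $\mu_0$ the mean under $H_0$, compare to z critical values |
| Conclusion | Reject $H_0$ if test statistic in critical region / p-value < $\alpha$; else fail to reject $H_0$ (never "accept $H_0$") |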
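The z critical values in the table can be reproduced from the standard normal distribution. This is a small sketch using Python's standard library; `z_critical` is a hypothetical helper name.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def z_critical(alpha: float, two_tailed: bool) -> float:
    """Return the positive critical z-value for significance level alpha."""
    tail_area = alpha / 2 if two_tailed else alpha  # two-tailed tests split alpha
    return std_normal.inv_cdf(1 - tail_area)


for alpha, two_tailed in [(0.05, False), (0.05, True), (0.01, True)]:
    label = "two-tailed" if two_tailed else "one-tailed"
    print(f"{alpha:.0%} {label}: {z_critical(alpha, two_tailed):.3f}")
```

For a left-tailed test, use the negative of the one-tailed value.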
10. What's Next
Hypothesis testing is a core building block for all advanced statistics topics in the A-Level Mathematics and A-Level Further Mathematics syllabi. In Paper 5, you will combine these rules with other topics such as the Poisson distribution and sampling methods, while in Further Maths you will extend them to chi-squared tests, t-tests, and non-parametric tests. Mastering the step-by-step framework outlined here will significantly cut down your working time on long 10-15 mark questions in Paper 5, as every hypothesis test follows the same structure regardless of the distribution used.
Aligned with the Cambridge International AS & A Level Mathematics 9709 syllabus. OwlsAi is not affiliated with Cambridge Assessment International Education.