Statistics and Probability: IB Math AA SL Study Guide
For: IB Math AA SL candidates sitting IB Math: Analysis & Approaches SL.
Covers: Descriptive statistics, conditional and independent probability, basic binomial and normal distributions, and linear regression and correlation, aligned with the IBO DP Math AA SL syllabus.
You should already know: IGCSE / pre-DP math.
A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the IB Math AA SL style for educational use. They are not reproductions of past IBO papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official IBO mark schemes for grading conventions.
1. What Is Statistics and Probability?
Statistics and probability form the branch of mathematics concerned with collecting, analyzing, and interpreting numerical data, and with quantifying the likelihood of random events. Together they make up Topic 4 of the IB Math AA SL syllabus, weighted 15-20% of your final exam mark, and are used across real-world fields from biology and psychology to economics and data science. The core framework prioritizes practical, context-driven problem-solving, with heavy use of your graphic display calculator (GDC) for calculations.
2. Descriptive Statistics
Descriptive statistics summarize raw data into meaningful, interpretable metrics, and form the foundation of all statistical analysis. First, distinguish between a population (the full group you are studying) and a sample (a subset of the population used for analysis), and between discrete data (countable values, e.g. number of students in a class) and continuous data (measurable values, e.g. student height, grouped into intervals for analysis).
Key Metrics
- Measures of central tendency:
- Mean: Average value, calculated as $\bar{x} = \frac{\sum x}{n}$ for raw data, or $\bar{x} = \frac{\sum f x}{\sum f}$ for grouped frequency data (using midpoints of intervals, so it is an estimated mean).
- Median: 50th percentile value, the middle value of ordered data, not affected by outliers.
- Mode: Most frequently occurring value, or modal class for grouped data.
- Measures of dispersion:
- Range: Difference between maximum and minimum values, sensitive to outliers.
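- Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$, where $Q_1$ is the 25th percentile and $Q_3$ the 75th percentile. Because it ignores the top and bottom quarters of the data, it is not distorted by outliers. Outliers are defined as values $x < Q_1 - 1.5 \times \text{IQR}$ or $x > Q_3 + 1.5 \times \text{IQR}$.
- Standard deviation: A measure of the typical distance of values from the mean. Sample standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$. Population standard deviation: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$.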
Worked Example
For the data set: 3, 5, 7, 8, 8, 10, 14
- Mean: $\bar{x} = \frac{3 + 5 + 7 + 8 + 8 + 10 + 14}{7} = \frac{55}{7} \approx 7.86$
- Median: 4th term = 8
- Mode: 8
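- $Q_1 = 5$, $Q_3 = 10$, $\text{IQR} = 10 - 5 = 5$. The outlier boundaries are $5 - 1.5 \times 5 = -2.5$ and $10 + 1.5 \times 5 = 17.5$, so there are no outliers.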
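The worked example can be checked with a short script. This is a minimal sketch using only Python's standard library; it assumes the "median of each half" quartile convention (middle value excluded when the count is odd), which may differ slightly from the convention your GDC uses on other data sets.

```python
from statistics import mean, median, mode

data = sorted([3, 5, 7, 8, 8, 10, 14])
n = len(data)

# Quartiles as the medians of the lower and upper halves
# (the middle value is left out of both halves when n is odd).
lower_half = data[: n // 2]
upper_half = data[(n + 1) // 2:]
q1 = median(lower_half)
q3 = median(upper_half)
iqr = q3 - q1

# Outlier boundaries: 1.5 * IQR below Q1 and above Q3.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print("mean:", round(mean(data), 2))
print("median:", median(data), "mode:", mode(data))
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("fences:", low_fence, high_fence, "outliers:", outliers)
```

Changing one value to something extreme (for example replacing 14 with 30) is a quick way to see the outlier test flag a point.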
Exam tip: Examiners often require you to state that a grouped data mean is an estimate to get full marks.
3. Probability — conditional, independent
Probability quantifies the chance of an event occurring, with $0 \le P(A) \le 1$, where 0 means impossible and 1 means certain. The complement of event $A$ ($A$ not occurring) is $P(A') = 1 - P(A)$.
Core Rules
- Addition rule: For any two events, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, where $A \cap B$ is the intersection (both events occur). If events are mutually exclusive (cannot occur at the same time), $P(A \cap B) = 0$, so $P(A \cup B) = P(A) + P(B)$.
- Conditional probability: Probability of $A$ occurring given $B$ has already occurred, calculated as $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ (valid when $P(B) > 0$).
- Independent events: The occurrence of one event does not affect the probability of the other, so $P(A \mid B) = P(A)$. This rearranges to the test for independence: $P(A \cap B) = P(A) \times P(B)$.
Worked Example
A class has 12 girls and 8 boys. 7 girls and 3 boys play basketball. What is the probability a randomly selected basketball player is a girl?
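- Let $G$ = student is a girl, $B$ = plays basketball. $P(B) = \frac{10}{20}$, $P(G \cap B) = \frac{7}{20}$.
- $P(G \mid B) = \frac{P(G \cap B)}{P(B)} = \frac{7/20}{10/20} = \frac{7}{10} = 0.7$.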
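The same calculation can be reproduced with exact fractions. This is a minimal sketch; the variable names are illustrative and the counts are taken directly from the class described in the question.

```python
from fractions import Fraction

total_students = 12 + 8   # 12 girls and 8 boys
girls_playing = 7
boys_playing = 3

# P(B): probability that a random student plays basketball
p_b = Fraction(girls_playing + boys_playing, total_students)
# P(G and B): probability that a random student is a girl who plays basketball
p_g_and_b = Fraction(girls_playing, total_students)
# Conditional probability P(G | B) = P(G and B) / P(B)
p_g_given_b = p_g_and_b / p_b

print(p_g_given_b)  # 7/10
```

Note that the class size cancels out: the answer is simply 7 girls out of the 10 basketball players, which is a useful sanity check in an exam.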
Exam tip: Use tree diagrams for sequential probability questions (e.g. drawing items without replacement) to avoid missing cases and earn method marks even if your final answer is wrong.
4. Binomial and normal distributions (basics)
Probability distributions describe the probability of all possible outcomes of a random variable, and are the most frequently tested subtopic of this unit in IB Math AA SL exams.
Binomial Distribution
A discrete distribution for scenarios with:
- Fixed number of independent trials
- Two possible outcomes (success/failure)
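- Constant probability of success per trial
Notation: $X \sim B(n, p)$, where $n$ is the number of trials and $p$ is the probability of success.
- Mean: $E(X) = np$
- Variance: $\text{Var}(X) = np(1 - p)$
- Probability of $k$ successes: $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$, where $\binom{n}{k} = \frac{n!}{k!(n - k)!}$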
Worked Binomial Example
You roll a fair 6-sided die 8 times. Let $X$ = number of times you roll a 3, so $X \sim B\left(8, \frac{1}{6}\right)$. What is ?
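For the die-rolling setup (8 independent rolls, success = rolling a 3 with probability 1/6), every binomial probability can be tabulated with the standard formula $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$. This is a minimal sketch, assuming Python 3.8+ for `math.comb`; the helper name `binomial_pmf` is illustrative and this is not a substitute for your GDC's binomial functions.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 8, 1 / 6  # 8 rolls, probability 1/6 of rolling a 3

# Probability of each possible number of threes, k = 0, 1, ..., 8
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
for k, prob in enumerate(pmf):
    print(f"P(X = {k}) = {prob:.4f}")

# Mean and variance from the formulas np and np(1 - p)
mean_x = n * p
var_x = n * p * (1 - p)
print("mean:", round(mean_x, 4), "variance:", round(var_x, 4))
```

Cumulative probabilities such as $P(X \le 2)$ are sums of entries in `pmf`, for example `sum(pmf[:3])`.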
Normal Distribution
A continuous, symmetric bell-shaped distribution defined by its mean $\mu$ and variance $\sigma^2$. Notation: $X \sim N(\mu, \sigma^2)$.
- 68-95-99.7 rule: ~68% of data lies within $\mu \pm \sigma$, 95% within $\mu \pm 2\sigma$, 99.7% within $\mu \pm 3\sigma$.
- Standard normal distribution: $Z \sim N(0, 1)$, convert any normal value to a z-score with $z = \frac{x - \mu}{\sigma}$ to calculate probabilities using your GDC or z-tables.
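The 68-95-99.7 rule and the z-score conversion can be checked numerically. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8+); the helper name `prob_below` is illustrative.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)  # Z ~ N(0, 1)

# Proportion of a normal distribution within k standard deviations of the mean
coverage = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"within {k} SD: {share:.4f}")


def prob_below(x: float, mu: float, sigma: float) -> float:
    """P(X < x) for X ~ N(mu, sigma^2), found by converting x to a z-score."""
    z = (x - mu) / sigma
    return standard.cdf(z)
```

Note that `NormalDist` takes the standard deviation as its second argument, whereas the notation $N(\mu, \sigma^2)$ lists the variance.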
Worked Normal Example
Test scores are normally distributed with , . What is ?
- Z-score:
Exam tip: Always write the distribution notation (e.g. $X \sim N(\mu, \sigma^2)$) before calculating values, as this earns 1 method mark even if your GDC calculation is wrong.
5. Linear regression and correlation
Linear regression and correlation analyze the relationship between two continuous variables (bivariate data), where $x$ is the independent explanatory variable and $y$ is the dependent response variable.
Pearson's Product-Moment Correlation Coefficient
Measures the strength and direction of the linear association between $x$ and $y$, with $-1 \le r \le 1$:
- $r = 1$: Perfect positive linear correlation
- $r = -1$: Perfect negative linear correlation
- $r = 0$: No linear correlation (may have non-linear correlation)
Regression Line of $y$ on $x$
The line of best fit used to predict $y$ values from $x$ values, written as $y = ax + b$, where:
- $a$ = slope: $a = r \frac{s_y}{s_x}$, where $s_y$ = standard deviation of $y$, $s_x$ = standard deviation of $x$
- $b$ = y-intercept: $b = \bar{y} - a\bar{x}$
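The relationships $a = r \frac{s_y}{s_x}$ and $b = \bar{y} - a\bar{x}$ can be traced through a small calculation. This is a minimal sketch: the `hours` and `scores` lists are hypothetical values invented purely for illustration (they are not data from this guide), and in the exam you would obtain $r$, $a$ and $b$ from your GDC.

```python
from math import sqrt

# Hypothetical bivariate data, for illustration only
hours = [1, 2, 3, 4, 5, 6]          # x: study hours
scores = [52, 55, 61, 64, 70, 73]   # y: test scores

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

# Sums of squared deviations and cross-deviations
s_xx = sum((x - x_bar) ** 2 for x in hours)
s_yy = sum((y - y_bar) ** 2 for y in scores)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))

# Pearson's correlation coefficient
r = s_xy / sqrt(s_xx * s_yy)

# Standard deviations (the same divisor must be used for both)
s_x = sqrt(s_xx / n)
s_y = sqrt(s_yy / n)

# Regression line of y on x: y = a*x + b
a = r * s_y / s_x
b = y_bar - a * x_bar

print(f"r = {r:.4f}")
print(f"y = {a:.3f}x + {b:.3f}")
```

Because $b = \bar{y} - a\bar{x}$, the regression line always passes through the mean point $(\bar{x}, \bar{y})$, which is a quick check on any line you write down.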
Worked Example
Bivariate data for study hours ($x$) and test score ($y$): , , , , . Find the regression line.
- Regression line:
Exam tip: Never use the regression line to predict $y$ values for $x$ outside the range of your original data (extrapolation), as this is unreliable and examiners penalize unqualified extrapolation.
6. Common Pitfalls (and how to avoid them)
- Wrong move: Using population standard deviation instead of sample standard deviation when analyzing a subset of data. Why students do it: Mix up GDC output labels. Correct move: Explicitly state if you are using sample or population SD in your working, and confirm the question context before selecting a value.
- Wrong move: Writing normal distribution notation with standard deviation instead of variance, e.g. $N(\mu, \sigma)$ instead of $N(\mu, \sigma^2)$. Why students do it: Confuse variance and SD parameters. Correct move: Always square the SD for the second parameter, and double-check units to catch errors.
- Wrong move: Forgetting to subtract $P(A \cap B)$ when calculating the union of non-mutually exclusive events. Why students do it: Mix up mutually exclusive and independent event definitions. Correct move: First check if events can occur at the same time; if yes, use the full addition rule.
- Wrong move: Calculating $P(B \mid A)$ instead of $P(A \mid B)$ for conditional probability questions. Why students do it: Misread "given that" phrasing. Correct move: Circle the condition first when reading the question, and write $P(\text{event} \mid \text{condition})$ at the top of your working to avoid mix-ups.
- Wrong move: Assuming $r = 0$ means no correlation between variables. Why students do it: Forget $r$ only measures linear correlation. Correct move: Explicitly state "no linear correlation" if $r$ is near zero, as a strong non-linear relationship may still exist.
7. Practice Questions (IB Math AA SL Style)
Question 1 (4 marks)
The grouped frequency table below shows weekly grocery spending for 40 households:
| Spending ($) | 0-<50 | 50-<100 | 100-<150 | 150-<200 | 200-<250 |
|---|---|---|---|---|---|
| Frequency | 5 | 11 | 15 | 7 | 2 |

(a) Calculate the estimated mean weekly spending. (2 marks)
(b) State the modal class and median class. (2 marks)
Solution 1
(a) Midpoints: 25, 75, 125, 175, 225. Sum of $fx = 5(25) + 11(75) + 15(125) + 7(175) + 2(225) = 4500$. Estimated mean $= \frac{4500}{40} = 112.5$, i.e. 112.50 dollars per week. (b) Modal class = 100-<150 (highest frequency). Median is the 20th/21st value, cumulative frequency up to 50-<100 = 16, up to 100-<150 = 31, so median class = 100-<150.
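The grouped-data working in Solution 1 can be reproduced step by step. This is a minimal sketch with illustrative variable names; the intervals and frequencies are copied from the table in Question 1.

```python
intervals = [(0, 50), (50, 100), (100, 150), (150, 200), (200, 250)]
frequencies = [5, 11, 15, 7, 2]

# (a) Estimated mean: use the midpoint of each interval as its representative value
midpoints = [(low + high) / 2 for low, high in intervals]
total_f = sum(frequencies)
sum_fx = sum(f * m for f, m in zip(frequencies, midpoints))
estimated_mean = sum_fx / total_f

# (b) Modal class: the interval with the highest frequency
modal_class = intervals[frequencies.index(max(frequencies))]

# Median class: the first interval whose cumulative frequency reaches n/2
cumulative = []
running = 0
for f in frequencies:
    running += f
    cumulative.append(running)
median_class = next(
    interval for interval, cf in zip(intervals, cumulative) if cf >= total_f / 2
)

print("sum of fx:", sum_fx, "estimated mean:", estimated_mean)
print("cumulative frequencies:", cumulative)
print("modal class:", modal_class, "median class:", median_class)
```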
Question 2 (6 marks)
In a year group, 70% of students study Physics, 55% study Chemistry, and 40% study both. (a) Find the probability a randomly selected student studies neither subject. (2 marks) (b) Given a student studies Chemistry, find the probability they also study Physics. (2 marks) (c) Justify if studying Physics and studying Chemistry are independent events. (2 marks)
Solution 2
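Let $A$ = study Physics, $B$ = study Chemistry. $P(A) = 0.7$, $P(B) = 0.55$, $P(A \cap B) = 0.4$. (a) $P(A \cup B) = 0.7 + 0.55 - 0.4 = 0.85$. $P(\text{neither}) = 1 - 0.85 = 0.15$. (b) $P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.4}{0.55} = \frac{8}{11} \approx 0.727$. (c) Test independence: $P(A) \times P(B) = 0.7 \times 0.55 = 0.385 \ne 0.4 = P(A \cap B)$, so events are not independent.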
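All three parts of Question 2 follow from the addition rule, the conditional probability formula, and the independence test. This is a minimal sketch using exact fractions to avoid rounding; the variable names are illustrative.

```python
from fractions import Fraction

p_physics = Fraction(70, 100)
p_chemistry = Fraction(55, 100)
p_both = Fraction(40, 100)

# (a) Addition rule, then the complement
p_either = p_physics + p_chemistry - p_both
p_neither = 1 - p_either

# (b) Conditional probability: P(Physics | Chemistry)
p_physics_given_chemistry = p_both / p_chemistry

# (c) Independence test: does P(A and B) equal P(A) * P(B)?
product = p_physics * p_chemistry
independent = product == p_both

print("P(neither) =", p_neither)                                # 3/20
print("P(Physics | Chemistry) =", p_physics_given_chemistry)    # 8/11
print("P(A)P(B) =", product, "vs P(A and B) =", p_both)
print("independent:", independent)                              # False
```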
Question 3 (4 marks)
(a) The number of broken lightbulbs in a box of 10 follows a binomial distribution with . Find the probability a box has exactly 2 broken bulbs. (2 marks) (b) Cat weights are normally distributed with mean 4.2kg, standard deviation 0.8kg. Find the probability a randomly selected cat weighs less than 3kg. (2 marks)
Solution 3
(a) . . (b) $X \sim N(4.2, 0.8^2)$. Z-score: $z = \frac{3 - 4.2}{0.8} = -1.5$. $P(X < 3) = P(Z < -1.5) \approx 0.0668$.
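Part (b) can be checked two ways: directly from the cat-weight distribution (mean 4.2 kg, standard deviation 0.8 kg), and via the z-score on the standard normal distribution. This is a minimal sketch using `statistics.NormalDist` (Python 3.8+), standing in for the normal CDF function on your GDC.

```python
from statistics import NormalDist

# Cat weights: mean 4.2 kg, standard deviation 0.8 kg
weights = NormalDist(mu=4.2, sigma=0.8)

# Direct calculation of P(X < 3)
p_direct = weights.cdf(3)

# Same probability via the z-score and the standard normal distribution
z = (3 - 4.2) / 0.8
p_via_z = NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"P(X < 3) = {p_direct:.4f}")
```

Both routes give the same probability, which is why either the z-score method or a direct GDC calculation is acceptable working.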
8. Quick Reference Cheatsheet
| Category | Key Formulas & Rules |
|---|---|
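| Descriptive Statistics | Mean (raw): $\bar{x} = \frac{\sum x}{n}$; Mean (grouped): $\bar{x} = \frac{\sum f x}{\sum f}$; Outlier: $x < Q_1 - 1.5 \times \text{IQR}$ or $x > Q_3 + 1.5 \times \text{IQR}$ |
| Probability | Complement: $P(A') = 1 - P(A)$; Addition: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$; Conditional: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ |
| Distributions | Binomial $X \sim B(n, p)$: $E(X) = np$, $\text{Var}(X) = np(1 - p)$; Normal $X \sim N(\mu, \sigma^2)$: $z = \frac{x - \mu}{\sigma}$, 68-95-99.7 rule |
| Regression & Correlation | $-1 \le r \le 1$ (linear correlation only); Regression line: $y = ax + b$, $a = r \frac{s_y}{s_x}$, $b = \bar{y} - a\bar{x}$; No extrapolation outside x data range |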
9. What's Next
Mastery of these core statistics and probability concepts is critical for success across the rest of the IB Math AA SL syllabus: it is frequently combined with calculus in Paper 2 extended response questions (e.g. modeling probability density functions), and forms the basis of data analysis for your Internal Assessment if you choose a quantitative research question. You will also build on this foundation if you choose to practice extended statistics topics for optional exam questions.