The Geometric Distribution — AP Statistics Study Guide
For: students preparing to sit the AP Statistics exam.
Covers: Conditions for a geometric probability setting, geometric probability mass function, cumulative geometric probability, mean (expected value), standard deviation, and distinguishing geometric from binomial distributions for the AP exam.
You should already know: Basic properties of discrete random variables, probability rules for independent events, binomial distribution conditions and calculations.
A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Statistics style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official mark schemes for grading conventions.
1. What Is The Geometric Distribution?
The geometric distribution is a discrete probability distribution that models the number of independent trials required to get the first success in a series of repeated Bernoulli (two-outcome) trials. It is a core topic in AP Statistics Unit 4 (Probability, Random Variables, and Probability Distributions). It appears in both the multiple-choice (MCQ) and free-response (FRQ) sections, often combined with binomial or other discrete distribution concepts.
The geometric distribution is sometimes called a "waiting-time distribution," because it measures how long we wait (how many trials we run) for the first success. The binomial distribution fixes the number of trials and counts the number of successes. The geometric distribution reverses this framing: it fixes the number of successes at one and lets the number of trials be the random variable of interest. Both distributions assume the same success probability (p) on every trial. AP Statistics uses the convention in which (X) counts trials starting at 1, sometimes called the "shifted" geometric distribution, so (X = k) means the first success happens on trial (k). It is almost always tested in a real-world context, from counting shots until the first basket to counting parts inspected until the first defect.
2. Conditions for a Geometric Setting
Before you can use the geometric distribution to calculate probabilities or expected values, you must confirm that your scenario meets all four required conditions, often abbreviated BITS (similar to the BINS framework for binomial distributions):
- B: Two possible outcomes per trial: each trial results in either a "success" (the outcome we are waiting for) or a "failure" (the other outcome).
- I: Independent trials: the outcome of one trial does not change the probability of success for any other trial.
- T: Wait for the first success: the number of trials is not fixed in advance; the value we measure is the number of trials needed to get the first success.
- S: Constant success probability: the probability of success (p) is the same for every trial.
The most common point of confusion is distinguishing a geometric setting from a binomial setting. Both use two-outcome, independent, constant-probability trials, but binomial settings fix the number of trials and count successes, while geometric settings count trials until the first success, so the number of trials is random.
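As a rough illustration of that difference, the sketch below simulates one observation from each setting using Python's standard `random` module. The function names `binomial_count` and `geometric_count` are hypothetical helpers written for this guide, and the seed is arbitrary. The sketch assumes (0 < p \leq 1); with (p = 0) the geometric loop would never end.

```python
import random


def binomial_count(n: int, p: float, rng: random.Random) -> int:
    """Binomial setting: run exactly n trials and count how many are successes."""
    return sum(1 for _ in range(n) if rng.random() < p)


def geometric_count(p: float, rng: random.Random) -> int:
    """Geometric setting: keep running trials until the first success.

    Returns the number of trials used, counting the successful one (AP convention).
    Assumes 0 < p <= 1; with p == 0 the loop would never stop.
    """
    trials = 1
    while rng.random() >= p:  # this trial was a failure, so run another
        trials += 1
    return trials


rng = random.Random(2024)  # arbitrary fixed seed so reruns give the same output
print("successes in 10 trials:", binomial_count(10, 0.15, rng))      # always 0..10
print("trials until first success:", geometric_count(0.15, rng))     # 1 or more, no upper limit
```

The binomial count can never exceed the fixed number of trials, while the geometric count has no upper limit. That is exactly the fixed-versus-random distinction.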
Worked Example
A coffee shop runs a promotion where 15% of coffee cups have a coupon for a free donut. A customer buys one coffee at a time until they get a coupon. Can the number of coffees the customer buys be modeled with a geometric distribution? Check all conditions.
- Step 1: Check the two-outcome condition: Each coffee cup either has a coupon (success) or does not (failure). Only two outcomes, so condition B is satisfied.
- Step 2: Check independence: Coupons are randomly distributed, so one cup having a coupon does not change the chance another cup has a coupon. Condition I is satisfied.
- Step 3: Check what we count: We count the number of coffees (trials) until the first coupon, so the number of trials is not fixed in advance. Condition T is satisfied.
- Step 4: Check constant probability: The probability of a coupon is 15% for all cups, so (p=0.15) is constant. Condition S is satisfied.
- Step 5: Conclusion: All conditions are met, so this can be modeled with a geometric distribution.
Exam tip: When asked to identify the appropriate distribution for a scenario, first answer the question "are we counting trials until a success, or counting successes in a fixed number of trials?" This one question usually rules out the wrong choice between geometric and binomial immediately.
3. Geometric Probability Calculations (PMF and CDF)
Once you confirm a scenario meets the geometric conditions, you can calculate probabilities using two core formulas: the probability mass function (PMF) for the probability of first success on an exact trial, and the cumulative distribution function (CDF) for the probability of first success by a certain trial.
To get the probability that the first success occurs exactly on the (k)-th trial, you must have (k-1) consecutive failures first, followed by a success on the (k)-th trial. Because trials are independent, we multiply the probabilities: (P(X = k) = (1-p)^{k-1}p) for (k = 1, 2, 3, ...).
For cumulative probability, the probability that the first success occurs on or before the (k)-th trial is equal to 1 minus the probability that the first (k) trials are all failures, which gives a convenient shortcut: (P(X \leq k) = 1 - (1-p)^k). We can rearrange this to get the probability that the first success occurs after the (k)-th trial: (P(X > k) = (1-p)^k). These shortcuts save significant time on the exam, as you do not need to sum multiple individual probabilities.
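A minimal sketch of these three formulas in Python, with a numerical check that the CDF shortcut agrees with adding up PMF terms one at a time. The helper names `geom_pmf`, `geom_cdf`, and `geom_sf` are illustrative, not taken from any library, and they assume the AP convention where (X) starts at 1.

```python
def geom_pmf(k: int, p: float) -> float:
    """P(X = k): k - 1 failures followed by a success (k = 1, 2, 3, ...)."""
    if k < 1:
        return 0.0
    return (1 - p) ** (k - 1) * p


def geom_cdf(k: int, p: float) -> float:
    """P(X <= k) = 1 - (1 - p)^k: one minus the chance that the first k trials all fail."""
    if k < 1:
        return 0.0
    return 1 - (1 - p) ** k


def geom_sf(k: int, p: float) -> float:
    """P(X > k) = (1 - p)^k: the first k trials are all failures."""
    if k < 1:
        return 1.0
    return (1 - p) ** k


# Check the shortcut against summing the PMF term by term.
p = 0.15
for k in range(1, 11):
    summed = sum(geom_pmf(j, p) for j in range(1, k + 1))
    assert abs(summed - geom_cdf(k, p)) < 1e-12
    # "on or before k" and "after k" are complements
    assert abs(geom_cdf(k, p) + geom_sf(k, p) - 1) < 1e-12
print("CDF shortcut matches the summed PMF for k = 1..10")
```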
Worked Example
For the coffee shop promotion with (p=0.15), what is the probability the customer gets their first coupon on the 3rd coffee? What is the probability they get their first coupon within the first 4 coffees?
- Step 1: For the first question, we want (P(X=3)), so use the PMF with (k=3), (p=0.15).
- Step 2: Calculate: (P(X=3) = (1-0.15)^{3-1}(0.15) = (0.85)^2(0.15) = (0.7225)(0.15) \approx 0.1084).
- Step 3: For the second question, we want (P(X \leq 4)), which we calculate with the CDF shortcut.
- Step 4: Calculate: (P(X \leq 4) = 1 - (0.85)^4 \approx 1 - 0.5220 = 0.4780).
- Step 5: The probabilities are approximately 0.108 (exactly on 3rd) and 0.478 (within first 4).
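The same two numbers can be reproduced with plain Python arithmetic. This is just a calculator check of Steps 2 and 4, assuming (p = 0.15) as in the example.

```python
p = 0.15  # probability that a cup carries a coupon

# P(X = 3): two cups without a coupon, then a coupon on the third cup
p_exactly_3 = (1 - p) ** 2 * p

# P(X <= 4): one minus the chance that the first four cups all lack a coupon
p_within_4 = 1 - (1 - p) ** 4

print(f"P(X = 3)  = {p_exactly_3:.4f}")  # 0.1084
print(f"P(X <= 4) = {p_within_4:.4f}")  # 0.4780
```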
Exam tip: If you are asked for (P(X < k)), always adjust the cutoff to get the correct exponent: (P(X < k) = P(X \leq k-1) = 1 - (1-p)^{k-1}) to avoid off-by-one errors that are common on MCQs.
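One way to make the off-by-one rule mechanical is to write (P(X < k)) in terms of the "at most" function, as in this small sketch. Both function names are hypothetical helpers for illustration.

```python
def geom_at_most(k: int, p: float) -> float:
    """P(X <= k) = 1 - (1 - p)^k for a geometric X counting trials (k = 1, 2, 3, ...)."""
    if k < 1:
        return 0.0  # the first success cannot happen before trial 1
    return 1 - (1 - p) ** k


def geom_less_than(k: int, p: float) -> float:
    """P(X < k) is the same event as X <= k - 1, so the exponent drops to k - 1."""
    return geom_at_most(k - 1, p)


p = 0.15
print(f"P(X < 5)  = {geom_less_than(5, p):.4f}")  # 1 - (0.85)^4
print(f"P(X <= 5) = {geom_at_most(5, p):.4f}")    # 1 - (0.85)^5
```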
4. Mean and Standard Deviation of a Geometric Random Variable
The geometric distribution has simple, intuitive formulas for the mean (expected value) and standard deviation. The expected value, which is the long-run average number of trials needed to get the first success, is: (E(X) = \mu_X = \frac{1}{p}). This makes intuitive sense: if the probability of success is 1/10, you expect to wait 10 trials on average for the first success. Lower probability of success means a higher expected number of trials, which matches the formula.
The variance of (X) is (\text{Var}(X) = \frac{1-p}{p^2}), so the standard deviation (a measure of the spread of the distribution) is: (\sigma_X = \frac{\sqrt{1-p}}{p}). All geometric distributions with (0 < p < 1) are right-skewed: the highest probability is always at (k=1), and probabilities get smaller as (k) increases. Consistent with this skew, the mean of a geometric distribution is larger than its median, a fact commonly tested in distribution shape questions. On FRQs, you are almost always required to interpret the expected value in context, which requires connecting it to the long-run average over many repetitions.
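If you want to see these facts rather than take them on faith, a quick Monte Carlo simulation helps. The sketch below assumes (p = 0.15) and an arbitrary seed. It repeats the "trial until first success" process 100,000 times and compares the sample mean, standard deviation, and median with the formulas above. Simulated values will wander slightly from the exact ones; that is sampling variability, not an error in the formulas.

```python
import random
import statistics


def trials_until_first_success(p: float, rng: random.Random) -> int:
    """One simulated geometric observation: trials up to and including the first success."""
    trials = 1
    while rng.random() >= p:  # failure, so run another trial
        trials += 1
    return trials


p = 0.15
rng = random.Random(42)  # arbitrary fixed seed so the run is reproducible
sample = [trials_until_first_success(p, rng) for _ in range(100_000)]

print(f"simulated mean   {statistics.mean(sample):.2f}  vs 1/p           {1 / p:.2f}")
print(f"simulated SD     {statistics.pstdev(sample):.2f}  vs sqrt(1-p)/p  {(1 - p) ** 0.5 / p:.2f}")
# The median sits below the mean, as expected for a right-skewed distribution
print(f"simulated median {statistics.median(sample)}")
```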
Worked Example
For the coffee shop promotion with (p=0.15), what is the expected number of coffees the customer will buy, and what is the standard deviation? Interpret the expected value in context.
- Step 1: Use the expected value formula for geometric distributions, (E(X) = 1/p).
- Step 2: Calculate (E(X) = 1/0.15 \approx 6.67).
- Step 3: Calculate standard deviation: (\sigma_X = \sqrt{(1-0.15)/(0.15)^2} = \sqrt{0.85 / 0.0225} \approx \sqrt{37.78} \approx 6.15).
- Step 4: Interpret the expected value: If many customers buy coffees one at a time until they get a coupon, the average number of coffees each customer buys is about 6.67.
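A short Python check of Steps 2 and 3, computing the variance first and then taking its square root, assuming (p = 0.15):

```python
import math

p = 0.15
mean = 1 / p                  # E(X) = 1/p
variance = (1 - p) / p ** 2   # Var(X) = (1 - p)/p^2
sd = math.sqrt(variance)      # equivalently sqrt(1 - p)/p

print(f"E(X) = {mean:.2f}, Var(X) = {variance:.2f}, SD = {sd:.2f}")
# E(X) = 6.67, Var(X) = 37.78, SD = 6.15
```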
Exam tip: Always include the phrases "on average" and "over many repetitions" when interpreting expected value on FRQs to earn full credit for the interpretation.
5. Common Pitfalls (and how to avoid them)
- Wrong move: Using the zero-based geometric PMF (P(X=k) = (1-p)^k p) for an AP question where (X) counts trials until first success, leading to (P(X=3) = (0.85)^3(0.15)) instead of the correct ((0.85)^2(0.15)). Why: Confusion between two different conventions for the geometric distribution used in different textbooks. AP exclusively uses the shifted (trials until first success) convention. Correct move: Always check how the random variable is defined: if it counts trials until first success, you have (k-1) failures before the first success, so the exponent is (k-1).
- Wrong move: Calling a distribution geometric when counting the number of successes in 10 fixed trials, or binomial when counting trials until first success. Why: Both use Bernoulli trials, so students forget to check what is being counted. Correct move: Always ask "is the number of trials fixed or random?" before selecting a distribution.
- Wrong move: Calculating (P(X < 5)) as (1 - (1-p)^5) for a geometric random variable. Why: Off-by-one error from misinterpreting the inequality cutoff. Correct move: Rewrite the inequality to match the CDF form: (P(X < 5) = P(X \leq 4) = 1 - (1-p)^4), and adjust the exponent to match the upper bound of (X).
- Wrong move: Calculating expected value as (p) instead of (1/p), getting an expected value of 0.15 for (p=0.15). Why: Confusion between probability of success and expected number of trials. Correct move: Remember the intuition: lower probability of success means you wait more trials, so expected value must be larger than 1 when (p < 1), which matches (E(X) = 1/p).
- Wrong move: Using the geometric distribution for sampling without replacement from a small population of 20 parts, where you test until you find 1 defective part, without checking the 10% condition. Why: Students forget that independence is violated when sampling without replacement from small populations, just like in binomial settings. Correct move: If sampling without replacement, confirm that the number of items you are likely to test is no more than 10% of the population (that is, the population is at least 10 times that size) before using the geometric distribution. With only 20 parts, even 3 draws break this condition.
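To see why the 10% condition matters, the sketch below compares exact without-replacement probabilities with the geometric model. It assumes a hypothetical population of 20 parts containing 2 defective ones (the pitfall above does not specify a count). It uses Python's `fractions` module so the exact probabilities are not affected by rounding; the function names are illustrative.

```python
from fractions import Fraction


def first_defect_without_replacement(k: int, population: int, defective: int) -> Fraction:
    """Exact P(first defective part is draw k) when parts are drawn without replacement."""
    good = population - defective
    if defective <= 0 or k < 1 or k > good + 1:
        # impossible: no defectives at all, or a defective must already have appeared
        return Fraction(0)
    prob = Fraction(1)
    for i in range(k - 1):  # draws 1 .. k-1 must all be good parts
        prob *= Fraction(good - i, population - i)
    return prob * Fraction(defective, population - (k - 1))  # draw k is defective


def geometric_pmf(k: int, p: Fraction) -> Fraction:
    """Geometric model that treats every draw as independent with the same p."""
    return (1 - p) ** (k - 1) * p


N, D = 20, 2  # hypothetical: 20 parts in the batch, 2 of them defective
p = Fraction(D, N)
for k in (1, 5, 10):
    exact = first_defect_without_replacement(k, N, D)
    model = geometric_pmf(k, p)
    print(f"draw {k:2d}: without replacement {float(exact):.4f}, geometric {float(model):.4f}")
```

With only 20 parts, the two sets of probabilities agree on the first draw but differ noticeably by the 5th and 10th draws, because each good part removed raises the chance that the next draw is defective.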
6. Practice Questions (AP Statistics Style)
Question 1 (Multiple Choice)
A warehouse ships smartphones, and 8% of all smartphones have a battery defect. A quality control inspector tests one phone at a time, randomly selected, until he finds a phone with a battery defect. What is the probability that he finds the first defective phone on the 5th phone he tests? A) 0.053 B) 0.057 C) 0.341 D) 0.659
Worked Solution: This scenario meets all geometric conditions: we count trials until first success, with independent trials and constant 8% defect probability. We use the geometric PMF (P(X=k) = (1-p)^{k-1}p), where (k=5) and (p=0.08). Substituting values gives (P(X=5) = (0.92)^4(0.08) \approx (0.7164)(0.08) \approx 0.057). Option A uses the wrong zero-based convention ((0.92)^5(0.08) \approx 0.053), option C is (P(X \leq 5) = 1 - (0.92)^5 \approx 0.341), and option D is (P(X > 5) = (0.92)^5 \approx 0.659). Correct answer: B.
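A quick way to check a multiple-choice question like this is to compute every plausible candidate value and compare each with the options. This sketch computes the AP-convention PMF, the zero-based variant, and the two cumulative probabilities for (p = 0.08) and (k = 5); the variable names are illustrative.

```python
p = 0.08  # probability that a phone has a battery defect
k = 5

correct = (1 - p) ** (k - 1) * p   # P(X = 5): AP convention, 4 failures then a success
zero_based = (1 - p) ** k * p      # exponent k: the wrong convention for this question
at_most_5 = 1 - (1 - p) ** k       # P(X <= 5)
more_than_5 = (1 - p) ** k         # P(X > 5)

for label, value in [("P(X = 5)", correct), ("zero-based", zero_based),
                     ("P(X <= 5)", at_most_5), ("P(X > 5)", more_than_5)]:
    print(f"{label:<11} {value:.3f}")
```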
Question 2 (Free Response)
A street artist sells hand-painted portraits and has a 15% chance of making a sale to any random passerby who stops to look at their work. Assume passersby decide independently of one another. Let (X) be the number of passersby who stop until the artist makes the first sale of the day, counting the passerby who buys. (a) Verify that (X) can be modeled with a geometric distribution. (b) Calculate (P(X > 6)) and interpret this probability in context. (c) Find the expected value of (X) and interpret it in context.
Worked Solution: (a) We check the four BITS conditions: 1. Two outcomes: each passerby either buys a portrait (success) or does not (failure), so B is satisfied. 2. Independent: the problem states passersby decide independently, so I is satisfied. 3. We count passersby (trials) until the first sale, so the number of trials is not fixed and T is satisfied. 4. The probability of a sale is 15% for every passerby, so S is satisfied. All conditions are met. (b) Using the geometric shortcut for (P(X > k)), we get (P(X > 6) = (1 - 0.15)^6 = (0.85)^6 \approx 0.377). Interpretation: There is about a 37.7% chance that none of the first 6 passersby who stop buys a portrait, so the artist's first sale comes from the 7th passerby or later. (c) The expected value of a geometric random variable is (E(X) = 1/p = 1/0.15 \approx 6.67). Interpretation: Over many days, the average number of passersby who stop, up to and including the one who makes the first sale of the day, is about 6.67.
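The arithmetic in parts (b) and (c) can be checked in a few lines, assuming (p = 0.15) as stated in the question:

```python
p = 0.15  # chance that a passerby who stops buys a portrait

p_more_than_6 = (1 - p) ** 6  # P(X > 6): none of the first 6 passersby buys
expected = 1 / p              # E(X): long-run average count, including the buyer

print(f"P(X > 6) = {p_more_than_6:.3f}")  # 0.377
print(f"E(X) = {expected:.2f}")           # 6.67
```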
Question 3 (Application / Real-World Style)
A geneticist is studying a recessive trait in pea plants. Each offspring plant has a 25% chance of expressing the recessive trait, independent of other offspring. The geneticist is growing plants one at a time until they get 1 plant that expresses the recessive trait for an experiment. Each plant takes 10 days to grow to maturity, and costs $1.20 in supplies. What is the expected total cost of the experiment? What is the probability the geneticist gets the desired plant within the first 3 plants grown?
Worked Solution: Let (X) be the number of plants grown until the first recessive trait plant is obtained. (X) is a geometric random variable with (p=0.25). The expected number of plants is (E(X) = 1/0.25 = 4). Because the total supply cost is (1.20X) dollars, a constant multiple of (X), the expected total cost is (E(1.20X) = 1.20 \times E(X) = 1.20 \times 4 = $4.80). For the probability, we calculate (P(X \leq 3) = 1 - (1-0.25)^3 = 1 - (0.75)^3 \approx 1 - 0.4219 = 0.5781). Interpretation: Over many repetitions of this experiment, the supply cost would average about $4.80, and there is about a 57.8% chance the geneticist will get the desired plant within the first 3 plants grown.
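A sketch of the same calculation in Python. The variable names are illustrative. The key step is that the expected cost scales with the expected number of plants, because cost is a constant multiple of (X).

```python
p = 0.25               # chance an offspring plant shows the recessive trait
cost_per_plant = 1.20  # dollars of supplies per plant grown

expected_plants = 1 / p                           # E(X) = 4
expected_cost = cost_per_plant * expected_plants  # E(1.20 X) = 1.20 E(X)
p_within_3 = 1 - (1 - p) ** 3                     # P(X <= 3)

print(f"Expected cost: ${expected_cost:.2f}")  # $4.80
print(f"P(X <= 3) = {p_within_3:.4f}")         # 0.5781
```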
7. Quick Reference Cheatsheet
| Category | Formula | Notes |
|---|---|---|
| Geometric Setting Conditions | BITS | B = 2 outcomes, I = independent trials, T = count trials until first success, S = constant success probability (p) |
| Probability first success exactly on (k)-th trial (PMF) | (P(X = k) = (1-p)^{k-1}p) | AP convention (counts trials, not failures before success), valid for (k=1,2,3,...) |
| Probability first success on or before (k)-th trial (CDF) | (P(X \leq k) = 1 - (1-p)^k) | Shortcut avoids summing multiple individual probabilities |
| Probability first success after (k)-th trial | (P(X > k) = (1-p)^k) | Most useful shortcut for exam cumulative probability questions |
| Expected Value (Mean) | (E(X) = \mu_X = \frac{1}{p}) | Long-run average number of trials until first success |
| Variance | (\text{Var}(X) = \frac{1-p}{p^2}) | Intermediate step for calculating standard deviation |
| Standard Deviation | (\sigma_X = \frac{\sqrt{1-p}}{p}) | Measures spread of the distribution of number of trials |
| Distribution Shape | Always right-skewed | Highest probability at (k=1), probabilities decrease as (k) increases |
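As a sanity check on the mean and variance rows of the table, the sketch below approximates (E(X) = \sum k P(X=k)) and (\text{Var}(X) = E(X^2) - [E(X)]^2) by summing the PMF up to a large cutoff. The cutoff of 2,000 trials is an arbitrary choice at which the remaining tail is negligible for these values of (p). The results are compared with (1/p) and ((1-p)/p^2); the function name is a hypothetical helper.

```python
def series_mean_and_variance(p: float, max_k: int = 2000) -> tuple[float, float]:
    """Approximate E(X) and Var(X) by summing k*P(X=k) and k^2*P(X=k) for k = 1..max_k."""
    pmf = [(1 - p) ** (k - 1) * p for k in range(1, max_k + 1)]
    mean = sum(k * w for k, w in enumerate(pmf, start=1))
    second_moment = sum(k * k * w for k, w in enumerate(pmf, start=1))
    return mean, second_moment - mean ** 2  # Var(X) = E(X^2) - [E(X)]^2


for p in (0.08, 0.15, 0.25, 0.5):
    mean, var = series_mean_and_variance(p)
    print(f"p={p}: mean {mean:.4f} vs 1/p {1 / p:.4f}; "
          f"variance {var:.4f} vs (1-p)/p^2 {(1 - p) / p ** 2:.4f}")
```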
8. What's Next
Mastering the geometric distribution is a prerequisite for understanding other discrete waiting-time distributions, most notably the negative binomial distribution, which models the number of trials required to get a fixed number of successes (the geometric distribution is the special case of 1 success). The geometric distribution also reinforces the core distinction between a fixed and a random number of trials that is critical for all discrete probability distributions in AP Statistics. It is often combined with binomial distribution concepts in MCQ and FRQ questions that test your ability to select the correct distribution for a given context, so a clear grasp of the geometric conditions and formulas pays off directly on these classification questions.
Follow-on topics for further study:
- The Binomial Distribution
- The Negative Binomial Distribution
- Discrete Random Variables
- Normal Probability Distributions