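Geometric Distribution - AP Statistics
AP Statistics · CED Unit 4: Probability, Random Variables, and Probability Distributions · 14 min read
Determine whether a scenario meets the conditions for a geometric distribution
Calculate geometric probabilities (PMF, CDF)
Find and interpret the mean and standard deviation of a geometric distribution
Distinguish the geometric distribution from the binomial distribution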
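1. What is the geometric distribution? ★★☆☆☆ ⏱ 3 min
The geometric distribution is a discrete probability distribution that models the number of independent trials needed to get the first success in a sequence of repeated Bernoulli (two-outcome) trials. It is often called a "waiting-time distribution" because it measures how many trials we must run while waiting for the first success.
Unlike the binomial distribution, which fixes the number of trials and counts successes, the geometric distribution reverses the setup: the success probability on each trial is still fixed, but the number of trials is not, and it becomes the random variable of interest. AP Statistics uses the "shifted" convention throughout, counting trials starting from 1 so that $X$ includes the trial on which the success occurs, consistent with the official CED definition.
Geometric distribution
A discrete probability distribution that models the number of independent trials needed to observe the first success in a sequence of Bernoulli trials.
Example: The number of coffees a customer buys until they receive a free-donut coupon.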
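To make the waiting-time idea concrete, here is a minimal simulation sketch. The coupon probability `p_coupon = 0.2`, the seed, and the function name are illustrative assumptions, not values from this guide.

```python
import random

def trials_until_first_success(p, rng):
    """Run Bernoulli(p) trials; return the trial number of the first success."""
    trials = 1
    while rng.random() >= p:  # this trial failed, so run another one
        trials += 1
    return trials  # counts the successful trial itself (AP convention)

rng = random.Random(42)   # fixed seed so the run is repeatable
p_coupon = 0.2            # hypothetical chance of a coupon on each purchase
waits = [trials_until_first_success(p_coupon, rng) for _ in range(10)]
print(waits)              # ten simulated waiting times, each at least 1
```

Each entry in `waits` is one observed value of a geometric random variable: small values are common, and an occasional long wait produces the right-skewed shape.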
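2. Conditions for a geometric setting ★★☆☆☆ ⏱ 4 min
Before using the geometric distribution to calculate probabilities or expected values, you must confirm that the scenario meets all four conditions, abbreviated as **BITS**:
**B**: Binary outcomes: each trial results in either a "success" (the outcome we are waiting for) or a "failure" (anything else).
**I**: Independent trials: the outcome of one trial does not change the probability of success on any other trial.
**T**: Trials until the first success: the number of trials is not fixed in advance; the value we measure is the number of trials needed to get the first success.
**S**: Same probability of success: the probability of success $p$ is the same on every trial.
Use **BITS** to remember the four geometric conditions: Binary outcomes, Independent trials, Trials until the first success, Same probability of success.
Exam tip: When asked to choose the right distribution for a scenario, first ask "Are we counting trials until a success, or successes in a fixed number of trials?" This one question separates geometric from binomial settings and quickly eliminates wrong options.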
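The independence and constant-probability conditions can fail quietly when items are drawn without replacement from a small population. The sketch below uses a hypothetical box of 10 items with 2 defective (numbers chosen only for illustration) to show the success probability shifting after one draw.

```python
from fractions import Fraction

def p_success_next(successes_left, items_left):
    """Chance that the next draw is a success, given what remains in the box."""
    return Fraction(successes_left, items_left)

# Hypothetical box: 10 items, 2 defective, drawn WITHOUT replacement.
p_first = p_success_next(2, 10)                 # before any draw
p_second_after_failure = p_success_next(2, 9)   # after one non-defective is removed

print(p_first, p_second_after_failure)          # 1/5 2/9
```

Because the two probabilities differ, a geometric model would not apply to this box. With a very large population the change per draw is negligible, which is why the model is used when sampling from large shipments.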
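3. Geometric probability calculations (PMF and CDF) ★★★☆☆ ⏱ 4 min
Once you have confirmed that the scenario meets the geometric conditions, two core formulas give the probabilities: the probability mass function (PMF) gives the probability that the first success occurs exactly on a given trial, and the cumulative distribution function (CDF) gives the probability that it occurs on or before a given trial.
For the first success to occur *exactly* on trial $k$, the first $k-1$ trials must all be failures and trial $k$ must be a success. Because the trials are independent, we multiply the probabilities:
P(X = k) = (1-p)^{k-1}p
for $k = 1, 2, 3, ...$
For cumulative probabilities, the probability that the first success occurs *on or before* trial $k$ equals 1 minus the probability that the first $k$ trials are all failures, which gives a convenient shortcut:
P(X \leq k) = 1 - (1-p)^k
Taking the complement gives the probability that the first success occurs *after* trial $k$:
P(X > k) = (1-p)^k
This shortcut saves considerable time on the exam because you do not need to sum multiple individual probabilities.
Exam tip: If a question asks for $P(X < k)$, adjust the cutoff to get the right exponent: $P(X < k) = P(X \leq k-1) = 1 - (1-p)^{k-1}$. This avoids the off-by-one error that is common in multiple-choice questions.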
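The three formulas above translate directly into code. This is a minimal sketch; the function names and the values `p = 0.2`, `k = 4` are illustrative assumptions. It also checks that the CDF shortcut agrees with summing the PMF term by term.

```python
def geom_pmf(k, p):
    """P(X = k): k - 1 failures followed by one success."""
    return (1 - p) ** (k - 1) * p

def geom_cdf(k, p):
    """P(X <= k): complement of k failures in a row."""
    return 1 - (1 - p) ** k

def geom_tail(k, p):
    """P(X > k): the first k trials all fail."""
    return (1 - p) ** k

p, k = 0.2, 4  # hypothetical values for illustration
summed = sum(geom_pmf(j, p) for j in range(1, k + 1))

print(round(geom_pmf(k, p), 4))      # 0.1024
print(round(geom_cdf(k, p), 4))      # 0.5904
print(round(summed, 4))              # 0.5904, same as the shortcut
print(round(geom_cdf(k - 1, p), 4))  # P(X < 4) = P(X <= 3) = 0.488
```

Note how the strict inequality is handled by shifting the cutoff to `k - 1` rather than by writing a separate formula.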
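4. Mean and standard deviation of a geometric random variable ★★★☆☆ ⏱ 3 min
The formulas for the mean (expected value) and standard deviation of a geometric distribution are simple and intuitive. The expected value is the long-run average number of trials needed to get the first success:
E(X) = \mu_X = \frac{1}{p}
This matches intuition: if the probability of success is 1/10, you expect to need 10 trials on average to get the first success. The lower the success probability, the higher the expected number of trials.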
The variance of $X$ is $\text{Var}(X) = \frac{1-p}{p^2}$, so the standard deviation (a measure of the spread of the distribution) is:
\sigma_X = \frac{\sqrt{1-p}}{p}
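All geometric distributions are right-skewed: the highest probability is always at $k=1$, and the probabilities decrease as $k$ increases. FRQs almost always ask you to interpret the expected value in context, and the interpretation must tie it to the long-run average over many repetitions.
Exam tip: When interpreting an expected value on an FRQ, include both "on average" and "over many repetitions" to earn full credit for the interpretation.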
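The "long-run average over many repetitions" reading can be checked numerically. The sketch below assumes `p = 0.1` (the 1/10 example above) and an arbitrary seed and sample size; it compares the formulas with the average of simulated waiting times.

```python
import math
import random

def geom_mean(p):
    """E(X) = 1 / p."""
    return 1 / p

def geom_sd(p):
    """SD(X) = sqrt(1 - p) / p."""
    return math.sqrt(1 - p) / p

def simulate_wait(p, rng):
    """One simulated waiting time, counting the successful trial."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k

p = 0.1
rng = random.Random(1)  # arbitrary seed for repeatability
draws = [simulate_wait(p, rng) for _ in range(20_000)]
sample_mean = sum(draws) / len(draws)

print(geom_mean(p))          # 10.0
print(round(geom_sd(p), 3))  # 9.487
print(round(sample_mean, 2)) # close to 10 over many repetitions
```

The standard deviation is nearly as large as the mean, so any single waiting time can land far from 10 even though the long-run average settles near it.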
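5. AP-style practice problems ★★★★☆ ⏱ 4 min
A warehouse ships smartphones, and 8% of all the smartphones have a battery defect. A quality inspector tests one randomly selected phone at a time until finding one with a battery defect. What is the probability that the first defective phone is the 5th phone tested?<br>Options: A) 0.057, B) 0.053, C) 0.340, D) 0.660
The scenario meets the geometric conditions: each phone is either defective or not, the trials are independent, we count trials until the first success, and the defect probability is constant at 8%. Use the geometric PMF $P(X=k) = (1-p)^{k-1}p$.
Substitute $k=5$ and $p=0.08$:
P(X=5) = (0.92)^4(0.08) \approx 0.7164 \times 0.08 \approx 0.057
This matches option A. Option B, $(0.92)^5(0.08) \approx 0.053$, uses the wrong zero-based convention; option C is approximately $P(X \leq 5)$; and option D is approximately $P(X > 5)$. Correct answer: A.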
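As a quick arithmetic check, the sketch below recomputes the PMF value for this problem alongside three related quantities that often appear as distractors. The variable names are illustrative.

```python
p = 0.08  # defect probability from the problem

first_on_5th = (1 - p) ** 4 * p    # correct PMF: 4 failures, then a success
off_by_one = (1 - p) ** 5 * p      # zero-based slip: 5 failures, then a success
at_most_5 = 1 - (1 - p) ** 5       # P(X <= 5)
more_than_5 = (1 - p) ** 5         # P(X > 5)

print(round(first_on_5th, 4))  # 0.0573
print(round(off_by_one, 4))    # 0.0527
print(round(at_most_5, 4))     # 0.3409
print(round(more_than_5, 4))   # 0.6591
```

Seeing the four numbers side by side makes it easier to recognize which mistake produces which wrong answer.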
A street artist sells hand-painted portraits, and has a 15% chance of making a sale to any random passerby who stops to look at their work. Assume each passerby is independent. Let $X$ be the number of passersby who stop, up to and including the one who makes the first purchase of the day.<br>(a) Verify that $X$ can be modeled with a geometric distribution.<br>(b) Calculate $P(X > 6)$ and interpret this probability in context.<br>(c) Find the expected value of $X$ and interpret it in context.
(a) Check the four BITS conditions: 1. Binary outcomes: each passerby either buys a portrait (success) or does not (failure), so B is satisfied. 2. Independent: the problem states passersby are independent, so I is satisfied. 3. We count passersby (trials) until the first sale, so the number of trials is not fixed, and T is satisfied. 4. The probability of a sale is 15% for every passerby, so S is satisfied. All conditions are met.
(b) Use the geometric shortcut for $P(X > k)$:
P(X > 6) = (0.85)^6 \approx 0.377
Interpretation: There is about a 37.7% chance that the artist will not make a sale to any of the first 6 passersby who stop.
(c) Calculate expected value:
E(X) = 1/p = 1/0.15 \approx 6.67
Interpretation: Over many days, the artist will on average need about 6.67 passersby to stop, counting the one who buys, to make the first sale of the day.
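The numbers in parts (b) and (c) can be verified with a few lines. This sketch also cross-checks the tail shortcut against the complement of the summed PMF; the variable names are illustrative.

```python
p = 0.15  # chance of a sale per passerby, from the problem

p_more_than_6 = (1 - p) ** 6  # first 6 passersby all decline
expected_stops = 1 / p        # mean number of passersby, counting the buyer

# Cross-check: P(X > 6) should equal 1 - [P(X=1) + ... + P(X=6)].
p_at_most_6 = sum((1 - p) ** (k - 1) * p for k in range(1, 7))

print(round(p_more_than_6, 3))    # 0.377
print(round(1 - p_at_most_6, 3))  # 0.377
print(round(expected_stops, 2))   # 6.67
```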
A geneticist is studying a recessive trait in pea plants. Each offspring plant has a 25% chance of expressing the recessive trait, independent of other offspring. The geneticist is growing plants one at a time until they get 1 plant that expresses the recessive trait for an experiment. Each plant costs \$1.20 in supplies. What is the expected total cost of the experiment? What is the probability the geneticist gets the desired plant within the first 3 plants grown?
Let $X$ be the number of plants grown until the first recessive trait plant is obtained. $X$ is a geometric random variable with $p=0.25$.
Calculate the expected number of plants:
E(X) = 1/0.25 = 4
Because the total cost is 1.20 times $X$, the expected total cost is 1.20 times $E(X)$:
4 \times 1.20 = 4.80
The expected total cost is \$4.80.
Calculate the probability of getting the plant within the first 3 plants:
P(X \leq 3) = 1 - (0.75)^3 = 1 - 0.4219 = 0.5781
There is a 57.8% chance the geneticist will get the desired plant within the first 3 plants grown.
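A simulation offers an independent check on both answers. The sketch below repeats the experiment many times; the seed, the number of runs, and the helper name `plants_grown` are assumptions made for illustration.

```python
import random

COST_PER_PLANT = 1.20  # dollars, from the problem
P_RECESSIVE = 0.25     # chance each plant shows the trait, from the problem

def plants_grown(rng):
    """Grow plants one at a time; return how many were needed, counting the success."""
    n = 1
    while rng.random() >= P_RECESSIVE:
        n += 1
    return n

rng = random.Random(2024)  # arbitrary seed for repeatability
runs = [plants_grown(rng) for _ in range(50_000)]

avg_cost = COST_PER_PLANT * sum(runs) / len(runs)
share_within_3 = sum(n <= 3 for n in runs) / len(runs)

print(round(avg_cost, 2))        # should land near 4.80
print(round(share_within_3, 3))  # should land near 0.578
```

The simulated average cost approximates the expected cost only over many repetitions; a single experiment can cost as little as \$1.20 or considerably more than \$4.80.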
Common Pitfalls
Why: Confusion between different geometric distribution conventions used in different textbooks; AP exclusively uses the shifted (trial-counting) convention.
Why: Both use Bernoulli trials, so students forget to check what is being counted and whether the number of trials is fixed.
Why: Off-by-one error from misinterpreting the inequality cutoff.
Why: Confusion between probability of success per trial and expected number of trials until first success.
Why: Students forget that independence is violated when sampling without replacement from small populations, just like in binomial settings.
Quick Reference Cheatsheet