Phylogenetics — AP Biology Study Guide
For: Candidates preparing for the AP Biology exam.
Covers: Phylogenetic tree construction and interpretation, shared ancestral/derived characters, outgroup analysis, maximum parsimony, monophyletic groups, molecular clocks, and common misconceptions about evolutionary relatedness.
You should already know: Evolution by natural selection produces descent with modification. Homology is similarity in traits due to shared ancestry. DNA sequence mutations accumulate over generations.
A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Biology style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official mark schemes for grading conventions.
1. What Is Phylogenetics?
Phylogenetics is the hypothesis-driven study of evolutionary relatedness among groups of organisms (species, populations, higher taxa) based on shared heritable traits, including morphological characters and molecular sequence data. The end product of a phylogenetic analysis is a phylogenetic tree, a branching diagram that represents hypothesized evolutionary relationships. In AP Biology, this topic falls within Unit 7: Natural Selection, which makes up 13–20% of the total AP exam score. Phylogenetics questions appear regularly in both multiple-choice (MCQ) and free-response (FRQ) sections, often paired with questions about evidence for evolution or speciation.
Common notation conventions are consistent across all AP exam problems: each branch tip represents an extant (living) or extinct taxon, internal nodes represent speciation events (the formation of a new lineage from a common ancestor), and the root of the tree marks the most recent common ancestor of all taxa included. Phylogenetics is often used interchangeably with cladistics in introductory contexts, though cladistics technically refers to a specific method of building trees based on shared derived characters. All phylogenetic trees are testable hypotheses, not permanent facts, and are revised as new data becomes available.
2. Shared Ancestral vs Derived Characters & Outgroup Analysis
To build a phylogenetic tree using cladistics, the first step is sorting heritable traits into two functional categories: shared ancestral characters and shared derived characters. A shared ancestral character is a trait shared by all members of a clade that was inherited from a common ancestor outside the clade, so it does not help resolve relationships within the clade. A shared derived character (also called a synapomorphy) is a new trait that evolved after a clade split from its ancestral lineage, so it is unique to that clade and can be used to sort relationships between nested groups.
The key distinction between these two categories is relative: a trait can be derived for a large clade and ancestral for a smaller nested clade within it. To reliably distinguish between ancestral and derived characters, biologists use outgroup comparison. An outgroup is a closely related taxon that is known to have diverged from the lineage leading to the study group (called the ingroup) before any of the ingroup taxa diverged from each other. A character state shared by the outgroup and the ingroup is classified as ancestral; a character state found in some or all ingroup taxa but absent from the outgroup is classified as derived for the clade that possesses it. This method also allows researchers to root the tree (fix the position of the common ancestor of all ingroup taxa).
Worked Example
You are studying phylogenetic relationships between four ingroup taxa: trout, lizard, bear, and human. The outgroup is lamprey (a jawless fish). The character table below summarizes presence (1) / absence (0) of three traits:
| Taxon | Jaws | Lungs | Hair |
|---|---|---|---|
| Lamprey | 0 | 0 | 0 |
| Trout | 1 | 0 | 0 |
| Lizard | 1 | 1 | 0 |
| Bear | 1 | 1 | 1 |
| Human | 1 | 1 | 1 |
Classify each character as ancestral or derived for the entire ingroup, and identify which characters are derived for the nested lizard-bear-human clade.
- Step 1: The outgroup (lamprey) lacks all three traits, so any trait absent in lamprey and present in any ingroup subset is derived for that subset.
- Step 2: Jaws are present in all ingroup taxa and absent in the outgroup, so jaws are a derived character for the entire ingroup.
- Step 3: Lungs are absent in the outgroup and trout, but present in lizard, bear, and human, so lungs are a derived character for the lizard-bear-human clade. For that clade, jaws are a shared ancestral character, because they are inherited from the common ancestor of the entire ingroup.
- Step 4: Hair is present only in bear and human, so it is a derived character for the bear-human clade.
Exam tip: Always remember that character classification is relative. A trait that is derived for a large clade will always be ancestral for any smaller nested clade within it — don't misclassify traits based on their origin in the larger group.
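The outgroup comparison in the worked example above can be sketched in a few lines of Python. This is an illustration only: the names `TABLE`, `derived_for`, and `classify` are made up here, and the logic assumes each trait arose once with no later losses, which holds for this table but not for real data in general.

```python
# Character table from the worked example (1 = present, 0 = absent).
TABLE = {
    "lamprey": {"jaws": 0, "lungs": 0, "hair": 0},  # outgroup
    "trout":   {"jaws": 1, "lungs": 0, "hair": 0},
    "lizard":  {"jaws": 1, "lungs": 1, "hair": 0},
    "bear":    {"jaws": 1, "lungs": 1, "hair": 1},
    "human":   {"jaws": 1, "lungs": 1, "hair": 1},
}
OUTGROUP = "lamprey"


def derived_for(trait):
    """Return the ingroup taxa whose state differs from the outgroup's state."""
    outgroup_state = TABLE[OUTGROUP][trait]
    return {
        taxon
        for taxon, row in TABLE.items()
        if taxon != OUTGROUP and row[trait] != outgroup_state
    }


def classify(trait, clade):
    """Label a trait relative to one clade (a set of ingroup taxa).

    Simplifying assumption: the trait evolved once and was never lost.
    """
    holders = derived_for(trait)
    clade = set(clade)
    if holders == clade:
        return "shared derived for this clade"
    if clade < holders:  # the trait arose in a larger group containing the clade
        return "shared ancestral for this clade"
    return "not shared by the whole clade"


amniote_like = {"lizard", "bear", "human"}
for trait in ("jaws", "lungs", "hair"):
    print(trait, "->", classify(trait, amniote_like))
```

Calling `classify` with a different clade, such as `{"bear", "human"}`, shows how the same trait changes label depending on the group being asked about.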
3. Maximum Parsimony
Maximum parsimony is the core principle used to select the best phylogenetic tree hypothesis from multiple possible trees. The principle states that the simplest explanation for observed data is most likely to be correct. For phylogenetics, this means the tree that requires the fewest independent evolutionary changes (gains or losses of derived traits) is the preferred hypothesis. This is because the independent origin of the same trait multiple times in different lineages is a rare event, so the tree that minimizes the number of such events is more probable.
Maximum parsimony works for both morphological and molecular data. For any given set of taxa, you can generate all possible tree topologies (branching patterns), count the minimum number of changes required for each topology to match the observed data, and select the topology with the lowest number of changes. This method is the most commonly tested tree-building approach on the AP Biology exam.
Worked Example
You have three ingroup taxa (A, B, C) and one outgroup (O). Trait data is shown below (0 = absent, 1 = present):
| Taxon | Trait 1 | Trait 2 | Trait 3 | Trait 4 | Trait 5 |
|---|---|---|---|---|---|
| O | 0 | 0 | 0 | 0 | 0 |
| A | 0 | 1 | 1 | 0 | 1 |
| B | 0 | 1 | 1 | 1 | 0 |
| C | 1 | 1 | 1 | 1 | 0 |
Identify the maximum parsimony tree from the three possible rooted trees: Tree 1: ((A,B), C); Tree 2: ((A,C), B); Tree 3: ((B,C), A).
- Step 1: Count the minimum number of changes for each tree. Start with Tree 3 ((B,C), A):
- The root state matches O (all 0s). Traits 2 and 3 are present in A, B, and C, so each is gained once on the branch leading to the ingroup (two changes). Trait 5 is gained on the branch to A (one change). Trait 4 is gained in the B-C common ancestor (one change). Trait 1 is gained on the branch to C (one change). Total changes = 2 + 1 + 1 + 1 = 5.
- Step 2: Count changes for Tree 1 ((A,B), C): Traits 1, 2, 3, and 5 still need one change each, but trait 4 now needs two changes (independent gains in B and C, or one gain followed by a loss in A) because B and C no longer form a clade. Minimum total = 6.
- Step 3: Count changes for Tree 2 ((A,C), B): For the same reason, trait 4 needs two changes while every other trait needs one. Minimum total = 6.
- Step 4: Tree 3 has the fewest changes (5, versus 6 for each of the other two trees), so it is the maximum parsimony tree.
Exam tip: When asked to identify the most parsimonious tree on the AP exam, always explicitly count the number of trait changes rather than guessing based on intuitive similarity. Even small counting errors can lead to selecting the wrong tree.
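Counting by hand is error-prone, so it can help to check the tally with a short script. The sketch below uses Fitch's algorithm, a standard small-parsimony counting method that this guide does not otherwise cover; it is offered as a checking aid, not as something the exam expects. The names `TRAITS`, `TREES`, `fitch`, and `parsimony_score` are chosen here for illustration, and each tree is written as a nested tuple with the outgroup O attached at the root.

```python
# Character matrix from the worked example (0 = absent, 1 = present).
TRAITS = {
    "O": (0, 0, 0, 0, 0),
    "A": (0, 1, 1, 0, 1),
    "B": (0, 1, 1, 1, 0),
    "C": (1, 1, 1, 1, 0),
}

# Each topology is a nested tuple of two children per node.
TREES = {
    "Tree 1 ((A,B),C)": ("O", (("A", "B"), "C")),
    "Tree 2 ((A,C),B)": ("O", (("A", "C"), "B")),
    "Tree 3 ((B,C),A)": ("O", (("B", "C"), "A")),
}


def fitch(node, trait_index):
    """Return (candidate_states, changes) for one trait on one subtree."""
    if isinstance(node, str):  # tip: its observed state, no changes yet
        return {TRAITS[node][trait_index]}, 0
    left_states, left_changes = fitch(node[0], trait_index)
    right_states, right_changes = fitch(node[1], trait_index)
    common = left_states & right_states
    if common:  # children can agree on a state: no extra change here
        return common, left_changes + right_changes
    # children cannot agree: one change is required below this node
    return left_states | right_states, left_changes + right_changes + 1


def parsimony_score(tree):
    """Sum the minimum number of changes over all traits."""
    n_traits = len(next(iter(TRAITS.values())))
    return sum(fitch(tree, i)[1] for i in range(n_traits))


scores = {name: parsimony_score(tree) for name, tree in TREES.items()}
best = min(scores, key=scores.get)
for name, score in scores.items():
    print(name, "requires", score, "changes")
print("Most parsimonious:", best)
```

Running it reports the minimum change count for each topology and selects Tree 3, ((B,C), A), as the lowest.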
4. Reading and Interpreting Phylogenetic Trees
Most AP Biology phylogenetics questions test your ability to read and interpret existing trees, not build them from scratch. There are several key rules for correct interpretation:
- Rotating nodes around a common ancestor does not change evolutionary relationships. The order of tips along the page is arbitrary; only branching order (which taxa share which common ancestors) matters.
- Extant taxa at the tips of a tree are all equally evolved. A taxon that branches off early near the root is not "ancestral" or "less evolved" than taxa that branch off later — all lineages have evolved for the same amount of time from the root.
- Sister taxa are two taxa that share a most recent common ancestor that no other taxon shares. They are each other's closest relatives.
- A monophyletic group (or clade) includes a common ancestor and all of its descendants. Paraphyletic groups include a common ancestor and some, but not all, of its descendants. Polyphyletic groups combine taxa from separate lineages and exclude their most recent common ancestor. In cladistic classification, only monophyletic groups are treated as valid taxa.
Worked Example
A phylogenetic tree rooted with orangutan (outgroup) has the following topology: (orangutan, (gorilla, (chimpanzee, human))). Answer the following questions:
- What is the sister taxon to humans?
- Is the group (gorilla, human) monophyletic?
- If you rotate the node that splits gorilla from the chimpanzee-human clade, so that chimpanzee and human are on the left and gorilla on the right, does the relationship between chimpanzee and human change?
Solution steps:
- Step 1: To find the sister taxon, trace back from human to the first node you reach. The only other descendant of that node is chimpanzee, so the sister taxon to humans is chimpanzee.
- Step 2: A monophyletic group includes all descendants of the common ancestor. The common ancestor of gorilla and human is also the ancestor of chimpanzee, which is excluded from the group, so (gorilla, human) is paraphyletic, not monophyletic.
- Step 3: Rotating nodes only changes the position of tips on the page, not the branching order, so the relationship between chimpanzee and human remains identical.
Exam tip: When asked to identify which taxon is most closely related to a given taxon, always trace back to the most recent common ancestor. Do not rely on how close the tips are on the page — spacing is arbitrary and can be misleading.
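The interpretation rules in this section can be checked mechanically. The sketch below is one possible way to do it, not a standard tool: it stores a tree as nested tuples and treats every node as the set of tips descending from it. The function names `leaves`, `clades`, `is_monophyletic`, and `sister` are invented for this example.

```python
def leaves(node):
    """Return the set of tip names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(node):
    """Return the tip set of every node in the tree (single tips included)."""
    result = {leaves(node)}
    if not isinstance(node, str):
        for child in node:
            result |= clades(child)
    return result


def is_monophyletic(tree, group):
    """A group is monophyletic if some node has exactly those descendants."""
    return frozenset(group) in clades(tree)


def sister(node, taxon):
    """Return the tip set of the lineage that shares taxon's parent node."""
    if isinstance(node, str):
        return None
    left, right = node
    if left == taxon:
        return leaves(right)
    if right == taxon:
        return leaves(left)
    return sister(left, taxon) or sister(right, taxon)


tree = ("orangutan", ("gorilla", ("chimpanzee", "human")))
# Same tree with the gorilla node rotated and the chimpanzee-human tips swapped.
rotated = ("orangutan", (("human", "chimpanzee"), "gorilla"))

print(sorted(sister(tree, "human")))
print(is_monophyletic(tree, {"gorilla", "human"}))
print(is_monophyletic(tree, {"chimpanzee", "human"}))
print(clades(tree) == clades(rotated))
```

Because each node is compared as a set of descendants, tip order plays no role, which is the same reason rotating a node leaves the relationships unchanged.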
5. Molecular Clocks
A molecular clock is a method that uses the rate of accumulation of neutral mutations in DNA sequences to estimate the time when two lineages diverged from a common ancestor. The core assumption is that neutral mutations (which do not affect fitness) accumulate at a roughly constant rate over time across lineages, so the number of sequence differences between two lineages is proportional to the time since they diverged.
The formula for time since divergence is: $t = \frac{d}{2 \mu L}$, where $t$ = time since divergence, $d$ = number of nucleotide differences between the two sequences, $\mu$ = mutation rate per nucleotide per unit time, and $L$ = total length of the sequence. The factor of 2 appears because each lineage accumulates mutations independently after divergence, so total differences are the sum of mutations in both lineages. Mutation rates are calibrated using fossil evidence that gives a known divergence time for a different pair of lineages.
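The rearrangement behind the formula is short enough to write out. As a sketch, let $t$ be the time since divergence, $d$ the observed number of differences, $\mu$ the mutation rate per nucleotide per unit time, and $L$ the sequence length (the symbol names are a labelling choice made here), and assume no site has mutated more than once:

```latex
\begin{align*}
\text{changes expected in one lineage} &= \mu L t \\
d &\approx \mu L t + \mu L t = 2 \mu L t \\
t &\approx \frac{d}{2 \mu L}
\end{align*}
```

The second line is where the factor of 2 enters: the differences counted between two living species include the changes along both branches back to their common ancestor.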
Worked Example
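Two species of oak trees differ by 18 nucleotides in a 900-base-pair chloroplast gene. The mutation rate for this gene is $1 \times 10^{-8}$ mutations per nucleotide per year. Estimate the time since the two species diverged.
- Step 1: List all known values: $d = 18$, $L = 900$ bp, $\mu = 1 \times 10^{-8}$ substitutions per nucleotide per year.
- Step 2: Substitute into the molecular clock formula: $t = \frac{d}{2 \mu L} = \frac{18}{2 \times (1 \times 10^{-8}) \times 900} = \frac{18}{1.8 \times 10^{-5}} = 1 \times 10^{6}$ years.
- Step 3: Check the result: the sequences differ at 18/900 = 2% of sites, so each lineage contributed about 1%. At $1 \times 10^{-8}$ changes per site per year, a 1% change takes $10^{6}$ years, so the estimate is 1 million years.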
Exam tip: Never forget the factor of 2 in the molecular clock formula. Most student errors on molecular clock questions come from omitting this term, leading to an estimate that is twice the correct value.
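The same arithmetic can be scripted to check an answer. The sketch below is illustrative only: the function name `divergence_time` is made up here, and the rate of 1e-8 per nucleotide per year is an assumed value, chosen because it is consistent with the 1-million-year answer in the oak example.

```python
def divergence_time(differences, seq_length, rate_per_site_per_year):
    """Estimate years since two lineages split, assuming a constant neutral rate.

    The factor of 2 reflects that both lineages accumulate substitutions.
    """
    if seq_length <= 0 or rate_per_site_per_year <= 0:
        raise ValueError("sequence length and rate must be positive")
    return differences / (2 * rate_per_site_per_year * seq_length)


# Oak example: 18 differences across 900 bp.
# The rate 1e-8 is an assumption, back-calculated from the stated answer.
oak_years = divergence_time(18, 900, 1e-8)
print(f"{oak_years:,.0f} years")

# Leaving out the factor of 2 inflates the estimate.
without_factor = 18 / (1e-8 * 900)
print(f"{without_factor / oak_years:.1f}x the correct value")
```

The last two lines reproduce the common error described in the exam tip, so the size of the mistake can be seen directly.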
6. Common Pitfalls (and how to avoid them)
- Wrong move: Claiming an extant taxon that branches off near the root (e.g., a fish) is ancestral to other extant taxa (e.g., mammals) and "less evolved". Why: Students confuse early branching with being ancestral, but all extant taxa have evolved for the same amount of time from the root. Correct move: Always state that all extant taxa are equally evolved, and early branching only means the lineage split off earlier from the common ancestor.
- Wrong move: Counting the number of nodes between two tips to determine relatedness, instead of identifying the most recent common ancestor. Why: Students assume fewer nodes between tips means closer relatedness, but node counts depend on how many taxa happen to be included along each lineage. Correct move: To find the closest relative of a taxon, always identify the most recent common ancestor; the other taxon that shares that node is the closest relative, regardless of the number of nodes to the root.
- Wrong move: Using shared ancestral characters to resolve relationships within an ingroup. Why: Students assume any shared trait is evidence of close relatedness, but ancestral traits are shared by all ingroup members so they cannot sort relationships. Correct move: Always use outgroup comparison first to identify derived traits, and only use derived traits to sort ingroup relationships.
- Wrong move: Treating phylogenetic trees as proven, unchanging facts instead of testable hypotheses. Why: Textbooks present trees as fixed, so students assume they cannot be revised. Correct move: Remember that trees are hypotheses based on available data, and they are revised when new data (e.g., new molecular sequences, new fossils) are obtained.
- Wrong move: Assuming rotating nodes or reordering tips changes the evolutionary relationships shown on the tree. Why: Students are used to reading left-to-right order as meaningful, so they assume tip order equals relatedness. Correct move: Always ignore tip order and only look at branching order (which nodes connect which taxa) to determine relationships.
7. Practice Questions (AP Biology Style)
Question 1 (Multiple Choice)
A phylogenetic tree for five primate taxa has the topology: (lemur, (tarsier, (gorilla, (chimpanzee, human)))). Which of the following pairs are sister taxa? A) Lemur and tarsier B) Gorilla and chimpanzee C) Chimpanzee and human D) Tarsier and gorilla
Worked Solution: Sister taxa are pairs of taxa that share an exclusive most recent common ancestor (no other taxon shares that ancestor). In the given topology, the innermost node connects only chimpanzee and human, so they are sister taxa. Checking the other options: lemur shares its common ancestor with all four other taxa, gorilla shares its common ancestor with the chimpanzee-human clade, and tarsier shares its common ancestor with the gorilla-chimpanzee-human clade. None of these pairs are exclusive. The correct answer is C.
Question 2 (Free Response)
A researcher studies phylogenetic relationships between four grass species, using a brome (outgroup) to root the tree. The trait table is below:
| Taxon | Stigma feathery | Awn on seed | Perennial growth |
|---|---|---|---|
| Brome (O) | No | Yes | No |
| Kentucky bluegrass (K) | Yes | No | Yes |
| Bermuda grass (B) | Yes | No | No |
| Ryegrass (R) | Yes | Yes | Yes |
| Tall fescue (F) | Yes | Yes | Yes |
(a) (2 points) Identify which trait is a shared derived character for the entire ingroup (all four grasses). Justify your answer. (b) (2 points) State how many changes the most parsimonious tree requires, and draw the tree topology. (c) (2 points) Explain why maximum parsimony is used to select between possible tree topologies.
Worked Solution: (a) Feathery stigma is the shared derived character for the entire ingroup. All four ingroup grasses have feathery stigmas, and the outgroup brome does not, so the trait evolved after the split between brome and the ingroup. Awn on seed is present in the outgroup, so it is ancestral; perennial growth is present in only a subset of the ingroup, so it is not a derived character for the entire ingroup. (b) The most parsimonious tree requires 4 changes. One topology that achieves this, rooted with brome, is: (Brome, (Bermuda grass, (Kentucky bluegrass, (Ryegrass, Tall fescue)))). Changes are: 1) gain of feathery stigma on the branch leading to the ingroup, 2) gain of perennial growth in the common ancestor of Kentucky bluegrass, ryegrass, and tall fescue, 3) loss of awn in Bermuda grass, 4) loss of awn in Kentucky bluegrass. (An equally costly reading of the awn data is one loss in the ingroup ancestor followed by a re-gain in the ryegrass-tall fescue ancestor.) Three changes would be possible only if awn loss grouped Kentucky bluegrass with Bermuda grass while perennial growth grouped Kentucky bluegrass with ryegrass and tall fescue; a single tree cannot contain both groupings, so no topology has fewer than 4 changes. (c) Maximum parsimony follows the principle that the simplest explanation is most likely to be correct. Independent evolution of the same trait multiple times is a rare evolutionary event, so the tree that requires the fewest independent changes is the most probable hypothesis for the true evolutionary relationship.
Question 3 (Application / Real-World Style)
Conservation biologists studying endangered island foxes sequence a 1200 base-pair mitochondrial gene from two populations: one on Santa Cruz Island and one on Santa Rosa Island. They find 12 nucleotide differences between the two populations. The mutation rate for this mitochondrial gene is $2.5 \times 10^{-8}$ substitutions per nucleotide per year. (a) Estimate how long ago the two populations diverged. (b) Explain why this result matters for designing conservation management plans.
Worked Solution: (a) Use the molecular clock formula $t = \frac{d}{2 \mu L}$ with $d = 12$, $L = 1200$, $\mu = 2.5 \times 10^{-8}$. Substitute: $t = \frac{12}{2 \times (2.5 \times 10^{-8}) \times 1200} = \frac{12}{6 \times 10^{-5}} = 2 \times 10^{5}$ years, or about 200,000 years. (b) The two populations have been isolated for roughly 200,000 years, enough time for them to accumulate significant genetic differences and become distinct evolutionary lineages. This means conservation plans should protect both populations separately to preserve their unique genetic diversity, rather than managing them as a single population.
8. Quick Reference Cheatsheet
| Category | Formula/Rule | Notes |
|---|---|---|
| Shared derived character | Trait that arose in the common ancestor of a clade and is absent from the outgroup | Resolves relationships within ingroup; classification is always relative to the clade |
| Shared ancestral character | Trait present in outgroup and all ingroup | Inherited from common ancestor; does not resolve ingroup relationships |
| Outgroup analysis | Compare trait presence in outgroup vs ingroup | Roots trees and distinguishes ancestral vs derived traits |
| Maximum parsimony | Select the tree with the fewest independent evolutionary changes | Simplest explanation is most probable; works for morphological and molecular data |
| Sister taxa | Two taxa sharing an exclusive most recent common ancestor | They are each other's closest relatives |
| Monophyletic group | Includes common ancestor + all descendants | Only monophyletic groups are valid clades |
| Molecular clock time since divergence | $t = \frac{d}{2 \mu L}$ | $d$ = number of differing nucleotides, $\mu$ = mutation rate per nucleotide per unit time, $L$ = sequence length. Factor of 2 accounts for independent mutation accumulation in both lineages |
| Node rotation rule | Rotating nodes does not change evolutionary relationships | Tip order on the page is arbitrary; only branching order matters |
9. What's Next
Phylogenetics is the foundational framework for studying all evolutionary patterns across the tree of life, and it builds directly on the core principles of natural selection you learned earlier in Unit 7. Next you will apply the concepts from this chapter to the study of speciation and macroevolution, where you will use phylogenetic trees to trace the origin of new species and analyze patterns of adaptive radiation and mass extinction. Without mastering how to read trees, distinguish ancestral and derived traits, and apply principles like maximum parsimony, you will struggle to interpret evolutionary relationships and answer connection questions that are common on the AP exam.
Follow-on topics that build on this chapter: Speciation, Macroevolution, Evidence for Evolution, Population Genetics.