Transcription and RNA Processing — AP Biology Study Guide
For: Students preparing for the AP Biology exam.
Covers: Prokaryotic vs eukaryotic transcription initiation, elongation, and termination, 5' capping, 3' polyadenylation, RNA splicing, alternative splicing, introns, exons, and the core role of RNA polymerase in gene expression.
You should already know: Central dogma of molecular biology (DNA → RNA → protein). Structure of nucleic acids and base pairing rules. Prokaryotic vs eukaryotic cell compartmentalization.
A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the AP Biology style for educational use. They are not reproductions of past College Board / Cambridge / IB papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official mark schemes for grading conventions.
1. What Is Transcription and RNA Processing?
Transcription is the first step of gene expression, in which RNA polymerase synthesizes a complementary RNA strand from a DNA template. Only about 1-2% of the human genome codes for proteins, and transcription produces many functional non-coding RNAs (rRNA, tRNA, siRNA) in addition to the messenger RNA (mRNA) that encodes proteins. RNA processing is a set of post-transcriptional modifications, occurring only in eukaryotic cells, that convert pre-mRNA (the primary transcript) into mature, translation-ready mRNA. Per the AP Biology CED, this topic makes up approximately 3-4% of the total exam weight, within the 12-16% total weight for Unit 6: Gene Expression and Regulation. Questions on this topic appear regularly in both the multiple-choice (MCQ) and free-response (FRQ) sections: MCQs often test identification of steps or comparisons between prokaryotes and eukaryotes, while FRQs frequently connect RNA processing to gene regulation or phenotypic variation. The contrast between prokaryotic and eukaryotic gene expression is a consistent point of exam emphasis.
2. Core Steps of Transcription
Transcription occurs in three core steps shared by all living organisms, with key differences between prokaryotes and eukaryotes. In initiation, RNA polymerase binds to a promoter sequence upstream of the gene. In prokaryotes, a sigma factor directs the polymerase to the promoter; in eukaryotes, general transcription factors first bind promoter elements such as the TATA box and then recruit the polymerase. Only one strand of DNA, the template strand, is transcribed; the other strand, the coding strand, has the same sequence as the resulting mRNA except that it contains thymine where the mRNA contains uracil. In elongation, RNA polymerase moves along the template strand 3' → 5', building the mRNA strand 5' → 3' by adding complementary ribonucleotides to the free 3' hydroxyl group. In termination, transcription ends when RNA polymerase reaches a termination sequence: rho-dependent or rho-independent sequences in prokaryotes, and a polyadenylation signal sequence in eukaryotes. Unlike DNA polymerase, RNA polymerase does not require a primer to start synthesis.
Worked Example
The 3' → 5' template strand sequence for a short gene is 3' - TAC GAT AGC TTA - 5'. What is the sequence of the matching coding strand, and what is the sequence of the pre-mRNA transcribed from this template?
- Transcription follows standard base-pairing rules, and strands are always antiparallel, so the coding strand will be complementary to the template and oriented 5' → 3'.
- The coding strand of DNA matches the mRNA sequence (same 5'→3' orientation) except that the coding strand has thymine (T) where the mRNA has uracil (U). Using complementary base pairing against the template, the coding strand sequence is 5' - ATG CTA TCG AAT - 3'.
- Pre-mRNA is complementary to the template strand, with uracil replacing thymine. This gives the same sequence and orientation as the coding strand, with T swapped for U.
- The resulting pre-mRNA sequence is 5' - AUG CUA UCG AAU - 3'.
Exam tip: Always check the orientation of the DNA strand given in the question. If given the 5'→3' coding strand, the mRNA sequence is identical except T→U — you do not need to reverse-complement unless given the template strand.
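The base-pairing logic of the worked example above can be checked with a short Python sketch. The function names `transcribe_template` and `coding_strand` are hypothetical helpers written for this guide, and the sketch assumes the template is always written 3'→5' from left to right, as in the question.

```python
# Minimal sketch: derive the mRNA and the coding strand from a DNA template.
# Assumption: the template string is written 3'->5' (left to right),
# so complementing it base by base yields a strand that reads 5'->3'.

# DNA template base -> RNA base that pairs with it
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(template_3to5: str) -> str:
    """Return the mRNA (5'->3') transcribed from a 3'->5' DNA template."""
    bases = template_3to5.replace(" ", "").upper()
    return "".join(DNA_TO_RNA[base] for base in bases)


def coding_strand(template_3to5: str) -> str:
    """Return the DNA coding strand (5'->3'): same as the mRNA, with T for U."""
    return transcribe_template(template_3to5).replace("U", "T")


template = "TAC GAT AGC TTA"          # 3' -> 5', from the worked example
print(coding_strand(template))        # ATGCTATCGAAT
print(transcribe_template(template))  # AUGCUAUCGAAU
```

If a question gives the template written 5'→3' instead, the string would need to be reversed before calling these helpers, which is exactly the orientation check the exam tip describes.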
3. Eukaryotic Pre-mRNA Processing
After transcription of eukaryotic protein-coding genes, the primary pre-mRNA transcript undergoes three key modifications before it can be exported to the cytoplasm and translated. First, a modified guanine 5' cap is added to the 5' end of the transcript; this protects the mRNA from degradation by nucleases and helps the ribosome bind to the mRNA to start translation. Second, a poly-A tail, a string of 50-250 adenine nucleotides, is added to the 3' end; this also protects against degradation and aids export of the mature mRNA from the nucleus. Third, RNA splicing removes non-coding intervening sequences called introns, and joins together the coding expressed sequences called exons. Splicing is carried out by the spliceosome, a complex made of small nuclear ribonucleoproteins (snRNPs) that recognize splice sites at the ends of introns.
Worked Example
A eukaryotic pre-mRNA has the following structure, in order from 5' to 3': 5' untranslated region (part of Exon 1: 50 nucleotides), Exon 1 coding sequence (70 nucleotides), Intron 1 (350 nucleotides), Exon 2 (180 nucleotides), Intron 2 (240 nucleotides), Exon 3 (90 nucleotides), Intron 3 (410 nucleotides), Exon 4 (210 nucleotides, includes the 3' untranslated region). After standard (non-alternative) splicing, addition of a 1-nucleotide 5' cap, and addition of a 200-nucleotide poly-A tail, what is the total length of the mature mRNA in nucleotides?
- Introns are completely removed during splicing, so we only sum the lengths of all exons, including untranslated regions which are part of exons.
- Sum of all exonic sequence: 50 + 70 + 180 + 90 + 210 = 600 nucleotides.
- Add the 5' cap (1 nucleotide) and poly-A tail (200 nucleotides), which are both part of the final mature mRNA.
- Total length = 600 + 1 + 200 = 801 nucleotides.
Exam tip: Remember that untranslated regions (UTRs) at the 5' and 3' ends of mRNA are part of exons, so they are retained in the mature mRNA, not spliced out like introns.
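The arithmetic in this worked example can be laid out as a small Python sketch. The dictionary keys and the function name `mature_mrna_length` are illustrative labels chosen here; the lengths are the ones given in the question.

```python
# Minimal sketch: mature mRNA length = exons (including UTRs) + cap + poly-A tail.
# Introns are listed only to show that they do not contribute to the result.

exon_lengths = {
    "Exon 1 (5' UTR)": 50,
    "Exon 1 (coding)": 70,
    "Exon 2": 180,
    "Exon 3": 90,
    "Exon 4 (includes 3' UTR)": 210,
}
intron_lengths = {"Intron 1": 350, "Intron 2": 240, "Intron 3": 410}


def mature_mrna_length(exons, cap=1, poly_a=200):
    """Sum the exon lengths, then add the cap and poly-A tail lengths."""
    return sum(exons) + cap + poly_a


exonic_total = sum(exon_lengths.values())
pre_mrna_total = exonic_total + sum(intron_lengths.values())

print(exonic_total)                               # 600
print(pre_mrna_total)                             # 1600 (before splicing)
print(mature_mrna_length(exon_lengths.values()))  # 801
```

Comparing the 1600-nucleotide primary transcript with the 801-nucleotide mature mRNA shows how much of this hypothetical transcript is intron sequence that splicing removes.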
4. Alternative Splicing
Alternative splicing is a regulated post-transcriptional process unique to eukaryotes, in which different combinations of exons from the same pre-mRNA transcript are assembled into different mature mRNA molecules. For example, a gene with 4 exons might produce a mature mRNA that includes all 4 exons in one cell type, and exclude one exon to produce a shorter mRNA in another cell type. This process helps explain the "gene number paradox": humans have only ~20,000 protein-coding genes, but produce hundreds of thousands of distinct proteins. Alternative splicing is also a form of gene regulation, allowing different cell types to produce different protein isoforms with distinct functions from the same gene. Common examples include tissue-specific cytoskeletal proteins and the membrane-bound versus secreted forms of antibodies in immune cells.
Worked Example
A pre-mRNA has 4 exons in order: Exon 1, Exon 2, Exon 3, Exon 4. Exon 2 can be either included or excluded during splicing, and Exon 3 can be either included or excluded. Exons 1 and 4 are always included, and exon order cannot be rearranged. List all possible unique mature mRNA sequences, and explain why this process increases eukaryotic phenotypic complexity.
- We retain all combinations of included/excluded Exons 2 and 3, keeping the original order of exons and always including Exons 1 and 4.
- The four unique mature mRNA sequences are: (1) Exon1-Exon2-Exon3-Exon4, (2) Exon1-Exon2-Exon4, (3) Exon1-Exon3-Exon4, (4) Exon1-Exon4.
- Each unique mature mRNA has a distinct nucleotide sequence, which is translated into a distinct protein with a different amino acid sequence.
- A single gene can now produce multiple functional proteins, increasing the total size of the proteome (all proteins an organism can produce) without increasing the number of genes. This allows for greater phenotypic complexity, as different cell types can produce specialized proteins from the same gene.
Exam tip: AP FRQs often ask you to connect alternative splicing to phenotypic variation: remember that one gene → multiple proteins → multiple phenotypes, which explains why organisms with relatively small numbers of genes can have complex traits.
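The enumeration in this worked example can be reproduced with a short Python sketch. The function name `splice_isoforms` is a hypothetical helper; it models only the simple case described in the question, where each optional exon is independently included or skipped and exon order never changes.

```python
# Minimal sketch: list every isoform when some exons are optional.
# Assumption: optional exons are included or skipped independently of each other.
from itertools import product


def splice_isoforms(exons, optional):
    """Return all exon combinations, preserving the original exon order."""
    optional_in_order = [exon for exon in exons if exon in optional]
    isoforms = []
    # Each optional exon is either kept (True) or skipped (False).
    for choices in product([True, False], repeat=len(optional_in_order)):
        keep = dict(zip(optional_in_order, choices))
        # Constitutive exons are not in `keep`, so they default to included.
        isoforms.append([exon for exon in exons if keep.get(exon, True)])
    return isoforms


exons = ["Exon1", "Exon2", "Exon3", "Exon4"]
for isoform in splice_isoforms(exons, optional={"Exon2", "Exon3"}):
    print("-".join(isoform))
# Exon1-Exon2-Exon3-Exon4
# Exon1-Exon2-Exon4
# Exon1-Exon3-Exon4
# Exon1-Exon4
```

With n independently optional exons the count is 2 to the power n, which is why a modest number of genes can yield a much larger set of mature mRNAs.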
5. Common Pitfalls (and how to avoid them)
- Wrong move: Stating that RNA processing occurs in prokaryotes. Why: Students confuse prokaryotic coupled transcription-translation with eukaryotic compartmentalization, and forget that processing is only needed for eukaryotic nuclear pre-mRNA. Correct move: Always remember that prokaryotic mRNA is not processed; transcription and translation occur simultaneously in the cytoplasm, so no capping, splicing, or polyadenylation occurs.
- Wrong move: Confusing the template strand and coding strand when writing the mRNA sequence, leading to reversed orientation or incorrect base pairing. Why: Many students memorize that mRNA is complementary to DNA, but forget that only the template strand is transcribed, and the coding strand matches mRNA except T→U. Correct move: Always label the orientation of the given DNA strand first: if given 3'→5' template, mRNA is 5'→3' complementary; if given 5'→3' coding, mRNA is same sequence with T→U.
- Wrong move: Counting introns when calculating the length of mature mRNA. Why: Students remember introns are non-coding, but forget they are completely removed during splicing before the mRNA is mature. Correct move: When calculating mature mRNA length, always exclude all intron sequences, sum only exons, then add the 5' cap and poly-A tail length if given.
- Wrong move: Claiming introns are "junk DNA" with no function that are just discarded. Why: Many textbooks historically called introns junk, but AP Bio now tests that introns can have regulatory functions, and some are processed into non-coding RNAs. Correct move: Never refer to introns as useless junk; acknowledge they are removed from pre-mRNA but many have functional roles in gene regulation.
- Wrong move: Stating that transcription copies the entire DNA molecule into RNA. Why: Students confuse transcription with DNA replication, which copies the entire genome. Correct move: Remember that transcription only copies individual genes (or operons in prokaryotes), initiated at a promoter sequence for the specific gene being expressed.
- Wrong move: Thinking alternative splicing rearranges the order of exons to make new proteins. Why: Students confuse alternative splicing with DNA recombination. Correct move: Alternative splicing only includes or excludes entire exons, never changes the order of exons in the mature mRNA.
6. Practice Questions (AP Biology Style)
Question 1 (Multiple Choice)
A researcher sequences a section of the human genome and identifies the following double-stranded DNA sequence corresponding to the beginning of a gene:
5' - ATGCGTACGTAG... - 3' (Coding Strand)
3' - TACGCATGCATC... - 5' (Template Strand)
Which of the following is the correct sequence of the first 10 nucleotides of the pre-mRNA transcribed from this gene?
A. 5' - AUGCGUACGU - 3'
B. 3' - UACGCAUGCA - 5'
C. 5' - ATGCGTACGT - 3'
D. 5' - UACGCAUGCA - 3'
Worked Solution: The question provides the 5'→3' coding strand of DNA. The mRNA transcribed from the template strand has the same sequence and 5'→3' orientation as the coding strand, with the only difference being uracil (U) replaces thymine (T) in RNA. The first 10 nucleotides of the coding strand are ATGCGTACGT; replacing all T with U gives 5' - AUGCGUACGU - 3'. Common wrong answers come from reversing the orientation, using T instead of U, or incorrectly complementing the template strand. The correct answer is A.
Question 2 (Free Response)
A researcher is studying a gene expressed in both liver cells and brain cells of humans. The pre-mRNA for this gene has 5 exons, with the structure: Exon 1 - Intron 1 - Exon 2 - Intron 2 - Exon 3 - Intron 3 - Exon 4 - Intron 4 - Exon 5. Exon 2 is included in mature mRNA from liver cells but excluded in mature mRNA from brain cells. (a) Identify three modifications that are made to this pre-mRNA in both cell types before it exits the nucleus. (b) Predict the difference between the protein produced by the brain cell mRNA and the protein produced by the liver cell mRNA. Justify your prediction. (c) Explain how this process allows humans to produce more distinct proteins than the number of genes they have in their genome.
Worked Solution: (a) Three universal eukaryotic pre-mRNA modifications are: (1) addition of a modified guanine 5' cap to the 5' end, (2) addition of a poly-A tail to the 3' end, (3) splicing to remove introns and join exons. All three modifications occur in both cell types before the mature mRNA is exported to the cytoplasm. (b) Prediction: The brain cell protein will be shorter (have fewer amino acids) than the liver cell protein, and will likely have a different three-dimensional structure and function. Justification: Exclusion of Exon 2 removes the codons encoded by that exon from the mature mRNA. Because mRNA is read in consecutive 3-nucleotide codons, if the length of Exon 2 is a multiple of three the reading frame is preserved and the protein simply lacks the amino acids encoded by Exon 2; if it is not, the reading frame shifts downstream of the Exon 1-Exon 3 junction and a premature stop codon is likely. In either case the brain cell protein has an altered amino acid sequence, which can change its folding and function. (c) Alternative splicing allows different combinations of exons from the same pre-mRNA transcript to be assembled into different mature mRNA molecules. Each different mature mRNA is translated into a distinct protein with a unique amino acid sequence and function. As a result, a single gene can produce multiple distinct protein products, increasing the total number of proteins (the proteome) beyond the number of genes present in the genome.
Question 3 (Application / Real-World Style)
Beta-thalassemia is a human genetic disorder caused by a point mutation in the beta-globin gene. The mutation changes a single nucleotide in the 130-nucleotide first intron of the beta-globin pre-mRNA, preventing the spliceosome from recognizing the intron's 3' splice site. As a result, the entire intron is retained in the mature mRNA. Explain the effect of this mutation on the beta-globin protein produced from this mutated gene.
Worked Solution: The spliceosome normally removes all introns from pre-mRNA during RNA processing. If the spliceosome cannot recognize the splice site, the 130-nucleotide intron will remain in the mature mRNA. Ribosomes read mRNA in consecutive 3-nucleotide codons, and 130 is not a multiple of 3, so the mutation shifts the reading frame of all codons downstream of the retained intron. This causes every amino acid after the intron insertion to be incorrect, and almost always introduces a premature stop codon, resulting in a non-functional beta-globin protein. In context, this non-functional protein leads to the reduced oxygen transport characteristic of beta-thalassemia.
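The reading-frame argument in this solution comes down to one remainder calculation, sketched below in Python. The function name `shifts_reading_frame` is a hypothetical helper, and the sketch assumes the retained intron sits inside the coding sequence, as in the question.

```python
# Minimal sketch: does inserting extra nucleotides into a coding sequence
# shift the reading frame? Codons are 3 nucleotides long, so only insertions
# whose length is a multiple of 3 leave the downstream frame unchanged.

CODON_LENGTH = 3


def shifts_reading_frame(inserted_nucleotides: int) -> bool:
    """Return True if the insertion length is not a multiple of 3."""
    return inserted_nucleotides % CODON_LENGTH != 0


retained_intron = 130  # nucleotides, from the question

print(retained_intron % CODON_LENGTH)         # 1 (one nucleotide left over)
print(shifts_reading_frame(retained_intron))  # True
print(shifts_reading_frame(129))              # False (frame kept, extra codons added)
```

Even when the frame is preserved, a retained intron still adds codons that were never meant to be translated and may itself contain a stop codon, so the protein is usually altered in both cases.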
7. Quick Reference Cheatsheet
| Category | Rule | Notes |
|---|---|---|
| Transcription Direction | RNA always synthesized 5'→3'; RNA polymerase reads the DNA template 3'→5' | Applies to all prokaryotes and eukaryotes |
| mRNA vs DNA Sequence | mRNA matches 5'→3' coding strand: T → U | Complement 3'→5' template strand if template is given |
| 5' Cap Function | Protects mRNA from degradation, aids ribosome binding | Only added to eukaryotic pre-mRNA |
| Poly-A Tail Function | Protects mRNA from degradation, aids nuclear export | Only added to eukaryotic pre-mRNA |
| RNA Splicing Rule | Introns removed; exons (including UTRs) retained | Spliced by spliceosome made of snRNPs |
| Alternative Splicing Rule | One pre-mRNA → multiple mature mRNAs via different exon combinations | Exon order is always preserved, never rearranged |
| Compartmentalization | Transcription/processing in nucleus; translation in cytoplasm (eukaryotes) | Prokaryotes: coupled transcription/translation in cytoplasm, no processing |
| RNA Polymerase | Catalyzes phosphodiester bond formation between ribonucleotides | Does not require a primer, unlike DNA polymerase |
8. What's Next
This chapter lays the foundational prerequisite for translation, the next step in the central dogma, where mature mRNA is decoded to build a protein. Without understanding how mature mRNA is produced and processed, you cannot interpret how mutations in splice sites or promoters affect protein function, a common AP exam topic. Transcription and RNA processing also underpin all of gene regulation, the core of Unit 6, because many regulatory mechanisms act at the level of transcription initiation or alternative splicing to control when and where proteins are produced. This topic also connects directly to mutations and phenotypic variation, which are tested frequently in FRQs across all exam forms. Next topics you will apply these concepts to: