A-Level · cie-9618 · A-Level Computer Science · Data Representation · 18 min read · Updated 2026-05-06

Data Representation — A-Level Computer Science Study Guide

For: Candidates sitting Cambridge International A-Level Computer Science (9618).

Covers: Number bases (binary, decimal, hexadecimal), two's complement for signed integers, floating-point representation, character encoding (ASCII, Unicode), and bitmap/vector image + sound encoding rules and calculations.

You should already know: Basic programming concepts; one of Python / Java / VB.

A note on the practice questions: All worked questions in the "Practice Questions" section below are original problems written by us in the A-Level Computer Science style for educational use. They are not reproductions of past Cambridge International examination papers and may differ in wording, numerical values, or context. Use them to practise the technique; cross-check with official Cambridge mark schemes for grading conventions.

1. What Is Data Representation?

Data Representation is the set of standardised, hardware-compatible formats used to store all types of information (numbers, text, images, audio) as binary digits (bits, 0s and 1s) in computer memory. Because digital processors only recognise binary signals, every piece of input to a computer must be converted to binary before processing, and converted back to human-readable form for output. This is a core Paper 1 topic for A-Level Computer Science, tested through short-answer and structured questions, and it accounts for roughly 5-8% of total exam marks on average.

2. Number bases — binary, decimal, hex

All number systems use place values equal to powers of their base, and only use digits smaller than the base value:

Decimal (base 10): The human-readable standard, uses digits 0-9, place values are powers of 10.
Binary (base 2): The native format for computers, uses only digits 0 and 1, place values are powers of 2.
Hexadecimal (base 16): A shorthand for binary used for memory addresses, colour codes, and debugging, uses digits 0-9 and A-F (where A=10, B=11, ..., F=15), place values are powers of 16. Each hex digit maps directly to 4 binary bits, making it far more compact than binary for human use.

Worked Example

Convert decimal 142 to 8-bit binary: Divide by 2 repeatedly and collect remainders from last to first: $142_{10} = 10001110_{2}$ (the result is already 8 bits long, so no leading zeros are needed).
Convert the 8-bit binary $10001110_{2}$ to hex: Group bits in sets of 4 from the right: $1000_{2} = 8_{16}$, $1110_{2} = E_{16}$, so the total is $8E_{16}$.
Convert hex $A3F_{16}$ to decimal: $10 \times 16^{2} + 3 \times 16^{1} + 15 \times 16^{0} = 2560 + 48 + 15 = 2623_{10}$.

Exam tip: Examiners often specify a fixed bit length for binary answers, so always pad leading zeros to meet the required length before converting to hex or performing two's complement calculations.
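
The three conversions above can be sketched in Python as a way of checking hand working. This is a minimal illustration, not an exam technique: the function names `to_binary`, `binary_to_hex` and `hex_to_decimal` are made up for this example, and the default width of 8 bits is an assumption.

```python
HEX_DIGITS = "0123456789ABCDEF"


def to_binary(value: int, bits: int = 8) -> str:
    """Convert a non-negative integer to a fixed-width binary string
    using repeated division by 2."""
    if value < 0 or value >= 2 ** bits:
        raise ValueError("value does not fit in the requested number of bits")
    remainders = []
    for _ in range(bits):
        remainders.append(str(value % 2))  # each remainder is the next bit, LSB first
        value //= 2
    return "".join(reversed(remainders))   # read the remainders from last to first


def binary_to_hex(bit_string: str) -> str:
    """Group bits in fours from the right and map each group to one hex digit."""
    padded_length = -(-len(bit_string) // 4) * 4   # round up to a multiple of 4
    padded = bit_string.zfill(padded_length)       # pad leading zeros on the left
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


def hex_to_decimal(hex_string: str) -> int:
    """Sum digit * 16**position, with position 0 at the rightmost digit."""
    total = 0
    for position, digit in enumerate(reversed(hex_string.upper())):
        total += HEX_DIGITS.index(digit) * 16 ** position
    return total


print(to_binary(142))             # 10001110
print(binary_to_hex("10001110"))  # 8E
print(hex_to_decimal("A3F"))      # 2623
```

Python's built-in `format(142, "08b")`, `format(142, "X")` and `int("A3F", 16)` give the same answers in one call; the longer versions are written out only to mirror the manual method.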

3. Two's complement for signed integers

Standard unsigned binary can only represent zero and positive integers. Two's complement is the standard method for storing signed (positive and negative) integers because the same addition circuit works for both positive and negative values, so subtraction can be carried out as addition of a negative number, simplifying processor design. Key rules for n-bit two's complement:

The leftmost bit is the sign bit: 0 = positive number, 1 = negative number.
The range of values is $-2^{n-1}$ to $2^{n-1} - 1$ (for 8-bit, this is -128 to +127).
To get the two's complement of a negative number: Write the positive equivalent as n-bit binary, flip all bits, then add 1.

Worked Example

Find the 8-bit two's complement representation of -47:

Write +47 as 8-bit binary: $00101111_{2}$
Flip all bits: $11010000_{2}$
Add 1: $11010001_{2}$
Verify: The unsigned value of $11010001_{2}$ is 209, so $209 - 2^{8} = 209 - 256 = -47$, which is correct.

You can also perform arithmetic directly on two's complement values: Adding $00100000_{2}$ (+32) to $11010001_{2}$ (-47) gives $11110001_{2}$, which converts to $241 - 256 = -15$, the correct result of 32 - 47.
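
The flip-and-add-1 method, the "subtract $2^{n}$" check and the direct addition can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and bit patterns are held as strings so they can be compared with hand working.

```python
def to_twos_complement(value: int, bits: int = 8) -> str:
    """Return the n-bit two's complement pattern for a signed integer."""
    if not -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1:
        raise ValueError("value is outside the n-bit two's complement range")
    if value >= 0:
        return format(value, f"0{bits}b")
    positive = format(-value, f"0{bits}b")                        # positive equivalent
    flipped = "".join("1" if b == "0" else "0" for b in positive)  # flip all bits
    return format(int(flipped, 2) + 1, f"0{bits}b")               # add 1


def from_twos_complement(bit_string: str) -> int:
    """If the sign bit is 1, subtract 2**n from the unsigned value."""
    unsigned = int(bit_string, 2)
    if bit_string[0] == "1":
        return unsigned - 2 ** len(bit_string)
    return unsigned


def add_twos_complement(a: str, b: str) -> str:
    """Add two equal-width patterns, discarding any carry out of the top bit."""
    bits = len(a)
    total = (int(a, 2) + int(b, 2)) % 2 ** bits
    return format(total, f"0{bits}b")


print(to_twos_complement(-47))                      # 11010001
print(from_twos_complement("11010001"))             # -47
result = add_twos_complement("00100000", "11010001")
print(result, from_twos_complement(result))         # 11110001 -15
```

Note that `add_twos_complement` does not detect overflow: adding two values whose true sum lies outside the n-bit range will silently give a wrong-looking result.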

4. Floating-point representation

Integers cannot represent fractions, very large numbers, or very small numbers efficiently. Floating-point representation is analogous to scientific notation, and stores values as a combination of a mantissa (significand) and exponent, both stored as two's complement values. Key rules for A-Level Computer Science floating-point systems:

Value = $mantissa \times 2^{exponent}$
Normalisation is required to maximise precision and ensure every value has a unique representation: For positive mantissas, the first two bits must be 01; for negative mantissas, the first two bits must be 10.
When shifting the mantissa to normalise, adjust the exponent to compensate: A left shift of k bits reduces the exponent by k, a right shift of k bits increases the exponent by k.
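
A short derivation may help show why the exponent moves in the opposite direction to the shift. The symbols $m$ (mantissa value), $e$ (exponent) and $k$ (number of places shifted) are shorthand introduced for this sketch rather than syllabus notation.

```latex
m \times 2^{e} = \left(m \times 2^{k}\right) \times 2^{e-k}
```

Shifting the mantissa left by $k$ places multiplies it by $2^{k}$, so the exponent must drop by $k$ to leave the overall value unchanged. For example, $0.001_{2} \times 2^{4} = 0.125 \times 16 = 2$, and after a left shift of 2 places, $0.100_{2} \times 2^{2} = 0.5 \times 4 = 2$.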

Worked Example

A 16-bit floating point system uses a 10-bit two's complement mantissa and 6-bit two's complement exponent, normalised.

Calculate the value of mantissa = $0110100000_{2}$, exponent = $000101_{2}$:

The binary point sits immediately after the first (sign) bit, so the mantissa is a fractional value: $0.1101_{2} = 0.5 + 0.25 + 0.0625 = 0.8125_{10}$
The exponent is $000101_{2} = 5_{10}$
Total value = $0.8125 \times 2^{5} = 0.8125 \times 32 = 26_{10}$

Normalise an unnormalised mantissa $0001101000_{2}$ with exponent $000111_{2}$:

Shift the mantissa left 2 times to get $0110100000_{2}$ (meets the positive normalisation rule)
Subtract 2 from the exponent: $000111_{2} - 000010_{2} = 000101_{2}$
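
Both parts of this worked example can be checked with a short Python sketch. It assumes the format described above (two's complement mantissa with the binary point after the first bit, two's complement exponent); the function names are hypothetical, and `normalise` only handles the left-shift case shown here.

```python
def signed_value(bit_string: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    unsigned = int(bit_string, 2)
    if bit_string[0] == "1":
        return unsigned - 2 ** len(bit_string)
    return unsigned


def float_value(mantissa: str, exponent: str) -> float:
    """Value = mantissa * 2**exponent, binary point after the first mantissa bit."""
    m = signed_value(mantissa) / 2 ** (len(mantissa) - 1)
    e = signed_value(exponent)
    return m * 2 ** e


def normalise(mantissa: str, exponent: str) -> tuple[str, str]:
    """Shift the mantissa left until its first two bits differ (01 or 10),
    subtracting 1 from the exponent for every shift."""
    if "1" not in mantissa:
        return mantissa, exponent          # zero has no normalised form
    e = signed_value(exponent)
    while mantissa[0] == mantissa[1]:
        mantissa = mantissa[1:] + "0"      # left shift by one place
        e -= 1
    width = len(exponent)
    if e < -(2 ** (width - 1)):
        raise OverflowError("exponent underflow: too small for the exponent field")
    return mantissa, format(e % 2 ** width, f"0{width}b")


print(float_value("0110100000", "000101"))   # 26.0
print(normalise("0001101000", "000111"))     # ('0110100000', '000101')
print(float_value("0001101000", "000111"))   # 26.0 (same value before normalising)
```

The last line confirms that normalising changes the bit pattern but not the value stored.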

5. Character encoding — ASCII, Unicode

Characters (letters, numbers, symbols, emojis) are stored as binary values via standardised encoding mappings that assign a unique binary code to each character.

ASCII (American Standard Code for Information Interchange): The original character encoding standard, uses 7 bits to store 128 characters, including upper and lowercase English letters, digits, punctuation, and control characters (e.g. line break). Extended ASCII uses 8 bits to store 256 characters, adding accented letters for Western European languages. Limitation: Cannot support non-Latin scripts like Chinese, Arabic, or Cyrillic, or emojis.
Unicode: A universal encoding standard designed to represent every written language in the world, plus symbols and emojis. Common Unicode encodings include:
UTF-8: Uses 1-4 bytes per character, backward compatible with ASCII, and is the dominant encoding for the web.
UTF-16: Uses 2 or 4 bytes per character, common for operating system internal text storage.

Worked Example

The ASCII code for uppercase 'A' is $65_{10} = 01000001_{2}$, and lowercase 'a' is $97_{10} = 01100001_{2}$ (the two codes differ only in the bit with place value 32, the sixth bit from the right).
The Unicode code point for the euro symbol € is U+20AC, stored as the three bytes E2 82 AC in UTF-8 and as the single 16-bit unit 20AC in UTF-16.

Exam tip: Examiners frequently ask for comparisons of ASCII and Unicode. Always reference use cases: ASCII is sufficient for simple English-only text and uses less storage, while Unicode is required for multilingual applications.
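
The character codes in the worked example can be inspected in Python using the built-in `ord`, `chr` and `str.encode`. The helper names `toggle_case_bit` and `utf8_length` are made up for this sketch, and non-ASCII characters are written as escape sequences so the code runs regardless of the editor's encoding.

```python
def toggle_case_bit(letter: str) -> str:
    """Flip the bit with place value 32, which switches an ASCII letter's case."""
    return chr(ord(letter) ^ 0b00100000)


def utf8_length(character: str) -> int:
    """Number of bytes UTF-8 uses to store one character."""
    return len(character.encode("utf-8"))


print(ord("A"), format(ord("A"), "08b"))   # 65 01000001
print(ord("a"), format(ord("a"), "08b"))   # 97 01100001
print(toggle_case_bit("A"))                # a

euro = "\u20ac"                                  # the euro sign, U+20AC
print(format(ord(euro), "04X"))                  # 20AC
print(euro.encode("utf-8").hex().upper())        # E282AC
print(euro.encode("utf-16-be").hex().upper())    # 20AC

# UTF-8 is variable length: 1 byte for ASCII, up to 4 bytes for other characters.
samples = {"A": "A", "e-acute": "\u00e9", "euro": "\u20ac", "emoji": "\U0001F600"}
for name, character in samples.items():
    print(name, utf8_length(character))          # 1, 2, 3 and 4 bytes respectively
```

The `utf-16-be` codec is used so that no byte order mark is added; plain `utf-16` would prefix two extra bytes.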

6. Bitmap vs vector images, sound encoding

Multimedia content (images and audio) follows specific encoding rules to balance quality and file size.

Image Encoding

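Bitmap (raster) images: Stored as a grid of individual pixels, each with a colour value defined by the colour depth (number of bits per pixel, e.g. 24-bit = 16.7 million colours). Image resolution is the number of pixels that make up the image (width × height); pixels per inch describes how densely those pixels are displayed or printed.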
File size (bytes) = $\frac{width (pixels) \times height (pixels) \times colour depth}{8}$
Pros: Supports photorealistic detail, standard for photos.
Cons: Pixelates when scaled up, large file sizes for high-resolution content.
Vector images: Stored as mathematical definitions of shapes, lines, curves, and fill colours, with coordinate references.
Pros: Infinitely scalable without quality loss, very small file sizes for simple graphics.
Cons: Cannot replicate photorealistic detail, requires specialised software to edit.
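
The bitmap file size formula can be sketched in Python as below. This is a rough estimate under stated assumptions: the image is uncompressed and any file header or metadata is ignored. The function names and the 800×600 example are illustrative only.

```python
def bitmap_file_size_bytes(width: int, height: int, colour_depth_bits: int) -> int:
    """Uncompressed pixel data size: width * height * colour depth / 8,
    rounded up to a whole number of bytes."""
    total_bits = width * height * colour_depth_bits
    return -(-total_bits // 8)   # ceiling division by 8


def colours_available(colour_depth_bits: int) -> int:
    """Each extra bit of colour depth doubles the number of possible colours."""
    return 2 ** colour_depth_bits


print(bitmap_file_size_bytes(800, 600, 8))   # 480000
print(colours_available(8))                  # 256
print(colours_available(24))                 # 16777216
```

Real image files are usually a little larger than this figure because of headers, or much smaller when compression is applied.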

Sound Encoding

Analog sound waves are converted to digital format via sampling:

Sample rate: Number of samples of the wave taken per second, measured in Hz (standard CD quality is 44.1 kHz = 44100 samples per second).
Bit depth: Number of bits used to store each sample (standard CD quality is 16-bit).
File size (bytes) = $\frac{sample rate \times bit depth \times number of channels \times duration (seconds)}{8}$

Worked Example

File size of a 1920×1080 24-bit bitmap: $\frac{1920 \times 1080 \times 24}{8} = 6,220,800$ bytes ≈ 6.22 MB (taking 1 MB = 1,000,000 bytes).
File size of a 3-minute stereo (2-channel) audio track with 44.1 kHz sample rate and 16-bit depth: $\frac{44100 \times 16 \times 2 \times 180}{8} = 31,752,000$ bytes ≈ 31.75 MB (taking 1 MB = 1,000,000 bytes).
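
The audio calculation can be checked with a small Python sketch. It assumes uncompressed audio with no file header, and the function name and parameter names are chosen for this example only. Labelling every quantity as a named argument also helps avoid mixing up the values.

```python
def audio_file_size_bytes(sample_rate_hz: int, bit_depth: int,
                          channels: int, duration_seconds: int) -> int:
    """Uncompressed size: sample rate * bit depth * channels * duration / 8."""
    total_bits = sample_rate_hz * bit_depth * channels * duration_seconds
    return total_bits // 8


size = audio_file_size_bytes(
    sample_rate_hz=44_100,   # 44.1 kHz must be converted to Hz first
    bit_depth=16,
    channels=2,              # stereo
    duration_seconds=3 * 60, # 3 minutes converted to seconds
)
print(size)                                # 31752000
print(f"{size / 1_000_000:.2f} MB")        # 31.75 MB (1 MB = 1,000,000 bytes)
print(f"{size / (1024 * 1024):.2f} MB")    # smaller figure if 1 MB = 1024 * 1024 bytes
```

The last two lines show why it matters which definition of a megabyte the question specifies.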

7. Common Pitfalls (and how to avoid them)

Wrong move: Forgetting to pad leading zeros when converting binary to hex, leading to incorrect grouping of 4 bits. Why: Students often group bits from the left instead of the right. Correct move: Always group bits from the least significant (rightmost) bit first, pad leading zeros to the left to make the total number of bits a multiple of 4.
Wrong move: Calculating two's complement of a negative number by only flipping bits and forgetting to add 1. Why: Confusion between one's complement and two's complement rules. Correct move: After flipping all bits of the positive equivalent, always add 1, then verify by converting back to decimal to check your result.
Wrong move: Shifting the mantissa during normalisation but forgetting to adjust the exponent. Why: Students focus only on meeting the normalisation format for the mantissa and ignore the scaling factor. Correct move: Every left shift of the mantissa subtracts 1 from the exponent, every right shift adds 1 to the exponent.
Wrong move: Confusing sample rate and bit depth when calculating audio file size. Why: Both are plain numbers that are often listed close together in the question text, so they are easy to swap. Correct move: Label each value in your working explicitly, then apply the formula step by step to avoid mixing up variables.
Wrong move: Stating that Unicode uses 4 bytes for all characters. Why: Misunderstanding of variable-length Unicode encodings. Correct move: Specify that UTF-8 uses 1 byte for ASCII characters and 2-4 bytes for other characters, so Unicode character size varies by encoding, it is not fixed at 4 bytes.

8. Practice Questions (A-Level Computer Science Style)

Question 1

(a) Convert the decimal number 217 to 8-bit binary, then to hexadecimal. (2 marks)
(b) Represent the decimal number -72 as an 8-bit two's complement integer. (2 marks)

Solution

(a) Divide 217 by 2 repeatedly, collect remainders from last to first: $11011001_{2}$. Group into 4 bits: $1101_{2} = D_{16}$, $1001_{2} = 9_{16}$, so the hex value is $D9_{16}$. (1 mark for correct binary, 1 mark for correct hex)
(b) Write +72 as 8-bit binary: $01001000_{2}$. Flip all bits: $10110111_{2}$. Add 1: $10111000_{2}$. Verify: $184 - 256 = -72$, correct. (2 marks for correct final value, 1 mark for valid working if final answer is wrong)

Question 2

A 16-bit floating point system uses a 10-bit two's complement mantissa and 6-bit two's complement exponent, normalised.
(a) Calculate the decimal value of the following floating point number: Mantissa = $0101100000_{2}$, Exponent = $000011_{2}$. (3 marks)
(b) Normalise the following unnormalised floating point number: Mantissa = $0000110100_{2}$, Exponent = $000110_{2}$. Give the new mantissa and exponent. (2 marks)

Solution

(a) Mantissa value = $0.1011_{2} = 0.5 + 0.125 + 0.0625 = 0.6875_{10}$. Exponent value = $000011_{2} = 3_{10}$. Total value = $0.6875 \times 2^{3} = 5.5_{10}$. (1 mark for correct mantissa value, 1 mark for correct exponent value, 1 mark for final answer)
(b) Shift mantissa left 3 times to meet normalisation rule: $0110100000_{2}$. Subtract 3 from exponent: $000110_{2} - 000011_{2} = 000011_{2}$. New mantissa: $0110100000_{2}$, new exponent: $000011_{2}$. (1 mark for correct mantissa, 1 mark for correct exponent)

Question 3

(a) Calculate the file size of a 1280×720 16-bit colour bitmap image. Give your answer in megabytes to 2 decimal places, using $1\ MB = 1024 \times 1024$ bytes. (3 marks)
(b) State two advantages of vector images over bitmap images for logo design. (2 marks)

Solution

(a) File size in bytes = $\frac{1280 \times 720 \times 16}{8} = 1,843,200$ bytes. Convert to MB: $\frac{1,843,200}{1024 \times 1024} \approx 1.76$ MB. (1 mark for correct formula application, 1 mark for correct byte value, 1 mark for correct MB conversion)
(b) Any two valid advantages: 1) Vector logos can be scaled to any size (e.g. billboards, business cards) without pixelation or quality loss. 2) Vector logos have smaller file sizes for simple designs, making them faster to load on websites. 3) Vector logos are easier to edit (e.g. change colour, adjust shape) without quality loss. (1 mark per valid advantage, max 2 marks)

9. Quick Reference Cheatsheet

Category	Key Rules & Formulas
Number Bases	1. Decimal → Binary: Divide by 2, collect remainders from last to first. 2. Binary ↔ Hex: Group 4 bits from right, map each group to 0-F. 3. Hex → Decimal: $\sum (digit \times 16^{position})$, position starts at 0 from the right.
Two's Complement	1. n-bit range: $-2^{n-1}$ to $2^{n-1} - 1$. 2. Negative value: Flip all bits of positive equivalent, add 1. 3. Value of n-bit signed number: If sign bit = 1, subtract $2^{n}$ from unsigned value of bits.
Floating Point	1. Value = $mantissa \times 2^{exponent}$. 2. Normalisation rule: Positive mantissa starts with 01, negative mantissa starts with 10. 3. Left shift mantissa by k: subtract k from exponent; right shift by k: add k to exponent.
Character Encoding	1. ASCII: 7-bit (128 chars), 8-bit extended (256 chars), only Latin script. 2. Unicode: Supports all global languages; UTF-8 (1-4 bytes, web standard, ASCII compatible), UTF-16 (2/4 bytes).
Media Representation	1. Bitmap file size (bytes): $\frac{width \times height \times colour depth}{8}$ . 2. Audio file size (bytes): $\frac{sample rate \times bit depth \times channels \times duration (s)}{8}$ . 3. Vector: Math-based, scalable, small for simple graphics; Bitmap: Pixel-based, photorealistic, scales poorly.

10. What's Next

Data Representation is a foundational topic that underpins almost every other unit in the A-Level Computer Science syllabus. You will use number base and two's complement knowledge when learning processor architecture and low-level assembly programming, floating point concepts when studying data types in high-level programming and algorithm efficiency, media encoding when working with file handling and compression algorithms, and character encoding when building web or database applications that handle multilingual text. A strong grasp of this topic will also make it much easier to answer practical programming questions related to data manipulation and binary file operations, which are common in Paper 2 practical assessments.

If you have any questions about specific conversion steps, normalisation rules, or exam mark scheme conventions, you can ask Ollie, our AI tutor, at any time for personalised explanations and extra practice questions tailored to your weak spots. You can also find more A-Level Computer Science study materials and past paper practice on the homepage to test your understanding ahead of your exam.

Aligned with the Cambridge International AS & A Level Computer Science 9618 syllabus. OwlsAi is not affiliated with Cambridge Assessment International Education.
