Data Representation & Data Types
A2 Level — Unit 4: Architecture, Data, Communication & Applications
Binary Arithmetic at A2 Level
Binary Addition
Binary addition follows the same rules as denary addition, using four basic rules:
| Operation | Result | Carry |
|---|---|---|
| 0 + 0 | 0 | 0 |
| 0 + 1 | 1 | 0 |
| 1 + 0 | 1 | 0 |
| 1 + 1 | 0 | 1 |
When adding 1 + 1 + 1 (from a carry), the result is 1 with a carry of 1.
Worked Example: Add 01101010 (106) and 00110101 (53)
1 1 1 1 1 0
0 1 1 0 1 0 1 0 (106)
+ 0 0 1 1 0 1 0 1 ( 53)
-------------------
1 0 0 1 1 1 1 1 (159)
Binary Subtraction Using Two’s Complement
To subtract binary number B from binary number A, we add A to the two’s complement of B. This avoids the need for a dedicated subtraction circuit.
Method: A - B = A + (two’s complement of B)
Worked Example: Calculate 10010110 (150) - 00110010 (50) using 8-bit unsigned values converted to two’s complement subtraction.
Step 1: Find two’s complement of 00110010
- Flip all bits: 11001101
- Add 1: 11001110
Step 2: Add the original number to the two’s complement
1 1 1 1 1 1 0
1 0 0 1 0 1 1 0 (150)
+ 1 1 0 0 1 1 1 0 (-50 in two's complement)
-------------------
1 0 1 1 0 0 1 0 0
Step 3: Discard the overflow (9th) bit. Result: 01100100 = 100.
150 - 50 = 100. Correct.
When performing binary subtraction, always use the two’s complement method: negate the number being subtracted (flip bits, add 1), then add. If there is a carry out of the most significant bit, discard it. Show every step of your working clearly in exam answers.
Two’s Complement Representation
Two’s complement is the standard method for representing signed (positive and negative) integers in binary. The most significant bit (MSB) acts as the sign bit: 0 for positive, 1 for negative. The key advantage is that addition and subtraction work identically for both positive and negative numbers without special handling.
Converting Positive Numbers
Positive numbers in two’s complement are identical to standard unsigned binary, with a leading 0 as the sign bit.
Example: +25 in 8-bit two’s complement = 00011001
Converting Negative Numbers
To convert a negative denary number to two’s complement:
- Write the positive version in binary
- Flip all bits (one’s complement)
- Add 1
Worked Example: Convert -25 to 8-bit two’s complement
- +25 = 00011001
- Flip: 11100110
- Add 1: 11100111
Converting Two’s Complement Back to Denary
For a negative two’s complement number (MSB = 1):
- Flip all bits
- Add 1
- The result is the magnitude; apply a negative sign
Worked Example: Convert 11010100 to denary
- Flip: 00101011
- Add 1: 00101100 = 44
- Answer: -44
Alternatively, you can calculate directly: the MSB has a place value of -128, and all other bits have their normal positive place values.
11010100 = -128 + 64 + 16 + 4 = -44
Range of Two’s Complement
For an n-bit two’s complement number:
- Minimum value: -2^(n-1)
- Maximum value: 2^(n-1) - 1
- Total values: 2^n
| Bits | Minimum | Maximum |
|---|---|---|
| 4 | -8 | +7 |
| 8 | -128 | +127 |
| 16 | -32,768 | +32,767 |
Two’s Complement Arithmetic
Worked Example: Calculate -35 + 20 using 8-bit two’s complement
-35 in two’s complement:
- 35 = 00100011
- Flip: 11011100
- Add 1: 11011101
20 in two’s complement: 00010100
1 1 1
1 1 0 1 1 1 0 1 (-35)
+ 0 0 0 1 0 1 0 0 (+20)
-------------------
1 1 1 1 0 0 0 1
Convert result 11110001 back: flip = 00001110, add 1 = 00001111 = 15, so answer = -15
Check: -35 + 20 = -15. Correct.
Sign and Magnitude Representation
Sign and magnitude is a simpler method for representing signed integers. The most significant bit represents the sign (0 = positive, 1 = negative) and the remaining bits represent the magnitude (absolute value) of the number.
Examples
- +25 in 8-bit sign and magnitude: 00011001 (sign bit 0, magnitude 25)
- -25 in 8-bit sign and magnitude: 10011001 (sign bit 1, magnitude 25)
Range of Sign and Magnitude
For an n-bit sign and magnitude number:
- Minimum value: -(2^(n-1) - 1)
- Maximum value: +(2^(n-1) - 1)
- Total unique values: 2^n - 1 (because zero has two representations: +0 and -0)
| Bits | Minimum | Maximum |
|---|---|---|
| 4 | -7 | +7 |
| 8 | -127 | +127 |
Comparison: Two’s Complement vs Sign and Magnitude
| Feature | Two’s Complement | Sign and Magnitude |
|---|---|---|
| Range (8-bit) | -128 to +127 | -127 to +127 |
| Representations of zero | One (00000000) | Two (+0 = 00000000, -0 = 10000000) |
| Arithmetic | Standard addition/subtraction works directly | Requires special handling for different signs |
| Hardware complexity | Simpler (single adder circuit) | More complex (must check signs, handle magnitude separately) |
| Industry use | Standard in virtually all modern computers | Rarely used for integers; used in some floating point mantissa representations |
Two’s complement is preferred over sign and magnitude because: (1) there is only one representation of zero, (2) the same addition circuit works for both positive and negative numbers, and (3) it has a slightly wider range (one extra negative value). These are the three key points examiners look for.
Floating Point Representation
Floating point representation stores real numbers (numbers with fractional parts) using two components: the mantissa (which holds the significant digits of the number) and the exponent (which determines the position of the binary point). This is analogous to scientific notation in denary.
A floating point number is calculated as:
Value = Mantissa x 2^Exponent
For example, using a 12-bit floating point with 8-bit mantissa and 4-bit exponent (both in two’s complement):
- The mantissa represents a fixed-point fractional value with the binary point immediately after the sign bit
- The exponent is an integer in two’s complement that shifts the binary point
Normalisation
Normalisation is the process of adjusting the mantissa and exponent so that the mantissa uses all available bits to store significant digits, thereby maximising the precision of the representation.
A normalised floating point number follows these rules:
- Positive numbers: the mantissa starts with 0.1 (the sign bit is 0, and the next bit is 1)
- Negative numbers: the mantissa starts with 1.0 (the sign bit is 1, and the next bit is 0)
In other words, the first two bits of the mantissa must be different.
Why normalise? An unnormalised mantissa has leading zeros (positive) or leading ones (negative) after the sign bit that waste bits without adding precision. Normalisation eliminates these redundant bits, giving the maximum number of significant digits for the available bit length.
Example of unnormalised vs normalised:
Unnormalised: mantissa = 0.0010110, exponent = 0101 Normalised: mantissa = 0.1011000, exponent = 0011
Both represent the same value, but the normalised form has more significant bits available.
Worked Example 1: Convert Denary to Floating Point
Convert 13.5 to a 12-bit floating point number with 8-bit mantissa and 4-bit exponent (both two’s complement).
Step 1: Convert 13.5 to fixed-point binary
- 13 = 1101
- 0.5 = .1
- 13.5 = 1101.1
Step 2: Write in binary with the point after the sign bit position
- 0.11011 x 2^4 (shift the point 4 places left to get 0.1 at the start)
Step 3: Encode
- Mantissa: 0.1101100 = 01101100 (padded with trailing zeros)
- Exponent: 4 = 0100
Result: 01101100 0100
Worked Example 2: Convert Negative Denary to Floating Point
Convert -5.75 to a 12-bit floating point number with 8-bit mantissa and 4-bit exponent.
Step 1: Convert +5.75 to binary
- 5 = 101
- 0.75 = .11
- 5.75 = 101.11
Step 2: In normalised positive form: 0.10111 x 2^3
Step 3: Negate the mantissa using two’s complement to represent -5.75
- Positive mantissa: 01011100
- Flip: 10100011
- Add 1: 10100100
Step 4: Encode exponent: 3 = 0011
Result: 10100100 0011
Verify: Mantissa starts with 1.0 (sign bit = 1, next bit = 0), so it is correctly normalised for a negative number.
Worked Example 3: Convert Floating Point to Denary
Convert mantissa = 01011000, exponent = 0011 to denary.
Step 1: Mantissa as fixed-point fraction = 0.1011000
Step 2: Exponent = 0011 = 3
Step 3: Shift the binary point 3 places to the right: 0101.1000
Step 4: Convert to denary: 4 + 1 + 0.5 = 5.5
Worked Example 4: Convert Negative Floating Point to Denary
Convert mantissa = 10110000, exponent = 0100 to denary.
Step 1: The mantissa is negative (MSB = 1). Find the magnitude using two’s complement.
- Flip: 01001111
- Add 1: 01010000
- So the magnitude mantissa = 0.1010000
Step 2: Exponent = 0100 = 4
Step 3: Shift point 4 places right: 01010.000 = 10.0 (magnitude)
Wait, let’s be more careful. The magnitude mantissa 0.1010000 with exponent 4 means shift 4 right: 01010.000 = 10.
Step 4: Apply the negative sign: -10
To check normalisation: the first two bits of the mantissa must differ. For positive numbers: 01…, for negative numbers: 10…. If you see 00… or 11… at the start of the mantissa, it is not normalised. When converting negative numbers, negate the mantissa using two’s complement after normalising the positive version.
Advantages and Disadvantages: Integer vs Floating Point
| Feature | Integer | Floating Point |
|---|---|---|
| Precision | Exact representation of whole numbers | May introduce rounding errors |
| Range | Limited to whole numbers within a fixed range | Can represent very large and very small numbers |
| Speed | Faster arithmetic operations | Slower due to mantissa/exponent handling |
| Storage | More efficient (every bit stores magnitude) | Less efficient (bits split between mantissa and exponent) |
| Use cases | Counting, indexing, loop counters, exact values | Scientific calculations, measurements, currency (with care), graphics |
Key trade-off: Floating point provides a much larger range at the cost of precision. Integer provides exact precision for whole numbers but cannot represent fractional values.
The precision of floating point depends on the number of bits allocated to the mantissa. More mantissa bits means more significant figures, but fewer exponent bits, reducing the range. This is a fundamental trade-off in floating point design.
Truncation and Rounding
When a number cannot be represented exactly in the available bits, it must be approximated. There are two main methods.
Truncation
Truncation simply removes (discards) the excess bits beyond the available precision. The remaining bits are left unchanged. This always rounds towards zero.
Example: Storing 0.1011011 in 5 fractional bits
- Truncated: 0.10110 (the last two bits are simply dropped)
- Actual value: 0.7109375
- Truncated value: 0.6875
- Error: 0.0234375
Truncation is simpler to implement but introduces a systematic bias towards zero, because it always makes positive numbers smaller and negative numbers larger (closer to zero).
Rounding
Rounding adjusts the value to the nearest representable number. If the discarded portion is greater than or equal to half the value of the last retained bit, the last retained bit is rounded up.
Example: Storing 0.1011011 in 5 fractional bits
- Look at the first discarded bit: 1 (which is >= half)
- Rounded: 0.10111 (the last retained bit is increased)
- Actual value: 0.7109375
- Rounded value: 0.71875
- Error: 0.0078125
Rounding produces smaller errors on average and does not introduce a systematic bias, but is more complex to implement.
Absolute and Relative Errors
| Absolute error is the magnitude of the difference between the true value and the approximated value: | true value - stored value | . |
Relative error is the absolute error expressed as a proportion of the true value: absolute error / |true value|. It is often expressed as a percentage.
Worked Example: A value of 7.3 is stored as 7 due to truncation.
-
Absolute error = 7.3 - 7 = 0.3 - Relative error = 0.3 / 7.3 = 0.0411 or 4.11%
Worked Example: A value of 150.8 is stored as 151 due to rounding.
-
Absolute error = 150.8 - 151 = 0.2 - Relative error = 0.2 / 150.8 = 0.00133 or 0.133%
Relative error is more meaningful than absolute error when comparing the significance of errors across different magnitudes. An absolute error of 0.5 is much more significant for the value 2 (25% relative error) than for the value 1000 (0.05% relative error).
Exam questions often ask you to compare truncation and rounding. Key points: truncation is simpler but introduces systematic bias; rounding is more accurate on average but more complex. Always calculate both absolute and relative error when asked, and show your formula and working.
Shift Functions
Shift operations move all bits in a binary number left or right by a specified number of positions. They are fundamental to efficient multiplication and division.
Logical Shift Left
Logical shift left moves all bits to the left by the specified number of positions. Vacated positions on the right are filled with 0s. Bits shifted out of the left end are lost. Each shift left multiplies the value by 2.
Example: Logical shift left by 2 on 00101100 (44)
Before: 0 0 1 0 1 1 0 0 (44)
After: 1 0 1 1 0 0 0 0 (176)
44 x 2^2 = 44 x 4 = 176. Correct.
Logical Shift Right
Logical shift right moves all bits to the right by the specified number of positions. Vacated positions on the left are filled with 0s. Bits shifted out of the right end are lost. Each shift right divides the value by 2 (integer division, truncating any remainder).
Example: Logical shift right by 1 on 00101100 (44)
Before: 0 0 1 0 1 1 0 0 (44)
After: 0 0 0 1 0 1 1 0 (22)
44 / 2 = 22. Correct.
Arithmetic Shift Left
Arithmetic shift left works identically to logical shift left: bits move left, vacated positions on the right are filled with 0s. The sign bit may change, potentially causing overflow.
For signed numbers, arithmetic shift left is the same operation as logical shift left, but the programmer must watch for overflow if the sign bit changes.
Arithmetic Shift Right
Arithmetic shift right moves all bits to the right by the specified number of positions. Vacated positions on the left are filled with a copy of the sign bit (the MSB). This preserves the sign of the number, making it suitable for dividing signed (two’s complement) numbers by powers of 2.
Example: Arithmetic shift right by 1 on 11010100 (-44 in two’s complement)
Before: 1 1 0 1 0 1 0 0 (-44)
After: 1 1 1 0 1 0 1 0 (-22)
The sign bit (1) is preserved and copied into the vacated position. -44 / 2 = -22. Correct.
Difference Between Logical and Arithmetic Shifts
| Feature | Logical Shift | Arithmetic Shift |
|---|---|---|
| Left shift fill | 0s on right | 0s on right (same as logical) |
| Right shift fill | 0s on left | Copy of sign bit on left |
| Treats number as | Unsigned | Signed (two’s complement) |
| Right shift of negative number | Produces incorrect result (becomes positive) | Preserves sign (correct division) |
| Use case | Unsigned binary manipulation, bit masking | Signed integer multiplication/division |
Applying Shifts for Multiplication and Division
Shifting is much faster than using the ALU’s multiply/divide circuits, so compilers often replace multiplication/division by powers of 2 with shift operations.
| Operation | Shift | Example |
|---|---|---|
| Multiply by 2 | Shift left by 1 | 5 « 1 = 10 |
| Multiply by 4 | Shift left by 2 | 5 « 2 = 20 |
| Multiply by 8 | Shift left by 3 | 5 « 3 = 40 |
| Multiply by 2^n | Shift left by n | General rule |
| Divide by 2 | Shift right by 1 | 20 » 1 = 10 |
| Divide by 4 | Shift right by 2 | 20 » 2 = 5 |
| Divide by 2^n | Shift right by n | General rule |
Code Examples
# Python shift operations
value = 12
# Shift left (multiply)
result = value << 1 # 12 * 2 = 24
print(f"{value} << 1 = {result}")
result = value << 3 # 12 * 8 = 96
print(f"{value} << 3 = {result}")
# Shift right (divide)
result = value >> 1 # 12 / 2 = 6
print(f"{value} >> 1 = {result}")
result = value >> 2 # 12 / 4 = 3
print(f"{value} >> 2 = {result}")
# Combining shifts for multiplication by non-powers of 2
# Multiply by 10 = multiply by 8 + multiply by 2
x = 7
result = (x << 3) + (x << 1) # (7*8) + (7*2) = 56 + 14 = 70
print(f"{x} * 10 = {result}")
' VB.NET shift operations
Dim value As Integer = 12
' Shift left (multiply)
Dim result As Integer = value << 1 ' 12 * 2 = 24
Console.WriteLine($"{value} << 1 = {result}")
result = value << 3 ' 12 * 8 = 96
Console.WriteLine($"{value} << 3 = {result}")
' Shift right (divide)
result = value >> 1 ' 12 / 2 = 6
Console.WriteLine($"{value} >> 1 = {result}")
result = value >> 2 ' 12 / 4 = 3
Console.WriteLine($"{value} >> 2 = {result}")
' Combining shifts for multiplication by non-powers of 2
' Multiply by 10 = multiply by 8 + multiply by 2
Dim x As Integer = 7
result = (x << 3) + (x << 1) ' (7*8) + (7*2) = 56 + 14 = 70
Console.WriteLine($"{x} * 10 = {result}")
When answering shift questions, always state whether the shift is logical or arithmetic and what fills the vacated bits. For multiplication, shift LEFT; for division, shift RIGHT. Remember that shifting right by n is integer division: any fractional part is lost. For example, 7 » 1 = 3 (not 3.5). To multiply by a number that is not a power of 2, combine shifts and additions (e.g. multiply by 6 = shift left 2 then add shift left 1, since 6 = 4 + 2).
Overflow and Underflow
Overflow
Overflow occurs when the result of an arithmetic operation is too large (positive) or too small (negative) to be represented in the available number of bits.
Causes of overflow:
- Adding two large positive numbers whose sum exceeds the maximum representable value
- Adding two large negative numbers whose sum is below the minimum representable value
- Shifting bits left so that significant bits are lost from the MSB
- In floating point: when the exponent exceeds its maximum positive value
Example (8-bit two’s complement, range -128 to +127):
0 1 1 1 1 1 1 1 (+127)
+ 0 0 0 0 0 0 0 1 (+1)
-------------------
1 0 0 0 0 0 0 0 (-128 in two's complement!)
Adding 127 + 1 should give 128, but this cannot be stored in 8-bit two’s complement (max +127). The result wraps around to -128, which is incorrect.
Detecting overflow in two’s complement: Overflow has occurred if:
- Two positive numbers are added and the result is negative, OR
- Two negative numbers are added and the result is positive
Overflow cannot occur when adding a positive and a negative number.
Underflow
Underflow occurs in floating point arithmetic when a number is too close to zero to be represented. The exponent becomes more negative than the minimum allowed value, and the number collapses to zero. This is different from overflow, which involves numbers that are too large.
Causes of underflow:
- Dividing a very small floating point number by a very large number
- Multiplying two very small floating point numbers
- Subtracting two nearly equal floating point numbers
- When the exponent goes below its minimum negative value
Example: With a 4-bit two’s complement exponent (range -8 to +7), if a calculation requires an exponent of -9, underflow occurs because -9 is outside the representable range. The number is too small to store and is typically rounded to zero.
Consequences of Overflow and Underflow
| Issue | Consequence |
|---|---|
| Overflow | Result wraps around to an incorrect value (e.g. large positive becomes negative); silent errors in programs that do not check; potential security vulnerabilities (e.g. buffer overflow exploits) |
| Underflow | Small values collapse to zero; loss of precision in calculations; division by zero errors if the underflowed value is subsequently used as a divisor |
Detecting and Handling in Code
# Python handles arbitrary precision integers natively,
# so integer overflow does not occur in Python.
# However, we can demonstrate the concept:
import sys
# Python integers have no fixed size, but we can simulate 8-bit
max_8bit = 127
result = max_8bit + 1
print(f"In Python: 127 + 1 = {result}") # 128 (no overflow)
# For floating point, Python can detect overflow:
try:
huge = 1.7976931348623157e+308 * 2
print(f"Result: {huge}") # Will print 'inf'
except OverflowError:
print("Floating point overflow detected")
# Underflow example
tiny = 5e-324 # Smallest positive float
result = tiny / 2
print(f"Underflow: {tiny} / 2 = {result}") # Prints 0.0
' VB.NET overflow detection
Module OverflowDemo
Sub Main()
' Integer overflow detection
Try
Dim max As Integer = Integer.MaxValue ' 2,147,483,647
Dim result As Integer = checked(max + 1)
' The checked keyword forces overflow checking
Catch ex As OverflowException
Console.WriteLine("Integer overflow detected!")
End Try
' Floating point overflow
Dim huge As Double = Double.MaxValue * 2
If Double.IsInfinity(huge) Then
Console.WriteLine("Floating point overflow: result is infinity")
End If
' Floating point underflow
Dim tiny As Double = Double.Epsilon / 2
Console.WriteLine($"Underflow result: {tiny}") ' Prints 0
End Sub
End Module
Overflow and underflow are distinct concepts. Overflow = result too large to represent (applies to both integers and floating point). Underflow = result too close to zero to represent (applies only to floating point). For integer overflow detection in two’s complement, check whether the signs of the inputs and the sign of the result are inconsistent. In exam answers, always give a specific numerical example to demonstrate the error.