Algorithms & Programs
AS Level — Unit 1: Fundamentals of Computer Science
What Is an Algorithm?
An algorithm is a finite, ordered sequence of unambiguous instructions that, given a set of inputs, produces a defined set of outputs and terminates in a finite amount of time.
An algorithm is essentially a step-by-step recipe for solving a problem. Before any code is written, a programmer designs an algorithm to describe the logic of the solution. Algorithms are language-independent – the same algorithm can be implemented in Python, VB.NET, Java, or any other language.
Characteristics of an Algorithm
- Input – An algorithm has zero or more inputs taken from a specified set.
- Output – An algorithm produces one or more outputs that have a defined relationship to the inputs.
- Finiteness – An algorithm must always terminate after a finite number of steps.
- Definiteness – Each step must be precisely and unambiguously defined.
- Effectiveness – Every step must be basic enough that it could, in principle, be carried out by a person using pencil and paper.
Why Algorithms Matter
- They allow us to plan a solution before writing code.
- They help us communicate solutions to other programmers regardless of programming language.
- They allow us to analyse the efficiency of a solution before implementing it.
- They form the basis for correctness proofs and testing strategies.
Representing Algorithms
Before implementing an algorithm in code, we represent it using one of three standard methods.
Pseudocode
Pseudocode is a structured, informal way of writing algorithms that uses natural language mixed with programming-like constructs. It is not tied to any specific programming language and is used to plan and communicate algorithms clearly.
Why Use Pseudocode?
- It allows you to focus on the logic of an algorithm without worrying about syntax.
- It is language-independent, so anyone can understand it regardless of their programming background.
- It is quicker to write than actual code during the design phase.
- It can be easily translated into any programming language.
Common Pseudocode Conventions
Variable assignment:
SET total TO 0
SET name TO "Alice"
Input and output:
INPUT age
OUTPUT "Your age is: ", age
Selection (IF/ELSE):
IF age >= 18 THEN
OUTPUT "Adult"
ELSE IF age >= 13 THEN
OUTPUT "Teenager"
ELSE
OUTPUT "Child"
END IF
Iteration (FOR loop):
FOR i FROM 1 TO 10
OUTPUT i
END FOR
Iteration (WHILE loop):
SET password TO ""
WHILE password != "secret"
INPUT password
END WHILE
Iteration (REPEAT-UNTIL loop):
REPEAT
INPUT number
UNTIL number >= 0
Subroutines:
FUNCTION calculateArea(length, width)
RETURN length * width
END FUNCTION
PROCEDURE displayMessage(message)
OUTPUT message
END PROCEDURE
Arrays:
SET marks TO [45, 67, 89, 32, 78]
OUTPUT marks[0]
SET marks[2] TO 90
Key Principles of Good Pseudocode
- Use consistent indentation to show the structure of loops and selections.
- Use meaningful variable names (e.g.
totalScorenotx). - Use capital letters for keywords (SET, IF, THEN, ELSE, WHILE, FOR, REPEAT, UNTIL, INPUT, OUTPUT, RETURN, FUNCTION, PROCEDURE, END).
- Keep it simple and readable – anyone should be able to follow the logic.
- Do not include language-specific syntax (e.g. do not use
:or{}orDim).
In the WJEC exam, pseudocode does not need to follow an exact standard, but it must be clear and unambiguous. Always use indentation and keywords consistently. If the examiner cannot follow your logic, you will lose marks. Write pseudocode as if someone else needs to implement it in any language.
Flowcharts
A flowchart is a diagrammatic representation of an algorithm. It uses standard symbols connected by arrows to show the flow of control through a program.
Standard Flowchart Symbols
| Symbol | Shape | Purpose |
|---|---|---|
| Terminator | Rounded rectangle (oval) | Marks the start or end of the flowchart. Labelled “Start” or “End” (or “Stop”). |
| Process | Rectangle | Represents a step or action (e.g. “Set total to 0”, “Calculate tax”). |
| Input/Output | Parallelogram | Represents input from the user or output to the screen (e.g. “Input name”, “Display result”). |
| Decision | Diamond | Represents a decision or condition with two outcomes: Yes/No or True/False. Must have exactly two exit paths. |
| Flow line | Arrow | Shows the direction of flow from one symbol to the next. |
| Subroutine | Rectangle with double vertical lines on sides | Represents a call to a pre-defined subroutine (function or procedure). |
Rules for Drawing Flowcharts
- Every flowchart must have exactly one Start terminator.
- Every flowchart must have at least one End terminator.
- Flow lines must have arrows showing direction.
- Decision diamonds must have exactly two labelled exit paths (Yes/No or True/False).
- All symbols must be connected – there should be no orphaned symbols.
- Flow should generally go from top to bottom and left to right.
Example: Check if a Number is Positive, Negative, or Zero
flowchart TD
A([Start]) --> B[/Input number/]
B --> C{"number > 0?"}
C -- Yes --> D[/Output 'Positive'/]
C -- No --> E{"number < 0?"}
E -- Yes --> F[/Output 'Negative'/]
E -- No --> G[/Output 'Zero'/]
D --> H([End])
F --> H
G --> H
Example: Validate Password Entry (Loop)
flowchart TD
A([Start]) --> B[/Input password/]
B --> C{"password = 'abc123'?"}
C -- Yes --> D[/Output 'Access Granted'/]
D --> E([End])
C -- No --> F[/Output 'Incorrect, try again'/]
F --> B
Example: Calculate Sum of Numbers 1 to 10
flowchart TD
A([Start]) --> B[Set total = 0]
B --> C[Set i = 1]
C --> D{"i <= 10?"}
D -- No --> E[/Output total/]
E --> F([End])
D -- Yes --> G[total = total + i]
G --> H[i = i + 1]
H --> D
A flowchart is a visual representation of an algorithm using standard symbols (terminators, processes, decisions, input/output) connected by arrows to show the flow of control.
In the exam, always use the correct shapes for each symbol. The most common mistake is using a rectangle for a decision instead of a diamond, or forgetting to label the Yes/No paths on a decision. Practice drawing flowcharts neatly by hand. Every decision must have exactly two clearly labelled exits.
Structured English
Structured English uses a restricted subset of natural language combined with programming constructs. It avoids technical jargon and is useful for communicating with non-technical stakeholders.
Example:
1. Start with the first item in the list.
2. Compare it with every other item.
3. If a larger item is found, remember it as the new largest.
4. After checking all items, display the largest item.
In the exam, you may be asked to write an algorithm in pseudocode or to trace through a flowchart. Make sure you are comfortable with all three representations and can convert between them.
Constants and Their Importance
A constant is a named value that is set once and cannot be changed during program execution. Constants are declared in a similar way to variables but their value is fixed.
# Constants (Python convention: use UPPER_CASE names)
VAT_RATE = 0.20
MAX_STUDENTS = 30
SCHOOL_NAME = "Greenfield Academy"
PI = 3.14159265
def calculate_price(base_price):
return base_price * (1 + VAT_RATE)
' Constants (VB.NET uses the Const keyword)
Const VAT_RATE As Double = 0.2
Const MAX_STUDENTS As Integer = 30
Const SCHOOL_NAME As String = "Greenfield Academy"
Const PI As Double = 3.14159265
Function CalculatePrice(basePrice As Double) As Double
Return basePrice * (1 + VAT_RATE)
End Function
Why Use Constants?
| Reason | Explanation |
|---|---|
| Readability | VAT_RATE is more meaningful than 0.20 scattered through the code |
| Maintainability | If the VAT rate changes, you only need to update it in one place |
| Prevents accidental modification | The compiler/interpreter will raise an error if you try to change a constant |
| Self-documenting | The constant name explains what the value represents |
| Reduces errors | Avoids mistyping a value in multiple places (e.g. typing 0.02 instead of 0.20) |
A constant is a named data item whose value is fixed at the time of declaration and cannot be changed during program execution. Constants improve readability, maintainability, and reduce errors.
Naming Conventions and Self-Documenting Code
Naming Conventions
Good naming conventions make code easier to read and maintain. Different conventions are used in different languages, but consistency is key.
| Convention | Style | Example | Typically Used In |
|---|---|---|---|
| camelCase | First word lowercase, subsequent words capitalised | studentName, totalScore |
Python variables, JavaScript |
| PascalCase | Every word capitalised | StudentName, CalculateTotal |
VB.NET methods and classes |
| snake_case | All lowercase, words separated by underscores | student_name, total_score |
Python functions and variables |
| UPPER_SNAKE_CASE | All uppercase, words separated by underscores | MAX_SIZE, VAT_RATE |
Constants in most languages |
Rules for Good Variable Names
- Use meaningful, descriptive names:
total_pricenottporx. - Avoid single-letter names except for simple loop counters (e.g.
i,j). - Do not use reserved words (e.g.
print,if,for). - Start with a letter, not a number or special character.
- Be consistent with your chosen convention throughout the program.
Self-Documenting Code
Self-documenting code is code that is written so clearly that it explains itself without needing extensive comments. This is achieved through:
- Meaningful variable and subroutine names.
- Clear, logical program structure.
- Appropriate use of constants instead of “magic numbers”.
- Short subroutines that each do one thing well.
# Poor naming - not self-documenting
def calc(a, b):
c = a * b * 0.2
return a * b + c
# Good naming - self-documenting
def calculate_total_with_vat(quantity, unit_price):
vat = quantity * unit_price * VAT_RATE
return quantity * unit_price + vat
' Poor naming - not self-documenting
Function Calc(a As Integer, b As Double) As Double
Dim c As Double = a * b * 0.2
Return a * b + c
End Function
' Good naming - self-documenting
Function CalculateTotalWithVAT(quantity As Integer, unitPrice As Double) As Double
Dim vat As Double = quantity * unitPrice * VAT_RATE
Return quantity * unitPrice + vat
End Function
In the exam, if you are asked to write code, always use meaningful variable names. Writing x = a + b when you mean total_price = subtotal + delivery_charge will cost you marks. Similarly, if asked about self-documenting code, explain that it uses meaningful names and clear structure so that the purpose of the code is obvious without needing comments.
Local and Global Variables
Local Variables
A local variable is declared inside a subroutine and can only be accessed from within that subroutine. It is created when the subroutine starts and destroyed when the subroutine ends.
def calculate_total():
# 'subtotal' and 'tax' are local variables
subtotal = 100.00
tax = subtotal * 0.20
total = subtotal + tax
return total
result = calculate_total()
print(result) # Works: prints 120.0
# print(subtotal) # Error! 'subtotal' is not defined here
Function CalculateTotal() As Double
' subtotal and tax are local variables
Dim subtotal As Double = 100.0
Dim tax As Double = subtotal * 0.2
Dim total As Double = subtotal + tax
Return total
End Function
Dim result As Double = CalculateTotal()
Console.WriteLine(result) ' Works: prints 120
' Console.WriteLine(subtotal) ' Error! subtotal is not accessible here
Global Variables
A global variable is declared outside any subroutine and can be accessed from anywhere in the program.
# Global variable
score = 0
def add_points(points):
global score # Must use 'global' keyword in Python to modify a global variable
score = score + points
def display_score():
print(f"Current score: {score}")
add_points(10)
add_points(5)
display_score() # Output: Current score: 15
Module Program
' Global variable
Dim score As Integer = 0
Sub AddPoints(points As Integer)
score = score + points
End Sub
Sub DisplayScore()
Console.WriteLine($"Current score: {score}")
End Sub
Sub Main()
AddPoints(10)
AddPoints(5)
DisplayScore() ' Output: Current score: 15
End Sub
End Module
Why Local Variables Are Preferred
| Reason | Explanation |
|---|---|
| Avoids naming conflicts | Two subroutines can use the same variable name without conflict |
| Reduces side effects | Changes in one subroutine cannot accidentally affect another |
| Easier debugging | You only need to look at the subroutine to understand the variable’s behaviour |
| Memory efficient | Local variables are destroyed when the subroutine ends, freeing memory |
| Portability | Subroutines with local variables are self-contained and can be reused elsewhere |
A local variable is declared inside a subroutine and is only accessible within that subroutine. A global variable is declared outside all subroutines and is accessible throughout the entire program. Best practice is to use local variables wherever possible and pass data between subroutines using parameters.
Scope and Lifetime
Scope
The scope of a variable refers to the region of the program where it can be accessed. A variable can only be used within its scope.
- Local scope: the variable is accessible only within the subroutine where it is declared.
- Global scope: the variable is accessible from anywhere in the program.
If a local variable has the same name as a global variable, the local variable takes precedence within its subroutine (this is called shadowing).
name = "Global" # Global scope
def greet():
name = "Local" # Local scope - shadows the global variable
print(name) # Prints "Local"
greet()
print(name) # Prints "Global" - global variable is unaffected
Module Program
Dim name As String = "Global" ' Global scope
Sub Greet()
Dim name As String = "Local" ' Local scope - shadows the global variable
Console.WriteLine(name) ' Prints "Local"
End Sub
Sub Main()
Greet()
Console.WriteLine(name) ' Prints "Global"
End Sub
End Module
Lifetime
The lifetime of a variable is the period during which it exists in memory.
- Local variables exist from the moment the subroutine is called until the subroutine finishes. They are created and destroyed each time the subroutine runs.
- Global variables exist for the entire duration of the program’s execution.
| Concept | Local Variable | Global Variable |
|---|---|---|
| Scope | Within the subroutine only | Entire program |
| Lifetime | Created when subroutine starts; destroyed when it ends | Created when program starts; destroyed when it ends |
| Initialisation | Reinitialised each time the subroutine is called | Initialised once when the program starts |
Be careful not to confuse scope and lifetime. Scope is about where a variable can be accessed. Lifetime is about how long the variable exists in memory. A local variable’s scope is its subroutine; its lifetime is the duration of that subroutine’s execution.
Parameter Passing
Parameters are the values passed into a subroutine when it is called. There are two main mechanisms for passing parameters: by value and by reference.
Passing by Value
- A copy of the data is passed to the subroutine.
- Any changes made to the parameter inside the subroutine have no effect on the original variable in the calling code.
- This is the default in most languages (including Python for immutable types and VB.NET with
ByVal). - Safer because the original data is protected from accidental modification.
# Pass by value (Python passes immutable types like int by value effectively)
def double_value(num):
num = num * 2
print(f"Inside function: {num}")
original = 5
double_value(original)
print(f"Outside function: {original}")
# Output:
# Inside function: 10
# Outside function: 5 <-- original is unchanged
' Pass by value (ByVal is the default in VB.NET)
Sub DoubleValue(ByVal num As Integer)
num = num * 2
Console.WriteLine($"Inside sub: {num}")
End Sub
Dim original As Integer = 5
DoubleValue(original)
Console.WriteLine($"Outside sub: {original}")
' Output:
' Inside sub: 10
' Outside sub: 5 <-- original is unchanged
Passing by Reference
- A reference (memory address) to the original data is passed to the subroutine.
- Any changes made to the parameter inside the subroutine directly affect the original variable in the calling code.
- Used when you need the subroutine to modify the original variable.
- Must be used carefully as it can lead to unexpected side effects.
# Pass by reference (Python passes mutable types like lists by reference)
def add_item(shopping_list, item):
shopping_list.append(item)
my_list = ["bread", "milk"]
add_item(my_list, "eggs")
print(my_list)
# Output: ['bread', 'milk', 'eggs'] <-- original list is modified
' Pass by reference (ByRef keyword in VB.NET)
Sub DoubleValue(ByRef num As Integer)
num = num * 2
Console.WriteLine($"Inside sub: {num}")
End Sub
Dim original As Integer = 5
DoubleValue(original)
Console.WriteLine($"Outside sub: {original}")
' Output:
' Inside sub: 10
' Outside sub: 10 <-- original IS changed
Comparison
| Feature | By Value | By Reference |
|---|---|---|
| What is passed | A copy of the data | A reference to the original data |
| Effect on original | Original is unchanged | Original can be changed |
| Safety | Safer – protects original data | Riskier – can cause side effects |
| Memory | Uses more memory (creates a copy) | Uses less memory (no copy made) |
| VB.NET keyword | ByVal (default) |
ByRef |
| When to use | When the subroutine only needs to read the value | When the subroutine needs to modify the original variable |
A common exam question is to trace through code that uses ByVal and ByRef and predict the output. Remember: with ByVal, changes inside the subroutine are discarded when the subroutine ends. With ByRef, changes persist because the subroutine operates on the original variable.
Mathematical Operations: DIV and MOD
DIV (integer division) returns the whole number result of dividing two integers, discarding the remainder. MOD (modulus) returns the remainder after integer division.
| Expression | Result | Explanation |
|---|---|---|
17 DIV 5 |
3 | 17 ÷ 5 = 3 remainder 2 |
17 MOD 5 |
2 | 17 ÷ 5 = 3 remainder 2 |
20 MOD 4 |
0 | Exactly divisible — no remainder |
7 MOD 2 |
1 | Odd number — remainder 1 |
100 DIV 7 |
14 | 100 ÷ 7 = 14 remainder 2 |
# Python — DIV and MOD
number = 17
quotient = number // 5 # DIV — result: 3
remainder = number % 5 # MOD — result: 2
# Check if even or odd
if number % 2 == 0:
print("Even")
else:
print("Odd")
# Convert 137 minutes into hours and minutes
hours = 137 // 60 # DIV — result: 2
minutes = 137 % 60 # MOD — result: 17
print(f"{hours}h {minutes}m") # Output: 2h 17m
' VB.NET — DIV and MOD
Dim number As Integer = 17
Dim quotient As Integer = number \ 5 ' DIV — result: 3
Dim remainder As Integer = number Mod 5 ' MOD — result: 2
' Check if even or odd
If number Mod 2 = 0 Then
Console.WriteLine("Even")
Else
Console.WriteLine("Odd")
End If
' Convert 137 minutes into hours and minutes
Dim hours As Integer = 137 \ 60 ' DIV — result: 2
Dim minutes As Integer = 137 Mod 60 ' MOD — result: 17
Console.WriteLine(hours & "h " & minutes & "m") ' Output: 2h 17m
Common Uses
| Task | DIV / MOD | Example |
|---|---|---|
| Check if even/odd | n MOD 2 |
7 MOD 2 = 1 → odd |
| Convert minutes to hours | mins DIV 60 and mins MOD 60 |
137 DIV 60 = 2, 137 MOD 60 = 17 |
| Extract last digit | n MOD 10 |
1234 MOD 10 = 4 |
| Remove last digit | n DIV 10 |
1234 DIV 10 = 123 |
| Wrap around (circular buffer) | (index + 1) MOD size |
Used in circular queues |
DIV and MOD appear frequently in exam questions. The most common tasks are: checking even/odd (n MOD 2), converting units (minutes DIV 60 and minutes MOD 60), and extracting or removing digits. Note that in Python, DIV is // and MOD is %. In VB.NET, DIV is \ (backslash) and MOD is Mod.
Bubble Sort
Bubble sort is a simple sorting algorithm that repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order. The largest unsorted value “bubbles up” to the end of the list on each pass.
How It Works
- Start at the beginning of the list.
- Compare the first two adjacent elements. If the first is greater than the second, swap them.
- Move to the next pair and repeat the comparison and swap.
- Continue until the end of the list – this completes one pass. The largest element is now in its correct position.
- Repeat passes for the remaining unsorted portion of the list.
- If a complete pass is made with no swaps, the list is sorted and the algorithm terminates early.
Code Examples
def bubble_sort(data):
n = len(data)
for i in range(n - 1):
swapped = False
for j in range(n - 1 - i):
if data[j] > data[j + 1]:
data[j], data[j + 1] = data[j + 1], data[j]
swapped = True
if not swapped:
break # List is already sorted
return data
# Example usage
numbers = [64, 34, 25, 12, 22, 11, 90]
print(bubble_sort(numbers))
# Output: [11, 12, 22, 25, 34, 64, 90]
Sub BubbleSort(data() As Integer)
Dim n As Integer = data.Length
For i As Integer = 0 To n - 2
Dim swapped As Boolean = False
For j As Integer = 0 To n - 2 - i
If data(j) > data(j + 1) Then
Dim temp As Integer = data(j)
data(j) = data(j + 1)
data(j + 1) = temp
swapped = True
End If
Next
If Not swapped Then
Exit For ' List is already sorted
End If
Next
End Sub
' Example usage
Dim numbers() As Integer = {64, 34, 25, 12, 22, 11, 90}
BubbleSort(numbers)
For Each num As Integer In numbers
Console.Write(num & " ")
Next
' Output: 11 12 22 25 34 64 90
Performance
- Best case: O(n) – the list is already sorted (only one pass with no swaps needed).
- Worst case: O(n^2) – the list is in reverse order.
- Average case: O(n^2).
- Space complexity: O(1) – it sorts in place (no additional memory needed beyond a temporary swap variable).
Insertion Sort
Insertion sort builds the sorted list one element at a time. It takes each element from the unsorted portion and inserts it into its correct position within the sorted portion, shifting elements as necessary.
How It Works
- Consider the first element as a sorted sub-list of one element.
- Take the next element (the “key”) from the unsorted portion.
- Compare the key with each element in the sorted portion, moving from right to left.
- Shift elements in the sorted portion one position to the right to make space.
- Insert the key into its correct position.
- Repeat until all elements have been processed.
Code Examples
def insertion_sort(data):
for i in range(1, len(data)):
key = data[i]
j = i - 1
while j >= 0 and data[j] > key:
data[j + 1] = data[j] # Shift element to the right
j -= 1
data[j + 1] = key # Insert key in correct position
return data
# Example usage
numbers = [12, 11, 13, 5, 6]
print(insertion_sort(numbers))
# Output: [5, 6, 11, 12, 13]
Sub InsertionSort(data() As Integer)
For i As Integer = 1 To data.Length - 1
Dim key As Integer = data(i)
Dim j As Integer = i - 1
While j >= 0 AndAlso data(j) > key
data(j + 1) = data(j) ' Shift element to the right
j -= 1
End While
data(j + 1) = key ' Insert key in correct position
Next
End Sub
' Example usage
Dim numbers() As Integer = {12, 11, 13, 5, 6}
InsertionSort(numbers)
For Each num As Integer In numbers
Console.Write(num & " ")
Next
' Output: 5 6 11 12 13
Performance
- Best case: O(n) – the list is already sorted (no shifts needed, just one comparison per element).
- Worst case: O(n^2) – the list is in reverse order.
- Average case: O(n^2).
- Space complexity: O(1) – it sorts in place.
When Is Insertion Sort Useful?
- When the data set is small.
- When the data is nearly sorted (it performs very well in this case, approaching O(n)).
- It is an online algorithm – it can sort a list as it receives new data.
Merge Sort
Merge Sort, Big O Notation, and the sections that follow are beyond the AS specification. They are included as extension reading for students progressing to A Level.
Merge sort is a divide and conquer sorting algorithm. It recursively splits the list into halves until each sub-list contains a single element, then merges the sub-lists back together in sorted order.
How It Works
- Divide: Split the list into two halves.
- Conquer: Recursively apply merge sort to each half.
- Merge: Combine the two sorted halves into a single sorted list by comparing elements from each half and placing the smaller one first.
Worked example with [38, 27, 43, 3]:
Step 1 (Divide): [38, 27, 43, 3]
/ \
[38, 27] [43, 3]
/ \ / \
[38] [27] [43] [3]
Step 2 (Merge):
[38] [27] [43] [3]
\ / \ /
[27, 38] [3, 43]
\ /
[3, 27, 38, 43]
Code Examples
def merge_sort(data):
if len(data) <= 1:
return data
mid = len(data) // 2
left_half = merge_sort(data[:mid])
right_half = merge_sort(data[mid:])
return merge(left_half, right_half)
def merge(left, right):
result = []
i = 0
j = 0
while i < len(left) and j < len(right):
if left[i] <= right[j]:
result.append(left[i])
i += 1
else:
result.append(right[j])
j += 1
# Add any remaining elements
result.extend(left[i:])
result.extend(right[j:])
return result
# Example usage
numbers = [38, 27, 43, 3, 9, 82, 10]
print(merge_sort(numbers))
# Output: [3, 9, 10, 27, 38, 43, 82]
Function MergeSort(data() As Integer) As Integer()
If data.Length <= 1 Then
Return data
End If
Dim mid As Integer = data.Length \ 2
Dim leftHalf(mid - 1) As Integer
Dim rightHalf(data.Length - mid - 1) As Integer
Array.Copy(data, 0, leftHalf, 0, mid)
Array.Copy(data, mid, rightHalf, 0, data.Length - mid)
leftHalf = MergeSort(leftHalf)
rightHalf = MergeSort(rightHalf)
Return Merge(leftHalf, rightHalf)
End Function
Function Merge(left() As Integer, right() As Integer) As Integer()
Dim result(left.Length + right.Length - 1) As Integer
Dim i As Integer = 0
Dim j As Integer = 0
Dim k As Integer = 0
While i < left.Length AndAlso j < right.Length
If left(i) <= right(j) Then
result(k) = left(i)
i += 1
Else
result(k) = right(j)
j += 1
End If
k += 1
End While
While i < left.Length
result(k) = left(i)
i += 1
k += 1
End While
While j < right.Length
result(k) = right(j)
j += 1
k += 1
End While
Return result
End Function
' Example usage
Dim numbers() As Integer = {38, 27, 43, 3, 9, 82, 10}
Dim sorted() As Integer = MergeSort(numbers)
For Each num As Integer In sorted
Console.Write(num & " ")
Next
' Output: 3 9 10 27 38 43 82
Performance
- Best case: O(n log n).
- Worst case: O(n log n).
- Average case: O(n log n).
- Space complexity: O(n) – it requires additional memory to hold the sub-lists during merging.
Why O(n log n)?
The list is divided in half at each level of recursion, giving log n levels. At each level, all n elements must be compared during the merge step. Therefore, the total work is n multiplied by log n.
Comparison of Sorting Algorithms
| Feature | Bubble Sort | Insertion Sort | Merge Sort |
|---|---|---|---|
| Best case | O(n) | O(n) | O(n log n) |
| Worst case | O(n^2) | O(n^2) | O(n log n) |
| Average case | O(n^2) | O(n^2) | O(n log n) |
| Space complexity | O(1) – in place | O(1) – in place | O(n) – extra memory needed |
| Stable? | Yes | Yes | Yes |
| Adaptive? | Yes (with early exit) | Yes (fast on nearly sorted data) | No |
| Suitable for | Small or nearly sorted data | Small or nearly sorted data | Large data sets |
| Approach | Comparison and swap | Shift and insert | Divide and conquer |
A stable sorting algorithm preserves the relative order of elements with equal keys. All three algorithms covered here (bubble, insertion, merge) are stable.
In the exam you may be asked to recommend a sorting algorithm for a specific scenario. For large data sets, merge sort is preferred because its worst case is O(n log n). For small or nearly sorted data, insertion sort is often the best practical choice. Bubble sort is mainly used for educational purposes due to its simplicity.
Linear Search
Linear search (also called sequential search) is an algorithm that checks every element in a list, one by one, from the beginning until the target value is found or the end of the list is reached.
How It Works
- Start at the first element of the list.
- Compare the current element with the target value.
- If they match, return the position (index) of the element.
- If they do not match, move to the next element.
- If the end of the list is reached without finding the target, report that the item was not found.
Code Examples
def linear_search(data, target):
for i in range(len(data)):
if data[i] == target:
return i # Return the index where target is found
return -1 # Target not found
# Example usage
numbers = [4, 7, 2, 9, 1, 5, 8]
result = linear_search(numbers, 9)
if result != -1:
print(f"Found at index {result}")
else:
print("Not found")
Function LinearSearch(data() As Integer, target As Integer) As Integer
For i As Integer = 0 To data.Length - 1
If data(i) = target Then
Return i ' Return the index where target is found
End If
Next
Return -1 ' Target not found
End Function
' Example usage
Dim numbers() As Integer = {4, 7, 2, 9, 1, 5, 8}
Dim result As Integer = LinearSearch(numbers, 9)
If result <> -1 Then
Console.WriteLine("Found at index " & result)
Else
Console.WriteLine("Not found")
End If
Performance
- Best case: O(1) – the target is the first element.
- Worst case: O(n) – the target is the last element or not present.
- Average case: O(n) – on average, half the list is searched.
Advantages and Disadvantages
| Advantages | Disadvantages |
|---|---|
| Simple to understand and implement | Slow for large data sets |
| Works on unsorted data | Checks every element in the worst case |
| Works on any data structure (arrays, linked lists) | Not efficient compared to binary search on sorted data |
Binary Search
Binary search is a search algorithm that works on sorted data by repeatedly dividing the search interval in half. It compares the target value to the middle element and eliminates half of the remaining elements at each step.
How It Works
- Set
lowto the first index andhighto the last index. - Find the middle index:
mid = (low + high) // 2. - If
data[mid]equals the target, returnmid. - If the target is less than
data[mid], sethigh = mid - 1(search the left half). - If the target is greater than
data[mid], setlow = mid + 1(search the right half). - Repeat steps 2-5 until the target is found or
low > high(target not present).
Code Examples
def binary_search(data, target):
low = 0
high = len(data) - 1
while low <= high:
mid = (low + high) // 2
if data[mid] == target:
return mid
elif data[mid] < target:
low = mid + 1
else:
high = mid - 1
return -1 # Target not found
# Example usage
numbers = [1, 3, 5, 7, 9, 11, 13, 15]
result = binary_search(numbers, 7)
if result != -1:
print(f"Found at index {result}")
else:
print("Not found")
Function BinarySearch(data() As Integer, target As Integer) As Integer
Dim low As Integer = 0
Dim high As Integer = data.Length - 1
While low <= high
Dim mid As Integer = (low + high) \ 2
If data(mid) = target Then
Return mid
ElseIf data(mid) < target Then
low = mid + 1
Else
high = mid - 1
End If
End While
Return -1 ' Target not found
End Function
' Example usage
Dim numbers() As Integer = {1, 3, 5, 7, 9, 11, 13, 15}
Dim result As Integer = BinarySearch(numbers, 7)
If result <> -1 Then
Console.WriteLine("Found at index " & result)
Else
Console.WriteLine("Not found")
End If
Performance
- Best case: O(1) – the target is the middle element.
- Worst case: O(log n) – the search space is halved at each step.
- Average case: O(log n).
Why O(log n)?
Each comparison eliminates half the remaining elements. For a list of 1,000,000 items, binary search needs at most about 20 comparisons (since log2(1,000,000) is approximately 20). Linear search would need up to 1,000,000 comparisons.
Comparison of Searching Algorithms
| Feature | Linear Search | Binary Search |
|---|---|---|
| Data must be sorted? | No | Yes |
| Best case | O(1) | O(1) |
| Worst case | O(n) | O(log n) |
| Average case | O(n) | O(log n) |
| Implementation complexity | Very simple | Slightly more complex |
| Suitable for | Small or unsorted data sets | Large, sorted data sets |
| Works on linked lists? | Yes (efficiently) | Not efficiently (no random access) |
A common exam question asks you to explain when you would use each algorithm. If the data is unsorted and small, linear search is appropriate. If the data is sorted and large, binary search is far more efficient. Remember: binary search requires sorted data – if the data is unsorted, the cost of sorting it first must be considered.
Big O Notation
Big O notation is a mathematical notation used to describe the upper bound of an algorithm’s time complexity or space complexity. It describes how the resource usage of an algorithm grows relative to the input size n, focusing on the worst-case scenario.
Big O ignores constants and lower-order terms. For example, an algorithm that takes 3n^2 + 5n + 2 steps is described as O(n^2), because the n^2 term dominates as n grows large.
Common Big O Complexities
| Notation | Name | Description | Example |
|---|---|---|---|
| O(1) | Constant | The time taken does not depend on the input size. | Accessing an array element by index; checking if a number is odd/even. |
| O(log n) | Logarithmic | The time grows logarithmically. The algorithm halves the problem size at each step. | Binary search. |
| O(n) | Linear | The time grows in direct proportion to the input size. | Linear search; summing all elements in a list. |
| O(n log n) | Linearithmic | A combination of linear and logarithmic growth. Common in efficient sorting. | Merge sort; the best comparison-based sorts. |
| O(n^2) | Quadratic | The time grows as the square of the input size. Typically caused by nested loops. | Bubble sort; insertion sort (worst case). |
| O(2^n) | Exponential | The time doubles with each additional input element. Very slow for even moderate n. | Recursive Fibonacci (naive); brute-force subset enumeration. |
| O(n!) | Factorial | The time grows factorially. Extremely slow; only practical for very small n. | Travelling Salesman Problem (brute-force); generating all permutations. |
Growth Rate Comparison
For an input of size n = 1,000:
| Complexity | Approximate Operations |
|---|---|
| O(1) | 1 |
| O(log n) | 10 |
| O(n) | 1,000 |
| O(n log n) | 10,000 |
| O(n^2) | 1,000,000 |
| O(2^n) | Astronomically large (impractical) |
Time Complexity vs Space Complexity
- Time complexity measures how the running time of an algorithm grows with input size.
- Space complexity measures how the memory usage of an algorithm grows with input size.
For example, merge sort has time complexity O(n log n) but space complexity O(n) because it requires additional arrays during merging. Bubble sort has time complexity O(n^2) but space complexity O(1) because it sorts in place.
You must be able to identify the Big O complexity of the standard algorithms: linear search is O(n), binary search is O(log n), bubble sort is O(n^2), insertion sort is O(n^2), and merge sort is O(n log n). You should also be able to explain why each algorithm has its particular complexity.
Recursion vs Iteration
Recursion is a technique where a function calls itself to solve a smaller instance of the same problem. Each recursive call works towards a base case that stops the recursion. Iteration uses loops (for, while) to repeat a block of code.
Structure of a Recursive Function
Every recursive function must have:
- A base case – a condition that stops the recursion and returns a value directly.
- A recursive case – the function calls itself with a modified argument that moves towards the base case.
Example: Factorial
The factorial of n (written n!) is defined as:
- 0! = 1 (base case)
- n! = n * (n-1)! (recursive case)
Recursive approach:
def factorial_recursive(n):
if n == 0: # Base case
return 1
else: # Recursive case
return n * factorial_recursive(n - 1)
print(factorial_recursive(5)) # Output: 120
Function FactorialRecursive(n As Integer) As Integer
If n = 0 Then ' Base case
Return 1
Else ' Recursive case
Return n * FactorialRecursive(n - 1)
End If
End Function
Console.WriteLine(FactorialRecursive(5)) ' Output: 120
Iterative approach:
def factorial_iterative(n):
result = 1
for i in range(1, n + 1):
result = result * i
return result
print(factorial_iterative(5)) # Output: 120
Function FactorialIterative(n As Integer) As Integer
Dim result As Integer = 1
For i As Integer = 1 To n
result = result * i
Next
Return result
End Function
Console.WriteLine(FactorialIterative(5)) ' Output: 120
Example: Fibonacci Sequence
# Recursive Fibonacci
def fibonacci_recursive(n):
if n <= 1: # Base case
return n
else: # Recursive case
return fibonacci_recursive(n - 1) + fibonacci_recursive(n - 2)
# Iterative Fibonacci
def fibonacci_iterative(n):
if n <= 1:
return n
a, b = 0, 1
for i in range(2, n + 1):
a, b = b, a + b
return b
print(fibonacci_recursive(7)) # Output: 13
print(fibonacci_iterative(7)) # Output: 13
' Recursive Fibonacci
Function FibonacciRecursive(n As Integer) As Integer
If n <= 1 Then ' Base case
Return n
Else ' Recursive case
Return FibonacciRecursive(n - 1) + FibonacciRecursive(n - 2)
End If
End Function
' Iterative Fibonacci
Function FibonacciIterative(n As Integer) As Integer
If n <= 1 Then
Return n
End If
Dim a As Integer = 0
Dim b As Integer = 1
For i As Integer = 2 To n
Dim temp As Integer = b
b = a + b
a = temp
Next
Return b
End Function
Console.WriteLine(FibonacciRecursive(7)) ' Output: 13
Console.WriteLine(FibonacciIterative(7)) ' Output: 13
Comparison Table
| Feature | Recursion | Iteration |
|---|---|---|
| Uses | Function calls itself | Loop constructs (for, while) |
| Termination | Base case | Loop condition becomes false |
| Memory usage | Higher (each call adds to the call stack) | Lower (no call stack overhead) |
| Risk | Stack overflow if base case not reached or recursion too deep | Infinite loop if condition never met |
| Readability | Often more elegant and closer to mathematical definitions | Can be more straightforward for simple repetition |
| Speed | Often slower due to function call overhead | Usually faster |
| Best for | Problems with recursive structure (trees, divide and conquer) | Simple repetition and counting |
A common exam question asks you to explain the advantages and disadvantages of recursion versus iteration. Always mention the call stack: each recursive call adds a new frame to the call stack, which uses memory. If the recursion is too deep, this can cause a stack overflow error.
Problem Analysis
Before writing code, a problem must be fully understood and broken down. Two key techniques are abstraction and decomposition.
Abstraction
Abstraction removes unnecessary detail, focusing only on what is relevant to solving the problem. It produces a simplified model of a complex situation.
- A student records system abstracts away physical details (paper, filing cabinets) and represents only the data and operations needed: name, grade, search, update.
- Abstraction lets programmers manage complexity by ignoring irrelevant detail.
Decomposition
Decomposition breaks a large problem into smaller, more manageable sub-problems that can be solved independently.
Example: A booking system decomposes into:
- Search for available slots
- Validate user input
- Record the booking
- Send a confirmation
Pattern Recognition
Pattern recognition involves identifying similarities, trends, or repeated structures within and between problems. Recognising a pattern means you can reuse or adapt a solution that already worked elsewhere, rather than solving every problem from scratch.
- If a program needs to validate multiple input fields, the same validation logic appears repeatedly — recognising this pattern leads to a reusable validation subroutine.
- Sorting algorithms share patterns (comparing and swapping elements) regardless of what data is being sorted.
- Pattern recognition supports generalisation — once a pattern is identified, the solution can be applied to an entire class of problems.
Example: If a teacher notices that calculating the average of a list of numbers appears in multiple programs (test scores, attendance figures, rainfall data), they recognise the pattern and write a single calculateAverage function that works for all of them.
Pattern recognition is the process of identifying similarities, trends, and repeated structures within or between problems. It allows solutions to be generalised and reused.
Problem Analysis Steps
- Understand the problem — identify inputs, outputs, and constraints.
- Identify data — what types, validation rules, and structures are needed?
- Decompose — break into sub-tasks.
- Design algorithms — write pseudocode or flowcharts for each sub-task.
- Plan test cases — consider normal, boundary, and erroneous data before coding.
Always identify inputs, outputs, process, and constraints when analysing a problem. State these before designing an algorithm — examiners award marks for clearly defining the problem before attempting a solution.
Programming Constructs
All algorithms are built from three fundamental constructs: sequence, selection, and repetition.
Sequence
Instructions execute one after another in the order written — the default behaviour of a program.
SET total TO 0
INPUT price
SET total TO total + price
OUTPUT total
Selection
Selection allows the program to choose between paths based on a condition.
INPUT mark
IF mark >= 70 THEN
OUTPUT "Distinction"
ELSE IF mark >= 50 THEN
OUTPUT "Pass"
ELSE
OUTPUT "Fail"
END IF
mark = int(input("Enter mark: "))
if mark >= 70:
print("Distinction")
elif mark >= 50:
print("Pass")
else:
print("Fail")
Dim mark As Integer = CInt(Console.ReadLine())
If mark >= 70 Then
Console.WriteLine("Distinction")
ElseIf mark >= 50 Then
Console.WriteLine("Pass")
Else
Console.WriteLine("Fail")
End If
Repetition
Repetition executes a block of instructions multiple times.
Count-controlled (FOR):
for i in range(1, 11):
print(i)
For i As Integer = 1 To 10
Console.WriteLine(i)
Next
Condition-controlled (WHILE):
password = ""
while password != "correct":
password = input("Enter password: ")
Dim password As String = ""
Do While password <> "correct"
password = Console.ReadLine()
Loop
Counts and Rogue Values
Counts track how many times a loop has run and stop it after a set number of iterations.
Rogue values (sentinel values) are special inputs outside the valid data range that signal the end of data entry — the loop stops when the rogue value is entered.
# Rogue value example — -1 signals end of input
total = 0
number = int(input("Enter number (-1 to stop): "))
while number != -1:
total += number
number = int(input("Enter number (-1 to stop): "))
print(f"Total: {total}")
Dim total As Integer = 0
Dim number As Integer = CInt(Console.ReadLine())
Do While number <> -1
total = total + number
number = CInt(Console.ReadLine())
Loop
Console.WriteLine($"Total: {total}")
The rogue value (-1) is never added to the total — the loop exits when it is detected.
Sequence runs instructions in order. Selection chooses between paths on a condition. Repetition repeats instructions while a condition holds. A count controls loop iterations. A rogue value is a special input that terminates a loop.
When tracing a loop with a rogue value, always check whether the rogue value itself gets processed — it should not. The condition check must happen before (or immediately after) the body, ensuring the rogue value causes exit without being included in any calculation.
Structure of a Typical Program
A well-structured program follows a logical layout that makes it easy to read, debug, and maintain. Although the exact syntax differs between programming languages, most programs share common structural elements.
Common Structural Elements
- Import/library statements – bring in external modules or libraries at the top of the file.
- Constant declarations – values that do not change throughout execution are declared early.
- Global variable declarations – variables accessible throughout the entire program (used sparingly).
- Subroutine definitions – functions and procedures that contain the main logic, defined before they are called.
- Main program / entry point – the block of code that runs when the program starts, typically calling subroutines to carry out tasks.
- Comments – annotations explaining the purpose of code, placed throughout to aid readability.
# Import statements
import math
# Constants
PI = 3.14159
MAX_ATTEMPTS = 3
# Function definitions
def calculate_area(radius):
return PI * radius ** 2
def get_radius():
radius = float(input("Enter radius: "))
return radius
# Main program
def main():
r = get_radius()
area = calculate_area(r)
print(f"The area is {area:.2f}")
main()
' Import statements
Imports System.Math
Module Program
' Constants
Const PI As Double = 3.14159
Const MAX_ATTEMPTS As Integer = 3
' Function definitions
Function CalculateArea(radius As Double) As Double
Return PI * radius ^ 2
End Function
Function GetRadius() As Double
Console.Write("Enter radius: ")
Dim radius As Double = Convert.ToDouble(Console.ReadLine())
Return radius
End Function
' Main program
Sub Main()
Dim r As Double = GetRadius()
Dim area As Double = CalculateArea(r)
Console.WriteLine($"The area is {area:F2}")
End Sub
End Module
A subroutine is a named block of code that performs a specific task. Subroutines include both functions (which return a value) and procedures (which do not return a value). Using subroutines is fundamental to writing structured, modular programs.
Modular Design and Decomposition
Modular design is the practice of breaking a large program into smaller, independent, self-contained modules. Each module handles one specific aspect of the program’s functionality. This is the practical application of top-down design during the coding phase.
Benefits of Modular Design
| Benefit | Explanation |
|---|---|
| Readability | Smaller modules are easier to read and understand than one monolithic block of code |
| Reusability | A well-written module can be reused in other programs without modification |
| Ease of debugging | Bugs are isolated within specific modules, making them easier to locate and fix |
| Team working | Different programmers can develop and test different modules simultaneously |
| Maintainability | Individual modules can be updated or replaced without affecting the rest of the program |
| Testing | Each module can be tested independently (unit testing) before being integrated |
Decomposition in Practice
Decomposition means dividing a problem into sub-problems, each of which becomes a module. For example, a student records system might be decomposed into:
- A module to add new students
- A module to search for student records
- A module to update student details
- A module to generate reports
- A module to handle file input/output
Each module would be implemented as one or more subroutines.
When asked about the benefits of a modular approach, always relate your answer to practical programming scenarios. For example: “Modular design allows team working because programmer A can write the input validation module while programmer B works on the report generation module, reducing overall development time.”
Functions and Procedures
Subroutines come in two forms: functions and procedures. The key difference is that a function returns a value, while a procedure does not.
Procedures
A procedure performs a task but does not return a value to the calling code. It is called for its side effects (e.g. displaying output, writing to a file).
# Procedure - does not return a value
def display_welcome(name):
print(f"Welcome, {name}!")
print("Please select an option from the menu.")
# Calling the procedure
display_welcome("Alice")
' Procedure - does not return a value
Sub DisplayWelcome(name As String)
Console.WriteLine($"Welcome, {name}!")
Console.WriteLine("Please select an option from the menu.")
End Sub
' Calling the procedure
DisplayWelcome("Alice")
Functions
A function performs a task and returns a value to the calling code. The returned value can be stored in a variable or used directly in an expression.
# Function - returns a value
def calculate_vat(price, rate=0.20):
vat = price * rate
return vat
# Calling the function and using the returned value
item_price = 50.00
tax = calculate_vat(item_price)
total = item_price + tax
print(f"Total including VAT: £{total:.2f}")
' Function - returns a value
Function CalculateVAT(price As Double, Optional rate As Double = 0.2) As Double
Dim vat As Double = price * rate
Return vat
End Function
' Calling the function and using the returned value
Dim itemPrice As Double = 50.0
Dim tax As Double = CalculateVAT(itemPrice)
Dim total As Double = itemPrice + tax
Console.WriteLine($"Total including VAT: £{total:F2}")
Key Differences
| Feature | Function | Procedure |
|---|---|---|
| Returns a value | Yes | No |
| Can be used in expressions | Yes (e.g. x = myFunc()) |
No |
| Keyword in Python | def (with return) |
def (without return) |
| Keyword in VB.NET | Function ... End Function |
Sub ... End Sub |
| Typical use | Calculations, lookups, conversions | Displaying output, modifying data structures |
A function is a subroutine that returns a value. A procedure (called a Sub in VB.NET) is a subroutine that performs an action but does not return a value.
Data Compression
Compression reduces file size, making data faster to transmit and requiring less storage. At AS level you need to understand both categories and specific algorithms.
Lossless vs Lossy Compression
| Feature | Lossless | Lossy |
|---|---|---|
| Data loss | None — original data perfectly reconstructed | Some data permanently removed |
| File size | Larger than lossy | Smaller than lossless |
| Use cases | Text, code, spreadsheets, medical images, archives | Photos, music, video, streaming |
| File formats | ZIP, PNG, GIF, FLAC, 7Z | JPEG, MP3, MP4/MPEG, AAC |
| How it works | Identifies and encodes patterns/redundancy | Removes data humans are unlikely to notice |
Lossless compression reduces file size while allowing the original data to be perfectly reconstructed. Lossy compression achieves greater size reduction by permanently removing some data, meaning the original cannot be fully restored.
Run-Length Encoding (RLE)
RLE is a lossless algorithm that replaces consecutive runs of the same value with a count and the value.
Worked Example
Original data: AAAAAABBBBCCDDDDDDD
| Run | Count | Value | Encoded |
|---|---|---|---|
| AAAAAA | 6 | A | 6A |
| BBBB | 4 | B | 4B |
| CC | 2 | C | 2C |
| DDDDDDD | 7 | D | 7D |
Encoded: 6A4B2C7D
Space calculation:
- Original: 19 characters (19 bytes if 1 byte per character)
- Encoded: 8 characters (8 bytes)
- Saving: (19 - 8) / 19 × 100 = 57.9%
When RLE Works Well vs Poorly
- Works well: Data with long runs of repeated values — simple graphics, icons, diagrams, fax documents
- Works poorly: Data with little repetition — photographs, random data. The encoded version may be larger than the original (e.g.
ABCDEF→1A1B1C1D1E1Fis longer)
# Python — Simple RLE encoding
def rle_encode(data):
encoded = ""
i = 0
while i < len(data):
count = 1
while i + count < len(data) and data[i + count] == data[i]:
count += 1
encoded += str(count) + data[i]
i += count
return encoded
def rle_decode(encoded):
decoded = ""
i = 0
while i < len(encoded):
count = ""
while encoded[i].isdigit():
count += encoded[i]
i += 1
decoded += encoded[i] * int(count)
i += 1
return decoded
original = "AAAAAABBBBCCDDDDDDD"
compressed = rle_encode(original)
print(f"Original: {original} ({len(original)} chars)")
print(f"Compressed: {compressed} ({len(compressed)} chars)")
print(f"Decompressed: {rle_decode(compressed)}")
' VB.NET — Simple RLE encoding
Function RLEEncode(data As String) As String
Dim encoded As String = ""
Dim i As Integer = 0
While i < data.Length
Dim count As Integer = 1
While i + count < data.Length AndAlso data(i + count) = data(i)
count += 1
End While
encoded &= count.ToString() & data(i)
i += count
End While
Return encoded
End Function
Dim original As String = "AAAAAABBBBCCDDDDDDD"
Dim compressed As String = RLEEncode(original)
Console.WriteLine("Original: " & original & " (" & original.Length & " chars)")
Console.WriteLine("Compressed: " & compressed & " (" & compressed.Length & " chars)")
Dictionary-Based (LZ) Compression
Dictionary-based compression (named after Lempel and Ziv — LZ77, LZ78, LZW) builds a dictionary of recurring patterns in the data and replaces them with shorter codes.
How It Works
- The algorithm scans the data and identifies repeated sequences
- Each unique sequence is added to a dictionary with a short index code
- Subsequent occurrences of the sequence are replaced with the dictionary code
- The dictionary is stored with the compressed data so it can be decoded
Worked Example
Original text: THE CAT SAT ON THE MAT
| Dictionary Index | Entry |
|---|---|
| 0 | THE |
| 1 | CAT |
| 2 | SAT |
| 3 | ON |
| 4 | MAT |
Encoded: 0 1 2 3 0 4 (plus spaces/structure)
The word “THE” appears twice but is only stored once in the dictionary. The second occurrence is replaced by its index code (0), saving space.
Key characteristics:
- Lossless — original data is perfectly reconstructed using the dictionary
- More effective than RLE for text and general-purpose compression
- Used in ZIP, GIF, PNG, and PDF formats
- Effectiveness depends on how much repetition exists in the data
Huffman Encoding
Huffman encoding is a lossless algorithm that assigns variable-length binary codes to characters based on their frequency. More frequent characters get shorter codes, less frequent characters get longer codes.
How It Works
- Count the frequency of each character in the data
- Build a binary tree by repeatedly combining the two lowest-frequency characters
- Assign binary codes by traversing the tree (left = 0, right = 1)
- Replace each character with its Huffman code
Worked Example
Text: AABBBCCCCDDDD (14 characters)
Step 1 — Frequency count:
| Character | Frequency |
|---|---|
| A | 2 |
| B | 3 |
| C | 4 |
| D | 5 |
Step 2 — Build the tree (combine two smallest repeatedly):
- Combine A(2) + B(3) = AB(5)
- Combine C(4) + AB(5) = CAB(9)
- Combine D(5) + CAB(9) = Root(14)
Step 3 — Assign codes (traversing the tree):
| Character | Code | Code Length |
|---|---|---|
| D | 0 | 1 bit |
| C | 10 | 2 bits |
| A | 111 | 3 bits |
| B | 110 | 3 bits |
Step 4 — Encode:
AABBBCCCCDDDD → 111 111 110 110 110 10 10 10 10 0 0 0 0 0
Space calculation:
- Using fixed-length codes (e.g. ASCII 7-bit): 14 × 7 = 98 bits
- Using Huffman codes: (2×3) + (3×3) + (4×2) + (5×1) = 6 + 9 + 8 + 5 = 28 bits
- Saving: (98 - 28) / 98 × 100 = 71.4%
(The dictionary must also be stored, adding some overhead, but for large files the savings are substantial.)
In exam questions on Huffman coding, you may be asked to: (1) build a frequency table, (2) construct the Huffman tree, (3) determine the codes for each character, or (4) calculate the space saving compared to fixed-length encoding. Always show your working clearly. Remember that more frequent characters get shorter codes — this is the key insight that makes Huffman efficient.
Compression File Formats Summary
| Format | Type | Algorithm | Typical Use |
|---|---|---|---|
| ZIP | Lossless | LZ + Huffman (DEFLATE) | General file archiving |
| PNG | Lossless | LZ + filtering | Web graphics, screenshots, diagrams |
| GIF | Lossless | LZW | Simple animations, icons |
| FLAC | Lossless | Linear prediction + Rice coding | High-quality audio archiving |
| 7Z | Lossless | LZMA | High-ratio file archiving |
| JPEG | Lossy | DCT + quantisation + Huffman | Photographs |
| MP3 | Lossy | Psychoacoustic model + Huffman | Music, podcasts |
| MP4/MPEG | Lossy | Motion estimation + DCT | Video |
| AAC | Lossy | Improved psychoacoustic model | Music streaming (Apple, YouTube) |
When recommending a compression method, consider the type of data and whether quality loss is acceptable. Text files and program code must use lossless compression (any data loss would corrupt them). Photos and music can use lossy compression because small quality reductions are often imperceptible.
Test Data
To test a program thoroughly, you need to select appropriate test data. Test data should be carefully chosen to check all possible paths through the program and to identify potential errors. There are three main categories.
Normal (Valid) Data
- Data that is within the expected range and of the correct format.
- The system should accept this data and process it correctly.
- Tests that the program works under normal operating conditions.
Boundary (Extreme) Data
- Data that is at the very edges of the valid range.
- This is where errors most commonly occur (off-by-one errors).
- You should test values at the boundary itself and values just inside and just outside the boundary.
Erroneous (Invalid) Data
- Data that is outside the expected range, of the wrong type, or otherwise invalid.
- The system should reject this data gracefully without crashing.
- Tests the robustness of input validation.
Example: Age Input (Valid Range 18–65)
| Test Data | Type | Expected Result | Reason |
|---|---|---|---|
| 25 | Normal | Accepted | Within valid range |
| 40 | Normal | Accepted | Within valid range |
| 18 | Boundary | Accepted | At lower boundary (minimum valid) |
| 17 | Boundary | Rejected | Just below lower boundary |
| 65 | Boundary | Accepted | At upper boundary (maximum valid) |
| 66 | Boundary | Rejected | Just above upper boundary |
| -5 | Erroneous | Rejected | Negative number, outside range |
| 200 | Erroneous | Rejected | Far outside valid range |
| “abc” | Erroneous | Rejected | Wrong data type (string, not integer) |
| 18.5 | Erroneous | Rejected | Real number, not integer |
| ”” (empty) | Erroneous | Rejected | No data entered |
When creating test data in the exam, always include at least one example of each type: normal, boundary, and erroneous. For boundary data, test the value at the boundary AND the values either side of it (e.g. for a range of 1–100, test 0, 1, 100, and 101). Some mark schemes consider boundary data that falls just outside the range as erroneous – include both to be safe.
Trace Tables
A trace table is a technique used to dry-run an algorithm by recording the values of variables at each step of execution. It is used to check the correctness of an algorithm and to understand its behaviour.
How to Create a Trace Table
- Create a column for each variable in the algorithm.
- Add a column for any output.
- Add a column for the condition being tested (optional but helpful).
- Work through the algorithm step by step, recording the value of each variable after every instruction that changes it.
Example: Tracing a Linear Search
Algorithm to search for the value 5 in the list [3, 8, 5, 1]:
data = [3, 8, 5, 1]
target = 5
FOR i FROM 0 TO 3
IF data[i] == target THEN
OUTPUT i
END IF
END FOR
Trace table:
| Step | i | data[i] | data[i] == target? | Output |
|---|---|---|---|---|
| 1 | 0 | 3 | No | – |
| 2 | 1 | 8 | No | – |
| 3 | 2 | 5 | Yes | 2 |
| 4 | 3 | 1 | No | – |
Example: Tracing a Bubble Sort Pass
First pass of bubble sort on [5, 3, 8, 1]:
| Step | j | data[j] | data[j+1] | Swap? | List after step |
|---|---|---|---|---|---|
| 1 | 0 | 5 | 3 | Yes | [3, 5, 8, 1] |
| 2 | 1 | 5 | 8 | No | [3, 5, 8, 1] |
| 3 | 2 | 8 | 1 | Yes | [3, 5, 1, 8] |
After the first pass, the largest value (8) has “bubbled” to the end.
Example: Tracing Recursion
Tracing factorial_recursive(4):
| Call | n | n == 0? | Returns |
|---|---|---|---|
| factorial_recursive(4) | 4 | No | 4 * factorial_recursive(3) |
| factorial_recursive(3) | 3 | No | 3 * factorial_recursive(2) |
| factorial_recursive(2) | 2 | No | 2 * factorial_recursive(1) |
| factorial_recursive(1) | 1 | No | 1 * factorial_recursive(0) |
| factorial_recursive(0) | 0 | Yes | 1 |
Now the returns unwind: 1 * 1 = 1, then 2 * 1 = 2, then 3 * 2 = 6, then 4 * 6 = 24.
Trace tables are a very common exam question. Practise by tracing through algorithms by hand. Always work methodically – update one variable at a time and do not skip steps. For recursive algorithms, show the chain of calls and the order in which values are returned.
Dry Running / Desk Checking
Dry running (also called desk checking) is the process of manually working through a program on paper, without using a computer, to check that it produces the correct results.
How to Dry Run a Program
- Write out the variable names as column headings in a trace table.
- Work through the code line by line, updating the variable values in the table.
- For each decision (IF statement), evaluate the condition and follow the correct branch.
- For each loop, track the loop counter and repeat until the exit condition is met.
- Compare the final output with the expected output.
Benefits of Dry Running
- Can identify logic errors before the program is even entered into a computer.
- Helps programmers understand the flow of execution through complex code.
- Useful during exams where you may not have access to a computer.
- Forces careful, systematic analysis of the algorithm.
Example: Dry Run
numbers = [4, 7, 2, 9, 1]
largest = numbers[0]
for i in range(1, len(numbers)):
if numbers[i] > largest:
largest = numbers[i]
print(largest)
Dim numbers() As Integer = {4, 7, 2, 9, 1}
Dim largest As Integer = numbers(0)
For i As Integer = 1 To numbers.Length - 1
If numbers(i) > largest Then
largest = numbers(i)
End If
Next
Console.WriteLine(largest)
Trace table:
| i | numbers(i) | numbers(i) > largest? | largest |
|---|---|---|---|
| - | - | - | 4 |
| 1 | 7 | 7 > 4? Yes | 7 |
| 2 | 2 | 2 > 7? No | 7 |
| 3 | 9 | 9 > 7? Yes | 9 |
| 4 | 1 | 1 > 9? No | 9 |
Output: 9
Dry running (desk checking) is manually working through a program on paper, line by line, to verify that it produces the correct output. A trace table is a tabular record of variable values at each step of execution.
Error Types
Errors in programs fall into three main categories. Understanding the differences is essential for effective debugging.
Syntax Errors
A syntax error occurs when the code violates the rules (grammar) of the programming language. The program will not compile or run until syntax errors are fixed.
# Syntax errors in Python
print("Hello" # Missing closing bracket
if x == 5 # Missing colon
prit("Five") # Misspelled keyword (not caught as syntax but as NameError)
for i in range(10) # Missing colon
' Syntax errors in VB.NET
Console.WriteLine("Hello" ' Missing closing bracket
If x = 5 ' Missing Then keyword
Console.WrteLine("Five") ' Misspelled method name
For i As Integer = 1 To 10 ' Missing Next at the end
Characteristics:
- Detected by the compiler/interpreter before the program runs.
- Usually easy to find because the error message indicates the line and type of error.
- The program cannot execute until all syntax errors are corrected.
Logic Errors
A logic error occurs when the program runs without crashing but produces incorrect results. The code is syntactically valid but the algorithm is wrong.
# Logic error: using + instead of *
def calculate_area(length, width):
return length + width # Should be length * width
# Logic error: wrong comparison operator
def is_adult(age):
return age > 18 # Should be age >= 18 (18-year-olds are adults)
# Logic error: off-by-one error in loop
total = 0
for i in range(1, 10): # Misses 10; should be range(1, 11)
total = total + i
' Logic error: using + instead of *
Function CalculateArea(length As Double, width As Double) As Double
Return length + width ' Should be length * width
End Function
' Logic error: wrong comparison operator
Function IsAdult(age As Integer) As Boolean
Return age > 18 ' Should be age >= 18
End Function
' Logic error: off-by-one error in loop
Dim total As Integer = 0
For i As Integer = 1 To 9 ' Misses 10; should be 1 To 10
total = total + i
Next
Characteristics:
- NOT detected by the compiler/interpreter – the program runs without error messages.
- The program produces incorrect or unexpected output.
- The hardest type of error to find because there are no error messages pointing to the problem.
- Found through testing (using test data and comparing actual output to expected output) or dry running.
Runtime Errors
A runtime error occurs during program execution, causing the program to crash or terminate abnormally. The code is syntactically correct but encounters a problem that cannot be handled during execution.
# Runtime errors in Python
numbers = [1, 2, 3]
print(numbers[5]) # IndexError: list index out of range
result = 10 / 0 # ZeroDivisionError: division by zero
age = int("hello") # ValueError: invalid literal for int()
file = open("nonexistent.txt") # FileNotFoundError
' Runtime errors in VB.NET
Dim numbers() As Integer = {1, 2, 3}
Console.WriteLine(numbers(5)) ' IndexOutOfRangeException
Dim result As Integer = 10 \ 0 ' DivideByZeroException
Dim age As Integer = CInt("hello") ' InvalidCastException / FormatException
Dim file As String = IO.File.ReadAllText("nonexistent.txt") ' FileNotFoundException
Characteristics:
- NOT detected at compile time – the program starts running but crashes during execution.
- An error message is displayed indicating the type and location of the error.
- Often caused by unexpected input, missing resources, or edge cases.
- Can be prevented using input validation, error handling (try/catch), and defensive programming.
Comparison of Error Types
| Feature | Syntax Error | Logic Error | Runtime Error |
|---|---|---|---|
| When detected | Compile/interpret time | During testing or use | During execution |
| Program runs? | No | Yes | Yes, but crashes |
| Error message? | Yes (from compiler) | No | Yes (when it crashes) |
| Difficulty to find | Easy | Hard | Medium |
| Example | Missing bracket | Wrong formula | Division by zero |
| How to fix | Read error message, fix syntax | Test with data, use trace table | Add validation, use try/catch |
The exam often gives a piece of code and asks you to identify the type of error. Remember: if the code would not run at all, it is a syntax error. If it runs but gives the wrong answer, it is a logic error. If it runs but crashes at some point during execution, it is a runtime error. Be prepared to identify the specific error and explain how to fix it.