Algorithms & Programs

AS Level — Unit 1: Fundamentals of Computer Science

What Is an Algorithm?

An algorithm is a finite, ordered sequence of unambiguous instructions that, given a set of inputs, produces a defined set of outputs and terminates in a finite amount of time.

An algorithm is essentially a step-by-step recipe for solving a problem. Before any code is written, a programmer designs an algorithm to describe the logic of the solution. Algorithms are language-independent – the same algorithm can be implemented in Python, VB.NET, Java, or any other language.

Characteristics of an Algorithm

Input – An algorithm has zero or more inputs taken from a specified set.
Output – An algorithm produces one or more outputs that have a defined relationship to the inputs.
Finiteness – An algorithm must always terminate after a finite number of steps.
Definiteness – Each step must be precisely and unambiguously defined.
Effectiveness – Every step must be basic enough that it could, in principle, be carried out by a person using pencil and paper.

Why Algorithms Matter

They allow us to plan a solution before writing code.
They help us communicate solutions to other programmers regardless of programming language.
They allow us to analyse the efficiency of a solution before implementing it.
They form the basis for correctness proofs and testing strategies.

Representing Algorithms

Before implementing an algorithm in code, we represent it using one of three standard methods.

Pseudocode

Pseudocode is a structured, informal way of writing algorithms that uses natural language mixed with programming-like constructs. It is not tied to any specific programming language and is used to plan and communicate algorithms clearly.

Why Use Pseudocode?

It allows you to focus on the logic of an algorithm without worrying about syntax.
It is language-independent, so anyone can understand it regardless of their programming background.
It is quicker to write than actual code during the design phase.
It can be easily translated into any programming language.

Common Pseudocode Conventions

Variable assignment:

SET total TO 0
SET name TO "Alice"

Input and output:

INPUT age
OUTPUT "Your age is: ", age

Selection (IF/ELSE):

IF age >= 18 THEN
    OUTPUT "Adult"
ELSE IF age >= 13 THEN
    OUTPUT "Teenager"
ELSE
    OUTPUT "Child"
END IF

Iteration (FOR loop):

FOR i FROM 1 TO 10
    OUTPUT i
END FOR

Iteration (WHILE loop):

SET password TO ""
WHILE password != "secret"
    INPUT password
END WHILE

Iteration (REPEAT-UNTIL loop):

REPEAT
    INPUT number
UNTIL number >= 0

Subroutines:

FUNCTION calculateArea(length, width)
    RETURN length * width
END FUNCTION

PROCEDURE displayMessage(message)
    OUTPUT message
END PROCEDURE

Arrays:

SET marks TO [45, 67, 89, 32, 78]
OUTPUT marks[0]
SET marks[2] TO 90

Key Principles of Good Pseudocode

Use consistent indentation to show the structure of loops and selections.
Use meaningful variable names (e.g. totalScore not x).
Use capital letters for keywords (SET, IF, THEN, ELSE, WHILE, FOR, REPEAT, UNTIL, INPUT, OUTPUT, RETURN, FUNCTION, PROCEDURE, END).
Keep it simple and readable – anyone should be able to follow the logic.
Do not include language-specific syntax (e.g. do not use : or {} or Dim).

In the WJEC exam, pseudocode does not need to follow an exact standard, but it must be clear and unambiguous. Always use indentation and keywords consistently. If the examiner cannot follow your logic, you will lose marks. Write pseudocode as if someone else needs to implement it in any language.

Flowcharts

A flowchart is a diagrammatic representation of an algorithm. It uses standard symbols connected by arrows to show the flow of control through a program.

Standard Flowchart Symbols

Symbol	Shape	Purpose
Terminator	Rounded rectangle (oval)	Marks the start or end of the flowchart. Labelled “Start” or “End” (or “Stop”).
Process	Rectangle	Represents a step or action (e.g. “Set total to 0”, “Calculate tax”).
Input/Output	Parallelogram	Represents input from the user or output to the screen (e.g. “Input name”, “Display result”).
Decision	Diamond	Represents a decision or condition with two outcomes: Yes/No or True/False. Must have exactly two exit paths.
Flow line	Arrow	Shows the direction of flow from one symbol to the next.
Subroutine	Rectangle with double vertical lines on sides	Represents a call to a pre-defined subroutine (function or procedure).

Rules for Drawing Flowcharts

Every flowchart must have exactly one Start terminator.
Every flowchart must have at least one End terminator.
Flow lines must have arrows showing direction.
Decision diamonds must have exactly two labelled exit paths (Yes/No or True/False).
All symbols must be connected – there should be no orphaned symbols.
Flow should generally go from top to bottom and left to right.

Example: Check if a Number is Positive, Negative, or Zero

flowchart TD
    A([Start]) --> B[/Input number/]
    B --> C{"number > 0?"}
    C -- Yes --> D[/Output 'Positive'/]
    C -- No --> E{"number < 0?"}
    E -- Yes --> F[/Output 'Negative'/]
    E -- No --> G[/Output 'Zero'/]
    D --> H([End])
    F --> H
    G --> H

Example: Validate Password Entry (Loop)

flowchart TD
    A([Start]) --> B[/Input password/]
    B --> C{"password = 'abc123'?"}
    C -- Yes --> D[/Output 'Access Granted'/]
    D --> E([End])
    C -- No --> F[/Output 'Incorrect, try again'/]
    F --> B

Example: Calculate Sum of Numbers 1 to 10

flowchart TD
    A([Start]) --> B[Set total = 0]
    B --> C[Set i = 1]
    C --> D{"i <= 10?"}
    D -- No --> E[/Output total/]
    E --> F([End])
    D -- Yes --> G[total = total + i]
    G --> H[i = i + 1]
    H --> D

A flowchart is a visual representation of an algorithm using standard symbols (terminators, processes, decisions, input/output) connected by arrows to show the flow of control.

In the exam, always use the correct shapes for each symbol. The most common mistake is using a rectangle for a decision instead of a diamond, or forgetting to label the Yes/No paths on a decision. Practice drawing flowcharts neatly by hand. Every decision must have exactly two clearly labelled exits.

Structured English

Structured English uses a restricted subset of natural language combined with programming constructs. It avoids technical jargon and is useful for communicating with non-technical stakeholders.

Example:

1. Start with the first item in the list.
2. Compare it with every other item.
3. If a larger item is found, remember it as the new largest.
4. After checking all items, display the largest item.

In the exam, you may be asked to write an algorithm in pseudocode or to trace through a flowchart. Make sure you are comfortable with all three representations and can convert between them.

Constants and Their Importance

A constant is a named value that is set once and cannot be changed during program execution. Constants are declared in a similar way to variables but their value is fixed.

# Constants (Python convention: use UPPER_CASE names)
VAT_RATE = 0.20
MAX_STUDENTS = 30
SCHOOL_NAME = "Greenfield Academy"
PI = 3.14159265

def calculate_price(base_price):
    return base_price * (1 + VAT_RATE)

' Constants (VB.NET uses the Const keyword)
Const VAT_RATE As Double = 0.2
Const MAX_STUDENTS As Integer = 30
Const SCHOOL_NAME As String = "Greenfield Academy"
Const PI As Double = 3.14159265

Function CalculatePrice(basePrice As Double) As Double
    Return basePrice * (1 + VAT_RATE)
End Function

Why Use Constants?

Reason	Explanation
Readability	`VAT_RATE` is more meaningful than `0.20` scattered through the code
Maintainability	If the VAT rate changes, you only need to update it in one place
Prevents accidental modification	The compiler/interpreter will raise an error if you try to change a constant
Self-documenting	The constant name explains what the value represents
Reduces errors	Avoids mistyping a value in multiple places (e.g. typing `0.02` instead of `0.20`)

A constant is a named data item whose value is fixed at the time of declaration and cannot be changed during program execution. Constants improve readability, maintainability, and reduce errors.

Naming Conventions and Self-Documenting Code

Naming Conventions

Good naming conventions make code easier to read and maintain. Different conventions are used in different languages, but consistency is key.

Convention	Style	Example	Typically Used In
camelCase	First word lowercase, subsequent words capitalised	`studentName`, `totalScore`	Python variables, JavaScript
PascalCase	Every word capitalised	`StudentName`, `CalculateTotal`	VB.NET methods and classes
snake_case	All lowercase, words separated by underscores	`student_name`, `total_score`	Python functions and variables
UPPER_SNAKE_CASE	All uppercase, words separated by underscores	`MAX_SIZE`, `VAT_RATE`	Constants in most languages

Rules for Good Variable Names

Use meaningful, descriptive names: total_price not tp or x.
Avoid single-letter names except for simple loop counters (e.g. i, j).
Do not use reserved words (e.g. print, if, for).
Start with a letter, not a number or special character.
Be consistent with your chosen convention throughout the program.

Self-Documenting Code

Self-documenting code is code that is written so clearly that it explains itself without needing extensive comments. This is achieved through:

Meaningful variable and subroutine names.
Clear, logical program structure.
Appropriate use of constants instead of “magic numbers”.
Short subroutines that each do one thing well.

# Poor naming - not self-documenting
def calc(a, b):
    c = a * b * 0.2
    return a * b + c

# Good naming - self-documenting
def calculate_total_with_vat(quantity, unit_price):
    vat = quantity * unit_price * VAT_RATE
    return quantity * unit_price + vat

' Poor naming - not self-documenting
Function Calc(a As Integer, b As Double) As Double
    Dim c As Double = a * b * 0.2
    Return a * b + c
End Function

' Good naming - self-documenting
Function CalculateTotalWithVAT(quantity As Integer, unitPrice As Double) As Double
    Dim vat As Double = quantity * unitPrice * VAT_RATE
    Return quantity * unitPrice + vat
End Function

In the exam, if you are asked to write code, always use meaningful variable names. Writing x = a + b when you mean total_price = subtotal + delivery_charge will cost you marks. Similarly, if asked about self-documenting code, explain that it uses meaningful names and clear structure so that the purpose of the code is obvious without needing comments.

Local and Global Variables

Local Variables

A local variable is declared inside a subroutine and can only be accessed from within that subroutine. It is created when the subroutine starts and destroyed when the subroutine ends.

def calculate_total():
    # 'subtotal' and 'tax' are local variables
    subtotal = 100.00
    tax = subtotal * 0.20
    total = subtotal + tax
    return total

result = calculate_total()
print(result)        # Works: prints 120.0
# print(subtotal)    # Error! 'subtotal' is not defined here

Function CalculateTotal() As Double
    ' subtotal and tax are local variables
    Dim subtotal As Double = 100.0
    Dim tax As Double = subtotal * 0.2
    Dim total As Double = subtotal + tax
    Return total
End Function

Dim result As Double = CalculateTotal()
Console.WriteLine(result)     ' Works: prints 120
' Console.WriteLine(subtotal) ' Error! subtotal is not accessible here

Global Variables

A global variable is declared outside any subroutine and can be accessed from anywhere in the program.

# Global variable
score = 0

def add_points(points):
    global score    # Must use 'global' keyword in Python to modify a global variable
    score = score + points

def display_score():
    print(f"Current score: {score}")

add_points(10)
add_points(5)
display_score()   # Output: Current score: 15

Module Program
    ' Global variable
    Dim score As Integer = 0

    Sub AddPoints(points As Integer)
        score = score + points
    End Sub

    Sub DisplayScore()
        Console.WriteLine($"Current score: {score}")
    End Sub

    Sub Main()
        AddPoints(10)
        AddPoints(5)
        DisplayScore()   ' Output: Current score: 15
    End Sub
End Module

Why Local Variables Are Preferred

Reason	Explanation
Avoids naming conflicts	Two subroutines can use the same variable name without conflict
Reduces side effects	Changes in one subroutine cannot accidentally affect another
Easier debugging	You only need to look at the subroutine to understand the variable’s behaviour
Memory efficient	Local variables are destroyed when the subroutine ends, freeing memory
Portability	Subroutines with local variables are self-contained and can be reused elsewhere

A local variable is declared inside a subroutine and is only accessible within that subroutine. A global variable is declared outside all subroutines and is accessible throughout the entire program. Best practice is to use local variables wherever possible and pass data between subroutines using parameters.

Scope and Lifetime

Scope

The scope of a variable refers to the region of the program where it can be accessed. A variable can only be used within its scope.

Local scope: the variable is accessible only within the subroutine where it is declared.
Global scope: the variable is accessible from anywhere in the program.

If a local variable has the same name as a global variable, the local variable takes precedence within its subroutine (this is called shadowing).

name = "Global"    # Global scope

def greet():
    name = "Local"  # Local scope - shadows the global variable
    print(name)     # Prints "Local"

greet()
print(name)         # Prints "Global" - global variable is unaffected

Module Program
    Dim name As String = "Global"  ' Global scope

    Sub Greet()
        Dim name As String = "Local"  ' Local scope - shadows the global variable
        Console.WriteLine(name)        ' Prints "Local"
    End Sub

    Sub Main()
        Greet()
        Console.WriteLine(name)        ' Prints "Global"
    End Sub
End Module

Lifetime

The lifetime of a variable is the period during which it exists in memory.

Local variables exist from the moment the subroutine is called until the subroutine finishes. They are created and destroyed each time the subroutine runs.
Global variables exist for the entire duration of the program’s execution.

Concept	Local Variable	Global Variable
Scope	Within the subroutine only	Entire program
Lifetime	Created when subroutine starts; destroyed when it ends	Created when program starts; destroyed when it ends
Initialisation	Reinitialised each time the subroutine is called	Initialised once when the program starts

Be careful not to confuse scope and lifetime. Scope is about where a variable can be accessed. Lifetime is about how long the variable exists in memory. A local variable’s scope is its subroutine; its lifetime is the duration of that subroutine’s execution.

Parameter Passing

Parameters are the values passed into a subroutine when it is called. There are two main mechanisms for passing parameters: by value and by reference.

Passing by Value

A copy of the data is passed to the subroutine.
Any changes made to the parameter inside the subroutine have no effect on the original variable in the calling code.
This is the default in most languages (including Python for immutable types and VB.NET with ByVal).
Safer because the original data is protected from accidental modification.

# Pass by value (Python passes immutable types like int by value effectively)
def double_value(num):
    num = num * 2
    print(f"Inside function: {num}")

original = 5
double_value(original)
print(f"Outside function: {original}")
# Output:
# Inside function: 10
# Outside function: 5   <-- original is unchanged

' Pass by value (ByVal is the default in VB.NET)
Sub DoubleValue(ByVal num As Integer)
    num = num * 2
    Console.WriteLine($"Inside sub: {num}")
End Sub

Dim original As Integer = 5
DoubleValue(original)
Console.WriteLine($"Outside sub: {original}")
' Output:
' Inside sub: 10
' Outside sub: 5   <-- original is unchanged

Passing by Reference

A reference (memory address) to the original data is passed to the subroutine.
Any changes made to the parameter inside the subroutine directly affect the original variable in the calling code.
Used when you need the subroutine to modify the original variable.
Must be used carefully as it can lead to unexpected side effects.

# Pass by reference (Python passes mutable types like lists by reference)
def add_item(shopping_list, item):
    shopping_list.append(item)

my_list = ["bread", "milk"]
add_item(my_list, "eggs")
print(my_list)
# Output: ['bread', 'milk', 'eggs']  <-- original list is modified

' Pass by reference (ByRef keyword in VB.NET)
Sub DoubleValue(ByRef num As Integer)
    num = num * 2
    Console.WriteLine($"Inside sub: {num}")
End Sub

Dim original As Integer = 5
DoubleValue(original)
Console.WriteLine($"Outside sub: {original}")
' Output:
' Inside sub: 10
' Outside sub: 10   <-- original IS changed

Comparison

Feature	By Value	By Reference
What is passed	A copy of the data	A reference to the original data
Effect on original	Original is unchanged	Original can be changed
Safety	Safer – protects original data	Riskier – can cause side effects
Memory	Uses more memory (creates a copy)	Uses less memory (no copy made)
VB.NET keyword	`ByVal` (default)	`ByRef`
When to use	When the subroutine only needs to read the value	When the subroutine needs to modify the original variable

A common exam question is to trace through code that uses ByVal and ByRef and predict the output. Remember: with ByVal, changes inside the subroutine are discarded when the subroutine ends. With ByRef, changes persist because the subroutine operates on the original variable.

Mathematical Operations: DIV and MOD

DIV (integer division) returns the whole number result of dividing two integers, discarding the remainder. MOD (modulus) returns the remainder after integer division.

Expression	Result	Explanation
`17 DIV 5`	3	17 ÷ 5 = 3 remainder 2
`17 MOD 5`	2	17 ÷ 5 = 3 remainder 2
`20 MOD 4`	0	Exactly divisible — no remainder
`7 MOD 2`	1	Odd number — remainder 1
`100 DIV 7`	14	100 ÷ 7 = 14 remainder 2

# Python — DIV and MOD
number = 17
quotient  = number // 5    # DIV — result: 3
remainder = number % 5     # MOD — result: 2

# Check if even or odd
if number % 2 == 0:
    print("Even")
else:
    print("Odd")

# Convert 137 minutes into hours and minutes
hours   = 137 // 60   # DIV — result: 2
minutes = 137 % 60    # MOD — result: 17
print(f"{hours}h {minutes}m")   # Output: 2h 17m

' VB.NET — DIV and MOD
Dim number As Integer = 17
Dim quotient  As Integer = number \ 5    ' DIV — result: 3
Dim remainder As Integer = number Mod 5  ' MOD — result: 2

' Check if even or odd
If number Mod 2 = 0 Then
    Console.WriteLine("Even")
Else
    Console.WriteLine("Odd")
End If

' Convert 137 minutes into hours and minutes
Dim hours   As Integer = 137 \ 60    ' DIV — result: 2
Dim minutes As Integer = 137 Mod 60  ' MOD — result: 17
Console.WriteLine(hours & "h " & minutes & "m")  ' Output: 2h 17m

Common Uses

Task	DIV / MOD	Example
Check if even/odd	`n MOD 2`	`7 MOD 2 = 1` → odd
Convert minutes to hours	`mins DIV 60` and `mins MOD 60`	`137 DIV 60 = 2`, `137 MOD 60 = 17`
Extract last digit	`n MOD 10`	`1234 MOD 10 = 4`
Remove last digit	`n DIV 10`	`1234 DIV 10 = 123`
Wrap around (circular buffer)	`(index + 1) MOD size`	Used in circular queues

DIV and MOD appear frequently in exam questions. The most common tasks are: checking even/odd (n MOD 2), converting units (minutes DIV 60 and minutes MOD 60), and extracting or removing digits. Note that in Python, DIV is // and MOD is %. In VB.NET, DIV is \ (backslash) and MOD is Mod.

Bubble Sort

Bubble sort is a simple sorting algorithm that repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order. The largest unsorted value “bubbles up” to the end of the list on each pass.

How It Works

Start at the beginning of the list.
Compare the first two adjacent elements. If the first is greater than the second, swap them.
Move to the next pair and repeat the comparison and swap.
Continue until the end of the list – this completes one pass. The largest element is now in its correct position.
Repeat passes for the remaining unsorted portion of the list.
If a complete pass is made with no swaps, the list is sorted and the algorithm terminates early.

Code Examples

def bubble_sort(data):
    n = len(data)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
                swapped = True
        if not swapped:
            break  # List is already sorted
    return data

# Example usage
numbers = [64, 34, 25, 12, 22, 11, 90]
print(bubble_sort(numbers))
# Output: [11, 12, 22, 25, 34, 64, 90]

Sub BubbleSort(data() As Integer)
    Dim n As Integer = data.Length
    For i As Integer = 0 To n - 2
        Dim swapped As Boolean = False
        For j As Integer = 0 To n - 2 - i
            If data(j) > data(j + 1) Then
                Dim temp As Integer = data(j)
                data(j) = data(j + 1)
                data(j + 1) = temp
                swapped = True
            End If
        Next
        If Not swapped Then
            Exit For  ' List is already sorted
        End If
    Next
End Sub

' Example usage
Dim numbers() As Integer = {64, 34, 25, 12, 22, 11, 90}
BubbleSort(numbers)
For Each num As Integer In numbers
    Console.Write(num & " ")
Next
' Output: 11 12 22 25 34 64 90

Performance

Best case: O(n) – the list is already sorted (only one pass with no swaps needed).
Worst case: O(n^2) – the list is in reverse order.
Average case: O(n^2).
Space complexity: O(1) – it sorts in place (no additional memory needed beyond a temporary swap variable).

Comparing Swapping Sorted

Press Step or Play to begin.

Speed

Insertion Sort

Insertion sort builds the sorted list one element at a time. It takes each element from the unsorted portion and inserts it into its correct position within the sorted portion, shifting elements as necessary.

How It Works

Consider the first element as a sorted sub-list of one element.
Take the next element (the “key”) from the unsorted portion.
Compare the key with each element in the sorted portion, moving from right to left.
Shift elements in the sorted portion one position to the right to make space.
Insert the key into its correct position.
Repeat until all elements have been processed.

Code Examples

def insertion_sort(data):
    for i in range(1, len(data)):
        key = data[i]
        j = i - 1
        while j >= 0 and data[j] > key:
            data[j + 1] = data[j]  # Shift element to the right
            j -= 1
        data[j + 1] = key  # Insert key in correct position
    return data

# Example usage
numbers = [12, 11, 13, 5, 6]
print(insertion_sort(numbers))
# Output: [5, 6, 11, 12, 13]

Sub InsertionSort(data() As Integer)
    For i As Integer = 1 To data.Length - 1
        Dim key As Integer = data(i)
        Dim j As Integer = i - 1
        While j >= 0 AndAlso data(j) > key
            data(j + 1) = data(j)  ' Shift element to the right
            j -= 1
        End While
        data(j + 1) = key  ' Insert key in correct position
    Next
End Sub

' Example usage
Dim numbers() As Integer = {12, 11, 13, 5, 6}
InsertionSort(numbers)
For Each num As Integer In numbers
    Console.Write(num & " ")
Next
' Output: 5 6 11 12 13

Performance

Best case: O(n) – the list is already sorted (no shifts needed, just one comparison per element).
Worst case: O(n^2) – the list is in reverse order.
Average case: O(n^2).
Space complexity: O(1) – it sorts in place.

When Is Insertion Sort Useful?

When the data set is small.
When the data is nearly sorted (it performs very well in this case, approaching O(n)).
It is an online algorithm – it can sort a list as it receives new data.

Comparing Swapping/Shifting Sorted

Press Step or Play to begin.

Speed

Merge Sort

Merge Sort, Big O Notation, and the sections that follow are beyond the AS specification. They are included as extension reading for students progressing to A Level.

Merge sort is a divide and conquer sorting algorithm. It recursively splits the list into halves until each sub-list contains a single element, then merges the sub-lists back together in sorted order.

How It Works

Divide: Split the list into two halves.
Conquer: Recursively apply merge sort to each half.
Merge: Combine the two sorted halves into a single sorted list by comparing elements from each half and placing the smaller one first.

Worked example with [38, 27, 43, 3]:

Step 1 (Divide):  [38, 27, 43, 3]
                  /                \
             [38, 27]           [43, 3]
             /      \           /      \
           [38]    [27]       [43]    [3]

Step 2 (Merge):
           [38]    [27]       [43]    [3]
             \      /           \      /
             [27, 38]           [3, 43]
                  \                /
              [3, 27, 38, 43]

Code Examples

def merge_sort(data):
    if len(data) <= 1:
        return data

    mid = len(data) // 2
    left_half = merge_sort(data[:mid])
    right_half = merge_sort(data[mid:])

    return merge(left_half, right_half)

def merge(left, right):
    result = []
    i = 0
    j = 0

    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1

    # Add any remaining elements
    result.extend(left[i:])
    result.extend(right[j:])
    return result

# Example usage
numbers = [38, 27, 43, 3, 9, 82, 10]
print(merge_sort(numbers))
# Output: [3, 9, 10, 27, 38, 43, 82]

Function MergeSort(data() As Integer) As Integer()
    If data.Length <= 1 Then
        Return data
    End If

    Dim mid As Integer = data.Length \ 2
    Dim leftHalf(mid - 1) As Integer
    Dim rightHalf(data.Length - mid - 1) As Integer

    Array.Copy(data, 0, leftHalf, 0, mid)
    Array.Copy(data, mid, rightHalf, 0, data.Length - mid)

    leftHalf = MergeSort(leftHalf)
    rightHalf = MergeSort(rightHalf)

    Return Merge(leftHalf, rightHalf)
End Function

Function Merge(left() As Integer, right() As Integer) As Integer()
    Dim result(left.Length + right.Length - 1) As Integer
    Dim i As Integer = 0
    Dim j As Integer = 0
    Dim k As Integer = 0

    While i < left.Length AndAlso j < right.Length
        If left(i) <= right(j) Then
            result(k) = left(i)
            i += 1
        Else
            result(k) = right(j)
            j += 1
        End If
        k += 1
    End While

    While i < left.Length
        result(k) = left(i)
        i += 1
        k += 1
    End While

    While j < right.Length
        result(k) = right(j)
        j += 1
        k += 1
    End While

    Return result
End Function

' Example usage
Dim numbers() As Integer = {38, 27, 43, 3, 9, 82, 10}
Dim sorted() As Integer = MergeSort(numbers)
For Each num As Integer In sorted
    Console.Write(num & " ")
Next
' Output: 3 9 10 27 38 43 82

Performance

Best case: O(n log n).
Worst case: O(n log n).
Average case: O(n log n).
Space complexity: O(n) – it requires additional memory to hold the sub-lists during merging.

Comparing Swapping (right half) Sorted

Press Step or Play to begin.

Speed

Why O(n log n)?

The list is divided in half at each level of recursion, giving log n levels. At each level, all n elements must be compared during the merge step. Therefore, the total work is n multiplied by log n.

Comparison of Sorting Algorithms

Feature	Bubble Sort	Insertion Sort	Merge Sort
Best case	O(n)	O(n)	O(n log n)
Worst case	O(n^2)	O(n^2)	O(n log n)
Average case	O(n^2)	O(n^2)	O(n log n)
Space complexity	O(1) – in place	O(1) – in place	O(n) – extra memory needed
Stable?	Yes	Yes	Yes
Adaptive?	Yes (with early exit)	Yes (fast on nearly sorted data)	No
Suitable for	Small or nearly sorted data	Small or nearly sorted data	Large data sets
Approach	Comparison and swap	Shift and insert	Divide and conquer

A stable sorting algorithm preserves the relative order of elements with equal keys. All three algorithms covered here (bubble, insertion, merge) are stable.

In the exam you may be asked to recommend a sorting algorithm for a specific scenario. For large data sets, merge sort is preferred because its worst case is O(n log n). For small or nearly sorted data, insertion sort is often the best practical choice. Bubble sort is mainly used for educational purposes due to its simplicity.

Linear Search

Linear search (also called sequential search) is an algorithm that checks every element in a list, one by one, from the beginning until the target value is found or the end of the list is reached.

How It Works

Start at the first element of the list.
Compare the current element with the target value.
If they match, return the position (index) of the element.
If they do not match, move to the next element.
If the end of the list is reached without finding the target, report that the item was not found.

Code Examples

def linear_search(data, target):
    for i in range(len(data)):
        if data[i] == target:
            return i  # Return the index where target is found
    return -1  # Target not found

# Example usage
numbers = [4, 7, 2, 9, 1, 5, 8]
result = linear_search(numbers, 9)
if result != -1:
    print(f"Found at index {result}")
else:
    print("Not found")

Function LinearSearch(data() As Integer, target As Integer) As Integer
    For i As Integer = 0 To data.Length - 1
        If data(i) = target Then
            Return i  ' Return the index where target is found
        End If
    Next
    Return -1  ' Target not found
End Function

' Example usage
Dim numbers() As Integer = {4, 7, 2, 9, 1, 5, 8}
Dim result As Integer = LinearSearch(numbers, 9)
If result <> -1 Then
    Console.WriteLine("Found at index " & result)
Else
    Console.WriteLine("Not found")
End If

Performance

Best case: O(1) – the target is the first element.
Worst case: O(n) – the target is the last element or not present.
Average case: O(n) – on average, half the list is searched.

Advantages and Disadvantages

Advantages	Disadvantages
Simple to understand and implement	Slow for large data sets
Works on unsorted data	Checks every element in the worst case
Works on any data structure (arrays, linked lists)	Not efficient compared to binary search on sorted data

Binary Search

Binary search is a search algorithm that works on sorted data by repeatedly dividing the search interval in half. It compares the target value to the middle element and eliminates half of the remaining elements at each step.

How It Works

Set low to the first index and high to the last index.
Find the middle index: mid = (low + high) // 2.
If data[mid] equals the target, return mid.
If the target is less than data[mid], set high = mid - 1 (search the left half).
If the target is greater than data[mid], set low = mid + 1 (search the right half).
Repeat steps 2-5 until the target is found or low > high (target not present).

Code Examples

def binary_search(data, target):
    low = 0
    high = len(data) - 1

    while low <= high:
        mid = (low + high) // 2
        if data[mid] == target:
            return mid
        elif data[mid] < target:
            low = mid + 1
        else:
            high = mid - 1

    return -1  # Target not found

# Example usage
numbers = [1, 3, 5, 7, 9, 11, 13, 15]
result = binary_search(numbers, 7)
if result != -1:
    print(f"Found at index {result}")
else:
    print("Not found")

Function BinarySearch(data() As Integer, target As Integer) As Integer
    Dim low As Integer = 0
    Dim high As Integer = data.Length - 1

    While low <= high
        Dim mid As Integer = (low + high) \ 2
        If data(mid) = target Then
            Return mid
        ElseIf data(mid) < target Then
            low = mid + 1
        Else
            high = mid - 1
        End If
    End While

    Return -1  ' Target not found
End Function

' Example usage
Dim numbers() As Integer = {1, 3, 5, 7, 9, 11, 13, 15}
Dim result As Integer = BinarySearch(numbers, 7)
If result <> -1 Then
    Console.WriteLine("Found at index " & result)
Else
    Console.WriteLine("Not found")
End If

Performance

Best case: O(1) – the target is the middle element.
Worst case: O(log n) – the search space is halved at each step.
Average case: O(log n).

Why O(log n)?

Each comparison eliminates half the remaining elements. For a list of 1,000,000 items, binary search needs at most about 20 comparisons (since log2(1,000,000) is approximately 20). Linear search would need up to 1,000,000 comparisons.

Comparison of Searching Algorithms

Feature	Linear Search	Binary Search
Data must be sorted?	No	Yes
Best case	O(1)	O(1)
Worst case	O(n)	O(log n)
Average case	O(n)	O(log n)
Implementation complexity	Very simple	Slightly more complex
Suitable for	Small or unsorted data sets	Large, sorted data sets
Works on linked lists?	Yes (efficiently)	Not efficiently (no random access)

A common exam question asks you to explain when you would use each algorithm. If the data is unsorted and small, linear search is appropriate. If the data is sorted and large, binary search is far more efficient. Remember: binary search requires sorted data – if the data is unsorted, the cost of sorting it first must be considered.

Searching for —

Linear Search Comparisons: 0

—

Binary Search Comparisons: 0

Array must be sorted — ✓ already sorted

Checking Search window Eliminated Found

Speed

Big O Notation

Big O notation is a mathematical notation used to describe the upper bound of an algorithm’s time complexity or space complexity. It describes how the resource usage of an algorithm grows relative to the input size n, focusing on the worst-case scenario.

Big O ignores constants and lower-order terms. For example, an algorithm that takes 3n^2 + 5n + 2 steps is described as O(n^2), because the n^2 term dominates as n grows large.

Common Big O Complexities

Notation	Name	Description	Example
O(1)	Constant	The time taken does not depend on the input size.	Accessing an array element by index; checking if a number is odd/even.
O(log n)	Logarithmic	The time grows logarithmically. The algorithm halves the problem size at each step.	Binary search.
O(n)	Linear	The time grows in direct proportion to the input size.	Linear search; summing all elements in a list.
O(n log n)	Linearithmic	A combination of linear and logarithmic growth. Common in efficient sorting.	Merge sort; the best comparison-based sorts.
O(n^2)	Quadratic	The time grows as the square of the input size. Typically caused by nested loops.	Bubble sort; insertion sort (worst case).
O(2^n)	Exponential	The time doubles with each additional input element. Very slow for even moderate n.	Recursive Fibonacci (naive); brute-force subset enumeration.
O(n!)	Factorial	The time grows factorially. Extremely slow; only practical for very small n.	Travelling Salesman Problem (brute-force); generating all permutations.

Growth Rate Comparison

For an input of size n = 1,000:

Complexity	Approximate Operations
O(1)	1
O(log n)	10
O(n)	1,000
O(n log n)	10,000
O(n^2)	1,000,000
O(2^n)	Astronomically large (impractical)

Time Complexity vs Space Complexity

Time complexity measures how the running time of an algorithm grows with input size.
Space complexity measures how the memory usage of an algorithm grows with input size.

For example, merge sort has time complexity O(n log n) but space complexity O(n) because it requires additional arrays during merging. Bubble sort has time complexity O(n^2) but space complexity O(1) because it sorts in place.

You must be able to identify the Big O complexity of the standard algorithms: linear search is O(n), binary search is O(log n), bubble sort is O(n^2), insertion sort is O(n^2), and merge sort is O(n log n). You should also be able to explain why each algorithm has its particular complexity.

Recursion vs Iteration

Recursion is a technique where a function calls itself to solve a smaller instance of the same problem. Each recursive call works towards a base case that stops the recursion. Iteration uses loops (for, while) to repeat a block of code.

Structure of a Recursive Function

Every recursive function must have:

A base case – a condition that stops the recursion and returns a value directly.
A recursive case – the function calls itself with a modified argument that moves towards the base case.

Example: Factorial

The factorial of n (written n!) is defined as:

0! = 1 (base case)
n! = n * (n-1)! (recursive case)

Recursive approach:

def factorial_recursive(n):
    if n == 0:        # Base case
        return 1
    else:             # Recursive case
        return n * factorial_recursive(n - 1)

print(factorial_recursive(5))  # Output: 120

Function FactorialRecursive(n As Integer) As Integer
    If n = 0 Then        ' Base case
        Return 1
    Else                 ' Recursive case
        Return n * FactorialRecursive(n - 1)
    End If
End Function

Console.WriteLine(FactorialRecursive(5))  ' Output: 120

Iterative approach:

def factorial_iterative(n):
    result = 1
    for i in range(1, n + 1):
        result = result * i
    return result

print(factorial_iterative(5))  # Output: 120

Function FactorialIterative(n As Integer) As Integer
    Dim result As Integer = 1
    For i As Integer = 1 To n
        result = result * i
    Next
    Return result
End Function

Console.WriteLine(FactorialIterative(5))  ' Output: 120

Example: Fibonacci Sequence

# Recursive Fibonacci
def fibonacci_recursive(n):
    if n <= 1:          # Base case
        return n
    else:               # Recursive case
        return fibonacci_recursive(n - 1) + fibonacci_recursive(n - 2)

# Iterative Fibonacci
def fibonacci_iterative(n):
    if n <= 1:
        return n
    a, b = 0, 1
    for i in range(2, n + 1):
        a, b = b, a + b
    return b

print(fibonacci_recursive(7))   # Output: 13
print(fibonacci_iterative(7))   # Output: 13

' Recursive Fibonacci
Function FibonacciRecursive(n As Integer) As Integer
    If n <= 1 Then          ' Base case
        Return n
    Else                    ' Recursive case
        Return FibonacciRecursive(n - 1) + FibonacciRecursive(n - 2)
    End If
End Function

' Iterative Fibonacci
Function FibonacciIterative(n As Integer) As Integer
    If n <= 1 Then
        Return n
    End If
    Dim a As Integer = 0
    Dim b As Integer = 1
    For i As Integer = 2 To n
        Dim temp As Integer = b
        b = a + b
        a = temp
    Next
    Return b
End Function

Console.WriteLine(FibonacciRecursive(7))   ' Output: 13
Console.WriteLine(FibonacciIterative(7))   ' Output: 13

Comparison Table

Feature	Recursion	Iteration
Uses	Function calls itself	Loop constructs (for, while)
Termination	Base case	Loop condition becomes false
Memory usage	Higher (each call adds to the call stack)	Lower (no call stack overhead)
Risk	Stack overflow if base case not reached or recursion too deep	Infinite loop if condition never met
Readability	Often more elegant and closer to mathematical definitions	Can be more straightforward for simple repetition
Speed	Often slower due to function call overhead	Usually faster
Best for	Problems with recursive structure (trees, divide and conquer)	Simple repetition and counting

A common exam question asks you to explain the advantages and disadvantages of recursion versus iteration. Always mention the call stack: each recursive call adds a new frame to the call stack, which uses memory. If the recursion is too deep, this can cause a stack overflow error.

Problem Analysis

Before writing code, a problem must be fully understood and broken down. Two key techniques are abstraction and decomposition.

Abstraction

Abstraction removes unnecessary detail, focusing only on what is relevant to solving the problem. It produces a simplified model of a complex situation.

A student records system abstracts away physical details (paper, filing cabinets) and represents only the data and operations needed: name, grade, search, update.
Abstraction lets programmers manage complexity by ignoring irrelevant detail.

Decomposition

Decomposition breaks a large problem into smaller, more manageable sub-problems that can be solved independently.

Example: A booking system decomposes into:

Search for available slots
Validate user input
Record the booking
Send a confirmation

Pattern Recognition

Pattern recognition involves identifying similarities, trends, or repeated structures within and between problems. Recognising a pattern means you can reuse or adapt a solution that already worked elsewhere, rather than solving every problem from scratch.

If a program needs to validate multiple input fields, the same validation logic appears repeatedly — recognising this pattern leads to a reusable validation subroutine.
Sorting algorithms share patterns (comparing and swapping elements) regardless of what data is being sorted.
Pattern recognition supports generalisation — once a pattern is identified, the solution can be applied to an entire class of problems.

Example: If a teacher notices that calculating the average of a list of numbers appears in multiple programs (test scores, attendance figures, rainfall data), they recognise the pattern and write a single calculateAverage function that works for all of them.

Pattern recognition is the process of identifying similarities, trends, and repeated structures within or between problems. It allows solutions to be generalised and reused.

Problem Analysis Steps

Understand the problem — identify inputs, outputs, and constraints.
Identify data — what types, validation rules, and structures are needed?
Decompose — break into sub-tasks.
Design algorithms — write pseudocode or flowcharts for each sub-task.
Plan test cases — consider normal, boundary, and erroneous data before coding.

Always identify inputs, outputs, process, and constraints when analysing a problem. State these before designing an algorithm — examiners award marks for clearly defining the problem before attempting a solution.

Programming Constructs

All algorithms are built from three fundamental constructs: sequence, selection, and repetition.

Sequence

Instructions execute one after another in the order written — the default behaviour of a program.

SET total TO 0
INPUT price
SET total TO total + price
OUTPUT total

Selection

Selection allows the program to choose between paths based on a condition.

INPUT mark
IF mark >= 70 THEN
    OUTPUT "Distinction"
ELSE IF mark >= 50 THEN
    OUTPUT "Pass"
ELSE
    OUTPUT "Fail"
END IF

mark = int(input("Enter mark: "))
if mark >= 70:
    print("Distinction")
elif mark >= 50:
    print("Pass")
else:
    print("Fail")

Dim mark As Integer = CInt(Console.ReadLine())
If mark >= 70 Then
    Console.WriteLine("Distinction")
ElseIf mark >= 50 Then
    Console.WriteLine("Pass")
Else
    Console.WriteLine("Fail")
End If

Repetition

Repetition executes a block of instructions multiple times.

Count-controlled (FOR):

for i in range(1, 11):
    print(i)

For i As Integer = 1 To 10
    Console.WriteLine(i)
Next

Condition-controlled (WHILE):

password = ""
while password != "correct":
    password = input("Enter password: ")

Dim password As String = ""
Do While password <> "correct"
    password = Console.ReadLine()
Loop

Counts and Rogue Values

Counts track how many times a loop has run and stop it after a set number of iterations.

Rogue values (sentinel values) are special inputs outside the valid data range that signal the end of data entry — the loop stops when the rogue value is entered.

# Rogue value example — -1 signals end of input
total = 0
number = int(input("Enter number (-1 to stop): "))
while number != -1:
    total += number
    number = int(input("Enter number (-1 to stop): "))
print(f"Total: {total}")

Dim total As Integer = 0
Dim number As Integer = CInt(Console.ReadLine())
Do While number <> -1
    total = total + number
    number = CInt(Console.ReadLine())
Loop
Console.WriteLine($"Total: {total}")

The rogue value (-1) is never added to the total — the loop exits when it is detected.

Sequence runs instructions in order. Selection chooses between paths on a condition. Repetition repeats instructions while a condition holds. A count controls loop iterations. A rogue value is a special input that terminates a loop.

When tracing a loop with a rogue value, always check whether the rogue value itself gets processed — it should not. The condition check must happen before (or immediately after) the body, ensuring the rogue value causes exit without being included in any calculation.

Structure of a Typical Program

A well-structured program follows a logical layout that makes it easy to read, debug, and maintain. Although the exact syntax differs between programming languages, most programs share common structural elements.

Common Structural Elements

Import/library statements – bring in external modules or libraries at the top of the file.
Constant declarations – values that do not change throughout execution are declared early.
Global variable declarations – variables accessible throughout the entire program (used sparingly).
Subroutine definitions – functions and procedures that contain the main logic, defined before they are called.
Main program / entry point – the block of code that runs when the program starts, typically calling subroutines to carry out tasks.
Comments – annotations explaining the purpose of code, placed throughout to aid readability.

# Import statements
import math

# Constants
PI = 3.14159
MAX_ATTEMPTS = 3

# Function definitions
def calculate_area(radius):
    return PI * radius ** 2

def get_radius():
    radius = float(input("Enter radius: "))
    return radius

# Main program
def main():
    r = get_radius()
    area = calculate_area(r)
    print(f"The area is {area:.2f}")

main()

' Import statements
Imports System.Math

Module Program
    ' Constants
    Const PI As Double = 3.14159
    Const MAX_ATTEMPTS As Integer = 3

    ' Function definitions
    Function CalculateArea(radius As Double) As Double
        Return PI * radius ^ 2
    End Function

    Function GetRadius() As Double
        Console.Write("Enter radius: ")
        Dim radius As Double = Convert.ToDouble(Console.ReadLine())
        Return radius
    End Function

    ' Main program
    Sub Main()
        Dim r As Double = GetRadius()
        Dim area As Double = CalculateArea(r)
        Console.WriteLine($"The area is {area:F2}")
    End Sub
End Module

A subroutine is a named block of code that performs a specific task. Subroutines include both functions (which return a value) and procedures (which do not return a value). Using subroutines is fundamental to writing structured, modular programs.

Modular Design and Decomposition

Modular design is the practice of breaking a large program into smaller, independent, self-contained modules. Each module handles one specific aspect of the program’s functionality. This is the practical application of top-down design during the coding phase.

Benefits of Modular Design

Benefit	Explanation
Readability	Smaller modules are easier to read and understand than one monolithic block of code
Reusability	A well-written module can be reused in other programs without modification
Ease of debugging	Bugs are isolated within specific modules, making them easier to locate and fix
Team working	Different programmers can develop and test different modules simultaneously
Maintainability	Individual modules can be updated or replaced without affecting the rest of the program
Testing	Each module can be tested independently (unit testing) before being integrated

Decomposition in Practice

Decomposition means dividing a problem into sub-problems, each of which becomes a module. For example, a student records system might be decomposed into:

A module to add new students
A module to search for student records
A module to update student details
A module to generate reports
A module to handle file input/output

Each module would be implemented as one or more subroutines.

When asked about the benefits of a modular approach, always relate your answer to practical programming scenarios. For example: “Modular design allows team working because programmer A can write the input validation module while programmer B works on the report generation module, reducing overall development time.”

Functions and Procedures

Subroutines come in two forms: functions and procedures. The key difference is that a function returns a value, while a procedure does not.

Procedures

A procedure performs a task but does not return a value to the calling code. It is called for its side effects (e.g. displaying output, writing to a file).

# Procedure - does not return a value
def display_welcome(name):
    print(f"Welcome, {name}!")
    print("Please select an option from the menu.")

# Calling the procedure
display_welcome("Alice")

' Procedure - does not return a value
Sub DisplayWelcome(name As String)
    Console.WriteLine($"Welcome, {name}!")
    Console.WriteLine("Please select an option from the menu.")
End Sub

' Calling the procedure
DisplayWelcome("Alice")

Functions

A function performs a task and returns a value to the calling code. The returned value can be stored in a variable or used directly in an expression.

# Function - returns a value
def calculate_vat(price, rate=0.20):
    vat = price * rate
    return vat

# Calling the function and using the returned value
item_price = 50.00
tax = calculate_vat(item_price)
total = item_price + tax
print(f"Total including VAT: £{total:.2f}")

' Function - returns a value
Function CalculateVAT(price As Double, Optional rate As Double = 0.2) As Double
    Dim vat As Double = price * rate
    Return vat
End Function

' Calling the function and using the returned value
Dim itemPrice As Double = 50.0
Dim tax As Double = CalculateVAT(itemPrice)
Dim total As Double = itemPrice + tax
Console.WriteLine($"Total including VAT: £{total:F2}")

Key Differences

Feature	Function	Procedure
Returns a value	Yes	No
Can be used in expressions	Yes (e.g. `x = myFunc()`)	No
Keyword in Python	`def` (with `return`)	`def` (without `return`)
Keyword in VB.NET	`Function ... End Function`	`Sub ... End Sub`
Typical use	Calculations, lookups, conversions	Displaying output, modifying data structures

A function is a subroutine that returns a value. A procedure (called a Sub in VB.NET) is a subroutine that performs an action but does not return a value.

Data Compression

Compression reduces file size, making data faster to transmit and requiring less storage. At AS level you need to understand both categories and specific algorithms.

Lossless vs Lossy Compression

Feature	Lossless	Lossy
Data loss	None — original data perfectly reconstructed	Some data permanently removed
File size	Larger than lossy	Smaller than lossless
Use cases	Text, code, spreadsheets, medical images, archives	Photos, music, video, streaming
File formats	ZIP, PNG, GIF, FLAC, 7Z	JPEG, MP3, MP4/MPEG, AAC
How it works	Identifies and encodes patterns/redundancy	Removes data humans are unlikely to notice

Lossless compression reduces file size while allowing the original data to be perfectly reconstructed. Lossy compression achieves greater size reduction by permanently removing some data, meaning the original cannot be fully restored.

Run-Length Encoding (RLE)

RLE is a lossless algorithm that replaces consecutive runs of the same value with a count and the value.

Worked Example

Original data: AAAAAABBBBCCDDDDDDD

Run	Count	Value	Encoded
AAAAAA	6	A	6A
BBBB	4	B	4B
CC	2	C	2C
DDDDDDD	7	D	7D

Encoded: 6A4B2C7D

Space calculation:

Original: 19 characters (19 bytes if 1 byte per character)
Encoded: 8 characters (8 bytes)
Saving: (19 - 8) / 19 × 100 = 57.9%

When RLE Works Well vs Poorly

Works well: Data with long runs of repeated values — simple graphics, icons, diagrams, fax documents
Works poorly: Data with little repetition — photographs, random data. The encoded version may be larger than the original (e.g. ABCDEF → 1A1B1C1D1E1F is longer)

# Python — Simple RLE encoding
def rle_encode(data):
    encoded = ""
    i = 0
    while i < len(data):
        count = 1
        while i + count < len(data) and data[i + count] == data[i]:
            count += 1
        encoded += str(count) + data[i]
        i += count
    return encoded

def rle_decode(encoded):
    decoded = ""
    i = 0
    while i < len(encoded):
        count = ""
        while encoded[i].isdigit():
            count += encoded[i]
            i += 1
        decoded += encoded[i] * int(count)
        i += 1
    return decoded

original = "AAAAAABBBBCCDDDDDDD"
compressed = rle_encode(original)
print(f"Original:    {original} ({len(original)} chars)")
print(f"Compressed:  {compressed} ({len(compressed)} chars)")
print(f"Decompressed: {rle_decode(compressed)}")

' VB.NET — Simple RLE encoding
Function RLEEncode(data As String) As String
    Dim encoded As String = ""
    Dim i As Integer = 0
    While i < data.Length
        Dim count As Integer = 1
        While i + count < data.Length AndAlso data(i + count) = data(i)
            count += 1
        End While
        encoded &= count.ToString() & data(i)
        i += count
    End While
    Return encoded
End Function

Dim original As String = "AAAAAABBBBCCDDDDDDD"
Dim compressed As String = RLEEncode(original)
Console.WriteLine("Original:   " & original & " (" & original.Length & " chars)")
Console.WriteLine("Compressed: " & compressed & " (" & compressed.Length & " chars)")

Dictionary-Based (LZ) Compression

Dictionary-based compression (named after Lempel and Ziv — LZ77, LZ78, LZW) builds a dictionary of recurring patterns in the data and replaces them with shorter codes.

How It Works

The algorithm scans the data and identifies repeated sequences
Each unique sequence is added to a dictionary with a short index code
Subsequent occurrences of the sequence are replaced with the dictionary code
The dictionary is stored with the compressed data so it can be decoded

Worked Example

Original text: THE CAT SAT ON THE MAT

Dictionary Index	Entry
0	THE
1	CAT
2	SAT
3	ON
4	MAT

Encoded: 0 1 2 3 0 4 (plus spaces/structure)

The word “THE” appears twice but is only stored once in the dictionary. The second occurrence is replaced by its index code (0), saving space.

Key characteristics:

Lossless — original data is perfectly reconstructed using the dictionary
More effective than RLE for text and general-purpose compression
Used in ZIP, GIF, PNG, and PDF formats
Effectiveness depends on how much repetition exists in the data

Huffman Encoding

Huffman encoding is a lossless algorithm that assigns variable-length binary codes to characters based on their frequency. More frequent characters get shorter codes, less frequent characters get longer codes.

How It Works

Count the frequency of each character in the data
Build a binary tree by repeatedly combining the two lowest-frequency characters
Assign binary codes by traversing the tree (left = 0, right = 1)
Replace each character with its Huffman code

Worked Example

Text: AABBBCCCCDDDD (14 characters)

Step 1 — Frequency count:

Character	Frequency
A	2
B	3
C	4
D	5

Step 2 — Build the tree (combine two smallest repeatedly):

Combine A(2) + B(3) = AB(5)
Combine C(4) + AB(5) = CAB(9)
Combine D(5) + CAB(9) = Root(14)

Step 3 — Assign codes (traversing the tree):

Character	Code	Code Length
D	0	1 bit
C	10	2 bits
A	111	3 bits
B	110	3 bits

Step 4 — Encode: AABBBCCCCDDDD → 111 111 110 110 110 10 10 10 10 0 0 0 0 0

Space calculation:

Using fixed-length codes (e.g. ASCII 7-bit): 14 × 7 = 98 bits
Using Huffman codes: (2×3) + (3×3) + (4×2) + (5×1) = 6 + 9 + 8 + 5 = 28 bits
Saving: (98 - 28) / 98 × 100 = 71.4%

(The dictionary must also be stored, adding some overhead, but for large files the savings are substantial.)

In exam questions on Huffman coding, you may be asked to: (1) build a frequency table, (2) construct the Huffman tree, (3) determine the codes for each character, or (4) calculate the space saving compared to fixed-length encoding. Always show your working clearly. Remember that more frequent characters get shorter codes — this is the key insight that makes Huffman efficient.

Compression File Formats Summary

Format	Type	Algorithm	Typical Use
ZIP	Lossless	LZ + Huffman (DEFLATE)	General file archiving
PNG	Lossless	LZ + filtering	Web graphics, screenshots, diagrams
GIF	Lossless	LZW	Simple animations, icons
FLAC	Lossless	Linear prediction + Rice coding	High-quality audio archiving
7Z	Lossless	LZMA	High-ratio file archiving
JPEG	Lossy	DCT + quantisation + Huffman	Photographs
MP3	Lossy	Psychoacoustic model + Huffman	Music, podcasts
MP4/MPEG	Lossy	Motion estimation + DCT	Video
AAC	Lossy	Improved psychoacoustic model	Music streaming (Apple, YouTube)

When recommending a compression method, consider the type of data and whether quality loss is acceptable. Text files and program code must use lossless compression (any data loss would corrupt them). Photos and music can use lossy compression because small quality reductions are often imperceptible.

Test Data

To test a program thoroughly, you need to select appropriate test data. Test data should be carefully chosen to check all possible paths through the program and to identify potential errors. There are three main categories.

Normal (Valid) Data

Data that is within the expected range and of the correct format.
The system should accept this data and process it correctly.
Tests that the program works under normal operating conditions.

Boundary (Extreme) Data

Data that is at the very edges of the valid range.
This is where errors most commonly occur (off-by-one errors).
You should test values at the boundary itself and values just inside and just outside the boundary.

Erroneous (Invalid) Data

Data that is outside the expected range, of the wrong type, or otherwise invalid.
The system should reject this data gracefully without crashing.
Tests the robustness of input validation.

Example: Age Input (Valid Range 18–65)

Test Data	Type	Expected Result	Reason
25	Normal	Accepted	Within valid range
40	Normal	Accepted	Within valid range
18	Boundary	Accepted	At lower boundary (minimum valid)
17	Boundary	Rejected	Just below lower boundary
65	Boundary	Accepted	At upper boundary (maximum valid)
66	Boundary	Rejected	Just above upper boundary
-5	Erroneous	Rejected	Negative number, outside range
200	Erroneous	Rejected	Far outside valid range
“abc”	Erroneous	Rejected	Wrong data type (string, not integer)
18.5	Erroneous	Rejected	Real number, not integer
”” (empty)	Erroneous	Rejected	No data entered

When creating test data in the exam, always include at least one example of each type: normal, boundary, and erroneous. For boundary data, test the value at the boundary AND the values either side of it (e.g. for a range of 1–100, test 0, 1, 100, and 101). Some mark schemes consider boundary data that falls just outside the range as erroneous – include both to be safe.

Trace Tables

A trace table is a technique used to dry-run an algorithm by recording the values of variables at each step of execution. It is used to check the correctness of an algorithm and to understand its behaviour.

How to Create a Trace Table

Create a column for each variable in the algorithm.
Add a column for any output.
Add a column for the condition being tested (optional but helpful).
Work through the algorithm step by step, recording the value of each variable after every instruction that changes it.

Example: Tracing a Linear Search

Algorithm to search for the value 5 in the list [3, 8, 5, 1]:

data = [3, 8, 5, 1]
target = 5
FOR i FROM 0 TO 3
    IF data[i] == target THEN
        OUTPUT i
    END IF
END FOR

Trace table:

Step	i	data[i]	data[i] == target?	Output
1	0	3	No	–
2	1	8	No	–
3	2	5	Yes	2
4	3	1	No	–

Example: Tracing a Bubble Sort Pass

First pass of bubble sort on [5, 3, 8, 1]:

Step	j	data[j]	data[j+1]	Swap?	List after step
1	0	5	3	Yes	[3, 5, 8, 1]
2	1	5	8	No	[3, 5, 8, 1]
3	2	8	1	Yes	[3, 5, 1, 8]

After the first pass, the largest value (8) has “bubbled” to the end.

Example: Tracing Recursion

Tracing factorial_recursive(4):

Call	n	n == 0?	Returns
factorial_recursive(4)	4	No	4 * factorial_recursive(3)
factorial_recursive(3)	3	No	3 * factorial_recursive(2)
factorial_recursive(2)	2	No	2 * factorial_recursive(1)
factorial_recursive(1)	1	No	1 * factorial_recursive(0)
factorial_recursive(0)	0	Yes	1

Now the returns unwind: 1 * 1 = 1, then 2 * 1 = 2, then 3 * 2 = 6, then 4 * 6 = 24.

Trace tables are a very common exam question. Practise by tracing through algorithms by hand. Always work methodically – update one variable at a time and do not skip steps. For recursive algorithms, show the chain of calls and the order in which values are returned.

Dry Running / Desk Checking

Dry running (also called desk checking) is the process of manually working through a program on paper, without using a computer, to check that it produces the correct results.

How to Dry Run a Program

Write out the variable names as column headings in a trace table.
Work through the code line by line, updating the variable values in the table.
For each decision (IF statement), evaluate the condition and follow the correct branch.
For each loop, track the loop counter and repeat until the exit condition is met.
Compare the final output with the expected output.

Benefits of Dry Running

Can identify logic errors before the program is even entered into a computer.
Helps programmers understand the flow of execution through complex code.
Useful during exams where you may not have access to a computer.
Forces careful, systematic analysis of the algorithm.

Example: Dry Run

numbers = [4, 7, 2, 9, 1]
largest = numbers[0]
for i in range(1, len(numbers)):
    if numbers[i] > largest:
        largest = numbers[i]
print(largest)

Dim numbers() As Integer = {4, 7, 2, 9, 1}
Dim largest As Integer = numbers(0)
For i As Integer = 1 To numbers.Length - 1
    If numbers(i) > largest Then
        largest = numbers(i)
    End If
Next
Console.WriteLine(largest)

Trace table:

i	numbers(i)	numbers(i) > largest?	largest
-	-	-	4
1	7	7 > 4? Yes	7
2	2	2 > 7? No	7
3	9	9 > 7? Yes	9
4	1	1 > 9? No	9

Output: 9

Dry running (desk checking) is manually working through a program on paper, line by line, to verify that it produces the correct output. A trace table is a tabular record of variable values at each step of execution.

Error Types

Errors in programs fall into three main categories. Understanding the differences is essential for effective debugging.

Syntax Errors

A syntax error occurs when the code violates the rules (grammar) of the programming language. The program will not compile or run until syntax errors are fixed.

# Syntax errors in Python
print("Hello"          # Missing closing bracket
if x == 5              # Missing colon
    prit("Five")       # Misspelled keyword (not caught as syntax but as NameError)
for i in range(10)     # Missing colon

' Syntax errors in VB.NET
Console.WriteLine("Hello"      ' Missing closing bracket
If x = 5                       ' Missing Then keyword
    Console.WrteLine("Five")   ' Misspelled method name
For i As Integer = 1 To 10     ' Missing Next at the end

Characteristics:

Detected by the compiler/interpreter before the program runs.
Usually easy to find because the error message indicates the line and type of error.
The program cannot execute until all syntax errors are corrected.

Logic Errors

A logic error occurs when the program runs without crashing but produces incorrect results. The code is syntactically valid but the algorithm is wrong.

# Logic error: using + instead of *
def calculate_area(length, width):
    return length + width    # Should be length * width

# Logic error: wrong comparison operator
def is_adult(age):
    return age > 18    # Should be age >= 18 (18-year-olds are adults)

# Logic error: off-by-one error in loop
total = 0
for i in range(1, 10):    # Misses 10; should be range(1, 11)
    total = total + i

' Logic error: using + instead of *
Function CalculateArea(length As Double, width As Double) As Double
    Return length + width    ' Should be length * width
End Function

' Logic error: wrong comparison operator
Function IsAdult(age As Integer) As Boolean
    Return age > 18    ' Should be age >= 18
End Function

' Logic error: off-by-one error in loop
Dim total As Integer = 0
For i As Integer = 1 To 9    ' Misses 10; should be 1 To 10
    total = total + i
Next

Characteristics:

NOT detected by the compiler/interpreter – the program runs without error messages.
The program produces incorrect or unexpected output.
The hardest type of error to find because there are no error messages pointing to the problem.
Found through testing (using test data and comparing actual output to expected output) or dry running.

Runtime Errors

A runtime error occurs during program execution, causing the program to crash or terminate abnormally. The code is syntactically correct but encounters a problem that cannot be handled during execution.

# Runtime errors in Python
numbers = [1, 2, 3]
print(numbers[5])        # IndexError: list index out of range

result = 10 / 0           # ZeroDivisionError: division by zero

age = int("hello")         # ValueError: invalid literal for int()

file = open("nonexistent.txt")  # FileNotFoundError

' Runtime errors in VB.NET
Dim numbers() As Integer = {1, 2, 3}
Console.WriteLine(numbers(5))   ' IndexOutOfRangeException

Dim result As Integer = 10 \ 0   ' DivideByZeroException

Dim age As Integer = CInt("hello")  ' InvalidCastException / FormatException

Dim file As String = IO.File.ReadAllText("nonexistent.txt")  ' FileNotFoundException

Characteristics:

NOT detected at compile time – the program starts running but crashes during execution.
An error message is displayed indicating the type and location of the error.
Often caused by unexpected input, missing resources, or edge cases.
Can be prevented using input validation, error handling (try/catch), and defensive programming.

Comparison of Error Types

Feature	Syntax Error	Logic Error	Runtime Error
When detected	Compile/interpret time	During testing or use	During execution
Program runs?	No	Yes	Yes, but crashes
Error message?	Yes (from compiler)	No	Yes (when it crashes)
Difficulty to find	Easy	Hard	Medium
Example	Missing bracket	Wrong formula	Division by zero
How to fix	Read error message, fix syntax	Test with data, use trace table	Add validation, use try/catch

The exam often gives a piece of code and asks you to identify the type of error. Remember: if the code would not run at all, it is a syntax error. If it runs but gives the wrong answer, it is a logic error. If it runs but crashes at some point during execution, it is a runtime error. Be prepared to identify the specific error and explain how to fix it.

← Principles of Programming →