Algorithms & Programs

AS — Unit 1: Fundamentals of Computer Science

What Is an Algorithm?

An algorithm is a finite, ordered sequence of unambiguous instructions that, given a set of inputs, produces a defined set of outputs and terminates in a finite amount of time.

An algorithm is essentially a step-by-step recipe for solving a problem. Before any code is written, a programmer designs an algorithm to describe the logic of the solution. Algorithms are language-independent – the same algorithm can be implemented in Python, VB.NET, Java, or any other language.

Characteristics of an Algorithm

Input – An algorithm has zero or more inputs taken from a specified set.
Output – An algorithm produces one or more outputs that have a defined relationship to the inputs.
Finiteness – An algorithm must always terminate after a finite number of steps.
Definiteness – Each step must be precisely and unambiguously defined.
Effectiveness – Every step must be basic enough that it could, in principle, be carried out by a person using pencil and paper.

Why Algorithms Matter

They allow us to plan a solution before writing code.
They help us communicate solutions to other programmers regardless of programming language.
They allow us to analyse the efficiency of a solution before implementing it.
They form the basis for correctness proofs and testing strategies.

Representing Algorithms

Before implementing an algorithm in code, we represent it using one of three standard methods.

Pseudocode

Pseudocode is a structured, English-like notation for describing algorithms. It is not a real programming language – there is no single standard syntax – but it follows logical programming constructs such as sequence, selection, and iteration.

Example – Finding the largest value in a list:

BEGIN
    SET max TO list[0]
    FOR i FROM 1 TO LENGTH(list) - 1
        IF list[i] > max THEN
            SET max TO list[i]
        END IF
    END FOR
    OUTPUT max
END

Pseudocode advantages:

Quick to write and easy to read.
Not tied to any programming language.
Focuses on logic rather than syntax.

Flowcharts

A flowchart is a diagrammatic representation of an algorithm using standard symbols:

Symbol	Shape	Purpose
Terminator	Rounded rectangle (oval)	Start or end of the algorithm
Process	Rectangle	An instruction or action
Decision	Diamond	A yes/no or true/false question
Input/Output	Parallelogram	Data input or output
Flow line	Arrow	Shows the direction of flow

Flowchart advantages:

Visual representation makes logic easy to follow.
Good for showing branching and looping clearly.
Useful for non-programmers to understand the logic.

Flowchart disadvantages:

Can become very large and unwieldy for complex algorithms.
Harder to modify than pseudocode.

Structured English

Structured English uses a restricted subset of natural language combined with programming constructs. It avoids technical jargon and is useful for communicating with non-technical stakeholders.

Example:

1. Start with the first item in the list.
2. Compare it with every other item.
3. If a larger item is found, remember it as the new largest.
4. After checking all items, display the largest item.

In the exam, you may be asked to write an algorithm in pseudocode or to trace through a flowchart. Make sure you are comfortable with all three representations and can convert between them.

Linear Search

Linear search (also called sequential search) is an algorithm that checks every element in a list, one by one, from the beginning until the target value is found or the end of the list is reached.

How It Works

Start at the first element of the list.
Compare the current element with the target value.
If they match, return the position (index) of the element.
If they do not match, move to the next element.
If the end of the list is reached without finding the target, report that the item was not found.

Code Examples

def linear_search(data, target):
    for i in range(len(data)):
        if data[i] == target:
            return i  # Return the index where target is found
    return -1  # Target not found

# Example usage
numbers = [4, 7, 2, 9, 1, 5, 8]
result = linear_search(numbers, 9)
if result != -1:
    print(f"Found at index {result}")
else:
    print("Not found")

Function LinearSearch(data() As Integer, target As Integer) As Integer
    For i As Integer = 0 To data.Length - 1
        If data(i) = target Then
            Return i  ' Return the index where target is found
        End If
    Next
    Return -1  ' Target not found
End Function

' Example usage
Dim numbers() As Integer = {4, 7, 2, 9, 1, 5, 8}
Dim result As Integer = LinearSearch(numbers, 9)
If result <> -1 Then
    Console.WriteLine("Found at index " & result)
Else
    Console.WriteLine("Not found")
End If

Performance

Best case: O(1) – the target is the first element.
Worst case: O(n) – the target is the last element or not present.
Average case: O(n) – on average, half the list is searched.

Advantages and Disadvantages

Advantages	Disadvantages
Simple to understand and implement	Slow for large data sets
Works on unsorted data	Checks every element in the worst case
Works on any data structure (arrays, linked lists)	Not efficient compared to binary search on sorted data

Binary Search

Binary search is a search algorithm that works on sorted data by repeatedly dividing the search interval in half. It compares the target value to the middle element and eliminates half of the remaining elements at each step.

How It Works

Set low to the first index and high to the last index.
Find the middle index: mid = (low + high) // 2.
If data[mid] equals the target, return mid.
If the target is less than data[mid], set high = mid - 1 (search the left half).
If the target is greater than data[mid], set low = mid + 1 (search the right half).
Repeat steps 2-5 until the target is found or low > high (target not present).

Code Examples

def binary_search(data, target):
    low = 0
    high = len(data) - 1

    while low <= high:
        mid = (low + high) // 2
        if data[mid] == target:
            return mid
        elif data[mid] < target:
            low = mid + 1
        else:
            high = mid - 1

    return -1  # Target not found

# Example usage
numbers = [1, 3, 5, 7, 9, 11, 13, 15]
result = binary_search(numbers, 7)
if result != -1:
    print(f"Found at index {result}")
else:
    print("Not found")

Function BinarySearch(data() As Integer, target As Integer) As Integer
    Dim low As Integer = 0
    Dim high As Integer = data.Length - 1

    While low <= high
        Dim mid As Integer = (low + high) \ 2
        If data(mid) = target Then
            Return mid
        ElseIf data(mid) < target Then
            low = mid + 1
        Else
            high = mid - 1
        End If
    End While

    Return -1  ' Target not found
End Function

' Example usage
Dim numbers() As Integer = {1, 3, 5, 7, 9, 11, 13, 15}
Dim result As Integer = BinarySearch(numbers, 7)
If result <> -1 Then
    Console.WriteLine("Found at index " & result)
Else
    Console.WriteLine("Not found")
End If

Performance

Best case: O(1) – the target is the middle element.
Worst case: O(log n) – the search space is halved at each step.
Average case: O(log n).

Why O(log n)?

Each comparison eliminates half the remaining elements. For a list of 1,000,000 items, binary search needs at most about 20 comparisons (since log2(1,000,000) is approximately 20). Linear search would need up to 1,000,000 comparisons.

Comparison of Searching Algorithms

Feature	Linear Search	Binary Search
Data must be sorted?	No	Yes
Best case	O(1)	O(1)
Worst case	O(n)	O(log n)
Average case	O(n)	O(log n)
Implementation complexity	Very simple	Slightly more complex
Suitable for	Small or unsorted data sets	Large, sorted data sets
Works on linked lists?	Yes (efficiently)	Not efficiently (no random access)

A common exam question asks you to explain when you would use each algorithm. If the data is unsorted and small, linear search is appropriate. If the data is sorted and large, binary search is far more efficient. Remember: binary search requires sorted data – if the data is unsorted, the cost of sorting it first must be considered.

Bubble Sort

Bubble sort is a simple sorting algorithm that repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order. The largest unsorted value “bubbles up” to the end of the list on each pass.

How It Works

Start at the beginning of the list.
Compare the first two adjacent elements. If the first is greater than the second, swap them.
Move to the next pair and repeat the comparison and swap.
Continue until the end of the list – this completes one pass. The largest element is now in its correct position.
Repeat passes for the remaining unsorted portion of the list.
If a complete pass is made with no swaps, the list is sorted and the algorithm terminates early.

Code Examples

def bubble_sort(data):
    n = len(data)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
                swapped = True
        if not swapped:
            break  # List is already sorted
    return data

# Example usage
numbers = [64, 34, 25, 12, 22, 11, 90]
print(bubble_sort(numbers))
# Output: [11, 12, 22, 25, 34, 64, 90]

Sub BubbleSort(data() As Integer)
    Dim n As Integer = data.Length
    For i As Integer = 0 To n - 2
        Dim swapped As Boolean = False
        For j As Integer = 0 To n - 2 - i
            If data(j) > data(j + 1) Then
                Dim temp As Integer = data(j)
                data(j) = data(j + 1)
                data(j + 1) = temp
                swapped = True
            End If
        Next
        If Not swapped Then
            Exit For  ' List is already sorted
        End If
    Next
End Sub

' Example usage
Dim numbers() As Integer = {64, 34, 25, 12, 22, 11, 90}
BubbleSort(numbers)
For Each num As Integer In numbers
    Console.Write(num & " ")
Next
' Output: 11 12 22 25 34 64 90

Performance

Best case: O(n) – the list is already sorted (only one pass with no swaps needed).
Worst case: O(n^2) – the list is in reverse order.
Average case: O(n^2).
Space complexity: O(1) – it sorts in place (no additional memory needed beyond a temporary swap variable).

Insertion Sort

Insertion sort builds the sorted list one element at a time. It takes each element from the unsorted portion and inserts it into its correct position within the sorted portion, shifting elements as necessary.

How It Works

Consider the first element as a sorted sub-list of one element.
Take the next element (the “key”) from the unsorted portion.
Compare the key with each element in the sorted portion, moving from right to left.
Shift elements in the sorted portion one position to the right to make space.
Insert the key into its correct position.
Repeat until all elements have been processed.

Code Examples

def insertion_sort(data):
    for i in range(1, len(data)):
        key = data[i]
        j = i - 1
        while j >= 0 and data[j] > key:
            data[j + 1] = data[j]  # Shift element to the right
            j -= 1
        data[j + 1] = key  # Insert key in correct position
    return data

# Example usage
numbers = [12, 11, 13, 5, 6]
print(insertion_sort(numbers))
# Output: [5, 6, 11, 12, 13]

Sub InsertionSort(data() As Integer)
    For i As Integer = 1 To data.Length - 1
        Dim key As Integer = data(i)
        Dim j As Integer = i - 1
        While j >= 0 AndAlso data(j) > key
            data(j + 1) = data(j)  ' Shift element to the right
            j -= 1
        End While
        data(j + 1) = key  ' Insert key in correct position
    Next
End Sub

' Example usage
Dim numbers() As Integer = {12, 11, 13, 5, 6}
InsertionSort(numbers)
For Each num As Integer In numbers
    Console.Write(num & " ")
Next
' Output: 5 6 11 12 13

Performance

Best case: O(n) – the list is already sorted (no shifts needed, just one comparison per element).
Worst case: O(n^2) – the list is in reverse order.
Average case: O(n^2).
Space complexity: O(1) – it sorts in place.

When Is Insertion Sort Useful?

When the data set is small.
When the data is nearly sorted (it performs very well in this case, approaching O(n)).
It is an online algorithm – it can sort a list as it receives new data.

Merge Sort

Merge sort is a divide and conquer sorting algorithm. It recursively splits the list into halves until each sub-list contains a single element, then merges the sub-lists back together in sorted order.

How It Works

Divide: Split the list into two halves.
Conquer: Recursively apply merge sort to each half.
Merge: Combine the two sorted halves into a single sorted list by comparing elements from each half and placing the smaller one first.

Worked example with [38, 27, 43, 3]:

Step 1 (Divide):  [38, 27, 43, 3]
                  /                \
             [38, 27]           [43, 3]
             /      \           /      \
           [38]    [27]       [43]    [3]

Step 2 (Merge):
           [38]    [27]       [43]    [3]
             \      /           \      /
             [27, 38]           [3, 43]
                  \                /
              [3, 27, 38, 43]

Code Examples

def merge_sort(data):
    if len(data) <= 1:
        return data

    mid = len(data) // 2
    left_half = merge_sort(data[:mid])
    right_half = merge_sort(data[mid:])

    return merge(left_half, right_half)

def merge(left, right):
    result = []
    i = 0
    j = 0

    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1

    # Add any remaining elements
    result.extend(left[i:])
    result.extend(right[j:])
    return result

# Example usage
numbers = [38, 27, 43, 3, 9, 82, 10]
print(merge_sort(numbers))
# Output: [3, 9, 10, 27, 38, 43, 82]

Function MergeSort(data() As Integer) As Integer()
    If data.Length <= 1 Then
        Return data
    End If

    Dim mid As Integer = data.Length \ 2
    Dim leftHalf(mid - 1) As Integer
    Dim rightHalf(data.Length - mid - 1) As Integer

    Array.Copy(data, 0, leftHalf, 0, mid)
    Array.Copy(data, mid, rightHalf, 0, data.Length - mid)

    leftHalf = MergeSort(leftHalf)
    rightHalf = MergeSort(rightHalf)

    Return Merge(leftHalf, rightHalf)
End Function

Function Merge(left() As Integer, right() As Integer) As Integer()
    Dim result(left.Length + right.Length - 1) As Integer
    Dim i As Integer = 0
    Dim j As Integer = 0
    Dim k As Integer = 0

    While i < left.Length AndAlso j < right.Length
        If left(i) <= right(j) Then
            result(k) = left(i)
            i += 1
        Else
            result(k) = right(j)
            j += 1
        End If
        k += 1
    End While

    While i < left.Length
        result(k) = left(i)
        i += 1
        k += 1
    End While

    While j < right.Length
        result(k) = right(j)
        j += 1
        k += 1
    End While

    Return result
End Function

' Example usage
Dim numbers() As Integer = {38, 27, 43, 3, 9, 82, 10}
Dim sorted() As Integer = MergeSort(numbers)
For Each num As Integer In sorted
    Console.Write(num & " ")
Next
' Output: 3 9 10 27 38 43 82

Performance

Best case: O(n log n).
Worst case: O(n log n).
Average case: O(n log n).
Space complexity: O(n) – it requires additional memory to hold the sub-lists during merging.

Why O(n log n)?

The list is divided in half at each level of recursion, giving log n levels. At each level, all n elements must be compared during the merge step. Therefore, the total work is n multiplied by log n.

Comparison of Sorting Algorithms

Feature	Bubble Sort	Insertion Sort	Merge Sort
Best case	O(n)	O(n)	O(n log n)
Worst case	O(n^2)	O(n^2)	O(n log n)
Average case	O(n^2)	O(n^2)	O(n log n)
Space complexity	O(1) – in place	O(1) – in place	O(n) – extra memory needed
Stable?	Yes	Yes	Yes
Adaptive?	Yes (with early exit)	Yes (fast on nearly sorted data)	No
Suitable for	Small or nearly sorted data	Small or nearly sorted data	Large data sets
Approach	Comparison and swap	Shift and insert	Divide and conquer

A stable sorting algorithm preserves the relative order of elements with equal keys. All three algorithms covered here (bubble, insertion, merge) are stable.

In the exam you may be asked to recommend a sorting algorithm for a specific scenario. For large data sets, merge sort is preferred because its worst case is O(n log n). For small or nearly sorted data, insertion sort is often the best practical choice. Bubble sort is mainly used for educational purposes due to its simplicity.

Big O Notation

Big O notation is a mathematical notation used to describe the upper bound of an algorithm’s time complexity or space complexity. It describes how the resource usage of an algorithm grows relative to the input size n, focusing on the worst-case scenario.

Big O ignores constants and lower-order terms. For example, an algorithm that takes 3n^2 + 5n + 2 steps is described as O(n^2), because the n^2 term dominates as n grows large.

Common Big O Complexities

Notation	Name	Description	Example
O(1)	Constant	The time taken does not depend on the input size.	Accessing an array element by index; checking if a number is odd/even.
O(log n)	Logarithmic	The time grows logarithmically. The algorithm halves the problem size at each step.	Binary search.
O(n)	Linear	The time grows in direct proportion to the input size.	Linear search; summing all elements in a list.
O(n log n)	Linearithmic	A combination of linear and logarithmic growth. Common in efficient sorting.	Merge sort; the best comparison-based sorts.
O(n^2)	Quadratic	The time grows as the square of the input size. Typically caused by nested loops.	Bubble sort; insertion sort (worst case).
O(2^n)	Exponential	The time doubles with each additional input element. Very slow for even moderate n.	Recursive Fibonacci (naive); brute-force subset enumeration.
O(n!)	Factorial	The time grows factorially. Extremely slow; only practical for very small n.	Travelling Salesman Problem (brute-force); generating all permutations.

Growth Rate Comparison

For an input of size n = 1,000:

Complexity	Approximate Operations
O(1)	1
O(log n)	10
O(n)	1,000
O(n log n)	10,000
O(n^2)	1,000,000
O(2^n)	Astronomically large (impractical)

Time Complexity vs Space Complexity

Time complexity measures how the running time of an algorithm grows with input size.
Space complexity measures how the memory usage of an algorithm grows with input size.

For example, merge sort has time complexity O(n log n) but space complexity O(n) because it requires additional arrays during merging. Bubble sort has time complexity O(n^2) but space complexity O(1) because it sorts in place.

You must be able to identify the Big O complexity of the standard algorithms: linear search is O(n), binary search is O(log n), bubble sort is O(n^2), insertion sort is O(n^2), and merge sort is O(n log n). You should also be able to explain why each algorithm has its particular complexity.

Recursion vs Iteration

Recursion is a technique where a function calls itself to solve a smaller instance of the same problem. Each recursive call works towards a base case that stops the recursion. Iteration uses loops (for, while) to repeat a block of code.

Structure of a Recursive Function

Every recursive function must have:

A base case – a condition that stops the recursion and returns a value directly.
A recursive case – the function calls itself with a modified argument that moves towards the base case.

Example: Factorial

The factorial of n (written n!) is defined as:

0! = 1 (base case)
n! = n * (n-1)! (recursive case)

Recursive approach:

def factorial_recursive(n):
    if n == 0:        # Base case
        return 1
    else:             # Recursive case
        return n * factorial_recursive(n - 1)

print(factorial_recursive(5))  # Output: 120

Function FactorialRecursive(n As Integer) As Integer
    If n = 0 Then        ' Base case
        Return 1
    Else                 ' Recursive case
        Return n * FactorialRecursive(n - 1)
    End If
End Function

Console.WriteLine(FactorialRecursive(5))  ' Output: 120

Iterative approach:

def factorial_iterative(n):
    result = 1
    for i in range(1, n + 1):
        result = result * i
    return result

print(factorial_iterative(5))  # Output: 120

Function FactorialIterative(n As Integer) As Integer
    Dim result As Integer = 1
    For i As Integer = 1 To n
        result = result * i
    Next
    Return result
End Function

Console.WriteLine(FactorialIterative(5))  ' Output: 120

Example: Fibonacci Sequence

# Recursive Fibonacci
def fibonacci_recursive(n):
    if n <= 1:          # Base case
        return n
    else:               # Recursive case
        return fibonacci_recursive(n - 1) + fibonacci_recursive(n - 2)

# Iterative Fibonacci
def fibonacci_iterative(n):
    if n <= 1:
        return n
    a, b = 0, 1
    for i in range(2, n + 1):
        a, b = b, a + b
    return b

print(fibonacci_recursive(7))   # Output: 13
print(fibonacci_iterative(7))   # Output: 13

' Recursive Fibonacci
Function FibonacciRecursive(n As Integer) As Integer
    If n <= 1 Then          ' Base case
        Return n
    Else                    ' Recursive case
        Return FibonacciRecursive(n - 1) + FibonacciRecursive(n - 2)
    End If
End Function

' Iterative Fibonacci
Function FibonacciIterative(n As Integer) As Integer
    If n <= 1 Then
        Return n
    End If
    Dim a As Integer = 0
    Dim b As Integer = 1
    For i As Integer = 2 To n
        Dim temp As Integer = b
        b = a + b
        a = temp
    Next
    Return b
End Function

Console.WriteLine(FibonacciRecursive(7))   ' Output: 13
Console.WriteLine(FibonacciIterative(7))   ' Output: 13

Comparison Table

Feature	Recursion	Iteration
Uses	Function calls itself	Loop constructs (for, while)
Termination	Base case	Loop condition becomes false
Memory usage	Higher (each call adds to the call stack)	Lower (no call stack overhead)
Risk	Stack overflow if base case not reached or recursion too deep	Infinite loop if condition never met
Readability	Often more elegant and closer to mathematical definitions	Can be more straightforward for simple repetition
Speed	Often slower due to function call overhead	Usually faster
Best for	Problems with recursive structure (trees, divide and conquer)	Simple repetition and counting

A common exam question asks you to explain the advantages and disadvantages of recursion versus iteration. Always mention the call stack: each recursive call adds a new frame to the call stack, which uses memory. If the recursion is too deep, this can cause a stack overflow error.

Trace Tables

A trace table is a technique used to dry-run an algorithm by recording the values of variables at each step of execution. It is used to check the correctness of an algorithm and to understand its behaviour.

How to Create a Trace Table

Create a column for each variable in the algorithm.
Add a column for any output.
Add a column for the condition being tested (optional but helpful).
Work through the algorithm step by step, recording the value of each variable after every instruction that changes it.

Example: Tracing a Linear Search

Algorithm to search for the value 5 in the list [3, 8, 5, 1]:

data = [3, 8, 5, 1]
target = 5
FOR i FROM 0 TO 3
    IF data[i] == target THEN
        OUTPUT i
    END IF
END FOR

Trace table:

Step	i	data[i]	data[i] == target?	Output
1	0	3	No	–
2	1	8	No	–
3	2	5	Yes	2
4	3	1	No	–

Example: Tracing a Bubble Sort Pass

First pass of bubble sort on [5, 3, 8, 1]:

Step	j	data[j]	data[j+1]	Swap?	List after step
1	0	5	3	Yes	[3, 5, 8, 1]
2	1	5	8	No	[3, 5, 8, 1]
3	2	8	1	Yes	[3, 5, 1, 8]

After the first pass, the largest value (8) has “bubbled” to the end.

Example: Tracing Recursion

Tracing factorial_recursive(4):

Call	n	n == 0?	Returns
factorial_recursive(4)	4	No	4 * factorial_recursive(3)
factorial_recursive(3)	3	No	3 * factorial_recursive(2)
factorial_recursive(2)	2	No	2 * factorial_recursive(1)
factorial_recursive(1)	1	No	1 * factorial_recursive(0)
factorial_recursive(0)	0	Yes	1

Now the returns unwind: 1 * 1 = 1, then 2 * 1 = 2, then 3 * 2 = 6, then 4 * 6 = 24.

Trace tables are a very common exam question. Practise by tracing through algorithms by hand. Always work methodically – update one variable at a time and do not skip steps. For recursive algorithms, show the chain of calls and the order in which values are returned.

Data Compression

Compression reduces file size, making data faster to transmit and requiring less storage. At AS level you need to understand both categories and specific algorithms.

Lossless vs Lossy Compression

Feature	Lossless	Lossy
Data loss	None — original data perfectly reconstructed	Some data permanently removed
File size	Larger than lossy	Smaller than lossless
Use cases	Text, code, spreadsheets, medical images, archives	Photos, music, video, streaming
File formats	ZIP, PNG, GIF, FLAC, 7Z	JPEG, MP3, MP4/MPEG, AAC
How it works	Identifies and encodes patterns/redundancy	Removes data humans are unlikely to notice

Lossless compression reduces file size while allowing the original data to be perfectly reconstructed. Lossy compression achieves greater size reduction by permanently removing some data, meaning the original cannot be fully restored.

Run-Length Encoding (RLE)

RLE is a lossless algorithm that replaces consecutive runs of the same value with a count and the value.

Worked Example

Original data: AAAAAABBBBCCDDDDDDD

Run	Count	Value	Encoded
AAAAAA	6	A	6A
BBBB	4	B	4B
CC	2	C	2C
DDDDDDD	7	D	7D

Encoded: 6A4B2C7D

Space calculation:

Original: 19 characters (19 bytes if 1 byte per character)
Encoded: 8 characters (8 bytes)
Saving: (19 - 8) / 19 × 100 = 57.9%

When RLE Works Well vs Poorly

Works well: Data with long runs of repeated values — simple graphics, icons, diagrams, fax documents
Works poorly: Data with little repetition — photographs, random data. The encoded version may be larger than the original (e.g. ABCDEF → 1A1B1C1D1E1F is longer)

# Python — Simple RLE encoding
def rle_encode(data):
    encoded = ""
    i = 0
    while i < len(data):
        count = 1
        while i + count < len(data) and data[i + count] == data[i]:
            count += 1
        encoded += str(count) + data[i]
        i += count
    return encoded

def rle_decode(encoded):
    decoded = ""
    i = 0
    while i < len(encoded):
        count = ""
        while encoded[i].isdigit():
            count += encoded[i]
            i += 1
        decoded += encoded[i] * int(count)
        i += 1
    return decoded

original = "AAAAAABBBBCCDDDDDDD"
compressed = rle_encode(original)
print(f"Original:    {original} ({len(original)} chars)")
print(f"Compressed:  {compressed} ({len(compressed)} chars)")
print(f"Decompressed: {rle_decode(compressed)}")

' VB.NET — Simple RLE encoding
Function RLEEncode(data As String) As String
    Dim encoded As String = ""
    Dim i As Integer = 0
    While i < data.Length
        Dim count As Integer = 1
        While i + count < data.Length AndAlso data(i + count) = data(i)
            count += 1
        End While
        encoded &= count.ToString() & data(i)
        i += count
    End While
    Return encoded
End Function

Dim original As String = "AAAAAABBBBCCDDDDDDD"
Dim compressed As String = RLEEncode(original)
Console.WriteLine("Original:   " & original & " (" & original.Length & " chars)")
Console.WriteLine("Compressed: " & compressed & " (" & compressed.Length & " chars)")

Dictionary-Based (LZ) Compression

Dictionary-based compression (named after Lempel and Ziv — LZ77, LZ78, LZW) builds a dictionary of recurring patterns in the data and replaces them with shorter codes.

How It Works

The algorithm scans the data and identifies repeated sequences
Each unique sequence is added to a dictionary with a short index code
Subsequent occurrences of the sequence are replaced with the dictionary code
The dictionary is stored with the compressed data so it can be decoded

Worked Example

Original text: THE CAT SAT ON THE MAT

Dictionary Index	Entry
0	THE
1	CAT
2	SAT
3	ON
4	MAT

Encoded: 0 1 2 3 0 4 (plus spaces/structure)

The word “THE” appears twice but is only stored once in the dictionary. The second occurrence is replaced by its index code (0), saving space.

Key characteristics:

Lossless — original data is perfectly reconstructed using the dictionary
More effective than RLE for text and general-purpose compression
Used in ZIP, GIF, PNG, and PDF formats
Effectiveness depends on how much repetition exists in the data

Huffman Encoding

Huffman encoding is a lossless algorithm that assigns variable-length binary codes to characters based on their frequency. More frequent characters get shorter codes, less frequent characters get longer codes.

How It Works

Count the frequency of each character in the data
Build a binary tree by repeatedly combining the two lowest-frequency characters
Assign binary codes by traversing the tree (left = 0, right = 1)
Replace each character with its Huffman code

Worked Example

Text: AABBBCCCCDDDD (14 characters)

Step 1 — Frequency count:

Character	Frequency
A	2
B	3
C	4
D	5

Step 2 — Build the tree (combine two smallest repeatedly):

Combine A(2) + B(3) = AB(5)
Combine C(4) + AB(5) = CAB(9)
Combine D(5) + CAB(9) = Root(14)

Step 3 — Assign codes (traversing the tree):

Character	Code	Code Length
D	0	1 bit
C	10	2 bits
A	111	3 bits
B	110	3 bits

Step 4 — Encode: AABBBCCCCDDDD → 111 111 110 110 110 10 10 10 10 0 0 0 0 0

Space calculation:

Using fixed-length codes (e.g. ASCII 7-bit): 14 × 7 = 98 bits
Using Huffman codes: (2×3) + (3×3) + (4×2) + (5×1) = 6 + 9 + 8 + 5 = 28 bits
Saving: (98 - 28) / 98 × 100 = 71.4%

(The dictionary must also be stored, adding some overhead, but for large files the savings are substantial.)

In exam questions on Huffman coding, you may be asked to: (1) build a frequency table, (2) construct the Huffman tree, (3) determine the codes for each character, or (4) calculate the space saving compared to fixed-length encoding. Always show your working clearly. Remember that more frequent characters get shorter codes — this is the key insight that makes Huffman efficient.

Compression File Formats Summary

Format	Type	Algorithm	Typical Use
ZIP	Lossless	LZ + Huffman (DEFLATE)	General file archiving
PNG	Lossless	LZ + filtering	Web graphics, screenshots, diagrams
GIF	Lossless	LZW	Simple animations, icons
FLAC	Lossless	Linear prediction + Rice coding	High-quality audio archiving
7Z	Lossless	LZMA	High-ratio file archiving
JPEG	Lossy	DCT + quantisation + Huffman	Photographs
MP3	Lossy	Psychoacoustic model + Huffman	Music, podcasts
MP4/MPEG	Lossy	Motion estimation + DCT	Video
AAC	Lossy	Improved psychoacoustic model	Music streaming (Apple, YouTube)

When recommending a compression method, consider the type of data and whether quality loss is acceptable. Text files and program code must use lossless compression (any data loss would corrupt them). Photos and music can use lossy compression because small quality reductions are often imperceptible.

← Previous Next →