Hardware & Communication

A2 Level — Unit 4: Architecture, Data, Communication & Applications

Contemporary Computer Architecture

Modern processors are built upon foundational architectural models, each with distinct design philosophies that affect performance, power consumption, and suitability for different tasks.

Von Neumann vs Harvard Architecture

Von Neumann Architecture stores both program instructions and data in the same memory space, accessed via a single shared bus. This creates the Von Neumann bottleneck: instructions and data compete for the same bus, so only one can be fetched at a time and the CPU must wait for the other.

Harvard Architecture uses physically separate memory stores and buses for instructions and data, allowing both to be fetched simultaneously. This removes the Von Neumann bottleneck and increases throughput.

Feature	Von Neumann	Harvard
Memory	Single shared memory for data and instructions	Separate memory for data and instructions
Buses	Single shared bus	Separate data bus and instruction bus
Fetch speed	Slower (sequential access)	Faster (simultaneous access)
Flexibility	Programs can modify themselves (self-modifying code)	Instructions and data are strictly separated
Complexity	Simpler, cheaper to manufacture	More complex, more expensive
Common use	General-purpose PCs, laptops	DSPs, microcontrollers, embedded systems

Most modern desktop processors use a modified Harvard architecture: they maintain separate Level 1 caches for instructions and data (Harvard-style) but share a unified main memory (Von Neumann-style). This combines the speed benefits of Harvard with the flexibility of Von Neumann.
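The fetch-speed difference between the two models can be illustrated with a toy cycle count. This is a simplified sketch rather than a model of any real processor: it assumes every memory access takes exactly one cycle and that every instruction needs one instruction fetch and one data access. The function names are illustrative only.

```python
def von_neumann_cycles(instruction_count):
    """Shared bus: the instruction fetch and the data access take turns."""
    return instruction_count * 2  # assumed: 1 cycle each, one after the other


def harvard_cycles(instruction_count):
    """Separate buses: the instruction fetch and the data access overlap."""
    return instruction_count * 1  # assumed: both accesses fit in one cycle


for n in (10, 1000):
    print(n, von_neumann_cycles(n), harvard_cycles(n))
```

Under these assumptions the shared bus doubles the memory-access time, which is the bottleneck described above. Real processors reduce the gap with caches, which is why the modified Harvard design is so common.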

CISC vs RISC

CISC (Complex Instruction Set Computer) provides a large number of complex instructions, some of which can perform multi-step operations in a single instruction. Each instruction may take multiple clock cycles to execute.

RISC (Reduced Instruction Set Computer) uses a small, highly optimised set of simple instructions, each typically executing in a single clock cycle. Complex operations are built by combining multiple simple instructions.

Feature	CISC	RISC
Instruction set	Large, complex (hundreds of instructions)	Small, simple (typically under 100)
Clock cycles per instruction	Variable (1 to many)	Usually 1 cycle per instruction
Instruction length	Variable length	Fixed length
Addressing modes	Many	Few
Hardware complexity	Complex decoding hardware	Simpler hardware, relies on compiler
Pipelining	Harder to implement efficiently	Easier to pipeline due to fixed-length instructions
Power consumption	Generally higher	Generally lower
Example processors	Intel and AMD x86 processors	ARM, MIPS
Typical use	Desktop PCs, servers	Mobile devices, tablets, embedded systems

In exam questions comparing CISC and RISC, always link the architectural features to practical consequences. For example: RISC’s fixed-length instructions make pipelining more efficient, which leads to higher throughput. CISC’s complex instructions mean fewer lines of assembly code are needed, reducing program memory requirements.
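The "fewer instructions versus simpler instructions" trade-off can be sketched by multiplying two values held in memory in both styles. The mnemonics below (`MULT`, `LOAD`, `MUL`, `STORE`) are hypothetical and do not belong to any real instruction set; the sketch only counts instructions and checks that both versions give the same answer.

```python
def run_cisc(memory):
    """One complex instruction reads memory, multiplies, and writes back."""
    program = [("MULT", "A", "B")]  # hypothetical memory-to-memory multiply
    for _op, dst, src in program:
        memory[dst] = memory[dst] * memory[src]
    return len(program)


def run_risc(memory):
    """Load/store style: only LOAD and STORE touch memory."""
    program = [
        ("LOAD", "R1", "A"),
        ("LOAD", "R2", "B"),
        ("MUL", "R1", "R2"),
        ("STORE", "A", "R1"),
    ]
    registers = {}
    for op, x, y in program:
        if op == "LOAD":
            registers[x] = memory[y]
        elif op == "MUL":
            registers[x] = registers[x] * registers[y]
        elif op == "STORE":
            memory[x] = registers[y]
    return len(program)


cisc_memory = {"A": 6, "B": 7}
risc_memory = {"A": 6, "B": 7}
print(run_cisc(cisc_memory), cisc_memory["A"])  # instruction count, result
print(run_risc(risc_memory), risc_memory["A"])
```

The CISC version needs one instruction and the RISC version needs four, but each RISC instruction is simple and the same length, which is what makes it easier to pipeline.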

Multi-Core Processors

A multi-core processor contains two or more independent processing units (cores) on a single chip. Each core can fetch, decode, and execute instructions independently, allowing genuine parallel execution of multiple threads or processes.

Key characteristics of multi-core processors:

Each core has its own Level 1 cache (and often its own Level 2 cache), while the cores typically share a Level 3 cache and main memory
The operating system’s scheduler assigns threads and processes to available cores
Doubling the number of cores does not double the performance, because not all tasks can be parallelised and there is overhead in coordinating between cores
Software must be written to take advantage of multiple cores (multi-threaded programming)

Common configurations include dual-core (2), quad-core (4), hexa-core (6), octa-core (8), and beyond.
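The point about software needing to be written for multiple cores can be sketched as follows: the work is split into chunks, each chunk is handed to a worker, and the partial results are combined at the end. The function name `parallel_sum` and the default of four workers are illustrative assumptions. In standard CPython, threads do not execute Python code on several cores at once, so this shows the structure of a multi-threaded program rather than a real speedup; `ProcessPoolExecutor` would be the usual choice for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(values, workers=4):
    """Split values into one chunk per worker and add up the partial sums."""
    chunk = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))  # one task per chunk
    return sum(partial_sums)  # combining step: always runs sequentially


print(parallel_sum(list(range(1, 101))))
```

The final combining step cannot be shared out, which is one reason doubling the cores does not double the performance.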

GPU Computing

GPU (Graphics Processing Unit) computing uses the massively parallel architecture of a graphics card for general-purpose computation. GPUs contain thousands of small, simple cores optimised for performing the same operation on many data points simultaneously.

GPUs are designed for data parallelism: applying the same instruction to large datasets. This makes them ideal for:

Graphics rendering (their original purpose)
Machine learning and neural network training
Scientific simulations and modelling
Cryptocurrency mining
Image and video processing

Using a GPU in this way is called GPGPU (General-Purpose computing on Graphics Processing Units). It is usually programmed through frameworks such as CUDA (NVIDIA) or OpenCL.

Feature	CPU	GPU
Number of cores	Few (typically 2-64)	Many (hundreds to tens of thousands)
Core complexity	Complex, powerful cores	Simple, lightweight cores
Best for	Sequential, branching logic	Repetitive parallel operations on large data
Clock speed per core	Higher	Lower
Memory access	Large caches, optimised for latency	High bandwidth, optimised for throughput

Co-processors

A co-processor is a supplementary processor designed to handle specific types of computation, offloading work from the main CPU to improve overall system performance.

Examples of co-processors include:

Floating Point Unit (FPU): handles floating point arithmetic (historically separate, now integrated into modern CPUs)
GPU: handles graphics rendering and parallel computation
Digital Signal Processor (DSP): optimised for processing audio, video, and sensor signals in real time
Neural Processing Unit (NPU): accelerates machine learning inference tasks
Cryptographic co-processor: handles encryption and decryption operations

Co-processors improve performance by allowing the CPU to delegate specialised tasks while continuing with other work.

Parallel Processing

Parallel processing is the simultaneous execution of multiple instructions or tasks by dividing a problem into parts that can be solved concurrently across multiple processors or cores.

Flynn’s Taxonomy

Flynn’s taxonomy classifies computer architectures based on the number of concurrent instruction streams and data streams they can handle.

Classification	Instruction Streams	Data Streams	Description	Example
SISD	Single	Single	One instruction operates on one data item at a time	Traditional single-core Von Neumann processor
SIMD	Single	Multiple	One instruction operates on multiple data items simultaneously	GPU cores, vector processors, multimedia extensions (SSE/AVX)
MISD	Multiple	Single	Multiple instructions operate on the same data item	Rare; used in fault-tolerant systems (e.g. space shuttle flight computers running different algorithms on the same input)
MIMD	Multiple	Multiple	Multiple instructions operate on multiple data items independently	Multi-core processors, networked computer clusters

SIMD is the most commonly examined parallel architecture. A good example is applying a brightness filter to an image: the same “add 10 to pixel value” instruction is applied simultaneously to thousands of different pixels (data items). GPUs are essentially massively parallel SIMD processors.
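The brightness filter can be sketched in code by counting how many instructions are issued. This is only a simulation: Python still processes the pixels one after another, and the lane width `LANES = 4` and the clamp to an 8-bit maximum of 255 are assumptions made for the example.

```python
LANES = 4  # assumed SIMD width: data items handled by one instruction


def brighten_simd(pixels, amount=10):
    """Apply 'add amount' to LANES pixels per simulated instruction."""
    result, instructions = [], 0
    for start in range(0, len(pixels), LANES):
        group = pixels[start:start + LANES]
        # One instruction, many data items; clamp to the assumed 8-bit maximum.
        result.extend(min(255, p + amount) for p in group)
        instructions += 1
    return result, instructions


pixels = [0, 100, 200, 250, 30, 60, 90, 120]
print(brighten_simd(pixels))
```

Eight pixels are processed with two simulated instructions instead of eight, which is the saving a SIMD unit provides in hardware.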

MIMD can be further divided into:

Shared memory MIMD: all processors access the same memory space (e.g. multi-core processors on one motherboard)
Distributed memory MIMD: each processor has its own local memory and they communicate via message passing (e.g. computer clusters connected by a network)

Limiting Factors to Parallelisation

Not all programs can benefit equally from parallel processing. Several factors limit the achievable speedup.

Amdahl’s Law

Amdahl’s Law states that the maximum speedup of a program through parallelisation is limited by the proportion of the program that must remain sequential. No matter how many processors are added, the sequential portion creates an upper bound on performance improvement.

The formula for Amdahl’s Law is:

Speedup = 1 / ((1 - P) + P/N)

Where:

P = the proportion (fraction) of the program that can be parallelised (0 to 1)
N = the number of processors
(1 - P) = the proportion that must remain sequential

Worked Example 1

A program takes 100 seconds to run on a single processor. 80% of the code can be parallelised. What is the speedup with 4 processors?

P = 0.8
N = 4
Speedup = 1 / ((1 - 0.8) + 0.8/4)
Speedup = 1 / (0.2 + 0.2)
Speedup = 1 / 0.4
Speedup = 2.5x
New runtime = 100 / 2.5 = 40 seconds

Worked Example 2

A program takes 200 seconds on one processor. 90% can be parallelised. Calculate the speedup and new runtime with 8 processors.

P = 0.9
N = 8
Speedup = 1 / ((1 - 0.9) + 0.9/8)
Speedup = 1 / (0.1 + 0.1125)
Speedup = 1 / 0.2125
Speedup = 4.71x (to 2 d.p.)
New runtime = 200 / 4.7059 = 42.5 seconds (use the unrounded speedup; 200 x 0.2125 gives the same result)

Worked Example 3

What is the theoretical maximum speedup if 75% of a program can be parallelised, using an infinite number of processors?

P = 0.75
As N approaches infinity, P/N approaches 0
Speedup = 1 / ((1 - 0.75) + 0)
Speedup = 1 / 0.25
Maximum speedup = 4x

This demonstrates that even with unlimited processors, the sequential 25% limits the speedup to 4x.
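The speedups in the three worked examples can be checked with a few lines of code. This is a minimal sketch of the formula given above; the function names `amdahl_speedup` and `max_speedup` are chosen for illustration.

```python
def amdahl_speedup(p, n):
    """Speedup = 1 / ((1 - P) + P/N)."""
    return 1 / ((1 - p) + p / n)


def max_speedup(p):
    """Limit as N approaches infinity, where P/N becomes 0."""
    return 1 / (1 - p)


print(round(amdahl_speedup(0.8, 4), 2))   # Worked Example 1
print(round(amdahl_speedup(0.9, 8), 2))   # Worked Example 2
print(round(max_speedup(0.75), 2))        # Worked Example 3
```

Trying larger values of N with P fixed shows the speedup creeping towards `max_speedup(p)` without ever reaching it.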

In exam calculations, always show your working clearly. State the values of P and N, substitute into the formula, and simplify step by step. For the “theoretical maximum” question, set N to infinity so that P/N becomes 0. Remember: speedup is a multiplier (e.g. 2.5x means 2.5 times faster), not a time.

Data Dependencies

A data dependency occurs when an instruction requires the result of a previous instruction before it can execute. Data dependencies force sequential execution and prevent parallelisation of those instructions.

For example:

A = B + C
D = A * 2

The second instruction depends on the result of the first (it needs the value of A), so they cannot run in parallel.

Types of data dependency:

Read After Write (RAW): an instruction must read a value that a previous instruction has written (most common)
Write After Read (WAR): an instruction must write to a location that a previous instruction has not yet finished reading
Write After Write (WAW): two instructions write to the same location and the order matters
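One way to spot these dependencies is to compare what each instruction reads and writes. The sketch below assumes each instruction is described by a pair of sets (variables written, variables read); this representation and the function name `dependencies` are illustrative choices.

```python
def dependencies(first, second):
    """Return the dependencies that stop `second` running alongside `first`.

    Each instruction is a (writes, reads) pair of sets of variable names.
    """
    first_writes, first_reads = first
    second_writes, second_reads = second
    found = []
    if first_writes & second_reads:
        found.append("RAW")  # second reads what first writes
    if first_reads & second_writes:
        found.append("WAR")  # second overwrites what first still reads
    if first_writes & second_writes:
        found.append("WAW")  # both write to the same location
    return found


a_eq_b_plus_c = ({"A"}, {"B", "C"})   # A = B + C
d_eq_a_times_2 = ({"D"}, {"A"})       # D = A * 2
print(dependencies(a_eq_b_plus_c, d_eq_a_times_2))
```

For the two instructions in the example above this reports a RAW dependency on A. An empty list means the pair could safely run in parallel.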

Communication Overhead

When multiple processors work together, they must communicate to share data and coordinate. This communication takes time and introduces overhead that reduces the benefit of parallelisation. As more processors are added:

More data must be transferred between processors
Network latency increases if processors are distributed
Bus contention occurs when multiple processors compete for shared resources
The overhead can eventually outweigh the benefit of adding more processors
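The last point can be illustrated with a simple runtime model. The figures are invented for the example and the model itself is an assumption: 10 seconds of sequential work, 90 seconds of work that divides evenly, and a fixed communication cost of 0.5 seconds for every extra processor.

```python
def runtime(n, sequential=10.0, parallel=90.0, overhead_per_processor=0.5):
    """Estimated runtime in seconds on n processors (assumed model)."""
    return sequential + parallel / n + overhead_per_processor * (n - 1)


best = min(range(1, 65), key=runtime)  # processor count with the lowest runtime
for n in (1, 4, best, 64):
    print(n, round(runtime(n), 2))
```

With these numbers the runtime falls until about 13 processors and then rises again, so 64 processors are slower than 13.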

Synchronisation

Synchronisation is the coordination of parallel processes to ensure they execute in the correct order and produce consistent results, particularly when accessing shared resources.

Synchronisation issues and mechanisms include:

Race conditions: when the outcome depends on the unpredictable timing of processes
Deadlock: when two or more processes are each waiting for the other to release a resource
Barriers: points where all processes must wait until every process has reached that point before any can proceed

Synchronisation mechanisms add waiting time and reduce the effective parallelism.
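A lock is one common synchronisation mechanism. The sketch below uses Python's `threading` module; the function name `run_workers` and the worker counts are illustrative. Several threads increment a shared counter, and the lock ensures that only one thread at a time performs the read-modify-write, which avoids a race condition.

```python
import threading


def run_workers(thread_count=4, times=10_000):
    """Increment a shared counter from several threads, protected by a lock."""
    state = {"counter": 0}
    lock = threading.Lock()

    def add_many():
        for _ in range(times):
            with lock:                  # only one thread may update at a time
                state["counter"] += 1   # read-modify-write on shared data

    threads = [threading.Thread(target=add_many) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                        # wait until every worker has finished
    return state["counter"]


print(run_workers())
```

Each thread has to wait whenever another thread holds the lock. That waiting is the cost of synchronisation described above.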

Assembly Language Programming

Assembly language is a low-level programming language that uses mnemonics to represent machine code instructions. Each assembly instruction corresponds to a single machine code operation. An assembler translates assembly language into machine code.

Registers

Registers are small, fast storage locations within the CPU. The basic assembly language model used at A2 level typically includes:

Accumulator (ACC): the main working register where arithmetic and logic results are stored
Program Counter (PC): holds the address of the next instruction to be executed
Memory Address Register (MAR): holds the address of the memory location being accessed
Memory Data Register (MDR): holds the data being transferred to or from memory
Current Instruction Register (CIR): holds the instruction currently being decoded and executed

Instruction Set

The A2 level assembly language uses the following instruction set:

Mnemonic	Operand	Description
`INP`	None	Input: reads a value from the user and stores it in the accumulator
`OUT`	None	Output: displays the current value in the accumulator
`MOV`	Register	Moves data between registers: `MOV R1` copies the accumulator into R1; `MOV ACC R1` copies R1 into the accumulator
`ADD`	Value/Address	Adds the value (or value at address) to the accumulator
`SUB`	Value/Address	Subtracts the value (or value at address) from the accumulator
`CMP`	Value/Address	Compares the accumulator with a value (sets flags for branching)
`BRZ`	Label	Branch if Zero: jumps to label if the result of the last arithmetic operation or comparison was zero
`BRP`	Label	Branch if Positive: jumps to label if that result was positive or zero
`BRA`	Label	Branch Always: unconditional jump to label
`HLT`	None	Halt: stops program execution
`DAT`	Value	Data: reserves a memory location and optionally initialises it with a value

Writing and Tracing Assembly Programs

Example 1: Add Two Numbers

This program inputs two numbers, adds them, and outputs the result.

        INP         // Input first number into ACC
        MOV R1      // Store first number in register R1
        INP         // Input second number into ACC
        ADD R1      // Add R1 to ACC
        OUT         // Output the result
        HLT         // Stop

Trace (inputs: 5 and 3):

Step	Instruction	ACC	R1	Output
1	INP	5	-
2	MOV R1	5	5
3	INP	3	5
4	ADD R1	8	5
5	OUT	8	5	8
6	HLT	8	5

Example 2: Countdown from Input to Zero

This program inputs a number and counts down to zero, outputting each value.

        INP         // Input the starting number
LOOP    OUT         // Output current value
        SUB ONE     // Subtract 1
        BRZ DONE    // If zero, branch to DONE
        BRA LOOP    // Otherwise, loop back
DONE    OUT         // Output the final zero
        HLT         // Stop
ONE     DAT 1       // Constant: 1

Trace (input: 3):

Step	Instruction	ACC	Output	Branch Taken?
1	INP	3
2	OUT	3	3
3	SUB ONE	2
4	BRZ DONE	2		No (ACC != 0)
5	BRA LOOP	2		Yes
6	OUT	2	2
7	SUB ONE	1
8	BRZ DONE	1		No (ACC != 0)
9	BRA LOOP	1		Yes
10	OUT	1	1
11	SUB ONE	0
12	BRZ DONE	0		Yes (ACC == 0)
13	OUT	0	0
14	HLT	0

Example 3: Find the Larger of Two Numbers

This program inputs two numbers and outputs the larger of the two.

        INP         // Input first number
        MOV R1      // Store in R1
        INP         // Input second number
        MOV R2      // Store in R2
        SUB R1      // ACC = second - first
        BRP SECBIG  // If positive or zero, second is at least as big
        MOV ACC R1  // Otherwise load first into ACC
        BRA FINISH
SECBIG  MOV ACC R2  // Load second into ACC
FINISH  OUT         // Output the larger number
        HLT

When tracing assembly programs in the exam, draw a clear table showing each instruction executed, the state of the accumulator and any registers, and any output produced. Pay careful attention to branch instructions: check whether the condition is met and clearly show which instruction executes next. Use DAT at the end of your programs to define constants and variables.
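Hand traces can be checked with a small interpreter. The sketch below is one possible reading of the instruction set, not an official simulator, and it makes several assumptions: BRZ and BRP test the result of the most recent INP, ADD, SUB or CMP; `MOV R1` copies the accumulator into R1 while `MOV ACC R1` copies R1 into the accumulator (as used in the examples); and an operand may be a register, a DAT label, or a literal number. The function name `run` and the `max_steps` limit are illustrative.

```python
MNEMONICS = {"INP", "OUT", "MOV", "ADD", "SUB", "CMP",
             "BRZ", "BRP", "BRA", "HLT", "DAT"}


def run(source, inputs, max_steps=10_000):
    """Run a program in the notes' assembly dialect and return its outputs."""
    program, labels = [], {}
    for line in source.splitlines():
        tokens = line.split("//")[0].split()    # strip the comment, tokenise
        if not tokens:
            continue
        if tokens[0] not in MNEMONICS:          # a leading label, e.g. LOOP
            labels[tokens[0]] = len(program)
            tokens = tokens[1:]
        program.append(tokens)

    # Each DAT line becomes a named memory cell (0 if no value is given).
    memory = {name: int(program[addr][1]) if len(program[addr]) > 1 else 0
              for name, addr in labels.items() if program[addr][0] == "DAT"}
    registers = {}

    def value_of(operand):
        """Resolve a register name, DAT label, or literal number."""
        if operand in registers:
            return registers[operand]
        if operand in memory:
            return memory[operand]
        return int(operand)

    pending = iter(inputs)
    acc = result = pc = 0          # result is what BRZ and BRP test
    outputs = []
    for _ in range(max_steps):     # guard against programs that never halt
        op, *args = program[pc]
        pc += 1
        if op == "INP":
            acc = result = next(pending)
        elif op == "OUT":
            outputs.append(acc)
        elif op == "MOV" and args[0] == "ACC":
            acc = registers[args[1]]           # MOV ACC R1: register -> ACC
        elif op == "MOV":
            registers[args[0]] = acc           # MOV R1: ACC -> register
        elif op == "ADD":
            acc = result = acc + value_of(args[0])
        elif op == "SUB":
            acc = result = acc - value_of(args[0])
        elif op == "CMP":
            result = acc - value_of(args[0])   # sets the flag, ACC unchanged
        elif op == "BRA" or (op == "BRZ" and result == 0) \
                or (op == "BRP" and result >= 0):
            pc = labels[args[0]]
        elif op == "HLT":
            return outputs
        elif op not in ("BRZ", "BRP"):         # e.g. execution ran into DAT
            raise ValueError(f"cannot execute {op!r}")
    raise RuntimeError("step limit reached without HLT")


COUNTDOWN = """
        INP         // Input the starting number
LOOP    OUT         // Output current value
        SUB ONE     // Subtract 1
        BRZ DONE    // If zero, branch to DONE
        BRA LOOP    // Otherwise, loop back
DONE    OUT         // Output the final zero
        HLT         // Stop
ONE     DAT 1       // Constant: 1
"""

print(run(COUNTDOWN, [3]))
```

Running the countdown program with an input of 3 prints `[3, 2, 1, 0]`, matching the Output column of the Example 2 trace. The step limit matters here: with an input of 0 the countdown program never reaches zero again, so it would otherwise loop forever.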

Voice Input Systems

Voice input systems convert spoken language into a form that a computer can process. There are three distinct types, each suited to different purposes.

Command and Control Systems

Command and control voice systems recognise a limited set of short, predefined spoken commands to control a device or application. They use a restricted vocabulary matched against stored templates.

Characteristics:

Small, fixed vocabulary (typically tens to hundreds of words)
Recognises short, isolated commands (e.g. “call home”, “play music”, “turn left”)
Uses pattern matching against pre-stored voice templates
Fast response time due to limited search space
Works well in noisy environments because commands are distinct and short
Does not need to understand continuous speech or natural language

Examples: smart home devices (“Hey Siri, turn off the lights”), satellite navigation voice commands, hands-free phone dialling, industrial machinery control.

Vocabulary Dictation Systems

Vocabulary dictation systems convert continuous, natural speech into text. They must handle a large vocabulary, varied sentence structures, and connected words spoken at normal speed.

Characteristics:

Large vocabulary (tens of thousands to hundreds of thousands of words)
Handles continuous speech (words spoken naturally without pauses between them)
Uses language models and context to disambiguate similar-sounding words (e.g. “there”, “their”, “they’re”)
Often trained on an individual speaker's voice, which improves accuracy
More computationally intensive than command and control
Accuracy also improves with context awareness (the surrounding words help to predict the most likely word)

Examples: dictation software for writing documents, live captioning, medical transcription, legal transcription.

Voice Print Recognition

Voice print recognition (speaker verification) is a biometric security system that identifies or verifies a person based on the unique physical characteristics of their voice, rather than understanding what they say.

Characteristics:

Analyses voice characteristics: pitch, tone, cadence, frequency patterns, vocal tract shape
Creates a voice print (a mathematical model of the speaker’s voice)
Used for authentication and identification, not for understanding speech content
Can be text-dependent (user speaks a specific passphrase) or text-independent (any speech can be analysed)
Vulnerable to spoofing (recordings, voice synthesis) so often combined with other security factors
Affected by illness, emotional state, or background noise

Examples: telephone banking authentication, secure facility access, forensic speaker identification.

Suitability of Each System

Situation	Best System	Reason
Controlling a smart home device	Command and control	Limited set of known commands; fast response needed
Dictating an essay	Vocabulary dictation	Continuous speech; large vocabulary; natural language
Unlocking a secure phone	Voice print recognition	Biometric verification of identity
Surgical theatre (hands-free operation)	Command and control	Short, precise commands in a controlled environment
Creating meeting minutes	Vocabulary dictation	Long-form continuous speech needs accurate transcription
Bank telephone authentication	Voice print recognition	Verifying the caller’s identity
In-car navigation commands	Command and control	Simple commands; driver must keep eyes on road
Accessibility for visually impaired users	Vocabulary dictation	Full text input via speech for extended interaction
Prison visitor verification	Voice print recognition	Biometric check to confirm identity against records

When asked to evaluate the suitability of a voice system for a given scenario, consider: (1) the size of vocabulary needed, (2) whether continuous speech or isolated commands are used, (3) whether the goal is to understand content or verify identity, (4) the environment (noisy or quiet), and (5) the response time requirements. Always justify your choice by linking features of the system to the requirements of the scenario.
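Criteria (2) and (3) are usually enough to pick the type of system, and that decision can be sketched as a short function. This is a simplification for revision purposes: the function name and parameters are illustrative, and a full exam answer should still discuss vocabulary size, environment and response time.

```python
def choose_voice_system(goal, continuous_speech):
    """Pick a voice input type from two of the criteria above.

    goal: "identity" to verify who is speaking, "content" to act on what is said.
    continuous_speech: True for natural sentences, False for short commands.
    """
    if goal == "identity":
        return "Voice print recognition"
    if continuous_speech:
        return "Vocabulary dictation"
    return "Command and control"


print(choose_voice_system("content", continuous_speech=True))    # dictating an essay
print(choose_voice_system("content", continuous_speech=False))   # in-car commands
print(choose_voice_system("identity", continuous_speech=False))  # bank authentication
```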

Wireless Networking

Wi-Fi (IEEE 802.11 Standards)

Wi-Fi is a family of wireless networking protocols based on the IEEE 802.11 standards. Wi-Fi allows devices to connect to a local area network (LAN) wirelessly using radio waves.

Standard	Frequency Band	Maximum Theoretical Speed	Typical Range (Indoors)	Year
802.11a	5 GHz	54 Mbps	~35 m	1999
802.11b	2.4 GHz	11 Mbps	~35 m	1999
802.11g	2.4 GHz	54 Mbps	~38 m	2003
802.11n (Wi-Fi 4)	2.4 GHz / 5 GHz	600 Mbps	~70 m	2009
802.11ac (Wi-Fi 5)	5 GHz	6.9 Gbps	~35 m	2013
802.11ax (Wi-Fi 6 / 6E)	2.4 GHz / 5 GHz (6 GHz with Wi-Fi 6E)	9.6 Gbps	~30 m	2020

Key points about Wi-Fi:

2.4 GHz band: longer range, better wall penetration, but more congestion (shared with microwaves, Bluetooth, etc.) and fewer non-overlapping channels
5 GHz band: faster speeds, less congestion, but shorter range and poorer wall penetration
Wi-Fi uses CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance) to manage shared access to the wireless medium

Bluetooth

Bluetooth is a short-range wireless technology for exchanging data between devices over short distances using the 2.4 GHz ISM band. It is designed for low-power, low-cost personal area networks (PANs).

Feature	Detail
Range	Typically 10 m (Class 2), up to 100 m (Class 1)
Speed	Up to 3 Mbps (Enhanced Data Rate, Bluetooth 2.0 onwards), 2 Mbps (Bluetooth 5 Low Energy)
Power consumption	Low (especially Bluetooth Low Energy / BLE)
Connection type	Point-to-point or piconet (up to 8 devices)
Common uses	Wireless headphones, keyboards, mice, fitness trackers, file transfer between phones

NFC (Near Field Communication)

NFC (Near Field Communication) is a very short-range wireless technology (up to approximately 10 cm) based on RFID. It enables simple, touch-based communication between devices.

Feature	Detail
Range	Up to ~10 cm
Speed	Up to 424 kbps
Power	Very low; passive NFC tags require no battery
Connection setup	Instantaneous (no pairing required)
Common uses	Contactless payment (Apple Pay, Google Pay), travel cards (Oyster), access control, pairing Bluetooth devices

NFC’s extremely short range also acts as a security feature: an attacker would normally need to be within centimetres to intercept the signal.

Cellular Networks (4G / 5G)

Cellular networks provide wide-area wireless connectivity through a network of base stations (cell towers). Each base station covers a geographical “cell”, and devices are handed off between cells as they move.

Feature	4G (LTE)	5G
Maximum speed	~150 Mbps (typical), up to 1 Gbps	~1-10 Gbps (theoretical)
Latency	~30-50 ms	~1-10 ms
Frequency	700 MHz - 2.6 GHz	Sub-6 GHz and mmWave (24-100 GHz)
Range per cell	Large (several km)	Smaller cells needed for mmWave
Key applications	Mobile internet, video streaming	IoT, autonomous vehicles, remote surgery, AR/VR

5G uses higher frequencies (particularly mmWave) which provide greater bandwidth but have shorter range and poorer building penetration, requiring a denser network of smaller cells.

Hardware Required for Wireless Connection

To establish a wireless network, the following hardware is required:

Hardware Component	Purpose
Wireless Network Interface Card (NIC)	Installed in each device; contains a radio transceiver to send and receive wireless signals. Converts data between digital format and radio waves.
Wireless Access Point (WAP)	Acts as a central hub for the wireless network; bridges wireless devices to the wired network. Manages connections, authentication, and channel allocation.
Wireless Router	Combines a WAP, router, and often a modem. Routes traffic between the local network and the internet, assigns IP addresses via DHCP.
Antenna	Transmits and receives radio signals. Can be omnidirectional (all directions) or directional (focused beam for longer range). Built into NICs and access points, or external for extended range.
Repeater / Range Extender	Receives the wireless signal and retransmits it to extend coverage area. Useful for large buildings but can reduce throughput.

Comparison of Wireless Technologies

Feature	Wi-Fi	Bluetooth	NFC	Cellular (4G/5G)
Range	~30-70 m (indoors)	~10-100 m	~10 cm	Several km
Speed	Up to several Gbps	Up to 3 Mbps	Up to 424 kbps	Up to 10 Gbps (5G)
Power usage	Medium-High	Low	Very low	High
Setup	Network name/password	Pairing process	Touch/tap	SIM card / eSIM
Best for	Local network, internet access	Peripheral devices, short-range data	Payments, access cards	Mobile internet, wide-area connectivity

When comparing wireless technologies in an exam, structure your answer around the key differentiators: range, speed, power consumption, security, and typical use cases. Always match the technology to the scenario. For example, NFC is ideal for contactless payments because the very short range provides inherent security, instant connection, and minimal power use. Wi-Fi would be inappropriate for a contactless payment system because it operates over a much larger range, creating security concerns.
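The speed differences are easier to appreciate as transfer times. The sketch below assumes a 10 MB file (taking 1 MB as 1,000,000 bytes) and uses the maximum theoretical speeds quoted in these notes, with 9.6 Gbps (802.11ax) standing in for Wi-Fi. Real-world speeds are much lower, so the results are best-case figures.

```python
FILE_SIZE_BITS = 10 * 1_000_000 * 8  # assumed example: a 10 MB file

MAX_SPEED_BPS = {
    "NFC": 424_000,                   # 424 kbps
    "Bluetooth": 3_000_000,           # 3 Mbps
    "Wi-Fi": 9_600_000_000,           # 9.6 Gbps (802.11ax)
    "Cellular (5G)": 10_000_000_000,  # 10 Gbps
}


def transfer_seconds(size_bits, speed_bps):
    """Time = amount of data / transfer rate."""
    return size_bits / speed_bps


for name, speed in MAX_SPEED_BPS.items():
    print(f"{name}: {transfer_seconds(FILE_SIZE_BITS, speed):.3f} s")
```

NFC takes around three minutes for this file, which is why it is used for small exchanges such as payment details rather than for file transfer.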