Program Construction

A2 Level — Unit 3: Programming & System Development

Translation Programs

A translator converts source code written in one language into another form — typically machine code that can be executed by the CPU. There are three main types of translator.

A translation program takes source code as input and produces executable code (or an intermediate form) as output. The three types are compilers, interpreters, and assemblers.

Compilers

A compiler translates an entire high-level language source program into machine code (object code) before execution. The resulting executable file can be run independently without the compiler.

Stages of Compilation

Stage	Purpose
Lexical analysis	Reads source code character by character. Removes whitespace and comments. Identifies tokens (keywords, identifiers, operators, literals). Produces a token stream and populates the symbol table.
Syntax analysis (parsing)	Checks the token stream against the language’s grammar rules. Builds a parse tree (abstract syntax tree). Reports syntax errors with line numbers.
Semantic analysis	Checks for meaning errors: type mismatches, undeclared variables, incorrect function arguments. Annotates the parse tree with type information.
Code generation	Converts the parse tree into machine code or intermediate code. Allocates registers and memory addresses.
Code optimisation	Improves the generated code for speed or memory usage without changing behaviour. Examples: removing dead code, constant folding, loop unrolling.

Lexical Analysis in Detail

The lexer (scanner) processes the source code:

Reads the source file character by character
Groups characters into lexemes (meaningful sequences)
Classifies each lexeme as a token (keyword, identifier, literal, operator, separator)
Adds identifiers to the symbol table (stores name, type, scope, memory address)
Removes comments and whitespace
Replaces each token with a numeric code for efficiency

Example: The statement total = price * 1.2 produces tokens:

Lexeme	Token Type
`total`	Identifier
`=`	Assignment operator
`price`	Identifier
`*`	Arithmetic operator
`1.2`	Numeric literal

Syntax Analysis in Detail

The parser checks the token stream against grammar rules (often defined in BNF):

Takes the token stream from the lexer
Applies grammar rules to build a parse tree
Detects syntax errors (missing brackets, invalid statements)
Each node in the tree represents a grammatical construct

For total = price * 1.2, the parse tree would show:

graph TD
    A[Assignment =] --> B[total]
    A --> C["Multiply *"]
    C --> D[price]
    C --> E[1.2]

Advantages and Disadvantages of Compilers

Advantage	Disadvantage
Compiled code runs faster — no translation overhead at runtime	Slower development — entire program must be recompiled after changes
Object code can be distributed without the source code (protects IP)	Platform-specific — compiled code only runs on the target architecture
Errors are reported all at once before execution	Error messages can be harder to trace back to the original source
Optimisation produces efficient machine code	Compilation can be time-consuming for large programs

Examples: C, C++, Rust, Go, Pascal

Interpreters

An interpreter translates and executes source code one line (or statement) at a time. No separate object code file is produced.

How an Interpreter Works

Reads the next statement from the source code
Analyses it (lexical and syntax analysis)
Translates it into an intermediate form or machine code
Executes it immediately
Moves to the next statement
If an error is found, execution stops at that point with an error message

Advantages and Disadvantages of Interpreters

Advantage	Disadvantage
Faster development — changes can be tested immediately without recompilation	Slower execution — translation happens every time the program runs
Easier debugging — stops at the exact line where an error occurs	Source code must be present to run the program (no IP protection)
Platform independent — same source can run on any platform with the interpreter	A loop executed 1000 times is translated 1000 times
Good for prototyping and scripting	Requires the interpreter to be installed on the target machine

Examples: Python (CPython interpreter), JavaScript (in browsers), Ruby, PHP

Some languages use both compilation and interpretation. Java compiles to bytecode (an intermediate form), which is then interpreted or JIT-compiled by the Java Virtual Machine (JVM). Python compiles to bytecode (.pyc files) which is executed by the Python interpreter. This hybrid approach combines portability with improved performance.

Assemblers

An assembler translates assembly language (low-level, human-readable mnemonics) into machine code on a one-to-one basis — each assembly instruction maps to exactly one machine code instruction.

Two-Pass Assembly

Most assemblers make two passes over the source code:

First pass:

Reads each instruction
Assigns memory addresses to each instruction (using a location counter)
Builds the symbol table — records labels and their memory addresses
Identifies any forward references (labels used before they are defined)

Second pass:

Reads each instruction again
Converts mnemonics to opcodes (binary machine code)
Resolves forward references using the symbol table
Replaces labels with their actual memory addresses
Produces the final machine code output

Why Two Passes?

A single pass cannot resolve forward references — instructions that jump to labels defined later in the program. The first pass discovers all labels; the second pass can then replace them with addresses.

Example:

        BRZ end      ; Jump to 'end' — but where is it?
        ADD total
        ...
end:    HLT          ; First pass records: end = address 15

On the second pass, BRZ end is translated with the now-known address of end.

Advantages and Disadvantages of Assemblers

Advantage	Disadvantage
Produces very efficient machine code	Platform-specific — tied to one processor architecture
Programmer has direct hardware control	Difficult to write and maintain — very low level
One-to-one mapping makes translation fast and simple	Programs are longer and harder to read than HLL code
Useful for embedded systems and device drivers	No portability — must be rewritten for different CPUs

Comparing Translator Types

Feature	Compiler	Interpreter	Assembler
Input	High-level source code	High-level source code	Assembly language
Output	Machine code (object file)	No output file — executes directly	Machine code
Translation	Whole program at once	One statement at a time	One-to-one (mnemonic → opcode)
Speed of execution	Fast (pre-compiled)	Slow (translated each run)	Fast (native machine code)
Error reporting	All errors after compilation	Stops at first error	All errors after assembly
Source code needed at runtime?	No	Yes	No
Portability	Low (platform-specific)	High (any platform with interpreter)	Very low (CPU-specific)
Passes	Multiple (lexical, syntax, semantic, code gen)	Single pass per statement	Two passes

Linkers and Loaders

After compilation, the object code may need to be combined with other modules before it can be executed.

Linker

A linker combines multiple object code modules (including library routines) into a single executable file.

Static linking — library code is copied into the executable at link time. Produces a larger file but runs independently.
Dynamic linking — the executable contains references to shared libraries (DLLs on Windows, .so on Linux) that are loaded at runtime. Produces smaller files but requires the libraries to be present.

Loader

A loader places the executable program into main memory and prepares it for execution.

Allocates memory for the program
Loads the machine code from disk into RAM
Sets up the program counter to the entry point
Resolves any remaining addresses (relocation)
Hands control to the program

Translation Errors vs Execution Errors

Translation Errors (Compile-Time Errors)

Detected during translation before the program runs:

Error Type	Example
Syntax error	Missing bracket: `print("hello"`
Type error	Assigning a string to an integer variable (in statically typed languages)
Undeclared variable	Using a variable name that hasn’t been defined
Missing semicolon	`int x = 5` (in languages that require `;`)

Execution Errors (Runtime Errors)

Occur during program execution — the code is syntactically valid but encounters a problem at runtime:

Error Type	Example
Division by zero	`result = 10 / 0`
Stack overflow	Infinite recursion with no base case
Array index out of bounds	Accessing `arr[100]` when the array has 50 elements
Null pointer / None reference	Trying to access an attribute of a null object
File not found	Attempting to open a file that doesn’t exist
Out of memory	Allocating more memory than is available

Logical Errors

A third category — logical errors — produce incorrect results but no error messages. The program runs without crashing but produces wrong output.

Error Type	Example
Wrong operator	Using `+` instead of `-`
Off-by-one error	Loop runs one too many or one too few times
Incorrect condition	Using `>` instead of `>=`
Wrong variable	Comparing `x > x` instead of `x > y`

Exam questions often ask you to classify errors. Remember: syntax errors are caught at translation time, runtime errors crash the program during execution, and logical errors produce wrong results without crashing. A compiler catches syntax and type errors; an interpreter catches them as it reaches each line. Neither can detect logical errors — only thorough testing reveals those.