Program Construction
A2 Level — Unit 3: Programming & System Development
Translation Programs
A translator converts source code written in one language into another form — typically machine code that can be executed by the CPU. There are three main types of translator.
A translation program takes source code as input and produces executable code (or an intermediate form) as output. The three types are compilers, interpreters, and assemblers.
Compilers
A compiler translates an entire high-level language source program into machine code (object code) before execution. The resulting executable file can be run independently without the compiler.
Stages of Compilation
| Stage | Purpose |
|---|---|
| Lexical analysis | Reads source code character by character. Removes whitespace and comments. Identifies tokens (keywords, identifiers, operators, literals). Produces a token stream and populates the symbol table. |
| Syntax analysis (parsing) | Checks the token stream against the language’s grammar rules. Builds a parse tree (abstract syntax tree). Reports syntax errors with line numbers. |
| Semantic analysis | Checks for meaning errors: type mismatches, undeclared variables, incorrect function arguments. Annotates the parse tree with type information. |
| Code generation | Converts the parse tree into machine code or intermediate code. Allocates registers and memory addresses. |
| Code optimisation | Improves the generated code for speed or memory usage without changing behaviour. Examples: removing dead code, constant folding, loop unrolling. |
Lexical Analysis in Detail
The lexer (scanner) processes the source code:
- Reads the source file character by character
- Groups characters into lexemes (meaningful sequences)
- Classifies each lexeme as a token (keyword, identifier, literal, operator, separator)
- Adds identifiers to the symbol table (stores name, type, scope, memory address)
- Removes comments and whitespace
- Replaces each token with a numeric code for efficiency
Example: The statement total = price * 1.2 produces tokens:
| Lexeme | Token Type |
|---|---|
total |
Identifier |
= |
Assignment operator |
price |
Identifier |
* |
Arithmetic operator |
1.2 |
Numeric literal |
Syntax Analysis in Detail
The parser checks the token stream against grammar rules (often defined in BNF):
- Takes the token stream from the lexer
- Applies grammar rules to build a parse tree
- Detects syntax errors (missing brackets, invalid statements)
- Each node in the tree represents a grammatical construct
For total = price * 1.2, the parse tree would show:
graph TD
A[Assignment =] --> B[total]
A --> C["Multiply *"]
C --> D[price]
C --> E[1.2]
Advantages and Disadvantages of Compilers
| Advantage | Disadvantage |
|---|---|
| Compiled code runs faster — no translation overhead at runtime | Slower development — entire program must be recompiled after changes |
| Object code can be distributed without the source code (protects IP) | Platform-specific — compiled code only runs on the target architecture |
| Errors are reported all at once before execution | Error messages can be harder to trace back to the original source |
| Optimisation produces efficient machine code | Compilation can be time-consuming for large programs |
Examples: C, C++, Rust, Go, Pascal
Interpreters
An interpreter translates and executes source code one line (or statement) at a time. No separate object code file is produced.
How an Interpreter Works
- Reads the next statement from the source code
- Analyses it (lexical and syntax analysis)
- Translates it into an intermediate form or machine code
- Executes it immediately
- Moves to the next statement
- If an error is found, execution stops at that point with an error message
Advantages and Disadvantages of Interpreters
| Advantage | Disadvantage |
|---|---|
| Faster development — changes can be tested immediately without recompilation | Slower execution — translation happens every time the program runs |
| Easier debugging — stops at the exact line where an error occurs | Source code must be present to run the program (no IP protection) |
| Platform independent — same source can run on any platform with the interpreter | A loop executed 1000 times is translated 1000 times |
| Good for prototyping and scripting | Requires the interpreter to be installed on the target machine |
Examples: Python (CPython interpreter), JavaScript (in browsers), Ruby, PHP
Some languages use both compilation and interpretation. Java compiles to bytecode (an intermediate form), which is then interpreted or JIT-compiled by the Java Virtual Machine (JVM). Python compiles to bytecode (.pyc files) which is executed by the Python interpreter. This hybrid approach combines portability with improved performance.
Assemblers
An assembler translates assembly language (low-level, human-readable mnemonics) into machine code on a one-to-one basis — each assembly instruction maps to exactly one machine code instruction.
Two-Pass Assembly
Most assemblers make two passes over the source code:
First pass:
- Reads each instruction
- Assigns memory addresses to each instruction (using a location counter)
- Builds the symbol table — records labels and their memory addresses
- Identifies any forward references (labels used before they are defined)
Second pass:
- Reads each instruction again
- Converts mnemonics to opcodes (binary machine code)
- Resolves forward references using the symbol table
- Replaces labels with their actual memory addresses
- Produces the final machine code output
Why Two Passes?
A single pass cannot resolve forward references — instructions that jump to labels defined later in the program. The first pass discovers all labels; the second pass can then replace them with addresses.
Example:
BRZ end ; Jump to 'end' — but where is it?
ADD total
...
end: HLT ; First pass records: end = address 15
On the second pass, BRZ end is translated with the now-known address of end.
Advantages and Disadvantages of Assemblers
| Advantage | Disadvantage |
|---|---|
| Produces very efficient machine code | Platform-specific — tied to one processor architecture |
| Programmer has direct hardware control | Difficult to write and maintain — very low level |
| One-to-one mapping makes translation fast and simple | Programs are longer and harder to read than HLL code |
| Useful for embedded systems and device drivers | No portability — must be rewritten for different CPUs |
Comparing Translator Types
| Feature | Compiler | Interpreter | Assembler |
|---|---|---|---|
| Input | High-level source code | High-level source code | Assembly language |
| Output | Machine code (object file) | No output file — executes directly | Machine code |
| Translation | Whole program at once | One statement at a time | One-to-one (mnemonic → opcode) |
| Speed of execution | Fast (pre-compiled) | Slow (translated each run) | Fast (native machine code) |
| Error reporting | All errors after compilation | Stops at first error | All errors after assembly |
| Source code needed at runtime? | No | Yes | No |
| Portability | Low (platform-specific) | High (any platform with interpreter) | Very low (CPU-specific) |
| Passes | Multiple (lexical, syntax, semantic, code gen) | Single pass per statement | Two passes |
Linkers and Loaders
After compilation, the object code may need to be combined with other modules before it can be executed.
Linker
A linker combines multiple object code modules (including library routines) into a single executable file.
- Static linking — library code is copied into the executable at link time. Produces a larger file but runs independently.
- Dynamic linking — the executable contains references to shared libraries (DLLs on Windows,
.soon Linux) that are loaded at runtime. Produces smaller files but requires the libraries to be present.
Loader
A loader places the executable program into main memory and prepares it for execution.
- Allocates memory for the program
- Loads the machine code from disk into RAM
- Sets up the program counter to the entry point
- Resolves any remaining addresses (relocation)
- Hands control to the program
Translation Errors vs Execution Errors
Translation Errors (Compile-Time Errors)
Detected during translation before the program runs:
| Error Type | Example |
|---|---|
| Syntax error | Missing bracket: print("hello" |
| Type error | Assigning a string to an integer variable (in statically typed languages) |
| Undeclared variable | Using a variable name that hasn’t been defined |
| Missing semicolon | int x = 5 (in languages that require ;) |
Execution Errors (Runtime Errors)
Occur during program execution — the code is syntactically valid but encounters a problem at runtime:
| Error Type | Example |
|---|---|
| Division by zero | result = 10 / 0 |
| Stack overflow | Infinite recursion with no base case |
| Array index out of bounds | Accessing arr[100] when the array has 50 elements |
| Null pointer / None reference | Trying to access an attribute of a null object |
| File not found | Attempting to open a file that doesn’t exist |
| Out of memory | Allocating more memory than is available |
Logical Errors
A third category — logical errors — produce incorrect results but no error messages. The program runs without crashing but produces wrong output.
| Error Type | Example |
|---|---|
| Wrong operator | Using + instead of - |
| Off-by-one error | Loop runs one too many or one too few times |
| Incorrect condition | Using > instead of >= |
| Wrong variable | Comparing x > x instead of x > y |
Exam questions often ask you to classify errors. Remember: syntax errors are caught at translation time, runtime errors crash the program during execution, and logical errors produce wrong results without crashing. A compiler catches syntax and type errors; an interpreter catches them as it reaches each line. Neither can detect logical errors — only thorough testing reveals those.