Organisation of Data

AS — Unit 1: Fundamentals of Computer Science

Fields, Records, and Files

Data stored on a computer is typically organised into a clear hierarchy. Understanding this hierarchy is essential for working with files and databases.

Field: A single item of data (e.g., a surname, a date of birth). Record: A collection of related fields describing one entity (e.g., one student). File: A collection of related records (e.g., all students in a school).

Field

A field is the smallest meaningful unit of data. Each field has:

A field name (e.g., Surname, DateOfBirth, Mark)
A data type (e.g., String, Integer, Date, Boolean, Real)
A field size (e.g., 20 characters for a name)

Record

A record is a collection of related fields that together describe a single entity. For example, a Student record might contain:

Field Name	Data Type	Example Value
StudentID	String	“S1024”
FirstName	String	“Alice”
Surname	String	“Williams”
DateOfBirth	Date	15/03/2007
Mark	Integer	78
Passed	Boolean	True

File

A file is a collection of related records stored together. A Students file would hold the records for all students. Files are stored on secondary storage (e.g., hard drive, SSD) so the data persists when the computer is turned off.

The Data Hierarchy

File (e.g., Students.dat)
 └── Record 1 (Alice Williams)
 │    ├── Field: StudentID = "S1024"
 │    ├── Field: FirstName = "Alice"
 │    ├── Field: Surname = "Williams"
 │    └── Field: Mark = 78
 └── Record 2 (Bob Jones)
      ├── Field: StudentID = "S1025"
      ├── Field: FirstName = "Bob"
      ├── Field: Surname = "Jones"
      └── Field: Mark = 65

The data hierarchy from smallest to largest is: Bit → Byte → Field → Record → File → Database. In exams, you may be asked to identify the fields, records, and file from a given scenario.

Fixed-Length vs Variable-Length Records

When records are stored in a file, they can be formatted in two different ways: fixed-length or variable-length.

Fixed-Length Records

Every record in the file occupies the exact same amount of storage space, regardless of the actual length of the data. Each field is allocated a maximum size, and if the data is shorter, the remaining space is padded (often with spaces or null characters).

Example: If FirstName is allocated 15 characters, the name “Sue” would be stored as "Sue " (padded with 12 spaces).

Structure:

StudentID (5)	FirstName (15)	Surname (20)	Mark (3)
S1024	Alice	Williams	078
S1025	Bob	Jones	065

Each record = 5 + 15 + 20 + 3 = 43 bytes.

Calculating record position: The position of the nth record can be calculated directly: Position = (n - 1) × Record Length

For example, to find record 50 in a file where each record is 43 bytes: Position = (50 - 1) × 43 = 2107 bytes from the start

Variable-Length Records

Each record only takes up as much space as the data actually requires. A delimiter (special character, such as a comma or pipe symbol) or a length indicator separates fields and records.

Example (comma-delimited):

S1024,Alice,Williams,78
S1025,Bob,Jones,65

Comparison

Feature	Fixed-Length Records	Variable-Length Records
Storage space	Wastes space due to padding	Efficient — no wasted space
Access speed	Fast — can calculate exact position of any record	Slow — must read from the start to find a record
Simplicity	Simple to program and manage	More complex — needs delimiters or length fields
Insertion/Deletion	Easy — overwrite at known position	Harder — may require rewriting the file
Best for	Records of similar length; random access needed	Records of very different lengths; storage is limited

Fixed-length records allocate the same amount of space for every record, allowing direct calculation of record positions. Variable-length records use only the space needed, separated by delimiters.

The key trade-off is speed vs. storage. Fixed-length records are faster for direct access but waste space. Variable-length records save space but require sequential reading. Be ready to explain this trade-off in an exam.

Serial File Organisation

In a serial file, records are stored in the order they are added (chronologically) with no particular sorting. New records are simply appended to the end of the file.

A serial file stores records in the order they were entered, with no sorting or logical ordering applied.

Characteristics

Records are stored in no particular order (or in the order of entry)
To find a specific record, a linear search must be performed — reading every record from the beginning until the target is found
New records are added to the end of the file
Deleting a record may require rewriting the entire file, or marking the record as deleted

Advantages and Disadvantages

Advantage	Disadvantage
Simple to implement	Slow to search — must read sequentially
Fast to add new records (append to end)	No ordering makes retrieval inefficient
Appropriate for small files or batch input	Not suitable for large files needing frequent lookups

Use Cases

Transaction logs — events recorded in the order they occur
Batch processing input files — data collected throughout the day, processed later
Backup files — records dumped in the order they are read

Sequential File Organisation

In a sequential file, records are stored sorted by a key field (e.g., sorted by StudentID or Surname). The file must be kept in order at all times.

A sequential file stores records in order of a key field. Searching can be more efficient than serial because the search can stop once the key value has been passed.

Characteristics

Records are sorted by a key field
To search, read through the file in order; once a key value greater than the target is found, the search can stop (the record is not in the file)
Inserting a new record requires finding the correct position and rewriting the file from that point onwards (or creating a new file with the record inserted)
Often used with fixed-length records for efficient processing

Advantages and Disadvantages

Advantage	Disadvantage
Faster searching than serial (can stop early)	Insertion is slow — file must be partially or fully rewritten
Efficient for processing all records in order (batch processing)	Not suitable for frequent additions/deletions
Can use binary search if records are fixed-length	Must maintain sort order at all times

Serial vs Sequential Comparison

Feature	Serial	Sequential
Order	No order (order of entry)	Sorted by key field
Adding records	Append to end (fast)	Insert in correct position (slow)
Searching	Linear search only	Linear search (can stop early) or binary search
Best use	Temporary data, logs	Batch processing of ordered data

Random (Direct Access) File Organisation

In a random access file (also called direct access), records can be read or written directly without reading through preceding records. The position of a record is calculated using a hashing algorithm or an address calculation applied to the record’s key field.

Random (direct) access allows any record to be accessed directly by calculating its storage address from the key field, typically using a hash function.

How It Works

A hash function is applied to the record’s key field to produce a storage address (position in the file)
The record is stored at (or near) that address
To retrieve a record, the same hash function is applied to the key to calculate the address and go directly to it

Handling Collisions

When two different keys produce the same hash value (a collision), the system must resolve it. Common methods include:

Open addressing — store the record in the next available slot
Overflow area — store colliding records in a separate overflow area with pointers

Advantages and Disadvantages

Advantage	Disadvantage
Very fast access to individual records — O(1)	Wasted storage space (empty slots in the hash table)
Ideal for applications requiring frequent lookups	Collisions reduce efficiency
No need to read through other records	Poor for processing all records in order
Records can be updated in place	Choosing a good hash function is important

Use Cases

Real-time systems requiring instant lookup (e.g., airline booking systems)
Stock control systems — look up a product by barcode
Bank account systems — access an account by account number

Indexed Sequential Access

Indexed sequential access combines the benefits of sequential and random access. Records are stored sequentially (sorted by key field), but an index is maintained that maps key values to their approximate positions in the file.

Indexed sequential access stores records in key order with an index that enables fast lookup by pointing to the approximate location of a record, reducing the number of records that must be searched.

How It Works

Records are stored in sequential order by key field
An index is created that stores the key value and the address of certain records (e.g., the first record on each page or block)
To find a record, the system first searches the index to find the approximate location, then reads sequentially from that point
Multiple levels of index can be used (like a multi-level index) for very large files

Example

Consider a file of students sorted by StudentID, with a block index:

Index:

Key Range	Block Address
S1000 - S1099	Block 1
S1100 - S1199	Block 2
S1200 - S1299	Block 3

To find student S1157, the system checks the index, jumps directly to Block 2, then searches sequentially within that block.

Advantages and Disadvantages

Advantage	Disadvantage
Much faster than purely sequential search	Index requires additional storage space
Supports both sequential and direct access	Index must be maintained when records are added/deleted
Efficient for large files	More complex to implement
Good for batch and interactive processing	Overflow handling needed for insertions

Comparison of File Organisation Methods

Feature	Serial	Sequential	Random (Direct)	Indexed Sequential
Order	None	Sorted by key	None (hash-based)	Sorted by key + index
Search speed	Slow (linear)	Medium (linear, can stop early)	Fast (O(1) via hash)	Fast (index then short search)
Insertion	Fast (append)	Slow (rewrite)	Fast (hash to position)	Medium (may need index update)
Deletion	Slow (rewrite or mark)	Slow (rewrite)	Fast (mark as empty)	Medium
Best for	Logs, batch input	Batch processing	Real-time lookups	Large files needing both access types
Storage efficiency	High	High	Low (empty slots)	Medium (index overhead)

In the exam, you may be asked to recommend a file organisation method for a given scenario. Think about whether the application needs frequent individual lookups (random access), batch processing in order (sequential), or both (indexed sequential). Serial is mainly used for temporary or log data.

File Handling — Code Examples

Working with files in code involves opening, reading from, writing to, and closing files. The two most common modes are:

Writing — creates a new file or overwrites an existing one
Reading — opens an existing file for reading
Appending — adds data to the end of an existing file

Writing to a Text File

# Python — Writing to a text file
# Writing records to a file (overwrites if file exists)
file = open("students.txt", "w")
file.write("S1024,Alice,Williams,78\n")
file.write("S1025,Bob,Jones,65\n")
file.write("S1026,Charlie,Brown,82\n")
file.close()

# Better practice: using 'with' to auto-close the file
with open("students.txt", "w") as file:
    file.write("S1024,Alice,Williams,78\n")
    file.write("S1025,Bob,Jones,65\n")
    file.write("S1026,Charlie,Brown,82\n")

' VB.NET — Writing to a text file
Imports System.IO

' Writing records to a file (overwrites if file exists)
Dim writer As New StreamWriter("students.txt")
writer.WriteLine("S1024,Alice,Williams,78")
writer.WriteLine("S1025,Bob,Jones,65")
writer.WriteLine("S1026,Charlie,Brown,82")
writer.Close()

' Alternative using Using block (auto-closes the file)
Using writer As New StreamWriter("students.txt")
    writer.WriteLine("S1024,Alice,Williams,78")
    writer.WriteLine("S1025,Bob,Jones,65")
    writer.WriteLine("S1026,Charlie,Brown,82")
End Using

Reading from a Text File

# Python — Reading from a text file

# Reading the entire file
with open("students.txt", "r") as file:
    contents = file.read()
    print(contents)

# Reading line by line
with open("students.txt", "r") as file:
    for line in file:
        line = line.strip()  # Remove newline character
        fields = line.split(",")  # Split by comma delimiter
        student_id = fields[0]
        first_name = fields[1]
        surname = fields[2]
        mark = int(fields[3])
        print(f"{first_name} {surname} (ID: {student_id}) scored {mark}")

' VB.NET — Reading from a text file
Imports System.IO

' Reading line by line
Dim reader As New StreamReader("students.txt")
Dim line As String

While Not reader.EndOfStream
    line = reader.ReadLine()
    Dim fields() As String = line.Split(",")
    Dim studentID As String = fields(0)
    Dim firstName As String = fields(1)
    Dim surname As String = fields(2)
    Dim mark As Integer = CInt(fields(3))
    Console.WriteLine(firstName & " " & surname & " (ID: " & studentID & ") scored " & mark)
End While
reader.Close()

Appending to a Text File

# Python — Appending to a text file
with open("students.txt", "a") as file:
    file.write("S1027,Diana,Evans,91\n")

' VB.NET — Appending to a text file
Imports System.IO

' Append mode (True = append)
Using writer As New StreamWriter("students.txt", True)
    writer.WriteLine("S1027,Diana,Evans,91")
End Using

Searching for a Record in a File

# Python — Searching for a record by StudentID
def search_student(filename, search_id):
    with open(filename, "r") as file:
        for line in file:
            line = line.strip()
            fields = line.split(",")
            if fields[0] == search_id:
                return {
                    "id": fields[0],
                    "first_name": fields[1],
                    "surname": fields[2],
                    "mark": int(fields[3])
                }
    return None  # Not found

result = search_student("students.txt", "S1025")
if result is not None:
    print(f"Found: {result['first_name']} {result['surname']}")
else:
    print("Student not found")

' VB.NET — Searching for a record by StudentID
Imports System.IO

Sub SearchStudent(filename As String, searchID As String)
    Dim reader As New StreamReader(filename)
    Dim found As Boolean = False
    Dim line As String

    While Not reader.EndOfStream
        line = reader.ReadLine()
        Dim fields() As String = line.Split(",")
        If fields(0) = searchID Then
            Console.WriteLine("Found: " & fields(1) & " " & fields(2))
            found = True
            Exit While
        End If
    End While

    reader.Close()

    If Not found Then
        Console.WriteLine("Student not found")
    End If
End Sub

File Modes Summary

Mode	Python	VB.NET	Description
Read	`"r"`	`New StreamReader(path)`	Open for reading; file must exist
Write	`"w"`	`New StreamWriter(path)`	Open for writing; creates or overwrites
Append	`"a"`	`New StreamWriter(path, True)`	Open for appending; adds to end

In exam pseudocode, you may see simplified file operations such as OPENFILE, READLINE, WRITELINE, and CLOSEFILE. Make sure you understand the difference between write mode (which overwrites existing content) and append mode (which adds to the end). Always close a file after use to ensure data is saved properly.

Master Files and Transaction Files

In many real-world systems, data processing involves two key types of file working together.

Master file — the main, permanent data store that holds the current state of all records. It is updated periodically but is not changed during daily operations. Examples include a payroll file, customer database, or stock file.

Transaction file — a temporary file that holds all the changes (additions, deletions, modifications) that need to be applied to the master file. Transactions are collected over a period and then processed in a batch.

The File Update Process

The process of applying a transaction file to a master file follows these steps:

Sort the transaction file by the same key field as the master file
Open both files — the old master file for reading and a new master file for writing
Read a record from each file and compare their key fields
If the keys match — apply the transaction (update or delete the record) and write to the new master file
If the master key is smaller — the record is unchanged; copy it to the new master file
If the transaction key is smaller — it is a new record; write it to the new master file
Repeat until both files have been fully processed
The old master file is kept as a backup (the “grandfather” in the GFS scheme)

Real-World File Processing Examples

Application	Master File	Transaction File	Update Frequency
Payroll	Employee records (name, salary, tax code, bank details)	Hours worked, overtime, bonuses for the pay period	Weekly or monthly
Utility billing	Customer accounts (address, tariff, meter readings)	New meter readings for the billing period	Quarterly
Banking	Account records (balance, account holder, account type)	Daily deposits, withdrawals, transfers	Daily (overnight batch)
Retail stock	Product records (stock level, price, supplier)	Sales transactions, deliveries, returns	Daily

When describing the file update process, always mention that both files must be sorted by the same key field before processing begins. Also note that the old master file is not deleted — it is kept as a backup generation. This is a common source of marks in exam questions.

← Previous Next →