Database Systems

AS — Unit 1: Fundamentals of Computer Science

Flat File vs Relational Databases

Flat File Database

A flat file database stores all data in a single table (or file). Every record contains all the data, which often leads to significant data redundancy (the same data stored multiple times).

Example — Flat File:

StudentID StudentName CourseID CourseName TutorName
S101 Alice C1 Maths Mr Smith
S102 Bob C1 Maths Mr Smith
S103 Charlie C2 English Ms Jones
S101 Alice C2 English Ms Jones

Problems with flat files:

  • Data redundancy — “Maths”, “Mr Smith” repeated for every student on that course
  • Data inconsistency — if Mr Smith changes his name, it must be updated in every row
  • Insertion anomaly — cannot add a new course without assigning a student to it
  • Deletion anomaly — deleting the last student on a course removes all data about that course
  • Update anomaly — changing a course name requires updating multiple rows

Relational Database

A relational database stores data across multiple linked tables, each representing a single entity. Tables are linked through key fields, eliminating redundancy and the anomalies listed above.

A flat file database stores all data in one table, leading to redundancy and anomalies. A relational database stores data in multiple linked tables, reducing redundancy and improving data integrity.

Comparison

Feature Flat File Relational Database
Structure Single table Multiple linked tables
Redundancy High Low (data stored once)
Data integrity Poor (inconsistencies likely) High (enforced through keys and constraints)
Anomalies Insertion, deletion, update anomalies Eliminated through normalisation
Complexity Simple to set up More complex structure
Storage Wastes space Efficient use of storage
Querying Limited Powerful querying with SQL

Entity-Relationship Diagrams

An entity-relationship (E-R) diagram is a visual representation of the entities in a database and the relationships between them. Entities are represented as rectangles and relationships as lines connecting them.

An entity is a real-world object or concept about which data is stored (e.g., Student, Course, Teacher). A relationship describes how entities are associated with each other.

Types of Relationships

There are three types of relationships between entities:

One-to-One (1:1)

Each instance of Entity A is associated with exactly one instance of Entity B, and vice versa.

Example: Each Employee has one DrivingLicence, and each DrivingLicence belongs to one Employee.

erDiagram
    EMPLOYEE ||--|| DRIVING_LICENCE : has

One-to-Many (1:M)

Each instance of Entity A can be associated with many instances of Entity B, but each instance of Entity B is associated with only one instance of Entity A.

Example: One Tutor teaches many Students, but each Student has only one Tutor.

erDiagram
    TUTOR ||--o{ STUDENT : teaches

This is the most common relationship type. It is implemented by placing the primary key of the “one” side as a foreign key in the table on the “many” side.

Many-to-Many (M:M)

Each instance of Entity A can be associated with many instances of Entity B, and vice versa.

Example: A Student can enrol on many Courses, and each Course can have many Students.

erDiagram
    STUDENT }o--o{ COURSE : "enrols on"

Important: A many-to-many relationship cannot be directly implemented in a relational database. It must be resolved by creating a junction table (also called a linking table or associative entity) that sits between the two tables, converting the M:M relationship into two 1:M relationships.

erDiagram
    STUDENT ||--o{ ENROLMENT : ""
    COURSE ||--o{ ENROLMENT : ""

The Enrolment table typically contains the primary keys of both Student and Course as foreign keys.

In the exam, you may be asked to draw an E-R diagram or identify relationship types. Remember that many-to-many relationships must be decomposed using a junction table. The foreign key always goes on the “many” side of a 1:M relationship.


Keys

Keys are fields that serve special purposes in a relational database, used to uniquely identify records and create relationships between tables.

Primary Key

A primary key is a field (or combination of fields) that uniquely identifies each record in a table. Every table must have a primary key.

Rules for primary keys:

  • Must be unique — no two records can have the same primary key value
  • Must be not null — every record must have a value for the primary key
  • Should be stable — the value should not change over time
  • Should be minimal — use the fewest fields necessary

Example: StudentID in a Student table — each student has a unique ID.

Foreign Key

A foreign key is a field in one table that references the primary key of another table. Foreign keys create the links (relationships) between tables.

Example: TutorID in the Student table is a foreign key that references TutorID (the primary key) in the Tutor table.

Composite Key

A composite key is a primary key made up of two or more fields combined. Individually, neither field is unique, but together they uniquely identify a record.

Example: In an Enrolment table (junction table), neither StudentID nor CourseID alone is unique (a student takes many courses, and a course has many students). Together, the combination of StudentID + CourseID uniquely identifies each enrolment.

StudentID (FK/PK) CourseID (FK/PK) EnrolmentDate
S101 C1 01/09/2025
S101 C2 01/09/2025
S102 C1 01/09/2025

The composite primary key is (StudentID, CourseID).

Secondary Key

A secondary key is a field that is not the primary key but is used frequently for searching or sorting. An index is typically created on secondary keys to speed up queries.

Example: Surname in a Student table — not unique, but frequently searched.

A primary key uniquely identifies each record. A foreign key is a field in one table that references the primary key of another table. A composite key is a primary key made up of two or more fields.

Summary of Key Types

Key Type Purpose Example
Primary Key Uniquely identifies each record in a table StudentID
Foreign Key Links one table to another (references a primary key) TutorID in Student table
Composite Key A primary key made up of two or more fields (StudentID, CourseID)
Secondary Key Used for searching/sorting, indexed for speed Surname

Normalisation

Normalisation is the process of organising a database to reduce data redundancy and eliminate anomalies (insertion, deletion, and update anomalies). It involves splitting a flat file into related tables through a series of normal forms.

Normalisation is a systematic process of decomposing tables to eliminate data redundancy and dependency anomalies, progressing through First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF).

Worked Example

Consider this un-normalised data for a school system:

Un-normalised (UNF):

StudentID StudentName Courses
S101 Alice Maths, English, Physics
S102 Bob Maths, Chemistry
S103 Charlie English, Chemistry, Art

Problems: The Courses field contains repeating groups (multiple values in one field).

First Normal Form (1NF)

Rule: A table is in 1NF if:

  • All fields contain atomic (single, indivisible) values — no repeating groups
  • Each record is unique (has a primary key)

Solution: Remove repeating groups by creating a separate row for each course:

StudentID StudentName CourseID CourseName
S101 Alice C1 Maths
S101 Alice C2 English
S101 Alice C3 Physics
S102 Bob C1 Maths
S102 Bob C4 Chemistry
S103 Charlie C2 English
S103 Charlie C4 Chemistry
S103 Charlie C5 Art

The composite primary key is (StudentID, CourseID).

This is now in 1NF but still has redundancy (e.g., “Alice” is repeated, “Maths” is repeated).

Second Normal Form (2NF)

Rule: A table is in 2NF if:

  • It is already in 1NF
  • Every non-key field depends on the whole primary key (no partial dependencies)

A partial dependency occurs when a non-key field depends on only part of a composite key.

In the 1NF table above:

  • StudentName depends only on StudentID (not on the full key StudentID + CourseID) — partial dependency
  • CourseName depends only on CourseID (not on the full key) — partial dependency

Solution: Split into separate tables to remove partial dependencies:

Student Table:

StudentID (PK) StudentName
S101 Alice
S102 Bob
S103 Charlie

Course Table:

CourseID (PK) CourseName
C1 Maths
C2 English
C3 Physics
C4 Chemistry
C5 Art

Enrolment Table (junction table):

StudentID (FK/PK) CourseID (FK/PK)
S101 C1
S101 C2
S101 C3
S102 C1
S102 C4
S103 C2
S103 C4
S103 C5

This is now in 2NF — every non-key attribute depends on the whole of its table’s primary key.

Third Normal Form (3NF)

Rule: A table is in 3NF if:

  • It is already in 2NF
  • There are no transitive dependencies — non-key fields must depend only on the primary key, not on other non-key fields

A transitive dependency occurs when a non-key field depends on another non-key field, which in turn depends on the primary key.

Example of a transitive dependency:

StudentID (PK) StudentName TutorID TutorName
S101 Alice T1 Mr Smith
S102 Bob T1 Mr Smith
S103 Charlie T2 Ms Jones

Here, TutorName depends on TutorID, which depends on StudentID. So TutorName is transitively dependent on StudentID through TutorID.

Solution: Remove transitive dependencies by creating a separate table:

Student Table:

StudentID (PK) StudentName TutorID (FK)
S101 Alice T1
S102 Bob T1
S103 Charlie T2

Tutor Table:

TutorID (PK) TutorName
T1 Mr Smith
T2 Ms Jones

This is now in 3NF.

Summary of Normal Forms

Normal Form Rule What is Removed
1NF No repeating groups; all values are atomic Repeating groups
2NF 1NF + no partial dependencies Partial dependencies on composite key
3NF 2NF + no transitive dependencies Transitive dependencies between non-key fields

A useful mnemonic for 3NF: “Each non-key attribute must depend on the key, the whole key, and nothing but the key.” In the exam, you may be given un-normalised data and asked to normalise it through each form. Show each step and clearly identify your primary and foreign keys.


Structured Query Language (SQL)

SQL (Structured Query Language) is the standard language used to create, manage, and query relational databases. SQL commands can be divided into DDL (Data Definition Language) for structure and DML (Data Manipulation Language) for data.

SELECT — Retrieving Data

The SELECT statement retrieves data from one or more tables.

-- Select all columns from a table
SELECT * FROM Student;

-- Select specific columns
SELECT StudentName, Mark FROM Student;

-- Select with a condition (WHERE clause)
SELECT StudentName, Mark
FROM Student
WHERE Mark >= 70;

-- Select with multiple conditions
SELECT StudentName, Mark
FROM Student
WHERE Mark >= 70 AND TutorID = 'T1';

-- Select with sorting (ORDER BY)
SELECT StudentName, Mark
FROM Student
ORDER BY Mark DESC;

-- Select with sorting by multiple columns
SELECT StudentName, Mark
FROM Student
ORDER BY TutorID ASC, Mark DESC;

WHERE Clause — Operators

Operator Meaning Example
= Equal to WHERE Mark = 80
<> or != Not equal to WHERE TutorID <> 'T1'
< Less than WHERE Mark < 50
> Greater than WHERE Mark > 90
<= Less than or equal to WHERE Mark <= 75
>= Greater than or equal to WHERE Mark >= 60
BETWEEN Within a range WHERE Mark BETWEEN 60 AND 80
LIKE Pattern matching WHERE StudentName LIKE 'A%'
IN Matches any value in a list WHERE TutorID IN ('T1', 'T2')
AND Both conditions must be true WHERE Mark > 70 AND TutorID = 'T1'
OR Either condition can be true WHERE Mark > 90 OR TutorID = 'T2'
NOT Negates a condition WHERE NOT TutorID = 'T1'

INSERT — Adding Records

-- Insert a new record with all fields specified
INSERT INTO Student (StudentID, StudentName, Mark, TutorID)
VALUES ('S104', 'Diana', 88, 'T2');

-- Insert multiple records
INSERT INTO Student (StudentID, StudentName, Mark, TutorID)
VALUES ('S105', 'Edward', 72, 'T1'),
       ('S106', 'Fiona', 95, 'T2');

UPDATE — Modifying Records

-- Update a specific record
UPDATE Student
SET Mark = 85
WHERE StudentID = 'S102';

-- Update multiple fields
UPDATE Student
SET Mark = 90, TutorID = 'T2'
WHERE StudentID = 'S101';

-- Update all records matching a condition
UPDATE Student
SET Mark = Mark + 5
WHERE TutorID = 'T1';

DELETE — Removing Records

-- Delete a specific record
DELETE FROM Student
WHERE StudentID = 'S104';

-- Delete all records matching a condition
DELETE FROM Student
WHERE Mark < 40;

-- Delete all records from a table (use with caution!)
DELETE FROM Student;

Be very careful with UPDATE and DELETE statements — always include a WHERE clause unless you deliberately want to affect every record. DELETE FROM Student; without a WHERE clause deletes all students.

JOIN — Combining Tables

A JOIN combines rows from two or more tables based on a related field (typically a foreign key relationship).

INNER JOIN returns only rows where there is a match in both tables:

-- Get student names with their tutor names
SELECT Student.StudentName, Tutor.TutorName
FROM Student
INNER JOIN Tutor ON Student.TutorID = Tutor.TutorID;

-- Join with a WHERE clause and ORDER BY
SELECT Student.StudentName, Student.Mark, Tutor.TutorName
FROM Student
INNER JOIN Tutor ON Student.TutorID = Tutor.TutorID
WHERE Student.Mark >= 70
ORDER BY Student.Mark DESC;

LEFT JOIN returns all rows from the left table and matched rows from the right table (NULL if no match):

-- Get all tutors and their students (including tutors with no students)
SELECT Tutor.TutorName, Student.StudentName
FROM Tutor
LEFT JOIN Student ON Tutor.TutorID = Student.TutorID;

GROUP BY and Aggregate Functions

GROUP BY groups rows that share a value, allowing you to apply aggregate functions to each group.

Function Purpose
COUNT() Counts the number of rows
SUM() Adds up values
AVG() Calculates the average
MAX() Finds the maximum value
MIN() Finds the minimum value
-- Count students per tutor
SELECT TutorID, COUNT(*) AS NumberOfStudents
FROM Student
GROUP BY TutorID;

-- Average mark per tutor
SELECT Tutor.TutorName, AVG(Student.Mark) AS AverageMark
FROM Student
INNER JOIN Tutor ON Student.TutorID = Tutor.TutorID
GROUP BY Tutor.TutorName;

-- Find the highest mark
SELECT MAX(Mark) AS HighestMark FROM Student;

-- Count students scoring above 70, grouped by tutor
SELECT Tutor.TutorName, COUNT(*) AS HighScorers
FROM Student
INNER JOIN Tutor ON Student.TutorID = Tutor.TutorID
WHERE Student.Mark > 70
GROUP BY Tutor.TutorName;

Wildcards with LIKE

Wildcard Meaning Example
% Zero or more characters LIKE 'A%' matches Alice, Adam, Anna
_ Exactly one character LIKE '_ob' matches Bob, Rob
-- Find students whose name starts with 'A'
SELECT * FROM Student WHERE StudentName LIKE 'A%';

-- Find students whose name contains 'li'
SELECT * FROM Student WHERE StudentName LIKE '%li%';

SQL is a frequent exam topic. Practise writing queries from scratch. Remember the order of clauses: SELECT … FROM … JOIN … ON … WHERE … GROUP BY … ORDER BY …. You may also be asked to interpret what a given SQL query does.


Database Management System (DBMS)

A Database Management System (DBMS) is the software that sits between the user/application and the physical database. It provides tools for creating, managing, querying, and securing the database. Examples include MySQL, Microsoft Access, Oracle, and PostgreSQL.

A DBMS is software that manages access to a database, providing data definition, data manipulation, security, and integrity features while hiding the physical storage details from users.

Functions of a DBMS

Function Description
Data Definition Language (DDL) Define and modify the database structure: create tables, define fields, data types, primary/foreign keys, constraints
Data Manipulation Language (DML) Insert, update, delete, and query data using SQL
Data Dictionary Maintains metadata about the database (see below)
Security management User accounts, passwords, access rights, permissions (read, write, delete per table/field)
Backup and recovery Tools to create backups and restore the database after failure
Data integrity Enforces validation rules, referential integrity (foreign keys must reference valid primary keys), and data types
Concurrency control Manages simultaneous access by multiple users, preventing conflicts using record locking
Query processing Optimises and executes queries efficiently

Advantages of Using a DBMS

  • Data independence — applications do not need to know how data is physically stored; the DBMS handles this
  • Reduced redundancy — normalised tables store data once
  • Improved data integrity — validation and constraints enforced centrally
  • Improved security — fine-grained access control per user, per table
  • Concurrent access — multiple users can access the database simultaneously
  • Centralised backup — one backup covers all the data

Disadvantages of Using a DBMS

  • Cost — DBMS software can be expensive to purchase and maintain
  • Complexity — requires trained database administrators (DBAs)
  • Performance overhead — the DBMS adds a layer of processing
  • Single point of failure — if the DBMS fails, all applications lose access to data

Data Dictionary

A data dictionary is a centralised repository of metadata — data about the data. It is maintained automatically by the DBMS and is used internally to manage the database.

A data dictionary stores metadata about the database structure, including table names, field names, data types, constraints, relationships, and access rights.

Contents of a Data Dictionary

Metadata Description
Table names Names of all tables in the database
Field names Names of all fields in each table
Data types The data type of each field (e.g., VARCHAR, INTEGER, DATE)
Field sizes Maximum length or precision of each field
Primary keys Which field(s) form the primary key for each table
Foreign keys Which fields are foreign keys and which table/field they reference
Constraints Validation rules (e.g., NOT NULL, UNIQUE, CHECK)
Default values Default values for fields when no data is entered
Indexes Which fields are indexed for faster searching
Access rights Which users have read/write/delete permissions for each table

Example Data Dictionary Entry

For the Student table:

Field Name Data Type Size Key Constraints Default
StudentID VARCHAR 5 Primary Key NOT NULL, UNIQUE
StudentName VARCHAR 50 NOT NULL
Mark INTEGER CHECK (Mark >= 0 AND Mark <= 100) 0
TutorID VARCHAR 3 Foreign Key (Tutor.TutorID) NOT NULL

Purpose of the Data Dictionary

  • Enforces consistency — ensures all applications use the same definitions for fields
  • Supports query processing — the DBMS uses it to validate and optimise queries
  • Aids database administration — DBAs use it to understand and manage the database structure
  • Documents the database — serves as a reference for developers

The data dictionary is about metadata (data about data), not the actual data stored in the tables. Be ready to explain what information it contains and why it is useful for the DBMS and database administrators.


Referential Integrity

Referential integrity is a database concept that ensures relationships between tables remain consistent. It means that a foreign key value must either match an existing primary key value in the referenced table or be null.

Rules

  • You cannot add a record with a foreign key value that does not exist as a primary key in the related table
  • You cannot delete a record from a parent table if there are records in a child table that reference it (unless cascade delete is enabled)
  • You cannot change a primary key value if foreign keys in other tables reference it (unless cascade update is enabled)

Example: If you try to insert a student with TutorID = 'T9' but no tutor with TutorID = 'T9' exists in the Tutor table, the DBMS will reject the insertion.

Referential integrity prevents orphan records — records that reference a parent record that no longer exists. This is a fundamental concept in relational databases and is enforced automatically by the DBMS.