Data Security & Integrity Processes

A2 Level — Unit 4: Architecture, Data, Communication & Applications

Security and Integrity Problems During Online File Updating

When multiple users or processes update the same data simultaneously, several problems can arise.

Concurrent Access Problems

Problem | Description | Example
Lost update | Two users read the same record, both modify it, and the second write overwrites the first, so the first update is lost. | User A reads stock = 100 and User B reads stock = 100. A sets stock = 95, then B sets stock = 90. A’s update is lost: stock should be 85.
Uncommitted dependency | A user reads data that another user has modified but not yet committed. If the modification is rolled back, the reading user has acted on invalid data. | User A updates a price to £20 (not committed). User B reads £20 and bills a customer. User A rolls back, so the price was never £20.
Inconsistent analysis | A user reads data while another user is partway through updating related records, seeing a mix of old and new values. | A report totals account balances while a transfer is moving £500 between accounts, so the total appears £500 short or over.
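
The lost update row can be replayed step by step. The sketch below is illustrative only: the `stock` dictionary and the `widget` record are made-up names, and the interleaving is written out by hand rather than produced by real concurrent users.

```python
# Deterministic replay of the lost-update interleaving described in the table.
stock = {"widget": 100}  # hypothetical shared record

# Both users read before either writes.
a_read = stock["widget"]  # User A sees 100
b_read = stock["widget"]  # User B sees 100

stock["widget"] = a_read - 5   # A sells 5 and writes 95
stock["widget"] = b_read - 10  # B sells 10 and writes 90, overwriting A's write
lost_update_result = stock["widget"]

# Serial execution: B only reads after A has written.
stock["widget"] = 100
stock["widget"] = stock["widget"] - 5
stock["widget"] = stock["widget"] - 10
serial_result = stock["widget"]

print(lost_update_result)  # 90
print(serial_result)       # 85
```

The interleaved run ends at 90 because B's write is based on a stale read; the serial run gives the correct 85.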

Solutions to Concurrent Access Problems

Record Locking

When a user accesses a record, the system places a lock on it to prevent other users from modifying it simultaneously.

Lock Type | Description
Exclusive lock (write lock) | Only the user holding the lock can read or write the record. All other users are blocked until the lock is released.
Shared lock (read lock) | Multiple users can read the record, but no one can write to it until all shared locks are released.
Record-level locking | Locks only the specific record being accessed; other records remain available.
Table-level locking | Locks the entire table; simpler, but reduces concurrency.
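
As a rough analogy for an exclusive lock, the sketch below uses Python's `threading.Lock` to protect a read-modify-write on a shared value. A database lock manager is far more elaborate; the `sell` function and the quantities are assumptions chosen to match the stock example.

```python
import threading

stock = 100
stock_lock = threading.Lock()  # plays the role of an exclusive lock on one record


def sell(quantity):
    """Read, modify and write the shared stock level while holding the lock."""
    global stock
    with stock_lock:  # any other thread blocks here until the lock is released
        current = stock
        stock = current - quantity


threads = [threading.Thread(target=sell, args=(q,)) for q in (5, 10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(stock)  # 85
```

Because each read-modify-write happens entirely inside the lock, neither update can overwrite the other, whichever thread runs first.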

Timestamps

Each transaction is assigned a timestamp when it begins. If two transactions conflict, the system compares timestamps:

  • The older transaction (earlier timestamp) takes priority
  • The younger transaction is rolled back and restarted
  • This ensures that conflicting transactions take effect in chronological (timestamp) order

Serialisation

Ensures that the effect of executing transactions concurrently is the same as if they were executed one after another (serially). The actual execution may be interleaved, but the result must be equivalent to some serial ordering.

Deadlock

Deadlock occurs when two or more transactions are each waiting for the other to release a lock, creating a circular dependency where none can proceed.

Example:

  1. Transaction A locks Record 1 and requests Record 2
  2. Transaction B locks Record 2 and requests Record 1
  3. Neither can proceed — both are waiting for the other

Prevention and detection:

Approach | Description
Timeout | If a transaction waits longer than a set time for a lock, it is automatically rolled back and restarted.
Lock ordering | All transactions must request locks in the same predefined order, which prevents circular dependencies.
Wait-die / wound-wait | Timestamp-based schemes that decide whether a transaction should wait or be rolled back.
Deadlock detection | The system periodically checks for circular dependencies in the lock graph and rolls back one transaction to break the cycle.
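
Deadlock detection amounts to looking for a cycle among waiting transactions. The sketch below is a simplified illustration: `find_deadlock` is a hypothetical helper, and it assumes each transaction waits for at most one other transaction.

```python
def find_deadlock(waits_for):
    """Return the transactions forming a cycle, or None if there is no cycle.

    waits_for maps each transaction to the transaction whose lock it is
    waiting for (None if it is not waiting).
    """
    for start in waits_for:
        seen = []
        current = start
        # Follow the chain of waits until it ends or revisits a transaction.
        while current is not None and current not in seen:
            seen.append(current)
            current = waits_for.get(current)
        if current is not None:
            # current was already visited, so the chain loops back on itself.
            return seen[seen.index(current):]
    return None


# Transaction A holds Record 1 and waits for B; B holds Record 2 and waits for A.
print(find_deadlock({"A": "B", "B": "A"}))   # ['A', 'B']
print(find_deadlock({"A": "B", "B": None}))  # None
```

Once a cycle is found, a real system would pick one transaction in it as the victim and roll it back.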

Exam questions often describe a scenario with two users updating the same data and ask you to identify the problem and suggest solutions. Always identify the specific problem (lost update, uncommitted dependency, or inconsistent analysis) and then explain how locking or timestamps would prevent it.

Cloud Storage and Data Integrity

Cloud storage introduces its own data integrity challenges because data is held remotely and synchronised across multiple devices.

Measures used to protect data integrity in cloud storage:

Measure | How it helps
RAID technology | Stores data redundantly across multiple disks; if one disk fails, data can be recovered from the others
Multiple file versions | The cloud retains previous versions of files, allowing restoration if a file is corrupted or accidentally overwritten
Checksums | A hash value is calculated from the file data, then recalculated on retrieval and compared to detect any corruption
Synchronisation | Changes are propagated across all of a user’s devices to keep all copies consistent
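
The checksum measure can be sketched with Python's standard `hashlib` module. SHA-256 is used here as one possible hash function, and the file contents are made-up sample data.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


original = b"quarterly report v1"    # sample file contents
stored_checksum = checksum(original)  # saved alongside the file on upload

retrieved_ok = b"quarterly report v1"
retrieved_bad = b"quarterly report v2"  # simulated corruption

intact = checksum(retrieved_ok) == stored_checksum
corrupted = checksum(retrieved_bad) != stored_checksum
print(intact, corrupted)  # True True
```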

Risks and issues with cloud storage:

  • If the Internet connection is lost during synchronisation, the copy stored in the cloud may be incomplete or out of date
  • If data is corrupted on the client machine, the corrupted version may be synchronised to the cloud, overwriting the good copy
  • If a file is accessed and edited from multiple devices simultaneously, there may be version conflicts between the local and cloud copies

Need for and Purpose of Cryptography

Why Cryptography is Needed

Purpose | Description
Confidentiality | Ensures that only authorised parties can read the data. Even if intercepted, encrypted data is meaningless without the key.
Integrity | Detects whether data has been tampered with during transmission or storage.
Authentication | Verifies the identity of the sender, confirming that the message genuinely comes from who it claims to.
Non-repudiation | Prevents the sender from denying they sent a message. Digital signatures provide evidence of origin.

Cryptography is the science of securing information by transforming it into an unreadable format (ciphertext) that can only be converted back to readable form (plaintext) by someone with the correct key. Encryption is the process of converting plaintext to ciphertext; decryption is the reverse.


Techniques of Cryptography

Symmetric Encryption

In symmetric encryption, the same key is used for both encryption and decryption. Both the sender and receiver must possess this shared secret key.

Feature | Detail
How it works | Plaintext + Key → Ciphertext (encryption). Ciphertext + Same Key → Plaintext (decryption).
Key distribution | The key must be shared securely between parties before communication. This is the main weakness.
Speed | Fast; suitable for encrypting large amounts of data
Examples | AES (Advanced Encryption Standard; 128/192/256-bit keys), DES (Data Encryption Standard; 56-bit, now insecure), 3DES (Triple DES)

Advantages:

  • Fast encryption and decryption
  • Simpler algorithms
  • Efficient for bulk data encryption

Disadvantages:

  • Key distribution problem — how do you securely share the key?
  • Each pair of communicating parties needs a unique key, so N users need N(N-1)/2 keys in total
  • Does not provide non-repudiation (both parties have the same key)

Asymmetric Encryption

In asymmetric encryption, two mathematically related keys are used: a public key (shared openly) and a private key (kept secret).

Operation | Keys Used
Encryption | Sender encrypts with the recipient’s public key
Decryption | Recipient decrypts with their own private key
Digital signature | Sender signs with their own private key
Signature verification | Recipient verifies with the sender’s public key

Key principle: Data encrypted with a public key can only be decrypted with the corresponding private key, and vice versa.
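
The key principle can be checked with textbook RSA on deliberately tiny numbers. This is a toy sketch only: the primes 61 and 53 and the message value 65 are arbitrary illustrative choices, and real RSA uses keys of 2048 bits or more together with padding schemes.

```python
# Toy RSA key pair built from two small primes (insecure, for illustration only).
p, q = 61, 53
n = p * q                # 3233, part of both keys
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent
d = 2753                 # private exponent, chosen so that (e * d) % phi == 1

message = 65  # a message encoded as a number smaller than n

# Encryption with the public key, decryption with the private key.
ciphertext = pow(message, e, n)
decrypted = pow(ciphertext, d, n)

# Signing with the private key, verification with the public key.
signature = pow(message, d, n)
verified = pow(signature, e, n) == message

print(ciphertext)  # 2790
print(decrypted)   # 65
print(verified)    # True
```

The same pair of exponents works in both directions, which is why one key pair supports both encryption and digital signatures.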

Feature | Detail
Key distribution | No need to share secret keys; public keys are freely distributed
Speed | Slower than symmetric; not practical for large data volumes
Examples | RSA (Rivest-Shamir-Adleman), ECC (Elliptic Curve Cryptography)

Advantages:

  • No key distribution problem
  • Provides digital signatures (non-repudiation)
  • Scales well — N users need only N key pairs, not N(N-1)/2 shared keys

Disadvantages:

  • Much slower than symmetric encryption
  • Key sizes must be larger for equivalent security
  • Computationally intensive
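
The scaling comparison between the two schemes is simple arithmetic. The sketch below uses two hypothetical helper functions to show how quickly the number of shared secret keys grows compared with the number of key pairs.

```python
def symmetric_keys_needed(n):
    """One shared key per pair of users: N(N-1)/2."""
    return n * (n - 1) // 2


def asymmetric_key_pairs_needed(n):
    """One public/private key pair per user."""
    return n


for users in (2, 10, 100, 1000):
    print(users, symmetric_keys_needed(users), asymmetric_key_pairs_needed(users))
# 2 1 2
# 10 45 10
# 100 4950 100
# 1000 499500 1000
```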

Hybrid Approach

In practice, most secure communication uses both types:

  1. Asymmetric encryption securely exchanges a session key
  2. The session key is then used for symmetric encryption of the actual data
  3. This combines the key distribution advantage of asymmetric with the speed of symmetric

This is broadly how TLS (the successor to SSL, used in HTTPS) works.


Hashing

A hash function takes input of any size and produces a fixed-size output (the hash or message digest). Hashing is a one-way function — you cannot recover the original input from the hash.

Property | Description
Deterministic | The same input always produces the same hash
Fixed output size | Regardless of input size, the output is always the same length (e.g. SHA-256 produces 256 bits)
One-way | Cannot reverse the hash to find the input
Collision-resistant | It should be extremely difficult to find two different inputs with the same hash
Avalanche effect | A tiny change in input (even one bit) produces a completely different hash
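
Several of these properties can be observed directly with SHA-256 from Python's `hashlib`. The inputs are arbitrary sample strings, and `sha256_hex` is a helper name introduced only for this sketch.

```python
import hashlib


def sha256_hex(text):
    """Hash a string with SHA-256 and return 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


h1 = sha256_hex("HELLO")
h2 = sha256_hex("HELLO")
h3 = sha256_hex("HELLP")               # one character changed
long_hash = sha256_hex("HELLO" * 10000)  # much larger input

deterministic = h1 == h2
same_length = len(h1) == len(long_hash) == 64  # 64 hex digits = 256 bits
differing_positions = sum(a != b for a, b in zip(h1, h3))

print(deterministic)        # True
print(same_length)          # True
print(differing_positions)  # most of the 64 positions differ (avalanche effect)
```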

Uses of Hashing

Use | How It Works
Password storage | Store the hash of the password, not the password itself. To verify a login, hash the entered password and compare hashes.
Data integrity | Send data with its hash. The receiver hashes the received data and compares; if the hashes match, the data is unchanged.
Digital signatures | Hash the message, then encrypt the hash with the sender’s private key. The recipient decrypts with the sender’s public key and compares hashes.
File verification | Software downloads include checksums (hashes) so users can verify the file is complete and unaltered.

Salting Passwords

A salt is a random value added to each password before hashing:

  1. Generate a unique random salt for each user
  2. Concatenate: salt + password
  3. Hash the combined value: hash(salt + password)
  4. Store both the salt and the hash (the salt is not secret)

Why salting is necessary:

  • Without salts, identical passwords produce identical hashes — an attacker who cracks one password cracks all accounts with that password
  • Prevents rainbow table attacks (precomputed tables of hash-to-password mappings)
  • Each password has a unique salt, so each must be attacked individually
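
The four salting steps can be sketched as follows. This is a minimal illustration of hash(salt + password) using SHA-256; the function names are hypothetical, and production systems normally use a deliberately slow password-hashing function (for example `hashlib.pbkdf2_hmac`) rather than a single fast hash.

```python
import hashlib
import hmac
import secrets


def hash_password(password, salt=None):
    """Return (salt, digest) where digest = SHA-256(salt + password)."""
    if salt is None:
        salt = secrets.token_bytes(16)  # step 1: unique random salt per user
    combined = salt + password.encode("utf-8")        # step 2: concatenate
    digest = hashlib.sha256(combined).hexdigest()     # step 3: hash
    return salt, digest                               # step 4: store both


def verify_password(password, salt, stored_digest):
    """Re-hash the attempt with the stored salt and compare digests."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)


alice_salt, alice_digest = hash_password("letmein")
bob_salt, bob_digest = hash_password("letmein")  # same password, different salt

print(alice_digest != bob_digest)                            # True
print(verify_password("letmein", alice_salt, alice_digest))  # True
print(verify_password("wrong", alice_salt, alice_digest))    # False
```

Two users with the same password end up with different stored hashes, so a precomputed table of unsalted hashes is of no use to an attacker.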

Cryptographic Algorithms — Worked Examples

Caesar Cipher

The Caesar cipher is a simple substitution cipher that shifts each letter by a fixed number of positions in the alphabet.

Encryption: Replace each letter with the letter k positions later in the alphabet (wrapping around).

Example with shift of 3:

Plaintext | A | B | C | D | E | F | … | X | Y | Z
Ciphertext | D | E | F | G | H | I | … | A | B | C

Encrypt “HELLO” with shift 3:

Plaintext | H | E | L | L | O
Ciphertext | K | H | O | O | R

Ciphertext: KHOOR

Decryption: Shift each letter back by k positions.

def caesar_encrypt(plaintext, shift):
    result = ""
    for char in plaintext:
        if char.isalpha():
            base = ord('A') if char.isupper() else ord('a')
            result += chr((ord(char) - base + shift) % 26 + base)
        else:
            result += char
    return result

def caesar_decrypt(ciphertext, shift):
    return caesar_encrypt(ciphertext, -shift)

# Example
encrypted = caesar_encrypt("HELLO", 3)
print(encrypted)  # KHOOR
print(caesar_decrypt(encrypted, 3))  # HELLO

The same algorithm in Visual Basic .NET:

Function CaesarEncrypt(plaintext As String, shift As Integer) As String
    Dim result As String = ""
    For Each ch As Char In plaintext
        If Char.IsLetter(ch) Then
            Dim base As Integer = If(Char.IsUpper(ch), Asc("A"c), Asc("a"c))
            Dim shifted As Integer = ((Asc(ch) - base + shift) Mod 26 + 26) Mod 26
            result &= Chr(shifted + base)
        Else
            result &= ch
        End If
    Next
    Return result
End Function

Function CaesarDecrypt(ciphertext As String, shift As Integer) As String
    Return CaesarEncrypt(ciphertext, 26 - shift)
End Function

' Example
Dim encrypted As String = CaesarEncrypt("HELLO", 3)
Console.WriteLine(encrypted)  ' KHOOR
Console.WriteLine(CaesarDecrypt(encrypted, 3))  ' HELLO

Weakness: Only 25 possible keys — easily broken by brute force (trying all shifts) or frequency analysis (comparing letter frequencies to known language patterns).
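
The brute-force weakness is easy to demonstrate: an attacker simply tries every shift and looks for readable output. The sketch below is self-contained; `caesar_shift` and `brute_force` are names introduced here for illustration, and the ciphertext is HELLO encrypted with a shift of 3.

```python
def caesar_shift(text, shift):
    """Shift each letter by `shift` places, wrapping around the alphabet."""
    result = ""
    for char in text:
        if char.isalpha():
            base = ord('A') if char.isupper() else ord('a')
            result += chr((ord(char) - base + shift) % 26 + base)
        else:
            result += char
    return result


def brute_force(ciphertext):
    """Return every candidate plaintext, keyed by the shift that was undone."""
    return {shift: caesar_shift(ciphertext, -shift) for shift in range(1, 26)}


candidates = brute_force("KHOOR")
for shift, text in candidates.items():
    print(shift, text)

print(candidates[3])  # HELLO
```

With only 25 candidates to read through, the correct plaintext is found almost instantly.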

Vigenère Cipher

The Vigenère cipher uses a keyword to determine different shifts for each letter, making it much harder to crack than a simple Caesar cipher.

How it works:

  1. Choose a keyword (e.g. “KEY”)
  2. Repeat the keyword to match the length of the plaintext
  3. Each letter of the keyword determines the shift for the corresponding plaintext letter (A=0, B=1, C=2, … Z=25)

Example — encrypt “HELLO WORLD” with keyword “KEY”:

Plaintext | H | E | L | L | O | W | O | R | L | D
Key letter | K | E | Y | K | E | Y | K | E | Y | K
Shift | 10 | 4 | 24 | 10 | 4 | 24 | 10 | 4 | 24 | 10
Ciphertext | R | I | J | V | S | U | Y | V | J | N

Ciphertext: RIJVS UYVJN

def vigenere_encrypt(plaintext, keyword):
    result = ""
    key_index = 0
    keyword = keyword.upper()
    for char in plaintext:
        if char.isalpha():
            shift = ord(keyword[key_index % len(keyword)]) - ord('A')
            base = ord('A') if char.isupper() else ord('a')
            result += chr((ord(char) - base + shift) % 26 + base)
            key_index += 1
        else:
            result += char
    return result

def vigenere_decrypt(ciphertext, keyword):
    result = ""
    key_index = 0
    keyword = keyword.upper()
    for char in ciphertext:
        if char.isalpha():
            shift = ord(keyword[key_index % len(keyword)]) - ord('A')
            base = ord('A') if char.isupper() else ord('a')
            result += chr((ord(char) - base - shift) % 26 + base)
            key_index += 1
        else:
            result += char
    return result

print(vigenere_encrypt("HELLO WORLD", "KEY"))  # RIJVS UYVJN
print(vigenere_decrypt("RIJVS UYVJN", "KEY"))  # HELLO WORLD

The same algorithm in Visual Basic .NET:

Function VigenereEncrypt(plaintext As String, keyword As String) As String
    Dim result As String = ""
    Dim keyIndex As Integer = 0
    keyword = keyword.ToUpper()
    For Each ch As Char In plaintext
        If Char.IsLetter(ch) Then
            Dim shift As Integer = Asc(keyword(keyIndex Mod keyword.Length)) - Asc("A"c)
            Dim base As Integer = If(Char.IsUpper(ch), Asc("A"c), Asc("a"c))
            Dim encrypted As Integer = ((Asc(ch) - base + shift) Mod 26 + 26) Mod 26
            result &= Chr(encrypted + base)
            keyIndex += 1
        Else
            result &= ch
        End If
    Next
    Return result
End Function

Function VigenereDecrypt(ciphertext As String, keyword As String) As String
    Dim result As String = ""
    Dim keyIndex As Integer = 0
    keyword = keyword.ToUpper()
    For Each ch As Char In ciphertext
        If Char.IsLetter(ch) Then
            Dim shift As Integer = Asc(keyword(keyIndex Mod keyword.Length)) - Asc("A"c)
            Dim base As Integer = If(Char.IsUpper(ch), Asc("A"c), Asc("a"c))
            Dim decrypted As Integer = ((Asc(ch) - base - shift) Mod 26 + 26) Mod 26
            result &= Chr(decrypted + base)
            keyIndex += 1
        Else
            result &= ch
        End If
    Next
    Return result
End Function

Console.WriteLine(VigenereEncrypt("HELLO WORLD", "KEY"))  ' RIJVS UYVJN
Console.WriteLine(VigenereDecrypt("RIJVS UYVJN", "KEY"))  ' HELLO WORLD

Strength: Because successive letters are shifted by different amounts, the same plaintext letter can map to different ciphertext letters, so simple frequency analysis does not work. Weakness: the keyword repeats, so if the key length is discovered (for example by Kasiski analysis), the cipher can be broken as several interleaved Caesar ciphers.


Digital Certificates and PKI

Digital Certificates

A digital certificate is an electronic document that binds a public key to an identity (person, organisation, or server). It is issued by a trusted Certificate Authority (CA).

A certificate contains:

  • The subject’s identity (name, organisation, domain name)
  • The subject’s public key
  • The CA’s digital signature (proving the certificate is genuine)
  • Validity period (issue date and expiry date)
  • Serial number (unique identifier)

Public Key Infrastructure (PKI)

PKI is the framework of policies, procedures, and technologies that manages digital certificates and public keys:

Component | Role
Certificate Authority (CA) | Issues, signs, and revokes certificates. The trusted third party.
Registration Authority (RA) | Verifies the identity of certificate applicants before the CA issues a certificate
Certificate Revocation List (CRL) | A list of certificates that have been revoked before their expiry date
Certificate store | Where certificates are stored on a device (browsers have built-in stores of trusted CAs)

How HTTPS Works (TLS Handshake)

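When a browser connects to an HTTPS website, a simplified handshake using RSA key exchange proceeds as follows (this reflects TLS 1.2 and earlier; TLS 1.3 uses Diffie-Hellman key agreement instead of sending an encrypted pre-master secret):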

  1. Client Hello — the browser sends its supported encryption algorithms and a random number
  2. Server Hello — the server responds with its chosen algorithm, a random number, and its digital certificate
  3. Certificate verification — the browser checks the certificate against its trusted CA list, verifies the digital signature, and checks the expiry date
  4. Key exchange — the browser generates a pre-master secret, encrypts it with the server’s public key (from the certificate), and sends it
  5. Session keys — both sides derive identical symmetric session keys from the pre-master secret and random numbers
  6. Secure communication — all subsequent data is encrypted using the symmetric session key (fast encryption for the actual data transfer)

The HTTPS handshake combines asymmetric and symmetric encryption. Asymmetric encryption is used only for the initial key exchange (slow but solves the key distribution problem). Once both sides have the shared session key, they switch to symmetric encryption for speed. This hybrid approach is the foundation of secure Internet communication.
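
Step 5 of the handshake, where both sides derive the same session key, can be sketched as below. This is a toy derivation using HMAC-SHA256 and is not the key derivation function that TLS actually specifies; `derive_session_key` and the byte lengths are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def derive_session_key(pre_master_secret, client_random, server_random):
    """Toy derivation: HMAC-SHA256 keyed with the pre-master secret."""
    return hmac.new(pre_master_secret,
                    client_random + server_random,
                    hashlib.sha256).digest()


# Values exchanged in the clear during Client Hello and Server Hello.
client_random = secrets.token_bytes(32)
server_random = secrets.token_bytes(32)

# Generated by the browser and sent encrypted with the server's public key.
pre_master_secret = secrets.token_bytes(48)

# Each side runs the same derivation independently on the same three inputs.
client_key = derive_session_key(pre_master_secret, client_random, server_random)
server_key = derive_session_key(pre_master_secret, client_random, server_random)

print(client_key == server_key)  # True
print(len(client_key) * 8)       # 256
```

An eavesdropper sees both random numbers but not the pre-master secret, so cannot reproduce the key.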


Comparing Cryptographic Methods

Different cryptographic methods offer different trade-offs between security strength, speed, and practicality.

Method | Key Type | Key Size | Speed | Relative Strength | Main Use
Caesar cipher | Symmetric | 1 value (shift) | Very fast | Very weak (25 possible keys) | Historical; educational
Vigenère cipher | Symmetric | Short keyword | Fast | Weak (vulnerable to Kasiski analysis) | Historical; educational
DES | Symmetric | 56-bit | Fast | Weak (56-bit key is brute-forceable) | Legacy systems (now insecure)
3DES | Symmetric | 112/168-bit | Moderate | Moderate (applies DES three times) | Transitional; being phased out
AES | Symmetric | 128/192/256-bit | Fast | Strong (current standard) | File encryption, disk encryption, TLS
RSA | Asymmetric | 2048–4096-bit | Slow | Strong (with sufficient key size) | Key exchange, digital signatures
ECC | Asymmetric | 256–384-bit | Faster than RSA | Strong (smaller keys, equivalent security) | Mobile, TLS, certificates

Factors That Determine Cryptographic Strength

Factor | Description
Key length | Longer keys provide more possible combinations and are harder to brute-force
Algorithm design | Well-analysed algorithms (AES, RSA) have no known practical attacks; simple substitution ciphers can be broken by frequency analysis
Key management | Even a strong algorithm is weak if keys are poorly stored, shared insecurely, or reused
Vulnerability to known attacks | Brute force, frequency analysis, birthday attack, man-in-the-middle; resistance to these differs between algorithms

When comparing cryptographic methods, consider key length, resistance to attack, speed, and whether the method is symmetric or asymmetric. AES-256 is currently considered secure for most applications. RSA requires much larger keys (2048+ bits) to achieve comparable security because its security rests on the difficulty of factoring large numbers, and known factoring methods are far faster than trying every possible key. Key lengths therefore cannot be compared directly between symmetric and asymmetric algorithms.
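
The effect of key length on brute force can be estimated with simple arithmetic. The guessing rate below (one trillion keys per second) is an assumed figure chosen purely for illustration, not a measurement of any real attacker.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
GUESSES_PER_SECOND = 10**12  # assumed attacker speed, for illustration only


def years_to_search(key_bits):
    """Years needed to try every key of the given length at the assumed rate."""
    return 2**key_bits / GUESSES_PER_SECOND / SECONDS_PER_YEAR


for bits in (56, 128, 256):
    print(bits, years_to_search(bits))
```

At this assumed rate a 56-bit DES keyspace is exhausted in about 20 hours, while a 128-bit AES keyspace would take on the order of 10^19 years. Each extra bit doubles the work.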


Biometrics

Biometrics is the measurement and statistical analysis of people’s unique physical or behavioural characteristics for the purpose of identification and authentication.

Purpose and Use of Biometric Technologies

Technology | What is Measured | Typical Use
Fingerprint recognition | Unique ridge patterns on fingertips | Phone unlock, border control, workplace access
Facial recognition | Geometry of facial features (distance between eyes, nose shape, jawline) | Smartphone unlock, CCTV surveillance, airport security
Iris recognition | Pattern of the iris (the coloured ring of the eye) | High-security access control, border control
Voice recognition | Vocal characteristics (pitch, cadence, tone) | Phone banking, smart speakers, security verification
Retinal scanning | Blood vessel patterns at the back of the eye | High-security environments (government, military)
Hand geometry | Shape and size of the hand and fingers | Physical access control
Behavioural biometrics | Typing rhythm, gait, mouse movement patterns | Continuous authentication in banking/security systems

Benefits of Biometric Technologies

  • Uniqueness — biometric characteristics are unique to each individual (much harder to fake than a password or PIN)
  • Convenience — no passwords or tokens to remember, lose, or forget
  • Non-transferable — you cannot lend your fingerprint to someone else
  • Difficult to forge — modern systems are hard to spoof (though not impossible)
  • Speed — authentication is fast once enrolled

Drawbacks of Biometric Technologies

Drawback | Explanation
False acceptance rate (FAR) | The system incorrectly accepts an unauthorised user (a security failure)
False rejection rate (FRR) | The system incorrectly rejects a legitimate user (an inconvenience failure)
Permanence | Unlike a password, a compromised biometric cannot be changed: if your fingerprint data is stolen, you cannot issue yourself a new fingerprint
Privacy concerns | Biometric data is sensitive personal data; its collection and storage raise significant legal and ethical questions
Environmental factors | Dirty fingerprints, illness affecting voice, and injuries affecting facial appearance can cause recognition failures
Cost | Biometric hardware (sensors, cameras) and software are more expensive than simple password authentication

Complexities of Capturing, Storing and Processing Biometric Data

Capturing:

  • Biometric sensors must be high quality to capture sufficient detail (e.g. high-resolution cameras, sensitive fingerprint scanners)
  • Data capture must be consistent — variations in lighting, angle, or physical condition affect accuracy
  • Liveness detection is needed to prevent spoofing with photos or artificial fingers

Storing:

  • Biometric data must be stored securely. Unlike a password, it cannot simply be hashed and compared, because two scans of the same person are never exactly identical and matching relies on similarity rather than exact equality
  • Templates (mathematical representations of biometric features) are stored rather than raw images — but templates must still be protected
  • Centralised biometric databases are high-value targets for attackers; distributed storage (on the device itself) is often preferred

Processing:

  • Feature extraction converts raw biometric data into a mathematical template
  • Matching algorithms compare the new template against stored templates using a similarity score — a threshold determines accept/reject
  • Processing must be fast enough for real-time use
  • Large-scale systems (e.g. national ID databases) require significant computational resources for 1:N matching (one person against many)

A key trade-off in biometrics is between FAR and FRR. Raising the similarity threshold required for a match reduces FAR (fewer unauthorised users accepted) but increases FRR (more legitimate users rejected). Lowering the threshold does the opposite. The appropriate balance depends on the application: high-security systems prioritise low FAR; convenience-focused systems prioritise low FRR.
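
The trade-off can be seen by computing both rates at different thresholds. The sketch assumes that a higher score means a closer match and that a sample is accepted when its score is at or above the threshold; the scores themselves are invented sample data.

```python
# Hypothetical similarity scores between 0 and 1.
genuine_scores = [0.91, 0.85, 0.78, 0.66, 0.95]   # legitimate users
impostor_scores = [0.30, 0.55, 0.72, 0.41, 0.62]  # unauthorised users


def rates(threshold):
    """Return (FAR, FRR) when scores at or above the threshold are accepted."""
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


for threshold in (0.6, 0.7, 0.8):
    print(threshold, rates(threshold))
# 0.6 (0.4, 0.0)
# 0.7 (0.2, 0.2)
# 0.8 (0.0, 0.4)
```

As the threshold rises, FAR falls from 0.4 to 0.0 while FRR climbs from 0.0 to 0.4.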


Malicious Software

Malicious software (malware) is any software intentionally designed to disrupt, damage, or gain unauthorised access to a computer system.

Types of Malware

Type | Description | How it Spreads
Virus | Code that attaches itself to legitimate programs and replicates when those programs run. Can corrupt or delete files. | Infected files shared between users; email attachments
Worm | Self-replicating malware that spreads across networks without needing to attach to a file or be activated by a user | Network connections; exploiting software vulnerabilities
Trojan horse | Appears to be legitimate software but contains hidden malicious functionality | Downloading software from untrusted sources; email attachments
Ransomware | Encrypts the victim’s files and demands payment (ransom) for the decryption key | Email phishing; malicious downloads; exploit kits
Spyware | Silently monitors user activity (keystrokes, browsing, credentials) and sends data to an attacker | Bundled with free software; drive-by downloads
Adware | Displays unwanted advertisements; may redirect browser traffic | Bundled software; browser extensions
Rootkit | Hides itself and other malware deep within the OS, often at kernel level, to evade detection | Exploiting OS vulnerabilities; physical access
Keylogger | Records keystrokes to capture passwords, credit card numbers, and other sensitive data | Often delivered as part of a Trojan or spyware
Botnet/Bot | An infected computer that becomes part of a network of compromised machines controlled by an attacker | Worm propagation; Trojans; drive-by downloads

Mechanisms of Attack

Mechanism | Description
Phishing | Fraudulent emails or websites that trick users into revealing credentials or downloading malware
Social engineering | Manipulating people into performing actions or divulging information (e.g. pretending to be IT support)
Exploit | Taking advantage of a software vulnerability to execute malicious code
Drive-by download | Malware automatically downloaded when visiting a compromised or malicious website
Man-in-the-middle (MITM) | Attacker intercepts communication between two parties to eavesdrop or alter data
Brute force | Systematically trying all possible passwords or keys until the correct one is found
Dictionary attack | Uses a list of common words and passwords to guess a user’s credentials; faster than brute force as it targets likely passwords first
Shoulder surfing | Directly observing a user entering sensitive information (e.g. a PIN or password) by looking over their shoulder, or from a distance using binoculars or CCTV
IP spoofing | Attacker falsifies the source IP address of packets to impersonate a trusted host
Pharming | Redirecting a user who types a legitimate URL to a fake website without their knowledge (by corrupting DNS records or the hosts file) to harvest credentials or install malware
SQL injection | Inserting malicious SQL code into an input field to manipulate a database
Cross-site scripting (XSS) | Injecting malicious scripts into web pages viewed by other users

Vectors (Routes of Entry)

Vector | Examples
Email | Phishing attachments; malicious links
Removable media | Infected USB drives
Network | Worms spreading through open ports; unpatched services
Web | Malicious downloads; drive-by exploits; malicious browser extensions
Supply chain | Malware embedded in legitimate software updates or third-party components
Physical access | Direct access to hardware; bootable malware installed from removable media

Defences Against Malware

  • Anti-malware software — detects and removes known malware using signature-based and heuristic analysis
  • Firewalls — block unauthorised network traffic
  • Software updates/patching — closes known vulnerabilities exploited by malware
  • User education — training users to recognise phishing and social engineering
  • Least privilege — limiting user permissions to reduce the impact of infections
  • Backups — regular offsite backups allow recovery from ransomware without paying

Know the difference between malware types: a virus needs a host file and user action to spread; a worm spreads automatically across networks; a Trojan disguises itself as legitimate software. Ransomware is currently one of the most significant real-world threats to organisations.


Black Hat, White Hat Hacking and Penetration Testing

Types of Hacker

Type | Description | Legality
Black hat hacker | Breaks into systems without authorisation for personal gain, malice, or political motivation. Steals data, installs malware, or causes damage. | Illegal
White hat hacker (ethical hacker) | Uses the same techniques as black hat hackers but with explicit permission from the system owner, with the goal of identifying and fixing vulnerabilities. | Legal (with permission)
Grey hat hacker | Operates between black and white hat: may break into systems without permission but reports vulnerabilities rather than exploiting them, sometimes requesting a fee | Legally ambiguous
Script kiddie | Uses pre-written tools and exploits without deep technical understanding | Usually illegal
Hacktivist | Motivated by political or social causes; targets organisations to make a political point (e.g. website defacement, DDoS) | Usually illegal
State-sponsored hacker | Acts on behalf of a government to conduct espionage, sabotage, or influence operations | Illegal under target country’s law

Penetration Testing

Penetration testing (pen testing) is an authorised simulated attack on a computer system, network, or application to evaluate its security. The goal is to identify vulnerabilities before malicious attackers do.

Stages of a Penetration Test

  1. Planning and reconnaissance — define scope and goals; gather information about the target (IP addresses, domain names, technologies in use)
  2. Scanning — use tools to identify open ports, running services, and potential vulnerabilities
  3. Gaining access — attempt to exploit identified vulnerabilities (e.g. SQL injection, weak passwords, unpatched software)
  4. Maintaining access — determine if the vulnerability could provide persistent access (e.g. installing a backdoor)
  5. Reporting — document all findings, vulnerabilities exploited, data that could have been accessed, and recommended remediations

Footprinting

Footprinting is the first step in evaluating the security of any system. It involves gathering all available information about the target system, network, and devices — the same information an attacker would collect before an attack.

  • Includes discovering IP address ranges, domain names, active services, operating systems, and publicly available technical information
  • Enables an organisation to understand what details a potential attacker could find
  • Allows the organisation to limit publicly available technical information before it can be exploited

Penetration Testing Strategies

Strategy | Description
Targeted testing | The organisation’s IT team and the penetration testing team work together with shared knowledge of the system
External testing | Tests whether an outside attacker can gain access and how far they can penetrate once in
Internal testing | Estimates the damage a malicious or disgruntled employee could cause from inside the network
Blind testing | The tester is given minimal information about the target, simulating the actions of a real external attacker as closely as possible

Types of Penetration Test

Type | Description
Black box | Tester has no prior knowledge of the system; simulates an external attacker
White box | Tester has full knowledge of the system (source code, architecture, credentials); thorough but less realistic
Grey box | Tester has partial knowledge; simulates an insider threat or an attacker with some information

Why Penetration Testing is Important

  • Identifies vulnerabilities before they can be exploited by malicious actors
  • Tests the effectiveness of existing security controls
  • Required for compliance with security standards (e.g. PCI DSS for payment card data)
  • Provides evidence of due diligence in security for legal and regulatory purposes

The key distinction between black hat and white hat hacking is authorisation. White hat hackers have written permission from the system owner; black hat hackers do not. Penetration testing is white hat hacking — it is legal, documented, and conducted with clear rules of engagement. Always emphasise the importance of written authorisation in any exam answer about ethical hacking.


Secure by design is beyond the A2 specification but is included here as extension reading relevant to professional software development practice.

Secure by Design

Secure by design is an approach to software development in which security is built into a system from the very beginning, rather than being added as an afterthought once vulnerabilities are discovered.

Key principles of the secure by design approach:

  • At the design stage, it is assumed that the system will be subject to invalid data entry, hacking attempts, and malicious use — security measures are planned for these from the outset
  • Continuous testing is carried out throughout development to identify and close vulnerabilities early
  • Best programming practices are followed at every stage (e.g. input validation, parameterised queries to prevent SQL injection, least-privilege principles)
  • The goal is to reduce the number of vulnerabilities that reach production and to minimise the need for reactive security patches
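
One of the practices named above, parameterised queries, can be illustrated with Python's built-in `sqlite3` module. The table, column names, and input string are made up for this sketch; the unsafe query is shown only to contrast it with the safe one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "admin"), ("bob", "staff")])

malicious_input = "nobody' OR '1'='1"

# Unsafe: the input is pasted into the SQL text and changes the query's logic.
unsafe_sql = "SELECT name FROM users WHERE name = '" + malicious_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the ? placeholder passes the input as data, never as SQL code.
safe_rows = conn.execute("SELECT name FROM users WHERE name = ?",
                         (malicious_input,)).fetchall()

print(len(unsafe_rows))  # 2
print(len(safe_rows))    # 0
conn.close()
```

The concatenated query returns every user because the injected `OR '1'='1'` is always true, whereas the parameterised query finds no user with that literal name.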

Advantages of secure by design:

  • Reduces the cost of fixing vulnerabilities — it is far cheaper to address security at design time than after deployment
  • Reduces the risk of data breaches and reputational damage
  • Minimises the need for emergency security patches in live systems
  • Builds user trust by demonstrating a proactive approach to security

Secure by design contrasts with the older approach of building a system first and then “bolting on” security. The key point is that security is considered at every stage of development, not just during testing. Exam questions may ask you to explain what secure by design means or to give examples of how it would be applied during the development of a specific system.