Denormalization
ArkitekturStrategisk tilføjelse af redundant data til en normalized database for at forbedre read performance på bekostning af write complexity.
Beskrivelse
Denormalization er processen med bevidst at bryde normalization rules ved at tilføje redundant data for at forbedre query performance. Mens normalization reducerer redundancy og sikrer data integrity, kan det kræve mange JOINs for at hente relateret data, hvilket er dyrt ved høj scale. Denormalization er en pragmatisk trade-off: du ofrer storage space og write simplicity for at få hurtigere reads. Common denormalization patterns inkluderer: duplicating foreign key data (gemme username i posts i stedet for kun user_id), storing aggregates (gemme comment_count i posts), maintaining derived values (total_price = sum of line items), og duplicating data across tables. Denormalization kræver careful maintenance - når source data ændres, skal denormalized copies opdateres (via triggers, application logic, eller batch jobs). Det er især værdifuldt i read-heavy systems, analytics databases, og caching layers. NoSQL databases som MongoDB embrace denormalization naturligt ved at embedde relateret data i documents.
Problem
Highly normalized databases kræver mange JOINs for at hente complete data, hvilket bliver langsomt ved scale. Hver JOIN er dyrt, og complex queries med 5-10 JOINs kan tage sekunder. Read-heavy applications lider mest under dette.
Løsning
Denormalization tilføjer strategisk redundant data for at eliminere JOINs. Ved at gemme author_name direkte i posts elimineres behovet for at JOIN med users table. Dette gør reads meget hurtigere, men writes skal nu opdatere multiple steder. Afvejen mellem read speed og write complexity.
Eksempel
-- NORMALIZED design (3NF)
CREATE TABLE users (
user_id INT PRIMARY KEY,
username VARCHAR(50),
email VARCHAR(100)
);
CREATE TABLE posts (
post_id INT PRIMARY KEY,
user_id INT REFERENCES users(user_id),
title VARCHAR(200),
content TEXT,
created_at TIMESTAMP
);
-- Query kræver JOIN:
SELECT p.title, p.content, u.username
FROM posts p
JOIN users u ON p.user_id = u.id
WHERE p.post_id = 123;
-- DENORMALIZED design
CREATE TABLE posts_denorm (
post_id INT PRIMARY KEY,
user_id INT REFERENCES users(user_id),
username VARCHAR(50), -- REDUNDANT! Also in users table
title VARCHAR(200),
content TEXT,
created_at TIMESTAMP
);
-- Query UDEN JOIN:
SELECT title, content, username
FROM posts_denorm
WHERE post_id = 123;
-- Meget hurtigere!
-- Pattern 1: Storing aggregates
-- NORMALIZED:
CREATE TABLE posts (
post_id INT PRIMARY KEY,
title VARCHAR(200)
);
CREATE TABLE comments (
comment_id INT PRIMARY KEY,
post_id INT REFERENCES posts,
text TEXT
);
-- Count comments (slow med mange comments):
SELECT p.*, COUNT(c.comment_id) as comment_count
FROM posts p
LEFT JOIN comments c ON p.post_id = c.post_id
GROUP BY p.post_id;
-- DENORMALIZED:
ALTER TABLE posts ADD COLUMN comment_count INT DEFAULT 0;
-- Update via trigger
CREATE TRIGGER update_comment_count
AFTER INSERT ON comments
FOR EACH ROW
BEGIN
UPDATE posts
SET comment_count = comment_count + 1
WHERE post_id = NEW.post_id;
END;
-- Now simple query (fast!):
SELECT * FROM posts WHERE post_id = 123;
-- No JOIN, no COUNT
-- Pattern 2: Duplicating lookup data
-- NORMALIZED:
CREATE TABLE orders (
order_id INT PRIMARY KEY,
product_id INT,
quantity INT
);
CREATE TABLE products (
product_id INT PRIMARY KEY,
name VARCHAR(100),
price DECIMAL(10,2)
);
-- DENORMALIZED (for historical accuracy):
CREATE TABLE orders_denorm (
order_id INT PRIMARY KEY,
product_id INT,
product_name VARCHAR(100), -- Snapshot!
product_price DECIMAL(10,2), -- Snapshot!
quantity INT
);
-- Hvis product price ændrer, påvirker det ikke historical orders
-- Pattern 3: Materialized path (tree structures)
-- NORMALIZED:
CREATE TABLE categories (
id INT PRIMARY KEY,
name VARCHAR(100),
parent_id INT REFERENCES categories(id)
);
-- Find path kræver recursive query:
WITH RECURSIVE category_path AS (
SELECT id, name, parent_id, name as path
FROM categories WHERE id = 123
UNION ALL
SELECT c.id, c.name, c.parent_id,
cp.path || ' > ' || c.name
FROM categories c
JOIN category_path cp ON c.id = cp.parent_id
)
SELECT path FROM category_path;
-- DENORMALIZED:
ALTER TABLE categories ADD COLUMN path VARCHAR(500);
-- Electronics > Computers > Laptops
SELECT path FROM categories WHERE id = 123;
-- Instant!
-- NoSQL denormalization (MongoDB)
// NORMALIZED approach (anti-pattern i MongoDB):
// users collection:
{ _id: 1, name: "Peter", email: "peter@email.dk" }
// posts collection:
{ _id: 123, userId: 1, title: "Post Title" }
// Kræver $lookup (JOIN):
db.posts.aggregate([
{
$lookup: {
from: "users",
localField: "userId",
foreignField: "_id",
as: "author"
}
}
]);
// DENORMALIZED (embedded, MongoDB best practice):
{
_id: 123,
title: "Post Title",
author: {
id: 1,
name: "Peter", // Redundant men fast!
email: "peter@email.dk"
}
}
// Single query, no $lookup:
db.posts.findOne({ _id: 123 });
-- Maintaining denormalized data
-- Option 1: Application logic
await db.query('UPDATE users SET username = ? WHERE id = ?',
[newUsername, userId]);
// Update denormalized copies
await db.query(
'UPDATE posts SET username = ? WHERE user_id = ?',
[newUsername, userId]
);
await db.query(
'UPDATE comments SET username = ? WHERE user_id = ?',
[newUsername, userId]
);
-- Option 2: Database triggers (automatic)
CREATE TRIGGER sync_username
AFTER UPDATE ON users
FOR EACH ROW
WHEN (OLD.username != NEW.username)
BEGIN
UPDATE posts SET username = NEW.username
WHERE user_id = NEW.user_id;
UPDATE comments SET username = NEW.username
WHERE user_id = NEW.user_id;
END;
-- Option 3: Batch sync job
// Run periodically
UPDATE posts p
SET username = (
SELECT username FROM users WHERE id = p.user_id
)
WHERE username != (
SELECT username FROM users WHERE id = p.user_id
);
-- Consistency risks
-- Problem: User updates username
UPDATE users SET username = 'NewName' WHERE id = 123;
-- Application crashes before updating posts!
-- Now posts have old username (inconsistent)
-- Solution: Use transactions
BEGIN TRANSACTION;
UPDATE users SET username = 'NewName' WHERE id = 123;
UPDATE posts SET username = 'NewName' WHERE user_id = 123;
COMMIT;
-- When to denormalize?
// 1. Read-heavy workloads (reads >> writes)
// 2. Complex JOINs impacting performance
// 3. Data that changes rarely (product names)
// 4. Acceptable eventual consistency
-- When NOT to denormalize?
// 1. Write-heavy workloads
// 2. Frequently changing data
// 3. Strict consistency requirements
// 4. Limited development resources (maintenance burden)Fordele
- ✓Drastisk hurtigere read queries
- ✓Eliminerer dyre JOINs
- ✓Simplere queries
- ✓Better for caching
- ✓Natural fit for NoSQL
Udfordringer
- ⚠Data redundancy (storage overhead)
- ⚠Consistency maintenance kompleksitet
- ⚠Write operations mere komplekse
- ⚠Risk of data inconsistency
- ⚠Harder to refactor
Anvendelsesområder
- •Read-heavy applications (blogs, news sites)
- •Analytics og reporting
- •Caching layers
- •NoSQL database design
- •Historical/audit data
Eksempler fra den virkelige verden
- •Social media feeds (embed author info)
- •E-commerce order history (snapshot prices)
- •Blog comment counts
- •Leaderboards (cached ranks)
- •Product catalogs (denormalized categories)