NoSQL Design Patterns
Architecture
Common design patterns and best practices for modeling data effectively in NoSQL databases such as MongoDB, DynamoDB, and Cassandra.
Description
NoSQL databases require a fundamentally different approach to data modeling than relational databases. Instead of normalizing data and relying on JOINs, you design schemas around the queries the application actually runs (query-driven design).
Common NoSQL patterns include: Embedding (nested documents instead of separate collections), Denormalization (duplicating data for fast reads), Bucketing (grouping related data), the Polymorphic Pattern (different document types in the same collection), the Computed Pattern (pre-calculating aggregates), the Subset Pattern (embedding a subset of the data), the Extended Reference Pattern (embedding frequently accessed fields), and the Outlier Pattern (handling edge cases differently).
In document databases such as MongoDB, embedding is often preferable to references when the related data is read together and stays bounded in size. In wide-column stores such as Cassandra, denormalization and duplicate tables are normal. In key-value stores such as DynamoDB, composite keys and GSIs (Global Secondary Indexes) are critical.
NoSQL design is iterative: start with use cases and query patterns, design the schema to optimize for them, and iterate based on performance metrics.
Problem
Relational database design (normalization) works poorly in NoSQL systems, where JOINs are expensive or do not exist. Naively carrying a relational design over to NoSQL leads to slow queries, because a single logical read can require multiple round-trips to the database.
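To make the round-trip cost concrete, here is a minimal sketch in JavaScript. `FakeDb`, its collection names, and the two loader functions are hypothetical stand-ins rather than a real driver API; each `findOne` call simply models one network round-trip.

```javascript
// Hypothetical in-memory stand-in for a database that counts round-trips.
class FakeDb {
  constructor(collections) {
    this.collections = collections;
    this.roundTrips = 0;
  }

  findOne(collection, id) {
    this.roundTrips += 1; // every call models one network round-trip
    return this.collections[collection].find((doc) => doc._id === id) ?? null;
  }
}

const fakeDb = new FakeDb({
  users: [
    { _id: 1, name: "Peter Hansen" },
    { _id: 2, name: "Maria Nielsen" },
  ],
  // Relational-style: the post and its comments hold only foreign keys.
  postsNormalized: [{ _id: 123, title: "My Post", userId: 1, commentIds: [10, 11] }],
  comments: [
    { _id: 10, text: "Great post!", userId: 2 },
    { _id: 11, text: "Thanks!", userId: 1 },
  ],
  // Query-driven: everything the page needs lives in one document.
  postsEmbedded: [
    {
      _id: 123,
      title: "My Post",
      author: { id: 1, name: "Peter Hansen" },
      comments: [
        { text: "Great post!", author: { name: "Maria Nielsen" } },
        { text: "Thanks!", author: { name: "Peter Hansen" } },
      ],
    },
  ],
});

// Follows every reference with a separate read.
function loadPostNormalized(db, postId) {
  const post = db.findOne("postsNormalized", postId);
  const author = db.findOne("users", post.userId);
  const comments = post.commentIds.map((commentId) => {
    const comment = db.findOne("comments", commentId);
    const commenter = db.findOne("users", comment.userId);
    return { text: comment.text, author: { name: commenter.name } };
  });
  return { title: post.title, author: { id: author._id, name: author.name }, comments };
}

// Reads the pre-assembled document in one go.
function loadPostEmbedded(db, postId) {
  const { _id, ...page } = db.findOne("postsEmbedded", postId);
  return page;
}

fakeDb.roundTrips = 0;
const normalizedPage = loadPostNormalized(fakeDb, 123);
const normalizedTrips = fakeDb.roundTrips; // post + author + 2 comments + 2 commenters

fakeDb.roundTrips = 0;
const embeddedPage = loadPostEmbedded(fakeDb, 123);
const embeddedTrips = fakeDb.roundTrips;

console.log({ normalizedTrips, embeddedTrips });
```

Both loaders produce the same page; the difference is only in how many reads it takes, and the normalized count grows with the number of comments.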
Solution
NoSQL design patterns optimize for query patterns by embedding, denormalizing, and pre-computing data. Data is duplicated strategically to remove the need for JOINs. Schemas are designed around how data is accessed, not how it relates logically. The result is fast, predictable performance for the queries the schema was designed for.
Example
-- Pattern 1: Embedding vs Referencing (MongoDB)
// RELATIONAL approach (anti-pattern in MongoDB):
// users collection:
{
  _id: 1,
  name: "Peter Hansen",
  email: "peter@email.dk"
}
// posts collection (reference):
{
  _id: 123,
  userId: 1, // Reference!
  title: "My Post",
  content: "..."
}
// Requires $lookup (slow):
db.posts.aggregate([
  { $match: { _id: 123 } },
  { $lookup: {
      from: "users",
      localField: "userId",
      foreignField: "_id",
      as: "author"
    }
  }
]);
// NOSQL approach (embedding):
{
  _id: 123,
  title: "My Post",
  content: "...",
  author: { // Embedded!
    id: 1,
    name: "Peter Hansen",
    email: "peter@email.dk"
  },
  comments: [ // Array of embedded documents
    {
      id: 1,
      text: "Great post!",
      author: { name: "Maria Nielsen" }
    }
  ]
}
// Single query, no $lookup:
db.posts.findOne({ _id: 123 });
-- Pattern 2: Bucketing (time-series data)
// NAIVE (one document per measurement):
{
  _id: ObjectId(),
  sensor: "sensor-1",
  temperature: 23.5,
  timestamp: ISODate("2024-01-15T10:00:00Z")
}
// Millions of tiny documents!
// BUCKETING pattern (group by hour):
{
  _id: "sensor-1_2024-01-15T10",
  sensor: "sensor-1",
  hour: ISODate("2024-01-15T10:00:00Z"),
  measurements: [
    { temp: 23.5, time: ISODate("2024-01-15T10:00:00Z") },
    { temp: 23.6, time: ISODate("2024-01-15T10:05:00Z") },
    // ... 12 measurements per hour
  ],
  count: 12,
  avg_temp: 23.7
}
// Fewer, larger documents, better performance
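The bucket document above shows the stored shape but not how a bucket fills up. The following is a minimal in-memory sketch of the write path, assuming one bucket per sensor per UTC hour. The `buckets` map, the helper names, and the extra `sum_temp` field (kept so the average can be updated incrementally) are assumptions for illustration, not part of the original schema. In MongoDB the same effect is typically achieved with a single `updateOne` using `upsert: true` together with `$push` and `$inc`.

```javascript
// Hypothetical in-memory stand-in for the bucket collection, keyed by _id.
const buckets = new Map();

// "sensor-1" + 2024-01-15T10:05:00Z -> "sensor-1_2024-01-15T10"
function bucketId(sensor, time) {
  return `${sensor}_${time.toISOString().slice(0, 13)}`;
}

function recordMeasurement(sensor, temp, time) {
  const id = bucketId(sensor, time);
  let bucket = buckets.get(id);

  if (!bucket) {
    // First measurement of the hour: create the bucket (the "upsert" step).
    const hour = new Date(time);
    hour.setUTCMinutes(0, 0, 0);
    bucket = { _id: id, sensor, hour, measurements: [], count: 0, sum_temp: 0, avg_temp: null };
    buckets.set(id, bucket);
  }

  // Append the measurement and keep the pre-computed fields in step.
  bucket.measurements.push({ temp, time });
  bucket.count += 1;
  bucket.sum_temp += temp;
  bucket.avg_temp = bucket.sum_temp / bucket.count;
  return bucket;
}

recordMeasurement("sensor-1", 23.5, new Date("2024-01-15T10:00:00Z"));
recordMeasurement("sensor-1", 24.5, new Date("2024-01-15T10:05:00Z"));
recordMeasurement("sensor-1", 22.0, new Date("2024-01-15T11:00:00Z"));

const tenOClock = buckets.get("sensor-1_2024-01-15T10");
console.log(buckets.size, tenOClock.count, tenOClock.avg_temp);
```

Storing the running sum alongside the count is one way to keep `avg_temp` correct without rereading the whole `measurements` array on every write.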
-- Pattern 3: Computed Pattern (pre-calculate)
// NAIVE (calculate on query):
db.orders.aggregate([
  { $match: { customerId: 123 } },
  { $group: {
      _id: "$customerId",
      totalSpent: { $sum: "$total" },
      orderCount: { $sum: 1 }
    }
  }
]);
// Slow with many orders
// COMPUTED pattern (store calculated values):
{
  _id: 123,
  name: "Peter Hansen",
  totalSpent: 15234.50, // Pre-calculated!
  orderCount: 47, // Pre-calculated!
  lastOrderDate: ISODate("2024-01-15")
}
// Update with an increment when a new order is placed:
db.users.updateOne(
  { _id: 123 },
  {
    $inc: { totalSpent: 99.99, orderCount: 1 },
    $set: { lastOrderDate: new Date() }
  }
);
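The computed pattern only pays off if the stored aggregates stay equal to what a full recalculation would give. Here is a small in-memory sketch of that invariant. The function names are hypothetical, and amounts are kept as integer cents, which is an assumption made here to avoid floating-point drift and is not taken from the original example.

```javascript
// Hypothetical in-memory stand-ins for the orders collection and one customer document.
const orderLog = [];
const customer = {
  _id: 123,
  name: "Peter Hansen",
  totalSpentCents: 0, // pre-computed aggregate
  orderCount: 0,      // pre-computed aggregate
  lastOrderDate: null,
};

// Write path: store the order and bump the aggregates in the same step.
function placeOrder(customerDoc, totalCents, date) {
  orderLog.push({ customerId: customerDoc._id, totalCents, date });
  customerDoc.totalSpentCents += totalCents;
  customerDoc.orderCount += 1;
  customerDoc.lastOrderDate = date;
}

// Slow path: what the naive aggregation would compute on every read.
function recomputeAggregates(customerId) {
  const own = orderLog.filter((order) => order.customerId === customerId);
  return {
    totalSpentCents: own.reduce((sum, order) => sum + order.totalCents, 0),
    orderCount: own.length,
  };
}

placeOrder(customer, 9999, new Date("2024-01-10T00:00:00Z"));
placeOrder(customer, 14950, new Date("2024-01-15T00:00:00Z"));

const recomputed = recomputeAggregates(customer._id);
console.log(customer.totalSpentCents, customer.orderCount, recomputed);
```

A periodic job that compares the stored values against such a recomputation is one possible way to detect drift after failed or partial writes.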
-- Pattern 4: Subset Pattern (partial embedding)
// Problem: a user has 10,000 orders
// Embedding all of them is too heavy
// SUBSET pattern (embed recent subset):
{
  _id: 123,
  name: "Peter Hansen",
  recentOrders: [ // Only last 10
    { id: 9999, total: 99.99, date: "2024-01-15" },
    { id: 9998, total: 149.50, date: "2024-01-10" },
    // ...
  ],
  allOrdersCount: 10000
}
// Full orders still live in a separate collection for queries
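Keeping `recentOrders` capped requires a little work on every write. The sketch below models that in memory; `RECENT_LIMIT`, `addOrderToSubset`, and the `allOrders` array are hypothetical names. In MongoDB, a comparable update can be expressed with `$push` combined with the `$each`, `$position`, and `$slice` modifiers.

```javascript
const RECENT_LIMIT = 10; // how many orders stay embedded in the user document

// Hypothetical in-memory stand-ins for the user document and the full orders collection.
const subsetUser = { _id: 123, name: "Peter Hansen", recentOrders: [], allOrdersCount: 0 };
const allOrders = [];

function addOrderToSubset(user, ordersCollection, order) {
  // The complete order always goes to the separate collection.
  ordersCollection.push({ ...order, userId: user._id });

  // The embedded subset keeps newest-first order and drops anything past the limit.
  user.recentOrders = [order, ...user.recentOrders].slice(0, RECENT_LIMIT);
  user.allOrdersCount += 1;
}

for (let id = 1; id <= 12; id += 1) {
  addOrderToSubset(subsetUser, allOrders, { id, total: 10 * id });
}

console.log(subsetUser.recentOrders.map((order) => order.id));
```

After twelve writes the user document still holds only ten orders, while the full history remains queryable in the separate collection.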
-- Pattern 5: Extended Reference Pattern
// NAIVE reference (need frequent lookups):
{
  _id: 123,
  title: "Post",
  authorId: 456 // ID only
}
// Every read requires an author lookup
// EXTENDED reference (embed frequently used fields):
{
  _id: 123,
  title: "Post",
  author: {
    id: 456,
    name: "Peter Hansen", // Frequently needed
    avatar: "url" // Frequently needed
    // NOT email, bio, etc (rarely needed)
  }
}
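The price of an extended reference is that the copied fields must be refreshed when the source changes. The following in-memory sketch shows such a fan-out update; `renameAuthor` and the sample arrays are hypothetical. Against MongoDB, the equivalent step would typically be an `updateMany` on the posts collection filtered by `"author.id"`.

```javascript
// Hypothetical in-memory stand-ins for the users and posts collections.
const userDocs = [{ _id: 456, name: "Peter Hansen", avatar: "url", email: "peter@email.dk" }];
const postDocs = [
  { _id: 123, title: "Post", author: { id: 456, name: "Peter Hansen", avatar: "url" } },
  { _id: 124, title: "Other post", author: { id: 789, name: "Maria Nielsen", avatar: "url" } },
];

// Updates the source of truth, then every embedded copy of the changed field.
function renameAuthor(users, posts, authorId, newName) {
  const user = users.find((candidate) => candidate._id === authorId);
  if (!user) throw new Error(`No user with id ${authorId}`);
  user.name = newName;

  let touchedPosts = 0;
  for (const post of posts) {
    if (post.author.id === authorId) {
      post.author.name = newName;
      touchedPosts += 1;
    }
  }
  return touchedPosts;
}

const touched = renameAuthor(userDocs, postDocs, 456, "Peter H. Hansen");
console.log(touched, postDocs.map((post) => post.author.name));
```

This is why the pattern suits fields that are read often but change rarely, such as a display name or avatar.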
-- Pattern 6: Polymorphic Pattern
// Different document types in the same collection
{
  _id: 1,
  type: "image",
  url: "image.jpg",
  width: 1920,
  height: 1080
}
{
  _id: 2,
  type: "video",
  url: "video.mp4",
  duration: 120,
  codec: "h264"
}
{
  _id: 3,
  type: "document",
  url: "doc.pdf",
  pageCount: 50
}
// Query by type:
db.media.find({ type: "video" });
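On the application side, polymorphic documents are usually handled by branching on the discriminator field. A minimal sketch, assuming the three shapes shown above; `describeMedia` is a hypothetical helper:

```javascript
// Sample documents mirroring the three shapes in the media collection.
const mediaDocs = [
  { _id: 1, type: "image", url: "image.jpg", width: 1920, height: 1080 },
  { _id: 2, type: "video", url: "video.mp4", duration: 120, codec: "h264" },
  { _id: 3, type: "document", url: "doc.pdf", pageCount: 50 },
];

// Branches on the shared `type` field and reads only the fields that type defines.
function describeMedia(doc) {
  switch (doc.type) {
    case "image":
      return `${doc.url} (${doc.width}x${doc.height})`;
    case "video":
      return `${doc.url} (${doc.duration}s, ${doc.codec})`;
    case "document":
      return `${doc.url} (${doc.pageCount} pages)`;
    default:
      // Fail loudly so a new type cannot slip through unhandled.
      throw new Error(`Unknown media type: ${doc.type}`);
  }
}

const descriptions = mediaDocs.map(describeMedia);
console.log(descriptions);
```

Throwing on an unknown `type` is a deliberate choice in this sketch: because the database does not enforce the schema, the application is the only place where an unexpected shape can be caught.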
-- DynamoDB Single Table Design
// Instead of separate tables, use composite keys
// PK: partition key, SK: sort key
// Users:
{ PK: "USER#123", SK: "PROFILE", name: "Peter", email: "..." }
// User's orders:
{ PK: "USER#123", SK: "ORDER#456", total: 99.99, date: "..." }
{ PK: "USER#123", SK: "ORDER#457", total: 149.50, date: "..." }
// Single query gets user + orders
// (shorthand: a real SDK call also needs a client and a TableName):
query({
  KeyConditionExpression: 'PK = :pk',
  ExpressionAttributeValues: { ':pk': 'USER#123' }
});
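The reason one query can return both the profile and the orders is that all items sharing a partition key are stored together, sorted by sort key. The sketch below models that behavior in memory; `singleTable` and `queryItems` are hypothetical and do not use the AWS SDK. In DynamoDB itself, the narrowed variant corresponds to a key condition such as `PK = :pk AND begins_with(SK, :prefix)`.

```javascript
// Hypothetical in-memory model of a single table with composite keys.
const singleTable = [
  { PK: "USER#123", SK: "PROFILE", name: "Peter" },
  { PK: "USER#123", SK: "ORDER#457", total: 149.5 },
  { PK: "USER#123", SK: "ORDER#456", total: 99.99 },
  { PK: "USER#999", SK: "PROFILE", name: "Maria" },
];

// Returns the items of one partition, sorted by SK, optionally narrowed by an SK prefix.
function queryItems(items, pk, skPrefix = "") {
  return items
    .filter((item) => item.PK === pk && item.SK.startsWith(skPrefix))
    .sort((a, b) => (a.SK < b.SK ? -1 : a.SK > b.SK ? 1 : 0));
}

const profileAndOrders = queryItems(singleTable, "USER#123");
const ordersOnly = queryItems(singleTable, "USER#123", "ORDER#");

console.log(profileAndOrders.map((item) => item.SK));
console.log(ordersOnly.map((item) => item.SK));
```

The prefixes in the sort key (`PROFILE`, `ORDER#`) act as a type tag, which is what lets different entity kinds share one partition and still be fetched selectively.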
-- Cassandra Denormalization
// Query-per-table approach
// Different tables for different query patterns
// Query: Get user by ID
CREATE TABLE users_by_id (
  user_id UUID PRIMARY KEY,
  name TEXT,
  email TEXT
);
// Query: Get user by email
CREATE TABLE users_by_email (
  email TEXT PRIMARY KEY,
  user_id UUID,
  name TEXT
);
// Query: Get user's orders
// order_id is part of the clustering key so that two orders with the
// same timestamp do not overwrite each other
CREATE TABLE orders_by_user (
  user_id UUID,
  order_date TIMESTAMP,
  order_id UUID,
  total DECIMAL,
  PRIMARY KEY (user_id, order_date, order_id)
) WITH CLUSTERING ORDER BY (order_date DESC, order_id ASC);
// The same data lives in multiple tables!
// The application must update all of them
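What "the application must update all tables" means in practice can be sketched in memory. The two maps below stand in for `users_by_id` and `users_by_email`, and `saveUser` is a hypothetical helper. The sketch ignores partial failures; in Cassandra, such multi-table writes are commonly grouped in a logged batch.

```javascript
// Hypothetical in-memory stand-ins for the two denormalized tables.
const usersById = new Map();    // keyed by user_id
const usersByEmail = new Map(); // keyed by email

function saveUser({ userId, name, email }) {
  const previous = usersById.get(userId);

  // If the email changed, the old row in users_by_email is now stale:
  // its primary key is the old email, so it must be deleted explicitly.
  if (previous && previous.email !== email) {
    usersByEmail.delete(previous.email);
  }

  // Write the same data to every table that serves a query.
  usersById.set(userId, { userId, name, email });
  usersByEmail.set(email, { email, userId, name });
}

saveUser({ userId: "u-1", name: "Peter Hansen", email: "peter@email.dk" });
saveUser({ userId: "u-1", name: "Peter Hansen", email: "peter@example.dk" });

console.log(usersById.get("u-1").email, [...usersByEmail.keys()]);
```

The email change is the interesting case: because the email is the primary key of the second table, an update there is really a delete plus an insert.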
-- Pattern 7: Outlier Pattern
// 99% of users have < 100 orders (normal)
// 1% of users have > 10,000 orders (outliers)
// Normal users (embed orders):
{
  _id: 123,
  name: "Regular User",
  orders: [/* 50 orders embedded */]
}
// Outlier users (reference):
{
  _id: 456,
  name: "Power User",
  isOutlier: true,
  orderCount: 15000
  // Orders live in a separate collection
}
// Application logic handles the two cases differently:
if (user.isOutlier) {
  // Query the separate orders collection; find() returns a cursor,
  // so materialize it into an array
  orders = await db.orders.find({ userId: user._id }).toArray();
} else {
  // Use embedded orders
  orders = user.orders;
}
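The read path above assumes something has already set `isOutlier`. One possible write path, sketched in memory, flips a user to outlier mode once the embedded array reaches a threshold and moves the embedded orders out. `OUTLIER_THRESHOLD`, `addOrderWithOutlierCheck`, and the migration step are assumptions for illustration; the original does not specify how users become outliers.

```javascript
const OUTLIER_THRESHOLD = 100; // assumed cutoff, matching the "< 100 orders" rule of thumb

function addOrderWithOutlierCheck(user, ordersCollection, order) {
  if (!user.isOutlier && user.orders.length >= OUTLIER_THRESHOLD) {
    // Crossing the threshold: move the embedded orders out exactly once.
    for (const embedded of user.orders) {
      ordersCollection.push({ ...embedded, userId: user._id });
    }
    user.orders = [];
    user.isOutlier = true;
  }

  if (user.isOutlier) {
    ordersCollection.push({ ...order, userId: user._id }); // referenced
  } else {
    user.orders.push(order); // embedded
  }
  user.orderCount += 1;
}

// Hypothetical in-memory stand-ins for a user document and the orders collection.
const powerUser = { _id: 456, name: "Power User", isOutlier: false, orderCount: 0, orders: [] };
const separateOrders = [];

for (let id = 1; id <= 101; id += 1) {
  addOrderWithOutlierCheck(powerUser, separateOrders, { id, total: 10 });
}

console.log(powerUser.isOutlier, powerUser.orders.length, separateOrders.length);
```

The first hundred orders stay embedded; the next write triggers the migration, after which every order for this user lives in the separate collection.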
-- Redis patterns
// Hash for objects
HSET user:123 name "Peter" email "peter@email.dk" age 30
HGETALL user:123
// Set for relationships
SADD user:123:followers user:456
SADD user:123:followers user:789
SMEMBERS user:123:followers
// Sorted set for leaderboards
ZADD leaderboard 1500 "player1"
ZADD leaderboard 2000 "player2"
ZREVRANGE leaderboard 0 9 // Top 10
Advantages
- ✓ Optimized for query patterns
- ✓ Eliminates JOINs
- ✓ Predictable performance
- ✓ Scalable design
- ✓ Fast reads
Challenges
- ⚠ Data duplication
- ⚠ Write complexity
- ⚠ Consistency maintenance
- ⚠ Learning curve
- ⚠ Requires redesign when query patterns change
Use cases
- • Document databases (MongoDB, CouchDB)
- • Key-value stores (DynamoDB, Redis)
- • Wide-column stores (Cassandra, HBase)
- • Time-series data
- • High-scale applications
Real-world examples
- • Social media feeds (embed user info)
- • E-commerce catalogs (denormalized product data)
- • IoT time-series (bucketing pattern)
- • Gaming leaderboards (Redis sorted sets)
- • User profiles (computed aggregates)