NoSQL Design Patterns

Architecture

Common design patterns and best practices for modeling data effectively in NoSQL databases such as MongoDB, DynamoDB, and Cassandra.

Description

NoSQL databases call for a fundamentally different approach to data modeling than relational databases. Instead of normalizing data and relying on JOINs, you design the schema around the queries the application actually runs (query-driven design). Common NoSQL patterns include: Embedding (nested documents instead of separate collections), Denormalization (duplicating data for fast reads), Bucketing (grouping related data into one document), Polymorphic Pattern (different document types in the same collection), Computed Pattern (pre-calculating aggregates), Subset Pattern (embedding only a subset of the data), Extended Reference Pattern (embedding frequently accessed fields alongside a reference), and Outlier Pattern (handling edge cases differently). In document databases such as MongoDB, embedding is often a better choice than references for data that is read together. In wide-column stores such as Cassandra, denormalization and duplicate tables are normal. In key-value stores such as DynamoDB, composite keys and GSIs (Global Secondary Indexes) are critical. NoSQL design is iterative: start with the use cases and query patterns, design the schema to optimize for them, and iterate based on performance metrics.

Problem

Relational database design (normalization) works poorly in NoSQL systems, where JOINs are expensive or do not exist. Naively carrying a relational design over to NoSQL results in slow queries, because a single logical read can require multiple round trips to the database.

Solution

NoSQL design patterns optimize for query patterns by embedding, denormalizing, and pre-computing data. Data is duplicated strategically to eliminate the need for JOINs. Schemas are designed around how the data is accessed, not how it relates logically. This gives fast, predictable query performance.

Example

-- Pattern 1: Embedding vs Referencing (MongoDB)

// RELATIONAL approach (anti-pattern in MongoDB):
// users collection:
{
  _id: 1,
  name: "Peter Hansen",
  email: "peter@email.dk"
}

// posts collection (reference):
{
  _id: 123,
  userId: 1,  // Reference!
  title: "My Post",
  content: "..."
}

// Reading a post together with its author requires a $lookup (slow):
db.posts.aggregate([
  { $match: { _id: 123 } },
  { $lookup: {
      from: "users",
      localField: "userId",
      foreignField: "_id",
      as: "author"
    }
  }
]);

// NOSQL approach (embedding):
{
  _id: 123,
  title: "My Post",
  content: "...",
  author: {  // Embedded!
    id: 1,
    name: "Peter Hansen",
    email: "peter@email.dk"
  },
  comments: [  // Array of embedded documents
    {
      id: 1,
      text: "Great post!",
      author: { name: "Maria Nielsen" }
    }
  ]
}

// Single query, no $lookup:
db.posts.findOne({ _id: 123 });

-- Pattern 2: Bucketing (time-series data)

// NAIVE (one document per measurement):
{
  _id: ObjectId(),
  sensor: "sensor-1",
  temperature: 23.5,
  timestamp: ISODate("2024-01-15T10:00:00Z")
}
// Millions of tiny documents!

// BUCKETING pattern (group by hour):
{
  _id: "sensor-1_2024-01-15T10",
  sensor: "sensor-1",
  hour: ISODate("2024-01-15T10:00:00Z"),
  measurements: [
    { temp: 23.5, time: ISODate("2024-01-15T10:00:00Z") },
    { temp: 23.6, time: ISODate("2024-01-15T10:05:00Z") },
    // ... 12 measurements per hour
  ],
  count: 12,
  avg_temp: 23.7
}
// Fewer, larger documents: smaller indexes and less per-document overhead
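
Writing into a bucket is usually done with a single upsert. The sketch below only builds the arguments that could be passed to `updateOne(filter, update, options)`; it is not taken from the original example. The helper name `bucketUpsert`, the `MAX_PER_BUCKET` limit, and the `sum_temp` field (kept so the average can be derived as `sum_temp / count`) are assumptions made for illustration.

```javascript
// Hypothetical helper: builds the arguments for a MongoDB updateOne() call
// that appends one measurement to the hourly bucket of a sensor.
const MAX_PER_BUCKET = 12; // assumption: one reading every 5 minutes

function bucketUpsert(sensor, temp, time) {
  // Truncate the timestamp to the start of its UTC hour.
  const hour = new Date(time);
  hour.setUTCMinutes(0, 0, 0);

  return {
    // Match this sensor's bucket for the hour, as long as it is not full.
    filter: { sensor, hour, count: { $lt: MAX_PER_BUCKET } },
    update: {
      // Append the reading and keep the running totals up to date.
      $push: { measurements: { temp, time } },
      $inc: { count: 1, sum_temp: temp } // sum_temp is an assumed field
    },
    // Create a new bucket when no matching (non-full) bucket exists.
    options: { upsert: true }
  };
}

const op = bucketUpsert("sensor-1", 23.5, new Date("2024-01-15T10:05:00Z"));
console.log(JSON.stringify(op, null, 2));
// Possible use: db.measurements.updateOne(op.filter, op.update, op.options)
// (the collection name "measurements" is hypothetical)
```

Note that this sketch lets MongoDB generate the `_id`. With a deterministic `_id` such as "sensor-1_2024-01-15T10", a second bucket for the same hour would need a different key.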

-- Pattern 3: Computed Pattern (pre-calculate)

// NAIVE (calculate on query):
db.orders.aggregate([
  { $match: { customerId: 123 } },
  { $group: {
      _id: "$customerId",
      totalSpent: { $sum: "$total" },
      orderCount: { $sum: 1 }
    }
  }
]);
// Slow when a customer has many orders

// COMPUTED pattern (store calculated values):
{
  _id: 123,
  name: "Peter Hansen",
  totalSpent: 15234.50,  // Pre-calculated!
  orderCount: 47,        // Pre-calculated!
  lastOrderDate: ISODate("2024-01-15")
}

// Update with an atomic increment when a new order is placed:
db.users.updateOne(
  { _id: 123 },
  { 
    $inc: { totalSpent: 99.99, orderCount: 1 },
    $set: { lastOrderDate: new Date() }
  }
);

-- Pattern 4: Subset Pattern (partial embedding)

// Problem: a user has 10,000 orders
// Embedding all of them makes the document too large

// SUBSET pattern (embed recent subset):
{
  _id: 123,
  name: "Peter Hansen",
  recentOrders: [  // Only last 10
    { id: 9999, total: 99.99, date: "2024-01-15" },
    { id: 9998, total: 149.50, date: "2024-01-10" },
    // ...
  ],
  allOrdersCount: 10000
}
// The full orders still live in a separate collection for other queries
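
Keeping `recentOrders` capped can be done in the same update that records a new order. The following is a minimal sketch, assuming the `$push` modifiers `$each`, `$position`, and `$slice` are used; the helper names `recentOrdersUpdate` and `previewRecent` and the `RECENT_LIMIT` constant are hypothetical.

```javascript
const RECENT_LIMIT = 10; // assumption: keep the 10 newest orders embedded

// Builds the update document for a MongoDB updateOne() call on the user.
function recentOrdersUpdate(order) {
  return {
    $push: {
      recentOrders: {
        $each: [order],       // $slice and $position require $each
        $position: 0,         // insert at the front (newest first)
        $slice: RECENT_LIMIT  // keep only the first RECENT_LIMIT elements
      }
    },
    $inc: { allOrdersCount: 1 }
  };
}

// Plain-JavaScript preview of what the array looks like after the update.
function previewRecent(recentOrders, order) {
  return [order, ...recentOrders].slice(0, RECENT_LIMIT);
}

const update = recentOrdersUpdate({ id: 10000, total: 59.0, date: "2024-01-16" });
console.log(JSON.stringify(update, null, 2));
```

The order itself would still be inserted into the separate orders collection; the embedded array is only a read-optimized copy.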

-- Pattern 5: Extended Reference Pattern

// NAIVE reference (need frequent lookups):
{
  _id: 123,
  title: "Post",
  authorId: 456  // Only the ID
}
// Every read needs a separate author lookup

// EXTENDED reference (embed frequently used fields):
{
  _id: 123,
  title: "Post",
  author: {
    id: 456,
    name: "Peter Hansen",  // Frequently needed
    avatar: "url"           // Frequently needed
    // NOT email, bio, etc (rarely needed)
  }
}
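
The cost of an extended reference shows up on writes: when a copied field changes, every copy has to be updated. The sketch below lists the two updates a rename would need, under the assumption that users live in a `users` collection and posts in a `posts` collection as in Pattern 1. The helper `authorRenameOps` is hypothetical.

```javascript
// Hypothetical helper: describes the writes needed when an author's name changes.
function authorRenameOps(authorId, newName) {
  return [
    {
      // Source of truth: the user document itself (updateOne).
      collection: "users",
      filter: { _id: authorId },
      update: { $set: { name: newName } }
    },
    {
      // Every post that carries an embedded copy of the name (updateMany).
      collection: "posts",
      filter: { "author.id": authorId },
      update: { $set: { "author.name": newName } }
    }
  ];
}

for (const op of authorRenameOps(456, "Peter H. Hansen")) {
  console.log(op.collection, JSON.stringify(op.filter), JSON.stringify(op.update));
}
```

This is why the pattern works best for fields that are read often but change rarely.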

-- Pattern 6: Polymorphic Pattern

// Different document types in the same collection, distinguished by a type field
{
  _id: 1,
  type: "image",
  url: "image.jpg",
  width: 1920,
  height: 1080
}

{
  _id: 2,
  type: "video",
  url: "video.mp4",
  duration: 120,
  codec: "h264"
}

{
  _id: 3,
  type: "document",
  url: "doc.pdf",
  pageCount: 50
}

// Query by type:
db.media.find({ type: "video" });
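
On the application side, a polymorphic collection usually means branching on the discriminator field. A minimal sketch, assuming the three `type` values shown above; the function name `describeMedia` is hypothetical.

```javascript
// Hypothetical helper: formats a media document based on its type field.
function describeMedia(doc) {
  switch (doc.type) {
    case "image":
      return `${doc.url} (${doc.width}x${doc.height})`;
    case "video":
      return `${doc.url} (duration ${doc.duration}, ${doc.codec})`;
    case "document":
      return `${doc.url} (${doc.pageCount} pages)`;
    default:
      // Fail loudly so new types are not silently ignored.
      throw new Error(`Unknown media type: ${doc.type}`);
  }
}

console.log(describeMedia({ _id: 1, type: "image", url: "image.jpg", width: 1920, height: 1080 }));
```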

-- DynamoDB Single Table Design

// Instead of separate tables, use composite keys
// PK: partition key, SK: sort key

// Users:
{ PK: "USER#123", SK: "PROFILE", name: "Peter", email: "..." }

// User's orders:
{ PK: "USER#123", SK: "ORDER#456", total: 99.99, date: "..." }
{ PK: "USER#123", SK: "ORDER#457", total: 149.50, date: "..." }

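// A single query returns the user's profile and all of their orders
// (DocumentClient-style parameters; the table name 'AppTable' is hypothetical):
query({
  TableName: 'AppTable',
  KeyConditionExpression: 'PK = :pk',
  ExpressionAttributeValues: { ':pk': 'USER#123' }
});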
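
Because every item type shares the same PK/SK attributes, the key prefixes carry the meaning. The sketch below builds keys and a query that returns only the orders of a user by adding a `begins_with` condition on the sort key. The helper names and the table name are assumptions; the parameter shape follows the DocumentClient-style query above.

```javascript
// Hypothetical key builders for the single-table layout shown above.
const userPk = (userId) => `USER#${userId}`;
const orderSk = (orderId) => `ORDER#${orderId}`;

// Builds query parameters that select only the ORDER# items of one user.
function userOrdersQuery(tableName, userId) {
  return {
    TableName: tableName,
    KeyConditionExpression: "PK = :pk AND begins_with(SK, :prefix)",
    ExpressionAttributeValues: {
      ":pk": userPk(userId),
      ":prefix": "ORDER#"
    }
  };
}

// An order item as it would be stored under the user's partition.
const item = { PK: userPk(123), SK: orderSk(456), total: 99.99 };
console.log(item.PK, item.SK);
console.log(JSON.stringify(userOrdersQuery("AppTable", 123), null, 2));
```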

-- Cassandra Denormalization

// Query-per-table approach
// Different tables for different query patterns

// Query: Get user by ID
CREATE TABLE users_by_id (
  user_id UUID PRIMARY KEY,
  name TEXT,
  email TEXT
);

// Query: Get users by email
CREATE TABLE users_by_email (
  email TEXT PRIMARY KEY,
  user_id UUID,
  name TEXT
);

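// Query: Get user's orders
// order_id is part of the primary key so that two orders with the same
// timestamp for the same user do not overwrite each other
CREATE TABLE orders_by_user (
  user_id UUID,
  order_date TIMESTAMP,
  order_id UUID,
  total DECIMAL,
  PRIMARY KEY (user_id, order_date, order_id)
) WITH CLUSTERING ORDER BY (order_date DESC, order_id ASC);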

// The same data is stored in multiple tables!
// The application must update all of them on every write
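
One way to keep the duplicate tables in step is to write them together in a batch. The sketch below only builds the list of statements for the two user tables above. The `{ query, params }` shape is what the DataStax Node.js driver's `client.batch` is understood to accept, which should be treated as an assumption to verify; the helper name `createUserBatch` is hypothetical.

```javascript
// Hypothetical helper: one INSERT per query table for a new user.
function createUserBatch(user) {
  return [
    {
      query: "INSERT INTO users_by_id (user_id, name, email) VALUES (?, ?, ?)",
      params: [user.userId, user.name, user.email]
    },
    {
      query: "INSERT INTO users_by_email (email, user_id, name) VALUES (?, ?, ?)",
      params: [user.email, user.userId, user.name]
    }
  ];
}

const statements = createUserBatch({
  userId: "5b6962dd-3f90-4c93-8f61-eabfa4a803e2", // example UUID
  name: "Peter Hansen",
  email: "peter@email.dk"
});
statements.forEach((s) => console.log(s.query, s.params));
// Possible use (assumption): await client.batch(statements, { prepare: true });
```

Changing an email address is more involved: because `email` is the partition key of `users_by_email`, the old row has to be deleted and a new one inserted.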

-- Pattern 7: Outlier Pattern

// 99% of users have < 100 orders (normal)
// 1% of users have > 10,000 orders (outliers)

// Normal users (embed orders):
{
  _id: 123,
  name: "Regular User",
  orders: [/* 50 orders embedded */]
}

// Outlier users (reference):
{
  _id: 456,
  name: "Power User",
  isOutlier: true,
  orderCount: 15000
  // Orders are stored in a separate collection
}

// Application logic handles the two cases differently:
let orders;
if (user.isOutlier) {
  // Query the separate orders collection (find() returns a cursor)
  orders = await db.orders.find({ userId: user._id }).toArray();
} else {
  // Use the embedded orders
  orders = user.orders;
}
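
The write path needs the same branching as the read path. The sketch below decides where a new order should go; the `EMBED_LIMIT` threshold, the helper name `planOrderWrite`, and the shape of the returned plan are assumptions for illustration, not part of the original example.

```javascript
const EMBED_LIMIT = 100; // assumption: embed up to 100 orders per user

// Hypothetical helper: decides where a new order is written.
function planOrderWrite(user, order) {
  const embedded = user.orders ?? [];

  if (!user.isOutlier && embedded.length < EMBED_LIMIT) {
    // Normal user: push the order into the embedded array.
    return {
      target: "users",
      filter: { _id: user._id },
      update: { $push: { orders: order }, $inc: { orderCount: 1 } }
    };
  }

  // Outlier (or a user crossing the limit): store the order separately
  // and make sure the user document carries the outlier flag.
  return {
    target: "orders",
    insert: { ...order, userId: user._id },
    userUpdate: { $set: { isOutlier: true }, $inc: { orderCount: 1 } }
  };
}

console.log(planOrderWrite({ _id: 123, orders: [] }, { id: 1, total: 10 }).target);
console.log(planOrderWrite({ _id: 456, isOutlier: true }, { id: 2, total: 20 }).target);
```

When a user first crosses the limit, the orders already embedded would also have to be moved to the separate collection; that migration is left out of this sketch.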

-- Redis patterns

// Hash for objects
HSET user:123 name "Peter" email "peter@email.dk" age 30
HGETALL user:123

// Set for relationships
SADD user:123:followers user:456
SADD user:123:followers user:789
SMEMBERS user:123:followers

// Sorted set for leaderboards
ZADD leaderboard 1500 "player1"
ZADD leaderboard 2000 "player2"
// Top 10, highest score first
ZREVRANGE leaderboard 0 9

Advantages

  • Optimized for query patterns
  • Eliminates JOINs
  • Predictable performance
  • Scalable design
  • Fast reads

Challenges

  • Data duplication
  • Write complexity
  • Consistency maintenance
  • Learning curve
  • Schema must be redesigned when query patterns change

Use cases

  • Document databases (MongoDB, CouchDB)
  • Key-value stores (DynamoDB, Redis)
  • Wide-column stores (Cassandra, HBase)
  • Time-series data
  • High-scale applications

Real-world examples

  • Social media feeds (embed user info)
  • E-commerce catalogs (denormalized product data)
  • IoT time-series (bucketing pattern)
  • Gaming leaderboards (Redis sorted sets)
  • User profiles (computed aggregates)