← Tilbage til koncepter

Materialized Views

Performance

Pre-computed query results gemt som fysiske tabeller for hurtig adgang til komplekse aggregations og JOINs.

Beskrivelse

Materialized Views er query results der er gemt fysisk på disk, i modsætning til normale views der er virtuelle. En normal view er bare en stored query der executes hver gang den kaldes, mens en materialized view faktisk gemmer resultatet. Dette gør dem perfekte til dyre queries med komplekse JOINs, aggregations, eller calculations der bruges ofte men ændres sjældent. Trade-off er at data kan blive stale - materialized views skal refreshes periodically for at reflektere ændringer i underliggende tabeller. Refresh kan være COMPLETE (rebuild hele viewet) eller INCREMENTAL (kun ændringer siden sidste refresh). PostgreSQL understøtter materialized views nativt. Oracle har avancerede features som query rewrite hvor optimizer automatisk bruger materialized views. I systems uden native support kan samme effekt opnås med denormalized tables og triggers. Materialized views er særligt værdifulde i data warehouses og analytics hvor samme komplekse queries køres ofte (dashboards, reports).

Problem

Komplekse analytical queries med mange JOINs og aggregations kan tage minutter at køre. Hvis samme query køres ofte (fx dashboard der refreshes hver 5 min), spildes enorme ressourcer på at re-compute samme resultater. Users oplever slow dashboards.

Løsning

Materialized views pre-computer og gemmer query results. Første query tager tid, men efterfølgende queries er instant fordi de læser fra stored results. Views refreshes periodically (fx hver time) for at holde data reasonably fresh. Dette giver 1000x speedup for complex queries.

Eksempel

-- PostgreSQL Materialized View
-- Complex query der tager lang tid:
SELECT 
  c.category_name,
  COUNT(DISTINCT o.order_id) as order_count,
  COUNT(DISTINCT o.customer_id) as customer_count,
  SUM(oi.quantity * oi.price) as total_revenue,
  AVG(oi.quantity * oi.price) as avg_order_value
FROM categories c
JOIN products p ON c.category_id = p.category_id
JOIN order_items oi ON p.product_id = oi.product_id
JOIN orders o ON oi.order_id = o.order_id
WHERE o.order_date >= '2024-01-01'
GROUP BY c.category_id, c.category_name
ORDER BY total_revenue DESC;
-- Takes 30 seconds med millions of rows

-- Opret materialized view
CREATE MATERIALIZED VIEW sales_by_category AS
SELECT 
  c.category_name,
  COUNT(DISTINCT o.order_id) as order_count,
  COUNT(DISTINCT o.customer_id) as customer_count,
  SUM(oi.quantity * oi.price) as total_revenue,
  AVG(oi.quantity * oi.price) as avg_order_value,
  CURRENT_TIMESTAMP as last_updated
FROM categories c
JOIN products p ON c.category_id = p.category_id
JOIN order_items oi ON p.product_id = oi.product_id
JOIN orders o ON oi.order_id = o.order_id
WHERE o.order_date >= '2024-01-01'
GROUP BY c.category_id, c.category_name;

-- Create index på materialized view
CREATE INDEX idx_sales_category 
ON sales_by_category(category_name);

-- Query materialized view (instant!)
SELECT * FROM sales_by_category
ORDER BY total_revenue DESC;
-- Takes milliseconds!

-- Refresh materialized view
REFRESH MATERIALIZED VIEW sales_by_category;
-- Rebuilds view med fresh data

-- Concurrent refresh (no locks)
REFRESH MATERIALIZED VIEW CONCURRENTLY sales_by_category;
-- Requires UNIQUE index

-- Automatic refresh med cron/scheduler
-- Kør hver time:
REFRESH MATERIALIZED VIEW sales_by_category;

-- Scheduled refresh (PostgreSQL pg_cron)
SELECT cron.schedule(
  'refresh-sales',
  '0 * * * *',  -- Every hour
  $$REFRESH MATERIALIZED VIEW sales_by_category$$
);

-- Conditional refresh (kun hvis data ændret)
DO $$
BEGIN
  IF EXISTS (
    SELECT 1 FROM orders 
    WHERE updated_at > (
      SELECT last_updated FROM sales_by_category LIMIT 1
    )
  ) THEN
    REFRESH MATERIALIZED VIEW sales_by_category;
  END IF;
END $$;

-- MySQL doesn't have native materialized views
-- Workaround: Create table + refresh logic

-- Create summary table
CREATE TABLE sales_by_category_mv (
  category_name VARCHAR(100),
  order_count INT,
  customer_count INT,
  total_revenue DECIMAL(12,2),
  avg_order_value DECIMAL(10,2),
  last_updated TIMESTAMP,
  PRIMARY KEY (category_name)
);

-- Procedure to refresh
DELIMITER $$
CREATE PROCEDURE refresh_sales_by_category()
BEGIN
  TRUNCATE TABLE sales_by_category_mv;
  
  INSERT INTO sales_by_category_mv
  SELECT 
    c.category_name,
    COUNT(DISTINCT o.order_id),
    COUNT(DISTINCT o.customer_id),
    SUM(oi.quantity * oi.price),
    AVG(oi.quantity * oi.price),
    NOW()
  FROM categories c
  JOIN products p ON c.category_id = p.category_id
  JOIN order_items oi ON p.product_id = oi.product_id
  JOIN orders o ON oi.order_id = o.order_id
  WHERE o.order_date >= '2024-01-01'
  GROUP BY c.category_id, c.category_name;
END$$
DELIMITER ;

-- Call procedure to refresh
CALL refresh_sales_by_category();

-- Schedule med MySQL Event Scheduler
CREATE EVENT refresh_sales_hourly
ON SCHEDULE EVERY 1 HOUR
DO CALL refresh_sales_by_category();

-- Incremental refresh pattern
-- Track changes i separate table
CREATE TABLE sales_changes (
  order_id INT,
  changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Trigger to track changes
CREATE TRIGGER track_order_changes
AFTER INSERT OR UPDATE ON orders
FOR EACH ROW
INSERT INTO sales_changes (order_id) VALUES (NEW.order_id);

-- Incremental refresh (only changed data)
UPDATE sales_by_category_mv mv
SET 
  order_count = (
    SELECT COUNT(*) FROM orders o
    JOIN order_items oi ON o.order_id = oi.order_id
    WHERE oi.product_id IN (
      SELECT product_id FROM products 
      WHERE category_name = mv.category_name
    )
  ),
  last_updated = NOW()
WHERE category_name IN (
  SELECT DISTINCT p.category_name
  FROM sales_changes sc
  JOIN orders o ON sc.order_id = o.order_id
  JOIN order_items oi ON o.order_id = oi.order_id
  JOIN products p ON oi.product_id = p.product_id
);

-- Node.js scheduled refresh
const cron = require('node-cron');

// Refresh every hour
cron.schedule('0 * * * *', async () => {
  console.log('Refreshing materialized view...');
  
  await db.query(
    'REFRESH MATERIALIZED VIEW CONCURRENTLY sales_by_category'
  );
  
  console.log('Refresh complete');
});

-- Monitoring view freshness
SELECT 
  schemaname,
  matviewname,
  last_updated,
  NOW() - last_updated as age
FROM sales_by_category;

-- Query rewrite (Oracle feature)
-- Oracle can automatically use materialized view
-- even if query references base tables!

CREATE MATERIALIZED VIEW sales_mv
ENABLE QUERY REWRITE AS
SELECT category_id, SUM(revenue) FROM sales
GROUP BY category_id;

-- Query base table:
SELECT category_id, SUM(revenue) FROM sales
GROUP BY category_id;
-- Oracle automatically uses sales_mv!

-- Real-world dashboard example
-- Daily active users materialized view
CREATE MATERIALIZED VIEW daily_active_users AS
SELECT 
  DATE(login_time) as date,
  COUNT(DISTINCT user_id) as dau,
  COUNT(*) as total_logins
FROM user_sessions
WHERE login_time >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY DATE(login_time);

-- Refresh daily at 1 AM
SELECT cron.schedule(
  'refresh-dau',
  '0 1 * * *',
  $$REFRESH MATERIALIZED VIEW daily_active_users$$
);

Fordele

  • Drastisk hurtigere complex queries
  • Reduceret database load
  • Consistent query performance
  • Kan indexes på view results
  • Perfekt til dashboards og reports

Udfordringer

  • Data kan være stale
  • Refresh kan tage lang tid
  • Storage overhead
  • Concurrent refresh complexity
  • Maintenance af refresh schedule

Anvendelsesområder

  • Analytics dashboards
  • Report generation
  • Data warehousing
  • Complex aggregations
  • Read-heavy workloads

Eksempler fra den virkelige verden

  • Business intelligence dashboards
  • E-commerce sales reports
  • Social media analytics
  • Financial summaries
  • Customer 360 views