← Tilbage til koncepter

Query Optimization

Performance

Teknikker til at forbedre database query performance gennem indexes, query rewriting, og forståelse af execution plans.

Beskrivelse

Query optimization er processen med at forbedre database query performance. Selv små forbedringer kan have enorm effekt når queries kører millioner af gange dagligt. Optimization involverer multiple aspekter: Creating proper indexes (single-column, composite, covering), understanding og analyzing query execution plans (EXPLAIN), rewriting queries for efficiency (avoid SELECT *, use appropriate JOINs), leveraging database features (materialized views, partitions), og database tuning (shared buffers, work_mem). Query optimizer i databasen vælger execution plan, men den er ikke altid optimal - derfor skal udviklere forstå hvordan deres queries executeres. Common anti-patterns inkluderer N+1 queries, SELECT * når kun få kolonner bruges, missing indexes på WHERE/JOIN kolonner, og subqueries der kunne være JOINs. Profiling tools som EXPLAIN ANALYZE, query logs, og APM tools hjælper med at identificere slow queries. Optimization er iterativ - measure, optimize, measure again.

Problem

Slow queries påvirker user experience, øger server load, og kan forårsage timeouts eller crashes ved høj trafik. Uden optimization kan selv simple queries tage sekunder på store datasets. Full table scans på millioner af rows er katastrofalt for performance.

Løsning

Query optimization kombinerer proper indexing, query rewriting, og database tuning. EXPLAIN ANALYZE viser hvor tid bruges. Indexes reducerer data scanning fra O(n) til O(log n). Denormalization kan eliminere JOINs. Caching reducerer database hits. Hver optimization skal måles for at verificere effekt.

Eksempel

-- SLOW query (anti-pattern)
SELECT * FROM orders 
WHERE YEAR(order_date) = 2024;
-- Problem: Function på column forhindrer index usage
-- Scanner ALLE rows

-- OPTIMIZED query
SELECT order_id, customer_id, total FROM orders
WHERE order_date >= '2024-01-01' 
  AND order_date < '2025-01-01';
-- Bruger index på order_date
-- Kun nødvendige kolonner

-- Create index for WHERE clause
CREATE INDEX idx_orders_date ON orders(order_date);

-- EXPLAIN ANALYZE for at se execution plan
EXPLAIN ANALYZE
SELECT * FROM orders 
WHERE order_date >= '2024-01-01';

-- Output:
-- Index Scan using idx_orders_date on orders (cost=0.43..854.22 rows=1000)
-- Planning Time: 0.123 ms
-- Execution Time: 2.456 ms

-- N+1 Query Problem (BAD!)
-- Query 1: Get all users
SELECT * FROM users;
-- Then for each user (N queries):
SELECT * FROM posts WHERE user_id = ?;
-- Total: 1 + N queries!

-- OPTIMIZED with JOIN (GOOD!)
SELECT u.*, p.* 
FROM users u
LEFT JOIN posts p ON u.id = p.user_id;
-- Single query!

-- Composite Index
CREATE INDEX idx_orders_customer_date 
ON orders(customer_id, order_date);

-- Bruger composite index efficient
SELECT * FROM orders
WHERE customer_id = 123 
  AND order_date >= '2024-01-01';
-- Begge conditions i index!

-- Covering Index (includes all needed columns)
CREATE INDEX idx_orders_covering 
ON orders(customer_id, order_date) 
INCLUDE (total, status);

SELECT customer_id, order_date, total, status
FROM orders
WHERE customer_id = 123;
-- Index-only scan - ingen table lookup!

-- Avoid SELECT *
-- SLOW:
SELECT * FROM users 
JOIN profiles USING (user_id);
-- Fetches alle kolonner (waste of bandwidth)

-- FAST:
SELECT u.id, u.name, p.bio 
FROM users u
JOIN profiles p USING (user_id);
-- Kun nødvendige kolonner

-- Subquery vs JOIN
-- SLOW (correlated subquery):
SELECT * FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o 
  WHERE o.user_id = u.id AND o.total > 1000
);

-- FASTER (JOIN):
SELECT DISTINCT u.*
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE o.total > 1000;

-- LIMIT for pagination
-- SLOW (large offset):
SELECT * FROM posts 
ORDER BY created_at DESC 
LIMIT 50 OFFSET 10000;
-- Scans 10050 rows, returns 50

-- FASTER (cursor-based pagination):
SELECT * FROM posts
WHERE created_at < '2024-01-15 10:00:00'
ORDER BY created_at DESC
LIMIT 50;

-- Avoid OR in WHERE (often can't use index)
-- SLOW:
SELECT * FROM users 
WHERE first_name = 'Peter' OR last_name = 'Hansen';

-- FASTER:
SELECT * FROM users WHERE first_name = 'Peter'
UNION
SELECT * FROM users WHERE last_name = 'Hansen';
-- Each can use index

-- COUNT optimization
-- SLOW:
SELECT COUNT(*) FROM orders;
-- Full table scan!

-- FASTER (if approximate count OK):
SELECT reltuples::bigint FROM pg_class 
WHERE relname = 'orders';
-- Uses statistics, instant

-- Window functions vs self-join
-- SLOW (self-join):
SELECT 
  o1.order_id,
  o1.total,
  AVG(o2.total) as running_avg
FROM orders o1
JOIN orders o2 ON o2.order_date <= o1.order_date
GROUP BY o1.order_id, o1.total;

-- FASTER (window function):
SELECT 
  order_id,
  total,
  AVG(total) OVER (
    ORDER BY order_date 
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) as running_avg
FROM orders;

-- Batch operations
-- SLOW (one-by-one inserts):
for (const user of users) {
  await db.query(
    'INSERT INTO users (name, email) VALUES (?, ?)',
    [user.name, user.email]
  );
}
-- 1000 users = 1000 queries!

-- FAST (bulk insert):
const values = users.map(u => [u.name, u.email]);
await db.query(
  'INSERT INTO users (name, email) VALUES ?',
  [values]
);
-- 1000 users = 1 query!

-- Denormalization for read-heavy workloads
-- Instead of JOIN:
SELECT p.*, u.name as author_name
FROM posts p
JOIN users u ON p.user_id = u.id;

-- Store author_name in posts table:
ALTER TABLE posts ADD COLUMN author_name VARCHAR(100);
UPDATE posts p SET author_name = (
  SELECT name FROM users WHERE id = p.user_id
);

-- Now simple query:
SELECT * FROM posts;
-- No JOIN needed!

-- Query hints (database-specific)
-- PostgreSQL:
SELECT /*+ SeqScan(orders) */ * FROM orders;

-- MySQL:
SELECT * FROM orders USE INDEX (idx_orders_date)
WHERE order_date > '2024-01-01';

-- Monitoring slow queries
-- PostgreSQL pg_stat_statements
SELECT 
  query,
  calls,
  total_exec_time,
  mean_exec_time,
  max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- MySQL slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- Log queries > 1s

Fordele

  • Drastisk hurtigere queries
  • Reduceret database load
  • Bedre user experience
  • Lavere infrastructure costs
  • Højere throughput

Udfordringer

  • Kræver forståelse af execution plans
  • Over-indexing kan skade write performance
  • Optimization kan blive outdated når data vokser
  • Trade-offs mellem read og write performance
  • Requires ongoing monitoring

Anvendelsesområder

  • Slow dashboards og reports
  • High-traffic web applications
  • Complex analytical queries
  • Real-time data processing
  • Legacy systems med performance issues

Eksempler fra den virkelige verden

  • E-commerce product searches
  • Social media feeds
  • Analytics dashboards
  • Financial reports
  • Search functionality