title: "Database Optimization Guide" description: "Comprehensive guide for optimizing PostgreSQL database performance in Navius applications, including indexing, query optimization, and schema design" category: "Guides" tags: ["database", "postgresql", "optimization", "performance", "indexing", "queries", "schema"] last_updated: "April 5, 2025" version: "1.0"
# Database Optimization Guide

## Overview

This guide provides comprehensive strategies for optimizing PostgreSQL database performance in Navius applications. Database performance is critical for application responsiveness and scalability, as database operations often represent the most significant bottleneck in web applications.
## Database Design Principles

### Schema Design

- **Normalize with purpose** - Follow normalization principles, but prioritize query performance
- **Choose appropriate data types** - Use the most efficient data type for each column
- **Limit column width** - Use `varchar` with appropriate length limits instead of unlimited `text` fields
- **Consider table partitioning** - For very large tables (millions of rows); a sketch follows this list
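As a sketch of these principles together, here is a range-partitioned events table with deliberate data types. The table and column names are illustrative, not part of any Navius schema:

```sql
-- Illustrative: explicit types, bounded varchar, and monthly range partitioning
CREATE TABLE events (
    id          bigserial,
    user_id     bigint NOT NULL,
    kind        varchar(32) NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now(),
    -- the primary key of a partitioned table must include the partition key
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2025_04 PARTITION OF events
    FOR VALUES FROM ('2025-04-01') TO ('2025-05-01');
```

Partitioning keeps indexes small and lets maintenance (and `DROP PARTITION` for retention) operate on one month at a time.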
### Index Design

- **Primary keys** - Always define explicit primary keys
- **Foreign keys** - Index all foreign key columns
- **Compound indexes** - Create them for commonly queried column combinations
- **Covering indexes** - Include additional columns so the index alone can satisfy a query
- **Partial indexes** - Use for filtered queries on large tables
```sql
-- Example: Compound index for a commonly used query pattern
CREATE INDEX idx_users_email_status ON users (email, status);

-- Example: Covering index to avoid table lookups
CREATE INDEX idx_posts_author_created_title ON posts (author_id, created_at) INCLUDE (title);

-- Example: Partial index for active users
CREATE INDEX idx_active_users ON users (email, last_login) WHERE status = 'active';
```
## Query Optimization

### Query Analysis

Use `EXPLAIN ANALYZE` to understand query execution plans:

```sql
EXPLAIN ANALYZE SELECT * FROM users
WHERE email LIKE 'user%' AND status = 'active'
ORDER BY created_at DESC LIMIT 10;
```
Look for these issues in query plans:
- Sequential scans on large tables
- High-cost operations
- Unused indexes
- Poor join performance
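For example, a sequential scan shows up in `EXPLAIN ANALYZE` output as a `Seq Scan` node (the cost, row, and timing numbers below are illustrative):

```text
Seq Scan on users  (cost=0.00..2854.00 rows=412 width=180) (actual time=0.041..35.012 rows=398 loops=1)
  Filter: ((email ~~ 'user%'::text) AND (status = 'active'::text))
  Rows Removed by Filter: 99602
```

After an appropriate index is added, the same query should plan as an `Index Scan` or `Bitmap Index Scan` node instead, with far fewer rows removed by the filter.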
### Common Optimizations

**1. Avoid `SELECT *`**

```sql
-- Instead of this
SELECT * FROM users WHERE id = 1;

-- Do this
SELECT id, email, name, created_at FROM users WHERE id = 1;
```
**2. Use Parameterized Queries**

```rust
// Instead of string interpolation, bind values as parameters
let users = sqlx::query_as::<_, User>(
    "SELECT id, name FROM users WHERE email = $1 AND status = $2",
)
.bind(email)
.bind(status)
.fetch_all(&pool)
.await?;
```
**3. Batch Operations**

```rust
// Instead of multiple single inserts, wrap the batch in one transaction
let mut transaction = pool.begin().await?;

for user in users {
    sqlx::query("INSERT INTO user_logs (user_id, action, timestamp) VALUES ($1, $2, $3)")
        .bind(user.id)
        .bind("login")
        .bind(Utc::now())
        .execute(&mut transaction)
        .await?;
}

transaction.commit().await?;
```
**4. Use Appropriate WHERE Clauses**

```sql
-- Instead of functions on indexed columns
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

-- Do this
SELECT * FROM users WHERE email = 'user@example.com';
```
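If you genuinely need case-insensitive matching, an expression index lets PostgreSQL index the function result itself, so the `LOWER(email)` form stays fast. A minimal sketch (the index name is illustrative):

```sql
-- Index the expression so LOWER(email) lookups can use it
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

-- This query can now use idx_users_email_lower
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
```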
## Connection Management

### Connection Pooling

Configure your connection pool appropriately:

```rust
// Optimal connection pool configuration for Navius applications
let pool = PgPoolOptions::new()
    .max_connections((num_cpus::get() * 4) as u32) // 4x CPU cores as a starting point
    .min_connections(num_cpus::get() as u32) // Maintain at least one connection per CPU
    .max_lifetime(std::time::Duration::from_secs(30 * 60)) // 30 minutes
    .idle_timeout(std::time::Duration::from_secs(5 * 60)) // 5 minutes
    .connect(&database_url)
    .await?;
```
Guidelines for sizing:

- Measure maximum concurrent database operations during peak load
- Consider PostgreSQL's `max_connections` setting (usually 100-300)
- Monitor connection usage over time
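As a quick check, current usage can be compared against the server's ceiling using PostgreSQL's built-in statistics views:

```sql
-- Current connections grouped by state (active, idle, idle in transaction, ...)
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;

-- The configured ceiling
SHOW max_connections;
```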
### Transaction Management

- Keep transactions as short as possible
- Don't perform I/O or network operations within transactions
- Use appropriate isolation levels
- Consider using read-only transactions for queries

```rust
// Example: Read-only transaction
// (plain sqlx has no begin_read_only(); mark the transaction read-only explicitly)
let mut tx = pool.begin().await?;
sqlx::query("SET TRANSACTION READ ONLY")
    .execute(&mut tx)
    .await?;

let users = sqlx::query_as::<_, User>("SELECT id, name FROM users WHERE status = $1")
    .bind("active")
    .fetch_all(&mut tx)
    .await?;

tx.commit().await?;
```
## Advanced Optimization Techniques

### PostgreSQL Configuration

Key PostgreSQL settings to tune:

```
# Memory settings
shared_buffers = 25% of system RAM (up to 8GB)
work_mem = 32-64MB
maintenance_work_mem = 256MB

# Checkpoint settings
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9

# Planner settings
random_page_cost = 1.1 (for SSD storage)
effective_cache_size = 75% of system RAM
```
### Materialized Views

For expensive reports or analytics queries:

```sql
CREATE MATERIALIZED VIEW user_stats AS
SELECT
    date_trunc('day', created_at) AS day,
    count(*) AS new_users,
    avg(extract(epoch FROM now() - last_login)) / 86400 AS avg_days_since_login -- seconds to days
FROM users
GROUP BY date_trunc('day', created_at);

-- Refresh the view:
REFRESH MATERIALIZED VIEW user_stats;
```
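A plain `REFRESH` locks the view against reads while it rebuilds. If readers must not block, PostgreSQL supports a concurrent refresh, which requires a unique index on the view (the index name here is illustrative):

```sql
-- Required for REFRESH ... CONCURRENTLY: a unique index on the materialized view
CREATE UNIQUE INDEX idx_user_stats_day ON user_stats (day);

-- Rebuild without blocking concurrent reads
REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats;
```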
### Database Monitoring

Monitor these metrics:

- Query execution times
- Index usage statistics
- Cache hit ratios
- Lock contention
- Deadlocks

Tools to use:

- The `pg_stat_statements` extension for per-query execution statistics
- Built-in statistics views such as `pg_stat_activity`, `pg_stat_user_indexes`, and `pg_locks`
- `EXPLAIN ANALYZE` for individual query plans
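For example, the overall buffer cache hit ratio can be computed directly from the built-in statistics views; a value well below ~0.99 on a hot dataset usually means `shared_buffers` or the working set deserves attention:

```sql
-- Overall buffer cache hit ratio across all databases
SELECT sum(blks_hit)::float / nullif(sum(blks_hit) + sum(blks_read), 0) AS cache_hit_ratio
FROM pg_stat_database;
```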
## Implementing Repository Pattern in Navius

The Repository pattern helps maintain clean database access and makes queries easier to optimize:

```rust
// User repository implementation
pub struct UserRepository {
    pool: PgPool,
}

impl UserRepository {
    pub fn new(pool: PgPool) -> Self {
        Self { pool }
    }

    pub async fn find_by_email(&self, email: &str) -> Result<Option<User>, sqlx::Error> {
        sqlx::query_as::<_, User>(
            "SELECT id, name, email, status FROM users WHERE email = $1 LIMIT 1",
        )
        .bind(email)
        .fetch_optional(&self.pool)
        .await
    }

    pub async fn find_active_users(&self, limit: i64, offset: i64) -> Result<Vec<User>, sqlx::Error> {
        sqlx::query_as::<_, User>(
            "SELECT id, name, email, status FROM users \
             WHERE status = 'active' ORDER BY last_login DESC LIMIT $1 OFFSET $2",
        )
        .bind(limit)
        .bind(offset)
        .fetch_all(&self.pool)
        .await
    }

    // Additional methods...
}
```
## Performance Testing Database Queries

### Benchmarking Strategies

- **Isolated Query Testing** - Test queries independently from the application
- **Mock Production Data** - Use production-sized datasets
- **Concurrent Load Testing** - Test under simultaneous connections
- **EXPLAIN ANALYZE** - Measure execution plan costs
- **Cache Warmup/Cooldown** - Test with both hot and cold cache scenarios

### Testing with Criterion

```rust
use criterion::Criterion;
use tokio::runtime::Runtime;

fn benchmark_user_query(c: &mut Criterion) {
    let rt = Runtime::new().unwrap();
    // establish_connection() is the application's own connection helper
    let pool = rt.block_on(establish_connection());
    let repo = UserRepository::new(pool.clone());

    c.bench_function("find_active_users", |b| {
        b.iter(|| rt.block_on(repo.find_active_users(100, 0)))
    });
}
```
## Common Database Performance Issues

### N+1 Query Problem

```rust
// BAD: N+1 query problem - one query for the users, then one more per user
let users = repo.find_active_users(100, 0).await?;
for user in &users {
    let posts = repo.find_posts_by_user_id(user.id).await?;
    // Process posts...
}

// GOOD: Single query with join
let users_with_posts = repo.find_active_users_with_posts(100, 0).await?;
```
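Under the hood, a method like `find_active_users_with_posts` (a hypothetical repository method, as named in the example above) would issue one join query instead of N+1 round trips. A sketch of the SQL it might run, limiting users in a subquery so the page size applies to users rather than joined rows:

```sql
-- One round trip: the first 100 active users and all of their posts
SELECT u.id, u.name, p.id AS post_id, p.title
FROM (
    SELECT id, name, last_login
    FROM users
    WHERE status = 'active'
    ORDER BY last_login DESC
    LIMIT 100
) u
LEFT JOIN posts p ON p.author_id = u.id
ORDER BY u.last_login DESC;
```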
### Missing Indexes

Signs of missing indexes:

- Sequential scans on large tables
- Slow filtering operations
- Slow ORDER BY or GROUP BY clauses
### Oversized Queries

- Fetching unnecessary columns
- Not using LIMIT with large result sets
- Not using pagination (see the keyset sketch below)
- Using subqueries when joins would be more efficient
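For deep pagination, `OFFSET` forces PostgreSQL to scan and discard every skipped row, so later pages get progressively slower. Keyset (cursor) pagination keeps each page cheap. A sketch, assuming `last_login` is NOT NULL and an index covers `(status, last_login, id)`-style ordering:

```sql
-- OFFSET pagination: cost grows with the offset
SELECT id, name FROM users
WHERE status = 'active'
ORDER BY last_login DESC, id DESC
LIMIT 20 OFFSET 10000;

-- Keyset pagination: pass the previous page's last row as the cursor
SELECT id, name FROM users
WHERE status = 'active'
  AND (last_login, id) < ($1, $2)  -- values from the previous page's last row
ORDER BY last_login DESC, id DESC
LIMIT 20;
```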
## Database Migration Strategies

When migrating or updating schemas:

1. **Create indexes concurrently** to avoid blocking writes:

   ```sql
   CREATE INDEX CONCURRENTLY idx_users_status ON users (status);
   ```

2. **Perform updates in batches**:

   ```rust
   // Update in batches of 1000
   for batch in user_ids.chunks(1000) {
       sqlx::query("UPDATE users SET status = $1 WHERE id = ANY($2)")
           .bind("inactive")
           .bind(batch)
           .execute(&pool)
           .await?;
   }
   ```

3. **Use temporary tables for complex migrations**:

   ```sql
   CREATE TEMPORARY TABLE temp_users AS SELECT * FROM users WHERE false;
   INSERT INTO temp_users SELECT * FROM users WHERE <condition>;
   -- Perform operations on temp_users
   -- Finally update or insert back to users
   ```
## Case Study: Optimizing a High-Traffic User Service

### Initial State

- Average query time: 150ms
- Database CPU: 85% utilization
- Cache hit ratio: 45%
- Frequent timeouts during peak traffic

### Optimization Steps

1. Added compound indexes on commonly queried fields
2. Implemented result caching for frequent queries
3. Optimized the schema by removing unused columns
4. Implemented connection pooling with optimal settings
5. Added database replicas for read operations

### Results

- Average query time: 15ms (10x improvement)
- Database CPU: 40% utilization
- Cache hit ratio: 80%
- Zero timeouts during peak traffic