Database Design Patterns for Modern Applications
Database design decisions made early in a project's life have an outsized impact on its future. A well-designed schema enables features; a poorly designed one blocks them. After working with databases at various scales—from MVPs to systems handling millions of requests—here are the patterns that consistently prove their worth.
1. Choose the Right Database Type First
The SQL vs. NoSQL debate misses the point. The question is: what are your access patterns?
Choose Relational (PostgreSQL, MySQL) When:
- Data has complex relationships that benefit from joins
- You need ACID transactions across multiple tables
- Query patterns are varied and unpredictable
- Data integrity constraints are critical
- You need powerful ad-hoc reporting
Choose Document (MongoDB, DynamoDB) When:
- Data is naturally hierarchical or nested
- Access patterns are known and limited
- Schema needs to evolve frequently
- Horizontal scaling is a primary requirement
- You're denormalizing anyway for performance
Choose Key-Value (Redis, DynamoDB) When:
- Simple lookups by key dominate
- Ultra-low latency is required
- Data is ephemeral (sessions, caches)
- High write throughput is critical
The pragmatic choice: PostgreSQL handles 90% of use cases well. Start there unless you have specific requirements that demand something else. It's relational when you need it, supports JSON when you don't, and scales further than most applications will ever need.
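To make the first set of bullets concrete, here is the kind of ad-hoc reporting query a relational schema handles naturally — a sketch that assumes the users and orders tables defined in the next section:
-- Revenue per user over the last 30 days, joining across tables ad hoc
SELECT u.email, COUNT(o.id) AS order_count, SUM(o.total_cents) AS revenue_cents
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.created_at > NOW() - INTERVAL '30 days'
GROUP BY u.email
ORDER BY revenue_cents DESC;
Nothing about this query had to be planned when the schema was designed, which is exactly the flexibility the relational column of the decision list is pointing at.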
2. Normalize Until It Hurts, Then Denormalize Strategically
Start with normalized schemas. Third normal form (3NF) prevents data anomalies and keeps your schema flexible. Denormalization is an optimization—don't apply it prematurely.
-- Normalized (correct starting point)
CREATE TABLE users (
id UUID PRIMARY KEY,
email VARCHAR(255) UNIQUE,
name VARCHAR(255)
);
CREATE TABLE orders (
id UUID PRIMARY KEY,
user_id UUID REFERENCES users(id),
created_at TIMESTAMP,
total_cents INTEGER
);
CREATE TABLE order_items (
id UUID PRIMARY KEY,
order_id UUID REFERENCES orders(id),
product_id UUID REFERENCES products(id),
quantity INTEGER,
price_cents INTEGER
);
When to denormalize: Only when you have measured performance problems. Common cases:
- Read-heavy dashboards that join many tables
- Frequently accessed aggregations (order totals, user statistics)
- Search indexes that need flattened data
-- Denormalized for read performance (add when needed)
ALTER TABLE orders ADD COLUMN user_email VARCHAR(255);
ALTER TABLE orders ADD COLUMN item_count INTEGER;
-- Keep denormalized fields updated via triggers or application logic
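-- For example, one way to keep item_count in sync is a trigger on order_items.
-- This is only a sketch; batch recalculation or application-level updates are equally valid.
CREATE OR REPLACE FUNCTION sync_order_item_count()
RETURNS TRIGGER AS $$
DECLARE
  target_order UUID;
BEGIN
  IF TG_OP = 'DELETE' THEN
    target_order := OLD.order_id;
  ELSE
    target_order := NEW.order_id;
  END IF;
  UPDATE orders
  SET item_count = (SELECT COUNT(*) FROM order_items WHERE order_id = target_order)
  WHERE id = target_order;
  RETURN NULL; -- AFTER row trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;
-- Note: an UPDATE that moves an item to a different order would need both orders refreshed
CREATE TRIGGER order_items_sync_item_count
AFTER INSERT OR UPDATE OR DELETE ON order_items
FOR EACH ROW EXECUTE FUNCTION sync_order_item_count();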
-- Document WHY the denormalization exists
3. Primary Keys: UUIDs vs. Sequential IDs
This debate is more nuanced than it appears:
Sequential IDs (SERIAL, AUTO_INCREMENT)
- Pros: Smaller storage, better index performance, naturally ordered
- Cons: Expose record counts, require database for ID generation, problematic for distributed systems
UUIDs
- Pros: Generate anywhere, no central coordination, hide business information
- Cons: Larger storage (16 bytes vs 4-8), random distribution hurts index performance
ULIDs / UUID v7
- Pros: Sortable by creation time, generate anywhere, good index locality
- Cons: Slightly more complex to generate
Recommendation: Use UUIDv7 or ULID for new projects. They provide the benefits of UUIDs (no coordination needed, hide information) while maintaining sortability that helps index performance. PostgreSQL 18 ships a native uuidv7() function; on earlier versions, generate UUIDv7s in the application or through an extension.
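If your PostgreSQL version does include the native uuidv7() function (added in PostgreSQL 18 — verify for your setup), sortable primary keys need nothing beyond a default; the events table here is purely illustrative. On earlier versions, the gen_random_uuid() block below still works, with v7 generation left to the application:
-- Sortable, time-ordered primary keys with the native uuidv7() function
CREATE TABLE events (
  id UUID PRIMARY KEY DEFAULT uuidv7(),
  created_at TIMESTAMP NOT NULL DEFAULT NOW()
);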
-- PostgreSQL with UUID
CREATE EXTENSION IF NOT EXISTS "pgcrypto";
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-- other fields
);
-- For UUID v7 (sortable), use application-generated IDs
-- or a PostgreSQL extension that supports it
4. Timestamps: More Than Created/Updated
Every table should have audit timestamps, but think beyond the basics:
CREATE TABLE orders (
id UUID PRIMARY KEY,
-- Basic audit timestamps
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
-- Soft delete
deleted_at TIMESTAMP,
-- Business state timestamps
submitted_at TIMESTAMP,
confirmed_at TIMESTAMP,
shipped_at TIMESTAMP,
delivered_at TIMESTAMP,
-- Who made changes
created_by UUID REFERENCES users(id),
updated_by UUID REFERENCES users(id)
);
-- Auto-update updated_at
CREATE OR REPLACE FUNCTION update_timestamp()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER orders_updated_at
BEFORE UPDATE ON orders
FOR EACH ROW
EXECUTE FUNCTION update_timestamp();
Always use UTC. Store timestamps without timezone (TIMESTAMP, not TIMESTAMPTZ) in UTC. Convert to local time in the application layer, where you have user context.
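If a query does need to show local time (a report, an export), the conversion can still happen at read time. A minimal sketch, assuming a user in US Eastern time:
-- Interpret the stored value as UTC, then shift it to the user's zone at read time
SELECT id,
       (created_at AT TIME ZONE 'UTC') AT TIME ZONE 'America/New_York' AS created_local
FROM orders;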
5. Soft Deletes vs. Hard Deletes
The choice matters more than people realize:
Use Soft Deletes When:
- Data might need to be restored
- Audit trails are required
- Foreign key relationships would break
- You need to answer "what did this look like before?"
Use Hard Deletes When:
- Data retention is a liability (GDPR, privacy laws)
- Storage costs matter at scale
- The data genuinely has no future value
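Mechanically, a soft delete is just an UPDATE, which is what makes restores trivial. A minimal sketch ($1 is a bound parameter):
-- Soft delete: mark the row (restorable, auditable)
UPDATE users SET deleted_at = NOW() WHERE id = $1;
-- Restore is just clearing the marker
UPDATE users SET deleted_at = NULL WHERE id = $1;
-- Hard delete: the row is gone (appropriate for GDPR erasure requests)
DELETE FROM users WHERE id = $1;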
Implementation tip: If using soft deletes, create a partial unique index to prevent duplicates among active records:
-- Soft delete with unique constraint on active records
CREATE UNIQUE INDEX users_email_unique
ON users(email)
WHERE deleted_at IS NULL;
-- Query helper: default to excluding deleted
CREATE VIEW active_users AS
SELECT * FROM users WHERE deleted_at IS NULL;
6. Polymorphic Associations: Avoid If Possible
Polymorphic associations (where a column can reference different tables) seem flexible but cause pain:
-- Problematic polymorphic design
CREATE TABLE comments (
id UUID PRIMARY KEY,
body TEXT,
commentable_type VARCHAR(50), -- 'Post', 'Photo', 'Video'
commentable_id UUID -- No foreign key constraint possible
);
Problems with this approach:
- No foreign key constraints (database can't enforce referential integrity)
- Joins require dynamic table selection
- Queries become complex and slow
- Type safety goes out the window
Better Alternatives:
-- Option 1: Separate tables (cleaner, but more tables)
CREATE TABLE post_comments (
id UUID PRIMARY KEY,
post_id UUID REFERENCES posts(id),
body TEXT
);
CREATE TABLE photo_comments (
id UUID PRIMARY KEY,
photo_id UUID REFERENCES photos(id),
body TEXT
);
-- Option 2: Junction tables (if comments need shared identity)
CREATE TABLE comments (
id UUID PRIMARY KEY,
body TEXT
);
CREATE TABLE post_comments (
comment_id UUID PRIMARY KEY REFERENCES comments(id),
post_id UUID REFERENCES posts(id)
);
CREATE TABLE photo_comments (
comment_id UUID PRIMARY KEY REFERENCES comments(id),
photo_id UUID REFERENCES photos(id)
);
7. JSON Columns: When to Use Them
Modern databases support JSON natively. It's powerful but easy to misuse:
Good Uses for JSON:
- User preferences or settings (schema varies per user)
- Third-party API response caching
- Metadata that doesn't need querying
- Flexible attributes on a per-record basis
Bad Uses for JSON:
- Core business data you'll query or join on
- Data with consistent structure (just make columns)
- Relationships between entities
- Anything requiring transactions across JSON fields
-- Good: User preferences vary widely
CREATE TABLE users (
id UUID PRIMARY KEY,
email VARCHAR(255),
preferences JSONB DEFAULT '{}'
);
-- Can still query JSON when needed
SELECT * FROM users
WHERE preferences->>'theme' = 'dark';
-- Can index JSON fields if query patterns emerge
CREATE INDEX users_theme_idx
ON users((preferences->>'theme'));
8. Indexing Strategy
Indexes are the difference between millisecond and second queries. But more isn't always better:
Always Index:
- Foreign keys (joins become expensive without them)
- Columns in WHERE clauses of frequent queries
- Columns used for sorting large result sets
- Unique constraints (these automatically create an index)
Consider Composite Indexes:
-- Query: SELECT * FROM orders WHERE user_id = ? AND status = ?
-- Index covers both conditions efficiently
CREATE INDEX orders_user_status_idx
ON orders(user_id, status);
-- Column order matters: leftmost columns can be used alone
-- This index helps queries on user_id alone, but not status alone
Partial Indexes for Filtered Queries:
-- If you mostly query active orders
CREATE INDEX orders_active_idx
ON orders(created_at)
WHERE status = 'active';
-- Smaller index, faster queries for the common case
Don't Over-Index:
Every index slows down writes. For write-heavy tables, be judicious. Profile actual queries before adding indexes, and remove unused ones periodically.
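In PostgreSQL, the statistics views make "remove unused ones" actionable. A sketch using pg_stat_user_indexes:
-- Indexes that have not been scanned since statistics were last reset
SELECT schemaname, relname AS table_name, indexrelname AS index_name, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname;
Treat the results as candidates to investigate, not to drop blindly — an index may exist for a rare but critical query.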
9. Migrations: Safety First
Schema migrations on production data require care:
Safe Migration Practices:
- Backward compatible: New code must work with old schema during deployment
- Expand-contract: Add new columns first, migrate data, remove old columns later
- No long-running locks: Avoid operations that lock tables for extended periods (see the lock_timeout sketch after this list)
- Test with production data volumes: A migration that takes seconds on dev might take hours on production
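One additional safeguard worth considering before any production ALTER is a lock timeout, so a blocked migration fails fast instead of stalling traffic behind it. A sketch assuming PostgreSQL; other databases use different setting names:
-- Fail fast instead of queueing behind long-running transactions and blocking traffic
SET lock_timeout = '5s';
-- Abort anything unexpectedly slow rather than letting it run for hours
SET statement_timeout = '15min';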
-- Risky: Adding a NOT NULL column with a default
-- On older PostgreSQL (pre-11) and older MySQL versions this locks the table and rewrites every row
ALTER TABLE users ADD COLUMN status VARCHAR(20) NOT NULL DEFAULT 'active';
-- Safe alternative: Three-step process
-- Step 1: Add nullable column (fast, no lock)
ALTER TABLE users ADD COLUMN status VARCHAR(20);
-- Step 2: Backfill in batches (no lock)
UPDATE users SET status = 'active'
WHERE id IN (SELECT id FROM users WHERE status IS NULL LIMIT 10000);
-- Repeat until done
-- Step 3: Add constraint (fast after backfill)
ALTER TABLE users ALTER COLUMN status SET NOT NULL;
ALTER TABLE users ALTER COLUMN status SET DEFAULT 'active';
10. Connection Management
Database connections are expensive. Manage them properly:
- Use connection pooling: PgBouncer for PostgreSQL, built-in pools for most ORMs
- Size pools appropriately: More connections isn't better. Start with 10-20, adjust based on metrics.
- Handle connection errors: Connections fail. Retry with backoff.
- Close connections: Return connections to the pool promptly. Don't hold them during external calls.
// Connection pool configuration (example with Node.js pg)
const { Pool } = require('pg');
const pool = new Pool({
max: 20, // Maximum connections
idleTimeoutMillis: 30000, // Close idle connections
connectionTimeoutMillis: 5000, // Timeout for new connections
});
// Always release connections
async function getUser(id) {
const client = await pool.connect();
try {
const result = await client.query('SELECT * FROM users WHERE id = $1', [id]);
return result.rows[0];
} finally {
client.release(); // Always release, even on error
}
}
Practical Checklist
Before finalizing your schema, verify:
- ☐ Primary keys on every table (preferably UUIDs)
- ☐ Created_at and updated_at timestamps
- ☐ Foreign keys with appropriate ON DELETE behavior (see the sketch after this checklist)
- ☐ Indexes on foreign keys and frequent query columns
- ☐ Unique constraints where business logic requires them
- ☐ Appropriate column types (don't use TEXT for everything)
- ☐ NOT NULL constraints where nulls don't make sense
- ☐ Check constraints for enum-like values
- ☐ Migration plan that doesn't require downtime
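Two of the items above — ON DELETE behavior and check constraints — don't appear in the earlier examples, so here is a hypothetical variant of order_items showing both:
-- Hypothetical variant of order_items with explicit ON DELETE behavior and CHECK constraints
CREATE TABLE order_items (
  id UUID PRIMARY KEY,
  order_id UUID NOT NULL REFERENCES orders(id) ON DELETE CASCADE,      -- items disappear with their order
  product_id UUID NOT NULL REFERENCES products(id) ON DELETE RESTRICT, -- block deleting a referenced product
  quantity INTEGER NOT NULL CHECK (quantity > 0),
  price_cents INTEGER NOT NULL CHECK (price_cents >= 0)
);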
Conclusion
Good database design is about making future changes possible, not predicting every future need. Start normalized, add indexes based on actual query patterns, and denormalize only when you have measured performance problems.
The patterns here aren't rules—they're starting points. Your specific use case might justify different choices. But understanding why these patterns exist helps you make informed decisions when you choose to deviate from them.
Need help designing your database schema or optimizing query performance? Get in touch for architecture consultation and implementation guidance.