Database Design Patterns for Modern Applications
Database design decisions made early in a project's life have an outsized impact on its future. A well-designed schema enables features; a poorly designed one blocks them. After working with databases at various scales—from MVPs to systems handling millions of requests—here are the patterns that consistently prove their worth.
1. Choose the Right Database Type First
The SQL vs. NoSQL debate misses the point. The question is: what are your access patterns?
Choose Relational (PostgreSQL, MySQL) When:
- Data has complex relationships that benefit from joins
- You need ACID transactions across multiple tables
- Query patterns are varied and unpredictable
- Data integrity constraints are critical
- You need powerful ad-hoc reporting
Choose Document (MongoDB, DynamoDB) When:
- Data is naturally hierarchical or nested
- Access patterns are known and limited
- Schema needs to evolve frequently
- Horizontal scaling is a primary requirement
- You're denormalizing anyway for performance
Choose Key-Value (Redis, DynamoDB) When:
- Simple lookups by key dominate
- Ultra-low latency is required
- Data is ephemeral (sessions, caches)
- High write throughput is critical
The pragmatic choice: PostgreSQL handles 90% of use cases well. Start there unless you have specific requirements that demand something else. It's relational when you need it, supports JSON when you don't, and scales further than most applications will ever need.
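To make the first set of bullets concrete, here is the kind of ad-hoc reporting query a relational schema handles naturally — a sketch that assumes the users and orders tables defined in the next section:
-- Revenue per user over the last 30 days, joining across tables ad hoc
SELECT u.email, COUNT(o.id) AS order_count, SUM(o.total_cents) AS revenue_cents
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.created_at > NOW() - INTERVAL '30 days'
GROUP BY u.email
ORDER BY revenue_cents DESC;
Nothing about this query had to be planned when the schema was designed, which is exactly the flexibility the relational column of the decision list is pointing at.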
2. Normalize Until It Hurts, Then Denormalize Strategically
Start with normalized schemas. Third normal form (3NF) prevents data anomalies and keeps your schema flexible. Denormalization is an optimization—don't apply it prematurely.
-- Normalized (correct starting point)
CREATE TABLE users (
id UUID PRIMARY KEY,
email VARCHAR(255) UNIQUE,
name VARCHAR(255)
);
CREATE TABLE orders (
id UUID PRIMARY KEY,
user_id UUID REFERENCES users(id),
created_at TIMESTAMP,
total_cents INTEGER
);
CREATE TABLE order_items (
id UUID PRIMARY KEY,
order_id UUID REFERENCES orders(id),
product_id UUID REFERENCES products(id),
quantity INTEGER,
price_cents INTEGER
);
When to denormalize: Only when you have measured performance problems. Common cases:
- Read-heavy dashboards that join many tables
- Frequently accessed aggregations (order totals, user statistics)
- Search indexes that need flattened data
-- Denormalized for read performance (add when needed)
ALTER TABLE orders ADD COLUMN user_email VARCHAR(255);
ALTER TABLE orders ADD COLUMN item_count INTEGER;
-- Keep denormalized fields updated via triggers or application logic
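-- For example, one way to keep item_count in sync is a trigger on order_items.
-- This is only a sketch; batch recalculation or application-level updates are equally valid.
CREATE OR REPLACE FUNCTION sync_order_item_count()
RETURNS TRIGGER AS $$
DECLARE
  target_order UUID;
BEGIN
  IF TG_OP = 'DELETE' THEN
    target_order := OLD.order_id;
  ELSE
    target_order := NEW.order_id;
  END IF;
  UPDATE orders
  SET item_count = (SELECT COUNT(*) FROM order_items WHERE order_id = target_order)
  WHERE id = target_order;
  RETURN NULL; -- AFTER row trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;
-- Note: an UPDATE that moves an item to a different order would need both orders refreshed
CREATE TRIGGER order_items_sync_item_count
AFTER INSERT OR UPDATE OR DELETE ON order_items
FOR EACH ROW EXECUTE FUNCTION sync_order_item_count();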
-- Document WHY the denormalization exists
3. Primary Keys: UUIDs vs. Sequential IDs
This debate is more nuanced than it appears:
Sequential IDs (SERIAL, AUTO_INCREMENT)
- Pros: Smaller storage, better index performance, naturally ordered
- Cons: Expose record counts, require database for ID generation, problematic for distributed systems
UUIDs
- Pros: Generate anywhere, no central coordination, hide business information
- Cons: Larger storage (16 bytes vs 4-8), random distribution hurts index performance
ULIDs / UUID v7
- Pros: Sortable by creation time, generate anywhere, good index locality
- Cons: Slightly more complex to generate
Recommendation: Use UUIDv7 or ULID for new projects. They provide the benefits of UUIDs (no coordination needed, hide information) while maintaining sortability that helps index performance. PostgreSQL 18 ships a native uuidv7() function; on earlier versions, generate UUIDv7s in the application or through an extension.
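If your PostgreSQL version does include the native uuidv7() function (added in PostgreSQL 18 — verify for your setup), sortable primary keys need nothing beyond a default; the events table here is purely illustrative. On earlier versions, the gen_random_uuid() block below still works, with v7 generation left to the application:
-- Sortable, time-ordered primary keys with the native uuidv7() function
CREATE TABLE events (
  id UUID PRIMARY KEY DEFAULT uuidv7(),
  created_at TIMESTAMP NOT NULL DEFAULT NOW()
);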
-- PostgreSQL with UUID
CREATE EXTENSION IF NOT EXISTS "pgcrypto";
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-- other fields
);
-- For UUID v7 (sortable), use application-generated IDs
-- or a PostgreSQL extension that supports it
4. Timestamps: More Than Created/Updated
Every table should have audit timestamps, but think beyond the basics:
CREATE TABLE orders (
id UUID PRIMARY KEY,
-- Basic audit timestamps
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
-- Soft delete
deleted_at TIMESTAMP,
-- Business state timestamps
submitted_at TIMESTAMP,
confirmed_at TIMESTAMP,
shipped_at TIMESTAMP,
delivered_at TIMESTAMP,
-- Who made changes
created_by UUID REFERENCES users(id),
updated_by UUID REFERENCES users(id)
);
-- Auto-update updated_at
CREATE OR REPLACE FUNCTION update_timestamp()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER orders_updated_at
BEFORE UPDATE ON orders
FOR EACH ROW
EXECUTE FUNCTION update_timestamp();
Always use UTC. Store timestamps without timezone (TIMESTAMP, not TIMESTAMPTZ) in UTC. Convert to local time in the application layer, where you have user context.
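If a query does need to show local time (a report, an export), the conversion can still happen at read time. A minimal sketch, assuming a user in US Eastern time:
-- Interpret the stored value as UTC, then shift it to the user's zone at read time
SELECT id,
       (created_at AT TIME ZONE 'UTC') AT TIME ZONE 'America/New_York' AS created_local
FROM orders;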
5. Soft Deletes vs. Hard Deletes
The choice matters more than people realize:
Use Soft Deletes When:
- Data might need to be restored
- Audit trails are required
- Foreign key relationships would break
- You need to answer "what did this look like before?"
Use Hard Deletes When:
- Data retention is a liability (GDPR, privacy laws)
- Storage costs matter at scale
- The data genuinely has no future value
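Mechanically, a soft delete is just an UPDATE, which is what makes restores trivial. A minimal sketch ($1 is a bound parameter):
-- Soft delete: mark the row (restorable, auditable)
UPDATE users SET deleted_at = NOW() WHERE id = $1;
-- Restore is just clearing the marker
UPDATE users SET deleted_at = NULL WHERE id = $1;
-- Hard delete: the row is gone (appropriate for GDPR erasure requests)
DELETE FROM users WHERE id = $1;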
Implementation tip: If using soft deletes, create a partial unique index to prevent duplicates among active records:
-- Soft delete with unique constraint on active records
CREATE UNIQUE INDEX users_email_unique
ON users(email)
WHERE deleted_at IS NULL;
-- Query helper: default to excluding deleted
CREATE VIEW active_users AS
SELECT * FROM users WHERE deleted_at IS NULL;
6. Polymorphic Associations: Avoid If Possible
Polymorphic associations (where a column can reference different tables) seem flexible but cause pain:
-- Problematic polymorphic design
CREATE TABLE comments (
id UUID PRIMARY KEY,
body TEXT,
commentable_type VARCHAR(50), -- 'Post', 'Photo', 'Video'
commentable_id UUID -- No foreign key constraint possible
);
Problems with this approach:
- No foreign key constraints (database can't enforce referential integrity)
- Joins require dynamic table selection
- Queries become complex and slow
- Type safety goes out the window
Better Alternatives:
-- Option 1: Separate tables (cleaner, but more tables)
CREATE TABLE post_comments (
id UUID PRIMARY KEY,
post_id UUID REFERENCES posts(id),
body TEXT
);
CREATE TABLE photo_comments (
id UUID PRIMARY KEY,
photo_id UUID REFERENCES photos(id),
body TEXT
);
-- Option 2: Junction tables (if comments need shared identity)
CREATE TABLE comments (
id UUID PRIMARY KEY,
body TEXT
);
CREATE TABLE post_comments (
comment_id UUID PRIMARY KEY REFERENCES comments(id),
post_id UUID REFERENCES posts(id)
);
CREATE TABLE photo_comments (
comment_id UUID PRIMARY KEY REFERENCES comments(id),
photo_id UUID REFERENCES photos(id)
);
7. JSON Columns: When to Use Them
Modern databases support JSON natively. It's powerful but easy to misuse:
Good Uses for JSON:
- User preferences or settings (schema varies per user)
- Third-party API response caching
- Metadata that doesn't need querying
- Flexible attributes on a per-record basis
Bad Uses for JSON:
- Core business data you'll query or join on
- Data with consistent structure (just make columns)
- Relationships between entities
- Anything requiring transactions across JSON fields
-- Good: User preferences vary widely
CREATE TABLE users (
id UUID PRIMARY KEY,
email VARCHAR(255),
preferences JSONB DEFAULT '{}'
);
-- Can still query JSON when needed
SELECT * FROM users
WHERE preferences->>'theme' = 'dark';
-- Can index JSON fields if query patterns emerge
CREATE INDEX users_theme_idx
ON users((preferences->>'theme'));
8. Indexing Strategy
Indexes are the difference between millisecond and second queries. But more isn't always better:
Always Index:
- Foreign keys (joins become expensive without them)
- Columns in WHERE clauses of frequent queries
- Columns used for sorting large result sets
- Unique constraints (these automatically create an index)
Consider Composite Indexes:
-- Query: SELECT * FROM orders WHERE user_id = ? AND status = ?
-- Index covers both conditions efficiently
CREATE INDEX orders_user_status_idx
ON orders(user_id, status);
-- Column order matters: leftmost columns can be used alone
-- This index helps queries on user_id alone, but not status alone
Partial Indexes for Filtered Queries:
-- If you mostly query active orders
CREATE INDEX orders_active_idx
ON orders(created_at)
WHERE status = 'active';
-- Smaller index, faster queries for the common case
Don't Over-Index:
Every index slows down writes. For write-heavy tables, be judicious. Profile actual queries before adding indexes, and remove unused ones periodically.
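In PostgreSQL, the statistics views make "remove unused ones" actionable. A sketch using pg_stat_user_indexes:
-- Indexes that have not been scanned since statistics were last reset
SELECT schemaname, relname AS table_name, indexrelname AS index_name, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname;
Treat the results as candidates to investigate, not to drop blindly — an index may exist for a rare but critical query.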
9. Migrations: Safety First
Schema migrations on production data require care:
Safe Migration Practices:
- Backward compatible: New code must work with old schema during deployment
- Expand-contract: Add new columns first, migrate data, remove old columns later
- No long-running locks: Avoid operations that lock tables for extended periods (see the lock_timeout sketch after this list)
- Test with production data volumes: A migration that takes seconds on dev might take hours on production
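One additional safeguard worth considering before any production ALTER is a lock timeout, so a blocked migration fails fast instead of stalling traffic behind it. A sketch assuming PostgreSQL; other databases use different setting names:
-- Fail fast instead of queueing behind long-running transactions and blocking traffic
SET lock_timeout = '5s';
-- Abort anything unexpectedly slow rather than letting it run for hours
SET statement_timeout = '15min';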
-- Risky: Adding a NOT NULL column with a default
-- On older PostgreSQL (pre-11) and older MySQL versions this locks the table and rewrites every row
ALTER TABLE users ADD COLUMN status VARCHAR(20) NOT NULL DEFAULT 'active';
-- Safe alternative: Three-step process
-- Step 1: Add nullable column (fast, no lock)
ALTER TABLE users ADD COLUMN status VARCHAR(20);
-- Step 2: Backfill in batches (no lock)
UPDATE users SET status = 'active'
WHERE id IN (SELECT id FROM users WHERE status IS NULL LIMIT 10000);
-- Repeat until done
-- Step 3: Add constraint (fast after backfill)
ALTER TABLE users ALTER COLUMN status SET NOT NULL;
ALTER TABLE users ALTER COLUMN status SET DEFAULT 'active';
10. Connection Management
Database connections are expensive. Manage them properly:
- Use connection pooling: PgBouncer for PostgreSQL, built-in pools for most ORMs
- Size pools appropriately: More connections isn't better. Start with 10-20, adjust based on metrics.
- Handle connection errors: Connections fail. Retry with backoff.
- Close connections: Return connections to the pool promptly. Don't hold them during external calls.
// Connection pool configuration (example with Node.js pg)
const { Pool } = require('pg');
const pool = new Pool({
max: 20, // Maximum connections
idleTimeoutMillis: 30000, // Close idle connections
connectionTimeoutMillis: 5000, // Timeout for new connections
});
// Always release connections
async function getUser(id) {
const client = await pool.connect();
try {
const result = await client.query('SELECT * FROM users WHERE id = $1', [id]);
return result.rows[0];
} finally {
client.release(); // Always release, even on error
}
}
Practical Checklist
Before finalizing your schema, verify:
- ☐ Primary keys on every table (preferably UUIDs)
- ☐ Created_at and updated_at timestamps
- ☐ Foreign keys with appropriate ON DELETE behavior (see the sketch after this checklist)
- ☐ Indexes on foreign keys and frequent query columns
- ☐ Unique constraints where business logic requires them
- ☐ Appropriate column types (don't use TEXT for everything)
- ☐ NOT NULL constraints where nulls don't make sense
- ☐ Check constraints for enum-like values
- ☐ Migration plan that doesn't require downtime
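Two of the items above — ON DELETE behavior and check constraints — don't appear in the earlier examples, so here is a hypothetical variant of order_items showing both:
-- Hypothetical variant of order_items with explicit ON DELETE behavior and CHECK constraints
CREATE TABLE order_items (
  id UUID PRIMARY KEY,
  order_id UUID NOT NULL REFERENCES orders(id) ON DELETE CASCADE,      -- items disappear with their order
  product_id UUID NOT NULL REFERENCES products(id) ON DELETE RESTRICT, -- block deleting a referenced product
  quantity INTEGER NOT NULL CHECK (quantity > 0),
  price_cents INTEGER NOT NULL CHECK (price_cents >= 0)
);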
Conclusion
Good database design is about making future changes possible, not predicting every future need. Start normalized, add indexes based on actual query patterns, and denormalize only when you have measured performance problems.
The patterns here aren't rules—they're starting points. Your specific use case might justify different choices. But understanding why these patterns exist helps you make informed decisions when you choose to deviate from them.
Need help designing your database schema or optimizing query performance? Get in touch for architecture consultation and implementation guidance.