System Design Guide

Database Indexing: Optimizing Query Performance

Database indexes are specialized data structures that dramatically improve query performance by allowing the database to locate data without scanning entire tables. Understanding how indexes work, when to use them, and their tradeoffs is fundamental to building performant database-backed applications.

How Indexes Work

Without an index, a database must perform a full table scan to find matching rows: examining every row in the table sequentially. For a table with millions of rows, this is prohibitively slow. An index provides a sorted structure that enables efficient lookups, similar to how a book’s index lets you find topics without reading every page.

Most databases implement indexes using B-tree or B+ tree data structures. These balanced trees keep keys sorted and support searches, inserts, and deletes in logarithmic time. Because each node holds many keys, the tree stays shallow: a B-tree index on a million-row table is typically only 3-4 levels deep, so locating a row takes about 3-4 page reads, compared with reading every page of the table for a full table scan.
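The 3-4 level figure follows from simple arithmetic. The sketch below is only a rough model: it assumes every node holds the same number of keys (the fan-out), and the fan-out values are illustrative guesses, since real values depend on page size and key size.

```python
def btree_levels(row_count: int, fanout: int) -> int:
    """Smallest number of levels such that fanout ** levels >= row_count."""
    levels = 0
    capacity = 1
    while capacity < row_count:
        capacity *= fanout  # each extra level multiplies reachable keys by the fan-out
        levels += 1
    return levels


# Assumed fan-out values; real ones depend on page size and key width.
for fanout in (50, 100, 500):
    print(f"fan-out {fanout}: {btree_levels(1_000_000, fanout)} levels")
```

Under these assumptions a million keys fit in three or four levels, and growing the table tenfold adds at most one more level.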

When you execute a query with a WHERE clause on an indexed column, the database uses the index to quickly identify which rows match the condition. For a query like SELECT * FROM users WHERE email = 'user@example.com', an index on the email column allows direct lookup instead of scanning all users.
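The difference is visible in the query plan. The following is a minimal sketch using SQLite through Python's built-in sqlite3 module, chosen only because it runs without a server; the table layout, the index name idx_users_email, and the sample addresses are hypothetical, and the exact plan wording varies between SQLite versions and other databases.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the detail text of each EXPLAIN QUERY PLAN row, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

lookup = "SELECT * FROM users WHERE email = ?"
args = ("user42@example.com",)

plan_before = query_plan(conn, lookup, args)   # no index yet: full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, args)    # same query, now an index search

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists the plan reports a scan of users; afterwards it reports a search that names idx_users_email.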

Types of Indexes

Single-Column Indexes index one column, supporting queries filtering or sorting by that column. These are straightforward and cover many common query patterns.

Composite Indexes index multiple columns together. A composite index on (last_name, first_name) supports queries filtering by last name alone or by both last name and first name, but it cannot efficiently serve queries that filter by first name alone. Column order in a composite index therefore matters significantly for query performance.
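The column-order rule can be checked directly. This sketch again uses SQLite via Python's sqlite3 module as a stand-in for whatever database you run; the people table and the index name idx_people_name are hypothetical, and other optimizers may print different plans or apply extra tricks such as skip scans.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the detail text of each EXPLAIN QUERY PLAN row, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT, city TEXT)"
)
conn.execute("CREATE INDEX idx_people_name ON people (last_name, first_name)")

# Leading column only: the index is usable.
plan_last = query_plan(conn, "SELECT * FROM people WHERE last_name = ?", ("Smith",))

# Both columns: the index is usable and narrows on both.
plan_both = query_plan(
    conn,
    "SELECT * FROM people WHERE last_name = ? AND first_name = ?",
    ("Smith", "Ann"),
)

# Second column only: the sorted order does not help, so the table is scanned.
plan_first = query_plan(conn, "SELECT * FROM people WHERE first_name = ?", ("Ann",))

for label, plan in (("last", plan_last), ("both", plan_both), ("first", plan_first)):
    print(f"{label:5s} -> {plan}")
```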

Unique Indexes enforce uniqueness while providing the performance benefits of regular indexes. Primary keys automatically have unique indexes. Email addresses, usernames, and other unique identifiers benefit from unique indexes.
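A short sketch of the enforcement side, using SQLite through Python's sqlite3 module; the table and the index name idx_users_email_unique are hypothetical. Other databases raise their own error types for the same violation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_users_email_unique ON users (email)")

conn.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))

try:
    # Second insert with the same email violates the unique index.
    conn.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print("duplicate rejected:", duplicate_rejected, "rows stored:", row_count)
```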

Partial Indexes index only rows matching a condition, like an index on active users excluding deleted users. This reduces index size and improves performance for queries on the subset.
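As an illustration, SQLite supports partial indexes with a WHERE clause on CREATE INDEX, so the idea can be sketched with Python's sqlite3 module. The deleted column and the index name idx_active_users_email are hypothetical. Note that the optimizer only considers the partial index when the query's own WHERE clause implies the index condition.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the detail text of each EXPLAIN QUERY PLAN row, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, email TEXT, deleted INTEGER NOT NULL DEFAULT 0)"
)
# Only rows with deleted = 0 get an index entry.
conn.execute("CREATE INDEX idx_active_users_email ON users (email) WHERE deleted = 0")

# The query repeats the index condition, so the partial index applies.
plan_active = query_plan(
    conn, "SELECT * FROM users WHERE deleted = 0 AND email = ?", ("a@example.com",)
)

# Without the condition the index might miss rows, so it cannot be used.
plan_all = query_plan(conn, "SELECT * FROM users WHERE email = ?", ("a@example.com",))

print("active only:", plan_active)
print("all rows:   ", plan_all)
```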

Full-Text Indexes enable efficient text search with features like stemming, ranking, and phrase matching. These specialized indexes support search functionality that would be impractically slow with standard indexes.

Index Tradeoffs

Indexes dramatically improve read performance but come with costs. Storage overhead is significant: indexes consume disk space, and on heavily indexed tables the combined size of the indexes can exceed the size of the table itself. Each index is essentially a copy of the indexed columns, plus a reference back to each row, stored in a different structure.

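Write performance suffers because every INSERT and DELETE, and every UPDATE that modifies indexed columns, must also update the relevant indexes. A write to a table with ten indexes may have to modify up to ten additional data structures, so write cost grows roughly in proportion to the number of indexes affected. The actual slowdown depends on the workload and should be measured rather than assumed.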
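Both costs can be measured on a small scale. The sketch below, using SQLite via Python's sqlite3 module, inserts the same rows into a hypothetical table t with zero, two, and four secondary indexes and reports the insert time and the database size in pages. The row count and column values are arbitrary, and the timings are only indicative: they vary by machine and say little about a production server.

```python
import sqlite3
import time


def measure_index_cost(index_count, rows=20_000):
    """Insert rows into a table with `index_count` secondary indexes.

    Returns (seconds spent inserting, database size in pages).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER, d INTEGER)"
    )
    for column in "abcd"[:index_count]:
        conn.execute(f"CREATE INDEX idx_t_{column} ON t ({column})")

    # Arbitrary values; the multipliers just scatter keys across the index.
    data = [((i * 7919) % rows, (i * 104729) % rows, i, rows - i) for i in range(rows)]

    start = time.perf_counter()
    with conn:  # one transaction for all inserts
        conn.executemany("INSERT INTO t (a, b, c, d) VALUES (?, ?, ?, ?)", data)
    elapsed = time.perf_counter() - start

    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()
    return elapsed, pages


for count in (0, 2, 4):
    seconds, pages = measure_index_cost(count)
    print(f"{count} indexes: {seconds:.3f} s, {pages} pages")
```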

The key is strategic indexing: create indexes that provide substantial query improvements while minimizing write overhead. Over-indexing is as problematic as under-indexing.

Query Optimization with Indexes

The database query optimizer decides whether to use indexes based on query structure, index availability, and statistics about data distribution. Understanding how optimizers work helps you write index-friendly queries.

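Index selectivity measures how unique index values are, usually expressed as the ratio of distinct values to total rows. Highly selective indexes (like unique IDs) are more beneficial than low-selectivity indexes (like boolean flags). An index on a column where 99% of values are the same provides little benefit for queries on the common value, because most of the table must be read anyway; it can still help queries that target the rare values.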
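Selectivity is easy to estimate before creating an index. This is a minimal sketch with SQLite and Python's sqlite3 module; the users table, the is_active flag, and the helper name selectivity are hypothetical. The helper interpolates identifiers into SQL, so it should only be called with trusted table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, is_active INTEGER)")
conn.executemany(
    "INSERT INTO users (is_active) VALUES (?)",
    [(i % 2,) for i in range(1000)],
)


def selectivity(conn, table, column):
    """Distinct values divided by total rows; 1.0 means every value is unique."""
    # Identifiers cannot be bound as parameters, so pass trusted names only.
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}"
    ).fetchone()
    return distinct / total if total else 0.0


id_selectivity = selectivity(conn, "users", "id")
flag_selectivity = selectivity(conn, "users", "is_active")
print("id:", id_selectivity, "is_active:", flag_selectivity)
```

A value near 1.0 suggests a good index candidate, while a value near 0 suggests that an index will rarely pay for itself on its own.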

Covering indexes include all columns needed by a query, allowing the database to satisfy the query entirely from the index without accessing the table. This avoids the extra I/O of fetching each matching row from the table and can provide dramatic performance improvements; the index pages themselves still have to be read.
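SQLite labels this case explicitly in its query plans, which makes it a convenient way to see the effect. In the sketch below (Python's sqlite3 module, hypothetical users table and index name idx_users_email_name), the first query needs only indexed columns and the second needs a column that lives only in the table.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the detail text of each EXPLAIN QUERY PLAN row, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT, bio TEXT)")
conn.execute("CREATE INDEX idx_users_email_name ON users (email, name)")

args = ("a@example.com",)

# name is stored in the index, so the table is never touched.
plan_covered = query_plan(conn, "SELECT name FROM users WHERE email = ?", args)

# bio is not in the index, so each match requires a table lookup.
plan_uncovered = query_plan(conn, "SELECT bio FROM users WHERE email = ?", args)

print("covered:  ", plan_covered)
print("uncovered:", plan_uncovered)
```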

Index hints or explicit index specification can force the optimizer to use specific indexes when it makes suboptimal choices, though this is rarely necessary with modern optimizers and good statistics.
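Hint syntax is database-specific. As one concrete example, SQLite accepts INDEXED BY and NOT INDEXED after a table name; the sketch below uses Python's sqlite3 module with a hypothetical users table and index name idx_users_email. Other systems use different syntax or provide no hints at all, so treat this as an illustration rather than portable SQL.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the detail text of each EXPLAIN QUERY PLAN row, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")

args = ("a@example.com",)

# Require a specific index; SQLite rejects the statement if it cannot use it.
plan_forced = query_plan(
    conn, "SELECT * FROM users INDEXED BY idx_users_email WHERE email = ?", args
)

# Forbid secondary indexes for this table reference.
plan_no_index = query_plan(
    conn, "SELECT * FROM users NOT INDEXED WHERE email = ?", args
)

print("forced:  ", plan_forced)
print("no index:", plan_no_index)
```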

Common Indexing Patterns

Index foreign keys used in joins to dramatically improve join performance. Index columns frequently used in WHERE clauses, especially for equality comparisons. Index columns used in ORDER BY clauses to avoid expensive sorting operations.
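The ORDER BY case is easy to observe in a query plan. In this sketch (SQLite via Python's sqlite3 module, hypothetical events table and index name idx_events_created_at), the plan includes a temporary sort step until an index on the sort column exists.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the detail text of each EXPLAIN QUERY PLAN row, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT, payload TEXT)")

ordered = "SELECT * FROM events ORDER BY created_at"

plan_unindexed = query_plan(conn, ordered)   # rows must be sorted after the scan
conn.execute("CREATE INDEX idx_events_created_at ON events (created_at)")
plan_indexed = query_plan(conn, ordered)     # rows are read in index order

print("without index:", plan_unindexed)
print("with index:   ", plan_indexed)
```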

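For composite indexes, a common rule of thumb is to place the most selective column first, but query patterns matter more: an index on (a, b, c) supports queries filtering on (a), (a, b), or (a, b, c), but not efficiently on (b) or (c) alone. Columns compared with equality generally belong before columns compared with a range, because most databases cannot use index columns that follow a range condition to narrow the search.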
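One query pattern worth checking is a filter that mixes an equality test with a range test. The sketch below builds two hypothetical tables that differ only in the column order of their composite index and compares the plans, using SQLite via Python's sqlite3 module. The behavior shown is SQLite's; other optimizers may differ in detail.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the detail text of each EXPLAIN QUERY PLAN row, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
for table in ("orders_a", "orders_b"):
    conn.execute(
        f"CREATE TABLE {table} ("
        "id INTEGER PRIMARY KEY, status TEXT, created_at TEXT, total REAL)"
    )

# Equality column first, range column second.
conn.execute("CREATE INDEX idx_a_status_created ON orders_a (status, created_at)")
# Range column first, equality column second.
conn.execute("CREATE INDEX idx_b_created_status ON orders_b (created_at, status)")

where = "WHERE status = ? AND created_at > ?"
args = ("shipped", "2024-01-01")

plan_a = query_plan(conn, f"SELECT * FROM orders_a {where}", args)
plan_b = query_plan(conn, f"SELECT * FROM orders_b {where}", args)

print("(status, created_at):", plan_a)
print("(created_at, status):", plan_b)
```

With the equality column first, both conditions narrow the index search. With the range column first, the status condition is not part of the index search and has to be checked row by row.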

Index Maintenance

Indexes require maintenance to remain efficient. The tree itself stays balanced, but over time page splits and deletions leave pages partly empty and physically out of order. This fragmentation makes the index larger than necessary and slows range scans. Periodic index rebuilding or reorganization restores a compact structure.

Update statistics to keep the query optimizer informed about data distribution. Stale statistics lead to poor execution plans. Most databases can automatically update statistics, but critical tables may benefit from manual statistics management.
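How statistics are refreshed differs by database. As one small example, SQLite gathers them with the ANALYZE command and stores the results in a table named sqlite_stat1, which the sketch below inspects through Python's sqlite3 module. The users table, the country values, and the index name idx_users_country are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT)")
conn.execute("CREATE INDEX idx_users_country ON users (country)")

countries = ("US", "DE", "JP", "BR")
conn.executemany(
    "INSERT INTO users (country) VALUES (?)",
    [(countries[i % 4],) for i in range(1000)],
)

# Collect statistics for all tables and indexes in the database.
conn.execute("ANALYZE")

# Each row holds the table name, index name, and a summary string
# the planner uses to estimate how many rows a lookup returns.
stats = conn.execute(
    "SELECT tbl, idx, stat FROM sqlite_stat1 WHERE idx = ?",
    ("idx_users_country",),
).fetchall()

for tbl, idx, stat in stats:
    print(tbl, idx, stat)
```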

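Monitor index usage to identify unused indexes consuming resources without providing benefits. Most databases provide statistics views or queries that identify indexes that are maintained but never used. These can usually be dropped to improve write performance, after confirming that the usage data covers a full workload cycle (including periodic reports and batch jobs) and that the index does not enforce a uniqueness constraint.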

Best Practices

Start with indexes on primary keys and foreign keys. Add indexes based on actual query patterns, not speculation. Use query analysis tools to identify slow queries and EXPLAIN plans to understand how queries use indexes.

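Avoid redundant indexes: an index on (a, b) usually makes a separate index on (a) redundant, because the composite index already serves queries on its leading column. The exception is when the narrower index enforces uniqueness on (a) alone. Test index impact on both read and write performance before deploying to production. Consider the write-heavy versus read-heavy nature of your workload when deciding how aggressively to index.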
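The prefix rule for redundancy can be checked mechanically. The function below is a simplified sketch: it takes a hypothetical mapping of index names to column tuples and flags any index whose columns are a strict leading prefix of another index. It deliberately ignores unique constraints, partial-index conditions, and sort direction, all of which would need checking before dropping anything.

```python
def find_redundant_indexes(indexes):
    """Return (redundant_index, covering_index) pairs.

    `indexes` maps an index name to its tuple of column names, in order.
    An index is flagged when its columns are a strict leading prefix
    of another index's columns.
    """
    redundant = []
    for name, columns in indexes.items():
        for other_name, other_columns in indexes.items():
            if name == other_name:
                continue
            is_strict_prefix = (
                len(columns) < len(other_columns)
                and other_columns[: len(columns)] == columns
            )
            if is_strict_prefix:
                redundant.append((name, other_name))
                break  # one covering index is enough to flag it
    return redundant


# Hypothetical index definitions for illustration.
indexes = {
    "idx_a": ("a",),
    "idx_a_b": ("a", "b"),
    "idx_b": ("b",),
}
result = find_redundant_indexes(indexes)
print(result)
```

Here idx_a is flagged because idx_a_b starts with the same column, while idx_b is kept because no other index leads with b.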

For large tables, create indexes concurrently when supported to avoid locking the table during index creation. Monitor index size and growth to ensure adequate storage capacity.

Indexes are powerful tools for database performance optimization, but they require thoughtful application. Understanding their mechanics, costs, and proper usage patterns enables you to design database schemas that perform well at scale while maintaining efficient write operations. The goal is strategic indexing that provides maximum benefit with minimal overhead.