When performance problems appear in a database, the first reaction is often:

“Let’s add an index.”

Sometimes that works. Sometimes it changes nothing. And in some cases, it even makes the query slower.

The problem is not indexes themselves — it’s the misunderstanding of how indexes actually work.

This article explains database indexing in a practical and beginner-friendly way using simple SQL examples. Instead of memorizing rules, you’ll understand why indexes help, when they fail, and how to design them correctly.

The Example Table

Let’s start with a simple orders table containing hundreds of thousands of rows.

CREATE TABLE orders
(
    id                BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    firstName         VARCHAR(255) NOT NULL,
    lastName          VARCHAR(255) NOT NULL,
    address           VARCHAR(255) NOT NULL,
    status            ENUM ('created', 'preparing', 'prepared', 'shipped', 'delivered') NOT NULL,
    canPreparingStart TINYINT(1) NOT NULL,
    created_at        TIMESTAMP NULL,
    updated_at        TIMESTAMP NULL,
    deleted_at        TIMESTAMP NULL
)
COLLATE = utf8mb4_unicode_ci;

Now imagine the application frequently runs this query:

SELECT * 
FROM orders 
WHERE firstName = 'Pavel';

At first glance, the query looks harmless.

But with hundreds of thousands (or millions) of rows, performance starts becoming a problem.

Understanding the Problem with EXPLAIN

Before optimizing queries, we should understand how the database executes them.

That’s where EXPLAIN becomes useful.

EXPLAIN
SELECT * 
FROM orders 
WHERE firstName = 'Pavel';

Without an index, the database performs a full table scan.

That means:

Every row is read
Every row is checked
The database scans the entire table

Even if only a few rows match.

This is expensive.

What Is an Index?

An index helps the database find rows faster without scanning the entire table.

The most common type is the B-Tree index.

Think of it like a phone book.

If you want to find someone named “Pavel”:

You don’t start reading from page 1
You jump somewhere near the middle
Then narrow the search step by step

That’s essentially how a B-Tree index works.

Adding the First Index

Let’s create an index on firstName.

CREATE INDEX orders_firstName_index 
ON orders (firstName);

Now run the same query again:

SELECT * 
FROM orders 
WHERE firstName = 'Pavel';

Performance improves dramatically because the database now performs an:

Index lookup
instead of a
Full table scan

The index allows the database to jump directly to matching rows.

B-Tree Indexes Work Left to Right

At this point, many developers think:

“Great. Indexes solved the problem.”

But here comes the first important limitation.

Consider this query:

SELECT * 
FROM orders 
WHERE firstName LIKE '%vel';

Or:

SELECT * 
FROM orders 
WHERE firstName LIKE '%ave%';

Even with the index present, the database often ignores it and performs a full table scan.

Why?

Because B-Tree indexes work from left to right.

Why Leading Wildcards Break Indexes

A B-Tree index stores data in sorted order.

For example:

Adam
Alex
Daniel
Pavel
Peter

The database can efficiently search:

LIKE 'Pa%'

because it knows where names starting with “Pa” begin.

But with:

LIKE '%vel'

there’s no starting point.

The database has no choice but to scan everything.

What B-Tree Indexes Handle Well

B-Tree indexes are excellent for:

Exact matches

WHERE firstName = 'Pavel'

Prefix searches

WHERE firstName LIKE 'Pa%'

Range queries

WHERE created_at BETWEEN '2025-01-01' AND '2025-12-31'

What B-Tree Indexes Do NOT Handle Well

B-Tree indexes struggle with:

Suffix searches

LIKE '%text'

Contains searches

LIKE '%text%'

Functions on columns

WHERE LOWER(firstName) = 'pavel'

Different Queries Need Different Indexes

This does not mean those queries cannot be optimized.

It simply means they require different index types, such as:

Full-text indexes
Functional indexes
Trigram indexes

The key lesson:

B-Tree indexes are powerful, but they are not universal solutions.

The Second Limitation: Data Distribution

Now let’s look at another example.

Imagine this query:

SELECT *
FROM orders
WHERE deleted_at IS NULL
LIMIT 20000 OFFSET 0;

A common assumption is:

“Let’s index deleted_at.”

So we do:

CREATE INDEX orders_deleted_at_index
ON orders (deleted_at);

Surprisingly, performance may actually become worse.

Why?

The Data Matters More Than the Index

Suppose almost every row has:

deleted_at = NULL

This means the column has low selectivity.

In simple terms:

The index cannot effectively narrow down results
Too many rows match the condition

The database still has to process a huge amount of data.

Phone Book Analogy Again

Imagine a phone book where almost everyone has the same name.

Even though the book is sorted, searching becomes inefficient because the result set is massive.

That’s exactly what happens with low-selectivity indexes.

High Selectivity Makes Indexes Powerful

Now imagine we soft-delete most records.

Suddenly:

WHERE deleted_at IS NULL

matches only a small percentage of rows.

Now the index becomes extremely useful because it can quickly isolate a tiny subset of records.

This is why:

Good indexing is not only about queries — it’s also about data distribution.

Composite Indexes

Now let’s move to composite indexes.

Suppose we frequently run:

SELECT *
FROM orders
WHERE firstName = 'Pavel'
AND lastName = 'Komin';

A composite index can help significantly.

CREATE INDEX orders_first_last_index
ON orders (firstName, lastName);

This is usually much better than creating two separate indexes.

Why Composite Indexes Work Better

With separate indexes:

INDEX(firstName)
INDEX(lastName)

the database may still need to combine results internally.

But with a composite index:

(firstName, lastName)

the database can directly navigate to the exact combination.

This reduces work dramatically.

Column Order Is Critical

Composite indexes still follow the same left-to-right rule.

So this index:

(firstName, lastName)

works well for:

WHERE firstName = 'Pavel'

and:

WHERE firstName = 'Pavel'
AND lastName = 'Komin'

But not efficiently for:

WHERE lastName = 'Komin'

because the search starts from the second column, breaking the left-to-right chain.

Choosing Column Order Correctly

The first column in a composite index should ideally be:

Frequently queried
Highly selective

This allows the database to eliminate as many rows as possible early in the search process.

The Biggest Indexing Mistake

One of the most common mistakes is creating indexes without understanding:

Query patterns
Data distribution
Selectivity
Index structure

Indexes are not magic.

Poorly designed indexes can:

Waste storage
Slow down inserts and updates
Increase maintenance overhead
Make queries slower

Key Takeaways

1. B-Tree indexes work left to right

They are optimized for:

Exact matches
Prefix searches
Range queries

2. Leading wildcards break B-Tree indexes

These queries are problematic:

LIKE '%text'
LIKE '%text%'

3. Data selectivity matters

Indexes on low-diversity columns often provide little benefit.

4. Composite indexes depend on column order

The first column is the most important.

5. Good indexing requires understanding your workload

Always analyze:

Query frequency
Filtering patterns
Cardinality/selectivity
Execution plans (EXPLAIN)

before adding indexes.

Final Thoughts

Indexes are one of the most powerful tools for database optimization — but only when used correctly.

Understanding:

how B-Tree indexes work,
why selectivity matters,
and how composite indexes behave

will help you optimize databases far more effectively than simply “adding indexes everywhere.”

The best database engineers don’t just create indexes.

They understand why the database chooses to use them — or ignore them.

shashi

Thursday, May 21, 2026

Database Performance Tuning with Indexes: What Most Developers Get Wrong