Creating useful MySQL indexes – InnoDB & MyISAM using B-Tree

Any database developer understands that indexing is extremely important for optimal query performance. Here we’ll attempt to get beyond adding one column per index to being more purposeful about intelligent index creation.

In order to be able to efficiently index you have to understand two important points: how your index type (B-Tree, Hash, etc) works and how your data is going to be accessed.

Using B-Tree Indexes

B-Tree indexes are the default index type for both the MyISAM and InnoDB storage engines, so if you’re not explicitly specifying an index type B-Tree is typically the type being used. (InnoDB technically uses a B+Tree structure) Without getting into the actual implementation of the structure, the important thing to know about a B-Tree index is columns are stored in order, based on the creation of the index. For an indexed column to be useful, all indexed columns further left must be referenced.

Suppose you have the following table:

title varchar(125) not null,
rating enum('G','PG','PG-13','R','NC-17','NR') not null,
runtime smallint not null,
release_date date not null,
synopsis text,
key(title, rating, runtime)

The multi-column key (title, rating, runtime) would be helpful for the following query types:

Finding rows with the title “Aliens” with an “R” rating and a runtime of 137 (all columns)
Finding rows with the title “Aliens” with an “R” rating (first two columns)
Finding rows between “Aliens” and “Fifth Element” (Matching a range)

It would not optimize the following:

Finding rows where title ends in a specific character or characters
Finding rows by rating, runtime or both (doesn’t use the leftmost column)

It would provide a limited optimization for:

Finding rows with the title “Batman” and a runtime of 120 (optimizes title, but not runtime, as rating is skipped)
Finding rows that start with “B” and have a “R” rating (conditions after a range (LIKE) can not be optimized)

As you can see, all the discussed limitations of a B-Tree index have to do with column order. To optimally index high performance tables, it may be necessary to create multiple indexes with the same columns, only in a different order.

Covering Indexes

B-Tree indexes have the opportunity to be used as covering indexes since they actually contain the data from the columns they’re indexing. Typically, an index provides a shortcut to access a row, but when a query only requires data already stored in the index, MySQL can return the data directly from the index rather than continuing on to the row. This can provide a dramatic performance improvement.