davidmah.com

Using Separate Tables as an Index

Indexes have a major performance cost

Take a look at these two options for how to implement table schemas for user accounts searchable by company.

    CREATE TABLE `user_accounts` (
      `email` varchar(255) NOT NULL,
      `company` varchar(255) NOT NULL,
      ...
      PRIMARY KEY (`email`),
      INDEX (`company`)
    )

And

    CREATE TABLE `user_accounts` (
      `email` varchar(255) NOT NULL,
      `company` varchar(255) NOT NULL,
      ...
      PRIMARY KEY (`email`),
    )
    CREATE TABLE `user_accounts_by_company` (
      `company` varchar(255) NOT NULL,
      `email` varchar(255) NOT NULL,
      PRIMARY KEY (`company`)
    )

The main difference to call out is that in the second schema set, instead of an index, the second table is used as for lookups of user accounts.

A line of common wisdom is that the first schema would be better. With the second schema --

The two tables can get out of sync with each other. We need a background job to keep the lookup table up to date.
To issue a lookup, we need to issue multiple queries now. One to find the emails for a given company, and another to retrieve information for those emails.

Given all of this, the second schema looks like craziness.

Sort of yes, but sort of no. The issue with the first schema is that MySQL has to use locks to keep a table in sync with its indexes. This can become quite costly for specific tables that have lots of indexes and are interwoven with other tables in transactions.

I'm not saying the second option is always better. I'm saying that there is a trade-off between feature friendliness and performance. If your feature can tolerate short periods of having the index be out-of-sync, and database utilization is going haywire from lock contention, then the trade-off swings away from builtin indexes.

Indexes do not come for free. They are an optional capability that the database offers which comes with real performance costs. It's up to you to consider how to use tools like these based on the tradeoffs appropriate for your company.

That's all for this essay. If you have a question or an interesting thought to bounce around, email me back at david@davidmah.com. I'd enjoy the chat.

This essay was published 2020-05-31.