(MySQL) Filesort

When debugging a slow-performing SQL query, I often pre-pend EXPLAIN to the query to introspect on the MySQL query optimizer's plan for it. I usually focus on the output's key, rows, and filtered columns to gauge how well the query takes advantage of table indexes, but sometimes I need to examine other columns as well. For example, what if the Extra column says Using filesort? What does "filesort" mean, and does it slow down my query?

To filesort is to sort

According to the official MySQL documentation, Using filesort shows up in the plan when:

MySQL must do an extra pass to find out how to retrieve the rows in sorted order. The sort is done by going through all rows according to the join type and storing the sort key and pointer to the row for all rows that match the WHERE clause. The keys then are sorted and the rows are retrieved in sorted order. See Section 8.2.1.16, “ORDER BY Optimization”.

In other words, when a query's ORDER BY clause doesn't cleanly map to an existing table index (or the order in which the rows are fetched), then the optimizer needs to do a separate pass to sort the rows. So really, a "filesort" is just a fancy way of saying "sort."

Where the sorting happens

When possible, the MySQL database engine strongly prefers to use a table index to return results in an order that satisfies the query's ORDER BY clause. This requires no extra pass for sorting, meaning no extra time or space is used.

Failing that, it must find space to perform the sorting. A common mistake is to assume that the relevant unsorted rows are fetched from their tables and dumped to disk or a temporary table, where they are then sorted. Nope! MySQL actually has an in-memory sort buffer specifically reserved for this. And it's not slow. (Data in the sort buffer is sorted via a quicksort.)

The buffer's maximum size is set by the customizable sort_buffer_size variable. In older MySQL versions (<8.0.12), the database engine allocates the full number of sort_buffer_size bytes upfront and sorts the data in that space, but newer versions allocate the space incrementally as needed up until the full sort_buffer_size is used.

The appropriate sort_buffer_size

When more data is retrieved than can fit inside the sort buffer, the data is sorted in chunks. The buffer is filled with as many rows as can fit, the rows are quicksorted, and then the chunk is moved to a temporary file so that the buffer can be reused to sort the next chunk. When all chunks have been sorted, a merge sort combines all the chunks together.

This means that filesort is fastest when all the data can fit completely within the sort buffer. However, setting a sort_buffer_size that's too large can slow down the entire database server. So what's the optimal value?

Luckily, MySQL maintains status variables to share how well it's operating. (These are accessible using SHOW GLOBAL STATUS or SHOW SESSION STATUS). One of these, the sort_merge_passes status variable tells us how many times the merge sort has been used, which lets us know how many times the database engine could have used a larger sort buffer to be faster. So if the value is large, increase the size of sort_buffer_size!

That said, modifying sort_buffer_size should not be the way to speed up individual low-performing queries. Tuning a system to perform optimally may be a good practice, but slow queries benefit more from query optimization or better indexing. Make sure to try one of those strategies first instead of worrying about Using filesort slowing down the query. It's not.

(MySQL) Filesort

Fresh Khale

Fresh Khale

To filesort is to sort

Where the sorting happens

The appropriate sort_buffer_size

(SQL) Alphabetizing strings

(Docker) Spin up a database

(SQL) Finding rows with the second-highest value

(Python) Topological sort

(git) Changing commit date