Improving DISTINCT Query Performance Up to 8,000x on PostgreSQL

PostgreSQL is an amazing database, but it can struggle with certain types of queries, especially as tables approach tens and hundreds of millions of rows (or more). DISTINCT queries are an example of this.

Why are DISTINCT queries slow on PostgreSQL when they seem to ask an "easy" question? It turns out that PostgreSQL currently lacks the ability to efficiently pull a list of unique values from an ordered index.

Even when you have an index that matches the exact order and columns for these "last-point" queries, PostgreSQL is still forced to scan the entire index to find all unique values. As a table grows (and they grow quickly with time-series data), this operation keeps getting slower.

Other databases like MySQL, Oracle, and DB2 implement a feature called "Loose indexscan," "Index Skip Scan," or “Skip Scan” to speed up the performance of queries like this.

When a database has a feature like "Skip Scan," it can incrementally jump from one ordered value to the next without reading all of the rows in between. Without support for this feature, the database engine has to scan the entire ordered index and then deduplicate it at the end—which is a much slower process.

Since 2018, there have been plans to support something similar in PostgreSQL. (Note: We couldn’t use this implementation directly due to some limitations of what is possible within the Postgres extension framework.)

Unfortunately, this patch wasn't included in the CommitFest for PostgreSQL 14, so it won't be included until PostgreSQL 15 at the earliest (i.e., no sooner than Fall 2022, at least 1.5 years from now).

We don’t want our users to have to wait that long.

What is Timescale's Skip Scan?

Today, via TimescaleDB 2.2.1, we are releasing TimescaleDB Skip Scan, a custom query planner node that makes ordered DISTINCT queries blazing fast in PostgreSQL 🔥.

As you'll see in the benchmarks below, some queries performed more than 8,000x better than before—and many of the SQL queries your applications and analytics tools use could also see dramatic improvements with this new feature.

This feature works in both Timescale hypertables and distributed hypertables, and normal PostgreSQL tables.

This means that with Timescale, not only will your time-series DISTINCT queries be faster, but any other related queries you may have on normal PostgreSQL tables will also be faster.

This is because Timescale is not just a time-series database. It’s a relational database, specifically, a relational database for time series. Developers who use Timescale benefit from a purpose-built time-series database plus a classic relational (Postgres) database, all in one, with full SQL support.

And to be clear, we love PostgreSQL. We employ engineers who contribute to PostgreSQL. We contribute to the ecosystem around PostgreSQL. PostgreSQL is the world’s fastest-growing database, and we are excited to support it alongside thousands of other users and contributors.

We constantly seek to advance the state of the art with databases, and features like Skip Scan are only our latest contribution to the industry. Skip Scan makes Timescale better and PostgreSQL a better, more competitive database overall, especially compared to MySQL, Oracle, DB2, and others.

How to check (and optimize) your query performance in PostgreSQL

If you're new to PostgreSQL and are wondering how to check your query performance in the first place (and optimize it!), we're going to leave two helpful resources here:

This beginner's guide to EXPLAIN ANALYZE by Michael Christofides in one of our Timescale Community Days.
And our blog post on using pg_stat_statements to optimize queries.

Optimizing DISTINCT query performance: What about RECURSIVE CTEs?

However, if you're an experienced PostgreSQL user, you might point out that it is already possible to get reasonably fast DISTINCTqueries via RECURSIVE CTEs.

From the PostgreSQL Wiki, using a RECURSIVE CTE can get you good results, but writing these kinds of queries can often feel cumbersome and unintuitive, especially for developers new to PostgreSQL:

WITH RECURSIVE cte AS (
   (SELECT tags_id FROM cpu ORDER BY tags_id, time DESC LIMIT 1)
   UNION ALL
   SELECT (
      SELECT tags_id FROM cpu
      WHERE tags_id > t.tags_id 
      ORDER BY tags_id, time DESC LIMIT 1
   )
   FROM cte t
   WHERE t.tags_id IS NOT NULL
)
SELECT * FROM cte LIMIT 50;

But even if writing a RECURSIVE CTE like this in day-to-day querying felt natural to you, there's a bigger problem. Most application developers, ORMs, and charting tools like Grafana or Tableau will still use the simpler, straight-forward form:

SELECT DISTINCT ON (tags_id) * FROM cpu
WHERE tags_id >=1 
ORDER BY tags_id, time DESC
LIMIT 50;

In PostgreSQL, without a "skip scan" node, this query will perform the much slower Index Only Scan, causing your applications and graphing tools to feel clunky and slow.

Surely there's a better way, right?

Skip Scan Is the Way

Skip Scan is an optimization for queries in the form of SELECT DISTINCT ON (column). Conceptually, a Skip Scan is a regular IndexScan but that “skips” across an index looking for the next value that is greater than the current value:

Illustration of how a Skip Scan search works on a Btree index — *Skip Scan: An index scan that “skips” across an index looking for the next greater value*

With Skip Scan in Timescale/PostgreSQL, query planning and execution can now utilize a new node (displayed as (SkipScan) in the EXPLAIN output) to quickly return distinct items from a properly ordered index.

Rather than scanning the entire index with an Index Only Scan, Skip Scan incrementally searches for each successive item in the ordered index. As it locates one item, the (SkipScan) node quickly restarts the search for the next item. This is a much more efficient way of finding distinct items in an ordered index. (See GitHub for more details.)

Benchmarking TimescaleDB Skip Scan vs. a Normal PostgreSQL Index Scan

In every example query, Timescale with Skip Scan improved query response times by at least 26x.

But, the real surprise is how much of a difference it makes at lower cardinalities with lots of data—almost 8,500x faster to retrieve all columns for the most recent reading of each device. That's fast!

Mandolorian Razor Crest being chased by X-wing fighters

In our tests, Skip Scan is also consistently faster—by 80x or more—in our 4,000 device benchmarks. (This level of cardinality is typical for many users of Timescale.)

Before we share the full results, here is how our benchmark was set up.

Benchmark setup

To perform our benchmarks, we installed Timescale on a DigitalOcean Droplet using the following specifications. PostgreSQL and Timescale were installed from packages, and we applied the recommended tuning from timescaledb-tune.

8 Intel vCPUs
16 GB of RAM
320 GB NVMe SSD
Ubuntu 20.04 LTS
Postgres 12.6
TimescaleDB 2.2 (The first release with Skip Scan. TimescaleDB 2.2.1 primarily adds distributed hypertable support and some bug fixes.)

To demonstrate the performance impact of Skip Scan on varying degrees of cardinality, we benchmarked three separate datasets of varying sizes. To generate our datasets, we used the 'cpu-only' use case in the Time Series Benchmark Suite (TSBS), which creates 10 metrics every 10 seconds for each device (identified by the tag_id in our benchmark queries).

Dataset 1	Dataset 2	Dataset 3
100 devices	4000 devices	10,000 devices
4 months of data	4 days of data	36 hours of data
~103,000,000 rows	~103,000,000 rows	~144,000,000 rows

Additional data preparation

Not all device data is up-to-date in real life because devices go offline and internet connections get interrupted. Therefore, to simulate a more realistic scenario (i.e., that some devices had stopped reporting for a period of time), we deleted rows for random devices over each of the following periods.

Dataset 1	Dataset 2	Dataset 3
5 random devices over:	100 random devices over:	250 random devices over:
30 minutes	1 hour	10 minutes
36 hours	12 hours	1 hour
7 days	36 hours	12 hours
1 month	3 days	24 hours

To delete the data, we utilized the tablesample function of Postgres. This SELECT feature allows you to return a random sample of rows from a table based on a percentage of the total rows. In the example below, we randomly sample 10% of the rows ( bernoulli(10) ) and then take the first 10 ( limit 10 ).

DELETE FROM cpu
WHERE tags_id IN 
  (SELECT id FROM tags tablesample bernoulli(10) LIMIT 10)
  AND time >= now() - INTERVAL '30 minutes';

From there, we ran each benchmarking query multiple times to accommodate for caching, with and without Skip Scan enabled.

As mentioned earlier, the following two indexes were present on the hypertable for all queries.

"cpu_tags_id_time_idx" btree (tags_id, "time" DESC)
"cpu_time_idx" btree ("time" DESC)

Benchmark results

Here are the results:

*TimescaleDB with Skip Scan improved the query response by at least 26x, up to 8500x in some cases.*

About the Queries Benchmarked

For this test, we benchmarked five types of common queries:

Scenario #1: What was the last reported time of each device in a paged list?

SELECT DISTINCT ON (tags_id) tags_id, time FROM cpu
ORDER BY tags_id, time DESC
LIMIT 10 OFFSET 50;

Scenario #2: What was the time and most recently reported set of values for each device in a paged list?

SELECT DISTINCT ON (tags_id) * FROM cpu
ORDER BY tags_id, time DESC
LIMIT 10 OFFSET 50;

Scenario #3: What is the most recent point for all reporting devices in the last 5 minutes?

SELECT DISTINCT ON (tags_id) * FROM cpu 
WHERE time >= now() - INTERVAL '5 minutes' 
ORDER BY tags_id, time DESC;

Scenario #4: Which devices reported at some time today but not within the last hour?

WITH older AS (
  SELECT DISTINCT ON (tags_id) tags_id FROM cpu 
  WHERE time > now() - INTERVAL '24 hours'
)                                          
SELECT * FROM older o 
WHERE NOT EXISTS (
  SELECT 1 FROM cpu 
  WHERE cpu.tags_id = o.tags_id 
  AND time > now() - INTERVAL '1 hour'
);

Scenario #5: Which devices reported yesterday but not in the last 24 hours?

WITH older AS (
  SELECT DISTINCT ON (tags_id) tags_id FROM cpu 
  WHERE time > now() - INTERVAL '48 hours'
  AND time < now() - INTERVAL '24 hours'
)                                          
SELECT * FROM older o 
WHERE NOT EXISTS (
  SELECT 1 FROM cpu 
  WHERE cpu.tags_id = o.tags_id 
  AND time > now() - INTERVAL '24 hour'
);

How Will Your Application Improve?

But, Skip Scan isn’t a theoretical improvement reserved for benchmarking blog posts 😉—it has real-world implications, and many applications we use rely on getting this data as fast as possible.

Think about the applications you use (or develop) every day. Do they retrieve paged lists of unique items from database tables to fill dropdown options (or grids of data)?

At a few thousand items, the query latency might not be very noticeable. But, as your data grows and you have millions of rows of data and tens of thousands of distinct items, that dropdown menu might take seconds—or minutes—to populate.

Skip Scan can reduce that to tens of milliseconds!

Even better, Skip Scan also provides a fast, efficient way of answering the question that so many people with time-series data ask every day:

"What was the last time and value recorded for each of my [devices / users / services / crypto and stock investments / etc]?"

As long as there is an index on "device_id" and "time" descending, Skip Scan will retrieve the data using a query like this much more efficiently.

SELECT DISTINCT ON (device_id) * FROM cpu 
ORDER BY device_id, time DESC;

With Skip Scan, your application and dashboards that rely on these types of queries will now load a whole lot faster 🚀 (see below).

*TimescaleDB 2.2 **with** Skip Scan enabled runs in less than 400 ms*

*TimescaleDB 2.2 without Skip Scan enabled runs in 23 seconds*

How to Use Skip Scan on Timescale

How do you get started? Upgrade to TimescaleDB 2.2.1 and set up your schema and indexing as described below. You should start to see immediate speed improvements in many of your DISTINCT queries.

To ensure that a (SkipScan) node can be chosen for your query plan:

First, the query must use the DISTINCT keyword on a single column. The benchmarking queries above will give you some examples to draw from.

Second, there must be an index that contains the DISTINCT column first, and any other ORDER BY columns. Specifically:

The index needs to be a BTREE index.
The index needs to match the ORDER BY in your query.
The DISTINCT column must either be the first column of the index, or any leading column(s) must be used as constraints in your query.

In practice, this means that if we use the questions from the beginning of this blog post ("retrieve a list of unique IDs in order" and "retrieve the last reading of each ID"), we would need at least one index like this (but if you're using a TimescaleDB hypertable, this likely already exists):

 "cpu_tags_id_time_idx" btree (tags_id, "time" DESC)

With that index in place, you should start to see immediate benefit if your queries look similar to the benchmarking examples below. When Skip Scan is chosen for your query, the EXPLAIN ANALYZE output will show one or more Custom Scan (SkipScan) nodes similar to this:

->  Unique
  ->  Merge Append
    Sort Key: _hyper_8_79_chunk.tags_id, _hyper_8_79_chunk."time" DESC
     ->  Custom Scan (SkipScan) on _hyper_8_79_chunk
      ->  Index Only Scan using _hyper_8_79_chunk_cpu_tags_id_time_idx on _hyper_8_79_chunk
          Index Cond: (tags_id > NULL::integer)
     ->  Custom Scan (SkipScan) on _hyper_8_80_chunk
      ->  Index Only Scan using _hyper_8_80_chunk_cpu_tags_id_time_idx on _hyper_8_80_chunk
         Index Cond: (tags_id > NULL::integer)
...

All about Databases - Oracle, PostgeSQL, Mysql, Big Data .....

Thursday, October 19, 2023