DATA ENGINEERING, TIMESERIES

QuestDB vs. TimescaleDB

How to use the Time Series Benchmark Suite to compare the database read and write performance of QuestDB versus TimescaleDB

Kovid Rathee

Published in

Towards Data Science

6 min readAug 18, 2021

Introduction

In a world more connected than ever, billions of users are generating more data than ever. From human communication to the digital footprint we create, IoT sensors have become ubiquitous, and financial transactions are digitized. We have an explosion of the volume of time-centric data, and we are struggling to keep up with it all. Timeseries databases are on the rise. OSS projects like QuestDB, InfluxDB, TimescaleDB, and cloud-based solutions like Amazon Timestream, etc., are in higher demand than ever. Timeseries databases have officially come of age.

All these products compete for more space in the time-series domain, and in doing that, they’re making each other better. This article will look at two major timeseries databases and compare them using an open-source benchmarking tool called TSBS — Time Series Benchmarking Suite. This benchmarking suite is based on the testing scripts originally developed at InfluxDB, later enhanced by other major timeseries databases, and currently maintained by TimescaleDB.

What is the Time Series Benchmark Suite (TSBS)?

For traditional databases like MySQL and PostgreSQL, many popular options like HammerDB and sysbench are standard tools to measure database read and write performance. Similar tools exist for different types of databases. Performance testing makes sense when the benchmarking tool simulates real-life scenarios by creating realistic bursts and reading streams. The access pattern for timeseries databases is very different from a traditional database — that is why we need a tool like TSBS.

TSBS currently supports two kinds of loads:

IoT — emulates the IoT data generated from the sensors of a trucking company. Imagine tracking a trucking fleet with real-time diagnostic data from every truck in your fleet.
DevOps — emulates data usage generated by a server that tracks memory usage, CPU usage, disk usage, and so on. Imagine looking at the Grafana dashboard with these metrics and getting breach alerts.

Prerequisites

For this tutorial, we will be using the DevOps option. First, you need to follow these steps:

Install Docker on your machine to install and run QuestDB and TimescaleDB from Docker Hub.
Clone the TSBS repository — Once all three are up and running, clone the TSBS repository on your machine.

Note: This benchmark run was completed on a 16-core Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz with 128 GB RAM on AWS EC2.

Using the TSBS for Testing Time Series Database Performance

We will test the performance of these two databases in four phases:

Generate DevOps data for one day, where nine different metrics are collected every 10 seconds for 200 devices. The data will be generated separately for QuestDB and TimescaleDB based on their respective formats. Use the tsbs_generate_data utility for this.
Load the generated data. Use the tsbs_load_questdb and tsbs_load utilities to load data into QuestDB and TimescaleDB, respectively. This allows us to test the ingestion and write speeds of each system.
Generate queries to run on the loaded data for QuestDB and TimescaleDB. Use the tsbs_generate_queries utility for this.
Execute generated queries on QuestDB and TimescaleDB using tsbs_run_queries_questdb and tsbs_run_queries_timescaledb respectively.

Let’s go through the scripts for each of these steps one by one.

Generating the Test Data

For the scope of this tutorial, we’ll limit the scale of the benchmark run to 200 devices. As mentioned above, the data will be generated for one day, tracking nine metrics for every one of the hundred devices every 10 seconds. Using the tsbs_generate_data command, you can generate test data for any of the supported databases and use cases.

Note: The data generated can occupy a lot of space. You can scale up or down based on your benchmarking requirements and availability of disk space.

Note: The files generated for the data will be of different sizes because of the different formats used by TimescaleDB and QuestDB. QuestDB uses Influx Line Protocol format which is much lighter than any other format out there.

Load Data

Loading the data is even simpler than generating it. For TimescaleDB, you can use the common utility tsbs_load. For QuestDB, you can use the tsbs_load_questdb utility as it supports some QuestDB-specific flags like --ilp-bind-to for Influx Line Protocol binding port and --url signifying QuestDB’s REST endpoint. You can use the following commands to load the data into TimescaleDB and QuestDB, respectively:

Note: Please follow the instructions for TimescaleDB’s config.yaml file.

To get a better idea of load performance, you can try changing the --workers parameter. Please ensure that the benchmarking parameters and conditions for both databases are the same so that you get a fair comparison.

TimescaleDB Load/Write Performance

./tsbs_load load timescaledb --config=./config.yaml
...
Summary:
loaded 174528000 metrics in 59.376sec with 8 workers (mean rate 2939386.10 metrics/sec)
loaded 15552000 rows in 59.376sec with 8 workers (mean rate 261925.49 rows/sec)

QuestDB Load/Write Performance

./tsbs_load_questdb --file /tmp/questdb-data --workers 8
...
Summary:
loaded 174528000 metrics in 18.223sec with 8 workers (mean rate 9577373.06 metrics/sec)
loaded 15552000 rows in 18.223sec with 8 workers (mean rate 853429.28 rows/sec)

The write performance of QuestDB with eight workers, in this case, is ~3.2x faster than TimescaleDB. For the complete output of this benchmark run, please visit this GitHub repository.

Generate Queries

The data set TSBS has generated for both QuestDB and TimescaleDB contains metrics for 200 hosts. To query all the readings where one metric is above a threshold across all hosts, we will use the query type high-cpu-all. To generate 1000 queries with different time ranges during that one day, you need to run the following commands:

In this tutorial, we’re running just one type of read query. You can choose from the different types of queries you can run to test the read performance.

Execute Queries

Now that we’ve generated the data, loaded it into QuestDB and TimescaleDB, and also generated the benchmarking queries that we want to run, we can finally perform the read performance benchmark using the following commands:

Note: Ensure that the queries have been generated properly before running the commands. To do that, you can run less /tmp/timescaledb-queries-high-cpu-all or less /tmp/queries_questdb-high-cpu-all.

TimescaleDB Read Performance

Run complete after 1000 queries with 8 workers (Overall query rate 25.78 queries/sec):
TimescaleDB CPU over threshold, all hosts:
min: 222.49ms, med: 274.94ms, mean: 308.60ms, max: 580.13ms, stddev: 73.70ms, sum: 308.6sec, count: 1000
all queries :
min: 222.49ms, med: 274.94ms, mean: 308.60ms, max: 580.13ms, stddev: 73.70ms, sum: 308.6sec, count: 1000
wall clock time: 38.827242sec

QuestDB Read Performance

Run complete after 1000 queries with 8 workers (Overall query rate 70.18 queries/sec):
QuestDB CPU over threshold, all hosts:
min: 92.71ms, med: 109.10ms, mean: 113.32ms, max: 382.57ms, stddev: 19.34ms, sum: 113.3sec, count: 1000
all queries :
min: 92.71ms, med: 109.10ms, mean: 113.32ms, max: 382.57ms, stddev: 19.34ms, sum: 113.3sec, count: 1000
wall clock time: 14.275811sec

QuestDB executed the queries ~2.7x faster than TimescaleDB.

QuestDB vs. TimescaleDB

To summarize the read and write benchmarks results, we can say that QuestDB is significantly faster in writes than TimescaleDB and extremely faster in reads. When we talk about the read performance, concluding with just one type of query is probably not fair, which is why you can try running the TSBS suite on your own for different types of queries for both these databases. Here’s the summary:

QuestDB performed ~320 % faster than TimescaleDB for write/load workloads.
QuestDB performed ~270 % faster than TimescaleDB for read/analytical workloads.

Conclusion

TSBS is the de facto standard for benchmarking timeseries databases. In this tutorial, you learned how to use TSBS to compare two timeseries databases by easily generating test data and emulating realistic read and write loads. As mentioned before, TSBS currently supports test data generation for DevOps and IoT (vehicle diagnostics). You can create your test data generation scripts to create more use cases for, say, real-time weather tracking, traffic signals, financial markets, and so on.

DATA ENGINEERING, TIMESERIES

QuestDB vs. TimescaleDB

How to use the Time Series Benchmark Suite to compare the database read and write performance of QuestDB versus TimescaleDB

Introduction

What is the Time Series Benchmark Suite (TSBS)?

Prerequisites

Using the TSBS for Testing Time Series Database Performance

Generating the Test Data

Load Data

TimescaleDB Load/Write Performance

QuestDB Load/Write Performance

Generate Queries

Execute Queries

TimescaleDB Read Performance

QuestDB Read Performance

QuestDB vs. TimescaleDB

Conclusion

Written by Kovid Rathee