Brian Lovin
/
Hacker News
Daily Digest email

Get the top HN stories in your inbox every day.

riedel

>You may already be asking: “why not just power the stack using TimescaleDB?” The Timescale License would restrict our use of features such as compression, incremental materialized views, and bottomless storage. With these missing, we felt that what remained would not provide an adequate basis for our customers’ time-series needs. Therefore, we decided to build our own PostgreSQL-licensed extension.

I've been using the free version of TimescaleDB to shard a 500-million-observation time-series database. It worked drop-in without much hassle. I would have expected some benchmarks and comparisons in the post, but I will for sure watch this...

osigurdson

500 million is very little, however. A regular table with a covering index would probably be fine for many use cases with this number of points.

vegabook

Indeed. The financial time series I was working with added over 100 million new points, _per day_. For anything serious, TimescaleDB is essentially not open source. Well done tembo.io crew -- will definitely give this a whirl.

eyegor

What do you mean by "for anything serious it isn't open source"? I didn't see any red flags in the apache variant of timescale, just constant pleading to try their hosted option.

https://github.com/timescale/timescaledb/blob/main/LICENSE-A...

miohtama

If you have 100 million points per day, it's likely you can afford to pay for any commercial license.

riedel

> 500 million is very little however. A regular table with a covering index would probably be fine for many use.

Totally agree. The problem here was an awkwardly designed geospatial measurement database (OGC SensorThings), and we could not do much about the queries, which came from one ORM logic plus a lot of PostGIS stuff. It was great to have something as a drop-in replacement speeding up all the time-series queries without much thinking. What was still nagging us was all the locking going on due to the transactional semantics; time-series databases are probably much better at handling queries under constant ingress. We are total amateurs with respect to database optimisation, but my guess is that many of those offerings rather target the average use case.

vjerancrnjak

I think you’re not talking about the same thing. There are two expressions related to time-series data: “high churn” and “active time series”.

500 million active time series is extremely huge.

It does not have anything to do with number of data points.

Good time series databases can scale to 1M-10M writes per second without a hiccup.

osigurdson

I suppose it depends on what is meant by an "observation". Is that an entire time series for a single property, or a single point? Nevertheless, the number of points absolutely matters.

A regular Postgres database can give you 50-100K inserts per second and can scale to at least 1B rows with 100K+ individual series without much difficulty. If you know you will need less (or much less) than this, my suggestion is to use a regular table with a covering index. If you need more, use ClickHouse.
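A plain-Postgres layout along those lines might look like the following sketch; the table and column names are illustrative, and INCLUDE requires Postgres 11+:

    CREATE TABLE readings (
        series_id integer          NOT NULL,
        ts        timestamptz      NOT NULL,
        value     double precision NOT NULL
    );

    -- Covering index: queries filtering on (series_id, ts) that only
    -- read "value" can be answered with an index-only scan.
    CREATE INDEX readings_series_ts_idx
        ON readings (series_id, ts) INCLUDE (value);

    -- Typical per-series range query served by that index:
    SELECT ts, value
      FROM readings
     WHERE series_id = 42
       AND ts >= now() - interval '1 day'
     ORDER BY ts;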

Too

That number in itself doesn’t say anything.

What really causes a database to sweat is high cardinality.

When talking about time series, it also matters which fields are indexed and whether you are inserting out of order.

rapsey

Databases are a tough business. You're just waiting for open source to eat your lunch.

dengolius

AFAIK https://github.com/timescale/tsbs is based on artificial data and I would recommend running benchmarks and comparisons on real data from node_exporter, like https://github.com/VictoriaMetrics/prometheus-benchmark.

remram

500 million observations, with 4-byte floats, is 2 GB. This is the kind of size that you can store uncompressed, in RAM, on a phone. It is hardly at the point where you require specialized time-series software at all.

mnahkies

Looking at their roadmap, the killer feature for me would be incremental materialised views

> Incremental view maintenance — define views which stay up-to-date with incoming data without the performance hit of a REFRESH

I wonder if they plan to incorporate something like https://github.com/sraoss/pg_ivm or write their own implementation.

(Although I'm hopeful that one day we see ivm land in postgres core)
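For reference, pg_ivm's current interface is a function call rather than new DDL syntax; a minimal sketch, with made-up table and view names:

    CREATE EXTENSION pg_ivm;

    -- Creates an "incrementally maintained materialized view": pg_ivm
    -- installs triggers on the base table so the view is updated on
    -- each write, with no manual REFRESH needed.
    SELECT create_immv(
        'sensor_totals',
        'SELECT sensor_id, count(*) AS n, sum(value) AS total
           FROM readings GROUP BY sensor_id'
    );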

techoffs

Former Timescaler here.

It's about time that Timescale started getting what it deserves.

Sometime in early 2022, just as they raised their Series C, leadership decided that they had gotten what they wanted from the open-source community and TimescaleDB. They decided it was time to focus 100% on Timescale Cloud. Features began to become exclusive to Timescale Cloud, and the self-hosted TimescaleDB was literally treated as competition. At the same time, they managed to spoil their long-time PaaS partnership with Aiven, which was (and still is) a major source of revenue for the company. The reason? Everyone needed to use Timescale Cloud and give their money to Timescale, thus making Aiven a competitor. In short, with the Series C raise, Timescale stopped being an OSS startup and began transitioning into a money-hungry corporation.

In 2023, they conducted two rounds of layoffs, even though the company was highly profitable. Recently, Planetscale also carried out layoffs in a similarly harsh manner as Timescale, but at least Planetscale had the "courtesy" to address this with two sentences in their PR statement about company restructuring. Timescale did not even do that; they kept it all quiet. Out of 160 employees, around 65 were laid off. The first round of layoffs occurred in January, and the second in September. No warnings. No PIPs. Just an email informing you that you no longer work for them. Many of the affected employees were in the middle of ongoing projects. The CEO even mentioned in the in-house memo how they diligently worked on the September layoff throughout the summer. Interestingly, many of these employees were hired by competitors like Supabase and Neon. It’s worth emphasizing that this was not a financial issue—Timescale is far from having such problems. Instead, it was a restructuring effort to present nice financial numbers during ongoing turbulence in the tech market. (And yes, you guessed it! Timescale also hired their first CFO a couple of months before the first layoffs.)

You might say that it's just business, but as an OSS startup, I expect them to live by the values they have advertised over the years and treat their users and employees much better than they currently do. With this in mind, I welcome Tembo as a new player in the time-series market.

Footnotes: Timescale = the company. TimescaleDB = OSS time-series database developed by Timescale. Timescale Cloud = TimescaleDB managed by Timescale on AWS.

redwood

Knowing little about the company, it's almost certainly untrue to state that "the company was highly profitable"; you don't hire 160 people in an OSS-centric business and also turn a profit. You likely have a distorted understanding of the challenging nature of the burn rate in a changing macro environment.

akulkarni

Ajay, Timescale CEO and co-founder, here.

It saddens me to see that we have generated so much ill will from you. It sounds like you were affected by our layoffs last year. You have every right to be upset. If you ever want to chat about this 1:1, you know how to reach me. I’d be happy to make the time.

To anyone else reading this: Some of what this person has shared is true, but some of it is not true.

I debated whether or not to reply. But one of my personal leadership values is “transparency”, so I thought I’d take the time to respond.

Yes, we conducted two rounds of layoffs in 2023. Like many tech companies, we hired a lot in 2021 and early 2022. Then, as the tech market began to correct mid 2022, we were forced to make tough decisions, including layoffs.

I take responsibility for the over-hiring and the layoffs. It brought me no joy to do them. But I feel a moral obligation to our customers to stay on the path of financial sustainability. I also feel a fiduciary obligation to our investors, some of whom are individuals, some of whom are large funds, who have all trusted us with their money. I feel a similar responsibility to current and former Timescalers who own equity in Timescale.

Sometimes, that means making tough decisions like this. But again, it was my call (not anyone else), and I accept full responsibility.

Yes, we did not publicize this news. Frankly, we thought we were too small for others to care. Maybe we got that wrong. But that decision came from a place of humility.

This is not true: “Just an email informing you that you no longer work for them.” Every affected person – except for a handful who were not working that day – was told the news individually, on a live Zoom call, that included at least one of our executives or a member of our People team. For the few teammates who were not working that day, we made many attempts to connect with them personally. I know the team tried their best to approach these hard conversations with care and empathy.

I was glad to see that a number of the affected individuals quickly found new roles at other companies in the PostgreSQL ecosystem, including at Supabase, Neon, and Tembo. These are good, smart people. The PostgreSQL ecosystem is better off with these people continuing to work to improve PostgreSQL.

The comments questioning our belief in open source are also not true. We still believe in open source. The core of TimescaleDB is still open source. Some of the advanced features are under a free, source-available license. Our latest release – TimescaleDB 2.15 – was just two weeks ago. Unlike most (all?) of our competitors, we have never re-licensed our open source software. This is something that is true for us but not for many others, like MongoDB, Elastic, Redis, Hashicorp, Confluent, etc.

Yes, we are building a self-sustaining open source business. Yes, it is hard and sometimes we get things wrong. But we have never stopped investing in our community. Today the TimescaleDB community (open source and free) is 20x larger than our customer base. And this community has more than doubled in the past 1+ year. We are also planning significant open source contributions for the next few months.

To the author of this post: I hope this response provides some clarification. And again, I’m available to chat one-on-one if you’d like.

To our open source and free community users, and to our customers: thank you for trusting us with your workloads. We are committed to serving you.

Finally, to the Timescale team, both current and former: thank you for all your hard work making developers successful. We are here to serve developers so that they can build the future. The road won’t always be easy or smooth. But we are committed, and we will get there.

smga3000

Taking responsibility for laying off a bunch of people is thin gruel. What does that mean? Nothing. They showed you loyalty, you didn't return it. The over hiring is a symptom of bad management, so taking responsibility would be to demote yourself and take a pay cut. All the executive staff should have taken one and kept more people on. It is irresponsible and cruel to overhire and then dump people. I could say more, but I am sure this is falling on deaf ears.

redwood

Over-hiring is a calculus that shifts depending on macro market conditions. To pretend otherwise just isn't an honest assessment of what it means to be a business leader. In the zero-interest-rate era, it was irresponsible for business executives not to invest to ensure they had a competitive foundation, or else be left behind by those doing so... The rise of interest rates changed the market's appetite to invest in non-profitable growth companies, which in turn had a series of follow-on effects that required software executives to reverse course so as to optimize for their long-term viability in the new climate.

MuffinFlavored

Dumb question: why can't I just insert a bunch of rows with a timestamp column and indices? Where does that fall short? At a certain # of rows or something?

What does this let me do that can't be achieved with "regular PostgreSQL without the extension"?

skibbityboop

I'm with you; I need to read up more on where time series could benefit. At work we have a PostgreSQL instance with around 27 billion rows in a single table, partitioned by week. It goes back to January 2017 and just contains tons of data coming in from sensors. It's not "fast", but also not ridiculously slow to say, e.g., "give me everything for sensor 29380 in March of 2019".

I guess depends on your needs but I do think I need to investigate timeseries more to see if it'd help us.

Too

Now give me a plot with the average of all sensors of model=A in region=B, grouped by customer, for the past 3 months, downsampled to 500 points. Assuming 1 sensor reading per minute.

I have no doubt SQL can do it without too much trouble, but for a time-series database this is really an instant operation, even on a small server.

A time-series database will first find the relevant series and then simply for-loop through all the data. It takes just a handful of milliseconds.

SQL will need to join with other tables, traverse indexes, and load wider columns. And you'd better have set the correct index first; in your case you also spent extra effort on partitioning tables. Likely you are also using a beefy server.

defrost

Adding to your comment, from my perspective (exploration geophysics)

> and just contains tons of data coming in from sensors.

it's also desirable to, on the fly, deal with missing sensor data and clearly bad sensor data, identify and smooth spikes (weird glitch, or actual transient spike of interest?), and apply a variety of running filters; a centred average is basic, while parameterised Savitzky–Golay filters provide beefed-up, better-than-running-average handling .. and there are more.

It's not just better access to sequential data that makes a dedicated time series engine desirable, it's the suite of time (and geo spatial) series operations that close the deal.

klaussilveira

Same boat here, but with DSP/SSP bidding statistics. Generating around 1 billion rows a day and still going strong. Single table, partitioned by week. BRIN index on timestamp, normal index on one column.
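For anyone unfamiliar, a BRIN index as described stays tiny because it stores only min/max summaries per block range, which works well when rows arrive in roughly timestamp order; a sketch with an illustrative table name:

    -- BRIN keeps per-block-range min/max of the timestamp, so the
    -- index stays small even at billions of rows, as long as inserts
    -- arrive in roughly chronological order.
    CREATE INDEX bids_ts_brin
        ON bids USING brin (created_at)
        WITH (pages_per_range = 32);  -- smaller ranges: more precise lookups, bigger index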

Postgres is just a beast.

ishikawa

There are several good articles explaining this, especially on the Timescale blog, but in short: without time partitioning, with just an index, at some point read and write performance degrades exponentially.

gonzo41

Time based partitioning.

MuffinFlavored

    CREATE TABLE logs (
        id SERIAL PRIMARY KEY,
        log_time TIMESTAMP NOT NULL,
        message TEXT
    ) PARTITION BY RANGE (log_time);
Why won't this work on stock PostgreSQL?

kbolino

I think what's meant here is windowing (partitioning the query) not partitioning the table per se. Though even with this strategy, you must manually create new partitions all the time.

This also isn't typical time-series data, which generally stores numbers. Supposing you had a column "value INTEGER" as well, how do you do something like the following (pseudo-SQL)?

    SELECT AVG(value) AS avg_value FROM logs GROUP BY INTERVAL '5m'
Which should output rows like the following, even if the data were reported much more frequently than every 5 minutes:

    log_time             | avg_value
    2024-05-20T00:00:00Z | 10.3
    2024-05-20T00:05:00Z | 7.8
    2024-05-20T00:10:00Z | 16.1
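For what it's worth, stock Postgres 14+ can express the bucketing itself with date_bin (though gap filling, retention, and compression are where the extensions earn their keep); assuming the hypothetical logs table gains a value column:

    SELECT date_bin('5 minutes', log_time, TIMESTAMP '2024-01-01') AS bucket,
           avg(value) AS avg_value
      FROM logs
     GROUP BY bucket
     ORDER BY bucket;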

gonzo41

Read the docs; it's not saying that won't work. This extension, along with Timescale, just makes some things more ergonomic.

jmaker

That won’t work already because your timestamp isn’t part of your primary key.

citizen_friend

It will work just fine.

nitinreddy88

Most time-series queries (almost all of them) are aggregate queries. Why not leverage or build a top-notch columnar store for them?

Everything seems to be there, so why is there no first-class product like ClickHouse on PG?

netik

The gold standard for this is Druid at very large scale, or ClickHouse. ClickHouse has a lot of problems as far as modifying/scaling shards after the fact, while Druid handles this with ease (with the penalty of not being able to update after the fact).

paulryanrogers

Citus, Persona, TimescaleDB?

bloopernova

That was very "Klaatu, Barada, Nikto".

applied_heat

VictoriaMetrics as well; they say it's based on structures similar to those used in ClickHouse.

nitinreddy88

Looking at the comparison with ClickBench, they are almost pathetic in terms of performance. They can't even handle sub-second aggregation queries for 10M records. Compare that to even DuckDB reading from Parquet files.

nikita

Postgres is missing a proper columnstore implementation. It's a big gap and it's not easy to build.

One solution could be integrating duckdb in a similar way as pgvector. You need to map duckdb storage to Postgres storage and reuse duckdb query processor. I believe it's the fastest way to get Postgres to have competitive columnstores.

mathfailure

Thank you for posting it: I followed the links and found out about trunk and https://pgt.dev/

PeterZaitsev

Great to see this kind of innovation. PostgreSQL is interesting: while the "core" was always open source under a very permissive license, there have been many proprietary and source-available extensions, ranging from replication to time-series support.

Now we see those proprietary extensions being disrupted by proper open source!

vantiro

PostgreSQL licensed, good move!

plainOldText

Your site is very well designed and easy to read btw, and the app UI looks great from the demo photos. I might try it!

samaysharma

Thank you!

nhourcard

Interesting release, it feels that the time-series database landscape is evolving toward:

a) columnar stores built from scratch, with convergence toward open formats such as Parquet and Arrow: InfluxDB 3.0, QuestDB

b) adding time-series capabilities on top of Postgres: Timescale, pg_timeseries

c) platforms focused on observability around the Prometheus ecosystem: Grafana, VictoriaMetrics, Chronosphere

wdb

Would this be a good extension when you want to store load balancer log entries (status, response body, headers, etc.)?

I think a columnar store would be more efficient than a normal row-based database? Load balancer log entries could be considered something similar to analytics events.

samaysharma

Yes. Columnar is integrated with pg_timeseries already.

vivzkestrel

Benchmarks with respect to QuestDB, TimescaleDB?

dengolius

QuestDB, TimescaleDB, and PostgreSQL are more universal databases and don't work as well for this as Prometheus or VictoriaMetrics. And yes, ClickHouse will beat them both, but only when running queries on a huge amount of data. See also https://docs.victoriametrics.com/faq/#how-does-victoriametri...
