Brian Lovin
/
Hacker News
Daily Digest email

Get the top HN stories in your inbox every day.

osigurdson

I like this article. Lots of comments are stating that they are "using it wrong" and I'm sure they are. However, it does help to contrast the much more common "use Postgres for everything" sentiment. It is pretty hard to use Postgres wrong for relational things, in the sense that everyone knows about indexes and so on. But using something like LISTEN/NOTIFY comes with a separate learning curve anyway, evidenced in this case by someone having to read comments in the Postgres source code itself. And if it turns out that it cannot work for your situation, it may be very hard to back away from, as you may have tightly integrated it with your normal Postgres stuff.

I've landed on Postgres/ClickHouse/NATS since together they handle nearly any conceivable workload, covering relational, columnar, and messaging/streaming very well. The combo is also not painful to run, as each piece is lightweight and fast/easy to spin up in a simple docker compose. Postgres is of course the core, and you don't always need all three, but they complement each other very well imo. This has been my "go to" for a while.

jelder

"use Postgres for everything" is certainly wrong, eventually. It's still the second-best choice for every new project, and most products will never see the traffic levels that justify using something more specialized. Obviously, recall.ai hit the level of traffic where Postgres was no longer ideal. I bet they don't regret it for the other parts of their product.

closeparen

They aren't even questioning its use as a database, just as an event bus.

lytedev

What is the first-best choice for a new project? SQLite?

osigurdson

No, generally Postgres, just not for everything. If you understand the tradeoffs SQLite can be fine. Once you have more than one service (even just for HA) SQLite means doing kind of crazy things like using NFS in your infra. If you know you will only have one service and can bind it to an EBS like volume it is totally fine.

jelder

That’s my point, there is no best-first choice for everything. There will always be trade-offs. But Postgres makes the right trade-offs to be good enough in almost every scenario.

riedel

Actually, LISTEN/NOTIFY also fails to scale the other way, downward. Immich also moved to that pg-for-everything mentality (trying to remove redis dependencies). The problem: postgres needs a WAL flush for every notification. I ran immich on my HDD NAS, and the result was constant disk noise because the pg-backed socket.io backend issues constant keep-alive messages.

j45

There’s no reason this article and "start with Postgres for everything" can’t both be true.

In the beginning having fewer parts to connect and maintain lets the needs and bottlenecks of the actual application emerge.

If listen/notify is the bottleneck in such a scenario, at some volume where optimizing it isn’t in the cards… so be it. It would be some time down the road before splitting that function out into a specific subsystem like the one you described.

Appreciate learning about the Postgres/ClickHouse/NATS combo. If there's an article about the three together that you liked, I'd be happy to read and learn.

dathinab

Honestly, whatever kind of DB you're talking about, always be wary of "niche/side features" that don't fit its core design goals; they tend to have unexpected limitations.

listen/notify isn't necessarily a replacement for redis or other pub/sub systems, and redis pub/sub and similar isn't necessarily a replacement for, say, Kafka or similar queue/messaging systems.

but a lot of companies have (by modern standards) surprisingly small amounts of data, where even a 2-4x increase still isn't that big. In that case listen/notify and similar might just work fine :shrug:

The same is true the other way around: depending on your application you can go redis-only, as long as your data volume stays small enough and your transactional/sync needs are reasonably simple (with WATCH+EXEC, the NX/XX options, and maybe some server-side Lua scripts you can do quite a lot for data synchronization). The catch is that, stylistically, redis data-sync/transaction code often reads more like writing atomic data structures than like SQL transactions. And even with SQL transactions there's a trend of devs severely overestimating what they provide, so you're often better off not touching any of it when you can avoid it. Also, BTW, redis has something loosely similar to sqlite or NOTIFY in that (oversimplifying a lot) only one set of writes is applied at a time, which is then propagated to replicas; but outside of some micro Lua scripts you don't really run much logic beyond the NX/XX checks etc., so it doesn't block much, and it's more or less all in memory, not touching a WAL (again, oversimplified).

osigurdson

>> also same is true the other way around, depending on you application you can go redis only

Really, the primary reason not to try stuff like this (at least for me) is that I feel I won't paint myself into a corner with Postgres. I can always add a table here or a join there and things will work. If I need columnar I use ClickHouse, and NATS for messaging. I know these well but still gravitate toward Postgres because I feel it can grow in whatever direction is needed. However, it is true, I have thought about trying to just use NATS KV and make all services stateful, receiving notifications when things change. It does seem that it could massively simplify some things, but I expect there could be some sharp edges in the face of unknown requirements. If one could design for exactly the problem at hand it would be different, but it never seems to work out like that.

ownagefool

Largely agree. Functionality wise if you don't have many jobs, using the database as the queue is fine.

However, I've been in several situations where scaling the queue brings down the database, and therefore the app, and am thus of the opinion you probably shouldn't couple these systems too tightly.

There are pros and cons, of course.

mike_hearn

Using the database for queues is more than fine, it's often essential to correctness. In many use cases for queues you need to atomically update the database with respect to popping from the queue, and if they're separate systems you end up needing either XA or brittle and unreliable custom idempotency logic. I've seen this go wrong before and it's not nice, the common outcome is business-visible data corruption that can have financial impact.

This seems like another case where Postgres gets free marketing due to companies hitting its technical limits. I get why they choose to make lemonade in these cases with an eng blog post, but this is a way too common pattern on HN. Some startup builds on Postgres then spends half their eng budget at the most critical growth time firefighting around its limits instead of scaling their business. OpenAI had a similar blog post a couple of months ago where they revealed they were probably spending more than a quarter of a million a month on an Azure managed Postgres, and that it had stopped scaling, so they were having to slowly abandon it; I made the same comment there [1].

Postgres is a great DB for what you pay, but IMHO well capitalized blitzscaling startups shouldn't be using it. If you buy a database - and realistically most Postgres users do anyway as they're paying for a cloud managed db - then you might as well just buy a commercial DB with an integrated queue engine. I have a financial COI because I have a part time job there in the research division (on non-DB stuff), so keep that in mind, but they should just migrate to an Oracle Database. It has a queue engine called TxEQ which is implemented on top of database tables with some C code for efficient blocking polls. It scales horizontally by just adding database nodes whilst retaining ACID transactions, and you can get hosted versions of them in all the major clouds. I'm using it in a project at the moment and it's been working well. In particular the ability to dequeue a message into the same transaction that does other database writes is very useful, as is the exposed lock manager.

Beyond scaling horizontally the nice thing about TxEQ/AQ is that it's a full message queue broker with all the normal features you'd expect. Delayed messages, exception queues, queue browsing, multi-consumer etc. LISTEN/NOTIFY is barely a queue at all, really.

For startups like this, the amount of time, money and morale they are losing with all these constant stories of firefights just doesn't make sense to me. It doesn't have to be Oracle, there are other DBs that can do this too. But "We discovered X about Postgres" is a eng blog cliché by this point. You're paying $$$ to a cloud and GPU vendor anyway, just buy a database and get back to work!

[1] https://news.ycombinator.com/item?id=44074506

jumski

Using queues in an atomic, transactional way was a core principle for building https://pgflow.dev - having the whole workflow state updated transactionally alongside the work on the in-db queue really simplifies a lot of things: debugging is easier, the audit log is easy, and reporting, stats, etc. are one SQL query away.

ownagefool

It actually depends on the workload.

Sending webhooks, as an example, often has zero need to go back and update the database, but I've seen that exact example take down several different managed databases ( i.e., not just postgres ).

osigurdson

>> but they should just migrate to an Oracle Database

No big tech companies or unicorn type startups are using Oracle. Is your claim that they are all wrong?

>> Some startup builds on Postgres then spends half their eng budget at the most critical growth time firefighting around its limits instead of scaling their business

This is why I suggest starting with some kind of normal queue / stream mechanism and columnar DB if needed. It isn't even harder than using one DB, particularly if you are using niche features.

sgarland

> "We discovered X about Postgres" is a eng blog cliché by this point.

It really is, and it’s often surprising to me how basic some of the issues being discovered are. Like Figma, when they waited a shocking amount of time to add [0] PgBouncer and read replicas. This is such a well-trod path that it’s baffling to me why you wouldn’t do it once it’s clear you have a winning product. At the very least PgBouncer (or PgCat, or any other connection pooler / proxying service): it adds negligible cost per month (in comparison to DB read replicas) to run a couple of containers with a load balancer.

Re: Oracle, as much as I despise the company for its litigious practices, I’ll hand it to you that the features your DB has are astonishing. RAC is absolutely incredible (on paper - I’ve never used it).

[0]: https://www.figma.com/blog/how-figma-scaled-to-multiple-data...

dathinab

If you need transactions spanning a queue and a normal SQL DB or similar, I believe you are doing something very wrong.

Sure, you need transactions around processing things within the queue itself: mark a message as "taken out" but don't remove it yet, then remove it, or "place it back in" (or into a failed-messages inbox) on timeout or similar. That can be _very_ important for queue systems.

But the moment the "fail-safe if something dies while processing a message" logic becomes directly coupled with DB transactions, you have created something very brittle and cumbersome.

To be fair that might still be the best solution for some situations.

But the better solution is to treat a queue as a message-passing system and handle messages as messages, with the appropriate delivery semantics. And if you can't, because, say, your idempotency logic is super unreliable, then there is indeed a problem; but it's not the missing cross-system transactions, it's how you write that logic (missing _tooling_, strict code guidelines people actually comply with, interface regression checks, and tests, including prop/fuzz tests, regression tests, integration/e2e tests, etc., not just "dumb" unit tests).

> just migrate to an Oracle Database.

In my experience Oracle DB is very powerful but also very cumbersome in a lot of ways, and if you need things only Oracle can provide, you most likely already fucked up big time somewhere else in your design/architecture. Sure, if you are at that point, Oracle can easily be the cheaper solution. But preferably you never end up there.

As a side note, there are also a lot of decent extensions which can provide similar capabilities for PG, but they tend not to be available in managed PG offerings, and self-managing PG (or most other reasonably powerful DBs) can be a huge pain; then, yes, Oracle can be a solution.

Still, startups that had an overall good experience with Oracle are, in my experience, basically nonexistent. (Though there are some pretty big companies/projects I know of which have had an overall good experience with it.)

> constant stories of firefights

If you mean stories on HN, then that isn't a meaningful metric: you will only hear the "interesting" stories, which are mostly about firefighting or "using pg for everything is great", and rarely the majority of in-between stories and boring silent successes. Stories from your own career, and from asking dev friends about their experience, are more meaningful, but still a bubble (like this answer of mine is, without question).

Generally I think people really overestimate how representative HN is. I don't know about the US, but outside of it _huge_ parts of the IT industry are not represented by HN in any meaningful way. I would say in my country HN is _at most_ representative of 1/4 of the industry, though that 1/4 contains many of the most motivated software developers. It contains very few of the "that's my work but not my calling", bread-and-butter software devs, who are often 1/3 to over 1/2 of devs in most countries as far as I can tell.

KronisLV

> Postgres/ClickHouse/NATS

Maybe throw in a dedicated key-value store like Redis or Valkey.

Oh and maybe something S3 compatible like MinIO, Garage or SeaweedFS for storing bunches of binary data.

With all of that, honestly it should cover most of the common workloads out there! Of course, depends on how specialized vs generic you like your software to be.

whaleofatw2022

NATS does KV pretty well now (didn't have expiration till earlier this year)

indeyets

Nats is getting there, but not yet.

Redis is still much more powerful: lists, sorted sets and bazillion of other data structures

goodkiwi

I’ve been meaning to check out NATS - I’ve tended to default to Redis for pubsub. What are the main advantages? I use clickhouse and Postgres extensively

sbstp

I've been disappointed by Nats. Core Nats is good and works well, but if you need stronger delivery guarantees you need to use Jetstream which has a lot of quirks, for instance it does not integrate well with the permission system in Core Nats. Their client SDKs are very buggy and unreliable. I've used the Python, Rust and Go ones, only the Go one worked as expected. I would recommend using rabbitmq, Kafka or redpanda instead of Nats.

PaoloBarbolini

I've had the same experience and I fixed part of the problem by writing my own Rust client, Watermelon. It's still missing a lot of features but at least I'm not blocked by weird decisions taken by upstream.

FZambia

Client SDKs are often a major challenge in systems like these. In my experience, building SDKs on top of asynchronous protocols is particularly tricky. It's generally much easier to make the server-side part reliable. The complexity arises because SDKs must account for a wide range of usage patterns - and you are not controlling the usage.

Asynchronous protocols frequently result in callback-based or generator-style APIs on the client side, which are hard to implement safely and intuitively. For example, consider building a real-time SDK for something like NATS. Once a message arrives, you need to invoke a user-defined callback to handle it. At that point, you're faced with a design decision: either call the callback synchronously (which risks blocking the socket reading loop), or do it asynchronously (which raises issues like backpressure handling).

Also, SDKs are often developed by different people, each with their own design philosophy and coding style, leading to inconsistency and subtle bugs.

So this isn't only about NATS. Just last week, we ran into two critical bugs in two separate Kafka SDKs at work.
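
The sync-versus-async callback dilemma described above can be sketched with a bounded hand-off queue (illustrative Python; the `Dispatcher` class and its names are mine, not from any NATS or Kafka SDK):

```python
import queue
import threading

class Dispatcher:
    """Decouples a socket-reading loop from user callbacks via a bounded
    queue: slow callbacks never block the reader until the buffer fills,
    at which point backpressure slows the reader instead of growing
    memory without bound."""

    def __init__(self, callback, maxsize=1024):
        self._q = queue.Queue(maxsize=maxsize)  # bounded: backpressure
        self._callback = callback
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def on_message(self, msg):
        # Called from the reader loop; blocks only when the buffer is full.
        self._q.put(msg)

    def _run(self):
        while True:
            msg = self._q.get()
            if msg is None:          # sentinel: shut down
                return
            self._callback(msg)      # user code runs off the reader thread

    def close(self):
        self._q.put(None)
        self._worker.join()

received = []
d = Dispatcher(received.append, maxsize=8)
for i in range(5):
    d.on_message(i)
d.close()
print(received)  # [0, 1, 2, 3, 4]
```

The trade-offs the comment mentions live in the two tuning knobs: `maxsize` (how much backpressure, how soon) and whether `on_message` blocks, drops, or errors when the queue is full.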

chatmasta

Are those recommendations based on using them all in the same context? Curious why you chose Kafka (or Redpanda which is effectively the same) over NATS.

osigurdson

NATS gives you regular pub/sub but also streams (similar to Kafka, with strong durability guarantees, etc).

riku_iki

> However, it does help to contrast the much more common, "use Postgres for everything" type sentiment.

I think the sentiment is to use it "for everything in 99% of business cases", which involve a few hundred GB of data and some thousands of QPS, and can be handled by PG very well.

sorentwo

Postgres LISTEN/NOTIFY was a consistent pain point for Oban (background job processing framework for Elixir) for a while. The payload size limitations and connection pooler issues alone would cause subtle breakage.

It was particularly ironic because Elixir has a fantastic distribution and pubsub story thanks to distributed Erlang. That’s much more commonly used in apps now than 5 or so years ago, when 40-50% of apps weren’t clustered, thanks to the rise of platforms like Fly that made clustering easier, and the decline of Heroku, which made it nearly impossible.

cpursley

How did you resolve this? Did you consider listening to the WAL?

sorentwo

We have Postgres based pubsub, but encourage people to use a distributed Erlang based notifier instead whenever possible. Another important change was removing insert triggers, partially for the exact reasons mentioned in this post.

MuffinFlavored

> Another important change was removing insert triggers, partially for the exact reasons mentioned in this post.

What did you replace them with instead?

parthdesai

Distributed Erlang if application is clustered, redis if it is not.

Source: Dev at one of the companies that hit this issue with Oban

nightpool

What about Heroku made Erlang clustering difficult? It's had the same DNS clustering feature that Fly has, and they've had it since 2017: https://devcenter.heroku.com/articles/dyno-dns-service-disco....

sorentwo

The problem was with restrictive connections, not DNS based discovery for clustering. It wasn't possible (as far as I'm aware) to connect directly from one dyno to another through tcp/udp.

nightpool

That is not an issue when using Private Spaces, which have been available since 2015

alberth

I didn’t realize Oban didn’t use Mnesia (Erlang built-in).

sorentwo

Very, very few applications use mnesia. There’s absolutely no way I would recommend it over Postgres.

arcanemachiner

I have heard that mnesia is very unreliable, which is a damn shame.

I wonder if that is fixable, or just inherent to its design.

tecleandor

I think RabbitMQ still uses it by default for its metadata storage. Is that problematic?

asg0451

can you explain why?

JoelJacobson

Hey folks, I ran into similar scalability issues and ended up building a benchmark tool to analyze exactly how LISTEN/NOTIFY behaves as you scale up the number of listeners.

Turns out that all Postgres versions from 9.6 through current master scale linearly with the number of idle listeners — about 13 μs extra latency per connection. That adds up fast: with 1,000 idle listeners, a NOTIFY round-trip goes from ~0.4 ms to ~14 ms.

To better understand the bottlenecks, I wrote both a benchmark tool and a proof-of-concept patch that replaces the O(N) backend scan with a shared hash table for the single-listener case — and it brings latency down to near-O(1), even with thousands of listeners.

Full benchmark, source, and analysis here: https://github.com/joelonsql/pg-bench-listen-notify

No proposals yet on what to do upstream, just trying to gather interest and surface the performance cliff. Feedback welcome.
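
As a back-of-the-envelope check, the reported figures fit a simple linear model (the numbers are taken from the comment above; this is a model, not a measurement):

```python
# Reported: ~0.4 ms base NOTIFY round-trip, plus ~13 microseconds of
# extra latency per idle LISTEN connection.
BASE_MS = 0.4
PER_LISTENER_MS = 0.013

def notify_latency_ms(idle_listeners: int) -> float:
    """Linear cost model of the O(N) scan over backends on NOTIFY."""
    return BASE_MS + PER_LISTENER_MS * idle_listeners

print(round(notify_latency_ms(1000), 1))  # 13.4, roughly the ~14 ms reported
```
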

cryptonector

That's pretty cool.

IMO LISTEN/NOTIFY is badly designed as an interface to begin with because there is no way to enforce access controls (who can notify; who can listen) nor is there any way to enforce payload content type (e.g., JSON). It's very unlike SQL to not have a `CREATE CHANNEL` and `GRANT` commands for dealing with authorization to listen/notify.

If you have authz then the lack of payload content type constraints becomes more tolerable, but if you add a `CREATE CHANNEL` you might as well add something there regarding payload types, or you might as well just make it so it has to always be JSON.

With a `CREATE CHANNEL` PG could provide:

  - authz for listen
  - authz for notify
  - payload content type constraints (maybe always JSON if you CREATE the channel)
  - select different serialization semantics (to avoid this horrible, no good, very bad locking behavior)
  - backwards-compatibility for listen/notify on non-created channels

maxbond

> there is no way to enforce access controls

(I thought this was a fun puzzle, so don't take this as advice or as disagreement with your point.)

There is the option to use functions with SECURITY DEFINER to hack around this, but the cleanest way to do it (in the current API) would be to encrypt your messages on the application side using an authenticated system (eg AES-GCM). You can then apply access control to the keys. (Compromised services could still snoop on when adjacent channels were in use, however.)

cryptonector

Yes, I've thought about this too, but it's annoying to have to resort to that, no?

infogulch

Cool! This article and thread have already been referenced on the mailing list; maybe it's worth mentioning this benchmark and experiment there.

https://www.postgresql.org/message-id/flat/CAM527d_s8coiXDA4...

https://www.postgresql.org/message-id/flat/175222328116.3157...

WhyNotHugo

Thanks for attacking this issue (even if still in a research phase, that's definitely a needed start).

I'm amused at how op brags about the huge scale at which they operate, but instead of even considering fixing the issue (both for themselves and for others), they just switched to something else for pubsub.

leontrolski

I'd be interested in how dumb-ol' polling would compare here (the FOR UPDATE SKIP LOCKED method https://leontrolski.github.io/postgres-as-queue.html). One day I will set up some benchmarks, as this is the kind of thing people argue about a lot without much evidence either way.

Wasn't aware of this AccessExclusiveLock behaviour - a reminder (and shameless plug 2) of how Postgres locks interact: https://leontrolski.github.io/pglockpy.html
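
For reference, the SKIP LOCKED polling approach boils down to a query like the one below (a sketch, not the linked post's exact code; the `queue` table and column names are illustrative, and `dequeue_one` assumes a psycopg-style connection):

```python
# Illustrative polling dequeue using FOR UPDATE SKIP LOCKED.
# Table/column names (queue, id, status) are assumptions for this sketch.
DEQUEUE_SQL = """
UPDATE queue
SET status = 'processing'
WHERE id = (
    SELECT id FROM queue
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id;
"""

def dequeue_one(conn):
    """Claim at most one pending row. Concurrent workers skip each
    other's locked rows instead of blocking on them."""
    with conn.cursor() as cur:
        cur.execute(DEQUEUE_SQL)
        row = cur.fetchone()
        conn.commit()
        return row[0] if row else None
```

The inner SELECT takes the row lock, so two workers polling simultaneously each claim different rows rather than serializing on the head of the queue.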

qianli_cs

My colleague did some internal benchmarking and found that LISTEN/NOTIFY performs well under low to moderate load, but doesn't scale well with a large number of listeners. Our findings were pretty consistent with this blog post.

(Shameless plug [1]) I'm working on DBOS, where we implemented durable workflows and queues on top of Postgres. For queues, we use FOR UPDATE SKIP LOCKED for task dispatch, combined with exponential backoff and jitter to reduce contention under high load when many workers are polling the same table.

Would love to hear feedback from you and others building similar systems.

[1] https://github.com/dbos-inc/dbos-transact-py

mind-blight

Nice! I'm using DBOS and am a little active on the discord. I was just wondering how y'all handled this under the hood. Glad to hear I don't have to worry much about this issue

eatonphil

Why not read the WAL?

qianli_cs

We considered using WAL for change tracking in DBOS, but it requires careful setup and maintenance of replication slots, which may lead to unbounded disk growth if misconfigured. Since DBOS is designed to bolt onto users' existing Postgres instances (we don't manage their data), we chose a simpler, less intrusive approach that doesn't require a replication setup.

Plus, for queues, it's so much easier to leverage database constraints and transactions to implement global concurrency limit, rate limit, and deduplication.

singron

Polling is the way to go, but it's also very tricky to get right. In particular, it's non-trivial to make a reliable queue that's also fast when transactions are held open and vacuum isn't able to clean tuples. E.g. "get the first available tuple" might have to skip over 1000s of dead tuples.

Holding transactions open is an anti-pattern for sure, but it's occasionally useful. E.g. pg_repack keeps a transaction open while it runs, and I believe vacuum also holds an open transaction part of the time too. It's also nice if your database doesn't melt whenever this happens on accident.

time0ut

An approach that has worked for me is to hash partition the table and have each worker look for work in one partition at a time. There are a number of strategies depending on how you manage workers. This allows you to only consider 1/Nth of the dead tuples, where N is the number of partitions, when looking for work. It does come at the cost of strict ordering, but there are many use cases where strict ordering is not required. The largest scale implementation of this strategy that I have done had 128 partitions with a worker per partition pumping through ~100 million tasks per day.
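
The partition-assignment half of that strategy can be sketched in a few lines (the 128 comes from the comment; the key scheme and hash choice are illustrative):

```python
import hashlib

N_PARTITIONS = 128  # from the comment; pick for your workload

def partition_for(task_key: str) -> int:
    """Stable hash partition for a task. Python's built-in hash() is
    salted per process, so use a real hash for an assignment that all
    workers must agree on."""
    digest = hashlib.sha256(task_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % N_PARTITIONS

# Each worker owns one partition, so its polling query only scans
# 1/128th of the table (including that fraction of the dead tuples).
assigned = {partition_for(f"task-{i}") for i in range(10_000)}
```
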

I also found LISTEN/NOTIFY to not work well at this scale and used a polling based approach with a back off when no work was found.

Quite an interesting problem and a bit challenging to get right at scale.

j16sdiz

Can't change the number of partitions dynamically.

Additional challenge if jobs come in funny sizes.

dfsegoat

If there were a toy or other public implementation of this, I would love to see it.

CBLT

This is how Kafka does it. Kafka has spent years working on the rough edges (e.g. partition resizing), haven't used it recently though.

atombender

Dead tuples is a real and significant problem, not just because it has to skip the tuples, but because the statistics that drive the planner don't account for them.

I found this out the hard way when I had a simple query that suddenly got very, very slow on a table where the application would constantly do a `SELECT ... FOR UPDATE SKIP LOCKED` and then immediately delete the rows after a tiny bit of processing.

It turned out that with a nearly empty table of about 10-20k dead tuples, the planner switched to a different index scan and would overfetch tons of pages just to discard them, as they only contained dead tuples. What I didn't realize is that the planner statistics don't care about dead tuples, and ANALYZE doesn't take them into account. So the planner started to think the table was much bigger than it actually was.

It's really important for these use cases to tweak the autovacuum settings (which can be set on a per-table basis) to be much more aggressive, so that under high load the vacuum runs pretty much continuously.

Another option is to avoid deleting rows, but instead use a column to mark rows as complete, which together with a partial index can avoid dead tuples. There are both pros and cons; it requires doing the cleanup (and VACUUM) as a separate job.

singron

Unfortunately, updating the row also creates dead tuples. It's very tricky!

leontrolski

> also fast when transactions are held open

In my linked example, on getting the item from the queue, you immediately set the status to something that you're not polling for - does Postgres still have to skip past these tuples (even in an index) until they're vacuumed up?

broken_broken_

I have implemented polling against a cluster of mixed mariadb/mysql databases which do not offer listen/notify. It was a pain in the neck to get right.

- The batch size needs to be adaptive for performance, latency, and recovering smoothly after downtime.

- The polling timeouts, frequency, etc. need the same treatment.

- You need to avoid hysteresis.

- You want to be super careful about not disturbing the main application by placing heavy load on the database or accidentally locking tables/rows.

- You likely want multiple distributed workers in case of a network partition to keep handling events

It’s hard to get right especially when the databases at the time did not support SKIP LOCKED.

In retrospect I wish I had listened to the WAL. Much easier.

cpursley

Have you played with pgmq? It's pretty neat: https://github.com/pgmq/pgmq

edoceo

Another thing for @leontrolski to add to the benchmarks - which I cannot wait to read.

cpursley

There's a pretty cool solution built on pgmq called pgflow:

https://www.pgflow.dev/concepts/how-pgflow-works

RedShift1

I use polling with backoff up to one minute. When a workload is done, the worker immediately polls for more work. If nothing is found, it waits 5 seconds; still nothing, 10 seconds; and so on up to one minute. From then on it polls every minute until it finds work again, and the backoff timer resets to 0.
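
That schedule is easy to make explicit (the 5-second step and one-minute cap come from the comment above; the function itself is an illustrative sketch):

```python
def next_delay(current: float, found_work: bool,
               step: float = 5.0, cap: float = 60.0) -> float:
    """Polling backoff: poll again immediately after doing work,
    otherwise add `step` seconds per empty poll, capped at `cap`."""
    if found_work:
        return 0.0  # work found: drain the queue as fast as possible
    return min(current + step, cap)

# Empty polls ramp 5, 10, 15, ... up to 60 and then stay there.
d = 0.0
schedule = []
for _ in range(14):
    d = next_delay(d, found_work=False)
    schedule.append(d)
print(schedule[:4], schedule[-1])  # [5.0, 10.0, 15.0, 20.0] 60.0
```

A worker loop would `time.sleep(d)` between polls and pass `found_work=True` whenever a poll returns a job, resetting the delay to zero.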

TkTech

With that experience behind you, would you have feedback for Chancy[1]? It aims to be a batteries-included offering for postgres+python, aiming for hundreds of millions of jobs a day, not massive horizontal worker scaling.

It both polls (configurable per queue) and supports listen/notify, simply to inform workers that they can wake up early to trigger polling; this can be turned off globally with a notifications=false flag.

[1]: https://github.com/tktech/chancy

cryptonector

Instead of LISTEN/NOTIFY you could listen to the wal / logical replication stream.

Or you could have a worker whose only job is to listen to the wal / logical replication stream and then NOTIFY. Being the only one to do so would not burden other transactions.

Or you could have a worker whose only job is to listen to the wal / logical replication stream and then publish on some non-PG pubsub system.

aurumque

I'll take the shameless plug. Thank you for putting this together! Very helpful overview of pg locks.

notarobot123

It's funny how "shameless plug" actually means "excuse the self-promotion" and implies at least a little bit of shame even when the reference is appropriate and on-topic.

cpursley

Right, plus there are payload size limitations (NOTIFY payloads are capped at 8000 bytes). This is why I prefer listening to the Postgres WAL for database changes:

https://github.com/cpursley/walex?tab=readme-ov-file#walex (there's a few useful links in here)

williamdclt

I found recently that you can write directly to the WAL with transactional guarantees, without writing to an actual table. This sounds like it would be amazing for queue/outbox purposes, as the normal approaches of actually inserting data in a table cause a lot of resource usage (autovacuum is a major concern for these use cases).

Can’t find the function that does that, and I’ve not seen it used in the wild yet, idk if there’s gotchas

Edit: found it, it’s pg_logical_emit_message

gunnarmorling

pg_logical_emit_message() is how I recommend users on Postgres to implement the outbox pattern [1]. No table overhead as you say, no need for housekeeping, etc. It has some other cool applications, e.g. providing application-specific metadata for CDC streams or transactional logging, wrote about it at [2] a while ago. Another one is making sure replication slots can advance also if there's no traffic in the database they monitor [3].

[1] https://speakerdeck.com/gunnarmorling/ins-and-outs-of-the-ou...

[2] https://www.infoq.com/articles/wonders-of-postgres-logical-d...

[3] https://www.morling.dev/blog/mastering-postgres-replication-...
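
For the curious, the outbox write itself is a one-liner (a sketch assuming psycopg-style cursors; the 'outbox' prefix and payload shape are illustrative choices, not a fixed convention):

```python
# Illustrative outbox write via pg_logical_emit_message: the event rides
# the WAL inside the transaction, with no outbox table to vacuum.
OUTBOX_SQL = """
SELECT pg_logical_emit_message(
    true,      -- transactional: emitted only if the surrounding tx commits
    'outbox',  -- prefix, used by the CDC consumer to filter messages
    %s         -- payload, e.g. a JSON-encoded event
);
"""

def emit_outbox_event(conn, payload: str) -> None:
    """Write an event to the WAL atomically with the surrounding
    transaction; a logical-decoding consumer picks it up downstream."""
    with conn.cursor() as cur:
        cur.execute(OUTBOX_SQL, (payload,))
```
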

williamdclt

Ha, your [2] is how I learnt about it! Thanks :)

brightball

You know, this would be a great talk at the 2026 Carolina Code Conference...

cyberax

One annoying thing is that there is no blocking counterpart for reading data from the WAL. You can poll it using pg_logical_slot_get_binary_changes, but it returns immediately.

It'd be nice to have a method that would block for N seconds waiting for a new entry.

You can also use a streaming replication connection, but it often is not enabled by default.
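
For context, the polling flow described here looks roughly like this (the slot name is illustrative; `test_decoding` is an output plugin that ships with Postgres):

```sql
-- One-time setup: create a logical replication slot.
SELECT pg_create_logical_replication_slot('my_slot', 'test_decoding');

-- Poll for new changes. This returns immediately even when there is
-- nothing new, hence the wish for a blocking variant.
SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
```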

williamdclt

I think replication is the way to go, it’s kinda what it’s for.

Might be a bit tricky to get Debezium to decode the logical event, not sure.

cryptonector

`pg_logical_emit_message()` is great and better than `NOTIFY` in terms of how it works, but...

`pg_logical_emit_message()` perpetuates/continues the lack of authz around `NOTIFY`.

williamdclt

What do you mean by this? What authz would you expect/like?

denysonique

For node.js users there is postgres.js that can listen to the Postgres WAL and emit node events that can be handled by application code.

BowBun

Yeah, until vendors butcher Postgres replication behavior and block the common paths for integrating these capabilities into other tools. Looking at you, AWS.

FZambia

Many here recommend using Kafka or RabbitMQ for real-time notifications. While these tools work well with a relatively stable, limited set of topics, they become costly and inefficient when dealing with a large number of dynamic subscribers, such as in a messaging app where users frequently come and go. In RabbitMQ, queue bindings are resource-intensive, and in Kafka, creating new subscriptions often triggers expensive rebalancing operations.

I've seen a use case for a messenger app with 100k concurrent subscribers where the developers used RabbitMQ with an individual queue per user. It ran at 60 CPUs on the RabbitMQ side under normal conditions, and during mass reconnections (due to a proxy reload in the infra) it took up to several minutes for users to reconnect. I suggested switching to https://github.com/centrifugal/centrifugo with the Redis engine (which combines PUB/SUB with Redis streams for individual queues), and that brought it down to 0.3 CPU on the Redis side. The system now serves about 2 million concurrent connections.

odie5533

I wonder who works on centrifugo. Could be anyone.

CaliforniaKarl

I appreciate this post for two reasons:

* It gives an indication of how much you need to grow before this Postgres functionality starts being a blocker.

* Folks encountering this issue—and its confusing log line—in the future will be able to find this post and quickly understand the issue.

Gigachad

Sounds like ChatGPT appreciated the post

acdha

If you think they’re a bot, flag and move on. No need for a derail about writing style.

CaliforniaKarl

I did not use ChatGPT—nor any AI—in writing the post. I'm curious, would you mind emailing—or replying—with what made you think that it was written by AI? Or why you do not believe my statement?

yrds96

I'm ESL, so I often check my grammar on ChatGPT, and 99% of the time it includes em dashes in the corrected sentences, which I remove or just replace with commas or hyphens to sound more natural. So maybe this was not entirely written but just revised by ChatGPT.

jjgreen

Just for the em-dashes? Some humans also use them.

Gigachad

It’s also the fact it’s just a summary of the post content without anything extra or any opinions.

TrackerFF

A decent heuristic for classifying human vs. bot writing, when it comes to dashes, is that bots use em dashes (—) while almost no humans do; plenty of humans use regular hyphens (-) instead, because they won't bother looking for the em dash on a keyboard or phone.

Of course, you also have the people who correctly use em dashes.

andrewstuart

There are lots of ways to invoke NOTIFY without doing it from within the transaction doing the work.

The post author is too focused on using NOTIFY in only one way.

This post fails to explain WHY they are sending a NOTIFY. Not much use telling us what doesn’t work without telling us the actual business goal.

It’s crazy to send a NOTIFY for every transaction; they should be debounced/grouped.

The point of a NOTIFY is to let some other system know something has changed. Don’t do it every transaction.
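
One detail worth noting alongside this: within a single transaction, Postgres itself may coalesce duplicate notifications, but debouncing across transactions has to happen in the application. A sketch of the in-transaction behavior:

```sql
BEGIN;
NOTIFY changes;  -- queued
NOTIFY changes;  -- same channel, same (empty) payload: may be coalesced
COMMIT;          -- listeners typically receive a single notification
```

Cross-transaction grouping (e.g. one NOTIFY per batch of work, fired by a timer or a batching worker) is what actually reduces pressure on the notification queue.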

0xCMP

Agreed, I am struggling to understand why "it does not scale" is not "we used it wrong and hit the point where it's a problem" here.

Like if it needs to be very consistent I would use an unlogged table (since we're worried about "scale" here) and then `FOR UPDATE SKIP LOCKED` like others have mentioned. Otherwise what exactly is notify doing that can't be done after the first transaction?

Edit: in fact, how can they send an HTTP call for something and not be able to do a `NOTIFY` afterward as well?

One possible way I could understand what they wrote is that somewhere in their code, within the same transaction, there are notifies which conditionally trigger and it would be difficult to know which ones to notify again in another transaction after the fact. But they must know enough to make the HTTP call, so why not NOTIFY?
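
For context, the `FOR UPDATE SKIP LOCKED` pattern mentioned here typically looks like this (table and column names are illustrative):

```sql
-- Each worker claims a batch of unprocessed jobs, skipping rows that
-- other workers have already locked, so workers never block each other.
BEGIN;
SELECT id, payload
FROM jobs
WHERE processed_at IS NULL
ORDER BY id
LIMIT 10
FOR UPDATE SKIP LOCKED;

-- ... process the claimed rows ...
UPDATE jobs SET processed_at = now() WHERE id IN (/* claimed ids */);
COMMIT;
```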

andrewstuart

Agreed.

They’re using it wrong and blaming Postgres.

Instead they should use Postgres properly and architect their system to match how Postgres works.

There’s correct ways to notify external systems of events via NOTIFY, they should use them.

thom

Yeah, the way I've always used LISTEN/NOTIFY is just to tell some pool of workers that they should wake up and check some transactional outbox for new work. False positives are basically harmless and therefore don't need to be transactional. If you're sending sophisticated messages with NOTIFY (which is a reasonable thing to think you can do) you're probably headed for pain at some point.
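
A sketch of that shape (schema and channel name invented for illustration): the NOTIFY carries no payload and only wakes workers, while the outbox table remains the source of truth, so a spurious or missed wake-up is harmless.

```sql
-- Writer: enqueue work transactionally, then wake listeners.
BEGIN;
INSERT INTO outbox (payload) VALUES ('{"event": "order_created"}');
COMMIT;
NOTIFY work_available;  -- its own tiny transaction, no payload to parse

-- Worker: on each notification (and on a periodic timer as a fallback):
SELECT id, payload
FROM outbox
WHERE processed_at IS NULL
ORDER BY id;
```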

tomrod

Assuming you exclude SELECT-only transactions (or have to log even those because your regulated industry has bad auditors), then every transaction changes something.

hombre_fatal

Interesting. What if you just execute `NOTIFY` in its own connection outside of / after the transaction?

nick_

My thought as well. You could add notify commands to a temp table during the transaction, then run NOTIFY on each row in that temp table after the transaction commits successfully?

zbentley

This is roughly the “transactional outbox” pattern—and an elegant use of it, since the only service invoked during the “publish” RPC is also the database, reducing distributed reliability concerns.

…of course, you need dedup/support for duplicate messages on the notify stream if you do this, but that’s table stakes in a lot of messaging scenarios anyway.
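
Dedup on the consumer side (the "table stakes" mentioned here) can lean on a unique constraint; a minimal sketch with invented names:

```sql
CREATE TABLE IF NOT EXISTS processed_messages (
    message_id   text PRIMARY KEY,
    processed_at timestamptz NOT NULL DEFAULT now()
);

-- Returns a row only the first time a given message ID is seen;
-- a duplicate delivery inserts nothing and returns zero rows.
INSERT INTO processed_messages (message_id)
VALUES ('msg-123')
ON CONFLICT (message_id) DO NOTHING
RETURNING message_id;
```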

foota

Wouldn't you need to then commit to remove the entries from the temp table?

zbentley

No, so long as the rows in there are transactionally guaranteed to be present or not, a sweeper script can handle removing failed “publishes” (notifys that didn’t delete their row) later.

This does sacrifice ordering and increases the risk of duplicates in the message stream, though.

parthdesai

You lose transactional guarantees if you notify outside of the transaction though

hombre_fatal

Yeah, but pub/sub systems already need to be robust to missed messages. And, sending the notify after the transaction succeeds usually accomplishes everything you really care about (no false positives).

parthdesai

What happens when the transaction succeeds but the NOTIFY, executed outside of the transaction in its own separate connection, fails?

gwbas1c

... And working outside of the guarantee is harder, especially if you're in a "move fast and break things because we can fix it later" mode.

Anyway, the article indicates that the fix was very simple and primarily in the application layer. Makes me wonder if someone was getting "creative" when they used LISTEN/NOTIFY.

soursoup

Isn’t it standard practice to have a separate TCP stream for NOTIFY, or am I mistaken?

remram

You mean for LISTEN?

zerd

That would make the locked time shorter, but it would still contend on the global lock, right?

callamdelaney

My kneejerk reaction to the headline is ‘why would it?’.

It’s unsurprising to me that an AI company appears to have chosen exactly the wrong tool for the job.

kristianc

Sounds like a deliberate attempt to avoid spinning up Redis, Kafka, or an outbox system early on, followed by underestimating how quickly their scale would make it blow up. Story as old as time.

const_cast

I find the opposite story more common: additional complexity, like caching, added early for a scale that never comes. I've worked on one too many sprawling distributed systems with too few users to justify them.

physix

"Sprawling distributed systems".

I like that. Sounds like a synonym for "Platform Engineering". :-)

I remember being amazed that lambda architecture was considered a kind of reference, when it looked to me more like a workaround.

We like to build IT cathedrals, until we have to run them.

v5v3

Better to be successful with simple tech and have a minor 'blow up' than to over-engineer and go bust.

oulipo

Not sure I get it... how would you replicate this functionality with Kafka? You'd still need to have the database LISTEN to changes and push it to Kafka no?

j16sdiz

Kafka's head-of-line blocking sucks.

chrnola

Guaranteeing order has its tradeoffs.

There is work happening currently to make Kafka behave more like a queue: https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A...

LgWoodenBadger

Isn't this one of the things partitioning is meant to ameliorate? Either through partitions themselves, or through an appropriate partitioning strategy?

bravesoul2

Yeah I have no idea whether it would. But I'd load test it if it needed to scale.

SQS may have been a good "boring" choice for this?

TheTaytay

Because the documentation doesn’t warn that this well-loved feature effectively ruins the ability to perform parallel writes, and because everything else in Postgres scales well.

I think it’s a reasonable assumption. Based on the second half of your comment, you clearly don’t think highly of “AI companies,” but I think that’s a separate issue.

NightMKoder

Facebook’s Wormhole seems like a better approach here: just tailing the MySQL binlog gets you commit safety for messages without running into this kind of locking behavior.

cshimmin

If I understood correctly, the global lock is so that notify events are emitted in order. Would it make sense to have a variant that doesn't make this ordering guarantee if you don't care about it, so that you can "notify" within transactions without locking the whole thing?

GuinansEyebrows

Possibly, but I think at that point it would make more sense to move the business logic outside of the database (you could wait for a successful commit before triggering an external process via the originating app, monitor the WAL with an external pub/sub system, or do something else more clever than I can think of).


Postgres LISTEN/NOTIFY does not scale - Hacker News