Brian Lovin
/
Hacker News

ekidd

I've built three distributed job systems at this point. A handy rule of thumb which I have promoted for years is "build for 10x your current scale."

If you need to handle 70 requests/second, design for 700. If you need to handle 20 servers running batch jobs, design for 200 servers. If you're in a startup that grows 100% per year, you'll be at 8x scale in 3 years. So you'll have time to rewrite as you grow!

Out of three job systems I built, the first one tried to avoid SQL for "scalability." Then we hit a bunch of edge cases that needed transactional integrity, and we were in a world of pain.

My two more recent distributed job systems use PostgreSQL as a coordinator. They're literally built around SELECT FOR UPDATE SKIP LOCKED. One of them routinely controls 350 workers and the other does elaborate prioritization for thousands of jobs. Both of them will continue to run just fine until they're earning millions of dollars a year—for example, the one controlling 350 workers should scale to about 2,000 CPUs with a little work.

Hyperscale technologies are fairly cheap right up until you discover you need transactions. At that point, faking transactional semantics on top of an eventually consistent data store becomes an engineering nightmare.

So sit down, and do the math. If your company was earning $100 million/year, how big would your distributed system need to be? Can you easily get a PostgreSQL instance big enough to handle that load? (Or could you just shard per client?) If so, strongly consider using PostgreSQL. It makes a hundred things trivial.
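That "do the math" exercise can be sketched in a few lines. All numbers here are hypothetical, purely to illustrate the shape of the calculation, not figures from the comment:

```python
# Back-of-envelope capacity check for a hypothetical business
# (all constants are made-up assumptions for illustration).
TARGET_REVENUE_PER_YEAR = 100_000_000  # $100M/year
REVENUE_PER_JOB = 0.01                 # assume $0.01 earned per processed job
SECONDS_PER_YEAR = 365 * 24 * 3600

jobs_per_year = TARGET_REVENUE_PER_YEAR / REVENUE_PER_JOB
jobs_per_second = jobs_per_year / SECONDS_PER_YEAR

# ~317 jobs/second on average: comfortably within one Postgres instance.
print(f"{jobs_per_year:.0f} jobs/year ~= {jobs_per_second:.0f} jobs/second")
```

Even with these aggressive assumptions, the average rate lands in the hundreds of jobs per second, which a single well-tuned Postgres instance can handle.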

bunderbunder

Similar experience here. Multiple times, I've pushed an SQL-based queue a couple orders of magnitude past the scale where others say SQL craps out and a distributed solution is an absolute requirement. And the SQL solution is typically simpler, requires fewer compute resources, and is easier to support in production.

But, to make it work, you've got to know the database well enough to know that things like SELECT FOR UPDATE SKIP LOCKED exist in the first place. Which is a kind of knowledge that's getting quite rare these days, because more and more engineers grow up never having known a world where they aren't walled off from their DBMS's true capabilities by a heavyweight ORM.

runeks

> Multiple times, I've pushed an SQL-based queue a couple orders of magnitude past the scale where others say SQL craps out and a distributed solution is an absolute requirement.

What about availability, though? The distributed solution is also useful to avoid downtime in case of single node failure.

Is there an off-the-shelf solution that lets me do that with Postgres? I know the newest version (16) just added active-active replication, but I wouldn’t know how to use that to achieve resilience.

adamckay

But if you're using Postgres as your queuing system because you're already using it as your core database technology for your app, you've got the same issue. If your single Postgres instance is down then your app is, too, and won't be enqueuing more jobs.

And unless your jobs are trivial, it's highly likely that they interact with your app in some way, so it doesn't really matter whether your workers are distributed and up: they can't complete their work because your app is down along with its single-node Postgres.

dekkers

What the article writes about scalability also applies to availability. Does the queue need 99.999% or 99.9999% uptime? Or is the Service Level Objective actually 99.99%, 99.9% or even 99.5%?

With 99.99% you can have 4 minutes of downtime a month. If failover to a hot standby takes a minute then that shouldn't be a problem to achieve a 99.99% uptime SLO.
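The downtime budgets behind those numbers are easy to compute (30-day month assumed):

```python
# Allowed downtime per 30-day month at a few SLO targets.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes

for slo in (0.999, 0.9999, 0.99999):
    budget = MINUTES_PER_MONTH * (1 - slo)
    print(f"{slo:.3%} uptime -> {budget:.2f} min/month allowed downtime")
```

At 99.99% that is about 4.3 minutes a month, which is why a one-minute failover to a hot standby fits the budget.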

bunderbunder

I don't know PostgreSQL as well. I mostly use MSSQL in production, and it's had good replication for ages, so I just wasn't really that worried about single-node failure for the critical stuff.

And, frankly, even for the less-critical stuff that was only running on a single node, I still dealt with fewer availability problems back in the day than I do now that everything's gone distributed. I think something that's been forgotten over the years is that a lot of what distributed systems do to be reliable was about digging oneself out of the hole created by running on lots of cheap hardware instead of a single server with redundancy built in. I acknowledge that, past a certain scale, that's the only option that makes sense. But if you're not operating at that scale, there's a good chance it's all just verschlimmbessern (German for making something worse by trying to improve it).

nerpderp82

Design (and test) for 10x your current scale; build for what you need now. The system has to be able to handle peak loads, and if you don't know what those are, build in a safety margin or a way to shed or defer work when you need to.

Everything is a tradeoff, optimize for the things that need optimizing, and determining what those are is the hallmark of a good engineer.

insanitybit

I've found the same. You should understand what your 10x / 100x growth solution would look like (assuming that that's relevant - obviously if you have no intent to hit that scale, don't bother). Build your system to handle your 1-1.5x, maybe 10x scale, and make sure you're not blocking the 10-100x solutions by doing so.

aaroninsf

"faking transactional semantics on top of an eventually consistent data store becomes an engineering nightmare"

PREACH

If this is news to you, you just justified a week's worth of browsing HN on the clock reading this statement. Scribble it on a scrap of paper and keep it taped to your ATM card.

merb

You can even build an advisory-lock queue in Postgres, which is way slower but has some benefits.

pas

slower to build or slower to run?

merb

Slower to run, but when you keep the Postgres connection open you will know that the job is still running, while with FOR UPDATE SKIP LOCKED you would basically need a status and a job_timeout.

So pg_try_advisory_lock/pg_advisory_unlock can lock across transactions while FOR UPDATE SKIP LOCKED can't; thus you would either need to keep a transaction open or use status+job_timeout (and in Postgres you should not use long transactions).

Basically we use C#, but we looked into https://github.com/que-rb/que which uses advisory locks. Since our jobs take like 1 min to 2 hours, it was a no-brainer to use advisory locks. It's just not the best thing if you have thousands of fast jobs per second, but for a more moderate queue where you have like 10,000 jobs per minute/10 minutes/30 minutes and they take 1 min to 2 hours, it's fine.

We also do not delete jobs; we do not care about storage since the job table basically does not take up a lot. And we have a lot of time to catch up at night since we are only in Europe.
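A hypothetical sketch of an advisory-lock claim in the spirit described above (table and column names are invented for illustration; que's real implementation is considerably more careful):

```python
# Sketch: claiming a job with a session-level advisory lock instead of
# FOR UPDATE SKIP LOCKED. Schema (jobs.id, jobs.finished_at) is assumed.

def claim_sql(table: str = "jobs") -> str:
    # pg_try_advisory_lock(id) takes a *session-level* lock keyed on the
    # job id: it survives across transactions and vanishes if the worker's
    # connection dies, so "lock still held" doubles as "job still running"
    # with no status/job_timeout columns.
    #
    # Caveat: Postgres does not guarantee WHERE-clause evaluation order,
    # so a lock function in WHERE can lock rows that are later filtered
    # out; production implementations like que work around this.
    return (
        f"SELECT id FROM {table} "
        "WHERE finished_at IS NULL AND pg_try_advisory_lock(id) "
        "LIMIT 1"
    )

# Released explicitly when the job completes (or implicitly when the
# session ends). %s is a psycopg2-style placeholder for the job id.
RELEASE_SQL = "SELECT pg_advisory_unlock(%s)"
```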

0x457

IMO the main issue with it: advisory locks in Postgres require an open connection to be held the entire time the lock is taken. Combine that with a thread-per-connection model...

aduffy

For several projects I’ve opted for the even dumber approach, that works out of the box with every ORM/Query DSL framework in every language: using a normal table with SELECT FOR UPDATE SKIP LOCKED

https://www.pgcasts.com/episodes/the-skip-locked-feature-in-...

It’s not “web scale” but it easily extends to several thousand background jobs in my experience

matsemann

I've done even simpler, without locks (no transaction logic needed): I select a row, then try to update a field marking it as taken. If 1 row is affected, it's mine. If 0, someone else claimed it before me and I select a new row.

I've used this for tasks at big organizations without issue. No need for any special deployments or new infra. Just spin up a few worker threads in your app. Perhaps a thread to reset abandoned tasks. But in three years this never actually happened, as everything was contained in try/catch that would add it back to the queue, and our java app was damn stable.
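That claim-by-update step can be sketched as follows. SQLite stands in for the real database purely so the snippet is runnable, and the schema is made up; the point is the rowcount check, which makes the claim an atomic compare-and-swap:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, taken INTEGER NOT NULL DEFAULT 0)"
)
db.execute("INSERT INTO tasks (id) VALUES (1)")
db.commit()

def try_claim(conn, task_id):
    # Atomic compare-and-swap: the UPDATE only matches while taken = 0,
    # so of two racing claimers exactly one sees rowcount == 1.
    cur = conn.execute(
        "UPDATE tasks SET taken = 1 WHERE id = ? AND taken = 0", (task_id,)
    )
    conn.commit()
    return cur.rowcount == 1

first = try_claim(db, 1)   # this claim wins
second = try_claim(db, 1)  # a later claim finds 0 rows to update
print(first, second)       # True False
```

In real use each worker would run try_claim on its own connection; the database serializes the two UPDATEs either way.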

refibrillator

PSA: This is a read-modify-write pattern, thus it is not safe under concurrency unless a transaction isolation level of SERIALIZABLE is specified, or some locking mechanism is used (select for update etc).

deredede

The part about checking the number of affected rows hints at using `UPDATE ... WHERE ...` which should act as an atomic CAS regardless of isolation level.

Edit: To clarify, I mean `SELECT id WHERE used = 0` followed by `UPDATE ... SET used = 1 WHERE id = ... AND used = 0`

mjb

This should be safe under SI (other than the ABA issue, which isn't even fixed with serializable). The update forces a W-W conflict, which is sufficient to make the behavior serializable under SI (and therefore, I think but am not sure, PG's RR level too).

andrelaszlo

I guess you update it with the assigned worker id, where the "taken by" field is currently null? Does it mean that workers have persistent identities, something like an index? How do you deal with workers being replaced, scaled down, etc?

Just curious. We maintained a custom background processing system for years but recently replaced it with off the shelf stuff, so I'm really interested in how others are doing similar stuff.

matsemann

No, just UPDATE ... SET taken=1. If it changed the row, you claimed it. If it didn't, someone claimed it before you.

Our tasks were quick enough so that all fetched tasks would always be able to be completed before a scale down / new deploy etc, but we stopped fetching new ones when the signal came so it just finished what it had. I updated above, we did have logic to monitor if a task got taken but never got a finished status, but I can't remember it ever actually reporting on anything.

calrain

I've done this successfully with a web service front that retrieves jobs to send to workers for processing, by using a SQL table queue. That web service ran without a hitch for a long time, serving about 10 to 50 job consumers for fast and highly concurrent queues.

My approach was:

- Accept the inbound call

- Generate a 20 character random string (used as a signature)

- Execute a SQL query that selects the oldest job without a signature and writes the signature, returning the primary key of the job that was updated.

- If it errors for any reason, loop back and attempt again, but only 10 times, as some underlying issue exists (10 collisions is statistically improbable for my use case)

- Read back the row for the primary key returned by that query, comparing its signature to my random one.

- If a hit, return the job to the caller

- If a miss, loop back and start again, incrementing attempts by 1.

The caller has to handle the possibility that a call to this web service won't return anything, either due to no jobs existing, or the collision/error threshold being reached.

In either case, the caller backs off for its configured time, then calls again.

Callers are usually in 'while true' loops, only exiting if they get an external signal to close or an uncontrolled crash.

If you take this approach, you will have a function or a web service that converts the SQL table into a job queue service. When you do that, you can build metrics on the amount of collisions you get while trying to pull and assign jobs to workers.

I had inbuilt processes that would sweep through jobs that were assigned (had a job signature) and weren't marked as complete, it actioned those to handle the condition of a crashed worker.

There are many many other services the proper job queues offer, but that usually means more dependencies, and code libraries / containers, so just build in the functionality you need.

If it is accurate, fast enough, and stable, you've got the best solution for you.
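The signature-claim loop above can be sketched like this (SQLite and a two-row table stand in for the real setup just so it runs; schema and names are made up for illustration):

```python
import secrets
import sqlite3
import string

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE jobs (
    id INTEGER PRIMARY KEY, payload TEXT, signature TEXT, created_at INTEGER)""")
db.executemany("INSERT INTO jobs (payload, created_at) VALUES (?, ?)",
               [("job-a", 1), ("job-b", 2)])
db.commit()

ALPHABET = string.ascii_letters + string.digits

def fetch_job(conn, attempts=10):
    for _ in range(attempts):
        # 20-character random signature for this claim attempt.
        sig = "".join(secrets.choice(ALPHABET) for _ in range(20))
        # Stamp our signature on the oldest unsigned job.
        conn.execute("""UPDATE jobs SET signature = ?
                        WHERE id = (SELECT id FROM jobs
                                    WHERE signature IS NULL
                                    ORDER BY created_at LIMIT 1)""", (sig,))
        conn.commit()
        # Read back and verify the signature is really ours.
        row = conn.execute("SELECT id, payload FROM jobs WHERE signature = ?",
                           (sig,)).fetchone()
        if row:
            return row
    return None  # no jobs left, or collision/error threshold reached

print(fetch_job(db))  # -> (1, 'job-a')
```

Each call hands out the oldest unclaimed job; once the table is drained, callers get None and back off, as described above.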

/edited for formatting

valzam

The reason you want to use SKIP LOCKED is so that Postgres can automatically skip rows that are concurrently being accessed to update the "status". You're right that if you update a "status" field you don't really need to worry about advisory locks and skipping locked rows, but it still helps with performance if you have a decent number of concurrent consumers polling the table.

samus

I recently got introduced to this system at work, and also built a new job using it. It works fine, but since I had to implement work stealing to deal with abandoned jobs in a timely manner, I wouldn't dare to use it for actions that absolutely must not happen twice.

A1kmm

Exactly-once is only meaningfully possible if you have a rollback for tasks of unknown completion state - for example if the task involves manipulating the same database as the one controlling the task execution. Otherwise, it becomes the (impossible to solve) two-generals problem between updating the task status and performing the task.

mbb70

I've done the same with MongoDB with findOneAndModify, simple and solid

ttfkam

I think you mean findOneAndUpdate, and while simple, I wouldn't call it solid.

https://stackoverflow.com/a/76821755

dminor

Agenda uses this, and we found out the hard way on mongo 4 that it can lead to mongo spinning the CPU at 100% if it gets too many at once. No idea if they've fixed it in later versions.

klysm

With what transaction isolation level?

bushbaba

You could even use a timestamp to handle the case where a task is never finished by the worker that locked the row.

surprisetalk

I recently published a manifesto and code snippets for exactly this in Postgres!

  delete from task
  where task_id in
  ( select task_id
    from task
    order by random() -- use tablesample for better performance
    for update
    skip locked
    limit 1
  )
  returning task_id, task_type, params::jsonb as params
[1] https://taylor.town/pg-task

thom

Presumably it's okay that this loses work if your task runner has an error?

surprisetalk

If you read my guide, you’ll see that I embed it in a transaction that doesn’t COMMIT until the companion code is complete :)

For example, I run the above query to grab a queued email, send it using mailgun, then COMMIT. Nothing is changed in the DB unless the email is sent.
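A runnable miniature of that commit-after-send flow (SQLite stands in for Postgres, and send_email here is a stand-in that fails on demand; the real version uses the DELETE ... RETURNING query above):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE task (task_id INTEGER PRIMARY KEY, params TEXT)")
db.execute("INSERT INTO task (params) VALUES ('hello')")
db.commit()

def send_email(params, fail):
    # Stand-in for the real mail provider call.
    if fail:
        raise RuntimeError("send failed")

def process_one(conn, fail):
    # Claim (delete) and send inside one transaction: the row only
    # disappears if the send succeeds; on failure the rollback undoes
    # the delete and the task will be retried later.
    try:
        row = conn.execute("SELECT task_id, params FROM task LIMIT 1").fetchone()
        if row:
            conn.execute("DELETE FROM task WHERE task_id = ?", (row[0],))
            send_email(row[1], fail)  # raises -> rollback undoes the delete
        conn.commit()
    except Exception:
        conn.rollback()

process_one(db, fail=True)   # simulated outage
remaining = db.execute("SELECT COUNT(*) FROM task").fetchone()[0]
print(remaining)  # 1: the task survived the failed send
```

Run it again with fail=False and the row is deleted and committed, i.e. nothing changes in the DB unless the send succeeds.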

muti

From the linked article

> The task row will not be deleted if sendEmail fails. The PG transaction will be rolled back. The row and sendEmail will be retried.

Karrot_Kream

In my experience, a queue system is the worst thing to find out isn't scaling properly, because once you discover your queue system can't architecturally scale, there's no easy fix that avoids data loss. You talk about "several thousand background jobs," but queues are generally measured in terms of Little's Law [1], which means talking about rates: namely the average task enqueue rate and the average time each task takes. Raw numbers don't mean that much.

In the beginning you can do a naive UPDATE ... SET, which locks way too much. While you can make your locking more efficient by doing UPDATE with SELECT subqueries for dequeues and SELECT FOR UPDATE SKIP LOCKED, eventually your dequeue queries will throttle each other's locks and your queue will grind to a halt. You can try to disable enqueues at that point to give your DB more breathing room, but you'll have data loss on lost enqueues, and it'll mostly be your dequeues locking each other out.

You can try very quickly to shard out your task tables to avoid locking and that may work but it's brittle to roll out across multiple workers and can result in data loss. You can of course drop a random subset of tasks but this will cause data loss. Any of these options is not only highly stressful in a production scenario but also very hard to recover from without a ground-up rearchitecture.

Is this kind of a nightmare production scenario really worth choosing Boring Technology? Maybe if you have a handful of customers and are confident you'll be working at tens of tasks per second forever. Having been in the hot seat for one of these I will always choose a real queue technology over a database when possible.

[1]: https://en.wikipedia.org/wiki/Little%27s_law

mlyle

> and are confident you'll be working at tens of tasks per second forever.

It's more like a few thousand per second, and enqueues win, not dequeues like you say... on very small hardware without tuning. If you're at tens of tasks per second, you have a whole lot of breathing room: don't build for 100x current requirements.

https://chbussler.medium.com/implementing-queues-in-postgres...

> eventually your dequeue queries will throttle each other's locks

This doesn't really make sense to me. To me, the main problem seems to be that you end up having a lot of snapshots around.

Karrot_Kream

> https://chbussler.medium.com/implementing-queues-in-postgres...

This link shows raw enqueue/dequeue performance only. Factor in workers that perform work or execute remote calls and the numbers change. Also, I find that when your jobs have high variance in duration, performance degrades significantly.

> This doesn't really make sense to me. To me, the main problem seems to be that you end up with having a lot of snapshots around.

The dequeuer needs to know which tasks to "claim", so this requires some form of locking. Eventually this becomes a bottleneck.

> don't build for 100x current requirements

What happens if you get 100x traffic? Popularity spikes can do it, so can attacks. Is the answer to just accept data loss in those situations? Queue systems are super simple to use. I'm counting "NOTIFY/LISTEN" on Postgres as a queue, because it is a queue from the bottom up.

klauserc

If you find yourself in that situation, migrating to a more performant queuing solution is not that much of a leap. You already have an overall system architecture that scales well (async processing with a queue).

_Ideally_ the queuing technology is abstracted from the job-submitters/job-runners anyway. It's a bit more work if multiple services are just writing to the queue table directly.

I agree that the _moment_ the system comes to a screeching halt is definitely not fun.

asdfaoeu

You are going to have the same scaling issues with your datastore. I don't really understand why you say that your dequeue queries will throttle each other's locks and grind to a halt? Isn't that the whole point of SKIP LOCKED?

adatta02

This is more or less how graphile, https://github.com/graphile/worker is implemented.

ricardobeat

That's what's in the article.

somsak2

Fourth paragraph of the post:

>Applied to job records, this feature enables simple queue processing queries, e.g. SELECT * FROM jobs ORDER BY created_at FOR UPDATE SKIP LOCKED LIMIT 1.

Klonoar

I’ve used this for a queue with millions of items and some indexes. It “just works”.

theK

Ditto.

Also, postgres partial indexes can be quite helpful in situations where you want to persist and query intermediate job lifecycle state and don't want multiple rows or tables to track one type of job queue

ritzaco

I often see the "engineers copy FAANG infrastructure because they want to be cool, even though their needs are completely different" take as a kind of attack on engineers.

But I think a lot of it is also about knowledge and documentation. If I want to copy FAANG or another startup, and set up an infinitely scalable queue-based architecture, I can find dozens of high quality guides, tutorials, white papers etc, showing me exactly how to do it. Yes maintenance is higher, but I can get set up with redis, SQS, any of the 'scalable' solutions within a few hours of copy-pasting commands and code and configuration from a reputable source.

If I want to use NOTIFY in postgres? I googled "SQLAlchemy notify listen postgres" and I find a few unanswered Stack Overflow questions and a GitHub gist that has some code but no context.

I would honestly love to use this approach for a side project, but I don't have 2-3 days to figure it out on my own. The direct choice for me might seem to be

* simple, but not scalable (ie just use postgres)

* complex, but scalable (ie redis, sqs, whatever)

and then it's a tradeoff, and the argument goes that I am blinded by cool tech and FAANG and I'm choosing complex but scalable, even though I don't need scalable.

But taking into account guides and other resources, the choice for me is actually

* complex and not scalable (this, because I don't know how to implement it and I can't predict what pitfalls I might face if I try)

* simple and scalable (what everyone actually does)

and that makes the engineer's choice to follow faang look a lot more reasonable.

zozbot234

> If I want to use NOTIFY in postgres?

The nice thing about "boring" tech like Postgres is that it has great documentation. So just peruse https://www.postgresql.org/docs/current/sql-notify.html . No need for google-fu.
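For what it's worth, a minimal LISTEN loop in Python is short. This is a sketch only: it assumes a psycopg2 connection already opened in autocommit mode (wiring not shown), a trusted channel name, and the common convention of JSON payloads; none of these details come from the thread:

```python
import json
import select

def decode_payload(payload: str) -> dict:
    # NOTIFY payloads are plain strings; sending JSON is a common convention.
    return json.loads(payload) if payload else {}

def listen_forever(conn, channel, handle):
    """Minimal LISTEN loop for a psycopg2 connection in autocommit mode.
    `channel` must be a trusted identifier, since it is interpolated."""
    with conn.cursor() as cur:
        cur.execute(f"LISTEN {channel};")
    while True:
        # Block until the connection's socket is readable, up to 60s.
        if select.select([conn], [], [], 60) == ([], [], []):
            continue  # timeout; loop and wait again
        conn.poll()  # ingest whatever arrived on the socket
        while conn.notifies:
            note = conn.notifies.pop(0)
            handle(decode_payload(note.payload))
```

The select/poll/notifies dance follows psycopg2's documented asynchronous-notification pattern; the payload format is the part you get to choose.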

ritzaco

Python, Flask, SQLAlchemy, and Postgres all have great documentation individually, but if I am building an application at the intersection often a guide on exactly how to join them all up is much faster than using each individually and trying to figure out the interactions in four places.

AWS white papers and engineering blogs tend to give me everything I need in one place, and I don't think there are any for apps built with NOTIFY.

spprashant

SQLAlchemy is an extra abstraction blocking your path here. While you probably should still use an ORM for your regular relational queries, you are not gaining anything significant by trying to use SQLAlchemy to implement a queue backend. You can write raw SQL with psycopg2 (which is already a dependency in your project thanks to SQLAlchemy), and wrap that raw queue-management SQL in a nice little Python module which you can later reuse for other applications as well.

randomdata

If SQLAlchemy’s documentation doesn’t explain its use with LISTEN/NOTIFY, perhaps it’s the wrong tool for the job? You are presumably not going to use it with Redis or SQS queues, so why are you so hung up on it here?

sgarland

IMHO - and this probably why I’ll never launch a product - you should understand each piece of your infra. Not necessarily to the metal on each, but I don’t think it’s unreasonable to be able to explain why each piece is necessary, what it’s doing, and how to troubleshoot it when it breaks.

With your mentioned list, three of them are Python, so that significantly reduces the breadth.

valenterry

Scalability comes at a price. Unless you need it, it makes you less flexible. And that is exactly what you don't want to be as a startup.

For instance, if you use postgres with a low load, it is almost trivial to migrate schemas, add new constraints, do analytics etc.

If you use SQS, Cassandra, whatever, then you now get scalability/availability, but it becomes much more time-consuming to change things if you figure out that your original design doesn't work. Say the business comes and says "please add constraint X: all users of type foo must never have the combined value bar at the same time."

It is possible to implement that without Postgres, but it is not easy or simple, especially if you need to make changes.

Therefore, my take is that you either use postgres to stay flexible or you use both postgres and something else on top of it when you know that you won't have to change things. Of course this means additional infrastructure/maintenance overhead.

In the end it's always a trade-off, you just need to know when to trade which thing off against what.

Lutger

This is all true, important and often misunderstood, but beside the point of the comment you're replying to.

There's a (sort of) objective trade-off to be made, but another dimension is how familiar you are with the solution and/or how quick can you implement it using documentation and examples.

If you happen to know exactly how to create a horizontally scalable microservice based hairball with nodejs, then maybe you are quicker with that than with some traditional django monolith using a nicely normalized sql database (or whatever).

In a startup, you are almost always squeezed for time, so making the objectively right tradeoff for your context is usually secondary to the simple question of 'when can you ship?' If the scalable-yet-inflexible stack is what Stack Overflow abundantly recommends and documents, maybe that is quicker to get done now, whatever the consequences are in the longer run.

valenterry

Then maybe I just don't understand your post. To me it sounds like you're saying "FAANG technology" is chosen because of documentation. But I don't think that the documentation of e.g. SQS is better than Postgres's (if you can even compare the two).

If someone says "I choose X over Y because I used X before (or because X is better documented)" then fair enough - but I rarely hear that as an argument when people choose "FAANG technology".

gazpacho

> Scalability comes at a price. Unless you need it, it makes you less flexible. And that is exactly what you don't want to be as a startup.

This is a valid comment. I’ve chosen Postgres in the past for the features, not the performance. For example guaranteed at most once delivery (via row locks) and filtering of jobs based on attributes (it’s a database after all).

acaloiar

> If I want to use NOTIFY in postgres? I googled "SQLALchemy notify listen postgres" and I find a few unanswered stackoverflow questions and a github gist that has some code but no context.

Author here. I would say that my post is less targeted at someone like you (application developer, presumably) and more targeted at library developers.

I don't think it's ideal for everyone to be implementing bespoke, Postgres-backed (or any other queue, for that matter) background job workers in their applications. There's a lot of nuance and implementation detail to get wrong with background jobs, and for that reason I think background work should generally be done by more comprehensive, dedicated libraries or frameworks.

If every Rails application didn't have Sidekiq/Active Job and instead had a bespoke background worker implementation, Rails applications would likely have a much less rosy reputation, on account of their unreliability.

JohnBooty

Thank you for writing this.

I love the article's point, and I tend to feel that chasing the cargo cult of "scale" is maybe the biggest problem I see in development teams today. It is certainly the biggest problem that I rarely hear anybody talking about.

    Author here. I would say that my post is less 
    targeted at someone like you (application developer, 
    presumably) and more targeted at library developers.
I think the article might benefit from clarification on this point.

Reading the HN comments, I see that I'm not the only person who came away with a misunderstanding there.

Again, I 100% love the overall point.

acaloiar

Yep, fair critique. Glad you enjoyed it.

ranting-moth

> Yes maintenance is higher

This is what kills you if you're a small startup. Of course it gives you a lot too. But if you're belly up then it doesn't matter.

Of course go for whatever solution gives you the most benefits while not distracting you too much from your main goal.

I've seen a startup where devs spent around 80% of their time fighting their tools and infrastructure. They had a 3-month runway, and today there's a massive hole at the end of that runway. I still shudder from just the thought of it.

PH95VuimJjqBqy

This line stood out to me too as very dismissive of something that can absolutely bring you to a standstill.

It's not a small thing and it's not something you should be dismissing out of hand.

PH95VuimJjqBqy

> and that makes the engineer's choice to follow faang look a lot more reasonable.

Your argument is that going with FAANG level designs saves time?

And the crux of your argument is that you're able to find a guide online?

I strongly suspect you don't have a healthy respect for complexity.

natmaka

Another point is: you don't need scalable now, but may (or even hope to) need it later, and you know that when you will need it you probably won't have time to invest into migrating this component.

Also: you may think that you may one day want to be hired by a FAANG.

mkl95

How relevant is it to be hired by a FAANG? I have some experience with "web scale" systems, but I tend to reject FAANG recruiters because Leetcode makes me want to become an apple farmer (no pun).

vineyardmike

> How relevant is it to be hired by a FAANG?

If you want a job there, very relevant.

> I tend to reject FAANG recruiters because Leetcode

I understand the pain of leetcode interviews. They’re terrible. But optimizing your career based on the interview process seems… backwards?

FAANG companies (for example) are very relevant if you want to make a lot of money and live in Silicon Valley without being a successful founder/VC. Apple farmers… not so much. If you live in Tokyo, then FAANG companies might be less relevant.

Either way, doesn’t seem like the interview is where you should draw the line.

evantbyrne

Hard disagree. Some things are difficult to change later on, others not so much, and you can't do everything for v1. The product has to launch at some point. Your choice of queue is one of the things you'll be able to change. Don't complicate things unless you've run the numbers and know you'll need to. A lot of very large companies do just fine with using relational databases as queues.

runeks

If your first point holds, then all app components should be “scalable” from the beginning, because you may not have time to make it so later.

And that’s terrible advice, of course. You very likely will have time to scale things up (customer counts almost never increase dramatically from one day to the next), and even if you don’t, you’ll most likely never deliver a usable product if all components need to be “scalable” from the beginning.

natmaka

This is all a matter of balancing constraints. I wrote "you know that when you will need it you probably won't have time to invest into migrating this component." I didn't write "always go for the more scalable".

For starters, "the most scalable component is always the most difficult to integrate and use" isn't true, and neither is "whatever your team knows or doesn't know, the challenges tied to integrating and then exploiting a given component are always the same". There are many parameters. In some contexts, taking into account the team's subjective preferences is crucial.

There is no universal rule, à la "always go for the most scalable, neglecting any other consideration" or "the minimal immediate effort is always the best option".

tmpX7dMeXU

Yeah. Just keep it to side-projects only. Anyone practicing resume-driven development on my team will be (and I’m exaggerating here) shown the door.

natmaka

Decisions are rarely made upon a single criterion, and such a criterion isn't usually formulated explicitly.

_jal

Building things you don't need in hopes that you'll need them because things you aren't spending time on will grow to demand them is like hiring an investment manager when you're in debt.

JohnBooty

I agree with you:

    but I can get set up with redis, SQS, any of the 
    'scalable' solutions within a few hours of copy-pasting 
    commands and code and configuration from a reputable 
    source

    [...] and that makes the engineer's choice to follow faang 
    look a lot more reasonable. 
I also agree with the linked article's overall point, but I think the specific "job queue" example from the article is actually a bad example because:

- "rolling your own" job queue is not rocket science but is nontrivial and easy to get wrong w.r.t. locking etc.

- the argument against taking additional dependencies is that now you have one more tool to master, understand, and manage. but my experience is that job queues like Sidekiq are not a significant overhead in terms of developer burden.

edanm

That's a great point, that often people misunderstand.

It's even worse than you say, though. As someone who has used neither Postgres nor Redis for queueing, how am I supposed to know which is the "simple" solution here and whether it really solves my problem?

Almost everyone uses solution X. A few people are saying "no, just use solution Y, it's obviously enough and far simpler". Even if it is far simpler, how am I supposed to know whether there are some hidden gotchas here?

Much safer to bet on technology that is proven to work, given that large numbers of people are using it in production for this purpose.

jpgvm

A few things.

1. The main downside to using PostgreSQL as a pub/sub bus with LISTEN/NOTIFY is that LISTEN is a session feature, making it incompatible with statement-level connection pooling.

2. If you are going to do this use advisory locks [0]. Other forms of explicit locking put more pressure on the database while advisory locks are deliberately very lightweight.

My favorite example implementation is que [1] which is ported to several languages.

[0] https://www.postgresql.org/docs/current/explicit-locking.htm...

[1] https://github.com/que-rb/que
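A minimal sketch of the claim/complete cycle being discussed, with SQLite standing in for Postgres so the example is self-contained. On Postgres the claim step would instead be `SELECT ... FOR UPDATE SKIP LOCKED` (or a `pg_try_advisory_lock()` call, as suggested above); the table and column names here are made up:

```python
import sqlite3

# In-memory stand-in for the jobs table.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT, "
    "status TEXT DEFAULT 'pending')"
)
conn.execute("INSERT INTO jobs (payload) VALUES ('a'), ('b')")

def claim_job(conn):
    """Atomically claim the oldest pending job, or return None."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    row = conn.execute(
        "SELECT id, payload FROM jobs WHERE status = 'pending' "
        "ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        conn.execute("COMMIT")
        return None
    # On Postgres this SELECT-then-UPDATE pair collapses into a single
    # SELECT ... FOR UPDATE SKIP LOCKED claim.
    conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
    conn.execute("COMMIT")
    return row

job = claim_job(conn)
print(job)  # (1, 'a')
```

Each worker runs `claim_job` in a loop; because the claim is a single transaction, two workers can never grab the same row.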

MrPowerGamerBR

One reason I dislike NOTIFY/LISTEN is that issues with it are hard to diagnose.

Recently I had to stop using it because, after a while, all NOTIFY/LISTENs would stop working, and only a database restart would fix the issue: https://dba.stackexchange.com/questions/325104/error-could-n...

runeks

At my previous company, they switched from using NOTIFY/LISTEN for Postgres notifications to a custom solution built on top of logical replication. As I understand it, part of the reason was reliability. I never touched that part of the code, but I believe the idea was to subscribe to logical replication updates and send out notifications based on these.

cpursley

This is what we do. Even put together a library for it:

https://github.com/cpursley/walex

teaearlgraycold

Is there a Node version of Que?

mianos

Skype used Postgres as a queue, with a small plugin, to process all their CDRs many years ago. I have no idea if it's used these days, but it was 'web scale' 10 years ago. It just worked, while people on the internet argued that using a database as a queue is an anti-pattern.

Having transactions is quite handy.

https://wiki.postgresql.org/wiki/SkyTools

I did a few talks on this at Sydpy, as I used it at work quite a bit. It's handy when you already have PostgreSQL running well and supported.

This said, I'd use a dedicated queue these days. Anything but RabbitMQ.

reubenbond

> Anything but RabbitMQ.

Would you mind elaborating on this? I'd be happy for others to chime in with their experiences/opinions, too.

mianos

The software works excellently in a development environment and performs well when running as a single instance. However, I encountered issues when scaling it up for high availability in a clustered setup. The system would fail inconsistently, with two masters consuming messages simultaneously, which wasn't ideal for my use case. Eventually, I switched to Kafka and haven't revisited the original solution since.

It's worth noting that these issues might have been due to my improper configuration. Nevertheless, if the configuration process is fraught with pitfalls, that's problematic in itself. I've had these experiences more than once.

Additionally, I found a critical race condition in the Python library, rendering it practically unusable for me. I submitted a bug report with a minimal example demonstrating the issue. I considered fixing it myself, but since using RabbitMQ wasn't crucial for my project, I switched to ZeroMQ, which didn't require a broker. The issue was acknowledged and fixed about a year later. At the time, I had to assume that nobody else was using the Python bindings.

Three years ago, I worked on a project that used the software for a Celery queue. Messages would occasionally go missing, although this could have been a configuration issue on our part. Ultimately, we replaced it with a Redis queue (not the best practice, I admit) and didn't look back. This was for a lower-availability use case where a single instance of Redis sufficed.

boyter

I used RabbitMQ for a while and had nothing but problems.

Admittedly, I probably shouldn't have used it the way I did. I dumped many millions of tasks into it, then fanned out processes pulling from that queue, with tasks taking a variable amount of time to run. Some ran in seconds, some in hours.

I had picked RabbitMQ because I wanted that queue to be durable and to survive workers dying or being restarted. However, long-lived tasks like this are not really what it was designed for (in my opinion). I kept running into issues where it would take a long time to restart, or would stop answering connections and need a restart to continue. I ended up having to write monitoring code to check for this and handle it, just to make it slightly reliable.

I'm sure it works well for smaller, short-lived messages, but considering the issues I bumped into, I would be hesitant to try it again. I'd probably reach for Redis first, with wrappers allowing me to swap in any other queue as required.

phamilton

I can share our experience with RabbitMQ/SQS/Sidekiq. Our two major issues have been around the retry mechanism and resource bottlenecks.

The key retry problem is "What happens when a worker crashes?".

RabbitMQ solves this problem by tying "unacknowledged messages" to a TCP connection. If the connection dies, the in-flight messages are made available to other connections. This is a decent approach, but we hit a lot of issues with bugs in our code that would fail to acknowledge a message, and the message would get stuck until that handler cycled. They've improved this over the past year or so with consumer timeouts, but we've already moved on.

The second problem we hit with RabbitMQ was that it uses one Erlang process per queue, and we found that big bursts of traffic could saturate a single CPU. There are ways to use sharded queues or re-architect around dynamically created queues, but the complexity led us towards SQS.

Sidekiq solves "What happens when a worker crashes?" by just not solving it. In the free version, those jobs are just lost. In Sidekiq Pro there are features that provide some guarantees that the jobs will not be lost, but no guarantees about when they will be processed (nor where they will be processed). Simply put, some worker sometime will see the orphaned job and decide to give it another shot. It's not super common, but it is worse in containerized environments where memory limits can trigger the OOM killer and cause a worker to die immediately.

The other issue with Sidekiq has been a general lack of hard constraints around resources. A single event thread in Redis means that when things go sideways, it breaks everything. We've had errant jobs enqueued with 100 MB of JSON and seen it jam things up badly when Sidekiq tries to parse that with a Lua script (on the event thread). While it's obvious that 100 MB is too big to shove into a queue, mistakes happen, and tools that limit the blast radius add a lot of value.

We've been leaning heavily on SQS the past few years and it is indeed Simple. It blocks us from doing even marginally dumb things (max message size of 256 KB). The visibility timeout approach for handling crashed workers is easy to reason about. DLQ tooling has finally improved, so you can redrive through standard AWS tools. There are some gaps we struggle with (e.g. firing callbacks when a set of messages is fully processed), but sometimes simple tools force you to simplify things on your end, and that ends up being a good thing.
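A toy model of that visibility-timeout approach (this is not SQS's actual implementation; the class, names, and numbers are invented for illustration): a received message becomes invisible until a deadline, and a worker that crashes without deleting it simply lets the deadline lapse, after which the message is redelivered.

```python
import heapq
import itertools

class VisibilityQueue:
    """Toy in-memory model of a visibility-timeout queue."""

    def __init__(self, visibility_timeout):
        self.timeout = visibility_timeout
        self.visible = []    # heap of (ready_time, seq, body)
        self.in_flight = {}  # receipt -> (redelivery_deadline, body)
        self.seq = itertools.count()

    def send(self, body, now=0):
        heapq.heappush(self.visible, (now, next(self.seq), body))

    def receive(self, now):
        self._requeue_expired(now)
        if not self.visible or self.visible[0][0] > now:
            return None
        _, seq, body = heapq.heappop(self.visible)
        receipt = f"r{seq}"
        self.in_flight[receipt] = (now + self.timeout, body)
        return receipt, body

    def delete(self, receipt):
        # Acknowledge: the message is gone for good.
        self.in_flight.pop(receipt, None)

    def _requeue_expired(self, now):
        # A crashed worker never calls delete(); once the deadline
        # passes, the message becomes visible again.
        for receipt, (deadline, body) in list(self.in_flight.items()):
            if deadline <= now:
                del self.in_flight[receipt]
                self.send(body, now)

q = VisibilityQueue(visibility_timeout=30)
q.send("job-1")
receipt, body = q.receive(now=0)   # a worker takes the message...
# ...and crashes without calling q.delete(receipt).
assert q.receive(now=10) is None   # still invisible mid-timeout
redelivered = q.receive(now=31)    # visible again after the deadline
print(redelivered[1])  # job-1
```

The appeal is that crash recovery falls out of one number (the timeout) instead of connection-tracking machinery.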

imiric

That's very insightful, thanks for sharing.

Do you have any experience with NATS, and how would you compare it to RMQ/SQS?

The authors claim it guarantees exactly-once delivery with its JetStream component, and it looks very alluring from the documentation, but looks can be deceiving.

chucke

SQS limits you further in other ways. For instance, scheduled tasks are capped at 15 minutes (the DelaySeconds knob), so you'll be stuck when implementing the "cancel account if not verified in 7 days" workflow. You'll either re-enqueue a message every 15 minutes until it's ready (and eat the SQS costs), or build a bespoke solution just for scheduled tasks using some other store (usually the database) and another polling loop (at a fraction of the quality of any other OSS tool). This is a problem well solved by Sidekiq, despite the other drawbacks you mention.

Bottom line, there is no silver bullet.

pqdbr

> Sidekiq solves "What happens when a worker crashes?" by just not solving it. In the free version, those jobs are just lost.

I've been using Sidekiq for 11+ years in production and I've never seen this happen. Sidekiq (free version) has a very robust retry workflow. What are you talking about here?

hn_throwaway_99

> This said, I'd use a dedicated queue these days.

I agree, the primary reason being that if you're in the cloud (this applies to a lot of people, but obviously not everyone), all the cloud providers have extremely easy-to-use, cheap, hosted queueing tech. Even if you're worried about vendor lock-in, queueing primitives are so small (basically push and pop) that it's relatively easy to write things in a way that makes migrating easy if necessary.

klysm

Is it cheap if you're already using Postgres, though?

hn_throwaway_99

I commented elsewhere, but in most cases I think it would be a bad idea to host your queue tables and logic in the same instance that hosts your primary data. Thus, if you spin up another PG instance in the cloud, it could very well end up costing you more than a default cloud-hosted queue service.

justinclift

> Anything but RabbitMQ.

Bad experiences? If so, was it the tech itself or surrounding stuff? (aka maybe docs, maybe community, etc)

lmm

> Skype used postgres as queue with a small plugin to process all their CDR many years ago. I have no idea if it used these days but it was 'web scale', 10 years ago. Just working, while people on the internet argued about using a database as a queue is an anti-pattern.

It works great until it doesn't, and the way it breaks puts you in a state that's very difficult to recover from. And if your excuse for using a database as a queue was that you were already running a database, that cuts both ways: congratulations, your queue mess has now brought down your primary datastore too.

mianos

> It works great until it doesn't, and the way it breaks puts you in a state that's very difficult to recover from.

This is similar to saying, 'if I mess up all the tables in one database I wreck the rest'. Just my opinion, but this is not actually a thing in databases.

Maybe the performance of one database could be compromised by another loading things up? I think databases are developed with this as an important consideration. I can't say I have seen it, but I admit it's a possibility.

Also, if you run one Postgres, you won't have an issue running another, provided you have the experience in production.

lmm

> This is similar to saying, 'if I mess up all the tables in one database I wreck the rest'. Just my opinion, but this is not actually a thing in databases.

If you mess up the tables in one database it doesn't affect others, but if you lock up the server where it can't respond to queries, that affects every database running on that server.

> Also, if you run one postgres, you won't have of an issue running another if you have the experience in production.

We're talking about using a different feature that you presumably haven't used before, so you won't necessarily know about the admin side of that.

dventimi

I wonder if it would be possible to have more than just one PostgreSQL database.

lmm

It is, but it comes with a lot of the same costs as having a PostgreSQL database and a proper queue system.

hn_throwaway_99

> congratulations, your queue mess has now brought down your primary datastore too.

Just don't put your queue tables/logic in the same DB instance as your datastore. There are still a lot of benefits to using the same tech even if you have segregated instances.

5id

One of the biggest benefits imo of using Postgres as your application queue, is that any async work you schedule benefits from transactionality.

That is, say you have a relatively complex backend mutation that needs to schedule some async work (e.g. sending an email after signup). With a Postgres queue, if you insert the job to send the email and then, in a later part of the transaction, something fails and the transaction rolls back, the email is never queued to be sent.
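A minimal sketch of this transactional-enqueue pattern, using SQLite in place of Postgres so it is self-contained (the tables and the `signup` helper are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
conn.execute("CREATE TABLE email_jobs (id INTEGER PRIMARY KEY, user_email TEXT)")

def signup(conn, email, fail_late=False):
    """Create the user and enqueue their welcome email atomically."""
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        conn.execute("INSERT INTO email_jobs (user_email) VALUES (?)", (email,))
        if fail_late:  # simulate a later step of the mutation failing
            raise RuntimeError("validation failed after enqueue")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # the queued email vanishes with the user row
        raise

signup(conn, "ok@example.com")
try:
    signup(conn, "bad@example.com", fail_late=True)
except RuntimeError:
    pass

# Only the successful signup left a job behind.
print(conn.execute("SELECT COUNT(*) FROM email_jobs").fetchone()[0])  # 1
```

With an external broker, the second signup would have enqueued an email for a user who never came into existence.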

theptip

Worth being clear that bridging to another non-idempotent system necessarily requires you to pick at-least-once or at-most-once semantics. So for emails, if you fail while awaiting confirmation that your email was sent, you still need to pick between failing your transaction and potentially duplicating the email, or continuing and potentially dropping it.

The big advantage is for code paths which asynchronously modify your DB; these can be done fully transactionally with exactly-once semantics, since the job consumption and the DB update are in the same transaction.

skybrian

Email might never arrive, though. The only way to know they got it is to have them follow a link to confirm.

collinvandyck76

That's kind of missing the parent's point. If you wanted to ensure emails arrive, that sounds like another queue that could be backed by a different table that is also produced into as part of the original transaction.

laurencerowe

> One of the biggest benefits imo of using Postgres as your application queue, is that any async work you schedule benefits from transactionality.

This is a really important point. I often end up using a combination of Postgres and SQS since SQS makes it easy to autoscale the job processing cluster.

In Postgres I have a transaction log table that includes columns for triggered events and the pg_current_xact_id() for the transaction. (You can also use the built in xmin of the row but then you have to worry about transaction wrap around.) Inserting into this row triggers a NOTIFY.

A background process runs in a loop. It selects all rows in the transaction table with a transaction id between the last run's xmin and the current pg_snapshot_xmin(pg_current_snapshot()), maps those events to jobs and submits them to SQS, records the xmin, then LISTENs to await the next NOTIFY.

matsemann

Good point. We alleviate that a bit by scheduling our queue adds to not run until after commit. But then we still have some unsafety, and if the connection to Rabbit is down, we're in trouble.

robertlagrant

I agree - having to tell a database that something was processed, and fire off a message into RabbitMQ, say, is never 100% transactional. This would be my top reason to use this approach.

> With a Postgres queue, if you insert the job to send the email and then in a later part of the transaction, something fails and the transaction rollbacks, the email is never queued to be sent.

This is true, and it's definitely worth isolating what should be totally separate database code into different transactions. On the other hand, if your user is not created in the DB, you might not want to send your signup email. It just depends on the situation.

ryanbrunner

Another benefit of this is that you're guaranteed the transaction has completed before the job is picked up. With Redis-backed queues (or really anything else), you very quickly run into the situation where your queue executes a job that depends on a database record existing before the transaction has been committed (and the fix for this is usually awkward, complex code).

__jem

I'm not sure this is really an issue with transactionality, as a single request can obviously be split up into multiple transactions. Rather, even if you correctly flag the email as pending/errored, you either need to process those manually or have some other kind of background task that looks for them, at which point, why not just process them asynchronously?

vb-8448

> With a Postgres queue, if you insert the job to send the email and then in a later part of the transaction, something fails and the transaction rollbacks, the email is never queued to be sent.

An option could be to use a second connection and a separate transaction to insert data into the queue table.

Ozzie_osman

One thing I love about Kafka is... It's just an append-only log, and a client is essentially just holding an offset. This is conceptually very simple to reason about. It's also persistent and pretty fault-tolerant (you can just go back and read any offset).

Unfortunately, Kafka carries with it enough complexity (due to the distributed nature) that it ends up not being worth it for most use-cases.

Personally I'd love something similar that's easier to operate. You'd probably be able to handle hundreds (if not thousands) of events per second on a single node, and without distributed complexity it'd be really nice.

And yes, in theory you could still use postgres for this (and just never delete rows). And maybe that's the answer.

valzam

Considering that you have a native "offset" (the auto-incrementing id) and the ability to partition by date, I would say Postgres is a great candidate for a simple Kafka replacement. It will also be significantly simpler to set up consumers if you don't really need the whole consumer-group and partition functionality.
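A sketch of that idea, with SQLite standing in for Postgres and the auto-incrementing id playing the role of the Kafka offset (the table and the `poll` helper are illustrative, not from any particular library):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY AUTOINCREMENT, event TEXT)")
for e in ("created", "updated", "deleted"):
    conn.execute("INSERT INTO log (event) VALUES (?)", (e,))
conn.commit()

def poll(conn, offset, limit=10):
    """Return events after `offset` plus the new offset, Kafka-style."""
    rows = conn.execute(
        "SELECT id, event FROM log WHERE id > ? ORDER BY id LIMIT ?",
        (offset, limit),
    ).fetchall()
    new_offset = rows[-1][0] if rows else offset
    return [r[1] for r in rows], new_offset

events, offset = poll(conn, 0)
print(events, offset)  # ['created', 'updated', 'deleted'] 3
events, offset = poll(conn, offset)
print(events, offset)  # [] 3 -- consumer is caught up
```

Each consumer persists only its own offset, which is what makes the model so easy to reason about.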

maxbond

Unfortunately `serial` is not sufficient on its own for that use case. If you observe the values 1, 2, and 4, you can't actually conclude whether 3 exists or not. That transaction may have failed after incrementing the serial, meaning 3 does not exist, or it may be an ongoing transaction which hasn't yet committed, meaning 3 exists but is not yet visible to you.

So if you update your offset to 4 before the transaction for 3 commits, you'll lose 3 forever (unless you control for this, e.g. by periodically detecting and re-queueing orphaned jobs, or by using a strictly serial, gap-free integer rather than the built-in auto-incrementing type).
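A pure-Python simulation of this hazard, with a dict standing in for the rows a reader can see (ids are handed out at INSERT time, but rows only become visible at COMMIT time):

```python
import itertools

serial = itertools.count(1)
committed = {}  # id -> payload, i.e. what a reader can currently see

# Two producers open transactions; each grabs the next serial value.
tx_a = (next(serial), "job-a")  # gets id 1
tx_b = (next(serial), "job-b")  # gets id 2

# tx_b commits first, while tx_a is still open.
committed[tx_b[0]] = tx_b[1]

# The consumer polls: it sees only id 2 and advances its offset to 2.
offset = max(committed)

# Now tx_a commits, with id 1 -- already behind the consumer's offset.
committed[tx_a[0]] = tx_a[1]

# The consumer asks for "everything after my offset" and gets nothing:
new_events = [committed[i] for i in sorted(committed) if i > offset]
print(new_events)  # [] -- "job-a" is committed but will never be delivered
```

The fixes mentioned above (orphan sweeps, or a strictly gap-free counter) all amount to not trusting "max id seen" as a high-water mark.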

sgarland

SERIAL/SEQUENCE/IDENTITY columns increment immediately, not at commit; they're just reading from a generator. You can also get the current value from it if you'd like.

Advisory locks also exist, if you want to implement logic in the application to inform you of various row conditions without having the DB care about it.

But for the example given, you could do many things:

* Add some boolean columns for ack and complete. Performance due to cardinality of these will eventually start to suck if they’re being indexed, but by that point (millions of rows) you can have thought of another solution.

* Add ctime and atime columns, each of which can be handled by Postgres natively to update when created / written, respectively. This has the advantage of lending itself nicely to partitioning by date range, if that becomes necessary.

* Have three tables - available, in_progress, and completed. Use triggers or application logic to move entries in an atomic manner.

None of this is necessarily normalized, but if you’re receiving JSON payloads for the job, 1NF went out the window anyway.
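A sketch of the third option above, the atomic move between tables, with SQLite in place of Postgres (a trigger could do the same server-side; the schema is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE available   (id INTEGER PRIMARY KEY, payload TEXT);
CREATE TABLE in_progress (id INTEGER PRIMARY KEY, payload TEXT);
INSERT INTO available VALUES (1, 'resize-image');
""")

def take_next(conn):
    """Move the oldest available job to in_progress in one transaction."""
    conn.execute("BEGIN IMMEDIATE")
    row = conn.execute(
        "SELECT id, payload FROM available ORDER BY id LIMIT 1"
    ).fetchone()
    if row:
        # Delete + insert commit together, so the job is always in
        # exactly one of the two tables.
        conn.execute("DELETE FROM available WHERE id = ?", (row[0],))
        conn.execute("INSERT INTO in_progress VALUES (?, ?)", row)
    conn.execute("COMMIT")
    return row

print(take_next(conn))  # (1, 'resize-image')
```

A side benefit is that the hot `available` table stays tiny regardless of how much completed history accumulates elsewhere.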

chalcolithic

I used to hope that FasterLog would eventually be mature enough.

rubenfiszel

We use exactly this for Windmill (an OSS Retool alternative plus a modern Airflow) and run benchmarks every day. On a modest GitHub CI instance where one Windmill worker and Postgres run as containers, our benchmarks run at 1,200 jobs/s. Workers can be added, and it will scale gracefully up to 5,000 jobs/s. We are exploring using Citus to cross the 5,000 jobs/s barrier on our multi-tenant instance.

https://github.com/windmill-labs/windmill/tree/benchmarks

vhiremath4

We used Postgres for some of our queues back when we were at ~10 msg/s. It scaled quite a bit, but honestly, setting up SQS or some other queue stack in AWS, GCP, or Azure is so simple and purpose-built for the task (with DL queues and the like built in) that I don't know why you wouldn't just go that route and not have to worry about that system shitting the bed and affecting the rest of the DB's health.

It seems foolish. I am a big fan of "use the dumbest tool", but sometimes engineers take it too far, and you're left with the dumbest tool, with caveats that don't seem worth it given that the mainstream alternative is relatively cheap and simple.

rtpg

What I've settled on is "store most job state in the DB, use task queues just to poke workers into working on the jobs".

Storing the job state in the DB means you can query that state nicely. It won't show the exact state of things at any instant, but it's helpful for working through a live incident (especially since most job queues just delete records as work is processed).

And if you make all the background tasks idempotent anyway, then you're almost always safe running a thing like "send a job to the task queue to handle this job".

If you rely _just_ on message queues, there are a lot of times where you can have performance issues yet have a lot of trouble knowing what's going on (for example, RabbitMQ might tell you the size of your queues, but offers little to no inspection of the data inside them).
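A sketch of that "poke" pattern: job state lives in the DB (a dict here, for illustration), and the queue message just names the job, so redelivering the same poke is harmless. All names are invented:

```python
# Authoritative job state, as it would live in the database.
jobs = {"job-42": {"state": "pending", "result": None}}

def handle_poke(job_id):
    """Process a queue message that merely points at a job row."""
    job = jobs.get(job_id)
    if job is None or job["state"] == "done":
        return  # duplicate or stale poke: nothing to do
    job["result"] = 7 * 6  # the actual work goes here
    job["state"] = "done"

handle_poke("job-42")
handle_poke("job-42")  # duplicate delivery is a no-op
print(jobs["job-42"])  # {'state': 'done', 'result': 42}
```

Because the handler checks authoritative state before working, at-least-once delivery from the broker degrades gracefully into effectively-once processing.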

vhiremath4

Ultimately you have to figure out the separation of concerns between the job state and other core state, ranging from "all state stored in the message, and it will never become out of sync" to "no state stored in the message, and it will never become out of sync". In between, you have "some state stored in the db and some in the message", which can drift. What I've found useful is keeping the stuff that needs high end-state integrity in the db (or, as you said, just making sure jobs are cancellable/idempotent).

Tangible example:

We have a video transcoder queue. The state of the video model in our db can change as the video is being finalized in various ways. The transcoder generates thumbnails and assets from the video and also updates its state in the db. So we store job information in the message about what thumbnails we want to generate and the video ID but nothing else. This allows us to look up the video row, see if the same media was already transcoded from the video (and cancel the job), and, if not, run the job and update the video row.

Also (and I know you're not saying this), I've never understood the argument that keeping queues in Postgres leads to higher data integrity via transaction guarantees. The job still runs in another process outside the db. The only time this could be true is if the job itself mostly updates state in the db, which is the small minority of queued workloads (the majority needing to do non-db compute work).

bicijay

Transactions and data consistency. That is the answer you will not find in SQS.

mjb

You can't do better than at-least-once if you have side effects outside the database, so it's not clear that SQS's weaker semantics have any practical effect.

Rapzid

> I don’t know why you wouldn’t just go that route and not have to worry about that system shitting the bed

Because different software has different requirements. Not having an external service requirement other than Postgres might be a feature of an on-prem/b2b appliance.

Because some software may be projected to never outgrow the capabilities of Postgres, and if it does, moving to another service can be made very easy.

Because you want a transactional job system and the simplicity of doing it in Postgres.

scraplab

Yep, we process hundreds of thousands and sometimes a few million jobs daily inside Postgres, using Oban in Elixir.

Having transactional semantics around background jobs is incredibly convenient for things like scheduling email only if the transaction is successful, and so on.

You do need to do a little bit of autovacuum tuning, but once sorted it’s been great for us.

dools

I'm always surprised that when I see people talk about queues I never see anyone mention beanstalkd. I've been using it for basically everything for 10 years and it's solid as a rock, incredibly simple and requires basically no maintenance. It Just Works™

arthurbrown

I've used beanstalkd for personal projects previously, with a similar experience to yours.

For one such project, the message 'priority' feature was a life saver and a feature that is not super common in competing solutions.

mixmastamyk

Yes, I liked it when I encountered it a few years back. Lately I've been using Redis and RQ on a project, since we're already using Redis for caching. Is there much difference other than that?

MattIPv4

We process around 1 million events a day using a queue like this in Postgres, and have processed over 400 million events since the system it's used in went live. The only issue we've had was slow queries due to the table size, as we keep an archive of all the events processed, but scheduled vacuums every so often kept that under control.

andrewstuart

Partial indexes might help.

runeks

Also: an ordered index that matches the ordering clause in your job-grabbing query. This is useful if you have lots of pending jobs.

emilsedgh

Exactly. A partial index should make things fly here.
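As a concrete (hypothetical) example combining both suggestions, a partial index ordered to match the job-grabbing query. SQLite is used here so the snippet runs anywhere; the same DDL is valid on Postgres, and the table and column names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, run_at REAL);
-- Partial index: covers only the rows workers actually scan,
-- pre-sorted to match the ORDER BY of the claim query.
CREATE INDEX jobs_pending_run_at ON jobs (run_at) WHERE status = 'pending';
""")

# The claim query's WHERE clause matches the index predicate, so the
# planner can serve it from the (small) partial index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM jobs "
    "WHERE status = 'pending' ORDER BY run_at LIMIT 1"
).fetchall()
print(plan)
```

The point of the partial predicate is that the index only ever contains pending rows, so its size tracks the backlog rather than the full job history.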

djbusby

An active queue table, and then archive jobs to a JobDone table? I do that. The queue table stays small, but the archive goes back many months.

paulddraper

In modern PG you can use a partitioned table for a similar effect.

MattIPv4

We just have a single table, with a column indicating if the job has been taken by a worker or not. Probably could get a bit more performance out of it by splitting into two tables, but it works as it is for now.
