
Tricking PostgreSQL into using an insane, but faster, query plan

srcreigh

The curious part is why postgres can't estimate that the size of the active runs<>stacks join will be very small.

A core part of query planning is estimating the best join order using a dynamic programming algorithm. So roughly, A(1M) <> B(1M) <> C(10) should use the join order A-(B-C), not (A-B)-C.

I bet it's something like: Postgres doesn't know the correlation between runs_other.worker_id and runs_other.stack_id. It seems like it's seeing a low number of distinct runs_other.worker_id values, then estimating that the stacks, numbering in the millions, are split evenly among that small number of runs.

(Why it wouldn't know stacks.id correlates to one stack each, idk. Is stacks.id not the primary key? Is there a foreign key from runs? Very curious.)

What happens if you follow this guide to hint Postgres as to which columns are statistically related? [0]

Hinting PG's estimation stats may be a more persistent solution. Another core part of query planning is query rewriting: if they add a rule to simplify COUNT(SELECT ...) > 0 into normal joins (which are equivalent, I think?) then your trick may stop working.

[0]: https://www.postgresql.org/docs/12/multivariate-statistics-e...
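For reference, the guide above boils down to the CREATE STATISTICS command. A sketch of what that might look like here, assuming the runs table and column names from the article (the statistics object name is arbitrary):

```sql
-- Tell the planner that worker_id and stack_id are correlated,
-- so it stops multiplying their selectivities as if independent.
CREATE STATISTICS runs_worker_stack_dep (dependencies)
    ON worker_id, stack_id
    FROM runs;

ANALYZE runs;  -- populate the new statistics object
```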

cube2222

To be fair, we're using Aurora Serverless, so I'll have to dig deeper if we can use these kinds of hints. But thanks for the tip, this could be really useful!

mst

Given that Aurora talks about itself as a replacement for pg/mysql that provides compatible skins, I'm honestly wondering whether you're dealing with the pg optimiser here in any meaningful way.

(Exactly how far along the scale from 'tweaked version' to 'basically just a skin on a completely new database' Aurora sits is ... opaque to me ... there's probably something out there that answers that, but I haven't yet managed to find it.)

Edited to add: zorgmonkey downthread points out they have support for some extensions like pg_hint_plan - https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide... - which suggests at least a decent amount of pg code but I'm still curious exactly where the changes start.

cube2222

I obviously don't know its exact internal architecture, but based on my digging and trying out different PostgreSQL-native features, it looks like it's PostgreSQL instances as query executors with a custom (high-availability) storage layer, fronted by PgBouncer.

SigmundA

PostgreSQL really needs hints. Sometimes I, as the developer, know better than the optimizer, and having to trick the optimizer into doing what I want is a ridiculous situation.

Also, having the optimizer decide to do something bad off-hours is not a good situation; again, hints or plan locking would help here.

quietbritishjim

Although it's absolute sacrilege to many, I would honestly support a really explicit syntax to compose particular tables and indices. Not even on top of a regular SQL query; I mean its own first-class language. E.g. "Take this index, restrict it to this subrange; for each row, pick a row using this other index on the foreign key" (but it wouldn't have to be imperative like I've phrased it).

There are times when I already know how I want the query to execute, and I have to iteratively fiddle with the query to indirectly trick the planner into doing that. There's just no excuse for that arrangement. Even if I've made a mistake and my plan couldn't work efficiently – at least let me execute it inefficiently so I can discover and debug that.
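The closest thing stock PostgreSQL offers for this today is the set of enable_* planner settings, which nudge (rather than dictate) plan choice for a session or transaction. A rough sketch, reusing the runs table from the article as a stand-in:

```sql
BEGIN;
-- Discourage specific plan shapes for this transaction only.
-- These are cost-based nudges, not hard guarantees.
SET LOCAL enable_nestloop = off;
SET LOCAL enable_seqscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM runs
WHERE worker_id IS NULL;

COMMIT;
```

This is also a handy way to test alternative plan hypotheses during debugging, even if it's far from the explicit plan language proposed above.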

derefr

Or, to put that another way: Postgres's query planner should be emitting an explicit bytecode that is then interpreted by a query-execution abstract machine. That same layer should also be exposed directly to clients over the network, allowing pre-planned query bytecode to be submitted directly in place of a textual query.

I've heard the rebuttal to this idea before: that even minor DB schema/configuration changes that should be "invisible" to the application (e.g. an index being created — or even refreshed — on the table) would result in the generated bytecode being different; and so you'd never be able to ship such bytecode as a static artifact in your client.

But: who said anything about a static artifact? I'm more imagining a workflow like this:

1. user submits a query with leading command "PLAN" (e.g. "PLAN SELECT ...")

2. Postgres returns planned bytecode (as a regular single-column bytea rowset)

3. if desired, the user modifies the received bytecode arbitrarily (leaving alone anything the client doesn't understand — much like how optimization passes leave alone generated intrinsics in LLVM IR)

4. user later submits a low-level query-execution command over the binary wire protocol, supplying the (potentially modified) bytecode

5. if the bytecode's assumptions are still valid, the query executes. Otherwise, the query gives a "must re-plan" error. The client is expected to hold onto the original SQL and re-supply it in this case.

In other words, do it similarly to how Redis's EVAL + EVALSHA works; but instead of opaque handles (SHA hashes), the client receives a white-box internal representation.

Or: do it similarly to how compiling a shader on a game-console with unikernel games + unified RAM/VRAM works. The driver's shader compiler spits you back a buffer full of some GPU object code — which, running in kernel-mode, you're free to modify before running.

jhgb

> Or, to put that another way: Postgres's query planner should be emitting an explicit bytecode that is then interpreted by a query-execution abstract machine. That same layer should also be exposed directly to clients over the network, allowing pre-planned query bytecode to be submitted directly in place of a textual query.

And now you've invented Firebird, or even 1980's Interbase. ;)

mst

I suspect the extra compatibility/dependency surface would make this a non starter in practice but it's an absolutely fascinating thought experiment and I'm glad you wrote it up.

SigmundA

PostgreSQL doesn't even cache its own bytecode, it replans every time!

tshaddox

You can't ship SQL queries and then do arbitrary schema changes either, right?

magicalhippo

There's no technical reason why one couldn't just select or join from an index as if it were a table or a view, is there?

We also write queries knowing it will use a specific index, or we will create an index because of a specific query. And then we have to have a scheduled task to periodically recalculate statistics just so the DB server doesn't get silly ideas.

Of course it could be abused, but I'm in favor of having ways of letting programmers tell computers exactly what to do. Sometimes we really do know best.

SigmundA

Yes, I would too. Over time I have come to appreciate at least being able to control the order of operations; just forcing join order is often enough of a hint to keep things working properly.

Having used LINQ a lot, I would actually prefer more of that kind of chained-statement approach that is more hands-on, without having to explicitly loop.

gfody

Something like SQL Server's "USE PLAN" hint; it really comes in handy when you need it (https://forrestmcdaniel.com/2018/07/19/how-to-use-the-use-pl...)

orthoxerox

ClickHouse doesn't go all the way in that direction, but its approach to SQL support is very similar: it does exactly what you have written. It doesn't rewrite your query in any way: if you want to filter table A before you join it with table B, you write a subquery.

remus

Not that I disagree with your point, but the opposite of

> Also having the optimizer decide to do something bad off hours is not a good situation again hints or plan locking would help here.

Can also happen when you force a particular query plan, only for it to turn to treacle when some assumption you made about the data suddenly breaks.

As with most things there's no free lunch.

grogers

In practice what happens in this case is that the query slows down over time, often days or weeks after deploying the thing that makes the query worse. It takes time to fill up the data that slows things down. This gives you plenty of time to monitor and take action. If it does happen more quickly than that (e.g. if a table starts getting filled for the first time, statistics can be off) then rolling back is easy.

But if the query planner decides to screw up your query plan it happens instantaneously and there's no possibility of rollback - only emergency deploying code fixes to try to tweak the query. In most OLTP use cases, you almost always know exactly how a query is supposed to execute, so always using index hints is totally reasonable to prevent bad query plans.

Basically, you breaking your own query with bad hints usually breaks things a lot less and at better times than the query planner doing it, and is usually easier to fix too.

sagarm

Slow degradation is a common case, but a day with bad skew due to spam or other unusual activity can cause a previously optimal query plan to never complete.

lazide

When doing disaster planning, at least something blowing up because of a data change is preventable or mitigable: staging environments, rollbacks, etc.

The system randomly deciding to drive itself off a cliff for no reason, with no known way to stop it next time is quite concerning.

Arnavion

And if you do find your query has turned to treacle, looking for whether you used a wrong query hint is already in your checklist, because it's right there in the query that you're already debugging. You can't forget to double-check it.

shock-value

It's basically shameful how the maintainers refuse to even entertain the thought of allowing query hints. These kinds of articles and anecdotes come up again and again, yet seemingly no movement whatsoever on the blanket ban on query hints in Postgres. (See https://wiki.postgresql.org/wiki/OptimizerHintsDiscussion)

Postgres generally is great -- so many great features and optimizations and more added all the time. But its query optimizer still messes up, often. It's absolutely not the engineering marvel some would have you believe.

yjftsjthsd-h

> These kinds of articles and anecdotes come up again and again

Don't articles and anecdotes also come up again and again of developers feeding bad hints to the database and cratering their performance?

And that wiki page... Very explicitly isn't a blanket ban? They straight up say that they're willing to consider the idea if somebody wants to say how to do it without the pitfalls of other systems. The only thing they say they're going to ignore outright is people thinking they should have a feature because other databases have it (which seems fair).

erichocean

> people thinking they should have a feature because other databases have it (which seems fair)

Literally NO ONE wants query hinting in Postgres to check some kind of feature box because other databases have it.

We know we want it because…other databases have it, and it's INCREDIBLY USEFUL.

> They straight up say that they're willing to consider the idea if somebody wants to say how to do it without the pitfalls of other systems.

Pitfalls my ass. We want exactly the functionality that is already present in other systems, pitfalls and all. That's just an excuse to do nothing, Apple-style, "because we know better than our own users" while trying to appear reasonable.

It's akin to not adopting SQL until you can do so "while avoiding the pitfalls of SQL." Just utter bullshit.

fragmede

We're software engineers not marketing folk who just want to check a box.

There's a difference between "have a feature just because other databases have it" and "this is a very useful feature that would have its own PostgreSQL-isms and also we got this idea because other databases have it". The query planner isn't infallible, so being able to hint queries to not accidentally use a temporary table that just can't fit in ram isn't just copying a feature "just because everyone else has it".

ghusbands

> they're willing to consider the idea if somebody wants to say how to do it without the pitfalls of other systems

That is basically a blanket ban. Saying you won't implement a widely-implemented feature unless someone comes up with a whole new theory about how to do it better is saying you simply won't implement it. Other databases do well enough with hints, and they do help with some problems.

zmmmmm

Definitely my biggest issue with Postgres.

I have one shameful query where, unable to convince it to execute a subquery that is essentially a constant for most of the rows outside of the hot loop of the main index scan, I pulled it out into a temp table and had the main query select from the temp table instead. Even creating a temp table and indexing it was faster than the plan the Postgres query planner absolutely insisted on. Things like CTEs etc made no difference, it would still come up with the same dumb plan every way I expressed the query.

The worst thing is not even being able to debug or understand what is going on because you can't influence the query plan to try alternative hypotheses out easily.

fdr

Personally, I think they would entertain it, but the implementation has to be fairly good and they have to be up for maintaining whatever that implementation is. There was a lot of hemming and hawing about hot standby & replication for years, but once someone showed up with a credible implementation and track record of maintaining and fixing it, the objections seemed to get a lot quieter.

Hints are a somewhat invasive feature that is hard to tweak once it is integrated into applications. I don't think the half dozen people in the best position to consider its evolution have found it a use of their time they wish to expend.

potamic

Shameful? A bit much for what is essentially a gift to the community? They are building a free product and are entitled to their opinion on how to go about building it, and decisions on taking up scope are very much an integral part of that. In engineering, every decision is a tradeoff that might benefit one set of users while impacting another. Free software is all about choice, so that different users can find the thing that is most suitable for them.

zorgmonkey

I've never used it, but there is a Postgres extension called pg_hint_plan [0] for query hints, and I am guessing it is pretty decent because it is even mentioned in the AWS Aurora documentation.

[0]: https://pghintplan.osdn.jp/pg_hint_plan.html
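For illustration, pg_hint_plan reads its hints from a special comment at the start of the query. A sketch using the thread's table names (the index name here is hypothetical):

```sql
/*+ Leading((stacks runs)) HashJoin(stacks runs) IndexScan(runs runs_worker_id_idx) */
SELECT count(*)
FROM runs
JOIN stacks ON stacks.id = runs.stack_id
WHERE runs.worker_id IS NOT NULL;
```

Leading fixes the join order, HashJoin the join method, and IndexScan the access path for a given table.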

pmontra

I can understand that people want absolute control over their queries sometimes. I've never used hints, so I shouldn't even write this note, but IMHO that page has quite a balanced view of the pros and cons of existing hint systems.

williamdclt

It does. But when you're watching your database, and therefore your business, crash and burn because it decided to change the query plan for an important query, all these "problems with existing Hint systems" sound very irrelevant.

I'd love to have a way to lock a plan in a temporary "emergency measure" fashion. But of course it's hard/impossible to design this without letting people abuse it and pretend it's just a hint system.

zmmmmm

> that page has a quite balanced view of pros and cons of existing hint systems

I found the cons section somewhat disappointing given how much respect I have for Postgresql maintainers in general.

Most of the "problems" are essentially manifestations of dysfunction within users or PostgreSQL development itself. People won't report optimizer bugs if they can fix them themselves, etc. (far more likely they won't report the bugs if they have no way to prove their alternative query plan actually performs better, eg: by adding a hint).

Many are just assertions which I highly doubt would hold up if validated ("most of the time the optimizer is actually right", "hints in queries require massive refactoring" ...).

pzduniak

I recall using https://pghintplan.osdn.jp/pg_hint_plan.html for this feature, it did what it claimed to be able to do. It's a hack, but it really helps in the worst case scenarios.

rattray

There's now a github repo for pg_hint_plan: https://github.com/ossc-db/pg_hint_plan

electroly

I really appreciate that in SQL Server we can just use query hints. I don't think I will ever see eye-to-eye with the PostgreSQL team on this. No query planner is perfect. SQL Server's planner certainly isn't. We have had the same sort of "query plan suddenly changes in production and everything breaks" incidents, but they are easily fixed with query hints and/or forced plans, and you can be sure that future statistics updates won't break them again.
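As a sketch of what that looks like in T-SQL (the table and index names here are hypothetical, mirroring the thread's schema):

```sql
SELECT r.id
FROM runs AS r WITH (INDEX(ix_runs_worker))  -- table hint: force a specific index
JOIN stacks AS s ON s.id = r.stack_id
WHERE r.worker_id IS NULL
OPTION (HASH JOIN);  -- query hint: restrict the planner to hash joins
```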

rattray

You can use pg_hint_plan for hinting query plans to postgres: https://github.com/ossc-db/pg_hint_plan

SigmundA

I still have nightmares of SQL Server's optimizer taking down production systems at random times due to picking a bad plan. Judicious use of hints has made my life much easier and allowed me, as a developer, to be in control and not have to ask the DB nicely and pray it listens.

fennecfoxen

I worked for a payments firm that submitted transfers in bulk every day, but had a monthly surge day, and a late cut-off period to submit or cancel payments. It doesn't matter if you can do everything else in 15 minutes, if the query planner decides to take a 30-second query and turn it into a 4-hour nightmare because different customers are making different numbers of payments today, and it trips over some Postgres heuristic.

The Postgres query planner was, quite frankly, the enemy. By the time I left we were at the point that we considered it a business risk and were looking at alternatives. If you need queries to run in a predictable amount of time — forget fast, simply predictable — then Postgres is quite simply not fit for purpose.

newlisp

I guess that is why cloud databases like spanner charge big bucks.

baq

even plan locking itself would be a welcome feature. we've had multiple instances of plans going haywire (never traced it down, but 99% sure it was after autovacuum analyze) impacting prod and getting back to normal after a manual re-ANALYZE.

SigmundA

I haven't had much use for plan locking even after MS SQL Server got it, even though it's a nice feature, with being able to save/load plans from a file, etc.

I just like a good set of hints, since they are part of the source code and I can add comments and see why they are there in the source history later.

zmmmmm

If nothing else, I would love to have that just so we don't end up debugging heisenbugs that occur and then disappear when you try to find them (or being unable to rule the planner out when we test the same query and it magically starts performing well).

ghusbands

Since the PostgreSQL folks are clearly against hints, maybe another random feature suggestion: If a (read-only) query is taking "too long", maybe you could launch a second copy with a different plan. Maybe even a third after twice as long and a fourth after twice as long again. Kill them all once any one returns. Trades some extra load now against waiting forever for a 'bad' query plan.

The idea obviously needs work and has probably already been suggested and dismissed, somewhere, but I thought I might throw it out there, especially with modern computers having so many cores.

zepearl

On one hand, this could theoretically work if the optimizer tracked historical SQLs, the query plans used at the time, and their runtimes, then decided to try a different query plan and aborted the SQL being executed once its runtime exceeded the historical one. But in practice it's likely to mess things up, because data volume/distribution/machine load (CPU, disks, etc.) might differ compared to the past.

On the other hand, in general there are often too many potential combinations of query plans to try out (hundreds even for a relatively simple SQL), and trying them all out would take hours or days. The "good plan" might be one that a machine would categorize as "very unlikely to work", so it might end up being the one tested at the very end.

Normal hints would still be a lot easier to handle in the code and for the user.

lgessler

I'm not a DB expert, but aren't most queries bottlenecked by disk bandwidth? If so, additional queries might hurt more than help.

jandrewrogers

Modern databases on modern server hardware are not limited by disk bandwidth typically, though Postgres is not a modern design and struggles to use the disk bandwidth available. That said, since these queries are accessing roughly the same data, concurrent queries will mostly be hitting cache. Memory bandwidth is often a larger limitation for databases these days, so it would still be wasteful.

jonatron

Having been in a situation where Postgres suddenly decides to use a much slower query plan on a hot query in production, I'd agree.

AaronFriel

Regarding this:

> Only a minuscule part of the Runs in the database are active at any given time. Most Stacks stay dormant most of the time. Whenever a user makes a commit, only the affected Stacks will execute Runs. There will obviously be Stacks that are constantly in use, but most won’t. Moreover, intensive users will usually use private Worker Pools. The public shared Worker Pool is predominantly utilized by smaller users, so that’s another reason for not seeing many Runs here.

> Right now, we’re iterating over all existing Stacks attached to the public worker pool and getting relevant Runs for each.

Is there an opportunity to make this query more robust with a bit of denormalization, if necessary, to indicate which runs and which stacks are active and a filtered index (index with a WHERE clause) on each to allow efficiently retrieving the list of active entries?

Best practices for database optimization tell you "don't index on low cardinality columns" (columns with few unique values), but the real advice should be "ensure each valid key of an index has a low cardinality relative to the size of the data" (it's OK to have an index that selects 1% of your data!).

That is, conventional wisdom says that indexing on a boolean is bad, because in the best case scenario 50% of your entries are behind each key, and in the worst case almost all entries are behind either "true" or "false". There are performance impacts from that and there's the risk the query optimizer tries to use the "bad" side of the index, which would be worse than a full table scan.

But with the improved advice and insight that 0.01% of rows have value "true", you can create an index with a `WHERE active = true` clause, et voila: you have an index that finds you exactly what you want, without the cost and performance issues of maintaining the "false" side of the index.
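A sketch of that, using a hypothetical boolean active flag on the runs table:

```sql
-- Only the ~0.01% of rows with active = true are stored in the index,
-- so the "false" side costs nothing to maintain or scan.
CREATE INDEX runs_active_idx
    ON runs (stack_id)
 WHERE active = true;

-- The planner can use it for queries that repeat the predicate:
SELECT stack_id FROM runs WHERE active = true;
```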

cube2222

Yep, you could indeed use a partial index with the two relevant conditions to solve this query well. We're using those all over the place as well.

no-s

It's impressive how many developer misconceptions I see cropping up in these articles. The point of the query planner is to replace an army of programmers, not to evade design faults. With a server RDBMS it's trivially easy to decouple schema and query evolution from the application, yet somehow developers always seem to back themselves into situations requiring extreme coupling. Using plan guides in the MSSQL environment helps somewhat with patchwork solutions, though typically the deficiencies are related to obsolete statistics (and even that isn't necessary). In Postgres you should ensure design separation so you can rewrite the query easily.

newlisp

> Using a server RDBMS it’s trivially easy to decouple schema and query evolution from application.

Curious, how do you decouple query evolution from the DB layer of your application?

dragonwriter

> Curious, how do you decouple query evolution from the DB layer of your application?

Use application-specific views, and the evolution driven by DB optimization, etc., can happen in view definitions without the app DB layer being involved. It's a traditional best practice anyway, going back decades to the time when it was expected an RDBMS would serve multiple apps that you needed to keep isolated from each other's changes, but it also works to isolate concerns at different levels for a single-app stack.

(Because of practical limits of views in some DBs, especially in the 1990s, and developer preferences for procedural code, a common enterprise alternative substitutes stored procs for views, to similar effect.)
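As a minimal sketch (the view name and the denormalized table in the second statement are made up for illustration), the application queries only the view, and optimization happens behind it:

```sql
-- The application only ever references the view ...
CREATE VIEW app_pending_runs AS
SELECT r.id, r.stack_id, r.created_at
  FROM runs r
 WHERE r.worker_id IS NULL;

-- ... so the underlying query can later be rewritten
-- (denormalized, re-joined, etc.) without touching app code:
CREATE OR REPLACE VIEW app_pending_runs AS
SELECT id, stack_id, created_at
  FROM runs_pending_denormalized;
```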

no-s

hah, exactly. “application-specific views...Going back decades”, or “substitute Stored Procs for views”.

It's not really much extra overhead to use a schema plus views. I ask developers to avoid creating views with "SELECT *"; always specify the columns desired.

Further decoupling is possibly beneficial, e.g. CQRS is wonderful perspective for designing data models that may be more easily distributed or cacheable (by explicitly choosing to separate query schema from modification schema).

newlisp

Oh yes, views. They don't come for free, and they become numerous and specific. But they are sorely missed with the document store I'm working on right now ;)

deltaonefour

This article is everything that's wrong with SQL.

How about just providing us access to the query plan and allowing users to directly modify it? It's like needing assembly language to optimize a really tight section of code but having that access blocked off.

Why do we have to rely on leakage to a higher level interface in order to manipulate the lower level details? Just give us manual access when we need it.

shock-value

I think SQL is generally great but I do agree with you. Access to lower level primitives from which one could construct their own query plan would be awesome. Query hints (not available in Postgres) would also probably get you a lot of the way there.

cube2222

Hey, author here.

The article describes a fun debugging and optimization session I’ve had recently. Happy to answer any questions!

mijoharas

Hey, I think you'll probably get better performance if you swap your subquery around a little from:

  SELECT COUNT(*)
  FROM runs runs_other
  WHERE (SELECT COUNT(*)
        FROM stacks
        WHERE stacks.id = runs_other.stack_id
          AND stacks.account_id = accounts.id
          AND stacks.worker_pool_id = worker_pools.id) > 0
   AND runs_other.worker_id IS NOT NULL
To:

  SELECT COUNT(*)
  FROM runs runs_other
  WHERE EXISTS (SELECT *
        FROM stacks
        WHERE stacks.id = runs_other.stack_id
          AND stacks.account_id = accounts.id
          AND stacks.worker_pool_id = worker_pools.id)
   AND runs_other.worker_id IS NOT NULL
Your previous option needs to hit every row in the stacks subquery to generate the count. The modified one uses EXISTS so it only needs to check there's at least one row, and then can short-circuit and exit that subquery. (a smart enough query planner would be able to do the same thing, but PG generally doesn't).

cube2222

Thanks for the tip! I didn't know EXISTS can provide such benefits.

However, I've tried it out and it resulted in a 10x performance loss. Looking at the query plan, it actually got PostgreSQL to be too smart and undo most of the optimizations I've done in this blog post... It's still an order of magnitude faster than the original query though!

mijoharas

I stand corrected, and always glad to see the main point about perf optimisation proven. (always measure! :) )

tqi

Interesting write up! Did you happen to try this with a CTE instead of a subquery? ie something like:

  WITH
    tmp AS (SELECT accounts_other.id, COUNT(*) n
              FROM accounts accounts_other
              JOIN stacks stacks_other ON accounts_other.id = stacks_other.account_id
              JOIN runs runs_other ON stacks_other.id = runs_other.stack_id
             WHERE (stacks_other.worker_pool_id IS NULL OR
                    stacks_other.worker_pool_id = worker_pools.id)
               AND runs_other.worker_id IS NOT NULL
             GROUP BY 1
            )
  SELECT COUNT(*)                                                                  as "count",
         COALESCE(MAX(EXTRACT(EPOCH FROM age(now(), runs.created_at)))::bigint, 0) AS "max_age"
  FROM runs
    JOIN stacks ON runs.stack_id = stacks.id
    JOIN worker_pools ON worker_pools.id = stacks.worker_pool_id
    JOIN accounts ON stacks.account_id = accounts.id
    LEFT JOIN tmp ON tmp.id = accounts.id
  WHERE worker_pools.is_public = true
    AND runs.type IN (1, 4)
    AND runs.state = 1
    AND runs.worker_id IS NULL
    AND accounts.max_public_parallelism / 2 > COALESCE(tmp.n, 0)

I don't know this for sure, but in my experience it has seemed like Postgres is bad at optimizing subqueries, and will execute once per row.

cube2222

Haven't run it, but based on the query plan in the first picture, PostgreSQL already does that part kind of right. It precalculates it for all accounts and then does a three-way hash join.

aeyes

For Postgres 12+ you'll have to use the following to force the plan:

> WITH tmp AS MATERIALIZED

dragonwriter

Aren't CTEs still optimization barriers in Postgres, so wouldn't that make any subquery optimization problem worse?

briffle

The fun thing is, they have corrected a bunch of that in newer versions of postgresql.

I'm surprised that the problem stemmed from an index where the 'AND NOT NULL' took forever to scan, and that they didn't include a partial index, or an index in a different order, since indexes are sorted.

srcreigh

Depends if materialization helps.

As a silly example, if the CTE materializes 1/100th of a table via table scan, scanning 1/100th of a table N times may be faster than scanning the entire table N times.

yxhuvud

That could even be a feature if used to compute just the list of active jobs. Later PGs might need to be explicit to reinsert the optimization barrier, as the implicit one for all CTEs was removed.


kristjansson

Mostly no, since PostgreSQL 12

gaha

Have you tried unnesting the subquery? From the images posted in your blog it's not clear to me how often the subquery is run. Maybe unnesting it somehow like this would work well:

    SELECT COUNT(*)        as "count",
         COALESCE(MAX(EXTRACT(EPOCH FROM age(now(), runs.created_at)))::bigint, 0) AS "max_age"
    FROM runs
      JOIN stacks ON runs.stack_id = stacks.id
      JOIN worker_pools ON worker_pools.id = stacks.worker_pool_id
      JOIN accounts ON stacks.account_id = accounts.id
      /*unnesting*/
      JOIN LATERAL (SELECT accounts_other.id, COUNT(*) AS cnt
       FROM accounts accounts_other
         JOIN stacks stacks_other ON accounts_other.id = stacks_other.account_id
         JOIN runs runs_other ON stacks_other.id = runs_other.stack_id
       WHERE (stacks_other.worker_pool_id IS NULL OR
           stacks_other.worker_pool_id = worker_pools.id)
         AND runs_other.worker_id IS NOT NULL
       GROUP BY accounts_other.id) AS acc_cnt
      ON acc_cnt.id = accounts.id AND accounts.max_public_parallelism / 2 > acc_cnt.cnt
     /*end*/
    WHERE worker_pools.is_public = true
      AND runs.type IN (1, 4)
      AND runs.state = 1
      AND runs.worker_id IS NULL
      /* maybe also copy these filter predicates inside the unnested query above */
Edit: format and typo

JoelJacobson

Hi Jacob, I would like to try rewriting your query using different JOIN FOREIGN [1] syntax alternatives, to study how readability is affected.

It would help if you could share the table definitions and foreign keys for the tables involved in the query. I could probably deduce most of it from the query thanks to proper naming of tables/columns, but just want to be sure I get it right. Many thanks!

[1]: https://gist.github.com/joelonsql/15b50b65ec343dce94db6249cfea8aaa


soperj

what are the downsides?

cube2222

The downside is that a PostgreSQL version update can change internal optimizer rules, which may result in the plan being changed to a less performant one again.

In that case it's good to have monitoring in place to catch something like that. We're using Datadog Database Monitoring, which gives you a good breakdown of the most expensive queries - it uses information from the pg_stat_statements view, which contains a lot of useful stats about query execution performance.

You should regularly check such statistics to see whether any queries are unexpectedly consuming a big chunk of your database resources, as that might mean you're missing an index, or a non-optimal query plan has been chosen somewhere.
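
For example, with the pg_stat_statements extension enabled, a query along these lines surfaces the heaviest statements (a sketch - column names are per PostgreSQL 13+; older versions use total_time/mean_time instead):

    -- Top 10 statements by total execution time
    SELECT query,
           calls,
           total_exec_time,   -- total_time on PostgreSQL < 13
           mean_exec_time,
           rows
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 10;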

topspin

> The downside is that a PostgreSQL version update can change internal optimizer rules

This has actually happened relatively recently.

PostgreSQL Common Table Expressions (WITH x AS (some query)) used to be optimization fences by default, meaning that you could influence the optimizer using a CTE. This is a well-known technique among PostgreSQL users, who resort to it for lack of optimizer hints.

In Pg12 the optimizer was enhanced to remove the optimization fence by default, so the query plans of existing queries changed automatically and sometimes became much slower as a result. If you want the old CTE behavior, you have to mark the CTE with AS MATERIALIZED in the WITH clause.
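
A minimal sketch of the two behaviors (table name taken from the query in this thread):

    -- PG12+: the CTE may be inlined and optimized together with the outer query
    WITH pending AS (
        SELECT * FROM runs WHERE worker_id IS NULL
    )
    SELECT count(*) FROM pending;

    -- Forcing the pre-PG12 behavior: the CTE is materialized first,
    -- acting as an optimization fence
    WITH pending AS MATERIALIZED (
        SELECT * FROM runs WHERE worker_id IS NULL
    )
    SELECT count(*) FROM pending;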

JoelJacobson

Deduced schema based on the naming of tables/columns:

    CREATE TABLE accounts (
        id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
        max_public_parallelism integer NOT NULL,
        PRIMARY KEY (id)
    );
    
    CREATE TABLE worker_pools (
        id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
        is_public boolean NOT NULL,
        PRIMARY KEY (id)
    );
    
    CREATE TABLE stacks (
        id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
        worker_pool_id bigint NOT NULL,
        account_id bigint NOT NULL,
        PRIMARY KEY (id),
        FOREIGN KEY (worker_pool_id) REFERENCES worker_pools,
        FOREIGN KEY (account_id) REFERENCES accounts
    );
    
    CREATE TABLE runs (
        id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
        stack_id bigint,
        worker_id bigint,
        created_at timestamptz NOT NULL DEFAULT now(),
        type integer NOT NULL,
        state integer NOT NULL,
        PRIMARY KEY (id),
        FOREIGN KEY (stack_id) REFERENCES stacks
    );

JoelJacobson

cube2222, does the schema above look more or less correct to you?

Below are the query plans I got when running the query against empty tables in PostgreSQL 15dev (not Aurora Serverless), for comparison.

It would be nice to populate the tables and add indexes to make a real comparison. Rough estimates of the number of rows in each table and the count(distinct ...) of each column would help a lot.

Unoptimized version:

                                                                        QUERY PLAN
    ---------------------------------------------------------------------------------------------------------------------------------------------------
     Aggregate  (cost=82.14..82.16 rows=1 width=16) (actual time=0.009..0.012 rows=1 loops=1)
       ->  Hash Join  (cost=46.05..82.13 rows=1 width=8) (actual time=0.005..0.007 rows=0 loops=1)
             Hash Cond: (stacks.worker_pool_id = worker_pools.id)
             Join Filter: ((accounts.max_public_parallelism / 2) > (SubPlan 1))
             ->  Nested Loop  (cost=0.30..36.38 rows=1 width=28) (actual time=0.005..0.005 rows=0 loops=1)
                   ->  Nested Loop  (cost=0.15..36.18 rows=1 width=24) (actual time=0.004..0.005 rows=0 loops=1)
                         ->  Seq Scan on runs  (cost=0.00..28.00 rows=1 width=16) (actual time=0.004..0.004 rows=0 loops=1)
                               Filter: ((worker_id IS NULL) AND (type = ANY ('{1,4}'::integer[])) AND (state = 1))
                         ->  Index Scan using stacks_pkey on stacks  (cost=0.15..8.17 rows=1 width=24) (never executed)
                               Index Cond: (id = runs.stack_id)
                   ->  Index Scan using accounts_pkey on accounts  (cost=0.15..0.20 rows=1 width=12) (never executed)
                         Index Cond: (id = stacks.account_id)
             ->  Hash  (cost=32.00..32.00 rows=1100 width=8) (never executed)
                   ->  Seq Scan on worker_pools  (cost=0.00..32.00 rows=1100 width=8) (never executed)
                         Filter: is_public
             SubPlan 1
               ->  Aggregate  (cost=66.89..66.90 rows=1 width=8) (never executed)
                     ->  Nested Loop  (cost=33.72..66.89 rows=1 width=0) (never executed)
                           ->  Hash Join  (cost=33.56..58.71 rows=1 width=8) (never executed)
                                 Hash Cond: (runs_other.stack_id = stacks_other.id)
                                 ->  Seq Scan on runs runs_other  (cost=0.00..22.00 rows=1194 width=8) (never executed)
                                       Filter: (worker_id IS NOT NULL)
                                 ->  Hash  (cost=33.55..33.55 rows=1 width=16) (never executed)
                                       ->  Seq Scan on stacks stacks_other  (cost=0.00..33.55 rows=1 width=16) (never executed)
                                             Filter: (((worker_pool_id IS NULL) OR (worker_pool_id = worker_pools.id)) AND (account_id = accounts.id))
                           ->  Index Only Scan using accounts_pkey on accounts accounts_other  (cost=0.15..8.17 rows=1 width=8) (never executed)
                                 Index Cond: (id = accounts.id)
                                 Heap Fetches: 0
Optimized version:

                                                                  QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------------
     Aggregate  (cost=82.14..82.16 rows=1 width=16) (actual time=0.010..0.012 rows=1 loops=1)
       ->  Hash Join  (cost=46.05..82.13 rows=1 width=8) (actual time=0.006..0.007 rows=0 loops=1)
             Hash Cond: (stacks.worker_pool_id = worker_pools.id)
             Join Filter: ((accounts.max_public_parallelism / 2) > (SubPlan 2))
             ->  Nested Loop  (cost=0.30..36.38 rows=1 width=28) (actual time=0.005..0.006 rows=0 loops=1)
                   ->  Nested Loop  (cost=0.15..36.18 rows=1 width=24) (actual time=0.005..0.005 rows=0 loops=1)
                         ->  Seq Scan on runs  (cost=0.00..28.00 rows=1 width=16) (actual time=0.004..0.005 rows=0 loops=1)
                               Filter: ((worker_id IS NULL) AND (type = ANY ('{1,4}'::integer[])) AND (state = 1))
                         ->  Index Scan using stacks_pkey on stacks  (cost=0.15..8.17 rows=1 width=24) (never executed)
                               Index Cond: (id = runs.stack_id)
                   ->  Index Scan using accounts_pkey on accounts  (cost=0.15..0.20 rows=1 width=12) (never executed)
                         Index Cond: (id = stacks.account_id)
             ->  Hash  (cost=32.00..32.00 rows=1100 width=8) (never executed)
                   ->  Seq Scan on worker_pools  (cost=0.00..32.00 rows=1100 width=8) (never executed)
                         Filter: is_public
             SubPlan 2
               ->  Aggregate  (cost=9851.00..9851.01 rows=1 width=8) (never executed)
                     ->  Seq Scan on runs runs_other  (cost=0.00..9850.00 rows=398 width=0) (never executed)
                           Filter: ((worker_id IS NOT NULL) AND ((SubPlan 1) > 0))
                           SubPlan 1
                             ->  Aggregate  (cost=8.18..8.19 rows=1 width=8) (never executed)
                                   ->  Index Scan using stacks_pkey on stacks stacks_1  (cost=0.15..8.18 rows=1 width=0) (never executed)
                                         Index Cond: (id = runs_other.stack_id)
                                         Filter: ((account_id = accounts.id) AND (worker_pool_id = worker_pools.id))

igammarays

Are you happy with Aurora Serverless so far? Is it really the "painless scaling" it's made out to be? I'm really hesitant to use anything other than self-hosted Postgres with full control, because I'm always afraid of running into situations like these and not being able to do what I need to do, like have an offline analytical read replica. Also, I imagine the cost is an order of magnitude more than a self-hosted instance.

cube2222

Not really. But it's mostly because the support for other features is basically non-existent. It also seems like all development is happening on Aurora Serverless v2 (which I think is in preview?). The scaling also isn't seamless - it breaks your connections if you're using prepared statements - so we have it pinned to a specific number of capacity units.

We'll probably move to plain Aurora at some point, as that supports all bells and whistles, but for now Aurora Serverless is good enough.

hnov

In this case, denormalizing the data right into the runs table would likely give you better, and more importantly more predictable, performance.
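
A rough sketch of what that could look like (hypothetical column; keeping it in sync would need a trigger or application-side writes):

    -- Copy account_id from stacks onto runs so the hot query can skip a join
    ALTER TABLE runs ADD COLUMN account_id bigint;

    UPDATE runs
    SET account_id = stacks.account_id
    FROM stacks
    WHERE stacks.id = runs.stack_id;

    -- Partial index matching the subquery's filter
    CREATE INDEX runs_account_active_idx
        ON runs (account_id)
        WHERE worker_id IS NOT NULL;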

null

[deleted]

Mathnerd314

Similar article that instead solved it by recording more statistics: https://build.affinity.co/how-we-used-postgres-extended-stat...
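
For this query, that approach would look roughly like the following (a sketch - which column pairs actually help depends on the real data distribution):

    -- Tell the planner these columns are statistically related,
    -- so it stops multiplying their selectivities as if independent
    CREATE STATISTICS runs_worker_stack_stats (dependencies, ndistinct)
        ON worker_id, stack_id FROM runs;

    ANALYZE runs;  -- extended statistics are collected on the next ANALYZE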

victor106

From the article

> Aurora Serverless also doesn’t support read replicas to offload analytical queries to them, so no luck there either.

Amazon Aurora Serverless v2 does support read replicas. Although it’s still in preview.

notatoad

aurora serverless v2 doesn't support postgres at all...