SmackerNews

Show HN: PgDog – Scale Postgres without changing the app

326 points · 63 comments · 1 month ago · levkk

Hey HN! Lev and Justin here, authors of PgDog (https://pgdog.dev/), a connection pooler, load balancer and database sharder for PostgreSQL. If you build apps with a lot of traffic, you know the first thing to break is the database. We are solving this with a network proxy that works without requiring application code changes or database migrations.

Our post from last year: https://smackernews.com/item/44099187 HN

The most important update: we are in production. Sharding is used a lot, with direct-to-shard queries (one shard per query) working pretty much all the time. Cross-shard (or multi-database) queries are still a work in progress, but we are making headway.

Aggregate functions like count(), min(), max(), avg(), stddev() and variance() are working, without refactoring the app. PgDog calculates the aggregate in-transit, while transparently rewriting queries to fetch any missing info. For example, multi-database average calculation requires a total count of rows to calculate the original sum. PgDog will add count() to the query, if it’s not there already, and remove it from the rows sent to the app.

Sorting and grouping works, including DISTINCT, if the columns(s) are referenced in the result. Over 10 data types are supported, like, timestamp(tz), all integers, varchar, etc.

Cross-shard writes, including schema changes (CREATE/DROP/ALTER), are now atomic and synchronized between all shards with two-phase commit. PgDog keeps track of the transaction state internally and will rollback the transaction if the first phase fails. You don’t need to monkeypatch your ORM to use this: PgDog will intercept the COMMIT statement and execute PREPARE TRANSACTION and COMMIT PREPARED instead.

Omnisharded tables, a.k.a replicated or mirrored (identical on all shards), support atomic reads and writes. That’s important because most databases can’t be completely sharded and will have some common data on all databases that has to be kept in-sync.

Multi-tuple inserts, e.g., INSERT INTO table_x VALUES ($1, $2), ($3, $4), are split by our query rewriter and distributed to their respective shards automatically. They are used by ORMs like Prisma, Sequelize, and others, so those now work without code changes too.

Sharding keys can be mutated. PgDog will intercept and rewrite the update statement into 3 queries, SELECT, INSERT, and DELETE, moving the row between shards. If you’re using Citus (for everyone else, Citus is a Postgres extension for sharding databases), this might be worth a look.

If you’re like us and prefer integers to UUIDs for your primary keys, we built a cross-shard unique sequence, directly inside PgDog. It uses the system clock (and a couple other inputs), can be called like a Postgres function, and will automatically inject values into queries, so ORMs like ActiveRecord will continue to work out of the box. It’s monotonically increasing, just like a real Postgres sequence, and can generate up to 4 million numbers per second with a range of 69.73 years, so no need to migrate to UUIDv7 just yet.

    INSERT INTO my_table (id, created_at) VALUES (pgdog.unique_id(), now());

Resharding is now built-in. We can move gigabytes of tables per second, by parallelizing logical replication streams across replicas. This is really cool! Last time we tried this at Instacart, it took over two weeks to move 10 TB between two machines. Now, we can do this in just a few hours, in big part thanks to the work of the core team that added support for logical replication slots to streaming replicas in Postgres 16.

Sharding hardly works without a good load balancer. PgDog can monitor replicas and move write traffic to a promoted primary during a failover. This works with managed Postgres, like RDS (incl. Aurora), Azure Pg, GCP Cloud SQL, etc., because it just polls each instance with “SELECT pg_is_in_recovery()”. Primary election is not supported yet, so if you’re self-hosting with Patroni, you should keep it around for now, but you don’t need to run HAProxy in front of the DBs anymore.

The load balancer is getting pretty smart and can handle edge cases like SELECT FOR UPDATE and CTEs with INSERT/UPDATE statements, but if you still prefer to handle your read/write separation in code, you can do that too with manual routing. This works by giving PgDog a hint at runtime: a connection parameter (-c pgdog.role=primary), SET statement, or a query comment. If you have multiple connection pools in your app, you can replace them with just one connection to PgDog instead. For multi-threaded Python/Ruby/Go apps, this helps by reducing memory usage, I/O and context switching overhead.

Speaking of connection pooling, PgDog can automatically rollback unfinished transactions and drain and re-sync partially sent queries, all in an effort to preserve connections to the database. If you’ve seen Postgres go to 100% CPU because of a connection storm caused by an application crash, this might be for you. Draining connections works by receiving and discarding rows from abandoned queries and sending the Sync message via the Postgres wire protocol, which clears the query context and returns the connection to a normal state.

PgDog is open source and welcomes contributions and feedback in any form. As always, all features are configurable and can be turned off/on, so should you choose to give it a try, you can do so at your own pace. Our docs (https://docs.pgdog.dev) should help too.

Thanks for reading and happy hacking!

gregw21 month ago
As someone who has worked on many-TB-sized "custom" sharded systems with 30-150 shards at multiple (ok, 2) employers, a key challenge to the overall sharding landscape is unsharding all the data back at the analytics layer.
This at a minimum often involved adding back a shard key to the physical data, or partitioning, and/or physical data sorting easily in the "OLAP" layer. And a surprising number of CDC and ETL toolkits don't make it easy to parameterize a single code/configuration base, nor handle situations like shards being down at different times for maintenance or fetching data from each shard at a time of day specified by its end-of-day or handling retransmissions or reconciliation or gaps or data quality of a single shard when back in an unsharded landscape. SQL UNION ALL to reunite shards works, until it doesn't.
YMMV but would be curious if you have a story/solution/thoughts along these lines. It's easier if you shard with unified analytics/reporting in mind on day one of a sharded system design, but in the worlds I've lived in, nobody ever does. But maybe you could.
saisrirampur1 month ago
Great progress, guys! It’s impressive to see all the enhancements - more types, more aggregate functions, cross-node DML, resharding, and reliability-focused connection pooling and more. Very cool! These were really hard problems and took multiple years to build at Citus. Kudos to the shipping velocity.
mijoharas1 month ago
Happy pgdog user here, I can recommend it from a user perspective as a connection pooler to anyone checking this out (we're also running tests and positive about sharding, but haven't run it in prod yet, so I can't 100% vouch for it on that, but that's where we're headed.)
@Lev, how is the 2pc coming along? I think it was pretty new when I last checked, and I haven't looked into it much since then. Is it feeling pretty solid now?
codegeek1 month ago
Stupid question but does this shard the database as well or do we shard manually and then setup the configuration accordingly ?
cuu5081 month ago
Some HTTP proxies can do retries -- if a connection to one backend fails, it is retried on a different backend. Can PgDog (or PgBouncer, or any other tool) do something similar -- if there's a "database server shutting down" error or a connection reset, retry it on another backend?
noleary1 month ago
If you build apps with a lot of traffic, you know the first thing to break is the database.
Just out of curiosity, what kinds of high-traffic apps have been most interested in using PgDog? I see you guys have Coinbase and Ramp logos on your homepage -- seems like fintech is a fit?
oulipo21 month ago
How do you know when/if it's justified to add additional complexity like PgDog?
Is there a number of simultaneous connection / req per sec that's a good threshold?
Is it easy on my postgres instance to get the number of simulataneous connections, for instance if I simulate traffic, to know if I would gain anything from a connection pooler?
mosselman1 month ago
I see the word 'replication' mentioned quite a few times. Is this managed by pgdog? Would I be able to replace other logical replication setups with pgdog to create a High Availability cluster?
Do you have any write up on how to do this?
jackfischer1 month ago
Congrats guys! Curious how the read write splitting is reliable in practice due to replication lag. Do you need to run the underlying cluster with synchronous replication?
cpursley1 month ago
Looks great - I'd love to include it in https://postgresisenough.dev (just put in a PR: https://github.com/agoodway/postgresisenough?tab=readme-ov-f...)
lordofgibbons1 month ago
This looks great! I have a couple of questions:
1) Is it possible to start off with plain Postgres and add pgdog without scheduled downtime down the road when scaling via sharding becomes necessary?
2) How are schema updates handled when using physical multi-tenancy? Does pgdog just loop over all the databases that it knows about and issues the update schema command to each?
written-beyond1 month ago
Can you elaborate a bit more on the challenges faced in making Postgres shard-able?
I remember that adding sharing to Postgres natively was an uphill battle. There were a few companies who has proprietary solutions for it. What you've been able to achieve is nothing less than a miracle.
dujuku1 month ago
Really exciting to see the progress on this project! I'm not sure I understand the update "we are in production." Is this referencing a particular release or a more general statement about adoption?
farsa1 month ago
Congrats on the progress! What is the behavior of PgDoc if it receives some sort of query it can't currently handle properly? Is there a linter/static analysis tool I can use to evaluate if my query will work?
array_loader1 month ago
(apologies for new account - NDA applies to the specifics)
Nice surprise to see this here today. I was working on a deployment just last week.
Unfortunately for me, I found that it crashed when doing a very specific bulk load (COPY FORMAT BINARY with array columns inside a transaction). The process loads around 200MB of array columns (in the region of 10K rows) into a variety of tables. Very early in the COPY process PgDog crashes with :
"pgdog router error: failed to fill whole buffer"
So it appears something is not quite right for my specific use case (COPY with array columns). I'm not familiar enough with Rust but the failed to fill whole buffer seemed to come from Rust (rather than PgDog) based on what little I could find with searches.
I was very disappointed as it looked much simpler to get set up and running that PgPool-II (which I have had to revert to as my backup plan - I'm finding it more difficult to configured, but it does cope with the COPY command without issues).
I would have preferred to stick with PgDog.
ijustlovemath1 month ago
How would this product compare to a PostgREST based approach (this is the cool tech behind the original supabase) with load balancing at the HTTP level?
I_am_tiberius1 month ago
I really hope to use the sharding feature one day.
febed1 month ago
Does it support extensions like PostGIS?
yilugurlu1 month ago
Sorry if this is a weird question, but can I use this with TimescaleDB?
jacobsenscott1 month ago
I've been watching PgDog for a while now. Great progress!
[deleted]
octoclaw1 month ago
The cross-shard aggregate rewriting is really nice. Transparently injecting count() for average calculations sounds straightforward but there are so many edge cases once you add GROUP BY, HAVING, subqueries, etc.
Curious about latency overhead for the common case. On a direct-to-shard read where no rewriting happens, what's the added latency from going through PgDog vs connecting to Postgres directly? Sub-millisecond?
umairnadeem1231 month ago
the connection draining feature is really clever. weve dealt with connection storms from app crashes before and the standard playbook is just to restart everything and hope the thundering herd doesnt immediately repeat. having the proxy absorb that chaos at the wire protocol level is much better than trying to handle it in application code, especially when you have 20+ microservices all competing for the same connection pool.
the unique_id() sequence is interesting too - monotonically increasing cross-shard IDs solve a real pain point for pagination. with UUIDs you end up doing cursor-based pagination with composite keys which makes your ORM code ugly fast.

news.ycombinator.com/item?id=47123631