SmackerNews

PgDog is funded and coming to a database near you

514 points · 244 comments · 1 day ago · levkk

pgdog.dev

ahachete3 hours ago
I have mentioned this before, but here it goes again:
I'm really happy that there are more options for Postgres sharding, and I applaud PgDog and the team's efforts and energy.
Having said that, this makes it a no-go for me:
shard_number = hash(data) % num_shards
https://docs.pgdog.dev/features/sharding/basics/#terminology
Most sharding solutions map the hash value onto linear ranges, which are split into "virtual shards" that are then placed on the physical shards or workers. This allows shards to be moved to other nodes when needed. For example, Citus works this way, and even adds convenience functions for automated shard migration (using logical replication). That's all I'd need.
Operationally, it's worlds apart. With modulo distribution, the only way to move data is to reshard everything, which is something you don't want to do however fast the operation may be.
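A quick way to see the operational difference is to count how many keys have to change physical node when a cluster grows from 4 to 5 nodes. The Python sketch below is illustrative only, not PgDog or Citus code: it assumes a hypothetical fixed pool of 64 virtual shards, uses BLAKE2b as a stand-in hash, and maps hashes to virtual shards with a fixed modulus rather than Citus-style hash ranges. The property that matters is the same either way: the number of virtual shards never changes, so only the virtual-to-physical map gets edited.

```python
import hashlib

NUM_VIRTUAL_SHARDS = 64  # hypothetical; fixed for the life of the cluster


def h(key: str) -> int:
    # Stand-in hash for illustration only.
    return int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")


def modulo_node(key: str, num_nodes: int) -> int:
    # Direct placement: shard = hash(data) % num_shards.
    return h(key) % num_nodes


def virtual_shard(key: str) -> int:
    # A key's virtual shard never changes, however many nodes there are.
    return h(key) % NUM_VIRTUAL_SHARDS


def initial_placement(num_nodes: int) -> dict:
    # Round-robin the virtual shards onto physical nodes.
    return {v: v % num_nodes for v in range(NUM_VIRTUAL_SHARDS)}


def add_node(placement: dict, old_nodes: int) -> dict:
    # The new node gets id == old_nodes and takes every (old_nodes + 1)-th
    # virtual shard; all other virtual shards stay where they were.
    new_node = old_nodes
    return {
        v: new_node if v % (old_nodes + 1) == new_node else node
        for v, node in placement.items()
    }


def moved_fraction(keys, before, after) -> float:
    # Fraction of keys whose physical node differs between two layouts.
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
old = initial_placement(4)
new = add_node(old, 4)

modulo_moved = moved_fraction(keys, lambda k: modulo_node(k, 4), lambda k: modulo_node(k, 5))
virtual_moved = moved_fraction(keys, lambda k: old[virtual_shard(k)], lambda k: new[virtual_shard(k)])
print(f"modulo, 4 -> 5 nodes: {modulo_moved:.0%} of keys change node")
print(f"virtual shards, 4 -> 5 nodes: {virtual_moved:.0%} of keys change node")
```

With uniformly distributed hashes, direct modulo moves roughly 4 out of 5 keys (a key stays put only when `h % 4 == h % 5`), while the virtual-shard layout moves only the 12 of 64 virtual shards handed to the new node, about 19% of keys, and it moves them in whole-shard units.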
eikenberry22 hours ago
The reason DBs like Mongo or Dynamo exist is because Postgres has a scaling problem.
I've used Postgres at a few places, and the #1 problem was always high availability, not scaling. One Postgres cluster could easily handle 100,000 transactions per minute, but when a primary node went down, someone got paged, manually failed over to the spare, and then manually replaced the spare. The manual tooling was very finicky, but at least it worked; no automated solution came even close. The lack of a good HA story is why I avoid self-managed Postgres as much as possible.
codegeek23 hours ago
"Why Us" => "I ran Postgres at Instacart, where we scaled the company 5x in April of 2020. The biggest problem we had was making Postgres serve 100,000s of grocery delivery orders per minute"
Couldn't be a better why us :)
chrisvenum1 day ago
I am trying to gain a basic understanding of this. Right now I have a 4TB DB on one large box. Is the idea that, using a proxy tool like PgDog, I could spin up 8 smaller boxes handling ~500GB each, plus one medium box for the proxy?
Right now I have a project with very heavy write traffic from multiple services, and a web app that reads from it. We are starting to hit the point where no amount of indexing, query optimisation, caching or box upgrades is helping. We are looking at moving the bulk of the static data to ClickHouse to reduce the DB size, but I would love to hear whether PgDog or another kind of sharding could be useful for this use case.
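As a rough mental model of proxy-based sharding (not PgDog's actual routing code): queries and writes that carry the sharding key, such as a tenant ID, land on exactly one shard, so heavy write traffic spreads across the eight boxes only if it is keyed; queries without the key must be sent to every shard and their results merged. A minimal Python sketch, with hypothetical function names and BLAKE2b as a stand-in hash:

```python
import hashlib
from typing import List, Optional

NUM_SHARDS = 8  # e.g. a 4 TB database split into ~500 GB shards


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    # Stand-in hash; placement follows shard = hash(data) % num_shards.
    digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_shards


def target_shards(sharding_key: Optional[str], num_shards: int = NUM_SHARDS) -> List[int]:
    """Return the shards a query has to visit."""
    if sharding_key is None:
        # No sharding key in the query: fan out to every shard and merge results.
        return list(range(num_shards))
    # Sharding key present: exactly one shard does the work.
    return [shard_for(sharding_key, num_shards)]


print(target_shards("tenant-42"))  # a single shard
print(target_shards(None))         # all 8 shards
```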
yabones1 day ago
I'm curious how this might help with our biggest cause of downtime with Postgres: major version upgrades. Poolers do a great job with failover and load balancing, but we consistently need ~10-20 minutes of downtime once or twice a year to do upgrades. Logical replication from the old version to the new one could probably help, but it would still require flipping everything over to the new cluster without partial writes or anything silly. Does anybody have experience with this?
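One common shape for a logical-replication upgrade is: pause writes at the pooler, wait until the new cluster has confirmed everything the old primary wrote, then repoint traffic and resume. Below is a minimal Python sketch of the "has it caught up?" check, assuming you have already read `pg_current_wal_lsn()` on the old primary and the subscription slot's `confirmed_flush_lsn` from `pg_replication_slots`. The function names are hypothetical; only the `pg_lsn` text format (two hex numbers, high and low 32 bits) comes from Postgres.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a pg_lsn text value such as '16/B374D848' into a WAL byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def lag_bytes(primary_lsn: str, confirmed_flush_lsn: str) -> int:
    """WAL bytes written on the old primary that the subscriber has not yet confirmed."""
    return max(0, parse_lsn(primary_lsn) - parse_lsn(confirmed_flush_lsn))


def safe_to_cut_over(primary_lsn: str, confirmed_flush_lsn: str, writes_paused: bool) -> bool:
    # Switch only when no new writes can arrive AND the new cluster has
    # confirmed everything already written; otherwise writes could end up
    # split across the two clusters.
    return writes_paused and lag_bytes(primary_lsn, confirmed_flush_lsn) == 0


print(lag_bytes("16/B374D848", "16/B374D000"))  # 2120
```

The primary can keep generating WAL (checkpoints, vacuum) even with application writes paused, so a real tool would poll this check with a timeout rather than run it once.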
directionless3 hours ago
We used `pgdog` as a proxy during a recent database backend migration (Heroku -> EC2 -> RDS) and it was much smoother than PgBouncer. Really nice seeing more things in this space, and having the team's work recognized.
netswift3 hours ago
We've run into so many issues with PgBouncer and Postgres that I wish we didn't have to deal with as a new growing company. Nice to see more options out there!
aejm21 hours ago
I notice there is an Enterprise Edition. Can you please specify which features are not open source? Do you expect new features to be EE-licensed as a way to pay back your VC funders?
Ozzie_osman1 day ago
```
  We sharded over 20 TB that we know about.
```
This is probably a typo, right? 20TB isn't that big. I would imagine they've sharded a lot more than that.
tschellenbach23 hours ago
PgDog, Neki, Multigres: awesome to see. And yes, this is the main issue with Postgres. Well, this and not having index hints; looking forward to 19.
mijoharas6 hours ago
Congrats on the funding Lev!
Just to say, we're happy PgDog users here! One feature of the proxy we quite like is how it handles different settings per connection (e.g. statement_timeout). When we investigated RDS Proxy (ages ago), it wasn't supported, and I think the same was true for PgBouncer, so it required a bunch of application changes. With PgDog, it just works transparently.
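One general way a transaction pooler can honor per-client settings such as `statement_timeout`: remember every `SET` a client issues and replay it on whichever server connection the client borrows next, resetting anything a previous client left behind. This is a minimal Python sketch of that bookkeeping with hypothetical class and function names; it illustrates the idea and is not a description of PgDog's internals.

```python
class ClientSession:
    """Hypothetical per-client state tracked by a transaction-mode pooler."""

    def __init__(self):
        self.params = {}

    def record_set(self, name: str, value: str) -> None:
        # Called when the proxy sees e.g. SET statement_timeout = '5s'.
        self.params[name.lower()] = value


class ServerConnection:
    """Hypothetical server connection that remembers what has been SET on it."""

    def __init__(self):
        self.params = {}
        self.executed = []

    def execute(self, sql: str) -> None:
        self.executed.append(sql)  # stand-in for sending the statement to Postgres


def quote_literal(value: str) -> str:
    # Standard SQL string-literal quoting: double any single quotes.
    return "'" + value.replace("'", "''") + "'"


def sync_params(client: ClientSession, server: ServerConnection) -> None:
    """Make the borrowed server connection match the client before its transaction."""
    # Undo settings left behind by a previous client that this client never set.
    for name in sorted(set(server.params) - set(client.params)):
        server.execute(f"RESET {name}")
        del server.params[name]
    # Apply this client's settings, skipping any the connection already has.
    for name, value in sorted(client.params.items()):
        if server.params.get(name) != value:
            server.execute(f"SET {name} TO {quote_literal(value)}")
            server.params[name] = value


alice = ClientSession()
alice.record_set("statement_timeout", "5s")
bob = ClientSession()

conn = ServerConnection()
sync_params(alice, conn)  # issues SET statement_timeout TO '5s'
sync_params(bob, conn)    # issues RESET statement_timeout
sync_params(bob, conn)    # nothing to do
```

A real implementation also has to handle `SET LOCAL`, `RESET ALL`, and `SET`s that are undone when a transaction rolls back.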
karolist17 hours ago
Love PgDog. I don't need it, honestly, but I'm using it in my on-prem k8s because I randomly heard about you on the Postgres FM podcast while on a hike in the woods with nothing else to listen to, and it piqued my interest.
https://open.spotify.com/episode/6qgpfiW68KcvRASs6649Fb
ParadisoShlee1 day ago
I've moved from pgbouncer to pgdog a few months ago without issue. Huge fan.
floriferous7 hours ago
Is this comparable to Supabase's just announced multigres?
frollogaston14 hours ago
Reminds me of long ago, before Postgres even had things like parallel scans to use multiple CPU cores on a single machine. I used to have Python helpers that split up queries by ranges of IDs. If a query was complicated, I'd EXPLAIN it first, then pick either the innermost or outermost index scan, and often get a linear speedup. But it was quite manual, required temp tables for SELECTs, and of course had no consistency.
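The range-splitting technique described above can be sketched in a few lines of Python. These are hypothetical helpers that only build the per-chunk predicates; running each query on its own connection and merging the results is left out. As the comment notes, the chunks do not share a snapshot, so there is no consistency across them.

```python
from typing import List, Tuple


def id_ranges(min_id: int, max_id: int, chunks: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [min_id, max_id] into up to `chunks` half-open ranges [lo, hi)."""
    if chunks < 1 or max_id < min_id:
        raise ValueError("need chunks >= 1 and max_id >= min_id")
    size, extra = divmod(max_id - min_id + 1, chunks)
    ranges = []
    lo = min_id
    for i in range(chunks):
        # The first `extra` chunks get one extra ID so sizes differ by at most 1.
        hi = lo + size + (1 if i < extra else 0)
        if hi > lo:  # skip empty chunks when there are more chunks than IDs
            ranges.append((lo, hi))
        lo = hi
    return ranges


def chunk_queries(table: str, min_id: int, max_id: int, chunks: int) -> List[str]:
    # `table` is assumed to be a trusted identifier; each query can then be
    # run on its own connection, e.g. from a thread pool.
    return [
        f"SELECT * FROM {table} WHERE id >= {lo} AND id < {hi}"
        for lo, hi in id_ranges(min_id, max_id, chunks)
    ]


print(chunk_queries("orders", 1, 10, 2))
```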
kjuulh1 day ago
I tried out PgDog a while ago, but couldn't find a good way to handle the config other than the users.toml / pgdog.toml files, which is a bit awkward in Kubernetes, where we often do multi-tenancy in Postgres, i.e. many databases on the same instance(s) that come and go at will.
I also had an issue where it seemed to cache credentials when doing passthrough authentication: I'd changed the role's password, but it kept using the old one, which was no bueno ;).
PgDog seems to make more sense when you really care about a few databases that need massive scale, rather than as a simple proxy in front of Postgres. I'll keep following the development, though; it is much needed in this space, and Postgres can use all the investment it can get to push past the single-machine scale it currently excels at.
drchaim1 day ago
Good stuff, although I’m not quite sure about the fast OLAP use case.
If you’re already sharding by tenant for other reasons, OK… But I see CDC to a true OLAP system as more scalable.
PostgreSQL still needs real columnar tables in core. Hopefully one day.
gen22021 hours ago
Is there an explainer for people who are only broadly familiar with the DB space? It sounds like you're building an equivalent of Vitess for Postgres, but it's not super clear from the article (which I know is not the point of this, but still :) ).
Edit: It also might be interesting to point out how your solution differs from what the folks at Planetscale are building https://planetscale.com/neki
mnbbrown1 day ago
I've loved using PgDog for the last 6 months. It's been incredibly stable. It's nifty how they've solved the problem of LISTEN/NOTIFY behind a transaction pooler.
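For context on that problem: LISTEN is tied to a server session, but a transaction pooler hands a client a different server connection for each transaction, so notifications would arrive on a connection the client no longer holds. (NOTIFY itself can go over any pooled connection, since it takes effect at commit.) One general workaround, sketched here in Python with hypothetical names and not as a description of how PgDog does it, is for the pooler to keep one dedicated connection that LISTENs on every channel any client cares about and to fan incoming notifications out to the subscribed clients.

```python
from collections import defaultdict


class NotificationHub:
    """Hypothetical pooler component that shares one LISTEN connection among clients."""

    def __init__(self, listener_execute):
        # Sends SQL on the pooler's single, dedicated (never pooled) listener connection.
        self._execute = listener_execute
        self._subscribers = defaultdict(set)  # channel -> client ids
        self.outbox = defaultdict(list)       # client id -> [(channel, payload)]

    def listen(self, client_id: str, channel: str) -> None:
        # Channel is assumed to be an already-validated, lowercase identifier.
        if not self._subscribers[channel]:
            self._execute(f"LISTEN {channel}")  # first subscriber on this channel
        self._subscribers[channel].add(client_id)

    def unlisten(self, client_id: str, channel: str) -> None:
        subs = self._subscribers.get(channel)
        if not subs or client_id not in subs:
            return
        subs.discard(client_id)
        if not subs:
            self._execute(f"UNLISTEN {channel}")  # last subscriber left
            del self._subscribers[channel]

    def on_notification(self, channel: str, payload: str) -> None:
        # Called when the listener connection receives a notification.
        for client_id in self._subscribers.get(channel, ()):
            self.outbox[client_id].append((channel, payload))


sent = []
hub = NotificationHub(sent.append)
hub.listen("client-a", "orders")
hub.listen("client-b", "orders")    # no second LISTEN needed
hub.on_notification("orders", "42")
hub.unlisten("client-a", "orders")
hub.unlisten("client-b", "orders")  # last one out sends UNLISTEN
```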
welder23 hours ago
Three real-world issues I've run into recently with PgBouncer + Postgres are:
1. Pool exhaustion from idle connections inside open, long-running transactions
2. SQLAlchemy's client-side pool reusing dead connections that PgBouncer had already killed, causing periodic request errors
3. Some tasks have to bypass PgBouncer when they use SET or prepared statements
I've already sharded large datasets at the application layer, but it looks like PgDog solves the problems above for any future work?
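On point 2, SQLAlchemy has a built-in mitigation: `create_engine(url, pool_pre_ping=True)` tests each pooled connection on checkout and transparently replaces ones the other side has already closed. Another common option when an external pooler sits in front of Postgres is to turn off client-side pooling with `poolclass=NullPool`. The pre-ping logic itself is simple enough to sketch in plain Python; the classes below are hypothetical stand-ins, not SQLAlchemy internals.

```python
class FakeConnection:
    """Stand-in for a DB connection that the pooler on the other end may have closed."""

    def __init__(self, conn_id: int):
        self.conn_id = conn_id
        self.alive = True

    def ping(self) -> bool:
        # A real driver would run something like SELECT 1 here.
        return self.alive


class PrePingPool:
    """Minimal client-side pool that validates idle connections before reuse."""

    def __init__(self, connect):
        self._connect = connect  # factory that opens a fresh connection
        self._idle = []

    def checkout(self):
        # Discard idle connections that fail the liveness check instead of
        # handing them to the application.
        while self._idle:
            conn = self._idle.pop()
            if conn.ping():
                return conn
        return self._connect()

    def checkin(self, conn) -> None:
        self._idle.append(conn)


ids = iter(range(1, 100))
pool = PrePingPool(lambda: FakeConnection(next(ids)))
c1 = pool.checkout()  # opens connection 1
pool.checkin(c1)
c1.alive = False      # simulate the pooler closing it while idle
c2 = pool.checkout()  # ping fails, so connection 2 is opened instead
```

The trade-off is one extra round trip per checkout in exchange for not surfacing stale-connection errors to requests.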
jeremyjh22 hours ago
With $5.5M from Basis Set, YC, Pioneer Fund and other great investors, we have years of runway,
This is years of product development for a three-person team. If Enterprise sales and support are a big part of your business plan, they will eat up a lot more than that.
htrp1 day ago
PgDog is a sharder, connection pooler and load balancer for PostgreSQL. Written in Rust, PgDog is fast, reliable and scales databases horizontally without requiring changes to application code.
I'm still trying to figure out how this works technically. Is the performance gain really just a rewrite in Rust?
simonw1 day ago
Suggestion: have more than just Helm and Docker in your quickstart documentation. I'd like to try this out just to see what it can do, but not quite enough to fire up one of those systems for it.
Is there a binary I can run directly?
valorzard22 hours ago
I've seen a couple of these "distributed" Postgres extensions.
My question is: has there been any talk of upstreaming one of them into Postgres itself, or of adding an equivalent built-in feature to Postgres?
maherbeg1 day ago
I'm a big PGDog fan! It really helped us scale our connection proxy needs pretty substantially and it has great features like auto mode to support Aurora failovers neatly. It's infra that just works.
jeremyjh1 day ago
It’s surprising they don’t mention advantages over other sharding systems like Citus. Maybe it’s just the fact that it’s only a proxy and not core extensions? But that could limit capabilities.
bourbonproof1 day ago
The reason Mongo is a joy to use in a scaled environment is that no additional setup or software is needed, and all drivers natively support primary/secondary reads and writes as well as topology changes. So it's end to end. Adding a new proxy in front of Postgres means either all clients are incompatible, or the application code no longer controls when to use a secondary and how much replication lag is acceptable for a particular query. Does PgDog have any solution to this?
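For context on "acceptable lag per query": MongoDB drivers accept a `maxStalenessSeconds` read-preference option, so each call site decides how far behind a secondary may be. Below is a minimal Python sketch of the same routing decision for Postgres replicas, assuming the application (or a hypothetical proxy) knows each replica's replay lag, e.g. from `now() - pg_last_xact_replay_timestamp()` on the replica, which over-reports lag when the primary is idle. All names are hypothetical, and this is not a confirmed PgDog feature.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    lag_seconds: float  # assumed to be refreshed periodically by a health check


def choose_node(replicas: List[Replica], max_staleness: Optional[float]) -> str:
    """Pick the least-lagged replica within the per-query staleness bound.

    max_staleness=None means the query must see the latest writes (primary only).
    """
    if max_staleness is None:
        return "primary"
    eligible = [r for r in replicas if r.lag_seconds <= max_staleness]
    if not eligible:
        return "primary"  # no replica is fresh enough; fall back
    return min(eligible, key=lambda r: r.lag_seconds).name


replicas = [Replica("replica-1", 0.4), Replica("replica-2", 12.0)]
print(choose_node(replicas, max_staleness=5.0))   # replica-1
print(choose_node(replicas, max_staleness=0.1))   # primary
print(choose_node(replicas, max_staleness=None))  # primary
```

The open question in the comment is where this decision lives: with a driver-level option, each query states its own bound; behind a transparent proxy, the bound has to be inferred or passed in some other way.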
BowBun19 hours ago
I really wish they'd acknowledge the prior art and name what they've taken inspiration from: https://github.com/postgresml/pgcat
Don't pay a startup for your DB proxy, you should own that layer yourself inside of your infrastructure.
TurdF3rguson8 hours ago
Let's say I have a primary with 100M rows of addresses and indexes on things like city, state, and zip code (all in memory). I also have 3 read replicas that struggle to do 1,000 lookups per minute each. Does PgDog help?
andrey-g11 hours ago
How does this compare to Aurora Serverless?
mamcx22 hours ago
I do one tenant per PG schema. Most are smallish and some are bigger (not by much; everything fits on a single box), but going forward I'll eventually need something like this. I also plan to offer a "get your own VPS" option for enterprise customers.
Would this kind of tool help in this case?
zadikian12 hours ago
This is exciting. INSERT ... SELECT doesn't work, though, right? The docs only mention VALUES inserts.
bart3r15 hours ago
We are still using Pgpool-II, and it's been very solid, but we would be interested in moving to PgDog.
Would love to hear the advantages of moving to PgDog.
snihalani18 hours ago
I'd love to advocate for PgDog if there were more than 2 managed service providers. Adding a single company with no substitute to your supply chain feels hard.
philippemnoel16 hours ago
Let's go. Very bullish on PgDog. Lev understands this space better than anyone else. If you are sharding Postgres, you should talk to him.
fulafel1 day ago
Does making it "just work" here come with any caveats vs standard PG?
Wonnk1322 hours ago
I wish them all the best. Supabase, Timescale, etc.: there's a whole cottage industry of extending Postgres to do whatever you need.
SamInTheShell19 hours ago
Scratching my head. Wondering why I would reach for this over just running a Yugabyte cluster.
sgt7 hours ago
Is this like on prem RDS?
redmonduser21 hours ago
How is this different from Citus?
dzonga20 hours ago
I use PG, not that I know much about database internals beyond the B-tree stuff we learned in college.
I don't know how the PG scaling story gets fixed unless certain things are rewritten. That's my fear about going all in on PG.
MySQL has Vitess etc., and even upgrades are easier, though PG is more extensible.
melon_tsui1 day ago
2M qps in production is legit. Curious how much RAM and CPU that takes on average per deployment, though.
Pet_Ant1 day ago
I hope people pronounce this as „pig-dog” and that it gets a mascot that looks like „man-bear-pig”.
faangguyindia1 day ago
I am not using any tool like PgBouncer and have not run into any issues so far. Is it even required these days? Have you tested your setups without these connection poolers/multiplexers?
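Whether a pooler is needed mostly comes down to how many connections you open. Postgres serves each connection with its own backend process, and `max_connections` defaults to 100, so a small app can run happily without a pooler, while many app instances, each with several workers holding their own client-side pools, can exhaust it quickly. A back-of-the-envelope Python sketch; the helper names, the headroom factor and the example numbers are all made up for illustration:

```python
def peak_connections(app_instances: int, workers_per_instance: int, pool_size: int) -> int:
    # Each worker process keeps its own client-side pool open against Postgres.
    return app_instances * workers_per_instance * pool_size


def needs_pooler(peak: int, max_connections: int = 100, headroom: float = 0.8) -> bool:
    # max_connections defaults to 100; keep some slack for admin sessions,
    # migrations and cron jobs (the 0.8 factor is an arbitrary example).
    return peak > max_connections * headroom


print(needs_pooler(peak_connections(2, 4, 5)))   # 40 connections: False
print(needs_pooler(peak_connections(10, 8, 5)))  # 400 connections: True
```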
moralestapia1 day ago
Cool work, thanks.
Regarding the pooler, how do you compare with PgBouncer?
I'm interested because I have a Postgres instance that is low-traffic but still sees tens of reads per second. I was nowhere near the machine's limits, but I still added PgBouncer to improve performance and didn't see a noticeable difference. (I was stress-testing the machine, obviously; I'm not talking about the 10 rps, lol.)
For context, my numbers were roughly 10k rps ± 1k with vanilla Postgres and 9k rps ± 1k with PgBouncer in front of it. So slightly slower, but the error bars were big enough that I wouldn't say for sure. I ended up not using PgBouncer, as the benefit was immaterial.
Also yeah, in case you want to check it out, it's the db that backs this project: https://httpstate.com.
sandeepkd23 hours ago
Nitpick: it might be anti-marketing, but it would still help to spell out the use cases where it makes sense to use this vs. any other type of database. Honesty goes a long way with more technical folks for anything related to infrastructure.
Surfacing where and how PG is better than Dynamo or any other database is probably a better starting point than calling PG a silver bullet for everything. At the end of the day, it's all a trade-off.
s3cur3n3t4 hours ago
This is just awesome
9999000009991 day ago
How are 3 developers going to QA this properly?
skiwithuge1 day ago
We are using PgBouncer in production. Interesting; I will follow the evolution of this project.
gregaccount16 hours ago
Fix the bad license.
afr0ck17 hours ago
Is this vibe-coded?
esafak12 hours ago
I think sharding is the wrong approach; who wants to mess about with sharding logic? Distributed key-value stores are the way to go. But CockroachDB already offers that, so I suppose you can try the other way.
antonvs3 hours ago
we don’t think you would use anything else.
This just seems like fanboyism to me. At the very least, you need to qualify what scenarios you think it's useful for.
I don't doubt that Postgres is good for all the projects you've ever worked on. Generalizing from that, though, is hubristic.
orliesaurus22 hours ago
How does it compare to PlanetScale?
octernion19 hours ago
Congrats, Lev! Brings back fond memories of database fires.
I'm sure you'll get 100x comments about "why not just have one fast SSD? It can do 2000 trillion writes/s"
xenophonf22 hours ago
This commit looks... odd.
https://github.com/pgdogdev/pgdog/commit/36434f93f03dec1d7d4...
I want to have as much fun as the next developer, but that makes me worry, what with supply chain attacks in the news and all.
exabrial20 hours ago
The reason DBs like Mongo or Dynamo exist is because
Not quite. The reason "DBs" like those exist is purely fashion. Let's not kid ourselves into thinking they do anything better, except making data hard to access, which might be a project goal in some cases.

news.ycombinator.com/item?id=48476466