514 points · 244 comments · 1 day ago · levkk
pgdog.dev
ahachete
eikenberry
The reason DBs like Mongo or Dynamo exist is because Postgres has a scaling problem.
I've used Postgres at a few places and the #1 problem was always high availability, not scaling. One Postgres cluster could easily handle 100,000 transactions per minute, but when a primary node went down it meant a page, a manual failover to the spare, and then manually replacing the spare. The manual tooling was very finicky, but at least it worked; no automated solution came even close. Lack of a good HA story is why I avoid self-managed Postgres as much as possible.
codegeek
Couldn't be a better why us :)
chrisvenum
Right now I have a project that has very heavy write traffic from multiple services and a web app that reads from this. We are starting to hit the point where no amount of indexing, query optimisation, caching or box upgrades is helping us. We are looking at maybe moving the bulk of the static data to ClickHouse to reduce the DB size, but I would love to hear if PgDog or another kind of sharding could be useful for this use case.
yabones
directionless
netswift
aejm
Ozzie_osman
We sharded over 20 TB that we know about.
This is probably a typo, right? 20TB isn't that big. I would imagine they've sharded a lot more than that
tschellenbach
mijoharas
Just to say we're happy pgdog users here! One feature we quite like (of the proxy) is the handling of different connection settings per connection (i.e. statement_timeout). When we investigated RDS proxy (ages ago) it wasn't supported, I think the same was true for pgbouncer so it required a bunch of application changes. With pgdog, it just works transparently.
karolist
ParadisoShlee
floriferous
frollogaston
kjuulh
Also had an issue with it because, it seems, it cached authentication requests when doing passthrough. I'd changed the role's password, but it kept using the old one, which was no bueno ;).
PgDog seems to make more sense when you really care about a few databases that need massive scale, rather than as a simple proxy in front of Postgres. I'll keep following the development though. It is much needed in this space; Postgres can use all the investment it can get to move past the single-machine scale that it currently excels at.
drchaim
If you’re already sharding by tenant for other reasons, OK… But I see CDC to a true OLAP system as more scalable.
PostgreSQL still needs real columnar tables in the core, hopefully one day
gen220
Edit: It also might be interesting to point out how your solution differs from what the folks at Planetscale are building https://planetscale.com/neki
mnbbrown
welder
1. pool exhaustion from idle connections inside open long-running transactions
2. SQLAlchemy's client-side pool using dead connections that PgBouncer had already killed, causing periodic request errors
3. Some tasks have to bypass PgBouncer when they use SET or prepared statements
I've already sharded large datasets at the application layer, but it looks like PgDog solves the above problems for any future work?
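A minimal sketch of the usual client-side mitigation for problem 2 above: test each pooled connection with a cheap query before handing it out, and discard the ones that have died. SQLAlchemy exposes this idea as the pool_pre_ping engine option. The PrePingPool class below is a hypothetical stand-in written for illustration, and sqlite3 is used only so the example runs without a Postgres server.

```python
import sqlite3
from collections import deque
from typing import Callable


class PrePingPool:
    """Hypothetical toy pool that validates connections at checkout."""

    def __init__(self, connect: Callable[[], sqlite3.Connection], size: int = 2):
        self._connect = connect
        self._idle = deque(connect() for _ in range(size))

    def checkout(self) -> sqlite3.Connection:
        # Try idle connections first, pinging each one before returning it.
        while self._idle:
            conn = self._idle.popleft()
            try:
                conn.execute("SELECT 1")  # the "pre-ping"
                return conn
            except sqlite3.Error:
                continue  # dead connection: drop it and try the next one
        # Nothing usable was idle, so open a fresh connection.
        return self._connect()

    def checkin(self, conn: sqlite3.Connection) -> None:
        self._idle.append(conn)


pool = PrePingPool(lambda: sqlite3.connect(":memory:"))
pool._idle[0].close()  # simulate a connection killed behind the pool's back
conn = pool.checkout()  # skips the dead connection, returns a live one
result = conn.execute("SELECT 1").fetchone()
```

The cost is one extra round trip per checkout, which is the trade-off for never handing a dead connection to a request.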
jeremyjh
With $5.5M from Basis Set, YC, Pioneer Fund and other great investors, we have years of runway,
This is years of product development with a three person team. If Enterprise sales and support are a big part of your business plan it will suck up a lot more than that.
htrp
PgDog is a sharder, connection pooler and load balancer for PostgreSQL. Written in Rust, PgDog is fast, reliable and scales databases horizontally without requiring changes to application code.
Still trying to figure out how this works technically. Is the performance gain really just from a rewrite in Rust?
simonw
Is there a binary I can run directly?
valorzard
My question is, has there been any talk of upstreaming any of them to Postgres itself? Or of adding an equivalent built-in feature to Postgres?
maherbeg
jeremyjh
bourbonproof
BowBun
Don't pay a startup for your DB proxy, you should own that layer yourself inside of your infrastructure.
TurdF3rguson
andrey-g
mamcx
This kind of tool will help in this case?
zadikian
bart3r
Would love to hear the advantages of moving to PgDog.
snihalani
philippemnoel
fulafel
Wonnk13
SamInTheShell
sgt
redmonduser
dzonga
I don't know how the pg scaling story gets fixed unless certain things are rewritten. that's my fear of going all in pg.
mysql has vitess etc & even upgrades are easier. though pg is more extensible.
melon_tsui
Pet_Ant
faangguyindia
moralestapia
Wrt. the pooler, how do you compare with pgbouncer?
I'm interested because I have a postgres instance, low-traffic but still like ... tens of r(eads)ps. I was not running anything close to the machine limits but still added pgbouncer to improve performance and didn't see a noticeable difference. I was stress-testing the machine obv., I'm not talking about the 10 rps, lol.
For context, my numbers were something like 10k rps +/- 1k vanilla postgres and like 9k rps +/- 1k with pgbouncer in front of it. So ... slightly slower but big error bars so I wouldn't say for sure. I ended up not using pgbouncer as the benefit was immaterial.
Also yeah, in case you want to check it out, it's the db that backs this project: https://httpstate.com.
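The "big error bars" reasoning in the benchmark numbers above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the two measurements are independent and that each ±1k is roughly one standard deviation, neither of which is stated in the comment. The function name is made up for illustration.

```python
import math


def difference_with_error(a: float, a_err: float, b: float, b_err: float):
    """Difference of two independent measurements and its combined uncertainty."""
    diff = a - b
    # Independent errors add in quadrature.
    err = math.sqrt(a_err**2 + b_err**2)
    return diff, err


# 10k +/- 1k rps (vanilla Postgres) vs. 9k +/- 1k rps (with PgBouncer)
diff, err = difference_with_error(10_000, 1_000, 9_000, 1_000)

# Rule of thumb (an assumption, not a formal test): call the gap
# meaningful only if it exceeds about two combined standard deviations.
significant = abs(diff) > 2 * err
```

Under those assumptions the gap of 1k rps is smaller than its own combined uncertainty of about 1.4k rps, so the data does not show that PgBouncer made things slower.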
sandeepkd
Surfacing where and how PG is better than Dynamo or any other database is probably a better starting point than calling PG a silver bullet for everything. At the end of the day it's all a trade-off.
s3cur3n3t
999900000999
skiwithuge
gregaccount
afr0ck
esafak
antonvs
we don’t think you would use anything else.
This just seems like fanboyism to me. At the very least, you need to qualify what scenarios you think it's useful for.
I don't doubt that Postgres is good for all the projects you've ever worked on. Generalizing from that, though, is hubristic.
orliesaurus
octernion
i'm sure you'll get 100x comments about "why not just have one fast SSD? it can do 2000 trillion writes/s"
xenophonf
https://github.com/pgdogdev/pgdog/commit/36434f93f03dec1d7d4...
I want to have as much fun as the next developer, but that makes me worry, what with supply chain attacks in the news and all.
exabrial
The reason DBs like Mongo or Dynamo exist is because
Not quite. The reason "DBs" like those exist is purely due to fashion. Let's not kid ourselves into thinking they do anything better, with the exception of making data hard to access, which might be a project goal in some cases.
I'm really happy that there's more options for Postgres sharding and I applaud Pgdog and the team's efforts and energy.
Having said that, this makes it a no-go for me:
shard_number = hash(data) % num_shards
https://docs.pgdog.dev/features/sharding/basics/#terminology
Most sharding solutions distribute the hash value over linear ranges, which are then split across "virtual shards", which are in turn placed on the physical shards or workers. This allows for shard replacement when needed. For example, Citus works this way, and even adds convenience functions for shard migration (using logical migration) in an automated way. That's all I'd need.
Operationally, it's worlds apart. With modulo distribution, the only way to replace data is to reshard everything, which is something you don't want to do however fast the operation may be.
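To make the operational difference concrete, here is a rough sketch comparing the two schemes when a cluster grows from 4 to 5 physical shards. It is not PgDog's or Citus's actual implementation: the hash function, the 1024 virtual shards, and every function name below are assumptions chosen for illustration.

```python
import hashlib

NUM_VIRTUAL = 1024  # hypothetical virtual shard count, fixed for the cluster's lifetime


def stable_hash(key: str) -> int:
    # Python's built-in hash() is salted per process for strings,
    # so use a deterministic hash instead.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def modulo_shard(key: str, num_shards: int) -> int:
    # shard_number = hash(data) % num_shards
    return stable_hash(key) % num_shards


def initial_placement(num_physical: int) -> dict:
    # Map contiguous ranges of virtual shards onto physical shards.
    return {v: v * num_physical // NUM_VIRTUAL for v in range(NUM_VIRTUAL)}


def add_physical_shard(placement: dict, old_count: int) -> dict:
    # Hand an equal slice of each existing shard's virtual shards to the new one.
    new_shard = old_count
    take = NUM_VIRTUAL // (old_count + 1) // old_count
    updated = dict(placement)
    for shard in range(old_count):
        owned = [v for v, s in placement.items() if s == shard]
        for v in owned[:take]:
            updated[v] = new_shard
    return updated


def virtual_shard_lookup(key: str, placement: dict) -> int:
    # The key -> virtual shard mapping never changes; only placement does.
    return placement[stable_hash(key) % NUM_VIRTUAL]


keys = [f"user-{i}" for i in range(10_000)]

# Modulo scheme: count keys whose shard changes when going from 4 to 5 shards.
moved_modulo = sum(modulo_shard(k, 4) != modulo_shard(k, 5) for k in keys) / len(keys)

# Virtual shard scheme: same growth, but only the placement map changes.
before = initial_placement(4)
after = add_physical_shard(before, old_count=4)
moved_keys = [
    k for k in keys
    if virtual_shard_lookup(k, before) != virtual_shard_lookup(k, after)
]
moved_virtual = len(moved_keys) / len(keys)
```

With a uniform hash, the modulo scheme relocates roughly four keys out of five, scattered across all shards. The virtual shard scheme relocates roughly one key in five, and every moved key lands on the new shard, so the existing shards only ever send data and never receive it.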