Has anybody tried using logical replication with table partitioning for async service communication?
Proof of concept (with pictures): https://gist.github.com/shuber/8e53d42d0de40e90edaf4fb182b59dfc
Services would commit messages to their own databases along with the rest of their data (with the same transactional guarantees), and then messages are replicated in “realtime” (with all of logical replication's features and guarantees) to the receiving service's database, where its workers (e.g. que-rb, SKIP LOCKED polling, etc.) are waiting to respond by inserting messages into their own database to be replicated back.
Throw in a trigger to automatically acknowledge/clean up/notify messages and I think we've got something that resembles a queue? Maybe have that same trigger match incoming messages against a “routes” table (based on message type, certain JSON schemas in the payload, etc.) and write matches to the que-rb jobs table instead, for some kind of distributed/replicated work queue hybrid?
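To make the routing idea a bit more concrete, here is a rough sketch of a trigger that fans incoming messages out to a jobs table via a “routes” table. It uses Python's built-in `sqlite3` purely as a stand-in so that it runs anywhere; the real thing would be a trigger function on the receiving PostgreSQL database. The table and column names (`routes`, `incoming_messages`, `jobs`, `job_class`, `args`) and the job class names are made up for illustration and are not que-rb's actual schema, and matching here is on message type only, not on JSON schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE routes (message_type TEXT, job_class TEXT);
CREATE TABLE incoming_messages (id INTEGER PRIMARY KEY, type TEXT, payload TEXT);
CREATE TABLE jobs (id INTEGER PRIMARY KEY, job_class TEXT, args TEXT);

-- For every new message, enqueue one job per matching route.
-- Messages with no matching route produce no jobs.
CREATE TRIGGER route_incoming_message
AFTER INSERT ON incoming_messages
BEGIN
    INSERT INTO jobs (job_class, args)
    SELECT r.job_class, NEW.payload
    FROM routes AS r
    WHERE r.message_type = NEW.type;
END;

-- Hypothetical routes: one message type fans out to two job classes.
INSERT INTO routes VALUES ('order_placed', 'SendReceipt');
INSERT INTO routes VALUES ('order_placed', 'UpdateInventory');
""")


def receive(conn, msg_type, payload):
    """Simulate a replicated message arriving in the receiving database."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO incoming_messages (type, payload) VALUES (?, ?)",
            (msg_type, payload),
        )


receive(conn, "order_placed", '{"order_id": 1}')
receive(conn, "user_deleted", '{"user_id": 7}')  # no route, so no job

jobs = conn.execute(
    "SELECT job_class, args FROM jobs ORDER BY job_class"
).fetchall()
print(jobs)
```

One caveat as far as I understand it: on a PostgreSQL subscriber, a trigger like this only fires for replicated rows if it has been enabled with `ENABLE REPLICA TRIGGER` or `ENABLE ALWAYS TRIGGER`.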
My motivations for this line of thinking were mostly based around high availability and isolating services' downtime from one another. Our PostgreSQL databases are some of the most critical pieces of infrastructure for all of our services: if a database is down, then we don't want the impacted service to even attempt to do work. On the other hand, we don't want a service's downtime (even for maintenance) to impact its ability to receive (queued) messages from other services, which it can resume consuming (once, in order) when it's back up.
We're exploring other message queues but keep getting drawn back to PostgreSQL because we can get the same transactional guarantees with our messages/jobs as with the rest of our data. Even the act of enqueuing a job or sending a message to another service is something that should be committed and can be rolled back like everything else.
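As a small illustration of what I mean by the enqueue being rolled back like everything else, here is a minimal sketch. Again it uses Python's built-in `sqlite3` only as a runnable stand-in for PostgreSQL, and the `orders` and `outgoing_messages` tables and the `place_order` helper are hypothetical names invented for the example.

```python
import sqlite3


def place_order(conn, item, fail=False):
    """Write a business row and its outgoing message in one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
            conn.execute(
                "INSERT INTO outgoing_messages (type, payload) VALUES (?, ?)",
                ("order_placed", item),
            )
            if fail:
                # Simulate something going wrong after the message was enqueued.
                raise RuntimeError("simulated failure after enqueue")
    except RuntimeError:
        pass  # the order row and the message were rolled back together


def counts(conn):
    """Return (number of orders, number of outgoing messages)."""
    orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    messages = conn.execute("SELECT COUNT(*) FROM outgoing_messages").fetchone()[0]
    return orders, messages


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.execute(
    "CREATE TABLE outgoing_messages (id INTEGER PRIMARY KEY, type TEXT, payload TEXT)"
)

place_order(conn, "widget")             # both rows are committed
place_order(conn, "gadget", fail=True)  # both rows are rolled back
print(counts(conn))
```

The point is that the order and its message either both exist or neither does, so there is never a message for work that was rolled back.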
For our potential use case specifically, we're not dealing with high levels of realtime traffic; we're not even close to 1k jobs/messages per second.
I'm looking to poke holes in this concept before sinking any more time into exploring the idea. Any feedback/warnings/concerns would be much appreciated, thanks for your time!