For most databases (like Postgres), you typically run a single database (per shard, possibly), and replicate changes to a live read-only backup as fast as possible. If the live R/W database fails, you quickly switch the backup to R/W, and point traffic there instead.
Then, there's a class of databases that tries to actively commit across multiple geographies. You pay a cost (in terms of latency, and typically also $$$), but when a commit succeeds, it has been written durably and reliably, using some consensus protocol, across multiple geographies.
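To make the "a commit only succeeds once it's durable in multiple geographies" idea concrete, here's a toy sketch of a majority-quorum write. It is not how Spanner or Cockroach actually implement it (they run real consensus protocols, Paxos and Raft respectively, with leaders, terms and log repair). `Replica` and `quorum_commit` are names I made up for illustration, and the region names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for a replica in one geography."""
    region: str
    up: bool = True
    log: list = field(default_factory=list)

    def append(self, entry) -> bool:
        # A down or partitioned replica never acknowledges the write.
        if not self.up:
            return False
        self.log.append(entry)
        return True


def quorum_commit(replicas, entry) -> bool:
    """Report success only if a strict majority of replicas acked.

    Toy simplification: entries appended during a failed attempt are not
    rolled back here; a real consensus protocol handles that.
    """
    acks = sum(1 for r in replicas if r.append(entry))
    return acks >= len(replicas) // 2 + 1


replicas = [Replica("us-east"), Replica("eu-west"), Replica("ap-south")]
ok_all_up = quorum_commit(replicas, "x=1")    # 3 of 3 ack

replicas[0].up = False
ok_one_down = quorum_commit(replicas, "x=2")  # 2 of 3 ack, still a majority

replicas[1].up = False
ok_two_down = quorum_commit(replicas, "x=3")  # 1 of 3 ack, no majority
```

The latency cost mentioned above comes from that `sum(...)` line: in a real deployment each ack is a cross-region round trip, and the commit has to wait for the slowest member of the majority.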
The exemplar is probably Spanner, which uses atomic clocks to get very specific about time to narrow the latency gap as much as possible. Cockroach is broadly in the same class, although without atomic clocks; I believe it's using network roundtrip measurements and/or some kind of mathematical time abstraction (like counters of some kind) to do the same thing. Can't ever be quite as fast, but you don't need atomic clocks!
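For what a "counter-based time abstraction" can look like, here's a minimal sketch of a hybrid logical clock (HLC): a wall-clock reading paired with a logical counter that breaks ties when the wall clock hasn't advanced or lags a peer. As far as I know Cockroach uses something in this family, but this is a sketch of the general technique, not their implementation. The class and method names are mine, and the injected fake clocks are only there to make the example deterministic.

```python
import time


class HybridLogicalClock:
    """Timestamps are (wall, logical) tuples, compared lexicographically."""

    def __init__(self, physical_clock=time.time_ns):
        self._now = physical_clock  # injectable so the example is repeatable
        self.wall = 0
        self.logical = 0

    def tick(self):
        """Timestamp a local event or an outgoing message."""
        pt = self._now()
        if pt > self.wall:
            # Physical time moved forward: adopt it and reset the counter.
            self.wall, self.logical = pt, 0
        else:
            # Physical time stalled or went backwards: bump the counter.
            self.logical += 1
        return (self.wall, self.logical)

    def update(self, remote):
        """Merge a timestamp received from another node."""
        r_wall, r_logical = remote
        pt = self._now()
        new_wall = max(self.wall, r_wall, pt)
        if new_wall == self.wall and new_wall == r_wall:
            self.logical = max(self.logical, r_logical) + 1
        elif new_wall == self.wall:
            self.logical += 1
        elif new_wall == r_wall:
            self.logical = r_logical + 1
        else:
            # Local physical clock is strictly ahead of both.
            self.logical = 0
        self.wall = new_wall
        return (self.wall, self.logical)


# Node b's physical clock is behind node a's (50 vs 100, arbitrary units).
a = HybridLogicalClock(physical_clock=lambda: 100)
b = HybridLogicalClock(physical_clock=lambda: 50)

t1 = a.tick()      # a sends a message stamped t1
t2 = b.update(t1)  # b receives it and must order itself after t1
t3 = b.tick()      # a later local event on b
```

Even though b's own clock reads 50, its timestamps after receiving a's message sort after t1, so causality is preserved without the nodes agreeing on the exact time.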
What's _really_ funny is when people start out choosing Spanner because of its global replication, then decide it's too expensive, and settle on regional non-replicated Spanner DBs to save cost. Like, that's just a database, man. (Or maybe something slightly above a single database, like Aurora replicated across Availability Zones in the same Region).
Other folks can chime in, but there are a growing number of databases in this class. TiDB I believe is one. I _thought_ PlanetScale was just sharded MySQL (Vitess+MySQL = clever auto-(re-)sharding), but perhaps it does replicated writes too - I see it getting mentioned here a bunch.
Assuming I need to host on prem, do any fully open source solutions exist for this?
It really looks like every database company is trying to become Oracle. You want your clients to be trapped and unable to leave, so if you hypothetically just up the price by 30 or 40% upon renewal they either have to rewrite their entire stack, or pay the piper.
CockroachDB is basically "run postgres on a cluster with more fault tolerance" - you can have machines (or entire datacenters) going down, netsplits etc. and as long as there's enough infra up to keep going, it will.
Presumably only a small subset of postgres users really need this feature - and those that do are big enough to need an enterprise licence.
I'll admit I haven't worked directly in this space in a good while, but the whole mystery-terms thing really rubs me the wrong way.
For example, if I have a company that provisions databases on behalf of my clients, is this 10 million revenue cap for my company, or for the clients themselves?
The pricing for self hosting isn't even on the website. I presume it's one of those "if you need to ask, you can't afford it" situations.
Plus you're locking yourself into a vendor that has no worries about changing its terms again later on.
>Required only during the trial period. Businesses that cannot accommodate telemetry may contact sales to request an exception. Paid use does not require telemetry.
From some of the industries I've worked in, this is a massive red flag. We don't want to give you telemetry at any point in our process.
Of special interest is that they are maintaining a completely free pre-rugpull version of CockroachDB, forked before Cockroach retroactively relicensed its security fixes.
I would look seriously at using that instead of starting down the path of Cockroach's free-with-telemetry offering.
Sharding (huge data and local distribution, even worldwide) and HA while retaining serializable transactions. Possibly easier to operate.
The downsides are:
- slower - Postgres (if it can handle the amount of data, which it very much can on proper hardware and with partitioning of >1B-row tables) is much faster, esp. for joins
Does it perform significantly better to justify the cost? Back in the day I worked heavily with databases and we always tilted towards open source.