CockroachDB serverless internals

April 1, 2022

Recently, I was on a livestream with Aydrian, where we talked about all things related to serverless databases.

Later, for a demo, I migrated a Next.js app that used Prisma and Heroku’s PostgreSQL over to CockroachDB serverless, mostly to play around with the new Prisma and CockroachDB integration.

After that, talking to other folks from the Cockroach community, like their VP of Engineering, Jordan, and Dominik, gave me a few more insights into the platform that I wanted to share with y’all.

The foundation

CockroachDB doesn’t store rows the same way Postgres does. Internally:

  • Every table is mapped into a sorted keyspace
  • Rows become key-value entries
  • Indexes are separate keyspaces
  • Everything lives in a replicated KV store backed by Pebble, a RocksDB-inspired LSM storage engine

So in CockroachDB, SQL is a logical layer on top of a distributed key-value store rather than the system that actually stores the data.
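To make the row-to-KV mapping a bit more concrete, here is a minimal Python sketch. It assumes a hypothetical `users` table with table ID 53 and a secondary index on `email`, and it uses plain tuples as keys. CockroachDB itself uses an order-preserving byte encoding, so this only illustrates the idea of "one row, one KV entry per index" and is not the real key format.

```python
from bisect import bisect_left, insort

# Hypothetical IDs for a `users` table; real IDs are assigned by CockroachDB.
USERS_TABLE_ID = 53
PRIMARY_INDEX_ID = 1
EMAIL_INDEX_ID = 2

sorted_keys = []  # every key, kept in sorted order (stand-in for the KV store)
kv_values = {}    # key -> value


def put(key, value):
    """Insert or overwrite one KV entry while keeping keys sorted."""
    if key not in kv_values:
        insort(sorted_keys, key)
    kv_values[key] = value


def insert_user(user_id, email, name):
    """One SQL row becomes one KV entry per index."""
    # Primary index: key = table/index/primary key, value = remaining columns.
    put(("Table", USERS_TABLE_ID, PRIMARY_INDEX_ID, user_id), {"email": email, "name": name})
    # Secondary index: key = table/index/indexed column/primary key; value left empty here.
    put(("Table", USERS_TABLE_ID, EMAIL_INDEX_ID, email, user_id), None)


def scan_prefix(prefix):
    """Because keys are sorted, all entries of one index form a contiguous run."""
    i = bisect_left(sorted_keys, prefix)
    result = []
    while i < len(sorted_keys) and sorted_keys[i][: len(prefix)] == prefix:
        key = sorted_keys[i]
        result.append((key, kv_values[key]))
        i += 1
    return result


insert_user(2, "bo@example.com", "Bo")
insert_user(1, "al@example.com", "Al")
for key, value in scan_prefix(("Table", USERS_TABLE_ID, PRIMARY_INDEX_ID)):
    print(key, value)
```

A full table scan is then just a prefix scan over the primary index, and an index lookup is a prefix scan over the secondary index's span.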

Sharding (Ranges)

Traditionally, solutions like Vitess have scaled SQL horizontally through sharding: splitting data across multiple machines, usually according to a shard key.

CockroachDB takes a different approach here:

  • Data is split into ranges, which are contiguous chunks of the sorted keyspace
  • Each range auto-splits, auto-moves, and is independently replicated via Raft.
  • No shard keys
  • No concept of a database-level primary

So basically, Vitess scales by routing each query to the shard that owns its data, while CockroachDB scales by continuously splitting and moving the data itself.
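Here is a minimal Python sketch of the "split, don't shard" idea. A range is identified only by where its span starts, a lookup is a binary search over range start keys, and a range that grows too large splits at its middle key. The key-count threshold is a made-up stand-in (CockroachDB splits on size and load), and rebalancing and Raft replication are deliberately left out.

```python
from bisect import bisect_right, insort

# Hypothetical split threshold. Real ranges split on size (and load), not key count.
MAX_KEYS_PER_RANGE = 4


class Range:
    def __init__(self, start_key):
        self.start_key = start_key  # inclusive lower bound of this range's span
        self.keys = []              # sorted keys stored in this range


class RangeMap:
    def __init__(self):
        # A single range initially covers the whole keyspace ("" sorts first).
        self.ranges = [Range("")]

    def find(self, key):
        """Index of the range whose span contains `key`."""
        starts = [r.start_key for r in self.ranges]
        return bisect_right(starts, key) - 1

    def put(self, key):
        idx = self.find(key)
        rng = self.ranges[idx]
        if key not in rng.keys:
            insort(rng.keys, key)
        if len(rng.keys) > MAX_KEYS_PER_RANGE:
            self.split(idx)

    def split(self, idx):
        """Split a range at its middle key; no shard key is involved."""
        rng = self.ranges[idx]
        mid = len(rng.keys) // 2
        right = Range(rng.keys[mid])
        right.keys = rng.keys[mid:]
        rng.keys = rng.keys[:mid]
        self.ranges.insert(idx + 1, right)


ranges = RangeMap()
for k in "abcdefghij":
    ranges.put(k)
print([r.start_key for r in ranges.ranges])  # ['', 'c', 'e', 'g']
```

Nothing in the application chooses these boundaries; they fall out of where the data happens to grow, which is what lets the system keep moving data around without a shard key.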

Replication

Each range is backed by its own Raft group, usually with 3 replicas: 1 leader and 2 followers. Writes go to the leader, which replicates them to the followers, and a write commits once a majority of replicas (2 of 3) has it.
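The commit rule can be sketched in a few lines of Python. This toy model only counts acknowledgements; it has no terms, elections, or log-consistency checks, so it is not a Raft implementation. It only shows why 2 of 3 replicas is enough to commit, with the leader's own copy counting toward the majority. The replica names are hypothetical.

```python
class RaftGroupSketch:
    """Toy model of one range's replication: only the commit-on-majority rule."""

    def __init__(self, replica_ids, leader):
        self.replicas = set(replica_ids)
        self.leader = leader
        self.log = []           # entries proposed through the leader
        self.commit_index = -1  # highest log index known to be committed

    def quorum(self):
        return len(self.replicas) // 2 + 1

    def propose(self, entry, acking_followers):
        """Leader appends `entry`; it commits once a majority of replicas holds it."""
        self.log.append(entry)
        index = len(self.log) - 1
        # The leader's own copy counts as one acknowledgement.
        followers = set(acking_followers) & (self.replicas - {self.leader})
        acks = 1 + len(followers)
        if acks >= self.quorum():
            self.commit_index = index
            return True
        return False


group = RaftGroupSketch(["n1", "n2", "n3"], leader="n1")
print(group.propose("SET k = 1", acking_followers=["n2"]))  # True: 2 of 3 replicas
print(group.propose("SET k = 2", acking_followers=[]))      # False: leader alone is 1 of 3
```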

Different ranges have different leaders, and leadership is spread across the cluster. This design is a double-edged sword. It’s the reason Cockroach can survive node and zone failures, but it’s also why writes are more expensive than in single-node Postgres: every write has to wait for a network round trip to a majority of replicas before it can commit.
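A back-of-the-envelope Python sketch of that cost is shown below. It assumes the only extra latency is waiting for the fastest majority of followers, and it ignores disk syncs, queuing, and any hop from the SQL gateway to the leader. The round-trip times are made-up numbers for illustration, not measurements.

```python
def commit_latency_ms(follower_rtts_ms, replication_factor):
    """Rough estimate of the network wait before one write can commit.

    The leader's own log write counts toward the quorum, so it only waits
    for the fastest (quorum - 1) followers.
    """
    quorum = replication_factor // 2 + 1
    needed = quorum - 1
    if needed == 0:
        return 0.0  # a single replica never waits on the network
    if len(follower_rtts_ms) < needed:
        raise ValueError("not enough followers to reach quorum")
    return sorted(follower_rtts_ms)[needed - 1]


# Made-up round-trip times, for illustration only.
print(commit_latency_ms([1.5, 4.0], replication_factor=3))    # 1.5: wait for the closer follower
print(commit_latency_ms([70.0, 90.0], replication_factor=3))  # 70.0: followers in other regions
print(commit_latency_ms([], replication_factor=1))            # 0.0: single-node baseline
```

The same arithmetic explains the trade-off: spreading replicas across zones or regions is what buys failure tolerance, and it is also what pushes that round trip up.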

MVCC

CockroachDB uses timestamp-based MVCC: each key can have multiple versions, reads see a consistent snapshot as of a specific timestamp, and writes first create provisional versions that are finalized on commit.
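A snapshot read boils down to "return the newest version at or below my timestamp". Here is a minimal Python sketch of that rule. It uses plain integer timestamps (CockroachDB uses hybrid logical clock timestamps) and leaves provisional versions out entirely. The key and values are hypothetical.

```python
from bisect import bisect_right, insort


class MVCCStore:
    """Toy multi-version store: every write adds a version instead of overwriting."""

    def __init__(self):
        self.versions = {}  # key -> list of (timestamp, value), sorted by timestamp

    def put(self, key, ts, value):
        insort(self.versions.setdefault(key, []), (ts, value))

    def get(self, key, read_ts):
        """Snapshot read: newest version whose timestamp is <= read_ts."""
        history = self.versions.get(key, [])
        timestamps = [ts for ts, _ in history]
        i = bisect_right(timestamps, read_ts)
        return history[i - 1][1] if i > 0 else None


store = MVCCStore()
store.put("balance", ts=10, value=100)
store.put("balance", ts=20, value=80)
print(store.get("balance", read_ts=15))  # 100: the ts=20 write is invisible at ts=15
print(store.get("balance", read_ts=25))  # 80
print(store.get("balance", read_ts=5))   # None: no version existed yet
```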

Writes inside a transaction produce intents:

  • are provisional MVCC versions tied to the transaction
  • act like distributed write locks without a central lock manager
  • are stored directly in the KV layer
  • are resolved asynchronously after the transaction commits

Transactions are globally serializable by default and are coordinated through transaction records, which makes the whole process pretty expensive.
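The interplay between intents and transaction records can be sketched in Python. This is a heavily simplified model under several assumptions: each key holds at most one intent, a reader that hits a pending intent simply gets an error (the real system waits or pushes the other transaction), and resolution is an explicit call rather than background work. The key point it shows is that flipping one transaction record to COMMITTED makes every intent of that transaction visible at once.

```python
from enum import Enum


class TxnStatus(Enum):
    PENDING = "pending"
    COMMITTED = "committed"
    ABORTED = "aborted"


class TxnRecord:
    """Stand-in for a transaction record: the single source of truth for status."""

    def __init__(self, txn_id, write_ts):
        self.txn_id = txn_id
        self.write_ts = write_ts
        self.status = TxnStatus.PENDING


class IntentStore:
    def __init__(self):
        self.committed = {}  # key -> list of (timestamp, value)
        self.intents = {}    # key -> (TxnRecord, provisional value)

    def write_intent(self, txn, key, value):
        existing = self.intents.get(key)
        if existing is not None and existing[0] is not txn:
            if existing[0].status is TxnStatus.PENDING:
                # The intent behaves like a write lock held by the other transaction.
                raise RuntimeError(f"write-write conflict on {key!r}")
            self.resolve(key)  # clean up a finished transaction's leftover intent
        self.intents[key] = (txn, value)

    def read(self, key, read_ts):
        intent = self.intents.get(key)
        if intent is not None:
            txn, value = intent
            if txn.write_ts <= read_ts:
                if txn.status is TxnStatus.PENDING:
                    raise RuntimeError(f"{key!r} has a pending intent; the reader must wait")
                if txn.status is TxnStatus.COMMITTED:
                    return value  # committed but not yet resolved
        visible = [v for ts, v in sorted(self.committed.get(key, [])) if ts <= read_ts]
        return visible[-1] if visible else None

    def resolve(self, key):
        """Turn an intent into a regular version (commit) or drop it (abort)."""
        txn, value = self.intents[key]
        if txn.status is TxnStatus.PENDING:
            raise ValueError("cannot resolve an intent of a pending transaction")
        del self.intents[key]
        if txn.status is TxnStatus.COMMITTED:
            self.committed.setdefault(key, []).append((txn.write_ts, value))


store = IntentStore()
t1 = TxnRecord("t1", write_ts=10)
store.write_intent(t1, "a", 1)
store.write_intent(t1, "b", 2)
t1.status = TxnStatus.COMMITTED  # one record update commits both intents
print(store.read("a", read_ts=15))  # 1: read through the committed intent
store.resolve("a")  # in CockroachDB this cleanup happens asynchronously
store.resolve("b")
print(store.read("b", read_ts=15))  # 2
print(store.read("b", read_ts=5))   # None: snapshot older than t1's write timestamp
```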

What’s different

CockroachDB and its serverless offering are essentially the same distributed system; serverless just wraps it in a multi-tenant control plane. That’s where all the magic happens.

Serverless splits responsibilities:

Layer          Behavior
KV + Raft      Long-lived, shared, stable
SQL compute    Elastic, on-demand, multi-tenant

SQL gateways spin up and down with load, execute query plans, and route reads and writes to the KV ranges that hold the data.

KV nodes, on the other hand, keep the data durable and maintain Raft state.
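The elastic half of that split can be sketched as a per-tenant scaling rule in Python. Everything here is a hypothetical illustration, not CockroachDB’s actual autoscaler: the `TenantLoad` fields, the cores-per-pod figure, the idle timeout, and the assumption that an idle tenant can be scaled down to zero SQL pods. The point is that the decision touches only SQL compute, while the shared KV layer stays put.

```python
import math
from dataclasses import dataclass


@dataclass
class TenantLoad:
    tenant_id: int
    cpu_cores_in_use: float  # recent SQL CPU demand for this tenant
    idle_seconds: float      # time since the tenant's last query


# Hypothetical tuning knobs, not CockroachDB's actual values.
CORES_PER_SQL_POD = 4.0
IDLE_BEFORE_SCALE_TO_ZERO_S = 300.0


def desired_sql_pods(load: TenantLoad) -> int:
    """Number of SQL pods to run for one tenant; KV nodes are never resized here."""
    if load.idle_seconds >= IDLE_BEFORE_SCALE_TO_ZERO_S:
        return 0  # no traffic for a while: release all SQL compute
    return max(1, math.ceil(load.cpu_cores_in_use / CORES_PER_SQL_POD))


for load in [TenantLoad(1, 0.0, 600.0), TenantLoad(2, 0.5, 1.0), TenantLoad(3, 9.0, 0.0)]:
    print(load.tenant_id, desired_sql_pods(load))  # 0, 1 and 3 pods respectively
```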

The end result is a system where durability and consistency live in a stable core, while query execution scales elastically with demand.