Distributed Systems: Understanding LWW, Quorums, and Clock S

In my previous post, we explored the intricacies of Read-after-Write consistency in leader-based systems. We came to a conclusion that achieving strong consistency at the individual record level is relatively straightforward when you route reads and writes to the single leader node.

As a quick refresher: In a Leader-based architecture, a single node is assigned as the main coordinator for a specific partition and key. This node accepts all incoming writes.

However, there is another architectural paradigm: Leaderless Replication (commonly associated with Apache Cassandra).

Software engineers often prescribe leaderless architectures as a magic pill for massive write throughput and high availability. However, a common misconception is that moving to a leaderless model is a simple swap from systems like DynamoDB or MongoDB. While those databases can be configured for strong consistency at the individual record level by leveraging their leader-based nature, Cassandra is a different beast that requires a mental shift in how we reason about data integrity.

The Concurrency Conflict

Without a leader, we fall into a classic distributed systems trap: The Lost Update. In a leaderless world, the responsibility for a single key is shared across multiple nodes.

To see this in action, imagine there is a website where you can donate to charities. Alice and Bob are both concurrently donating money to the same charity. Your team owns a service that tracks a total amount of money raised in a Charity table with a total_raised field and you use Cassandra.

In a traditional leader-based database, one leader would accept these operations. In a leaderless world, Alice and Bob might talk to two different nodes simultaneously. Neither node may know that the other is currently processing an update for the same record.

The resulting code - which looks perfectly fine becomes a massive liability in production:

// Alice and Bob trigger this concurrently across different service instances
public void processDonation(String charityId, BigDecimal donationAmount) {
    // 1. FETCH: Both threads read the current state (e.g., $100)
    Charity status = charityRepository.findById(charityId); 
    
    // 2. MODIFY: Alice calculates 100 + 10, Bob calculates 100 + 20
    BigDecimal updatedTotal = status.getTotal().add(donationAmount);
    status.setTotal(updatedTotal);
    
    // 3. PERSIST: Both attempt to save their version back to the cluster
    charityRepository.save(status); 
}

The "Silent" Data Loss

This is where your on-call nightmare may kick in. In a leaderless system, Last Write Wins (LWW) is often chosen as the mechanism for conflict resolution. Because the database lacks a central coordinator to order events, it doesn't "fall back" to LWW; it relies on it to merge diverging versions of records.

Imagine Alice and Bob’s updates hitting the cluster simultaneously. The system compares the microsecond timestamps of the two writes: if Bob’s request reached his node even 1ms after Alice’s reached hers, Bob’s version ($120) becomes the official record. Alice’s $10 donation doesn't "fail" or throw an error -it simply vanishes. The system didn't crash; it just silently overwrote history.

The Quorum Illusion

"But wait," you might say, "can't we just use a Quorum to fix this?"

In leaderless systems, a Quorum is a mechanism where your system ensures that a majority of nodes agree on a piece of data before a read or write is considered successful. The industry typically relies on the formula:

W + R > N

Quick Refresher: To achieve a Quorum, you write to a majority of nodes (W) and read from a majority of nodes (R). If the total number of nodes you've touched W + R is greater than the total number of replicas N, math dictates that at least one node in your read should have the latest write under normal conditions

It looks like a perfect solution for consistency. However, even with a perfect Quorum, you are still at the mercy of the Last Write Wins (LWW) rule.

As Martin Kleppmann notes in Designing Data-Intensive Applications, Quorums help you find the latest version of a record, but they don't help you sequence concurrent operations. If Alice and Bob both reach a Quorum with different values for the same counter at the same time, the system doesn't "merge" them - it just picks the one with the slightly newer timestamp and deletes the other.

In a leaderless world, 1 + 1 doesn't always equal 2. Sometimes, the second 1 just arrives a microsecond late and overwrites the first. Quorums provide a consistent view of a conflict, but they do not resolve it. They merely ensure you see the collision before LWW blindly chooses a winner

Clock Skew

Even if your Quorum is perfectly configured (W+R > N) and your network is 100% healthy, your physical hardware can be the culprit.

In a leader-based system, we rely on the leader. In a leaderless system like Cassandra, there is no "source of truth" for time. Each node (or client) attaches its own timestamp to a write. This leads to a phenomenon known as Clock Skew - where two nodes' internal clocks are slightly out of sync.

The "Reverse" Lost Update

Imagine Alice and Bob both send a donation at nearly the same time:

Alice sends her write first (at 10:00:01.005). However, the node she hits has a clock that is running fast. It tags her write with a timestamp of 10:00:01.010.
Bob sends his write second (at 10:00:01.008). The node he hits has a perfectly accurate clock. It tags his write with 10:00:01.008.

Even though Bob’s update is physically the latest version of the truth, the database compares the numbers and sees that Alice’s timestamp (10:00:01.010) is "newer" than Bob’s (10:00:01.008). The system keeps Alice's older data and discards Bob’s more recent update.

This is why Cassandra introduced Lightweight Transactions (LWT). Under the hood, LWT initiates a consensus algorithm (like Paxos) to ensure that all participating nodes agree on the state before the update is committed. In a way, an LWT turns your leaderless system into a "temporary leader" system.

While LWTs help to stop the "Lost Update," they come with in an expense of latency and throughput. There are other advanced mechanisms designed to handle concurrent merging - such as CRDTs (Conflict-free Replicated Data Types) - but that is a complex topic that deserves a dedicated post of its own.

Why Last Write Wins can be a villain

The Concurrency Conflict

The "Silent" Data Loss

The Quorum Illusion

Clock Skew

The "Reverse" Lost Update

Comments

More from this blog

Did Version Vectors Lose?

20 Years of Eventual Consistency: Why We’re Still Falling for the Same Pitfalls

Command Palette

The Concurrency Conflict

The "Silent" Data Loss

The Quorum Illusion

Clock Skew

The "Reverse" Lost Update

Comments

More from this blog