CAP Theorem

202404301213
Status: #idea
Tags: DC

CAP Theorem

Any networked system providing shared data can provided only 2 of the 3 properties:
- Consistency -> A shared and replicated data item appears as a single, up-to-date copy
- Availability -> Updates will always be eventually executed
- Partition tolerance -> Tolerant to the partitioning of a process group
In a network subject to communication failures, it is impossible to realize an atomic read/write that guarantees a response to every request
Consider 2 processes P and Q
- A-P -> They can no longer communicate, allow P and Q to go ahead independently
- C-P -> Allow only one of P, Q to go ahead. The other one is not participating
- C-A -> P and Q have to be assumed to continue communication
  - Partition is not tolerated. If partition occurs, shutdown the system

Choosing Consistency over Availability

i.e. we are guaranteeing atomic read/writes
May shut down, refuse writes (like 2-phase commit), OR only response for pieces of data whose 'master' node is inside the partition component

Choosing Availibility over Consistency

Will responds to all requests
Address inconsistency via causal ordering vector clocks and application-specific resolution procedures
Eg
- Dynamo systems usually offer both of these
- Cassandra -> Last-writer-wins. Only uses application-specific resolution

Examples

C+A

Single site databases, cluster databases, LDAP, xFS file system
RDBMS, GreenPlum

C+P

Distributed databases, distributed locking, majority protocols, Google BigTable, MongoDB

A+P

Coda (Constant Data Availability Distributed File System) by CMU
Web caching, DNS
Amazon Dynamo, Apache Cassandra, Apache Zookeeper, Apache CouchDB

Asynchronous Network - Impossibility Result

Theorem 1

t is impossible in the asynchronous network model to implement a read/write data object that guarantees:
- Availability AND Atomic consistency
- in all fair executions (including those in which messages are lost)

Proof (by contradiciotn)

Assume an Algorithm $A$ that meets atomicity, availability and partition tolerance
Assume the Process Group $G$ has at least 2 nodes and it can be divided into 2 empty disjoint non-empty sets (partitions) $G_{1}$ and $G_{2}$ . Assume all messages b/w them are LOST
Let $v_{0}$ be the initial value of an object $o$ (present on both nodes) that obeys atomic
Let $α_{1}$ be the prefix of an execution of $A$ in which $w r i t e (o, v_{1})$ is performed in $G_{1}$ and this write operation is the last activity in $α_{1}$
Let $α_{2}$ be the prefix of execution of $A$ in which a $r e a d (o)$ occurs in $G_{2}$
Due to partition, during $α_{2}$ , no message is exchanged from $G_{1}$ to $G_{2}$
Due to availability, both write and read operations must be successful
Now, consider $α = α_{1} α_{2}$
- For nodes in $G_{2}$ , it is impossible to distinguish $α_{2}$ from $α$ (as all messages from $G_{1}$ to $G_{2}$ are lost)
- Thus in $α$ execution, the read request from $α_{2}$ will be honored and will return $v_{0}$
  - However, as per atomicity, read does not begin until write from $α_{1}$ has completed
- Hence, contradicts consistency property, and proves that such an algorithm $A$ does NOT exist.

Corollary

It is impossible in the asynchronous network model (each node has its own clock) to implement a read/write data object that guarantees:
- Availability AND Atomic consistency
- in all fair executions, even when NO messages are LOST

Proof

Assume that $A$ exists
$A$ has no way to determine whether a message is lost or is delayed arbitrarily
Therefore, if there existed an algorithm $A$ in executions in which no messages were lost, then there would exist an algorithm that guarantees atomic consistency in all executions. But this violates Theorem 1

What does it signify?

Look at the application and decide what is more important
When partitioning is unavoidable
- Either go for C or A
- Plan for recovery and resolve inconsistency
- This is application-dependent
Moves designers form theoretical solutions to an engineering solution

Recovery

When a failure occurs, we need to bring the system into an error-free state:
- Forward recovery -> Find a new state from which the system can continueoperation
- Backward recovery ->Bring the system back into a previous error-free state
Practice
- Use backward error recovery, requiring that we establish recovery points
Observation
- Recovery in distributed systems is complicated
- Processes need to cooperate in identifying a consistent state from where to recovery
- Processes periodically save their state to a stable storage. The saved state is a checkpoint
  - Checkpointing software -> Condor, Libckpt, DMTCP

How to save and restart?

Interrupts (say SIGUSR2)
Save state
- Happens in signal handler routine
- Save memory to disk
During restart
- You would load
  - Register content
  - Memory state
- Typically, you would use
- #include <ucontext.h>

Checkpointing in Distributed System

State of each thread/process and communication network needs to be saved
Global snapshot is the set of checkpoints for every participating process
- It needs to be consistent
  - There is no message which is received in a checkpoint by a process $q$ but not yet sent by another process $p$
Recovery line -> Most recent set of snapshots that are consistent
2 types of checkpointing:
- Coordinated checkpoint -> All concurrent processes are frozen and checkpointed at the same time (consistent)
- Independent checkpoint -> Each process is checkpointed independently (small overhead)
TODO image from slide 53 30th Apr

Example of Consistent Snapshot

TODO image from slide 54 30th apr
Global consistency
- $a$ -> Weakly consistent (i.e. can be recovered)
- $b$ -> Strongly consistent (No recovery required)
- $c$ -> Inconsistent. $y$ 's send is not sna by $Q$ , but the recieve is snapshotted in $P$

CAP Theorem

Choosing Consistency over Availability

Choosing Availibility over Consistency

Examples

C+A

C+P

A+P

Asynchronous Network - Impossibility Result

Theorem 1

Proof (by contradiciotn)

Corollary

Proof

What does it signify?

Recovery

How to save and restart?

Checkpointing in Distributed System

Example of Consistent Snapshot

References