CAP Theorem
202404301213
Status: #idea
Tags: DC
CAP Theorem
- Any networked system providing shared data can provided only 2 of the 3 properties:
- Consistency -> A shared and replicated data item appears as a single, up-to-date copy
- Availability -> Updates will always be eventually executed
- Partition tolerance -> Tolerant to the partitioning of a process group
- In a network subject to communication failures, it is impossible to realize an atomic read/write that guarantees a response to every request
- Consider 2 processes P and Q
- A-P -> They can no longer communicate, allow P and Q to go ahead independently
- C-P -> Allow only one of P, Q to go ahead. The other one is not participating
- C-A -> P and Q have to be assumed to continue communication
- Partition is not tolerated. If partition occurs, shutdown the system
Choosing Consistency over Availability
- i.e. we are guaranteeing atomic read/writes
- May shut down, refuse writes (like 2-phase commit), OR only response for pieces of data whose 'master' node is inside the partition component
Choosing Availibility over Consistency
- Will responds to all requests
- Address inconsistency via causal ordering vector clocks and application-specific resolution procedures
- Eg
- Dynamo systems usually offer both of these
- Cassandra -> Last-writer-wins. Only uses application-specific resolution
Examples
C+A
- Single site databases, cluster databases, LDAP, xFS file system
- RDBMS, GreenPlum
C+P
- Distributed databases, distributed locking, majority protocols, Google BigTable, MongoDB
A+P
- Coda (Constant Data Availability Distributed File System) by CMU
- Web caching, DNS
- Amazon Dynamo, Apache Cassandra, Apache Zookeeper, Apache CouchDB
Asynchronous Network - Impossibility Result
Theorem 1
- t is impossible in the asynchronous network model to implement a read/write data object that guarantees:
- Availability AND Atomic consistency
- in all fair executions (including those in which messages are lost)
Proof (by contradiciotn)
- Assume an Algorithm
that meets atomicity, availability and partition tolerance - Assume the Process Group
has at least 2 nodes and it can be divided into 2 empty disjoint non-empty sets (partitions) and . Assume all messages b/w them are LOST - Let
be the initial value of an object (present on both nodes) that obeys atomic - Let
be the prefix of an execution of in which is performed in and this write operation is the last activity in - Let
be the prefix of execution of in which a occurs in - Due to partition, during
, no message is exchanged from to - Due to availability, both write and read operations must be successful
- Now, consider
- For nodes in
, it is impossible to distinguish from (as all messages from to are lost) - Thus in
execution, the read request from will be honored and will return - However, as per atomicity, read does not begin until write from
has completed
- However, as per atomicity, read does not begin until write from
- Hence, contradicts consistency property, and proves that such an algorithm
does NOT exist.
- For nodes in
Corollary
- It is impossible in the asynchronous network model (each node has its own clock) to implement a read/write data object that guarantees:
- Availability AND Atomic consistency
- in all fair executions, even when NO messages are LOST
Proof
- Assume that
exists has no way to determine whether a message is lost or is delayed arbitrarily - Therefore, if there existed an algorithm
in executions in which no messages were lost, then there would exist an algorithm that guarantees atomic consistency in all executions. But this violates Theorem 1
What does it signify?
- Look at the application and decide what is more important
- When partitioning is unavoidable
- Either go for C or A
- Plan for recovery and resolve inconsistency
- This is application-dependent
- Moves designers form theoretical solutions to an engineering solution
Recovery
- When a failure occurs, we need to bring the system into an error-free state:
- Forward recovery -> Find a new state from which the system can continueoperation
- Backward recovery ->Bring the system back into a previous error-free state
- Practice
- Use backward error recovery, requiring that we establish recovery points
- Observation
- Recovery in distributed systems is complicated
- Processes need to cooperate in identifying a consistent state from where to recovery
- Processes periodically save their state to a stable storage. The saved state is a checkpoint
- Checkpointing software -> Condor, Libckpt, DMTCP
How to save and restart?
- Interrupts (say
SIGUSR2) - Save state
- Happens in signal handler routine
- Save memory to disk
- During restart
- You would load
- Register content
- Memory state
- Typically, you would use
#include <ucontext.h>
- You would load
Checkpointing in Distributed System
- State of each thread/process and communication network needs to be saved
- Global snapshot is the set of checkpoints for every participating process
- It needs to be consistent
- There is no message which is received in a checkpoint by a process
but not yet sent by another process
- There is no message which is received in a checkpoint by a process
- It needs to be consistent
- Recovery line -> Most recent set of snapshots that are consistent
- 2 types of checkpointing:
- Coordinated checkpoint -> All concurrent processes are frozen and checkpointed at the same time (consistent)
- Independent checkpoint -> Each process is checkpointed independently (small overhead)
- TODO image from slide 53 30th Apr
Example of Consistent Snapshot
- TODO image from slide 54 30th apr
- Global consistency
-> Weakly consistent (i.e. can be recovered) -> Strongly consistent (No recovery required) -> Inconsistent. 's send is not sna by , but the recieve is snapshotted in