Multiversion concurrency control (MVCC)
202504291915
Status: #idea
Tags:
Multiversion concurrency control (MVCC)
Introduction
- Logical object → Identified by primary key
- DB maintains multiple physical versions of a logical object
- Txn will see a consistent snaphshot of the DB form the start time of txn
- Each txn has an end timestamp and begin timestamp
Ideas
- Readers do not block writer AND writers do not block readers
- Read-only txns can read a consistent snapshot w/o acquiring locks
- Use timestamp (logical) to determine visibility
Info
MVCC without GC supports time-travel queries
Example

Notice that
Attention
Warning
Snapshot isolation is susceptible to write-skew anomaly
Version storage
- Observation: Versions form a singly linked list
1) Append-only storage
- Worst
- New versions appended to same table
- Entire row is copied
Version chain ordering
- Oldest to Newest (O2N)
- Lookups traverse (version) chain
- Newest to Oldest (N2O)
- Need to update the index pointers every version
- Better (most txns only care about latest value)
2) Time-travel storage
- Entire row is copied
- All older versions are stored in a separate (time-travel) table
3) Delta storage
- Best.
- Only sensible approach (imo)
- Only columns which are changed are stored.
- Delta storage table acts as an “undo-log”
Garbage collection
- Clean up versions not used by active txn
- 2 design questions
- How to look for expired txns?
- How to decide if safe to reclaim
1) Tuple-level
a) Background vacuuming
- Dedicated worker does GC
- Dirty tuple bitmap
- Only scans dirty tuples for GC
- More popular
b) Cooperative cleaning
- Workers clean when performing table scans
- Only works on O2N chains
- Data is never cleaned if it is not accessed
2) Txn-level
- Txn tracks which versions it used
- Rarely implemented
Index management
- Primary indexes point to version chain head (trivial)
- Secondary indices need to be managed
1) Logical pointer
- Uses primary key as pointer
- Requires a translation table
- Not used IRL
2) Physical address
- Literal pointer
- Optimization: Postgres does not update secondary index if new version is in the same page
- Occurs
of the time
- Occurs
Deletes
- Can physically delete a tuple only when all versions of a logically deleted tuple are not visible
1) Deleted flag
- Maintain a flag
- Either in tuple header or a separate column
2) Tombstone tuple
- Create an empty physical version to indicate