Using MVCC on a Binary Tree
202504291955
Status: #idea
Tags: DSA
Using MVCC on a Binary Tree
Versioning nodes
- Instead of versions on rows of a DB, we have versions of nodes
- Each node stores version information, in addition to its children and data
- Treat nodes as immutable. Any changes require a new version to be create
- Each node maintains a chain of alternate versions
- Imo, it would be best if all nodes of each version of the tree are connected
- This way, we can jump both across versions of a node, and also traverse through the tree at any node
- Visualize this as a bunch of BT’s on top of another. Each layer represents a version
- Imo, it would be best if all nodes of each version of the tree are connected
Important
All references to timestamps are logical timestamps and not wall-clock timestamps
Propagation of changes to root
- Notice that in a tree, any changes to any part of a given nodes subtree, will result in a version change propagating till the root
- For each change, all ancestor nodes need to move up a version
- Why? Because their child pointers will change
- For each change, all ancestor nodes need to move up a version
Consistency
- After building the out an new version tree (due to changes propagating to the root node, we can atomically replace the tree version by swapping out the root nodes.
- This can be considered equivalent to a commit operation
Reader/writer design
- Similar to the start and end timestamps for each transaction that we saw in a DB, we will need to maintain the start and end timestamp for each reader and writer
- Readers
- Note that we are using snapshots for readers to ensure that we do not get inconsistency due to writers
- Multiple readers can all read from the same copy of the tree (i.e. the same version of the root), provided that they all start after the same writer (i.e. all readers which occur occur between 2 consecutive writer instances)
- Writers
- Writers are going to modify the tree, so we need to create a new version
- Any change, propagates up, as described in Using MVCC on a Binary Tree#Propagation of changes to root
Garbage collection of obsolete versions
- Unused versions need to be reclaimed
- Readers/writers register their active version timestamps. All versions older than the oldest active timestamps can be GC-ed
Performance optimizations
Read-optimized traversal
- Maintain bitmap, which as acts as a summary indicating which region has been modified, allowing readers to skip unmodified sub-trees
- Cache-aware tree
- Separate the metadata and data, in order to reduce cache misses
Write-optimized
- Delta encoding
- Just like in the DB, we can use delta encoding for the data.
- Useful if the data is large. However, this reduces cache locality, as trees are may not be contiguous data structures
Observed behavior
- Readers do not block writers
- Operate on a specific older snapshot
- Do not acquire locks
- Writers can concurrently create new versions
- Writers do not block readers
- They create new node versions and path
- Atomically update the root node
- Readers use older snapshots, and are thus unaffected
- Readers have read consistency
- During a read, the reader uses the same snapshot
- Writes are not serialized
- If both writers start, and then one commits, and then the next, the later one might overwrite the earlier one