Processing Model
202406212051
Status: #idea
Tags: CMU Advanced Database Systems
Processing Model
- Defines how the system executes a query plan and moves data from one operator to the next
- Consists of 2 types of execution paths:
- Control flow -> How DBMS invokes an operator
- Data flow -> How operator sends its result
- Output of an operator can either be whole tuples (NSM) or subsets of columns (DSM)
Iterator Model
- AKA volcano or pipeline model
- Each query plan operator implements a `Next()` function
- On each invocation, operator returns a single tuple or EOF marker
- Operator implements a loop that calls `Next()` on its children and processes the tuples they return
- Each operator also implements `Open()` and `Close()`, which are analogous to constructors and destructors
- Used in almost every DBMS
- Easy to implement/debug
- Output control works easily with this approach (e.g. LIMIT)
- Allows pipelining
- A pipeline breaker is an operator that cannot finish until all its children emit all their tuples
- Tuple will often remain in cache
- Downside
- Lot of function calls
- Mixes control flow and data flow
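A minimal sketch of the iterator model, assuming a toy in-memory plan. The class names (`Scan`, `Filter`, `Limit`), the `EOF` sentinel, and the `run` helper are hypothetical, not taken from any real DBMS. It shows the parent pulling one tuple at a time, and `Limit` stopping the scan early without any special support from its children.

```python
EOF = object()  # hypothetical end-of-stream marker

class Scan:
    """Leaf operator: emits one stored tuple per next() call."""
    def __init__(self, rows):
        self.rows = rows
    def open(self):
        self.pos = 0
    def next(self):
        if self.pos >= len(self.rows):
            return EOF
        row = self.rows[self.pos]
        self.pos += 1
        return row
    def close(self):
        pass

class Filter:
    """Loops over child.next() until a tuple passes the predicate."""
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate
    def open(self):
        self.child.open()
    def next(self):
        while True:
            row = self.child.next()
            if row is EOF or self.predicate(row):
                return row
    def close(self):
        self.child.close()

class Limit:
    """Output control: stops pulling from the child after n tuples."""
    def __init__(self, child, n):
        self.child = child
        self.n = n
    def open(self):
        self.child.open()
        self.emitted = 0
    def next(self):
        if self.emitted >= self.n:
            return EOF
        row = self.child.next()
        if row is not EOF:
            self.emitted += 1
        return row
    def close(self):
        self.child.close()

def run(root):
    """Drive the plan from the root: control flows down, tuples flow up."""
    root.open()
    out = []
    while True:
        row = root.next()
        if row is EOF:
            break
        out.append(row)
    root.close()
    return out

scan = Scan([(1, "a"), (2, "b"), (3, "c"), (4, "d")])
plan = Limit(Filter(scan, lambda r: r[0] % 2 == 0), 1)
result = run(plan)
print(result, scan.pos)
```

Every tuple crosses three `next()` calls on its way to the root, which is the function-call overhead listed under Downside.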
Materialization Model
- Each operator processes its input all at once, and then emits its output all at once
- DBMS can push down hints (e.g. `LIMIT`) to avoid scanning too many tuples
- Great for OLTP, because the workload queries a small number of tuples at a time
- Bad for OLAP with large intermediate results
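A minimal sketch of the materialization model, assuming each operator is a plain function that returns its full output as a list. The function names and the `limit` parameter are hypothetical. It illustrates a `LIMIT` hint pushed down into the filter so it stops early instead of materializing every match.

```python
def scan(rows):
    """Materialize the whole table as one result list."""
    return list(rows)

def filter_rows(rows, predicate, limit=None):
    """Consume the entire input, emit the entire output in one go.

    `limit` is a pushed-down hint (hypothetical parameter): once enough
    tuples qualify, the operator stops instead of processing the rest.
    """
    out = []
    for row in rows:
        if predicate(row):
            out.append(row)
            if limit is not None and len(out) >= limit:
                break
    return out

def project(rows, columns):
    """Keep only the requested column positions of every tuple."""
    return [tuple(row[c] for c in columns) for row in rows]

table = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]

# Each call finishes completely before its parent starts.
matched = filter_rows(scan(table), lambda r: r[0] > 1, limit=2)
result = project(matched, [1])
print(result)
```

Each operator is called exactly once, which suits small OLTP lookups. On an OLAP query, `matched` could hold millions of tuples before `project` sees any of them.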
Vectorization Model
- Like the iterator model, but `Next()` emits a batch of tuples
- Best of both worlds
- Operator's loop processes a vector of tuples
- Size of batch can be set based on hardware or query
- Each batch will contain a null bitmap as well.
- Ideal for OLAP. Works well with out-of-order CPUs
- No data or control dependencies
- Operators perform work in tight for-loops
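A minimal sketch of the vectorization model, assuming a single integer column and plain Python lists standing in for column vectors (a real engine would use fixed-width arrays and SIMD). The class names, the `batch_size` parameter, and the `(values, nulls)` batch layout are hypothetical. Null slots hold a placeholder `0`, so the operator's loop runs without any per-tuple branch and the null bitmap is simply carried along.

```python
class VectorScan:
    """Leaf operator: emits batches of (values, null_bitmap)."""
    def __init__(self, column, batch_size):
        self.column = column
        self.batch_size = batch_size
    def open(self):
        self.pos = 0
    def next(self):
        if self.pos >= len(self.column):
            return None  # end of stream
        chunk = self.column[self.pos:self.pos + self.batch_size]
        self.pos += len(chunk)
        nulls = [v is None for v in chunk]
        values = [0 if v is None else v for v in chunk]  # placeholder for NULLs
        return values, nulls
    def close(self):
        pass

class VectorAddConst:
    """Adds a constant to every value of a batch in one tight loop."""
    def __init__(self, child, k):
        self.child = child
        self.k = k
    def open(self):
        self.child.open()
    def next(self):
        batch = self.child.next()
        if batch is None:
            return None
        values, nulls = batch
        # No branch per tuple: NULL slots are computed too, then masked by the bitmap.
        return [v + self.k for v in values], nulls
    def close(self):
        self.child.close()

def run(root):
    """Pull batches from the root and apply the null bitmap at the end."""
    root.open()
    results, batches = [], 0
    while True:
        batch = root.next()
        if batch is None:
            break
        values, nulls = batch
        batches += 1
        results.extend(None if is_null else v for v, is_null in zip(values, nulls))
    root.close()
    return results, batches

plan = VectorAddConst(VectorScan([1, None, 3, 4, 5], batch_size=2), k=10)
results, batches = run(plan)
print(results, batches)
```

Five tuples cost three `next()` calls per operator here instead of five. Raising `batch_size` cuts the call overhead further, at the price of larger batches to keep in cache.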