Parallel Grouped Aggregation in DuckDB
202407261508
Status: #idea
Tags: e6data
Parallel Grouped Aggregation in DuckDB
- Uses linear probing due to better cache locality
- For efficient resize: 2 part aggregate HT
- Separately allocated pointing array
- Contains block, offset pairs
- On resize
- Allocate a new, larger pointing array
- Read all payload blocks, hash them and fill in pointers in the new pointing array
- Optimisations
- Store hash along with the row, to prevent rehashing
- Separately allocated pointing array
- Probing optimisation
- Following the pointer creates random memory access
- Pointer array now, stores some bytes of the hash, in addition to the block-offset pair
- So, if hash doesn’t match, no need to follow the pointer
- Has nice side-effect of greatly reducing full group comparisons
- Minor memory utilisation optimisation
- For smaller HT’s, DuckDB supports both 32-bit and 64-bit pointer arrays
- Code is found in
aggregate_hashtable.cpp - Concurrency-friendly HT’s are not a solved problem yet (as of March 2022)
- For parallelism, they use the method from this TUM paper
- Partitions are created using the radix partitioning scheme on the hash
- We do NOT need to merge these partitions into a single giant HT
- DuckDB-specific optimisations
- Only start partitioning if aggregate HT exceeds a fixed number of entries (tuneable threshold. Requires benchmarking)
- Since partitioning is local to a thread, it is possible that only some threads partition.
- If this happens, we will need to partition the thread-local HT’s before the merge
- This is entirely thread local
- We stop adding values to a HT once its pointer array exceeds a certain threshold
- Every thread then builds multiple sets of (potentially) partitioned tables
- This leads to multiple entries for the same group, but that is okay, as during combination, all this is dealt with
- Only start partitioning if aggregate HT exceeds a fixed number of entries (tuneable threshold. Requires benchmarking)
- NOTE: Some aggregations can not be parallelised (e.g.
median), but DuckDB providesapprox_quanitleas an alternative