External Aggregation in DuckDB

202407261500
Status: #idea
Tags: e6data

External Aggregation in DuckDB

HT for an aggregation is traditionally not stored in the buffer pool
DuckDB stores both temporary and permanent data in buffer pool
- This means that DuckDB can use all memory for persistent data (if needed)
- If large aggregations are being done, the persistent data can be evicted from emory to make space for a large hash table
Issues with variable page sizes
- Memory fragmentation
  - Short lived allocations are only variable size
  - Use jemalloc for allocations when possible, which reduces fragmentation
- Invalid references
  - Temporary data needs serdes when moving from/to storage
  - Row major is more efficient for hash tables
  - Created a custom page layout
    - Row-major
    - Uses old invalidated pointers to recompute new valid pointers after being read back into memory
    - Places fixed size rows and variable size data on separate pages
    - Pointers are recomputed (lazily) during deserialisation if address of the page has changed
External Aggregation
- All threads have a thread local HT.
- Radix partition based aggregation, where each thread deals with 1 partition, across all thread-local HTs
- They use linear probing to resolve collisions, and salting to reduce the overhead of dealing with collisions. Explained here
- Now, each thread has one hash table, but the data in each HT is partitioned
  - Data is stored on the page layout described above
- HT’s are not resized during thread-local pre-aggregation
  - HT sizes are kept small, to reduce cache misses
  - If HT is full, it is reset and they start over. TODO
  - Data pages are unpinned, so the buffer manager can write them to disk if memory is needed.
- Partition-Wise Aggregation
  - Each thread combines data of a single partition into a HT
  - More partitions than threads are created
    - Size of partitions will be smaller
      - All partitions will not be combined simultaneously, so less memory footprint

External Aggregation in DuckDB

References