Mergeable Summaries for Heavy Hitters
202503031501
Status: #idea
Tags: UtahDB
Mergeable Summaries for Heavy Hitters - Misra-Gries and SpaceSaving Algorithms
Misra-Gries Algorithm
- Maintains
counters - Steps
- If item is tracked, counter++
- If not tracked
- If space → Add with count 1
- If no spacee →
counters--, and if counter=0, remove it
Info
Dropping things as a batch improves runtime, which is why we prefer to remove all 0’s aggressively, instead of dropping as we need space.
Merging MG Summaries
- Merge the matching elements
- If we have more than
elements, we prune (repeatedly) - Find the
th element, call it - Subtract
from all elements - Prune all elements
- Find the
Attention
We could alternatively sort it, and prune everything after the largest
Tldr
When you merge multiple MG summaries, you are not worse-off in terms of error