Re: Memory-Bounded Hash Aggregation

From Adam Lee
Subject Re: Memory-Bounded Hash Aggregation
Date
Msg-id 20190802064405.GA94036@mars.local
In response to Re: Memory-Bounded Hash Aggregation  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Memory-Bounded Hash Aggregation  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
> High-level approaches:
> 
> 1. When the in-memory hash table fills, keep existing entries in the
> hash table, and spill the raw tuples for all new groups in a
> partitioned fashion. When all input tuples are read, finalize groups
> in memory and emit. Now that the in-memory hash table is cleared (and
> memory context reset), process a spill file the same as the original
> input, but this time with a fraction of the group cardinality.
> 
> 2. When the in-memory hash table fills, partition the hash space, and
> evict the groups from all partitions except one by writing out their
> partial aggregate states to disk. Any input tuples belonging to an
> evicted partition get spilled to disk. When the input is read
> entirely, finalize the groups remaining in memory and emit. Now that
> the in-memory hash table is cleared, process the next partition by
> loading its partial states into the hash table, and then processing
> its spilled tuples.
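
For reference, approach 1 above could be sketched roughly as follows. This is only a toy illustration under stated assumptions: the aggregate is a plain sum, Python lists stand in for on-disk spill files, and the names hash_agg_spill_tuples, max_groups and num_partitions are made up for the example, not taken from any patch.

```python
from collections import defaultdict


def hash_agg_spill_tuples(rows, max_groups, num_partitions=4, depth=0):
    """Sum values per key while holding at most max_groups groups in memory.

    Hypothetical sketch of approach 1: groups already in the hash table
    stay there, raw tuples of new groups are spilled by partition.
    """
    table = {}
    partitions = defaultdict(list)  # stand-in for partitioned spill files

    for key, value in rows:
        if key in table:
            table[key] += value            # existing group: advance in memory
        elif len(table) < max_groups:
            table[key] = value             # room left: start a new group
        else:
            # Table is full: spill the raw tuple. Mixing in the recursion
            # depth changes the partitioning at each level.
            p = hash((key, depth)) % num_partitions
            partitions[p].append((key, value))

    # All input read: finalize and emit the in-memory groups.
    yield from table.items()
    table.clear()  # corresponds to resetting the memory context

    # Process each spill file exactly like the original input.
    for spilled in partitions.values():
        yield from hash_agg_spill_tuples(
            spilled, max_groups, num_partitions, depth + 1
        )
```

Each pass finalizes at least one group that never appears in the spilled data, so the recursion terminates as long as max_groups is at least 1.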

I'm late to the party.

Both of these approaches spill the input tuples. What if the skewed
groups are not encountered before the hash table fills up? The size of
the spill files and the resulting disk I/O could be downsides.

Greenplum spills all the groups by writing out their partial aggregate
states, resets the memory context, continues processing incoming tuples
while building a new in-memory hash table, and finally reloads and
combines the spilled partial states. How does this sound?
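
To make the idea concrete, here is a rough sketch of the spill-the-states scheme. It is not Greenplum's actual code: the aggregate is a plain sum (so the combine step is just addition), a Python list stands in for the spill files, and hash_agg_spill_states and max_groups are hypothetical names. The final combine is simplified to a single dict, which is not memory-bounded; a real implementation would presumably have to combine the spilled states in partitions.

```python
def hash_agg_spill_states(rows, max_groups):
    """Sum values per key, spilling partial states when the table is full.

    Hypothetical sketch: when a new group does not fit, the partial states
    of ALL current groups are written out and the table is reset.
    """
    table = {}
    spilled_runs = []  # stand-in for spill files of partial aggregate states

    for key, value in rows:
        if key not in table and len(table) >= max_groups:
            spilled_runs.append(table)  # write out every partial state
            table = {}                  # reset the memory context
        table[key] = table.get(key, 0) + value

    # At the end, reload the spilled partial states and combine them with
    # what is still in memory (simplified: everything in one dict).
    final = dict(table)
    for run in spilled_runs:
        for key, partial in run.items():
            final[key] = final.get(key, 0) + partial
    return final
```

Note that no input tuple is ever written to disk here; what gets spilled is at most one partial state per group per run, which is the point of the comparison with the two approaches quoted above.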

-- 
Adam Lee


