Re: Memory-Bounded Hash Aggregation

From	Adam Lee
Subject	Re: Memory-Bounded Hash Aggregation
Date	August 2, 2019, 06:44:05
Msg-id	20190802064405.GA94036@mars.local
In reply to	Re: Memory-Bounded Hash Aggregation (Jeff Davis <pgsql@j-davis.com>)
Responses	Re: Memory-Bounded Hash Aggregation
List	pgsql-hackers

> High-level approaches:
> 
> 1. When the in-memory hash table fills, keep existing entries in the
> hash table, and spill the raw tuples for all new groups in a
> partitioned fashion. When all input tuples are read, finalize groups
> in memory and emit. Now that the in-memory hash table is cleared (and
> memory context reset), process a spill file the same as the original
> input, but this time with a fraction of the group cardinality.
> 
> 2. When the in-memory hash table fills, partition the hash space, and
> evict the groups from all partitions except one by writing out their
> partial aggregate states to disk. Any input tuples belonging to an
> evicted partition get spilled to disk. When the input is read
> entirely, finalize the groups remaining in memory and emit. Now that
> the in-memory hash table is cleared, process the next partition by
> loading its partial states into the hash table, and then processing
> its spilled tuples.

I'm late to the party.

Both of these approaches spill the input tuples. What if the skewed
groups are not encountered before the hash table fills up? The size of
the spill files and the resulting disk I/O could be downsides.
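
To make the concern concrete, here is a rough Python sketch of approach
1 that counts how many raw tuples get spilled. It is only an
illustration under simplifying assumptions: the function name, the
max_groups limit (standing in for the memory budget), the in-memory
lists (standing in for spill files), and the sum aggregate are all
hypothetical and not taken from any patch.

```python
def agg_spilling_new_groups(rows, max_groups, n_partitions=4):
    """Sum values per key; once the table is full, spill raw tuples of new groups."""
    result = {}
    spilled_tuples = 0
    pending = [list(rows)]          # batches still to process; first is the input
    while pending:
        batch = pending.pop()
        table = {}                  # the in-memory hash table for this pass
        partitions = [[] for _ in range(n_partitions)]  # stand-ins for spill files
        for key, value in batch:
            if key in table:
                table[key] += value             # existing group: advance its state
            elif len(table) < max_groups:
                table[key] = value              # room left: start a new group
            else:
                # table is full: spill the raw tuple, partitioned by hash
                partitions[hash(key) % n_partitions].append((key, value))
                spilled_tuples += 1
        result.update(table)        # finalize and emit the groups held in memory
        pending.extend(p for p in partitions if p)
    return result, spilled_tuples
```

With max_groups=2, an input where a hot key shows up only after two
other keys have filled the table spills every tuple of the hot key,
while the same input with the hot key first spills a single tuple.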

Greenplum spills all the groups by writing out their partial aggregate
states, resets the memory context, continues processing incoming tuples
and rebuilding the in-memory hash table, and finally reloads and
combines the spilled partial states. How does this sound?
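
A minimal Python sketch of that idea, not Greenplum's actual code: the
function name, the max_groups limit (standing in for the memory
budget), and the in-memory list (standing in for the spill files) are
assumptions made for illustration, and the aggregate is a plain sum so
that combining partial states is just addition. The final combine step
here is unbounded; a real implementation would need to partition the
spilled states to keep that step within memory too.

```python
from collections import defaultdict


def agg_spilling_partial_states(rows, max_groups):
    """Sum values per key; when the table is full, spill every partial state."""
    table = {}      # the in-memory hash table: key -> partial sum
    spilled = []    # stand-in for spill files of (key, partial_state) records
    for key, value in rows:
        if key not in table and len(table) >= max_groups:
            # No room for a new group: write out all partial states and
            # start over with an empty table (the "memory context reset").
            spilled.extend(table.items())
            table = {}
        table[key] = table.get(key, 0) + value
    # End of input: the states still in memory join the spilled ones.
    spilled.extend(table.items())
    # Reload the spilled partial states and combine them per key.
    result = defaultdict(int)
    for key, partial in spilled:
        result[key] += partial
    return dict(result), len(spilled)
```

Only aggregate states are written, never raw input tuples, so a hot
group that arrives late costs one state record per spill rather than
one record per tuple.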

-- 
Adam Lee
