On 7 June 2018 at 08:11, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> On 06/06/2018 04:11 PM, Andres Freund wrote:
>> Consider e.g. a scheme where we'd switch from hashed aggregation to
>> sorted aggregation due to memory limits, but already have a number of
>> transition values in the hash table. Whenever the size of the transition
>> values in the hashtable exceeds memory size, we write one of them to the
>> tuplesort (with serialized transition value). From then on further input
>> rows for that group would only be written to the tuplesort, as the group
>> isn't present in the hashtable anymore.
>>
>
> Ah, so you're suggesting that during the second pass we'd deserialize
> the transition value and then add the tuples to it, instead of building
> a new transition value. Got it.
Having to deserialize a transition value every time we add a new tuple
sounds terrible from a performance point of view.
Can't we just:
1. HashAgg until the hash table reaches work_mem.
2. Spill the entire table to disk.
3. Destroy the table and create a new one.
4. If more tuples: goto 1
5. Merge-sort and combine each dumped set of tuples (a rough sketch of
this scheme follows below).
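
To make that concrete, here is a minimal, self-contained C sketch of
steps 1-5. Everything in it is a toy assumption chosen for
illustration: integer keys, sum() as the only transition state, a
linear-scan table that fits four groups standing in for work_mem, and
tmpfile() runs standing in for a real tuplesort. It is not executor
code, just the shape of the spill-and-merge loop:

#include <stdio.h>
#include <stdlib.h>

#define WORK_MEM_GROUPS 4               /* pretend work_mem fits only 4 groups */
#define MAX_RUNS 16

typedef struct { int key; long sum; } Group;

static Group table[WORK_MEM_GROUPS];    /* stand-in for the hash table */
static int   ngroups = 0;
static FILE *runs[MAX_RUNS];            /* one temp file per dumped table */
static int   nruns = 0;

static int
cmp_group(const void *a, const void *b)
{
    int ka = ((const Group *) a)->key;
    int kb = ((const Group *) b)->key;

    return (ka > kb) - (ka < kb);
}

/* Steps 2 + 3: sort the table by key, dump it as one run, start fresh. */
static void
spill_table(void)
{
    FILE *f = tmpfile();

    if (f == NULL) { perror("tmpfile"); exit(1); }
    qsort(table, ngroups, sizeof(Group), cmp_group);
    fwrite(table, sizeof(Group), ngroups, f);
    rewind(f);
    runs[nruns++] = f;
    ngroups = 0;
}

/* Step 1 (and 4): aggregate into the table; spill when it is full. */
static void
accumulate(int key, long val)
{
    for (int i = 0; i < ngroups; i++)
        if (table[i].key == key) { table[i].sum += val; return; }
    if (ngroups == WORK_MEM_GROUPS)
        spill_table();                  /* "goto 1" with an empty table */
    table[ngroups].key = key;
    table[ngroups].sum = val;
    ngroups++;
}

/* Step 5: k-way merge of the sorted runs, combining equal keys. */
static void
merge_runs(void)
{
    Group head[MAX_RUNS];
    int   live[MAX_RUNS];

    for (int i = 0; i < nruns; i++)
        live[i] = (fread(&head[i], sizeof(Group), 1, runs[i]) == 1);

    for (;;)
    {
        int best = -1;

        for (int i = 0; i < nruns; i++)
            if (live[i] && (best < 0 || head[i].key < head[best].key))
                best = i;
        if (best < 0)
            break;                      /* all runs exhausted */

        Group out = head[best];

        live[best] = (fread(&head[best], sizeof(Group), 1, runs[best]) == 1);

        /* combine partial states for the same group from all runs */
        for (int i = 0; i < nruns; i++)
            while (live[i] && head[i].key == out.key)
            {
                out.sum += head[i].sum;
                live[i] = (fread(&head[i], sizeof(Group), 1, runs[i]) == 1);
            }
        printf("key=%d sum=%ld\n", out.key, out.sum);
    }
}

int
main(void)
{
    int keys[] = {1, 2, 3, 4, 5, 1, 2, 6, 7, 1, 3, 8};

    for (int i = 0; i < 12; i++)
        accumulate(keys[i], 1);         /* count(*) as a sum of 1s */
    spill_table();                      /* flush the last table as a run */
    merge_runs();
    return 0;
}

Because each dumped table is sorted before it hits disk, the final
pass is a single k-way merge, and a group split across runs is
rebuilt by merging its partial states: trivial for sum(), and for
real aggregates it would be the combine function.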
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services