Re: Spilling hashed SetOps and aggregates to disk

From: David Rowley
Subject: Re: Spilling hashed SetOps and aggregates to disk
Date:
Msg-id: CAKJS1f9VHga59dyU3tARyhYt-XYA899TzrzfqADGAoiKviSBUA@mail.gmail.com
In reply to: Re: Spilling hashed SetOps and aggregates to disk  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: Spilling hashed SetOps and aggregates to disk  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
           Re: Spilling hashed SetOps and aggregates to disk  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On 7 June 2018 at 08:11, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> On 06/06/2018 04:11 PM, Andres Freund wrote:
>> Consider e.g. a scheme where we'd switch from hashed aggregation to
>> sorted aggregation due to memory limits, but already have a number of
>> transition values in the hash table. Whenever the size of the transition
>> values in the hashtable exceeds memory size, we write one of them to the
>> tuplesort (with serialized transition value). From then on further input
>> rows for that group would only be written to the tuplesort, as the group
>> isn't present in the hashtable anymore.
>>
>
> Ah, so you're suggesting that during the second pass we'd deserialize
> the transition value and then add the tuples to it, instead of building
> a new transition value. Got it.
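
Concretely, the second pass under discussion might look something like
the toy below: per group, the sorted spill stream carries at most one
serialized transition value followed by the raw rows that arrived after
the group was evicted. (All names here are made up for illustration,
the aggregate is a plain sum, and none of this is PostgreSQL code.)

    #include <stdio.h>

    typedef enum { SERIALIZED_STATE, RAW_ROW } RecKind;

    typedef struct
    {
        int     key;        /* grouping key */
        RecKind kind;       /* saved state, or an unaggregated input row */
        long    payload;    /* partial sum, or a raw input value */
    } SpillRec;

    int main(void)
    {
        /* Pretend this was just read back from the tuplesort, sorted
         * by key: group 1 was evicted with a partial sum of 30, then
         * two more rows for it went straight to the tuplesort. */
        SpillRec recs[] = {
            {1, SERIALIZED_STATE, 30},
            {1, RAW_ROW, 7},
            {1, RAW_ROW, 5},
            {2, RAW_ROW, 100},
            {2, RAW_ROW, 1},
        };
        int nrecs = sizeof(recs) / sizeof(recs[0]);
        int i = 0;

        while (i < nrecs)
        {
            int  key = recs[i].key;
            long state = 0;

            /* deserialize the saved transition value once, if any */
            if (recs[i].kind == SERIALIZED_STATE)
                state = recs[i++].payload;

            /* advance the transition function over the raw rows */
            while (i < nrecs && recs[i].key == key)
                state += recs[i++].payload;

            printf("key=%d  sum=%ld\n", key, state);
        }
        return 0;
    }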

Having to deserialize every time we add a new tuple sounds terrible
from a performance point of view.

Can't we just:

1. HashAgg until the hash table reaches work_mem.
2. Spill the entire table to disk.
3. Destroy the table and create a new one.
4. If more tuples: goto 1
5. Merge sort and combine each dumped set of tuples (see the sketch below).
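
In toy form, with made-up names, a plain SUM(value) GROUP BY key, and a
pretend work_mem of four groups (this is not PostgreSQL code), that
scheme would be:

    #include <stdio.h>
    #include <stdlib.h>

    #define WORK_MEM_GROUPS 4           /* pretend work_mem holds 4 groups */

    typedef struct { int key; long sum; } Group;

    static int cmp_group(const void *a, const void *b)
    {
        return ((const Group *) a)->key - ((const Group *) b)->key;
    }

    int main(void)
    {
        int  keys[] = {1, 2, 3, 4, 5, 1, 2, 6, 3, 7, 1, 5};
        int  vals[] = {10, 20, 30, 40, 50, 1, 2, 60, 3, 70, 100, 5};
        int  ntuples = sizeof(keys) / sizeof(keys[0]);
        FILE *spill = tmpfile();        /* the on-disk dump area */
        Group table[WORK_MEM_GROUPS];   /* stand-in for the hash table */
        int  nused = 0;

        for (int i = 0; i < ntuples; i++)
        {
            int g;

            for (g = 0; g < nused; g++) /* step 1: probe the "hash table" */
                if (table[g].key == keys[i])
                    break;
            if (g == nused)
            {
                if (nused == WORK_MEM_GROUPS)
                {
                    /* steps 2 and 3: dump everything, start a fresh table */
                    fwrite(table, sizeof(Group), nused, spill);
                    nused = 0;
                }
                g = nused++;
                table[g].key = keys[i];
                table[g].sum = 0;
            }
            table[g].sum += vals[i];    /* advance the transition value */
        }                               /* step 4: loop until input ends */

        fwrite(table, sizeof(Group), nused, spill); /* dump the last batch */

        /* step 5: read every batch back, sort by key, combine equal keys
         * (qsort over the whole spill file stands in for the merge sort) */
        fseek(spill, 0, SEEK_END);
        long nbytes = ftell(spill);
        rewind(spill);
        int ngroups = (int) (nbytes / (long) sizeof(Group));
        Group *all = malloc(nbytes);
        if (fread(all, sizeof(Group), ngroups, spill) != (size_t) ngroups)
            return 1;
        qsort(all, ngroups, sizeof(Group), cmp_group);

        for (int i = 0; i < ngroups; i++)
        {
            long sum = all[i].sum;
            while (i + 1 < ngroups && all[i + 1].key == all[i].key)
                sum += all[++i].sum;    /* combine the partial states */
            printf("key=%d  sum=%ld\n", all[i].key, sum);
        }
        free(all);
        fclose(spill);
        return 0;
    }

Each dumped batch holds at most WORK_MEM_GROUPS partial states, so
memory never exceeds the budget, and no group's state is ever
serialized and deserialized more than once.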

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

