Re: Spilling hashed SetOps and aggregates to disk

From: Tomas Vondra
Subject: Re: Spilling hashed SetOps and aggregates to disk
Date:
Msg-id: 03e73902-e84b-d208-b899-05c791b4ada6@2ndquadrant.com
In reply to: Re: Spilling hashed SetOps and aggregates to disk  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Spilling hashed SetOps and aggregates to disk  (Robert Haas <robertmhaas@gmail.com>)
           Re: Spilling hashed SetOps and aggregates to disk  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers

On 06/11/2018 05:16 PM, Robert Haas wrote:
> On Wed, Jun 6, 2018 at 8:16 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> ... and this is pretty much what Jeff Davis suggested, I think. The
>> trouble is, each of those cases behaves nicely/terribly in different
>> corner cases.
> 
> That's a really good point.  If the number of groups is pretty small
> compared to the number of input tuples, then you really only ever want
> to dump out transition values.  By doing so, you minimize the amount
> of data you have to write.  But if the number of groups is, say, half
> the number of input tuples, then computing transition values before
> you have all the values that belong to that group is probably a waste
> of time.  I wonder if we could decide what to do by comparing the
> number of input tuples consumed to the number of groups created at the
> time we run out of memory.  If we've got less than, I dunno, five
> tuples per group, then assume most groups are small.  Pick a strategy
> that (mainly) spools input tuples.  Otherwise, pick a strategy that
> spools transition tuples.
> 
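
The heuristic part could hardly be simpler - something like this, I
suppose (just a sketch, the names and the threshold are entirely made
up):

    typedef enum SpillStrategy
    {
        SPILL_INPUT_TUPLES,      /* few tuples per group - spool raw input */
        SPILL_TRANSITION_TUPLES  /* many tuples per group - spool partial states */
    } SpillStrategy;

    static SpillStrategy
    choose_spill_strategy(double input_tuples, double ngroups)
    {
        /* arbitrary "five input tuples per group" threshold */
        if (ngroups > 0 && input_tuples / ngroups < 5.0)
            return SPILL_INPUT_TUPLES;

        return SPILL_TRANSITION_TUPLES;
    }
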
> In either case, it seems like we can pick a pure hashing strategy or
> switch to using both hashing and sorting.  For example, IIUC, Andres's
> proposal involves spooling mostly input tuples, but can also spool
> transition tuples if the transition values consume more memory as they
> absorb more tuples.  However, you could do a similar kind of thing
> without needing sort support.  When you see a value that doesn't
> fit in your in-memory hash table, use the hash code to route it to 1
> of N batch files.  Have a second set of batch files for transition
> tuples in case you need to kick things out of the in-memory hash
> table.  Once you finish reading the input, emit all the values that
> remain in the in-memory hash table and then process each batch file
> separately.
> 
> Similarly, David's strategy involves spooling only transition tuples
> and then sorting on the group key, but it's again possible to do
> something similar without relying on sorting.  Instead of flushing the
> in-memory hash table to a tuple store, split the transition tuples it
> contains among N batch files based on the hash code.  Once you've read
> all of the input, go back and reprocess each batch file, combining
> transition values whenever the same group keys appear in more than one
> transition tuple.
> 

Yeah, essentially something like the batching in hash joins.
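
Roughly like this, I imagine (a deliberately dumbed-down sketch using
plain stdio and fake fixed-size tuples - all the names are invented,
the real thing would presumably use BufFiles, the aggregate
serialization/combine functions, and so on):

    #include <stdio.h>
    #include <stdint.h>

    #define NBATCHES 8

    typedef struct SpilledTuple
    {
        uint32_t    hash;          /* hash of the grouping key */
        int64_t     group_key;
        int64_t     trans_value;   /* partial aggregate state, e.g. a running sum */
    } SpilledTuple;

    /* one temp file per batch, for tuples evicted from the in-memory hash table */
    static FILE *batch_files[NBATCHES];

    static void
    spill_tuple(const SpilledTuple *tup)
    {
        int     batchno = tup->hash % NBATCHES;

        if (batch_files[batchno] == NULL)
            batch_files[batchno] = tmpfile();

        fwrite(tup, sizeof(SpilledTuple), 1, batch_files[batchno]);
    }

    /*
     * Once the input is exhausted: emit everything still in the in-memory
     * hash table, then reprocess each batch file separately, combining
     * transition values whenever the same group key appears in more than
     * one spilled tuple.
     */
    static void
    process_batches(void)
    {
        SpilledTuple tup;

        for (int i = 0; i < NBATCHES; i++)
        {
            if (batch_files[i] == NULL)
                continue;

            rewind(batch_files[i]);
            while (fread(&tup, sizeof(SpilledTuple), 1, batch_files[i]) == 1)
            {
                /*
                 * Insert into a fresh (much smaller) in-memory hash table,
                 * combining partial states on duplicate group keys; omitted
                 * to keep the sketch short.
                 */
            }

            fclose(batch_files[i]);
        }
    }

And the same hash-based routing works for both variants you describe -
raw input tuples or transition tuples - just with a second set of batch
files for the transition states.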

> To me, the pure-hashing strategies look a little simpler, but maybe
> there's enough performance benefit from combining hashing and sorting
> that it's worth the complexity, or maybe we should just accept
> whichever variant somebody's willing to code.  But I think we almost
> have to have separate handling for many-row-per-group and
> few-rows-per-group, because those seem fundamentally different.
> 

I think the underlying question is whether we want to treat this as an
emergency safety measure against OOM (which should be a rare thing in
most cases), or as something that allows us to pick hash aggregate more
often.

It would be great to get something that performs better than just 
falling back to sort (and I was advocating for that), but I'm worried we 
might be moving the goalposts way too far.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

