Re: Spilling hashed SetOps and aggregates to disk
| From | Jeff Davis |
|---|---|
| Subject | Re: Spilling hashed SetOps and aggregates to disk |
| Date | |
| Msg-id | 1529473003.3820.44.camel@j-davis.com |
| In reply to | Re: Spilling hashed SetOps and aggregates to disk (David Gershuni <dgershun@cs.cmu.edu>) |
| Responses | Re: Spilling hashed SetOps and aggregates to disk |
| List | pgsql-hackers |
On Fri, 2018-06-15 at 12:30 -0700, David Gershuni wrote:
> For example, imagine a tuple that belongs to a group G in grouping set
> A had to be spilled because it also belonged to an evicted group from
> grouping set B. Then group G would remain in set A’s hash table at the
> end of the phase, but it wouldn’t have aggregated the values from the
> spilled tuple. Of course, work-arounds are possible, but the current
> approach would not work as is.
I was imagining that (in the simplest approach) each hash table would
have its own set of spilled tuples. If grouping set A can be advanced
(because it's present in the hash table) but B cannot (because it's not
in the hash table and we are at the memory limit), then it goes into
the spill file for B but not A. That means that A is still finished
after the first pass.
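To make that concrete, here's a toy of the control flow I have in mind
(everything below is invented for illustration -- it's not nodeAgg.c
code, and the tiny table sizes are just to force a spill):

```c
/* Toy illustration: one fixed-size hash table per grouping set, each
 * with its own spill list.  A tuple is advanced in every set whose
 * group fits; it is spilled only for the sets that are out of room,
 * so a set that never spills is complete after one pass. */
#include <stdio.h>

#define NSETS    2      /* grouping sets: (a) and (b) */
#define CAP      2      /* per-table memory limit, deliberately tiny */
#define MAXSPILL 16

typedef struct { int key; long count; int used; } Entry;

static Entry tables[NSETS][CAP];
static int   spill[NSETS][MAXSPILL][2];   /* spilled (a, b) tuples */
static int   nspill[NSETS];

static int key_for_set(int set, int a, int b) { return set == 0 ? a : b; }

static Entry *lookup_or_insert(int set, int key)
{
    for (int i = 0; i < CAP; i++)
    {
        Entry *e = &tables[set][i];
        if (e->used && e->key == key)
            return e;                     /* group already in memory */
        if (!e->used)
        {
            e->used = 1; e->key = key;    /* room left: create group */
            return e;
        }
    }
    return NULL;                          /* table full: must spill */
}

static void process(int a, int b)
{
    for (int set = 0; set < NSETS; set++)
    {
        Entry *e = lookup_or_insert(set, key_for_set(set, a, b));
        if (e)
            e->count++;                   /* advance this set's aggregate */
        else
        {
            spill[set][nspill[set]][0] = a;  /* spill for this set only */
            spill[set][nspill[set]][1] = b;
            nspill[set]++;
        }
    }
}

int main(void)
{
    int input[][2] = { {1,1}, {1,2}, {2,3}, {2,4}, {1,5} };

    for (size_t i = 0; i < sizeof(input) / sizeof(input[0]); i++)
        process(input[i][0], input[i][1]);

    for (int set = 0; set < NSETS; set++)
        printf("set %d: %d spilled tuple(s)%s\n", set, nspill[set],
               nspill[set] == 0 ? " -- complete after first pass" : "");
    return 0;
}
```

With that sample input, set 0 (two distinct keys) fits and is complete
after the scan; set 1 (five distinct keys) spills three tuples for
itself only and needs another pass.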
But I am worried that I am missing something, because it appears that
for AGG_MIXED, we wait until the end of the last phase to emit the
contents of the hash tables. Wouldn't they be complete after the first
phase?
I realize that this simple approach could be inefficient if multiple
hash tables are spilling because we'd duplicate the spilled tuples. But
let me know if it won't work at all.
> But as you mentioned, this solution relies on all grouping keys being
> sortable, so we would need a fallback plan. For this, we could use
> your hash-based approach, but we would have to make modifications. One
> idea would be to split each grouping set into its own aggregate phase,
> so that only one hash table is in memory at a time. This would
> eliminate the performance benefits of grouping sets when keys aren’t
> sortable, but at least they would still work.
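If I follow, that fallback amounts to strictly serial phases -- one
table getting the whole memory budget, with a rescan of the input per
grouping set. A toy of that shape (again, all names and numbers are
invented for illustration):

```c
/* Toy form of the fallback: grouping sets are processed one per phase,
 * so a single table holds the whole memory budget at a time, at the
 * cost of rescanning the input for every set. */
#include <stdio.h>

#define NSETS   2
#define NTUPLES 5

static const int input[NTUPLES][2] = { {1,1}, {1,2}, {2,3}, {2,4}, {1,5} };

int main(void)
{
    for (int set = 0; set < NSETS; set++)      /* one phase per set */
    {
        int keys[NTUPLES], counts[NTUPLES], ngroups = 0;

        for (int i = 0; i < NTUPLES; i++)      /* rescan the input */
        {
            int key = input[i][set];           /* set 0: a, set 1: b */
            int g;

            for (g = 0; g < ngroups; g++)
                if (keys[g] == key)
                    break;
            if (g == ngroups)
            {
                keys[ngroups] = key;           /* new group */
                counts[ngroups++] = 0;
            }
            counts[g]++;                       /* advance the aggregate */
        }

        printf("phase %d: %d group(s)\n", set, ngroups);
    }
    return 0;
}
```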
I am having a hard time trying to satisfy all of the constraints that
have been raised so far:
* handle data types with hash ops but no btree ops
* handle many different group size distributions and input orders
efficiently
* avoid regressions or inefficiencies for grouping sets
* handle variable-size (particularly O(n)-space) transition states
without taking on a lot of complexity. I like to work toward the right
design, but I think simplicity also factors into the right design.
nodeAgg.c is already one of the larger and more complicated files that
we have.
A lot of the complexity we already have comes from performing grouping
sets in one giant executor node rather than exposing the individual
phases as separate executor nodes. I suppose the reason (correct me if
I'm wrong) is that a phase has two outputs: the aggregated groups, and
the original input tuples in a new sort order. That seems solvable --
the executor already passes bitmaps around until BitmapHeapScan turns
them into tuples.
Regards,
Jeff Davis