Re: Spilling hashed SetOps and aggregates to disk

From: David Gershuni
Subject: Re: Spilling hashed SetOps and aggregates to disk
Msg-id: 99EFF86A-85AA-41AE-9414-DBEAD4EAC06F@cs.cmu.edu
In reply to: Re: Spilling hashed SetOps and aggregates to disk (Jeff Davis <pgsql@j-davis.com>)
Responses: Re: Spilling hashed SetOps and aggregates to disk (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-hackers
On Jun 21, 2018, at 1:04 PM, Jeff Davis <pgsql@j-davis.com> wrote:
On Thu, 2018-06-21 at 11:04 -0700, David Gershuni wrote:
This approach seems functionally correct, but I don't like the idea of
transforming O(N) tuples of disk I/O into O(S*N) tuples of disk I/O
(in the worst case).

It's the same amount of I/O as the idea you suggested of putting the
hash tables into separate phases, but it may consume more disk space at
once in the worst case.

We can mitigate that if it becomes a problem later.

That’s true. Both fall-back approaches would suffer from high I/O.
However, the HashSort-based approach would offer much better performance,
assuming the sorting issue is overcome.


b) is accomplished by not sorting groups by their actual grouping keys,
but instead sorting by their hash values. This works because we don't
need a true sort. We just need to process identical groups in a
consistent order during the merge step. As long as we're careful about
collisions during the merge, this should work.

Can you expand on how we should handle collisions? If all we can do is
hash and compare equality, then it seems complex to draw group
boundaries.

In the HashSort algorithm, the sort order is only used as a tool in the
merge step. In the merge step, we will process one grouping set
at a time.

By sorting before merging, we ensure that we’ll encounter
all partial aggregations of a group at once by looking at the head of each
spill file. This allows us to maintain a ‘current group’ and combine all
instances of it from each spill file. Then when no more instances of that
group are present, we emit the group as output. So the order can be
anything, as long as it’s consistent.
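The merge step described above could be sketched roughly as follows. This is a minimal Python illustration, not the C implementation: spill files are modeled as iterables of (hash_code, key, partial_state) tuples already sorted by hash code, and `combine` is a hypothetical function that merges two partial aggregate states. Collisions are ignored here for the moment:

```python
import heapq

def merge_spill_files(spill_files, combine):
    """k-way merge of spill files sorted by hash code (collisions ignored).

    Each spill file yields (hash_code, key, partial_state) tuples in
    ascending hash order; equal groups are combined as they meet at the
    heads of the files, then emitted once no more instances remain.
    """
    iters = [iter(f) for f in spill_files]
    heap = []
    for i, it in enumerate(iters):
        entry = next(it, None)
        if entry is not None:
            heapq.heappush(heap, (entry[0], i, entry))  # i breaks ties

    current = None  # (hash_code, key, combined_state) for the current group
    while heap:
        _, i, (hc, key, state) = heapq.heappop(heap)
        if current is not None and current[0] == hc and current[1] == key:
            # Another partial aggregation of the current group: combine it.
            current = (hc, key, combine(current[2], state))
        else:
            # A new group begins; emit the finished one.
            if current is not None:
                yield current[1], current[2]
            current = (hc, key, state)
        nxt = next(iters[i], None)  # refill from the same spill file
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], i, nxt))
    if current is not None:
        yield current[1], current[2]
```

Because every file is sorted by the same hash order, all partial aggregations of a group are guaranteed to surface together at the heads of the files, which is the only property the merge needs.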

To handle hash collisions, we can do the following:

1) We track the current hash code we’re processing, in ascending
order.

2) Instead of combining one group at a time, we’ll maintain a list of
all groups we’ve seen that match the current hash code.

3) When we peek at a spill file, if its next group’s hash code matches
our current hash code, we’ll check to see if that group matches any of the
groups in our list. If so, we’ll combine it with the matching group. If not,
we’ll add this group to our list.

4) Once the head of each spill file exceeds the current hash code, we’ll
emit our list as output, empty it, and advance to the next hash code.
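The four steps above could be sketched like so. Again a hedged Python illustration under the same assumptions as before (sorted iterables of (hash_code, key, partial_state) tuples; a hypothetical `combine` for partial aggregate states), this time keeping a list of distinct groups that share the current hash code:

```python
import heapq

def merge_with_collisions(spill_files, combine):
    """Merge spill files sorted by hash code, tolerating hash collisions.

    Groups that collide on the same hash code are kept in a small list;
    the list is emitted only once every file's head has advanced past
    that hash code (step 4 above).
    """
    iters = [iter(f) for f in spill_files]
    heap = []
    for i, it in enumerate(iters):
        entry = next(it, None)
        if entry is not None:
            heapq.heappush(heap, (entry[0], i, entry))  # i breaks ties

    current_hash = None
    current_groups = []  # [(key, state)] for groups sharing current_hash
    while heap:
        _, i, (hc, key, state) = heapq.heappop(heap)
        if hc != current_hash:
            # Step 4: all heads have passed current_hash; emit and advance.
            yield from current_groups
            current_hash, current_groups = hc, []
        # Step 3: combine with a matching group, or add a new one.
        for j, (k, s) in enumerate(current_groups):
            if k == key:
                current_groups[j] = (k, combine(s, state))
                break
        else:
            current_groups.append((key, state))
        nxt = next(iters[i], None)  # refill from the same spill file
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], i, nxt))
    yield from current_groups
```

The linear scan over `current_groups` is fine because the list only holds groups that genuinely collide on one hash code, which should almost always be a handful at most.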

Does that seem reasonable?

Best,
David
Salesforce
