Re: Spilling hashed SetOps and aggregates to disk

From: David Gershuni
Subject: Re: Spilling hashed SetOps and aggregates to disk
Msg-id: 99EFF86A-85AA-41AE-9414-DBEAD4EAC06F@cs.cmu.edu
In reply to: Re: Spilling hashed SetOps and aggregates to disk (Jeff Davis <pgsql@j-davis.com>)
Responses: Re: Spilling hashed SetOps and aggregates to disk (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-hackers
On Jun 21, 2018, at 1:04 PM, Jeff Davis <pgsql@j-davis.com> wrote:
On Thu, 2018-06-21 at 11:04 -0700, David Gershuni wrote:
This approach seems functionally correct, but I don't like the idea of
transforming O(N) tuples of disk I/O into O(S*N) tuples of disk I/O
(in the worst case).

It's the same amount of I/O as the idea you suggested of putting the
hash tables into separate phases, but it may consume more disk space at
once in the worst case.

We can mitigate that if it becomes a problem later.

That’s true. Both fall-back approaches would suffer from high I/O.
However, the HashSort-based approach would offer much better performance,
assuming the sorting issue is overcome.


b) is accomplished by not sorting groups by their actual grouping keys,
but instead sorting by their hash values. This works because we don't
need a true sort. We just need to process identical groups in a
consistent order during the merge step. As long as we're careful about
collisions during the merge, this should work.

Can you expand on how we should handle collisions? If all we can do is
hash and compare equality, then it seems complex to draw group
boundaries.

In the HashSort algorithm, the sort order is only used as a tool in the
merge step. In the merge step, we will process one grouping set
at a time.

By sorting before merging, we ensure that we’ll encounter
all partial aggregations of a group at once by looking at the head of each
spill file. This allows us to maintain a ‘current group’ and combine all
instances of it from each spill file. Then when no more instances of that
group are present, we emit the group as output. So the order can be
anything, as long as it’s consistent.
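The merge step described above could be sketched roughly as follows. This is a minimal Python illustration, not the C implementation: spill files are modeled as iterables of (hash_code, key, partial_state) tuples already sorted by hash code, and `combine` is a hypothetical function that merges two partial aggregate states. Collisions are ignored here for the moment:

```python
import heapq

def merge_spill_files(spill_files, combine):
    """k-way merge of spill files sorted by hash code (collisions ignored).

    Each spill file yields (hash_code, key, partial_state) tuples in
    ascending hash order; equal groups are combined as they meet at the
    heads of the files, then emitted once no more instances remain.
    """
    iters = [iter(f) for f in spill_files]
    heap = []
    for i, it in enumerate(iters):
        entry = next(it, None)
        if entry is not None:
            heapq.heappush(heap, (entry[0], i, entry))  # i breaks ties

    current = None  # (hash_code, key, combined_state) for the current group
    while heap:
        _, i, (hc, key, state) = heapq.heappop(heap)
        if current is not None and current[0] == hc and current[1] == key:
            # Another partial aggregation of the current group: combine it.
            current = (hc, key, combine(current[2], state))
        else:
            # A new group begins; emit the finished one.
            if current is not None:
                yield current[1], current[2]
            current = (hc, key, state)
        nxt = next(iters[i], None)  # refill from the same spill file
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], i, nxt))
    if current is not None:
        yield current[1], current[2]
```

Because every file is sorted by the same hash order, all partial aggregations of a group are guaranteed to surface together at the heads of the files, which is the only property the merge needs.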

To handle hash collisions, we can do the following:

1) We track the current hash code we’re processing, in ascending
order.

2) Instead of combining one group at a time, we’ll maintain a list of
all groups we’ve seen that match the current hash code.

3) When we peek at a spill file, if its next group’s hash code matches
our current hash code, we’ll check to see if that group matches any of the
groups in our list. If so, we’ll combine it with the matching group. If not,
we’ll add this group to our list.

4) Once the head of each spill file exceeds the current hash code, we’ll
emit our list as output, empty it, and advance to the next hash code.
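The four steps above could be sketched like so. Again a hedged Python illustration under the same assumptions as before (sorted iterables of (hash_code, key, partial_state) tuples; a hypothetical `combine` for partial aggregate states), this time keeping a list of distinct groups that share the current hash code:

```python
import heapq

def merge_with_collisions(spill_files, combine):
    """Merge spill files sorted by hash code, tolerating hash collisions.

    Groups that collide on the same hash code are kept in a small list;
    the list is emitted only once every file's head has advanced past
    that hash code (step 4 above).
    """
    iters = [iter(f) for f in spill_files]
    heap = []
    for i, it in enumerate(iters):
        entry = next(it, None)
        if entry is not None:
            heapq.heappush(heap, (entry[0], i, entry))  # i breaks ties

    current_hash = None
    current_groups = []  # [(key, state)] for groups sharing current_hash
    while heap:
        _, i, (hc, key, state) = heapq.heappop(heap)
        if hc != current_hash:
            # Step 4: all heads have passed current_hash; emit and advance.
            yield from current_groups
            current_hash, current_groups = hc, []
        # Step 3: combine with a matching group, or add a new one.
        for j, (k, s) in enumerate(current_groups):
            if k == key:
                current_groups[j] = (k, combine(s, state))
                break
        else:
            current_groups.append((key, state))
        nxt = next(iters[i], None)  # refill from the same spill file
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], i, nxt))
    yield from current_groups
```

The linear scan over `current_groups` is fine because the list only holds groups that genuinely collide on one hash code, which should almost always be a handful at most.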

Does that seem reasonable?

Best,
David
Salesforce
