Re: Memory-Bounded Hash Aggregation

From	Taylor Vesely
Subject	Re: Memory-Bounded Hash Aggregation
Date	28 August 2019, 22:52:13
Msg-id	CAFaX_4Ls5UGQ2UDYkWhn2xFq=V6BKpUbKSQnWmy27VWhQ2=enA@mail.gmail.com
In reply to	Re: Memory-Bounded Hash Aggregation (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: Memory-Bounded Hash Aggregation (Jeff Davis <pgsql@j-davis.com>) Re: Memory-Bounded Hash Aggregation (Jeff Davis <pgsql@j-davis.com>)
List	pgsql-hackers

I started to review this patch yesterday with Melanie Plageman, so we
rebased this patch over the current master. The main conflicts were
due to a simplehash patch that has been committed separately[1]. I've
attached the rebased patch.

I was playing with the code, and if one of the table's most common
values isn't placed into the initial hash table, a whole lot of tuples
get spilled to disk. That might be avoided if we had some way to
'seed' the hash table with MCVs from the statistics. It seems to me
that you would need some way of dealing with values that are in the
MCV list but ultimately don't show up in the scan. I imagine that this
kind of optimization would be most useful for aggregates on a full
table scan.
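
To make the seeding idea concrete, here is a minimal Python sketch (not
PostgreSQL code, and not what the patch does). It assumes the memory limit
can be modeled as a cap on the number of in-memory groups; the names
hash_agg_count, max_groups, and seed_keys are hypothetical. Seeded keys that
never appear in the scan are simply dropped at the end, which is one possible
way of handling MCVs that don't show up.

```python
def hash_agg_count(rows, max_groups, seed_keys=()):
    """Count rows per key with at most max_groups in-memory groups.

    Returns (groups, spilled): the finished in-memory counts and the
    rows that had to be spilled because the table was already full.
    """
    table = {}
    # Assumption: pre-create entries for seeded keys (e.g. MCVs taken
    # from statistics) so they are guaranteed a slot in the table.
    for key in seed_keys:
        if len(table) >= max_groups:
            break
        table[key] = 0

    spilled = []
    for key in rows:
        if key in table:
            table[key] += 1
        elif len(table) < max_groups:
            table[key] = 1
        else:
            spilled.append(key)  # table is full: this row goes to disk

    # Seeded keys that never showed up must not produce an output group.
    groups = {k: n for k, n in table.items() if n > 0}
    return groups, spilled


# "hot" is the most common value but arrives after the table has filled.
rows = ["a", "b", "c"] + ["hot"] * 5
_, spilled_unseeded = hash_agg_count(rows, max_groups=3)
groups_seeded, spilled_seeded = hash_agg_count(
    rows, max_groups=3, seed_keys=["hot"])
print(len(spilled_unseeded), len(spilled_seeded))
```

In this toy input, seeding trades one spilled rare value for five spilled
occurrences of the common one. The cost is a table slot held by a seeded key
that may never be used.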

Some questions:

Right now the patch always initializes 32 spill partitions. Have you given
any thought to how to intelligently pick an optimal number of
partitions yet?
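
For illustration only, one conceivable heuristic is to size the partition
count from the planner's group estimate so that each partition is expected
to fit in memory. The Python sketch below is an assumption, not something
from the patch: choose_num_partitions, its parameters, and the clamp bounds
are all hypothetical.

```python
import math


def choose_num_partitions(est_groups, entry_bytes, mem_limit_bytes,
                          min_parts=4, max_parts=256):
    """Pick a power-of-two partition count from a group-count estimate."""
    needed = est_groups * entry_bytes
    # How many memory-sized pieces the estimated hash table splits into.
    ratio = max(1, math.ceil(needed / mem_limit_bytes))
    # Round up to a power of two so a partition can be selected by
    # masking bits of the hash value.
    parts = 1 << (ratio - 1).bit_length()
    # Clamp: too few partitions risks re-spilling, too many wastes
    # buffers and file handles.
    return max(min_parts, min(max_parts, parts))


print(choose_num_partitions(1_000_000, 64, 4 * 1024 * 1024))
```

A heuristic like this is only as good as the group estimate, so a partition
that still overflows would need to be spilled again recursively.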

> That can be done as an add-on to approach #1 by evicting the entire
> hash table (writing out the partial states), then resetting the memory
> context.

By add-on approach, do you mean to say that you have something in mind
to combine the two strategies? Or do you mean that it could be implemented
as a separate strategy?

> I think it's clear there's no perfect eviction strategy - for every
> algorithm we came up with we can construct a data set on which it
> performs terribly (I'm sure we could do that for the approach used by
> Greenplum, for example).
>
> So I think it makes sense to do what Jeff proposed, and then maybe try
> improving that in the future with a switch to different eviction
> strategy based on some heuristics.

I agree. It definitely feels like both spilling strategies have their
own use case.

That said, I think it's worth mentioning that with parallel aggregates
it might actually be more useful to spill the trans values instead,
and have them combined in a Gather or Finalize stage.
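
As a rough sketch of that idea (Python, not PostgreSQL code), the example
below keeps a (sum, count) transition state per group for an average, evicts
the whole table as partial states when it is full, and leaves the merging to
a later combine step that stands in for the Finalize stage. The function
names and the max_groups cap are hypothetical, and the combine step here
holds all groups in memory purely to keep the example short.

```python
def partial_avg_states(stream, max_groups):
    """Yield (key, (sum, count)) partial states from (key, value) pairs.

    When a new key arrives and the table is full, the entire table is
    evicted: its partial states are emitted and it starts over empty.
    """
    table = {}
    for key, value in stream:
        if key not in table and len(table) >= max_groups:
            yield from table.items()  # write out the partial states
            table = {}                # "reset the memory context"
        s, n = table.get(key, (0, 0))
        table[key] = (s + value, n + 1)
    yield from table.items()


def combine_and_finalize(partial_states):
    """Merge partial (sum, count) states per key, then compute averages."""
    combined = {}
    for key, (s, n) in partial_states:
        cs, cn = combined.get(key, (0, 0))
        combined[key] = (cs + s, cn + n)
    return {key: s / n for key, (s, n) in combined.items()}


stream = [("a", 1), ("b", 2), ("c", 3), ("a", 5)]
partials = list(partial_avg_states(stream, max_groups=2))
averages = combine_and_finalize(partials)
print(partials)
print(averages)
```

Note that group "a" is emitted twice as a partial state, once before the
eviction and once after. That is harmless here because the combine step
merges states per key.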

[1] https://www.postgresql.org/message-id/flat/48abe675e1330f0c264ab2fe0d4ff23eb244f9ef.camel%40j-davis.com

Attachments

v1-0001-Rebased-memory-bounded-hash-aggregation.patch
