Re: Spilling hashed SetOps and aggregates to disk

From: Jeff Davis
Subject: Re: Spilling hashed SetOps and aggregates to disk
Date:
Msg-id: 1528918943.8818.86.camel@j-davis.com
In reply to: Re: Spilling hashed SetOps and aggregates to disk  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Wed, 2018-06-13 at 10:08 -0400, Robert Haas wrote:
> I wasn't using the term "average" in a mathematical sense.  I suppose
> I really meant "typical".  I agree that thinking about cases where
> the
> group size is nonuniform is a good idea, but I don't think I agree
> that all uniform distributions are created equal.  Uniform
> distributions with 1 row per group are very different than uniform
> distributions with 1000 rows per group.

The only mechanism we have for dealing efficiently with skewed groups
is hashing; no alternative has been proposed.

I think what you are saying is that the case of medium-large groups
with clustered input are kind of like skewed groups because they have
enough locality to benefit from grouping before spilling. I can see
that.

So how do we handle both of these cases (skewed groups and clustered
groups) well?

I tend toward a simple approach, which is to just put whatever we can
in the hash table. Once it's full, if the hit rate on the hash table
(the whole table) drops below a certain threshold, we just dump all the
transition states out to disk and empty the hash table.

That would handle both skewed groups and clustering fairly well.
Clustering would cause the hit rate to rapidly go to zero when we are
past the current batch of groups, causing fairly quick switching to new
groups, which should handle Tomas's case[1]. It would also handle
ordinary skewed groups fairly well: they may get written out sometimes
(depending on how smart our eviction strategy is), but the cost of that
is not necessarily bad, because they will go back into the hash table
quickly.

It also offers an incremental implementation strategy: something
resembling my patch could be first (which doesn't dump transition
states at all), followed by something that can dump transition states,
followed by some tweaking to make it smarter.

That still leaves the question about what to do with the small groups:
partition them (like my patch does) or feed them into a sort and group
them?

Regards,
    Jeff Davis

[1] https://www.postgresql.org/message-id/46734151-3245-54ac-76fc-17742fb0e6d9%402ndquadrant.com

