Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
Дата
Msg-id 20180126234537.dyu6jwnqz7snpq7n@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
Hi,

On 2017-12-10 23:09:42 +0100, Tomas Vondra wrote:
> 1) randomized hashing, i.e. using the extended hash functions and a
> random seed

Given the lack of extended hash functions for everything this doesn't
seem quite viable. But I think we can make it work quite easily
nonetheless - we can just do the combining with a random seed outside of
the hashtable. We kinda already do so (with a really crappy combiner) in
TupleHashTableHash(), except that we'd actually have to use a random
seed.

This seems like something we should do, just out of defensiveness. It'll
not be as good as persisting the hashfunction's state across calls, but
well, ...

Let me write a patch for that.


> 2) universal hashing [1], i.e. passing the hash through ((a*h+b) mod p)
> formula, where h is "our" hash, and (a,b) are random and p is a prime.
> This can't possibly fix the collisions reported by Todd, because the
> result will remain the same. But it should fix "my" chains of values by
> spreading them a bit (and the a,b,p values are generated on each grow,
> so in the unlikely case where we get a chain, it should disappear after
> a growth)

This seems on the too expensive side of things for my taste to do
unconditionally.


> 3) disabling growth of the hash table when the fillfactor would drop too
> much (e.g. below 0.2)

Yea, I think we should just apply something like this.


> FWIW I do agree the data sets shared in this thread are pretty extreme
> and it doesn't make much sense to slow the regular cases. I'll be
> perfectly happy if we stop the OOM, making those cases fast is a bonus.

Yea, agreed on that. I'm kinda inclined to go for stop-growing in 10,
and so something better in 11. And then later possibly backpatch if
we've grown some confidence?

Greetings,

Andres Freund


В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
Следующее
От: Guo Xiang Tan
Дата:
Сообщение: Re: BUG #15032: Segmentation fault when running a particular query