Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From Tom Lane
Subject Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
Msg-id 12511.1517008946@sss.pgh.pa.us
In response to Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I suspect you're right the hash is biased to lohalf bits, as you wrote
> in the 19/12 message.

I don't see any bias in what it's doing, which is basically xoring the
two halves and hashing the result.  It's possible though that Todd's
data set contains values in which corresponding bits of the high and
low halves are correlated somehow, in which case the xor would produce
a lot of cancellation and a relatively small number of distinct outputs.
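
The cancellation effect described above can be sketched in a few lines. This is only an illustration under stated assumptions: `fold64` models the "xor the two halves" step alone (the final hashing step is omitted), and the `correlated` data set is hypothetical, not Todd's actual data.

```python
MASK32 = 0xFFFFFFFF

def fold64(val: int) -> int:
    """XOR the high and low 32-bit halves of a 64-bit value."""
    val &= 0xFFFFFFFFFFFFFFFF
    return (val >> 32) ^ (val & MASK32)

# Hypothetical data set: the high half copies the low half except for
# the two lowest bits, so the halves are strongly correlated.
correlated = [((i & ~3) << 32) | i for i in range(10000)]

# Sequential small values: the high half is always zero.
sequential = list(range(10000))

distinct_inputs = len(set(correlated))
folded_correlated = len({fold64(v) for v in correlated})
folded_sequential = len({fold64(v) for v in sequential})

print(distinct_inputs, folded_correlated, folded_sequential)
```

Here 10000 distinct inputs fold down to only 4 distinct values before any hashing happens, so no choice of 32-bit hash applied afterwards can separate them. The sequential values keep all 10000 distinct folded results.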

If we weren't bound by backwards compatibility, we could consider changing
to logic more like "if the value is within the int4 range, apply int4hash,
otherwise hash all 8 bytes normally".  But I don't see how we can change
that now that hash indexes are first-class citizens.
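
A rough sketch of that alternative logic, for illustration only. The names `hash_bytes`, `hash4` and `alt_hash8` are hypothetical, and CRC-32 is merely a stand-in for a real hash function; none of this is PostgreSQL code.

```python
import zlib

INT4_MIN, INT4_MAX = -2**31, 2**31 - 1

def hash_bytes(data: bytes) -> int:
    # Stand-in byte hash (assumption: CRC-32 chosen only for illustration).
    return zlib.crc32(data)

def hash4(val: int) -> int:
    # Hypothetical 4-byte integer hash.
    return hash_bytes(val.to_bytes(4, "little", signed=True))

def alt_hash8(val: int) -> int:
    # Values that fit in 4 bytes hash exactly like their 4-byte form;
    # everything else hashes all 8 bytes, with no xor folding.
    if INT4_MIN <= val <= INT4_MAX:
        return hash4(val)
    return hash_bytes(val.to_bytes(8, "little", signed=True))
```

With this shape, an 8-byte value inside the int4 range still hashes to the same result as the equal 4-byte value, while wider values feed every bit into the hash, so correlated halves no longer cancel each other out.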

In any case, we still need a fix for the behavior that the hash table size
is blown out by lots of collisions, because that can happen no matter what
the hash function is.  Andres seems to have dropped the ball on doing
something about that.
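
To see why growth driven by collisions can run away regardless of the hash function, consider a toy model. It assumes a linear-probing table that doubles whenever a probe sequence gets too long; the function name, the probe limit and the size cap are all made up for this sketch, and it is not the actual executor hash table code.

```python
def insert_all(hashes, initial_size=16, max_probe=8, size_cap=1 << 16):
    """Toy open-addressing table that doubles when a probe runs too long.

    Each entry in `hashes` stands for a distinct key with that hash value.
    Returns the final table size, or None if size_cap was exceeded.
    """
    size = initial_size
    while size <= size_cap:
        table = [None] * size
        ok = True
        for h in hashes:
            slot = h % size
            probes = 0
            while table[slot] is not None:
                slot = (slot + 1) % size
                probes += 1
                if probes > max_probe:
                    ok = False  # chain too long: give up and grow
                    break
            if not ok:
                break
            table[slot] = h
        if ok:
            return size
        size *= 2  # rebuild from scratch at double the size
    return None

well_spread = insert_all(list(range(100)))
all_colliding = insert_all([7] * 100)
print(well_spread, all_colliding)
```

With well-spread hashes the table settles at 128 slots for 100 keys. With 100 keys sharing one hash value, doubling never shortens the probe chain, so the table keeps growing until the cap is hit and the function returns None.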

            regards, tom lane

