Re: Do we want a hashset type?

Поиск
Список
Период
Сортировка
От Joel Jacobson
Тема Re: Do we want a hashset type?
Дата
Msg-id 6b8c54d8-b823-479a-9c93-d0d955b7d7cb@app.fastmail.com
обсуждение исходный текст
Ответ на Re: Do we want a hashset type?  ("Joel Jacobson" <joel@compiler.org>)
Ответы Re: Do we want a hashset type?  (Andrew Dunstan <andrew@dunslane.net>)
Re: Do we want a hashset type?  (jian he <jian.universality@gmail.com>)
Список pgsql-hackers
On Fri, Jun 16, 2023, at 17:42, Joel Jacobson wrote:
> I realise int4hashset_hash() is broken,
> since two int4hashset's that are considered equal,
> can by coincidence get different hashes:
...
> Do we have any ideas on how to fix this without sacrificing performance?

The problem was due to hashset_hash() function accumulating the hashes
of individual elements in a non-commutative manner. As a consequence, the
final hash value was sensitive to the order in which elements were inserted
into the hashset. This behavior led to inconsistencies, as logically
equivalent sets (i.e., sets with the same elements but in different orders)
produced different hash values.

Solved by modifying the hashset_hash() function to use a commutative operation
when combining the hashes of individual elements. This change ensures that the
final hash value is independent of the element insertion order, and logically
equivalent sets produce the same hash.

An somewhat unfortunate side-effect of this fix, is that we can no longer
visually sort the hashset output format, since it's not lexicographically sorted.
I think this is an acceptable trade-off for a hashset type,
since the only alternative I see would be to sort the elements,
but then it wouldn't be a hashset, but a treeset, which different
Big-O complexity.

New patch is attached, which will henceforth always be a complete patch,
to avoid the hassle of having to assemble incremental patches.

/Joel
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Quan Zongliang
Дата:
Сообщение: Re: Incorrect estimation of HashJoin rows resulted from inaccurate small table statistics
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: RFC: logical publication via inheritance root?