Re: [HACKERS] Hash Functions

From Robert Haas
Subject Re: [HACKERS] Hash Functions
Date
Msg-id CA+TgmoZSTkD8ZazeXefmHFMKNG8U8sap-DbKkwVM+Bw223mkVQ@mail.gmail.com
In response to Re: [HACKERS] Hash Functions  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Hash Functions  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, Aug 3, 2017 at 6:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> That seems pretty lame, although it's sufficient to solve the
> immediate problem, and I have to admit to a certain predilection for
> things that solve the immediate problem without creating lots of
> additional work.

After some further thought, I propose the following approach to the
issues raised on this thread:

1. Allow hash functions to have a second, optional support function,
similar to what we did for btree opclasses in
c6e3ac11b60ac4a8942ab964252d51c1c0bd8845.  The second function will
have a signature of (opclass_datatype, int64) and should return int64.
The int64 argument is a salt.  When the salt is 0, the low 32 bits of
the return value should match what the existing hash support function
returns.  Otherwise, the salt should be used to perturb the hash
calculation.  This design kills two birds with one stone: it gives
callers a way to get 64-bit hash values if they want them (which
should make Tom happy, and we could later think about plugging it into
hash indexes) and it gives us a way of turning a single hash function
into many (which should allow us to prevent hash indexes or hash
tables built on a hash-partitioned table from having a heavily
lopsided distribution, and probably will also make people who are
interested in topics like Bloom filters happy).
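
To make the contract in #1 concrete, here is a minimal sketch for a 4-byte integer key. Everything in it is an assumption made for illustration: the names mix32, legacy_hash_int32 and extended_hash_int32 are hypothetical, the mixer is the MurmurHash3 32-bit finalizer rather than PostgreSQL's own hash, and unsigned 64-bit types stand in for int64 so that the bit arithmetic is well-defined. It only shows one way to satisfy "salt 0 reproduces the existing hash in the low 32 bits, any other salt perturbs the result".

```c
#include <stdint.h>

/* Hypothetical 32-bit mixer (MurmurHash3 finalizer), used as a stand-in
 * for whatever the existing hash support function computes. */
static uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bU;
    h ^= h >> 13;
    h *= 0xc2b2ae35U;
    h ^= h >> 16;
    return h;
}

/* Stand-in for the existing (first) support function: 32-bit result. */
static uint32_t legacy_hash_int32(int32_t key)
{
    return mix32((uint32_t) key);
}

/* Sketch of the proposed second support function: (datatype, salt) -> 64 bits.
 * With salt == 0 the low 32 bits equal legacy_hash_int32(key). */
static uint64_t extended_hash_int32(int32_t key, uint64_t salt)
{
    uint32_t salt_lo = (uint32_t) salt;
    uint32_t salt_hi = (uint32_t) (salt >> 32);

    /* Low half: the salt perturbs the input; salt_lo == 0 leaves it unchanged. */
    uint32_t lo = mix32((uint32_t) key ^ salt_lo);

    /* High half: same mixer, with a constant so it differs from the low half. */
    uint32_t hi = mix32((uint32_t) key ^ salt_hi ^ 0x9e3779b9U);

    return ((uint64_t) hi << 32) | lo;
}
```

Calling extended_hash_int32 with several different salts yields several different hash functions from the same code, which is the property the Bloom filter and hash-index-on-partition cases would rely on.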

2. Introduce new hash opfamilies which are faster, more portable,
and/or better in other ways than the ones we have today.
Given our current rather simplistic notion of a "default" opclass,
there doesn't seem to be an easy way to make whatever we introduce here
the default for hash partitioning while keeping the existing default
for other purposes.  That should probably be fixed at some point.
However, given the amount of debate this topic has generated, it also
doesn't seem likely that we'd actually wish to decide on a different
default in the v11 release cycle, so I don't think there's any rush to
figure out exactly how we want to fix it.  Focusing on introducing the
new opfamilies at all is probably a better use of time, IMHO.

Unless anybody strongly objects, I'm going to write a patch for #1 (or
convince somebody else to do it) and leave #2 for someone else to
tackle if they wish.  In addition, I'll tackle (or convince someone
else to tackle) the project of adding that second optional support
function to every hash opclass in the core repository.  Then Amul can
update the core hash partitioning patch to use the new infrastructure
when it's available and fall back to the existing method when it's
not, and I think we'll be in reasonably good shape.
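
The "use the new infrastructure when it's available and fall back to the existing method when it's not" part could look roughly like the sketch below. This is not the partitioning patch: HashSupport, row_hash, partition_for_key and the toy hash functions are all hypothetical names invented for the example, and a NULL function pointer stands in for an opclass that lacks the second support function.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*hash32_fn)(int32_t key);
typedef uint64_t (*hash64_fn)(int32_t key, uint64_t salt);

/* Hypothetical view of an opclass's hash support functions. */
typedef struct HashSupport
{
    hash32_fn legacy;    /* existing support function, always present */
    hash64_fn extended;  /* optional second function, NULL if missing */
} HashSupport;

/* Toy hash functions, for illustration only. */
static uint32_t toy_hash32(int32_t key)
{
    return (uint32_t) key * 2654435761U;
}

static uint64_t toy_hash64(int32_t key, uint64_t salt)
{
    uint32_t lo = ((uint32_t) key ^ (uint32_t) salt) * 2654435761U;
    uint32_t hi = ((uint32_t) key ^ (uint32_t) (salt >> 32)) * 0x85ebca6bU;

    return ((uint64_t) hi << 32) | lo;
}

/* Prefer the salted 64-bit function; otherwise widen the 32-bit hash. */
static uint64_t row_hash(const HashSupport *s, int32_t key, uint64_t salt)
{
    if (s->extended != NULL)
        return s->extended(key, salt);
    return (uint64_t) s->legacy(key);   /* salt cannot be honored here */
}

/* Map a key to one of nparts partitions (nparts must be > 0). */
static uint32_t partition_for_key(const HashSupport *s, int32_t key,
                                  uint64_t salt, uint32_t nparts)
{
    return (uint32_t) (row_hash(s, key, salt) % nparts);
}
```

In the fallback path the salt is silently ignored, so an opclass without the second function still risks the lopsided distribution described in #1.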

Objections to this plan of attack?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


