Re: Hash tables in dynamic shared memory

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Hash tables in dynamic shared memory
Дата
Msg-id 20161004222209.tx3aiee2h4ai4lfv@alap3.anarazel.de
обсуждение исходный текст
Ответ на Hash tables in dynamic shared memory  (Thomas Munro <thomas.munro@enterprisedb.com>)
Ответы Re: Hash tables in dynamic shared memory  (Thomas Munro <thomas.munro@enterprisedb.com>)
Список pgsql-hackers
Hi,

> It's impossible to write a general purpose hash table that will be
> suitable for every use case, considering all the combinations of
> design trade offs and subsets of functionality someone might want.

Very much agreed.


> There is other work being done in this space:  I'm aware of Andres's
> current thread[1] on Robin Hood-style closed hashing tables with
> macro-ised size, hash and comparator operations à la C++ container
> templates, and I'm following along with interest.

> He vigorously
> rejects chaining hash tables as the natural enemies of memory
> hardware.

I'd not go quite that far. There are good arguments for open addressing,
and there's good arguments for open chaining. The latter is a lot more
convincing if you want concurrent access with partition based locking,
for example...


> I guess Andres's simplehash should already work in DSA memory nicely
> anyway since it doesn't have any internal pointers IIUC (?).

The data access doesn't have pointers, but the "metadata" does have a
pointer to the data. But that's a very solvable problem.  But
incremental resizing and granular locking aren't going to happen for it,
so it'd really only be suitable for presenting a read-only (or possibly
read-mostly) view.


> Potential use cases for DHT include caches, in-memory database objects
> and working state for parallel execution.

Is there a more concrete example, i.e. a user we'd convert to this at
the same time as introducing this hashtable?


> There are a couple of things I'm not happy about in this version of
> DHT: the space overhead per item is too high (hash + wasted padding +
> next pointer), and the iterator system is overly complicated.  I have
> another version in development which gets rid of the hash and padding
> and sometimes gets rid of the next pointer to achieve zero overhead,
> and I'm working on a simpler iteration algorithm that keeps the
> current guarantees but doesn't need to reorder bucket chains.  More on
> that, with some notes on performance and a testing module, soon.

FWIW, my experimentation showed that getting rid of the hash itself is
quite often *NOT* a worthwhile tradeof. Comparing keys and recomputing
hashes is often quite expensive (e.g. for GROUP BY it's where the
majority of time is spent after the simplehash patch).

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Gilles Darold
Дата:
Сообщение: Re: proposal: psql \setfileref
Следующее
От: Andres Freund
Дата:
Сообщение: Idea: Tell compiler palloc() et al return aligned values