Re: Do PostgreSQL have map and set structure(like STL in C++)?

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: Do PostgreSQL have map and set structure(like STL in C++)?
Дата
Msg-id CA+hUKGKhSatxWPGuSogcVxnhSrPEQm69Ps6+fr-hB53sg60j3Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Do PostgreSQL have map and set structure(like STL in C++)?  (Andrey Borodin <x4mmm@yandex-team.ru>)
Список pgsql-hackers
On Mon, Apr 22, 2019 at 5:22 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> > 21 апр. 2019 г., в 17:32, 梦旅人 <liubaozhu1258@qq.com> написал(а):
> >    I want to add a feature in PostgreSQL, and I need use map structure and set structure(like STL in C++). Do
PostgreSQLhave realized these structures? Where can I find the functions? 
> >     What I need in the code is just like this:
> >     map<char*, set<char*> >
> >     set<char*>
>
> You can use HTAB at utils/hsearch.h [0]
>
> It is Larson's dynamic hashing, implementation is in backend/utils/hash/dynahash.c
> Mostly like unordered_map. Accordingly, it lacks lower bound functionality as sorted sets do.
>
> Also, you can user RB-tree in lib/rbtree.h [1] It's usual red-black tree.

There is also src/include/lib/simplehash.h.  A bit like C++'s
std::unordered_map and std::unordered_set templates, it is specialised
by type with inlined hash and eq functions (though it's done with
preprocessor tricks).  Unlike those it it uses a variant of Robin Hood
algorithm for collisions, while the C++ standard effectively requires
collision chains.  Like dynahash (hsearch.h), it's both map and set at
the same time: instead of using a pair<key,value> it has a simple
element type and it's your job to decide if the whole thing or just
part of it is the key.  Unlike dynahash, it doesn't have a way to work
in shared memory.

There is also src/include/lib/dshash.h.  It uses collision chains, and
works in DSA shared memory (a strange allocator that can deal with
memory mapped at different addresses in different processes), and has
a lock/partitioning scheme to deal with sharing, but it's not very
featureful.  Unlike dynahash in shared memory mode, it can allocate
more memory as required, and can be created and destroyed any time
(whereas dynahash in shared memory mode is trapped in a fixed sized
region of memory that lives as long as the cluster).

Andrey mentioned Larson's dynamic hashing; that's quite interesting,
and can be seen in our on-disk hash indexes, but dynahash does it for
in-memory hash tables, whereas simplehash and dshash just rebuild the
whole hash table when one unlucky caller inserts the new element that
exceeds the load factor target.  AFAIK it's quite unusual to use the
dynamic expansion trick for an in-memory-only hash table (?).

So yeah, that's three general purpose hash table/set implementations.

I wouldn't mind if we had some more generic container and algorithm
code like simplehash.h in the tree; it's easy to show performance
benefits of inlining in microbenchmarks, and it's nicer to program
without explicitly dealing in void pointers all over the place.  I
suppose it would be theoretically possible to make some of these
design choices into policies you could select when instantiating,
rather than having separate library components.  Another thing that is
on the radar for future development is concurrent lock-free extensible
hash tables as seen in some other projects.

--
Thomas Munro
https://enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Pluggable Storage - Andres's take
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: Improve search for missing parent downlinks in amcheck