Re: Use simplehash.h instead of dynahash in SMgr

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Use simplehash.h instead of dynahash in SMgr
Дата
Msg-id CAApHDvq+bv13-M5JZN5Swfmy5gVWwCXZw-6hP=R6Hi1RitK0hA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Use simplehash.h instead of dynahash in SMgr  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Use simplehash.h instead of dynahash in SMgr  (Thomas Munro <thomas.munro@gmail.com>)
Список pgsql-hackers
On Tue, 22 Jun 2021 at 03:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I kind of wonder if we really need four different hash table
> implementations (this being the third "generic" one, plus hash join
> has its own, and I may have forgotten others).  Should we instead
> think about revising simplehash to gain the benefits of this patch?

hmm, yeah. I definitely agree with trying to have as much reusable
code as we can when we can. It certainly reduces maintenance and bugs
tend to be found more quickly too. It's a very worthy cause.

I did happen to think of this when I was copying swathes of code out
of simplehash.h.  However, I decided that the two implementations are
sufficiently different that if I tried to merge them both into one .h
file, we'd have some unreadable and unmaintainable mess.  I just don't
think their DNA is compatible enough for the two to be mated
successfully.  For example, simplehash uses open addressing and
generichash does not.  This means that things like iterating over the
table works completely differently.  Lookups in generichash need to
perform an extra step to fetch the actual data from the segment
arrays.  I think it would certainly be possible to merge the two, but
I just don't think it would be easy code to work on if we did that.

The good thing is that that the API between the two is very similar
and it's quite easy to swap one for the other.  I did make changes
around memory allocation due to me being too cheap to zero memory when
I didn't need to and simplehash not having any means of allocating
memory without zeroing it.

I also think that there's just no one-size-fits-all hash table type.
simplehash will not perform well when the size of the stored element
is very large. There's simply too much memcpying to move data around
during insert/delete.  simplehash will also have terrible iteration
performance in sparsely populated tables.  However, simplehash will be
pretty much unbeatable for lookups where the element type is very
small, e.g single Datum, or an int.  The CPU cache efficiency there
will be pretty much unbeatable.

I tried to document the advantages of each in the file header
comments.  I should probably also add something to simplehash.h's
comments to mention generichash.h

David



В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: Use simplehash.h instead of dynahash in SMgr
Следующее
От: Japin Li
Дата:
Сообщение: Re: Fix for segfault in logical replication on master