Re: Comment fix and question about dshash.c

Поиск
Список
Период
Сортировка
От Antonin Houska
Тема Re: Comment fix and question about dshash.c
Дата
Msg-id 15639.1540639321@localhost
обсуждение исходный текст
Ответ на Re: Comment fix and question about dshash.c  (Thomas Munro <thomas.munro@enterprisedb.com>)
Ответы Re: Comment fix and question about dshash.c
Список pgsql-hackers
Thomas Munro <thomas.munro@enterprisedb.com> wrote:

> On Sat, Oct 27, 2018 at 12:30 AM Antonin Houska <ah@cybertec.at> wrote:
>
> Thanks, will fix.
>
> > 2. Can anyone please explain this macro?
> >
> > /* Max entries before we need to grow.  Half + quarter = 75% load factor. */
> > #define MAX_COUNT_PER_PARTITION(hash_table)                             \
> >         (BUCKETS_PER_PARTITION(hash_table->size_log2) / 2 + \
> >          BUCKETS_PER_PARTITION(hash_table->size_log2) / 4)
> >
> > I'm failing to understand why the maximum number of hash table entries in a
> > partition should be smaller than the number of buckets in that partition.
> >
> > The fact that MAX_COUNT_PER_PARTITION refers to entries and not buckets is
> > obvious from this condition in dshash_find_or_insert()
> >
> >         /* Check if we are getting too full. */
> >         if (partition->count > MAX_COUNT_PER_PARTITION(hash_table))
>
> Are you saying there is a bug in this logic (which is nbuckets * 0.75
> expressed as integer maths), or saying that 0.75 is not a good maximum
> load factor?  I looked around at a couple of general purpose hash
> tables and saw that some used 0.75 and some used 1.0, as a wasted
> space-vs-collision trade-off.  If I have my maths right[1], with 0.75
> you expect to have 75 entries in ~53 buckets, but with 1.0 you expect
> to have 100 entries in ~64 buckets.

I don't know how exactly you apply the [1] formula (what is "n" and what is
"N" in your case?), but my consideration was much simpler: For example, if
BUCKETS_PER_PARTITION returns 8 (power of 2 is expected here and also more
convenient), then MAX_COUNT_PER_PARTITION returns 8 / 2 + 8 / 4 = 6. Thus the
hashtable gets resized if we're going to add the 7th entry to the partition,
i.e. we the number of entries in the partition is lower than the number of
buckets. Is that o.k.?

> [1] https://math.stackexchange.com/questions/470662/average-number-of-bins-occupied-when-throwing-n-balls-into-n-bins

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: Remove obsolete pg_attrdef.adsrc column
Следующее
От: Antonin Houska
Дата:
Сообщение: Re: Comment fix and question about dshash.c