Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Дата
Msg-id CA+hUKG+ZpNdjWxTXfdKz-cAZG3iuK9HV_woNn5HTUoiXErvQrw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash  (Andres Freund <andres@anarazel.de>)
Ответы Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash  (Thomas Munro <thomas.munro@gmail.com>)
Список pgsql-bugs
On Thu, Nov 14, 2019 at 7:48 AM Andres Freund <andres@anarazel.de> wrote:
> I don't really see what the problem is with using just a differently
> mixed hash. There's not really a requirement for there to be a
> tremendous amount of independence between the bits used for the bucket,
> and the bits for the batch. We cannot just reuse exactly the same bits
> for the bucket that we used for the hash, as otherwise we'd just raise
> the number of bucket conflicts (as the whole hashtable will only contain
> tuples with the reduced number of bits). But I don't see why that
> implies a need for full bit indepence.  As long as the mixing guarantees
> that we use the bits for bucketing differently than the ones for
> batches, I think we're good.  All that need is that each batch is is
> roughly equally sized, and contains tuples with hashes that roughly
> equally distributed - you could bitreverse the hash for batch selection,
> and get that, no?

Right, that's what I said earlier:

>     batchno = reverse_bits(hashvalue) & (nbatch - 1);
>     bucketno = hashvalue & (nbuckets - 1);

My theory is that batchno = hash_combine(<constant>, hashvalue) &
(nbatch - 1) has about the same distribution properties.  That is, it
works nicely up until you have more than about 2^32 keys, and then it
begins to break down due to lack of information, but the break down
manifests as more conflicts in buckets, instead of the failure to
repartition once you start reading off the end of the hash value (the
cause of the present bug report).  The bit reverse approach might be a
bit less mysterious, but when I looked at ways to implement that they
looked potentially slower than the xor, shifts and adds used by
hash_combine.  I didn't attempt to test that, though.



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [BUG] Uninitializaed configOut.leafType used.
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Unexpected "cache lookup failed for collation 0" failure