Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Дата
Msg-id 20191113184759.gacxzbxk523vupfc@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash  (Thomas Munro <thomas.munro@gmail.com>)
Список pgsql-bugs
Hi,

On 2019-11-11 11:46:05 +0100, Tomas Vondra wrote:
> On Mon, Nov 11, 2019 at 01:33:41PM +1300, Thomas Munro wrote:
> Hmmm, iteresting. Using a hard-coded constant seems a bit weird,
> although I don't have any precise argument why it's wrong yet.
> 
> Essentially, the patch does this:
> 
>   batchno = hash_combine(0x9e3779b9, hashvalue) & (nbatch - 1);
>   bucketno = hashvalue & (nbuckets - 1);
> 
> but adding a constant does not 'propagate' any bits from the original
> value to the hash. So it seems pretty much equivalent to

I was over IM arguing that we ought to also use a different mixing
functions when computing the hash.


>   batchno = hashvalue & (nbatch - 1);
>   bucketno = hashvalue & (nbuckets - 1);

> so we still operate with the same number of bits. It might solve this
> particular case, but I doubt it's a good idea in general ...
> 
> I think we will have to actually compute two hashes, to get more than 32
> bits. But yes, that'll be more invasive, and unlikely backpatchable.

I think it might also just generally not be acceptable, from a
performance POV. Computing the hashes is already a major bottleneck for
HJ. Penalizing everybody for these extreme cases doesn't strike me as
great. It might be ok to compute a 64bit hash, but I don't see how
computing two hashes in parallel wouldn't have significant overhead.


I don't really see what the problem is with using just a differently
mixed hash. There's not really a requirement for there to be a
tremendous amount of independence between the bits used for the bucket,
and the bits for the batch. We cannot just reuse exactly the same bits
for the bucket that we used for the hash, as otherwise we'd just raise
the number of bucket conflicts (as the whole hashtable will only contain
tuples with the reduced number of bits). But I don't see why that
implies a need for full bit indepence.  As long as the mixing guarantees
that we use the bits for bucketing differently than the ones for
batches, I think we're good.  All that need is that each batch is is
roughly equally sized, and contains tuples with hashes that roughly
equally distributed - you could bitreverse the hash for batch selection,
and get that, no?


Greetings,

Andres Freund



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: [BUG] Uninitializaed configOut.leafType used.
Следующее
От: Mukesh Chhatani
Дата:
Сообщение: Re: BUG #16109: Postgres planning time is high across version - 10.6vs 10.10