Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
From: Thomas Munro
Subject: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Msg-id: CA+hUKG+=uGdwcwjd6R=C62ZZHjiz-m_H_COcTEAM0Y8wTpCvig@mail.gmail.com
In reply to: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-bugs
On Mon, Nov 11, 2019 at 11:46 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> Essentially, the patch does this:
>
>     batchno = hash_combine(0x9e3779b9, hashvalue) & (nbatch - 1);
>     bucketno = hashvalue & (nbuckets - 1);
>
> but adding a constant does not 'propagate' any bits from the original
> value to the hash. So it seems pretty much equivalent to
>
>     batchno = hashvalue & (nbatch - 1);
>     bucketno = hashvalue & (nbuckets - 1);
>
> so we still operate with the same number of bits. It might solve this
> particular case, but I doubt it's a good idea in general ...

Yeah, I think you're generally right, but it's much better than the specific coding batchno = hashvalue & (nbatch - 1). With that recipe, batch #42 finishes up with all its tuples crammed into bucket #42. That's the worst case of the steal-the-bucket-bits idea mentioned earlier.

After sleeping on it, I think the re-hash idea is equivalent to stealing the minimum number of bucket bits. A simple but subtly broken version of that would be: take the bucket number from the low end, take the batch number from the high end, and don't worry if they begin to overlap because it's the best we can do:

    batchno = (hashvalue >> (hash_width - nbatch_width)) & (nbatch - 1);
    bucketno = hashvalue & (nbuckets - 1);

Tuples start concentrating in a subset of buckets when the bit ranges begin to overlap, due to a correlation between batch and bucket, but that seems to be true for the re-hash idea too. You can't actually use that exact coding, though, because of the way we increase nbatch in serial hash joins: we're only allowed to add new bits to the high end of the generated batch numbers, so that every tuple either stays in the current batch or is "thrown forward" to a higher-numbered batch when nbatch increases.
So instead you could write it as:

    batchno = reverse_bits(hashvalue) & (nbatch - 1);
    bucketno = hashvalue & (nbuckets - 1);

I'm not sure if reversing bits can be done any more efficiently than the cheap-and-cheerful hash_combine() code, though. With the right inlining it seems like nice straight-line no-jmp no-lookup-table code, so maybe it's still worth considering if it has equivalent performance for our purposes.

A tangential observation is that with any of these approaches you could probably lift the current restriction on increasing nbuckets when nbatch > 1 (that prohibition is based on not wanting overlapping bucket and batch bit ranges, but if we're deciding that's OK in extreme cases and taking steps to minimise it in usual cases...).

> I think we will have to actually compute two hashes, to get more than 32
> bits. But yes, that'll be more invasive, and unlikely backpatchable.

Agreed. I think if you started with a different IV *inside the hash function*, in other words if you effectively had two different hash functions (which is the way Grace is usually described), so that the two hash functions each get to see (potentially) more than 32 bits' worth of input, it'd be about the same as having a single 64-bit hash and using non-overlapping bit ranges for batch and bucket. Nothing like that seems plausible for the back branches.