Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
От | Tomas Vondra |
---|---|
Тема | Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash |
Дата | |
Msg-id | 20191110215017.prjisnntwhhppnru@development обсуждение исходный текст |
Ответ на | Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Ответы |
Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
|
Список | pgsql-bugs |
On Sun, Nov 10, 2019 at 10:23:52PM +0100, Tomas Vondra wrote: >On Mon, Nov 11, 2019 at 10:08:58AM +1300, Thomas Munro wrote: >>I think I see what's happening: we're running out of hash bits. >> >>>Buckets: 4194304 (originally 4194304) Batches: 32768 (originally 4096) Memory Usage: 344448kB >> >>Here it's using the lower 22 bits for the bucket number, and started >>out using 12 bits for the batch (!), and increased that until it got >>to 15 (!!). After using 22 bits for the bucket, there are only 10 >>bits left, so all the tuples go into the lower 1024 batches. >> > >Ouch! > >>I'm not sure how exactly this leads to wildly varying numbers of >>repartioning cycles (the above-quoted example did it 3 times, the >>version that crashed into MaxAllocSize did it ~10 times). >> >>Besides switching to 64 bit hashes so that we don't run out of >>information (clearly a good idea), what other options do we have? (1) >>We could disable repartitioning (give up on work_mem) after we've run >>out of bits; this will eat more memory than it should. (2) You could >>start stealing bucket bits; this will eat more CPU than it should, >>because you'd effectively have fewer active buckets (tuples will >>concentrated on the bits you didn't steal). > >Can't we simply compute two hash values, using different seeds - one for >bucket and the other for batch? Of course, that'll be more expensive. Meh, I realized that's pretty much just a different way to get 64-bit hashes (which is what you mentioned). An I think you're right we should detect cases when we use all the bits, and stop making it worse. Stealing bucket bits seems reasonable - we can't stop adding batches, because that would mean we stop enforcing work_mem. The question is how far that gets us - we enforce nbuckets to be at least 1024 (i.e. 10 bits), which leaves 32 bits for nbatch. And in one of the explains we've seen nbatch=2097152 (i.e. 21 bits). Of course, this is bound to be (extremely) slow. The hashjoin changes in 9.5 which reduced th hash table load factor from 10 to 1 resulted in speedups close to 3x. And if you go from 4194304 to 1024 buckets, those will be loooooooong chains in each bucket :-( regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-bugs по дате отправления: