Re: Avoiding hash join batch explosions with extreme skew and weird stats

From:        Tomas Vondra
Subject:     Re: Avoiding hash join batch explosions with extreme skew and weird stats
Msg-id:      20190516163451.r7crbjgsssn6eew6@development
In reply to: Avoiding hash join batch explosions with extreme skew and weird stats (Thomas Munro <thomas.munro@gmail.com>)
Responses:   Re: Avoiding hash join batch explosions with extreme skew and weird stats
List:        pgsql-hackers
On Thu, May 16, 2019 at 01:22:31PM +1200, Thomas Munro wrote:
> ...
>
> Here's a quick hack to show that a 95% cut-off fixes those examples.
> I don't really know how to choose the number, but I suspect it should
> be much closer to 100 than 50. I think this is the easiest of three
> fundamental problems that need to be solved in this area. The others
> are: accounting for per-partition overheads as Tomas pointed out, and
> providing an actual fallback strategy that respects work_mem when
> extreme skew is detected OR per-partition overheads dominate. I plan
> to experiment with nested loop hash join (or whatever you want to call
> it: the thing where you join every arbitrary fragment of the hash
> table against the outer batch, and somehow deal with outer match
> flags) when time permits.
>

I think this is a step in the right direction, but as I said on the
other thread(s), I think we should not disable growth forever and
should recheck once in a while. Otherwise we'll end up in a sad
situation with non-uniform data sets, as pointed out by Hubert Zhang
in [1]. That's probably even more likely with this less strict logic,
using 95% as the threshold (instead of 100%).

I kinda like the idea of increasing the spaceAllowed value. Essentially,
if we decide adding batches would be pointless, increasing the memory
budget is the only thing we can do anyway.

The problem however is that we only really look at a single bit - it
may be that doubling the batches would not help, but doing it twice
would actually reduce the memory usage. For example, assume there are
2 distinct values in the batch, with hash values (in binary)

  101010000
  101010111

and assume we currently use only a few of the leading bits to determine
the batch number. Clearly, splitting batches is going to do nothing
until we get to the 000 vs. 111 parts.

At first I thought this is rather unlikely and we can ignore it, but
I'm not really sure about that - it may actually be pretty likely. We
may get to the 101010 batch with a sufficiently large data set, and
then there's a ~50% probability the next bit is the same (assuming two
distinct values). So this may be quite an issue, I think.

regards

[1] https://www.postgresql.org/message-id/CAB0yrekv%3D6_T_eUe2kOEvWUMwufcvfd15SFmCABtYFOkxCFdfA%40mail.gmail.com

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
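To make the "single bit" problem above concrete, here is a small standalone
C sketch (not PostgreSQL code; LOG2_NBUCKETS, the exact bit layout and the
helper name batch_no are assumptions for illustration) that derives batch
numbers roughly the way ExecHashGetBucketAndBatch does - low bits pick the
bucket, the bits above them pick the batch - and replays the two hash values
from the example:

#include <stdio.h>

#define LOG2_NBUCKETS 10            /* assumption: 1024 buckets */

/*
 * Simplified batch-number computation: the low LOG2_NBUCKETS bits pick
 * the bucket, the bits above them pick the batch - roughly what
 * ExecHashGetBucketAndBatch does.
 */
static unsigned int
batch_no(unsigned int hashvalue, unsigned int nbatch)
{
    return (hashvalue >> LOG2_NBUCKETS) & (nbatch - 1);
}

int
main(void)
{
    /*
     * Two distinct hash values whose batch bits, written in the order
     * successive doublings of nbatch consume them, are 101010 000 and
     * 101010 111: the six bits consumed first are identical, only the
     * three consumed last differ.
     */
    unsigned int h1 = 0x015u << LOG2_NBUCKETS;  /* consumed order: 101010 000 */
    unsigned int h2 = 0x1D5u << LOG2_NBUCKETS;  /* consumed order: 101010 111 */
    unsigned int nbatch;

    for (nbatch = 1; nbatch <= 256; nbatch *= 2)
    {
        unsigned int b1 = batch_no(h1, nbatch);
        unsigned int b2 = batch_no(h2, nbatch);

        printf("nbatch = %3u: batchno %3u vs %3u -> %s\n",
               nbatch, b1, b2,
               (b1 == b2) ? "still in the same batch" : "split at last");
    }

    return 0;
}

With these values nbatch has to grow all the way from 64 to 128 before the
two values land in different batches; a heuristic that only checks whether
the very next doubling helps would see no improvement at any of the first
six doublings and give up too early.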