Re: pgsql: Add parallel-aware hash joins.
From: Tom Lane
Subject: Re: pgsql: Add parallel-aware hash joins.
Msg-id: 4001.1514678419@sss.pgh.pa.us
In reply to: Re: pgsql: Add parallel-aware hash joins. (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses: Re: pgsql: Add parallel-aware hash joins. (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-committers
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Sun, Dec 31, 2017 at 11:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... This isn't quite 100% reproducible on gaur/pademelon,
>> but it fails more often than not seems like, so I can poke into it
>> if you can say what info would be helpful.

> Right.  That's apparently unrelated and is the last build-farm issue
> on my list (so far).  I had noticed that certain BF animals are prone
> to that particular failure, and they mostly have architectures that I
> don't have so a few things are probably just differently sized.  At
> first I thought I'd tweak the tests so that the parameters were always
> stable, and I got as far as installing Debian on qemu-system-ppc (it
> took a looong time to compile PostgreSQL), but that seems a bit cheap
> and flimsy... better to fix the size estimation error.

"Size estimation error"?  Why do you think it's that?  We have exactly
the same plan in both cases.

My guess is that what's happening is that one worker or the other ends
up processing the whole scan, or the vast majority of it, so that that
worker's hash table has to hold substantially more than half of the
tuples and thereby is forced to up the number of batches.  I don't see
how you can expect to estimate that situation exactly; or if you do,
you'll be pessimizing the plan for cases where the split is more nearly
equal.

By this theory, the reason why certain BF members are more prone to the
failure is that they're single-processor machines, and perhaps have
kernels with relatively long scheduling quanta, so that it's more likely
that the worker that gets scheduled first is able to read the whole
input to the hash step.

> I assume that what happens here is the planner's size estimation code
> sometimes disagrees with Parallel Hash's chunk-based memory
> accounting, even though in this case we had perfect tuple count and
> tuple size information.
> In an earlier version of the patch set I
> refactored the planner to be chunk-aware (even for parallel-oblivious
> hash join), but later in the process I tried to simplify and shrink
> the patch set and avoid making unnecessary changes to non-Parallel
> Hash code paths.  I think I'll need to make the planner aware of the
> maximum amount of fragmentation possible when parallel-aware
> (something like: up to one tuple's worth at the end of each chunk, and
> up to one whole wasted chunk per participating backend).  More soon.

I'm really dubious that trying to model the executor's space consumption
exactly is a good idea, even if it did fix this specific problem.  That
would expend extra planner cycles and pose a continuing maintenance
gotcha.

			regards, tom lane
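[Editor's note: the two effects discussed in this exchange can be sketched numerically. This is a non-authoritative illustration; every constant and function name below is invented for the example and does not come from the PostgreSQL source.]

```python
# Hypothetical sketch of (a) a lopsided scan split forcing extra batches,
# and (b) worst-case chunk fragmentation as Thomas describes it.
# All constants here are assumptions for illustration only.

WORK_MEM = 4 * 1024 * 1024   # assumed per-hash-table memory budget, bytes
CHUNK_SIZE = 32 * 1024       # assumed allocation chunk size, bytes
TUPLE_SIZE = 64              # assumed tuple size, bytes

def batches_needed(tuples_held):
    """Crude model of batch doubling: keep doubling nbatch until this
    participant's share of a single batch fits in the memory budget."""
    nbatch = 1
    while tuples_held * TUPLE_SIZE / nbatch > WORK_MEM:
        nbatch *= 2
    return nbatch

# Even split of 100,000 tuples between two workers: 3.2MB each, one batch.
# Skewed split where one worker read 90% of the input: 5.76MB, so the
# batch count is forced up -- same plan in both cases, only the runtime
# split between workers differs.
even_batches = batches_needed(50_000)    # -> 1
skewed_batches = batches_needed(90_000)  # -> 2

def worst_case_fragmentation(total_bytes, nworkers):
    """Worst-case chunk overhead per the description above: up to one
    tuple's worth wasted at the end of each chunk, plus up to one whole
    wasted chunk per participating backend."""
    nchunks = -(-total_bytes // CHUNK_SIZE)   # ceiling division
    return nchunks * (TUPLE_SIZE - 1) + nworkers * CHUNK_SIZE
```

The sketch also hints at Tom's objection: tracking fragmentation this precisely in the planner would have to mirror the executor's allocation behavior term for term, which costs cycles and becomes a maintenance liability whenever the executor changes.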