Re: pgsql: Add parallel-aware hash joins.

From: Tom Lane
Subject: Re: pgsql: Add parallel-aware hash joins.
Date:
Msg-id: 18190.1514943510@sss.pgh.pa.us
In reply to: Re: pgsql: Add parallel-aware hash joins.  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses: Re: pgsql: Add parallel-aware hash joins.  (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-committers
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Sun, Dec 31, 2017 at 1:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Size estimation error"?  Why do you think it's that?  We have exactly
>> the same plan in both cases.

> I mean that ExecChooseHashTableSize() estimates the hash table size like this:
>     inner_rel_bytes = ntuples * tupsize;

> ... but then at execution time, in the Parallel Hash case, we do
> memory accounting not in tuples but in chunks.  The various
> participants pack tuples into 32KB chunks, and they trigger an
> increase in the number of batches when the total size of all chunks
> happens to exceed the memory budget.  In this case they do so
> unexpectedly, due to that extra overhead at execution time that the
> planner didn't account for.  Because we happened to be close to the
> threshold, in this case between choosing 8 batches and 16 batches, we
> can get it wrong and have to increase nbatch at execution time.
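
To make the mismatch concrete, here is a minimal standalone sketch
(illustrative constants and numbers; not the actual nodeHash.c code)
contrasting the planner's tuple-based estimate with chunk-based
accounting:

    #include <stdio.h>
    #include <stddef.h>

    #define CHUNK_SIZE        (32 * 1024)   /* the 32KB chunks described above */
    #define CHUNK_HEADER_SIZE 16            /* assumed header size, for illustration */

    int
    main(void)
    {
        size_t  ntuples = 100000;           /* hypothetical build side */
        size_t  tupsize = 40;

        /* Planner-style estimate: tuple bytes only */
        size_t  inner_rel_bytes = ntuples * tupsize;

        /* Execution-style accounting: whole 32KB chunks, rounded up */
        size_t  per_chunk = (CHUNK_SIZE - CHUNK_HEADER_SIZE) / tupsize;
        size_t  nchunks = (ntuples + per_chunk - 1) / per_chunk;
        size_t  chunked_bytes = nchunks * CHUNK_SIZE;

        printf("planner estimate: %zu bytes\n", inner_rel_bytes);
        printf("chunk accounting: %zu bytes (%zu chunks)\n",
               chunked_bytes, nchunks);
        return 0;
    }

With these made-up numbers the chunked total lands roughly 1% above the
planner's figure --- exactly the kind of margin that matters when the
plan sits right at a batch-count threshold.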

If that's the issue, why doesn't the test fail every time on affected
platforms?  There shouldn't be anything nondeterministic about the
number or size of tuples going into the hash table?

> ... You get a
> larger size if more workers manage to load at least one tuple, due to
> their final partially filled chunk.

Hm.  That could do it, except it doesn't really account for the observed
result that slower single-processor machines seem more prone to the
bug.  Surely they should be less likely to get multiple workers activated.
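
For concreteness (illustrative arithmetic, not measured from the failing
runs): if each of, say, 8 participants manages to load at least one
tuple, each contributes a final partly-filled 32KB chunk that is counted
in full, so the accounted total can swing by up to 7 * 32KB = 224KB from
run to run depending on how many workers started in time.  Near the
8-versus-16-batch boundary, that swing alone could flip the outcome.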

BTW, I'm seeing a few things that look bug-like about
ExecParallelHashTuplePrealloc.  For instance, why does it use just
"size" to decide whether space_allowed is exceeded, but then, if it
isn't, add the typically-much-larger value "want + HASH_CHUNK_HEADER_SIZE"
to estimated_size?  That can clearly allow estimated_size to get
significantly past space_allowed --- if it's not a bug, it at least
deserves a comment explaining why not.  Another angle, which does not
apply to this test case but seems like a bug for real usage, is that
ExecParallelHashTuplePrealloc doesn't account correctly for tuples wider
than HASH_CHUNK_THRESHOLD.
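
To spell out the pattern in question, here is a sketch of the logic as
described above (paraphrased from this message, not copied from the
actual ExecParallelHashTuplePrealloc source; CHUNK_HEADER_SIZE stands in
for HASH_CHUNK_HEADER_SIZE):

    #include <stdbool.h>
    #include <stddef.h>

    #define CHUNK_HEADER_SIZE 16    /* assumed value, for illustration */

    /* "size" is the current tuple's size; "want" is the (typically much
     * larger) preallocation request. */
    static bool
    prealloc_sketch(size_t *estimated_size, size_t space_allowed,
                    size_t size, size_t want)
    {
        /* The budget check looks only at "size" ... */
        if (*estimated_size + size > space_allowed)
            return false;           /* caller must increase nbatch */

        /* ... but the amount charged is "want" plus a chunk header, so
         * estimated_size can end up well past space_allowed. */
        *estimated_size += want + CHUNK_HEADER_SIZE;
        return true;
    }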

I'm also wondering why the non-parallel path seems to prefer to allocate
in units of HASH_CHUNK_SIZE + HASH_CHUNK_HEADER_SIZE while the parallel
path targets allocations of exactly HASH_CHUNK_SIZE, and why there's such
inconsistency in whether tuples of exactly HASH_CHUNK_THRESHOLD bytes
are treated as "big" or not.
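
Side by side, the two conventions look roughly like this (an
illustrative sketch with assumed constants, not the actual nodeHash.c
definitions):

    #include <stddef.h>

    #define CHUNK_SIZE        (32 * 1024)
    #define CHUNK_HEADER_SIZE 16                 /* assumed value */
    #define CHUNK_THRESHOLD   (CHUNK_SIZE / 4)   /* assumed "big tuple" cutoff */

    /* Non-parallel path: ask the allocator for payload plus header */
    size_t serial_request   = CHUNK_SIZE + CHUNK_HEADER_SIZE;

    /* Parallel path: ask for exactly CHUNK_SIZE and carve the header
     * out of the inside, leaving slightly less payload per chunk */
    size_t parallel_request = CHUNK_SIZE;
    size_t parallel_payload = CHUNK_SIZE - CHUNK_HEADER_SIZE;

    /* Boundary case: a tuple of exactly CHUNK_THRESHOLD bytes is "big"
     * under a ">=" test but not under a ">", hence the inconsistency */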

            regards, tom lane

