Re: BUG #15821: Parallel Workers with functions and auto_explain: ERROR: could not find key 3 in shm TOC

From: Tom Lane
Subject: Re: BUG #15821: Parallel Workers with functions and auto_explain: ERROR: could not find key 3 in shm TOC
Msg-id: 17609.1559589661@sss.pgh.pa.us
In response to: BUG #15821: Parallel Workers with functions and auto_explain: ERROR: could not find key 3 in shm TOC  (PG Bug reporting form <noreply@postgresql.org>)
List: pgsql-bugs

PG Bug reporting form <noreply@postgresql.org> writes:
> We have enabled auto_explain and see errors on PostgreSQL 11.3 when
> SELECTing from a user defined function. No such crashes have been
> observed on 10.7.

I think that you didn't give a complete dump of relevant settings,
but after some fooling around I was able to reproduce this error,
and the cause is this: auto_explain hasn't a single clue about
parallel query.

1. In the parent process, we have a parallelizable hash join being
executed in a statement inside a function.  Since
auto_explain.log_nested_statements is not enabled, auto_explain
does not deem that it should trace the statement, so the query
starts up with estate->es_instrument = 0, and therefore
ExecHashInitializeDSM chooses not to create any shared
SharedHashInfo area.

2. In the worker processes, auto_explain manages to grab execution
control when ParallelQueryMain calls ExecutorStart, thanks to being
in ExecutorStart_hook.  Having no clue what's going on, it decides
that this is a new top-level query that it should trace, and it
sets some bits in queryDesc->instrument_options.

3. When the workers get to ExecHashInitializeWorker, they see that
instrumentation is active so they try to look up the SharedHashInfo.
Kaboom.
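
The three steps above can be modeled with a toy program. This is only a sketch, not PostgreSQL code: ToyToc and the toy_* functions are hypothetical stand-ins for the shm TOC, ExecHashInitializeDSM and ExecHashInitializeWorker, and the key 3 is simply borrowed from the error message. It shows how a leader that skips the shared area and a worker that independently turns instrumentation on end up disagreeing about what the TOC contains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a shared-memory table of contents: integer key -> pointer. */
#define TOY_TOC_MAX 8

typedef struct ToyToc
{
    int   keys[TOY_TOC_MAX];
    void *values[TOY_TOC_MAX];
    int   nentries;
} ToyToc;

static void
toy_toc_insert(ToyToc *toc, int key, void *value)
{
    assert(toc->nentries < TOY_TOC_MAX);
    toc->keys[toc->nentries] = key;
    toc->values[toc->nentries] = value;
    toc->nentries++;
}

/* Returns NULL when the key is absent (the real lookup raises an error). */
static void *
toy_toc_lookup(const ToyToc *toc, int key)
{
    for (int i = 0; i < toc->nentries; i++)
    {
        if (toc->keys[i] == key)
            return toc->values[i];
    }
    return NULL;
}

/* Step 1: the leader publishes the shared area only when it is instrumenting. */
static void
toy_leader_init_dsm(ToyToc *toc, int key, int es_instrument, void *shared_info)
{
    if (es_instrument == 0)
        return;                 /* no instrumentation, so no TOC entry */
    toy_toc_insert(toc, key, shared_info);
}

/*
 * Step 3: the worker looks up the shared area whenever *its* view says
 * instrumentation is active.  Returns false for "could not find key".
 */
static bool
toy_worker_init(const ToyToc *toc, int key, int worker_instrument)
{
    if (worker_instrument == 0)
        return true;            /* nothing to look up */
    return toy_toc_lookup(toc, key) != NULL;
}
```

With the leader's es_instrument at 0 and the worker's instrumentation flag set (step 2), toy_worker_init reports failure; when both sides agree, in either direction, it succeeds.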

I'm inclined to think that explain_ExecutorStart should simply
keep its hands off of everything when in a parallel worker;
if instrumentation is required, that'll be indicated by options
passed down from the parent process.  It looks like this could
conveniently be merged with the rate-sampling logic by forcing
current_query_sampled to false when IsParallelWorker().
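
A minimal sketch of that idea follows, written as standalone C rather than as a patch against auto_explain. The function names, the in_parallel_worker flag (standing in for IsParallelWorker()) and the random_fraction argument are assumptions made for illustration; only the rule itself, "never sample in a parallel worker", comes from the paragraph above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether the current query is sampled for tracing.
 *
 * in_parallel_worker: hypothetical flag modeling IsParallelWorker().
 * sample_rate:        fraction of queries to trace, in [0, 1].
 * random_fraction:    a caller-supplied random draw in [0, 1).
 */
static bool
toy_current_query_sampled(bool in_parallel_worker,
                          double sample_rate,
                          double random_fraction)
{
    assert(sample_rate >= 0.0 && sample_rate <= 1.0);

    /* Workers never decide on their own; they follow the parent's options. */
    if (in_parallel_worker)
        return false;

    return random_fraction < sample_rate;
}

/*
 * Compute the instrumentation bits for a query.  Options inherited from
 * the parent pass through unchanged; extra bits are added only when the
 * query was sampled in this process.
 */
static int
toy_instrument_options(int inherited_options, bool sampled, int extra_bits)
{
    return sampled ? (inherited_options | extra_bits) : inherited_options;
}
```

Under this model a worker always ends up with exactly the options the parent passed down, so both sides agree on whether a shared instrumentation area exists.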

Likely this should be back-patched all the way to 9.6.  I'm
not sure how we managed to avoid noticing it before now,
but there are probably ways to cause visible trouble in
any release that has any parallel query support.

            regards, tom lane


