Re: [parallel query] random server crash while running tpc-h query on power2

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: [parallel query] random server crash while running tpc-h query on power2
Дата
Msg-id CAA4eK1LNxOnsuRS1my0PpbW=mqKu5wWXEfxg6xn3_CMWghOckw@mail.gmail.com
обсуждение исходный текст
Ответ на [parallel query] random server crash while running tpc-h query on power2  (Rushabh Lathia <rushabh.lathia@gmail.com>)
Ответы Re: [parallel query] random server crash while running tpc-h query on power2  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Sat, Aug 13, 2016 at 11:10 AM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
> Hi All,
>
> Recently while running tpc-h queries on postgresql master branch, I am
> noticed
> random server crash. Most of the time server crash coming while turn tpch
> query
> number 3 - (but its very random).
>
>
> Here its clear that work_instrument is either corrupted or Un-inililized
> that is the
> reason its ending up with server crash.
>
> With bit more debugging and looked at git history I found that issue started
> coming
> with commit af33039317ddc4a0e38a02e2255c2bf453115fd2. gather_readnext()
> calls
> ExecShutdownGatherWorkers() when nreaders == 0. ExecShutdownGatherWorkers()
> calls ExecParallelFinish() which collects the instrumentation before marking
> ParallelExecutorInfo to finish. ExecParallelRetrieveInstrumentation() do the
> allocation
> of planstate->worker_instrument.
>
> With commit af33039317 now we calling the gather_readnext() with per-tuple
> context,
> but with nreader == 0 with ExecShutdownGatherWorkers() we end up with
> allocation
> of planstate->worker_instrument into per-tuple context - which is wrong.
>
> Now fix can be:
>
> 1) Avoid calling ExecShutdownGatherWorkers() from the gather_readnext() and
> let
> ExecEndGather() do that things.
>

I don't think we can wait till ExecEndGather() to collect statistics,
as we need it before that for explain path.  However, we do call
ExecShutdownNode() from ExecutePlan() when there are no more tuples
which can take care of ensuring the shutdown of Gather node.   I think
the advantage of calling it in
gather_readnext() is that it will resources to be released early and
populating the instrumentation/statistics as early as possible.

> But with this change, gather_readread() and
> gather_getnext() depend on planstate->reader structure to continue reading
> tuple.
> Now either we can change those condition to be depend on planstate->nreaders
> or
> just pfree(planstate->reader) into gather_readnext() instead of calling
> ExecShutdownGatherWorkers().
>
>
> Attaching patch, which fix the issue with approach 1).
>

AFAICS, your patch seems to be the right fix for this issue, unless we
need the instrumentation information during execution (other than for
explain) for some purpose.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Rushabh Lathia
Дата:
Сообщение: [parallel query] random server crash while running tpc-h query on power2
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: new autovacuum criterion for visible pages