Re: [sqlsmith] Failed assertions on parallel worker shutdown
From: Robert Haas
Subject: Re: [sqlsmith] Failed assertions on parallel worker shutdown
Date:
Msg-id: CA+TgmoYtdNMzwiOoAHzFnBPq6iHqurkwDEbhcXtMJh9T-qgihg@mail.gmail.com
In reply to: Re: [sqlsmith] Failed assertions on parallel worker shutdown (Amit Kapila <amit.kapila16@gmail.com>)
Responses:
  Re: [sqlsmith] Failed assertions on parallel worker shutdown
  Re: [sqlsmith] Failed assertions on parallel worker shutdown
List: pgsql-hackers
On Thu, May 26, 2016 at 5:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, May 24, 2016 at 6:36 PM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
>>
>> Each of the sent plans was collected when a worker dumped core due to
>> the failed assertion. More core dumps than plans were actually
>> observed, since with this failed assertion, multiple workers usually
>> trip on it and dump core simultaneously.
>>
>> The following query corresponds to plan2:
>>
>> --8<---------------cut here---------------start------------->8---
>> select
>>   pg_catalog.pg_stat_get_bgwriter_requested_checkpoints() as c0,
>>   subq_0.c3 as c1, subq_0.c1 as c2, 31 as c3, 18 as c4,
>>   (select unique1 from public.bprime limit 1 offset 9) as c5,
>>   subq_0.c2 as c6
>> from
>>   (select ref_0.tablename as c0, ref_0.inherited as c1,
>>           ref_0.histogram_bounds as c2, 100 as c3
>>    from
>>      pg_catalog.pg_stats as ref_0
>>    where 49 is not NULL limit 55) as subq_0
>> where true
>> limit 58;
>> --8<---------------cut here---------------end--------------->8---
>>
>
> I am able to reproduce the assertion (it occurs once in two to three
> times, but always at the same place) you have reported upthread with the
> above query. It seems to me the issue here is that the workers are still
> writing tuples into the tuple queues after the master backend has
> detached from them. The reason the master backend has detached from the
> tuple queues is the Limit clause: after processing the number of tuples
> specified by the Limit clause, it calls shutdown of nodes in the below
> part of code:

I can't reproduce this assertion failure on master. I tried running 'make
installcheck' and then running this query repeatedly in the regression
database, with and without parallel_setup_cost = parallel_tuple_cost = 0,
and got nowhere. Does that work for you, or do you have some other set of
steps?

> I think the workers should stop processing tuples after the tuple queues
> get detached. This will not only handle the above situation gracefully,
> but will also speed up queries where a Limit clause is present on top of
> a Gather node. A patch for this is attached (it was part of the original
> parallel seq scan patch, but was not applied; as far as I remember, we
> thought such an optimization might not be required for the initial
> version).

This is very likely a good idea, but...

> Another approach to fix this issue could be to reset mqh_partial_bytes
> and mqh_length_word_complete in shm_mq_sendv in case of SHM_MQ_DETACHED.
> These are currently reset only in case of success.

...I think we should do this too, because it's intended that calling
shm_mq_sendv again after it previously returned SHM_MQ_DETACHED should
again return SHM_MQ_DETACHED, not fail an assertion. Can you see whether
the attached patch fixes this for you?

(Status update for Noah: I will provide another update regarding this
issue no later than Monday COB, US time. I assume that Amit will have
responded by then, and it should hopefully be clear what the next step is
at that point.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments