Re: [sqlsmith] Failed assertions on parallel worker shutdown
From: Robert Haas
Subject: Re: [sqlsmith] Failed assertions on parallel worker shutdown
Date:
Msg-id: CA+TgmoYtdNMzwiOoAHzFnBPq6iHqurkwDEbhcXtMJh9T-qgihg@mail.gmail.com
In reply to: Re: [sqlsmith] Failed assertions on parallel worker shutdown (Amit Kapila <amit.kapila16@gmail.com>)
Responses:
  Re: [sqlsmith] Failed assertions on parallel worker shutdown
  Re: [sqlsmith] Failed assertions on parallel worker shutdown
List: pgsql-hackers
On Thu, May 26, 2016 at 5:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, May 24, 2016 at 6:36 PM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
>>
>> Each of the sent plans was collected when a worker dumped core due to
>> the failed assertion. More core dumps than plans were actually
>> observed, since with this failed assertion, multiple workers usually
>> trip on it and dump core simultaneously.
>>
>> The following query corresponds to plan2:
>>
>> --8<---------------cut here---------------start------------->8---
>> select
>>   pg_catalog.pg_stat_get_bgwriter_requested_checkpoints() as c0,
>>   subq_0.c3 as c1, subq_0.c1 as c2, 31 as c3, 18 as c4,
>>   (select unique1 from public.bprime limit 1 offset 9) as c5,
>>   subq_0.c2 as c6
>> from
>>   (select ref_0.tablename as c0, ref_0.inherited as c1,
>>           ref_0.histogram_bounds as c2, 100 as c3
>>    from
>>      pg_catalog.pg_stats as ref_0
>>    where 49 is not NULL limit 55) as subq_0
>> where true
>> limit 58;
>> --8<---------------cut here---------------end--------------->8---
>>
>
> I am able to reproduce the assertion (it occurs once in two to three
> times, but always at the same place) you have reported upthread with the
> above query. It seems to me the issue here is that the workers are still
> writing tuples into the tuple queues after the master backend has
> detached from them. The reason the master backend has detached from the
> tuple queues is the Limit clause: after processing the number of tuples
> specified by the Limit clause, it calls shutdown of nodes in the below
> part of code:

I can't reproduce this assertion failure on master. I tried running 'make
installcheck' and then running this query repeatedly in the regression
database, with and without parallel_setup_cost = parallel_tuple_cost = 0,
and got nowhere. Does that work for you, or do you have some other set of
steps?

> I think the workers should stop processing tuples after the tuple queues
> get detached. This will not only handle the above situation gracefully,
> but will also speed up queries where a Limit clause is present on top of
> a Gather node. A patch for this is attached (it was part of the original
> parallel seq scan patch, but was not applied; as far as I remember, we
> thought such an optimization might not be required for the initial
> version).

This is very likely a good idea, but...

> Another approach to fix this issue could be to reset mqh_partial_bytes
> and mqh_length_word_complete in shm_mq_sendv in case of SHM_MQ_DETACHED.
> These are currently reset only in case of success.

...I think we should do this too, because it's intended that calling
shm_mq_sendv again after it previously returned SHM_MQ_DETACHED should
again return SHM_MQ_DETACHED, not fail an assertion. Can you see whether
the attached patch fixes this for you?

(Status update for Noah: I will provide another update regarding this
issue no later than Monday COB, US time. I assume that Amit will have
responded by then, and it should hopefully be clear what the next step is
at that point.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments