Re: BUG #15290: Stuck Parallel Index Scan query

From Thomas Munro
Subject Re: BUG #15290: Stuck Parallel Index Scan query
Date
Msg-id CAEepm=2fYdJ5hsrEb8OH=MCb1-adn8c0_rnTafdhKFcumL1vug@mail.gmail.com
In response to Re: BUG #15290: Stuck Parallel Index Scan query  (Victor Yegorov <vyegorov@gmail.com>)
Responses Re: BUG #15290: Stuck Parallel Index Scan query  (Victor Yegorov <vyegorov@gmail.com>)
List pgsql-bugs
On Mon, Jul 23, 2018 at 7:57 PM, Victor Yegorov <vyegorov@gmail.com> wrote:
> - `ERROR:  canceling statement due to conflict with recovery`, happened
> right when our problematic query started, same user

Ok, so that would explain how the master was cancelled.  In 2877's
stack we see that it was aborting here:

#11 0x00007f539697ba5e in PostgresMain (argc=1, argv=argv@entry=0x7f5398d1bbc8,
    dbname=0x7f5398d1bb98 "coub", username=0x7f5398d1bbb0 "app")
    at /build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/backend/tcop/postgres.c:3879

That line calls AbortCurrentTransaction(), just after the call to
EmitErrorReport() that wrote something in your log.  Andres's theory
(interrupts 'held') seems promising... perhaps there could be a bug
where parallel index scans leak a share-locked page or something like
that.  I tried to reproduce this a bit, but no cigar so far.  I wonder
if there could be something about your bloated index that reaches
buggy behaviour...

If you happen to have a core file for a worker that is waiting in
ConditionVariableSleep(), or it happens again, you'd be able to see if
an LWLock is causing this by printing num_held_lwlocks.
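
To make the "interrupts held" theory concrete, here is a toy model in C. It is
only a sketch, not PostgreSQL source: every toy_* function and the lower-case
variables are hypothetical stand-ins. The one real behaviour it assumes is that
acquiring an LWLock holds off interrupts until the lock is released, so a
leaked lock leaves both counters non-zero and a pending cancel is never acted
on.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for per-backend state; not the real definitions. */
static int  interrupt_holdoff_count = 0;   /* models InterruptHoldoffCount */
static int  num_held_lwlocks = 0;          /* models the counter in lwlock.c */
static bool interrupt_pending = false;     /* models InterruptPending */
static bool query_cancelled = false;       /* set once a cancel is acted on */

/* Taking a lightweight lock also holds off interrupts. */
static void toy_lwlock_acquire(void)
{
    interrupt_holdoff_count++;
    num_held_lwlocks++;
}

/* Releasing the lock resumes interrupt processing. */
static void toy_lwlock_release(void)
{
    assert(num_held_lwlocks > 0);
    num_held_lwlocks--;
    interrupt_holdoff_count--;
}

/* Models CHECK_FOR_INTERRUPTS(): a pending cancel is skipped while held off. */
static void toy_check_for_interrupts(void)
{
    if (!interrupt_pending)
        return;
    if (interrupt_holdoff_count != 0)
        return;                 /* interrupts are "held"; request stays pending */
    interrupt_pending = false;
    query_cancelled = true;
}

/* Hypothetical buggy code path: may return without releasing its lock. */
static void toy_scan_step(bool leak_lock)
{
    toy_lwlock_acquire();
    /* ... look at a shared page here ... */
    if (leak_lock)
        return;                 /* the leak: lock and holdoff both remain */
    toy_lwlock_release();
}
```

Calling toy_scan_step(true), setting interrupt_pending, and then calling
toy_check_for_interrupts() leaves query_cancelled false and num_held_lwlocks
at 1. In a real core file with debug symbols, the corresponding check would be
something like "print num_held_lwlocks" and "print InterruptHoldoffCount" in
gdb.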

-- 
Thomas Munro
http://www.enterprisedb.com

