Re: Funny hang on PostgreSQL 10 during parallel index scan on slave

From: Chris Travers
Subject: Re: Funny hang on PostgreSQL 10 during parallel index scan on slave
Msg-id: CAN-RpxB4iVAkGFowRSh=Sj8ShYHJE7nmbpT=Z4iKO7JKZgQi5A@mail.gmail.com
In reply to: Re: Funny hang on PostgreSQL 10 during parallel index scan on slave  (Andres Freund <andres@anarazel.de>)
Responses: Re: Funny hang on PostgreSQL 10 during parallel index scan on slave  (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-hackers


On Wed, Sep 5, 2018 at 6:55 PM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-09-05 18:48:44 +0200, Chris Travers wrote:
> Will submit a patch here shortly.  Thanks!  Should we do for master and
> 10?  Or 9.6 too?

Please don't top-post on this list.  This needs to be done in all
branches where the posix_fallocate call is present.

> > Yep,  Maybe we should check for signals there.
> >
> > On Wed, Sep 5, 2018 at 5:27 PM Thomas Munro <thomas.munro@enterprisedb.com>
> > wrote:
> >
> >> On Wed, Sep 5, 2018 at 8:23 AM Chris Travers <chris.travers@adjust.com>
> >> wrote:
> >> > 1.  The query is in a parallel index scan or similar
> >> > 2.  A process is executing a parallel plan and allocating a significant
> >> chunk of memory (2MB for example) in dynamic shared memory.
> >> > 3.  The startup process goes into a loop where it sends a sigusr1,
> >> sleeps 5ms, and sends another sigusr1 etc.
> >> > 4.  The sigusr1 aborts the system call, which is then retried.
> >> > 5.  Because the system call takes more than 5ms, we end up in an
> >> endless loop

What you're presumably encountering here is a recovery conflict.

Agreed, but the question is how to fix what is a fairly interesting race condition.
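
To make the race concrete, here is a toy model of it in plain C. It is only a sketch and not PostgreSQL code: the function name and its parameters are made up for illustration, and it assumes that an interrupted call loses all progress and restarts from scratch, as in step 4 of the description quoted above.

```c
#include <assert.h>

/*
 * Toy model (hypothetical, not PostgreSQL code).
 *
 * A call needs call_ms of uninterrupted time. A signal arrives every
 * signal_interval_ms (at 0, interval, 2 * interval, ...), and a call
 * that is still running when a signal arrives fails with EINTR and is
 * retried from scratch. The first attempt starts at start_ms.
 *
 * Returns the number of attempts needed, or -1 if the call has still
 * not completed after max_attempts.
 */
int
attempts_until_done(int start_ms, int call_ms,
                    int signal_interval_ms, int max_attempts)
{
    int now_ms = start_ms;

    assert(start_ms >= 0 && call_ms > 0 && signal_interval_ms > 0);

    for (int attempt = 1; attempt <= max_attempts; attempt++)
    {
        /* Time of the first signal strictly after now_ms. */
        int next_signal_ms =
            (now_ms / signal_interval_ms + 1) * signal_interval_ms;

        if (now_ms + call_ms < next_signal_ms)
            return attempt;         /* finished before the next signal */

        now_ms = next_signal_ms;    /* interrupted; retry immediately */
    }

    return -1;                      /* never got a quiet enough window */
}
```

In this model a 3 ms call that starts 3 ms into a 5 ms signal period is interrupted once and succeeds on the second attempt, while a 7 ms call never completes no matter how many retries are allowed. That is the endless loop from step 5.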


> On Wed, Sep 5, 2018 at 6:40 PM Chris Travers <chris.travers@adjust.com>
> wrote:
> >> Do you mean this loop in dsm_impl_posix_resize() is getting
> >> interrupted constantly and never completing?
> >>
> >>                 /* We may get interrupted, if so just retry. */
> >>                 do
> >>                 {
> >>                         rc = posix_fallocate(fd, 0, size);
> >>                 } while (rc == EINTR);
> >>

Probably worthwhile to check that the dsm code is properly robust if
errors are thrown from within here.

Will check that too.  Thanks! 
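
A rough sketch of the "check for signals" idea, in plain C so that it can be tried outside the backend. Everything named here is an assumption for illustration: interrupt_pending stands in for whatever pending-interrupt state the backend actually exposes, and fake_fallocate is a stub that only exists to exercise the loop. The one thing taken from the quoted code is the shape of the retry loop around posix_fallocate.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical flag: set elsewhere (for example by a signal handler)
 * once this process has been asked to cancel or terminate. */
volatile sig_atomic_t interrupt_pending = 0;

/* Same signature as posix_fallocate(), so the real call can be passed. */
typedef int (*fallocate_fn) (int fd, off_t offset, off_t len);

/*
 * Retry the allocation while it reports EINTR, but stop retrying once an
 * interrupt is pending. Returns 0 on success or an errno value; EINTR
 * means "gave up because of a pending interrupt".
 */
int
resize_with_interrupt_check(fallocate_fn alloc, int fd, off_t size)
{
    int rc;

    assert(alloc != NULL);

    do
    {
        rc = alloc(fd, 0, size);
    } while (rc == EINTR && !interrupt_pending);

    return rc;
}

/* Demonstration stub: always reports EINTR and simulates a cancel
 * request arriving during the third call. */
int fake_calls = 0;

int
fake_fallocate(int fd, off_t offset, off_t len)
{
    (void) fd;
    (void) offset;
    (void) len;

    if (++fake_calls == 3)
        interrupt_pending = 1;
    return EINTR;
}
```

With the stub, resize_with_interrupt_check(fake_fallocate, -1, 1024) returns EINTR after three calls instead of spinning forever. Passing posix_fallocate itself as the first argument gives the original loop plus the extra exit condition. The caller then still has to release the half-created segment before reporting the error, which is the robustness question raised above.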


Greetings,

Andres Freund


--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin
