Re: Strange failure in LWLock on skink in REL9_5_STABLE

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: Strange failure in LWLock on skink in REL9_5_STABLE
Дата
Msg-id CAEepm=0+TWy7tpbkvt3-TW83Er+dL3r0Nt0fN9W6wFQ1tyt39A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Strange failure in LWLock on skink in REL9_5_STABLE  (Andres Freund <andres@anarazel.de>)
Ответы Re: Strange failure in LWLock on skink in REL9_5_STABLE  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Fri, Sep 21, 2018 at 3:21 PM Andres Freund <andres@anarazel.de> wrote:
> I'm quite suspicious of the logic around:
>
>                 /*
>                  * If we received a query cancel or termination signal, we will have
>                  * EINTR set here.  If the caller said that errors are OK here, check
>                  * for interrupts immediately.
>                  */
>                 if (errno == EINTR && elevel >= ERROR)
>                         CHECK_FOR_INTERRUPTS();
>
> because it seems far from guaranteed to do anything meaningful as I
> don't see a guarantee that interrupts are active at that point (e.g. it
> seems quite reasonable to hold an lwlock while resizing).

Right, in that case CFI does nothing and you get the following
ereport() instead, so the user sees an unsightly "Interrupt system
call" message (or however your strerror() spells it).  That would
actually happen in the dsa.c case for example (something that should
be improved especially now that dsm_create() is so slow on Linux,
independently of all this).  That's probably not a great user
experience, but I'm not sure if the fact that it "works around" the
suppression of interrupts while LWLocks are held by converting them
into errors is a more serious problem or not.  The caller has to be
ready for errors to be raised here in any case.

> Afaict that might cause problems at a later stage, because at that point
> we've not adjusted the actual mapping, but *have* ftruncate()ed it. If
> there's actual data in the mapping, that certainly could cause trouble.
>
> In fact, while this commit has expanded the size of the problem, I fail
> to see how the error handling for resizing is correct. It's fine to fail
> in the ftruncate() itself - at that point no changes have been made -,
> but I don't think it's currently ok for posix_fallocate() to ever error
> out.

Right, independently of this commit, dsm_resize() might have done the
actual truncation when the error is reported.  That's not good if the
caller plans to do anything other than abandon ship.  None of this
applies to dsm_create() though, which uses the same code path but
knows how to clean up.  There may be ways to fix the dsm_resize() path
based on the observation that you don't need to fallocate() if you
made the mapping smaller, and if you made it bigger then you could
always undo that on error (or not) and you haven't thrown away any
data.  Hmm... I note that there are actually no callers of
dsm_resize(), and it's not implemented on Windows or SystemV.

--
Thomas Munro
http://www.enterprisedb.com


В списке pgsql-hackers по дате отправления:

Предыдущее
От: John Naylor
Дата:
Сообщение: Re: generating bootstrap entries for array types
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Problem while setting the fpw with SIGHUP