Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

Поиск
Список
Период
Сортировка
От Craig Ringer
Тема Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.
Дата
Msg-id CAMsr+YHj4asAu8dJ1Q5hiW+wxL2Jd1=uxO52dvXhMxive-xd0g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.  (Michael Paquier <michael.paquier@gmail.com>)
Ответы Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.  (Michael Paquier <michael.paquier@gmail.com>)
Список pgsql-hackers
On 21 September 2017 at 16:56, Michael Paquier <michael.paquier@gmail.com> wrote:
On Thu, Sep 21, 2017 at 4:40 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Sep 21, 2017 at 2:25 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, Sep 21, 2017 at 1:07 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> The bug can happen in PostgreSQL 9.1 or higher that non-exclusive
>>> backup has been introduced, so we should back-patch to the all
>>> supported versions.
>>
>> There is a typo here right? Non-exclusive backups have been introduced
>> in 9.6. Why would a back-patch further down be needed?
>
> I think the non-exclusive backups infrastructure has been introduced
> in 9.1 for pg_basebackup. I've not checked yet that it can be
> reproduced using pg_basebackup in PG9.1 but I thought it could happen
> as far as I checked the code.

Yep, but the deficiency is caused by the use before_shmem_exit() in
the SQL-level functions present in 9.6~ which make the cleanup happen
potentially twice. If you look at the code of basebackup.c, you would
notice that the cleanups of the counters only happen within the
PG_ENSURE_ERROR_CLEANUP() blocks, so a backpatch to 9.6 should be
enough.

>> +-  Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
>> +   if (XLogCtl->Insert.nonExclusiveBackups > 0)
>> +       XLogCtl->Insert.nonExclusiveBackups--;
>> Hm, no, I don't agree. I think that instead you should just leave
>> do_pg_abort_backup() immediately if sessionBackupState is set to
>> SESSION_BACKUP_NONE. This variable is the link between the global
>> counters and the session stopping the backup so I don't think that we
>> should touch this assertion of this counter. I think that this method
>> would be safe as well for backup start as pg_start_backup_callback
>> takes care of any cleanup. Also because the counters are incremented
>> before entering in the PG_ENSURE_ERROR_CLEANUP block, and
>> sessionBackupState is updated just after leaving the block.
>
> I think that the assertion failure still can happen if the process
> aborts after decremented the counter and before setting to
> SESSION_BACKUP_NONE. Am I missing something?

The process would stop at the next CHECK_FOR_INTERRUPTS() and trigger
the cleanup at this moment. So this happens when waiting for the
archives to be done, and the session flag is set to NONE at this
point.

Another one to watch out for is that elog(...) and ereport(...) invoke CHECK_FOR_INTERRUPTS. That's given me exciting surprises before when combined with assertion checking and various exit cleanup hooks.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: [HACKERS] Windows warnings from VS 2017
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.