Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.
Дата
Msg-id CAB7nPqSkQR16UYeyeKfs28j5eHXxTWmWPrtt2frX5MJQTU7GRg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.  (Masahiko Sawada <sawada.mshk@gmail.com>)
Ответы Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.  (Michael Paquier <michael.paquier@gmail.com>)
Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.  (Craig Ringer <craig@2ndquadrant.com>)
Список pgsql-hackers
On Thu, Sep 21, 2017 at 4:40 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Sep 21, 2017 at 2:25 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, Sep 21, 2017 at 1:07 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> The bug can happen in PostgreSQL 9.1 or higher that non-exclusive
>>> backup has been introduced, so we should back-patch to the all
>>> supported versions.
>>
>> There is a typo here right? Non-exclusive backups have been introduced
>> in 9.6. Why would a back-patch further down be needed?
>
> I think the non-exclusive backups infrastructure has been introduced
> in 9.1 for pg_basebackup. I've not checked yet that it can be
> reproduced using pg_basebackup in PG9.1 but I thought it could happen
> as far as I checked the code.

Yep, but the deficiency is caused by the use before_shmem_exit() in
the SQL-level functions present in 9.6~ which make the cleanup happen
potentially twice. If you look at the code of basebackup.c, you would
notice that the cleanups of the counters only happen within the
PG_ENSURE_ERROR_CLEANUP() blocks, so a backpatch to 9.6 should be
enough.

>> +-  Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
>> +   if (XLogCtl->Insert.nonExclusiveBackups > 0)
>> +       XLogCtl->Insert.nonExclusiveBackups--;
>> Hm, no, I don't agree. I think that instead you should just leave
>> do_pg_abort_backup() immediately if sessionBackupState is set to
>> SESSION_BACKUP_NONE. This variable is the link between the global
>> counters and the session stopping the backup so I don't think that we
>> should touch this assertion of this counter. I think that this method
>> would be safe as well for backup start as pg_start_backup_callback
>> takes care of any cleanup. Also because the counters are incremented
>> before entering in the PG_ENSURE_ERROR_CLEANUP block, and
>> sessionBackupState is updated just after leaving the block.
>
> I think that the assertion failure still can happen if the process
> aborts after decremented the counter and before setting to
> SESSION_BACKUP_NONE. Am I missing something?

The process would stop at the next CHECK_FOR_INTERRUPTS() and trigger
the cleanup at this moment. So this happens when waiting for the
archives to be done, and the session flag is set to NONE at this
point.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Ashutosh Bapat
Дата:
Сообщение: Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering
Следующее
От: Alexander Korotkov
Дата:
Сообщение: Re: [HACKERS] Should we cacheline align PGXACT?