Обсуждение: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

Поиск
Список
Период
Сортировка

[HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Masahiko Sawada
Дата:
Hi,

I got an assert failure when executing pg_terminate_backend to the
process that waiting for WAL to be archived in non-exclusive backup
mode.

TRAP: FailedAssertion("!(XLogCtl->Insert.nonExclusiveBackups > 0)",
File: "xlog.c", Line: 11123)

The one reproducing procedure is,
1. Start postgres with enabling the WAL archive
2. Execute pg_start_backup()
3. Revoke the access privileges of archive directory. (e.g., chown
root:root /path/to/archive)
4. Execute pg_stop_backup() and hangs
5. Execute pg_terminate_backend() to the process that is waiting for
WAL to be archived.
Got the assertion failure.

Perhaps we can reproduce it using pg_basebackup as well.

I think the cause of this is that do_pg_abort_backup can be called
even after we decremented XLogCtl->Insert.nonExclusiveBackups in
do_pg_stop_backup(). That is, do_pg_abort_backup can be called with
XLogCtl->Insert.nonExclusiveBackups = 0 when the transaction aborts
after processed the nonExclusiveBackups (e.g, during waiting for WAL
to be archived)
The bug can happen in PostgreSQL 9.1 or higher that non-exclusive
backup has been introduced, so we should back-patch to the all
supported versions.

I changed do_pg_abort_backup() so that it decrements
nonExclusiveBackups only if > 0. Feedback is very welcome.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Michael Paquier
Дата:
On Thu, Sep 21, 2017 at 1:07 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> The bug can happen in PostgreSQL 9.1 or higher that non-exclusive
> backup has been introduced, so we should back-patch to the all
> supported versions.

There is a typo here right? Non-exclusive backups have been introduced
in 9.6. Why would a back-patch further down be needed?

> I changed do_pg_abort_backup() so that it decrements
> nonExclusiveBackups only if > 0. Feedback is very welcome.

+-  Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
+   if (XLogCtl->Insert.nonExclusiveBackups > 0)
+       XLogCtl->Insert.nonExclusiveBackups--;
Hm, no, I don't agree. I think that instead you should just leave
do_pg_abort_backup() immediately if sessionBackupState is set to
SESSION_BACKUP_NONE. This variable is the link between the global
counters and the session stopping the backup so I don't think that we
should touch this assertion of this counter. I think that this method
would be safe as well for backup start as pg_start_backup_callback
takes care of any cleanup. Also because the counters are incremented
before entering in the PG_ENSURE_ERROR_CLEANUP block, and
sessionBackupState is updated just after leaving the block.

About the patch:
+-  Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
There is a typo on this line.

Adding that to the next CF would be a good idea so as we don't forget
about it. Feel free to add me as reviewer.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Masahiko Sawada
Дата:
On Thu, Sep 21, 2017 at 2:25 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Sep 21, 2017 at 1:07 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> The bug can happen in PostgreSQL 9.1 or higher that non-exclusive
>> backup has been introduced, so we should back-patch to the all
>> supported versions.
>
> There is a typo here right? Non-exclusive backups have been introduced
> in 9.6. Why would a back-patch further down be needed?

I think the non-exclusive backups infrastructure has been introduced
in 9.1 for pg_basebackup. I've not checked yet that it can be
reproduced using pg_basebackup in PG9.1 but I thought it could happen
as far as I checked the code.

>
>> I changed do_pg_abort_backup() so that it decrements
>> nonExclusiveBackups only if > 0. Feedback is very welcome.
>
> +-  Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
> +   if (XLogCtl->Insert.nonExclusiveBackups > 0)
> +       XLogCtl->Insert.nonExclusiveBackups--;
> Hm, no, I don't agree. I think that instead you should just leave
> do_pg_abort_backup() immediately if sessionBackupState is set to
> SESSION_BACKUP_NONE. This variable is the link between the global
> counters and the session stopping the backup so I don't think that we
> should touch this assertion of this counter. I think that this method
> would be safe as well for backup start as pg_start_backup_callback
> takes care of any cleanup. Also because the counters are incremented
> before entering in the PG_ENSURE_ERROR_CLEANUP block, and
> sessionBackupState is updated just after leaving the block.

I think that the assertion failure still can happen if the process
aborts after decremented the counter and before setting to
SESSION_BACKUP_NONE. Am I missing something?

>
> About the patch:
> +-  Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
> There is a typo on this line.
>
> Adding that to the next CF would be a good idea so as we don't forget
> about it. Feel free to add me as reviewer.

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Michael Paquier
Дата:
On Thu, Sep 21, 2017 at 4:40 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Sep 21, 2017 at 2:25 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, Sep 21, 2017 at 1:07 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> The bug can happen in PostgreSQL 9.1 or higher that non-exclusive
>>> backup has been introduced, so we should back-patch to the all
>>> supported versions.
>>
>> There is a typo here right? Non-exclusive backups have been introduced
>> in 9.6. Why would a back-patch further down be needed?
>
> I think the non-exclusive backups infrastructure has been introduced
> in 9.1 for pg_basebackup. I've not checked yet that it can be
> reproduced using pg_basebackup in PG9.1 but I thought it could happen
> as far as I checked the code.

Yep, but the deficiency is caused by the use before_shmem_exit() in
the SQL-level functions present in 9.6~ which make the cleanup happen
potentially twice. If you look at the code of basebackup.c, you would
notice that the cleanups of the counters only happen within the
PG_ENSURE_ERROR_CLEANUP() blocks, so a backpatch to 9.6 should be
enough.

>> +-  Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
>> +   if (XLogCtl->Insert.nonExclusiveBackups > 0)
>> +       XLogCtl->Insert.nonExclusiveBackups--;
>> Hm, no, I don't agree. I think that instead you should just leave
>> do_pg_abort_backup() immediately if sessionBackupState is set to
>> SESSION_BACKUP_NONE. This variable is the link between the global
>> counters and the session stopping the backup so I don't think that we
>> should touch this assertion of this counter. I think that this method
>> would be safe as well for backup start as pg_start_backup_callback
>> takes care of any cleanup. Also because the counters are incremented
>> before entering in the PG_ENSURE_ERROR_CLEANUP block, and
>> sessionBackupState is updated just after leaving the block.
>
> I think that the assertion failure still can happen if the process
> aborts after decremented the counter and before setting to
> SESSION_BACKUP_NONE. Am I missing something?

The process would stop at the next CHECK_FOR_INTERRUPTS() and trigger
the cleanup at this moment. So this happens when waiting for the
archives to be done, and the session flag is set to NONE at this
point.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Michael Paquier
Дата:
On Thu, Sep 21, 2017 at 5:56 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Sep 21, 2017 at 4:40 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Thu, Sep 21, 2017 at 2:25 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> +-  Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
>>> +   if (XLogCtl->Insert.nonExclusiveBackups > 0)
>>> +       XLogCtl->Insert.nonExclusiveBackups--;
>>> Hm, no, I don't agree. I think that instead you should just leave
>>> do_pg_abort_backup() immediately if sessionBackupState is set to
>>> SESSION_BACKUP_NONE. This variable is the link between the global
>>> counters and the session stopping the backup so I don't think that we
>>> should touch this assertion of this counter. I think that this method
>>> would be safe as well for backup start as pg_start_backup_callback
>>> takes care of any cleanup. Also because the counters are incremented
>>> before entering in the PG_ENSURE_ERROR_CLEANUP block, and
>>> sessionBackupState is updated just after leaving the block.
>>
>> I think that the assertion failure still can happen if the process
>> aborts after decremented the counter and before setting to
>> SESSION_BACKUP_NONE. Am I missing something?
>
> The process would stop at the next CHECK_FOR_INTERRUPTS() and trigger
> the cleanup at this moment. So this happens when waiting for the
> archives to be done, and the session flag is set to NONE at this
> point.

And actually, with two non-exclusive backups taken in parallel, it is
still possible to fail on another assertion in do_pg_stop_backup()
even with your patch. Imagine the following:
1) Two backups are taken, counter for non-exclusive backups is set at 2.
2) One backup is stopped, then interrupted. This causes the counter to
be decremented twice, once in do_pg_stop_backup, and once when
aborting. Counter is at 0, switching as well forcePageWrites to
false..
3) The second backup stops, a new assertion failure is triggered.
Without assertions the counter would get at -1.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Craig Ringer
Дата:
On 21 September 2017 at 16:56, Michael Paquier <michael.paquier@gmail.com> wrote:
On Thu, Sep 21, 2017 at 4:40 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Sep 21, 2017 at 2:25 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, Sep 21, 2017 at 1:07 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> The bug can happen in PostgreSQL 9.1 or higher that non-exclusive
>>> backup has been introduced, so we should back-patch to the all
>>> supported versions.
>>
>> There is a typo here right? Non-exclusive backups have been introduced
>> in 9.6. Why would a back-patch further down be needed?
>
> I think the non-exclusive backups infrastructure has been introduced
> in 9.1 for pg_basebackup. I've not checked yet that it can be
> reproduced using pg_basebackup in PG9.1 but I thought it could happen
> as far as I checked the code.

Yep, but the deficiency is caused by the use before_shmem_exit() in
the SQL-level functions present in 9.6~ which make the cleanup happen
potentially twice. If you look at the code of basebackup.c, you would
notice that the cleanups of the counters only happen within the
PG_ENSURE_ERROR_CLEANUP() blocks, so a backpatch to 9.6 should be
enough.

>> +-  Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
>> +   if (XLogCtl->Insert.nonExclusiveBackups > 0)
>> +       XLogCtl->Insert.nonExclusiveBackups--;
>> Hm, no, I don't agree. I think that instead you should just leave
>> do_pg_abort_backup() immediately if sessionBackupState is set to
>> SESSION_BACKUP_NONE. This variable is the link between the global
>> counters and the session stopping the backup so I don't think that we
>> should touch this assertion of this counter. I think that this method
>> would be safe as well for backup start as pg_start_backup_callback
>> takes care of any cleanup. Also because the counters are incremented
>> before entering in the PG_ENSURE_ERROR_CLEANUP block, and
>> sessionBackupState is updated just after leaving the block.
>
> I think that the assertion failure still can happen if the process
> aborts after decremented the counter and before setting to
> SESSION_BACKUP_NONE. Am I missing something?

The process would stop at the next CHECK_FOR_INTERRUPTS() and trigger
the cleanup at this moment. So this happens when waiting for the
archives to be done, and the session flag is set to NONE at this
point.

Another one to watch out for is that elog(...) and ereport(...) invoke CHECK_FOR_INTERRUPTS. That's given me exciting surprises before when combined with assertion checking and various exit cleanup hooks.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Michael Paquier
Дата:
On Fri, Sep 22, 2017 at 10:41 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Another one to watch out for is that elog(...) and ereport(...) invoke
> CHECK_FOR_INTERRUPTS. That's given me exciting surprises before when
> combined with assertion checking and various exit cleanup hooks.

Ahah. Good point. In this case LWLockWaitForVar() is one thing to
worry about if LWLock tracing is enabled because it can log things
before holding the existing interrupts. This can be avoided easily in
the case of this issue by updating sessionBackupState before calling
WALInsertLockRelease and while holding the WAL insert lock. Surely
this meritates a comment in the patch we want here.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Masahiko Sawada
Дата:
On Thu, Sep 21, 2017 at 5:56 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
>
> Yep, but the deficiency is caused by the use before_shmem_exit() in
> the SQL-level functions present in 9.6~ which make the cleanup happen
> potentially twice. If you look at the code of basebackup.c, you would
> notice that the cleanups of the counters only happen within the
> PG_ENSURE_ERROR_CLEANUP() blocks, so a backpatch to 9.6 should be
> enough.

Thank you for explaining. I understood and agreed..

On Fri, Sep 22, 2017 at 8:02 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Sep 21, 2017 at 5:56 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, Sep 21, 2017 at 4:40 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Thu, Sep 21, 2017 at 2:25 PM, Michael Paquier
>>> <michael.paquier@gmail.com> wrote:
>>>> +-  Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
>>>> +   if (XLogCtl->Insert.nonExclusiveBackups > 0)
>>>> +       XLogCtl->Insert.nonExclusiveBackups--;
>>>> Hm, no, I don't agree. I think that instead you should just leave
>>>> do_pg_abort_backup() immediately if sessionBackupState is set to
>>>> SESSION_BACKUP_NONE. This variable is the link between the global
>>>> counters and the session stopping the backup so I don't think that we
>>>> should touch this assertion of this counter. I think that this method
>>>> would be safe as well for backup start as pg_start_backup_callback
>>>> takes care of any cleanup. Also because the counters are incremented
>>>> before entering in the PG_ENSURE_ERROR_CLEANUP block, and
>>>> sessionBackupState is updated just after leaving the block.
>>>
>>> I think that the assertion failure still can happen if the process
>>> aborts after decremented the counter and before setting to
>>> SESSION_BACKUP_NONE. Am I missing something?
>>
>> The process would stop at the next CHECK_FOR_INTERRUPTS() and trigger
>> the cleanup at this moment. So this happens when waiting for the
>> archives to be done, and the session flag is set to NONE at this
>> point.
>
> And actually, with two non-exclusive backups taken in parallel, it is
> still possible to fail on another assertion in do_pg_stop_backup()
> even with your patch. Imagine the following:
> 1) Two backups are taken, counter for non-exclusive backups is set at 2.
> 2) One backup is stopped, then interrupted. This causes the counter to
> be decremented twice, once in do_pg_stop_backup, and once when
> aborting. Counter is at 0, switching as well forcePageWrites to
> false..
> 3) The second backup stops, a new assertion failure is triggered.
> Without assertions the counter would get at -1.

You're right. I updated the patch so that it exits do_pg_abort_backup
if the state is NONE and setting the state to NONE in
do_pg_stop_backup before releasing the WAL insert lock.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Michael Paquier
Дата:
On Fri, Sep 22, 2017 at 11:34 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> You're right. I updated the patch so that it exits do_pg_abort_backup
> if the state is NONE and setting the state to NONE in
> do_pg_stop_backup before releasing the WAL insert lock.

-    /* Clean up session-level lock */
+    /*
+     * Clean up session-level lock. To avoid interrupting before changing
+     * the backup state by LWLockWaitForVar we change it while holding the
+     * WAL insert lock.
+     */    sessionBackupState = SESSION_BACKUP_NONE;
You could just mention directly CHECK_FOR_INTERRUPTS here.

+    /* Quick exit if we have done the backup */
+    if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE)
+        return;

The patch contents look fine to me. I have also tested things in
depths but could not find any problems. I also looked again at the
backup start code, trying to see any flows between the moment the
session backup lock is changed and the moment the callback to do
backup cleanup is registered but I have not found any issues. I am
marking this as ready for committer.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Masahiko Sawada
Дата:
On Fri, Sep 22, 2017 at 3:00 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Sep 22, 2017 at 11:34 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> You're right. I updated the patch so that it exits do_pg_abort_backup
>> if the state is NONE and setting the state to NONE in
>> do_pg_stop_backup before releasing the WAL insert lock.
>
> -    /* Clean up session-level lock */
> +    /*
> +     * Clean up session-level lock. To avoid interrupting before changing
> +     * the backup state by LWLockWaitForVar we change it while holding the
> +     * WAL insert lock.
> +     */
>      sessionBackupState = SESSION_BACKUP_NONE;
> You could just mention directly CHECK_FOR_INTERRUPTS here.

Thank you for the reviewing.
I have a question. Since WALInsertLockRelease seems not to call
LWLockWaitForVar I thought you wanted to mean LWLockReleaseClearVar
instead, is that right?

> +    /* Quick exit if we have done the backup */
> +    if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE)
> +        return;
>
> The patch contents look fine to me. I have also tested things in
> depths but could not find any problems. I also looked again at the
> backup start code, trying to see any flows between the moment the
> session backup lock is changed and the moment the callback to do
> backup cleanup is registered but I have not found any issues. I am
> marking this as ready for committer.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Michael Paquier
Дата:
On Fri, Sep 22, 2017 at 3:24 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I have a question. Since WALInsertLockRelease seems not to call
> LWLockWaitForVar I thought you wanted to mean LWLockReleaseClearVar
> instead, is that right?

This looks like a copy-pasto from my side. Thanks for spotting it.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Masahiko Sawada
Дата:
On Fri, Sep 22, 2017 at 3:46 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Sep 22, 2017 at 3:24 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> I have a question. Since WALInsertLockRelease seems not to call
>> LWLockWaitForVar I thought you wanted to mean LWLockReleaseClearVar
>> instead, is that right?
>
> This looks like a copy-pasto from my side. Thanks for spotting it.

Okay, attached the latest version patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Fujii Masao
Дата:
On Fri, Sep 22, 2017 at 4:42 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Sep 22, 2017 at 3:46 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Fri, Sep 22, 2017 at 3:24 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> I have a question. Since WALInsertLockRelease seems not to call
>>> LWLockWaitForVar I thought you wanted to mean LWLockReleaseClearVar
>>> instead, is that right?
>>
>> This looks like a copy-pasto from my side. Thanks for spotting it.
>
> Okay, attached the latest version patch.

+ /* Quick exit if we have done the backup */
+ if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE)
+ return;

This quick exit seems to cause another problem. Please imagine the
case where there is no exclusive backup running, a backend starts
non-exclusive backup and is terminated before calling pg_stop_backup().
In this case, do_pg_abort_backup() should decrement
XLogCtl->Insert.nonExclusiveBackups, but, with the patch, because of
the above quick exit, do_pg_abort_backup() fails to decrement that.

Regards,

-- 
Fujii Masao


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Masahiko Sawada
Дата:

Thank you for the reviewing!

On Nov 15, 2017 2:59 AM, "Fujii Masao" <masao.fujii@gmail.com> wrote:
On Fri, Sep 22, 2017 at 4:42 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Sep 22, 2017 at 3:46 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Fri, Sep 22, 2017 at 3:24 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> I have a question. Since WALInsertLockRelease seems not to call
>>> LWLockWaitForVar I thought you wanted to mean LWLockReleaseClearVar
>>> instead, is that right?
>>
>> This looks like a copy-pasto from my side. Thanks for spotting it.
>
> Okay, attached the latest version patch.

+ /* Quick exit if we have done the backup */
+ if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE)
+ return;

This quick exit seems to cause another problem. Please imagine the
case where there is no exclusive backup running, a backend starts
non-exclusive backup and is terminated before calling pg_stop_backup().
In this case, do_pg_abort_backup() should decrement
XLogCtl->Insert.nonExclusiveBackups, but, with the patch, because of
the above quick exit, do_pg_abort_backup() fails to decrement that.

Hmm, I think that in this case because  exclusiveBackupState is not EXCLUSIVE_BACKUP_NONE, nonExclusiveBackups is surely decremented. Am I missing something?

Regards,

--
Masahiko Sawada

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Michael Paquier
Дата:
On Wed, Nov 15, 2017 at 9:06 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Nov 15, 2017 2:59 AM, "Fujii Masao" <masao.fujii@gmail.com> wrote:
>> + /* Quick exit if we have done the backup */
>> + if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE)
>> + return;
>>
>> This quick exit seems to cause another problem. Please imagine the
>> case where there is no exclusive backup running, a backend starts
>> non-exclusive backup and is terminated before calling pg_stop_backup().
>> In this case, do_pg_abort_backup() should decrement
>> XLogCtl->Insert.nonExclusiveBackups, but, with the patch, because of
>> the above quick exit, do_pg_abort_backup() fails to decrement that.
>
> Hmm, I think that in this case because  exclusiveBackupState is not
> EXCLUSIVE_BACKUP_NONE, nonExclusiveBackups is surely decremented. Am I
> missing something?

Nah. Fujii-san is right here as exclusiveBackupState is never updated
for non-exclusive backups. You need an extra check on
sessionBackupState to make sure that even for non-exclusive backups
the counter is correctly decremented if a non-exclusive session lock
is hold. For an exclusive backup, the session lock can be either
SESSION_BACKUP_EXCLUSIVE if an exclusive backup is stopped on the same
session as the start phase, or SESSION_BACKUP_NONE if the exclusive
backup is stopped from a different session. So you'd basically need
that:
+   /*
+    * Quick exit if we have done the exclusive backup or if session is
+    * not keeping around a started non-exclusive backup.
+    */
+   if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&
+       sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE)
+       return;

At the same time it would be safer at startup phase to update
sessionBackupState when holding the WAL insert lock to prevent other
CHECK_FOR_INTERRUPTS hazards. Suggestion of patch attached.
-- 
Michael

Вложения

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Masahiko Sawada
Дата:
On Wed, Nov 15, 2017 at 10:05 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Nov 15, 2017 at 9:06 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Nov 15, 2017 2:59 AM, "Fujii Masao" <masao.fujii@gmail.com> wrote:
>>> + /* Quick exit if we have done the backup */
>>> + if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE)
>>> + return;
>>>
>>> This quick exit seems to cause another problem. Please imagine the
>>> case where there is no exclusive backup running, a backend starts
>>> non-exclusive backup and is terminated before calling pg_stop_backup().
>>> In this case, do_pg_abort_backup() should decrement
>>> XLogCtl->Insert.nonExclusiveBackups, but, with the patch, because of
>>> the above quick exit, do_pg_abort_backup() fails to decrement that.
>>
>> Hmm, I think that in this case because  exclusiveBackupState is not
>> EXCLUSIVE_BACKUP_NONE, nonExclusiveBackups is surely decremented. Am I
>> missing something?
>
> Nah. Fujii-san is right here as exclusiveBackupState is never updated
> for non-exclusive backups.

Thank you, I was wrong.

> You need an extra check on
> sessionBackupState to make sure that even for non-exclusive backups
> the counter is correctly decremented if a non-exclusive session lock
> is hold. For an exclusive backup, the session lock can be either
> SESSION_BACKUP_EXCLUSIVE if an exclusive backup is stopped on the same
> session as the start phase, or SESSION_BACKUP_NONE if the exclusive
> backup is stopped from a different session. So you'd basically need
> that:
> +   /*
> +    * Quick exit if we have done the exclusive backup or if session is
> +    * not keeping around a started non-exclusive backup.
> +    */
> +   if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&
> +       sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE)
> +       return;
>
> At the same time it would be safer at startup phase to update
> sessionBackupState when holding the WAL insert lock to prevent other
> CHECK_FOR_INTERRUPTS hazards. Suggestion of patch attached.

I think we need to check only sessionBackupState and don't need to
check XLogCtl->Insert.exclusiveBackupState in do_pg_abort_backup(). We
can quickly return if sessionBackupState !=
SESSION_BACKUP_NON_EXCLUSIVE. In your suggestion, I think we can still
get an assertion failure when pg_stop_backup(false) waiting for
archiving is terminated while concurrent an exclusive backup is in
progress.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Michael Paquier
Дата:
On Wed, Nov 15, 2017 at 12:12 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I think we need to check only sessionBackupState and don't need to
> check XLogCtl->Insert.exclusiveBackupState in do_pg_abort_backup(). We
> can quickly return if sessionBackupState !=
> SESSION_BACKUP_NON_EXCLUSIVE. In your suggestion, I think we can still
> get an assertion failure when pg_stop_backup(false) waiting for
> archiving is terminated while concurrent an exclusive backup is in
> progress.

I have just gone through the thread once again, and noticed that it is
actually what I suggested upthread:
https://www.postgresql.org/message-id/CAB7nPqTm5CDrR5Y7yyfKy+PVDZ6dWS_jKG1KStaN5m95gAMTFQ@mail.gmail.com
But your v2 posted here did not do that so it is incorrect from the start:
https://www.postgresql.org/message-id/CAD21AoA+isXYL1_ZXMnk9xJhYEL5h6rxJtTovLi7fumqfmCYgg@mail.gmail.com

We both got a bit confused here. As do_pg_abort_backup() is only used
for non-exclusive backups (including those taken through the
replication protocol), going through the session lock for checks is
fine. Could you update your patch accordingly please?
-- 
Michael


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Masahiko Sawada
Дата:
On Wed, Nov 15, 2017 at 2:38 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Nov 15, 2017 at 12:12 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> I think we need to check only sessionBackupState and don't need to
>> check XLogCtl->Insert.exclusiveBackupState in do_pg_abort_backup(). We
>> can quickly return if sessionBackupState !=
>> SESSION_BACKUP_NON_EXCLUSIVE. In your suggestion, I think we can still
>> get an assertion failure when pg_stop_backup(false) waiting for
>> archiving is terminated while concurrent an exclusive backup is in
>> progress.
>
> I have just gone through the thread once again, and noticed that it is
> actually what I suggested upthread:
> https://www.postgresql.org/message-id/CAB7nPqTm5CDrR5Y7yyfKy+PVDZ6dWS_jKG1KStaN5m95gAMTFQ@mail.gmail.com
> But your v2 posted here did not do that so it is incorrect from the start:
> https://www.postgresql.org/message-id/CAD21AoA+isXYL1_ZXMnk9xJhYEL5h6rxJtTovLi7fumqfmCYgg@mail.gmail.com

Sorry, it's my fault. I didn't mean it but I forgot.

> We both got a bit confused here. As do_pg_abort_backup() is only used
> for non-exclusive backups (including those taken through the
> replication protocol), going through the session lock for checks is
> fine. Could you update your patch accordingly please?

One question is, since we need to check only the session lock I think
that the following change is not necessary. Even if calling
CHECK_FOR_INTERRUPTS after set sessionBackupState =
SESSION_BACKUP_EXCLUSIVE; we never call do_pg_abort_backup(). Is that
right?

@@ -10636,8 +10636,14 @@ do_pg_start_backup(const char *backupidstr,
bool fast, TimeLineID *starttli_p,       {               WALInsertLockAcquireExclusive();
XLogCtl->Insert.exclusiveBackupState=
 
EXCLUSIVE_BACKUP_IN_PROGRESS;
-               WALInsertLockRelease();
+
+               /*
+                * Clean up session-level lock. To avoid calling
CHECK_FOR_INTERRUPTS
+                * by LWLockReleaseClearVar before changing the backup
state we change
+                * it while holding the WAL insert lock.
+                */               sessionBackupState = SESSION_BACKUP_EXCLUSIVE;
+               WALInsertLockRelease();       }       else               sessionBackupState =
SESSION_BACKUP_NON_EXCLUSIVE;

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Michael Paquier
Дата:
On Wed, Nov 15, 2017 at 5:21 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Wed, Nov 15, 2017 at 2:38 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Wed, Nov 15, 2017 at 12:12 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> I think we need to check only sessionBackupState and don't need to
>>> check XLogCtl->Insert.exclusiveBackupState in do_pg_abort_backup(). We
>>> can quickly return if sessionBackupState !=
>>> SESSION_BACKUP_NON_EXCLUSIVE. In your suggestion, I think we can still
>>> get an assertion failure when pg_stop_backup(false) waiting for
>>> archiving is terminated while concurrent an exclusive backup is in
>>> progress.
>>
>> I have just gone through the thread once again, and noticed that it is
>> actually what I suggested upthread:
>> https://www.postgresql.org/message-id/CAB7nPqTm5CDrR5Y7yyfKy+PVDZ6dWS_jKG1KStaN5m95gAMTFQ@mail.gmail.com
>> But your v2 posted here did not do that so it is incorrect from the start:
>> https://www.postgresql.org/message-id/CAD21AoA+isXYL1_ZXMnk9xJhYEL5h6rxJtTovLi7fumqfmCYgg@mail.gmail.com
>
> Sorry, it's my fault. I didn't mean it but I forgot.

My review was wrong as well :)

>> We both got a bit confused here. As do_pg_abort_backup() is only used
>> for non-exclusive backups (including those taken through the
>> replication protocol), going through the session lock for checks is
>> fine. Could you update your patch accordingly please?
>
> One question is, since we need to check only the session lock I think
> that the following change is not necessary. Even if calling
> CHECK_FOR_INTERRUPTS after set sessionBackupState =
> SESSION_BACKUP_EXCLUSIVE; we never call do_pg_abort_backup(). Is that
> right?

Yeah, this one is not strictly necessary for this bug, but it seems to
me that it would be a good idea for robustness wiht interrupts to be
consistent with the stop phase when updating the session lock.
-- 
Michael


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Masahiko Sawada
Дата:
On Wed, Nov 15, 2017 at 5:39 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Nov 15, 2017 at 5:21 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Wed, Nov 15, 2017 at 2:38 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Wed, Nov 15, 2017 at 12:12 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> I think we need to check only sessionBackupState and don't need to
>>>> check XLogCtl->Insert.exclusiveBackupState in do_pg_abort_backup(). We
>>>> can quickly return if sessionBackupState !=
>>>> SESSION_BACKUP_NON_EXCLUSIVE. In your suggestion, I think we can still
>>>> get an assertion failure when pg_stop_backup(false) waiting for
>>>> archiving is terminated while concurrent an exclusive backup is in
>>>> progress.
>>>
>>> I have just gone through the thread once again, and noticed that it is
>>> actually what I suggested upthread:
>>> https://www.postgresql.org/message-id/CAB7nPqTm5CDrR5Y7yyfKy+PVDZ6dWS_jKG1KStaN5m95gAMTFQ@mail.gmail.com
>>> But your v2 posted here did not do that so it is incorrect from the start:
>>> https://www.postgresql.org/message-id/CAD21AoA+isXYL1_ZXMnk9xJhYEL5h6rxJtTovLi7fumqfmCYgg@mail.gmail.com
>>
>> Sorry, it's my fault. I didn't mean it but I forgot.
>
> My review was wrong as well :)
>
>>> We both got a bit confused here. As do_pg_abort_backup() is only used
>>> for non-exclusive backups (including those taken through the
>>> replication protocol), going through the session lock for checks is
>>> fine. Could you update your patch accordingly please?
>>
>> One question is, since we need to check only the session lock I think
>> that the following change is not necessary. Even if calling
>> CHECK_FOR_INTERRUPTS after set sessionBackupState =
>> SESSION_BACKUP_EXCLUSIVE; we never call do_pg_abort_backup(). Is that
>> right?
>
> Yeah, this one is not strictly necessary for this bug, but it seems to
> me that it would be a good idea for robustness wiht interrupts to be
> consistent with the stop phase when updating the session lock.

Agreed. Attached the updated patch, please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Вложения

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Michael Paquier
Дата:
On Thu, Nov 16, 2017 at 10:57 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Agreed. Attached the updated patch, please review it.

+   /*
+    * Quick exit if session is not keeping around a non-exclusive backup
+    * already started.
+    */
+   if (sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE)
+       return;
I think that it would be more solid to use SESSION_BACKUP_NONE for the
comparison, and complete the assertion after the quick exit as follows
as this code path should never be taken for an exclusive backup:
+   Assert(XLogCtl->Insert.nonExclusiveBackups > 0 &&
+          sessionBackupState == SESSION_BACKUP_NON_EXCLUSIVE);

And your patch would discard both SESSION_BACKUP_EXCLUSIVE and
SESSION_BACKUP_NONE.
-- 
Michael


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Masahiko Sawada
Дата:
On Thu, Nov 16, 2017 at 1:11 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Nov 16, 2017 at 10:57 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Agreed. Attached the updated patch, please review it.

Thank you for the comment.

> +   /*
> +    * Quick exit if session is not keeping around a non-exclusive backup
> +    * already started.
> +    */
> +   if (sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE)
> +       return;
> I think that it would be more solid to use SESSION_BACKUP_NONE for the
> comparison, and complete the assertion after the quick exit as follows
> as this code path should never be taken for an exclusive backup:

Agreed.

> +   Assert(XLogCtl->Insert.nonExclusiveBackups > 0 &&
> +          sessionBackupState == SESSION_BACKUP_NON_EXCLUSIVE);
>
> And your patch would discard both SESSION_BACKUP_EXCLUSIVE and
> SESSION_BACKUP_NONE.

Attached the latest patch. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Вложения

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Michael Paquier
Дата:
On Thu, Nov 16, 2017 at 8:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Nov 16, 2017 at 1:11 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> +   /*
>> +    * Quick exit if session is not keeping around a non-exclusive backup
>> +    * already started.
>> +    */
>> +   if (sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE)
>> +       return;
>> I think that it would be more solid to use SESSION_BACKUP_NONE for the
>> comparison, and complete the assertion after the quick exit as follows
>> as this code path should never be taken for an exclusive backup:
>
> Agreed.

I have spent some time doing an extra lookup with tests involving one
and two sessions doing backups checking for failure code paths while
waiting for archives:
- One session with non-exclusive backup.
- One session with exclusive backup.
- One session is exclusive and the second is non-exclusive
- Both are exclusive.
Also double-checked on the way the logic around the cleanup callback
and sessionBackupState during startup, and those look clean to me. One
thing that was bothering me is that the callback
nonexclusive_base_backup_cleanup is called after do_pg_start_backup()
finishes. But between the moment sessionBackupState is set and the
callback is registered there is no CHECK_FOR_INTERRUPTS or even elog()
calls so even if the process starting a non-exclusive backup is tried
to be terminated between the moment the session lock is set and the
callback is registered things are handled correctly.

So I think that we are basically safe for backups running with the SQL
interface.

However, things are not completely clean for base backups taken using
the replication protocol. While monitoring more the code, I have
noticed that perform_base_backup() calls do_pg_stop_backup() *without*
taking any cleanup action. So if a base backup is interrupted, say
with SIGTERM, while do_pg_stop_backup() is called and before the
session lock is updated then it is possible to finish with
inconsistent in-memory counters. Oops.

No need to play with breakpoints and signals in this case, using
something like that is enough to create inconsistent counters.
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -327,6 +327,8 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)   }
PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup,(Datum) 0);
 

+   elog(ERROR, "base backups don't decrement counters here, stupid!");
+   endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);

A simple fix for this one is to call do_pg_stop_backup() before
PG_END_ENSURE_ERROR_CLEANUP(). By doing so, we also make the callback
cleanup logic consistent with what is done for the SQL equivalent
where the callback is removed after finishing going through
do_pg_stop_backup(). A comment would be adapted here, say something
like "finish the backup while still holding the cleanup callback to
avoid inconsistent in-memory data should the this call fail before
sessionBackupState is updated."

For the start phase, the current logic is fine, because in the case of
the SQL interface the cleanup callback is registered after finishing
do_pg_start_backup().

What do you think?
-- 
Michael


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Masahiko Sawada
Дата:
On Fri, Nov 17, 2017 at 11:00 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Nov 16, 2017 at 8:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Thu, Nov 16, 2017 at 1:11 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> +   /*
>>> +    * Quick exit if session is not keeping around a non-exclusive backup
>>> +    * already started.
>>> +    */
>>> +   if (sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE)
>>> +       return;
>>> I think that it would be more solid to use SESSION_BACKUP_NONE for the
>>> comparison, and complete the assertion after the quick exit as follows
>>> as this code path should never be taken for an exclusive backup:
>>
>> Agreed.
>
> I have spent some time doing an extra lookup with tests involving one
> and two sessions doing backups checking for failure code paths while
> waiting for archives:
> - One session with non-exclusive backup.
> - One session with exclusive backup.
> - One session is exclusive and the second is non-exclusive
> - Both are exclusive.
> Also double-checked on the way the logic around the cleanup callback
> and sessionBackupState during startup, and those look clean to me. One
> thing that was bothering me is that the callback
> nonexclusive_base_backup_cleanup is called after do_pg_start_backup()
> finishes. But between the moment sessionBackupState is set and the
> callback is registered there is no CHECK_FOR_INTERRUPTS or even elog()
> calls so even if the process starting a non-exclusive backup is tried
> to be terminated between the moment the session lock is set and the
> callback is registered things are handled correctly.
>
> So I think that we are basically safe for backups running with the SQL
> interface.

Thank you for double checking!

> However, things are not completely clean for base backups taken using
> the replication protocol. While monitoring more the code, I have
> noticed that perform_base_backup() calls do_pg_stop_backup() *without*
> taking any cleanup action. So if a base backup is interrupted, say
> with SIGTERM, while do_pg_stop_backup() is called and before the
> session lock is updated then it is possible to finish with
> inconsistent in-memory counters. Oops.

Good catch! I'd missed this case.

> No need to play with breakpoints and signals in this case, using
> something like that is enough to create inconsistent counters.
> --- a/src/backend/replication/basebackup.c
> +++ b/src/backend/replication/basebackup.c
> @@ -327,6 +327,8 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
>     }
>     PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
>
> +   elog(ERROR, "base backups don't decrement counters here, stupid!");
> +
>     endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
>
> A simple fix for this one is to call do_pg_stop_backup() before
> PG_END_ENSURE_ERROR_CLEANUP(). By doing so, we also make the callback
> cleanup logic consistent with what is done for the SQL equivalent
> where the callback is removed after finishing going through
> do_pg_stop_backup(). A comment would be adapted here, say something
> like "finish the backup while still holding the cleanup callback to
> avoid inconsistent in-memory data should the this call fail before
> sessionBackupState is updated."
>
> For the start phase, the current logic is fine, because in the case of
> the SQL interface the cleanup callback is registered after finishing
> do_pg_start_backup().
>
> What do you think?

I agree with your approach. It makes sense to me.

Attached updated patch. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Вложения

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Fujii Masao
Дата:
On Mon, Nov 20, 2017 at 4:12 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Nov 17, 2017 at 11:00 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, Nov 16, 2017 at 8:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Thu, Nov 16, 2017 at 1:11 PM, Michael Paquier
>>> <michael.paquier@gmail.com> wrote:
>>>> +   /*
>>>> +    * Quick exit if session is not keeping around a non-exclusive backup
>>>> +    * already started.
>>>> +    */
>>>> +   if (sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE)
>>>> +       return;
>>>> I think that it would be more solid to use SESSION_BACKUP_NONE for the
>>>> comparison, and complete the assertion after the quick exit as follows
>>>> as this code path should never be taken for an exclusive backup:
>>>
>>> Agreed.
>>
>> I have spent some time doing an extra lookup with tests involving one
>> and two sessions doing backups checking for failure code paths while
>> waiting for archives:
>> - One session with non-exclusive backup.
>> - One session with exclusive backup.
>> - One session is exclusive and the second is non-exclusive
>> - Both are exclusive.
>> Also double-checked on the way the logic around the cleanup callback
>> and sessionBackupState during startup, and those look clean to me. One
>> thing that was bothering me is that the callback
>> nonexclusive_base_backup_cleanup is called after do_pg_start_backup()
>> finishes. But between the moment sessionBackupState is set and the
>> callback is registered there is no CHECK_FOR_INTERRUPTS or even elog()
>> calls so even if the process starting a non-exclusive backup is tried
>> to be terminated between the moment the session lock is set and the
>> callback is registered things are handled correctly.
>>
>> So I think that we are basically safe for backups running with the SQL
>> interface.
>
> Thank you for double checking!
>
>> However, things are not completely clean for base backups taken using
>> the replication protocol. While monitoring more the code, I have
>> noticed that perform_base_backup() calls do_pg_stop_backup() *without*
>> taking any cleanup action. So if a base backup is interrupted, say
>> with SIGTERM, while do_pg_stop_backup() is called and before the
>> session lock is updated then it is possible to finish with
>> inconsistent in-memory counters. Oops.
>
> Good catch! I'd missed this case.
>
>> No need to play with breakpoints and signals in this case, using
>> something like that is enough to create inconsistent counters.
>> --- a/src/backend/replication/basebackup.c
>> +++ b/src/backend/replication/basebackup.c
>> @@ -327,6 +327,8 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
>>     }
>>     PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
>>
>> +   elog(ERROR, "base backups don't decrement counters here, stupid!");
>> +
>>     endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
>>
>> A simple fix for this one is to call do_pg_stop_backup() before
>> PG_END_ENSURE_ERROR_CLEANUP(). By doing so, we also make the callback
>> cleanup logic consistent with what is done for the SQL equivalent
>> where the callback is removed after finishing going through
>> do_pg_stop_backup(). A comment would be adapted here, say something
>> like "finish the backup while still holding the cleanup callback to
>> avoid inconsistent in-memory data should the this call fail before
>> sessionBackupState is updated."
>>
>> For the start phase, the current logic is fine, because in the case of
>> the SQL interface the cleanup callback is registered after finishing
>> do_pg_start_backup().
>>
>> What do you think?
>
> I agree with your approach. It makes sense to me.
>
> Attached updated patch. Please review it.

Thanks for updating the patch! The patch basically looks good to me.

+ /*
+ * Clean up session-level lock. To avoid calling CHECK_FOR_INTERRUPTS by
+ * LWLockReleaseClearVar before changing the backup state we change it
+ * while holding the WAL insert lock.
+ */

I think that you should comment *why* we need to avoid calling
CHECK_FOR_INTERRUPTS before changing the backup state, here.

Regards,

-- 
Fujii Masao


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Michael Paquier
Дата:
On Tue, Nov 21, 2017 at 3:11 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Nov 20, 2017 at 4:12 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> I agree with your approach. It makes sense to me.
>>
>> Attached updated patch. Please review it.
>
> Thanks for updating the patch! The patch basically looks good to me.

I am not seeing problems either. The start and stop logic of base
backups is what I would expect they should.

> + /*
> + * Clean up session-level lock. To avoid calling CHECK_FOR_INTERRUPTS by
> + * LWLockReleaseClearVar before changing the backup state we change it
> + * while holding the WAL insert lock.
> + */
>
> I think that you should comment *why* we need to avoid calling
> CHECK_FOR_INTERRUPTS before changing the backup state, here.

You could just add "as this allows to keep backup counters kept in
shared memory consistent with the state of the session starting or
stopping a backup.".
-- 
Michael


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Masahiko Sawada
Дата:
On Tue, Nov 21, 2017 at 8:03 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Nov 21, 2017 at 3:11 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Mon, Nov 20, 2017 at 4:12 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> I agree with your approach. It makes sense to me.
>>>
>>> Attached updated patch. Please review it.
>>
>> Thanks for updating the patch! The patch basically looks good to me.
>
> I am not seeing problems either. The start and stop logic of base
> backups is what I would expect they should.

Thank you for reviewing.

>> + /*
>> + * Clean up session-level lock. To avoid calling CHECK_FOR_INTERRUPTS by
>> + * LWLockReleaseClearVar before changing the backup state we change it
>> + * while holding the WAL insert lock.
>> + */
>>
>> I think that you should comment *why* we need to avoid calling
>> CHECK_FOR_INTERRUPTS before changing the backup state, here.
>
> You could just add "as this allows to keep backup counters kept in
> shared memory consistent with the state of the session starting or
> stopping a backup.".

Thank you for the suggestion, Michael-san. Attached updated patch.
Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Вложения

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Michael Paquier
Дата:
On Tue, Nov 21, 2017 at 9:37 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Tue, Nov 21, 2017 at 8:03 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> You could just add "as this allows to keep backup counters kept in
>> shared memory consistent with the state of the session starting or
>> stopping a backup.".
>
> Thank you for the suggestion, Michael-san. Attached updated patch.
> Please review it.

[nit]
+     * or stoppping a backup.
s/stoppping/stopping/
Fujii-san, please note that the same concept does not apply to
do_pg_start_backup().
     * reason, *all* functionality between do_pg_start_backup() and
-     * do_pg_stop_backup() should be inside the error cleanup block!
+     * do_pg_stop_backup(), including do_pg_stop_backup() should be inside
+     * the error cleanup block!     */
Weirdly worded here. "between do_pg_start_backup until
do_pg_stop_backup is done" sounds better?

[/nit]
-- 
Michael


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Masahiko Sawada
Дата:
On Tue, Nov 21, 2017 at 10:01 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Nov 21, 2017 at 9:37 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Tue, Nov 21, 2017 at 8:03 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> You could just add "as this allows to keep backup counters kept in
>>> shared memory consistent with the state of the session starting or
>>> stopping a backup.".
>>
>> Thank you for the suggestion, Michael-san. Attached updated patch.
>> Please review it.
>
> [nit]
> +     * or stoppping a backup.
> s/stoppping/stopping/

Oops.

> Fujii-san, please note that the same concept does not apply to
> do_pg_start_backup().
>
>       * reason, *all* functionality between do_pg_start_backup() and
> -     * do_pg_stop_backup() should be inside the error cleanup block!
> +     * do_pg_stop_backup(), including do_pg_stop_backup() should be inside
> +     * the error cleanup block!
>       */
> Weirdly worded here. "between do_pg_start_backup until
> do_pg_stop_backup is done" sounds better?

Agreed.

Thank you for comments. Attached updated patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Вложения

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Robert Haas
Дата:
On Tue, Nov 21, 2017 at 4:32 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Thank you for comments. Attached updated patch.

I see that Michael has marked this Ready for Committer, but also that
he didn't update the thread, so perhaps some interested committer
(Fujii Masao?) might have missed the fact that Michael thinks it's
ready to go.  Anyone interested in taking a look at this one for
commit?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Michael Paquier
Дата:
On Wed, Nov 29, 2017 at 6:44 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Nov 21, 2017 at 4:32 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Thank you for comments. Attached updated patch.
>
> I see that Michael has marked this Ready for Committer, but also that
> he didn't update the thread, so perhaps some interested committer
> (Fujii Masao?) might have missed the fact that Michael thinks it's
> ready to go.

Sorry for not mentioning that directly on the thread.

> Anyone interested in taking a look at this one for commit?

I would assume that Fujii-san would be the chosen one here as he has
worked already on the thread.
-- 
Michael


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Masahiko Sawada
Дата:
On Fri, Dec 1, 2017 at 11:51 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Nov 29, 2017 at 7:36 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Wed, Nov 29, 2017 at 6:44 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Tue, Nov 21, 2017 at 4:32 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> Thank you for comments. Attached updated patch.
>>>
>>> I see that Michael has marked this Ready for Committer, but also that
>>> he didn't update the thread, so perhaps some interested committer
>>> (Fujii Masao?) might have missed the fact that Michael thinks it's
>>> ready to go.
>>
>> Sorry for not mentioning that directly on the thread.
>>
>>> Anyone interested in taking a look at this one for commit?
>>
>> I would assume that Fujii-san would be the chosen one here as he has
>> worked already on the thread.
>
> No replies yet. So I am moving this patch to next CF.

After off-discussion with Fujii-san, I've updated the comment of why
we should disallow interrupts before setting/cleanup the session-level
lock. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Вложения

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Michael Paquier
Дата:
On Fri, Dec 8, 2017 at 5:38 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> After off-discussion with Fujii-san, I've updated the comment of why
> we should disallow interrupts before setting/cleanup the session-level
> lock. Please review it.

+       /*
+        * Set session-level lock. If we allow interrupts before setting
+        * session-level lock, we could call callbacks with an inconsistent
+        * state. To avoid calling CHECK_FOR_INTERRUPTS by LWLockReleaseClearVar
+        * which is called by WALInsertLockRelease before changing the backup
+        * state we change it while holding the WAL insert lock.
+        */
So you are just adding the reference to WALInsertLockRelease.. Instead
of writing the function names for LWLocks, I would just write "To
avoid calling CHECK_FOR_INTERRUPTS which can happen when releasing a
LWLock" and be done with it. There is no point to list a full function
dependency list, which could change in the future with static routines
of lwlock.c.
-- 
Michael


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Robert Haas
Дата:
On Fri, Dec 8, 2017 at 4:13 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Dec 8, 2017 at 5:38 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> After off-discussion with Fujii-san, I've updated the comment of why
>> we should disallow interrupts before setting/cleanup the session-level
>> lock. Please review it.
>
> +       /*
> +        * Set session-level lock. If we allow interrupts before setting
> +        * session-level lock, we could call callbacks with an inconsistent
> +        * state. To avoid calling CHECK_FOR_INTERRUPTS by LWLockReleaseClearVar
> +        * which is called by WALInsertLockRelease before changing the backup
> +        * state we change it while holding the WAL insert lock.
> +        */
> So you are just adding the reference to WALInsertLockRelease.. Instead
> of writing the function names for LWLocks, I would just write "To
> avoid calling CHECK_FOR_INTERRUPTS which can happen when releasing a
> LWLock" and be done with it. There is no point to list a full function
> dependency list, which could change in the future with static routines
> of lwlock.c.

I think it's actually good to be explicit here.  I looked at this
patch a bit last week and had great difficulty understanding how the
CHECK_FOR_INTERRUPTS() could happen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Masahiko Sawada
Дата:
On Sat, Dec 9, 2017 at 2:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Dec 8, 2017 at 4:13 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Fri, Dec 8, 2017 at 5:38 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> After off-discussion with Fujii-san, I've updated the comment of why
>>> we should disallow interrupts before setting/cleanup the session-level
>>> lock. Please review it.
>>
>> +       /*
>> +        * Set session-level lock. If we allow interrupts before setting
>> +        * session-level lock, we could call callbacks with an inconsistent
>> +        * state. To avoid calling CHECK_FOR_INTERRUPTS by LWLockReleaseClearVar
>> +        * which is called by WALInsertLockRelease before changing the backup
>> +        * state we change it while holding the WAL insert lock.
>> +        */
>> So you are just adding the reference to WALInsertLockRelease.. Instead
>> of writing the function names for LWLocks,

I also added a sentence "If we allow interrupts before cleanup
session-level lock, we could call do_pg_abort_backup with an
inconsistent state" at two places: setting and cleanup session-level
lock.

>> I would just write "To
>> avoid calling CHECK_FOR_INTERRUPTS which can happen when releasing a
>> LWLock" and be done with it. There is no point to list a full function
>> dependency list, which could change in the future with static routines
>> of lwlock.c.

Agreed. Updated the comment.

>
> I think it's actually good to be explicit here.  I looked at this
> patch a bit last week and had great difficulty understanding how the
> CHECK_FOR_INTERRUPTS() could happen.
>

Attached the updated version patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Вложения

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Michael Paquier
Дата:
On Mon, Dec 11, 2017 at 2:03 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Sat, Dec 9, 2017 at 2:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Dec 8, 2017 at 4:13 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> I would just write "To
>>> avoid calling CHECK_FOR_INTERRUPTS which can happen when releasing a
>>> LWLock" and be done with it. There is no point to list a full function
>>> dependency list, which could change in the future with static routines
>>> of lwlock.c.
>
> Agreed. Updated the comment.

Robert actually liked adding the complete routine list. Let's see what
Fujii-san thinks at the end, there is still some time until the next
round of minor releases. Robert, if Fujii-san does not show  up in
time, would you look at this patch? I won't fight if you rework the
comments the way you think is better :)
-- 
Michael


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Fujii Masao
Дата:
On Mon, Dec 11, 2017 at 2:16 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Dec 11, 2017 at 2:03 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Sat, Dec 9, 2017 at 2:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Fri, Dec 8, 2017 at 4:13 AM, Michael Paquier
>>> <michael.paquier@gmail.com> wrote:
>>>> I would just write "To
>>>> avoid calling CHECK_FOR_INTERRUPTS which can happen when releasing a
>>>> LWLock" and be done with it. There is no point to list a full function
>>>> dependency list, which could change in the future with static routines
>>>> of lwlock.c.
>>
>> Agreed. Updated the comment.
>
> Robert actually liked adding the complete routine list. Let's see what
> Fujii-san thinks at the end, there is still some time until the next
> round of minor releases.

What I think is the patch I attached. Thought?

Regards,

-- 
Fujii Masao

Вложения

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Michael Paquier
Дата:
On Fri, Dec 15, 2017 at 3:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Dec 11, 2017 at 2:16 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Mon, Dec 11, 2017 at 2:03 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Sat, Dec 9, 2017 at 2:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> On Fri, Dec 8, 2017 at 4:13 AM, Michael Paquier
>>>> <michael.paquier@gmail.com> wrote:
>>>>> I would just write "To
>>>>> avoid calling CHECK_FOR_INTERRUPTS which can happen when releasing a
>>>>> LWLock" and be done with it. There is no point to list a full function
>>>>> dependency list, which could change in the future with static routines
>>>>> of lwlock.c.
>>>
>>> Agreed. Updated the comment.
>>
>> Robert actually liked adding the complete routine list. Let's see what
>> Fujii-san thinks at the end, there is still some time until the next
>> round of minor releases.
>
> What I think is the patch I attached. Thought?

That's OK for me. Thanks.
-- 
Michael


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Masahiko Sawada
Дата:
On Fri, Dec 15, 2017 at 6:46 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Dec 15, 2017 at 3:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Mon, Dec 11, 2017 at 2:16 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Mon, Dec 11, 2017 at 2:03 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> On Sat, Dec 9, 2017 at 2:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>>> On Fri, Dec 8, 2017 at 4:13 AM, Michael Paquier
>>>>> <michael.paquier@gmail.com> wrote:
>>>>>> I would just write "To
>>>>>> avoid calling CHECK_FOR_INTERRUPTS which can happen when releasing a
>>>>>> LWLock" and be done with it. There is no point to list a full function
>>>>>> dependency list, which could change in the future with static routines
>>>>>> of lwlock.c.
>>>>
>>>> Agreed. Updated the comment.
>>>
>>> Robert actually liked adding the complete routine list. Let's see what
>>> Fujii-san thinks at the end, there is still some time until the next
>>> round of minor releases.
>>
>> What I think is the patch I attached. Thought?
>
> That's OK for me. Thanks.

+1 from me.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Robert Haas
Дата:
On Thu, Dec 14, 2017 at 7:51 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> +1 from me.

Works for me, too, although I still don't really follow how it's
happening in the present coding.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Michael Paquier
Дата:
On Mon, Dec 18, 2017 at 12:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Dec 14, 2017 at 7:51 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> +1 from me.
>
> Works for me, too, although I still don't really follow how it's
> happening in the present coding.

Craig has mentioned at least one way upthread:
https://www.postgresql.org/message-id/CAMsr+YHj4asAu8dJ1Q5hiW+wxL2Jd1=uxO52dvXhMxive-xd0g@mail.gmail.com
And that's possible when building with LOCK_DEBUG with trace_lwlocks
enabled at least. So I would rather not rely on assuming that
CHECK_FOR_INTERRUPTS() as a code path never taken for the cases
discussed here.
-- 
Michael


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Fujii Masao
Дата:
On Fri, Dec 15, 2017 at 9:51 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Dec 15, 2017 at 6:46 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Fri, Dec 15, 2017 at 3:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> On Mon, Dec 11, 2017 at 2:16 PM, Michael Paquier
>>> <michael.paquier@gmail.com> wrote:
>>>> On Mon, Dec 11, 2017 at 2:03 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>>> On Sat, Dec 9, 2017 at 2:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>>>> On Fri, Dec 8, 2017 at 4:13 AM, Michael Paquier
>>>>>> <michael.paquier@gmail.com> wrote:
>>>>>>> I would just write "To
>>>>>>> avoid calling CHECK_FOR_INTERRUPTS which can happen when releasing a
>>>>>>> LWLock" and be done with it. There is no point to list a full function
>>>>>>> dependency list, which could change in the future with static routines
>>>>>>> of lwlock.c.
>>>>>
>>>>> Agreed. Updated the comment.
>>>>
>>>> Robert actually liked adding the complete routine list. Let's see what
>>>> Fujii-san thinks at the end, there is still some time until the next
>>>> round of minor releases.
>>>
>>> What I think is the patch I attached. Thought?
>>
>> That's OK for me. Thanks.
>
> +1 from me.

Committed. Thanks!

Regards,

-- 
Fujii Masao


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Masahiko Sawada
Дата:
On Tue, Dec 19, 2017 at 3:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Dec 15, 2017 at 9:51 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Fri, Dec 15, 2017 at 6:46 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Fri, Dec 15, 2017 at 3:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>> On Mon, Dec 11, 2017 at 2:16 PM, Michael Paquier
>>>> <michael.paquier@gmail.com> wrote:
>>>>> On Mon, Dec 11, 2017 at 2:03 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>>>> On Sat, Dec 9, 2017 at 2:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>>>>> On Fri, Dec 8, 2017 at 4:13 AM, Michael Paquier
>>>>>>> <michael.paquier@gmail.com> wrote:
>>>>>>>> I would just write "To
>>>>>>>> avoid calling CHECK_FOR_INTERRUPTS which can happen when releasing a
>>>>>>>> LWLock" and be done with it. There is no point to list a full function
>>>>>>>> dependency list, which could change in the future with static routines
>>>>>>>> of lwlock.c.
>>>>>>
>>>>>> Agreed. Updated the comment.
>>>>>
>>>>> Robert actually liked adding the complete routine list. Let's see what
>>>>> Fujii-san thinks at the end, there is still some time until the next
>>>>> round of minor releases.
>>>>
>>>> What I think is the patch I attached. Thought?
>>>
>>> That's OK for me. Thanks.
>>
>> +1 from me.
>
> Committed. Thanks!
>

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

От
Michael Paquier
Дата:
On Tue, Dec 19, 2017 at 5:11 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Tue, Dec 19, 2017 at 3:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> Committed. Thanks!
>
> Thank you!

Thanks all. I can see that I have been credited as author as well,
though it seems to me that I played mainly a reviewer role.
-- 
Michael