Обсуждение: Wait event that should be reported while waiting for WAL archiving tofinish

Поиск
Список
Период
Сортировка

Wait event that should be reported while waiting for WAL archiving tofinish

От
Fujii Masao
Дата:
Hi,

When I saw pg_stat_activity.wait_event while pg_basebackup -X none
is waiting for WAL archiving to finish, it was either NULL or
CheckpointDone. I think this is confusing. What about introducing
new wait_event like WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE
(BackupWaitWalArchive) and reporting it during that period?

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: Wait event that should be reported while waiting for WALarchiving to finish

От
Michael Paquier
Дата:
On Thu, Feb 13, 2020 at 02:29:20AM +0900, Fujii Masao wrote:
> When I saw pg_stat_activity.wait_event while pg_basebackup -X none
> is waiting for WAL archiving to finish, it was either NULL or
> CheckpointDone. I think this is confusing. What about introducing
> new wait_event like WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE
> (BackupWaitWalArchive) and reporting it during that period?

Sounds like a good idea to me.  You need to be careful that this does
not overwrite more low-level wait event registration though, so that
could be more tricky than it looks at first sight.
--
Michael

Вложения

Re: Wait event that should be reported while waiting for WALarchiving to finish

От
Fujii Masao
Дата:

On 2020/02/13 12:28, Michael Paquier wrote:
> On Thu, Feb 13, 2020 at 02:29:20AM +0900, Fujii Masao wrote:
>> When I saw pg_stat_activity.wait_event while pg_basebackup -X none
>> is waiting for WAL archiving to finish, it was either NULL or
>> CheckpointDone. I think this is confusing. What about introducing
>> new wait_event like WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE
>> (BackupWaitWalArchive) and reporting it during that period?
> 
> Sounds like a good idea to me.  You need to be careful that this does
> not overwrite more low-level wait event registration though, so that
> could be more tricky than it looks at first sight.

Thanks for the advise! Patch attached.

I found that the wait events "LogicalRewriteTruncate" and
"GSSOpenServer" are not documented. I'm thinking to add
them into doc separately if ok.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

Вложения

Re: Wait event that should be reported while waiting for WALarchiving to finish

От
Michael Paquier
Дата:
On Thu, Feb 13, 2020 at 03:35:50PM +0900, Fujii Masao wrote:
> I found that the wait events "LogicalRewriteTruncate" and
> "GSSOpenServer" are not documented. I'm thinking to add
> them into doc separately if ok.

Nice catch.  The ordering of the entries is not respected either for
GSSOpenServer in pgstat.h.  The portion for the code and the docs can
be fixed in back-branches, but not the enum list in WaitEventClient or
we would have an ABI breakage.  But this can be fixed on HEAD.  Can
you take care of it?  If you need help, please feel free to poke me. I
think that this should be fixed first, before adding the new event.

>           <entry><literal>SyncRep</literal></entry>
>           <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
>          </row>
> +        <row>
> +         <entry><literal>BackupWaitWalArchive</literal></entry>
> +         <entry>Waiting for WAL files required for the backup to be successfully archived.</entry>
> +        </row>

The category IPC is adapted.  You forgot to update the markup morerows
from "36" to "37", causing the table of the wait events to have a
weird format (the bottom should be incorrect).

> +        pgstat_report_wait_start(WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE);
>          while (XLogArchiveIsBusy(lastxlogfilename) ||
>                 XLogArchiveIsBusy(histfilename))
>          {
> @@ -11120,6 +11121,7 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
>                                   "but the database backup will not be usable without all the WAL segments.")));
>              }
>          }
> +        pgstat_report_wait_end();

Okay, that position is right.

> @@ -3848,6 +3848,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
>          case WAIT_EVENT_SYNC_REP:
>              event_name = "SyncRep";
>              break;
> +        case WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE:
> +            event_name = "BackupWaitWalArchive";
> +            break;
>              /* no default case, so that compiler will warn */
> [...]
> @@ -853,7 +853,8 @@ typedef enum
>      WAIT_EVENT_REPLICATION_ORIGIN_DROP,
>      WAIT_EVENT_REPLICATION_SLOT_DROP,
>      WAIT_EVENT_SAFE_SNAPSHOT,
> -    WAIT_EVENT_SYNC_REP
> +    WAIT_EVENT_SYNC_REP,
> +    WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE
>  } WaitEventIPC;

It would be good to keep entries in alphabetical order in the header,
the code and in the docs (see the effort from 5ef037c), and your patch
is missing that concept for all three places where it matters for this
new event.
--
Michael

Вложения

Re: Wait event that should be reported while waiting for WALarchiving to finish

От
Fujii Masao
Дата:

On 2020/02/13 16:30, Michael Paquier wrote:
> On Thu, Feb 13, 2020 at 03:35:50PM +0900, Fujii Masao wrote:
>> I found that the wait events "LogicalRewriteTruncate" and
>> "GSSOpenServer" are not documented. I'm thinking to add
>> them into doc separately if ok.
> 
> Nice catch.  The ordering of the entries is not respected either for
> GSSOpenServer in pgstat.h.  The portion for the code and the docs can
> be fixed in back-branches, but not the enum list in WaitEventClient or
> we would have an ABI breakage.  But this can be fixed on HEAD.  Can
> you take care of it?
Yes. Patch attached.

logical_rewrite_truncate_v1.patch adds the description of
LogicalRewriteTruncate into the doc. This needs to be
back-patched to v10 where commit 249cf070e3 introduced
LogicalRewriteTruncate event.

gss_open_server_v1.patch adds the description of GSSOpenServer
into the doc and update the code in pgstat_get_wait_client().
This needs to be applied in v12 where commit b0b39f72b9 introduced
GSSOpenServer event.

gss_open_server_for_master_v1.patch does not only what the above
patch does but also update wait event enum into alphabetical order.
This needs to be applied in the master.

> 
>>            <entry><literal>SyncRep</literal></entry>
>>            <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
>>           </row>
>> +        <row>
>> +         <entry><literal>BackupWaitWalArchive</literal></entry>
>> +         <entry>Waiting for WAL files required for the backup to be successfully archived.</entry>
>> +        </row>
> 
> The category IPC is adapted.  You forgot to update the markup morerows
> from "36" to "37", causing the table of the wait events to have a
> weird format (the bottom should be incorrect).

Fixed. Thanks for the review!

> 
>> +        pgstat_report_wait_start(WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE);
>>           while (XLogArchiveIsBusy(lastxlogfilename) ||
>>                  XLogArchiveIsBusy(histfilename))
>>           {
>> @@ -11120,6 +11121,7 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
>>                                    "but the database backup will not be usable without all the WAL segments.")));
>>               }
>>           }
>> +        pgstat_report_wait_end();
> 
> Okay, that position is right.
> 
>> @@ -3848,6 +3848,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
>>           case WAIT_EVENT_SYNC_REP:
>>               event_name = "SyncRep";
>>               break;
>> +        case WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE:
>> +            event_name = "BackupWaitWalArchive";
>> +            break;
>>               /* no default case, so that compiler will warn */
>> [...]
>> @@ -853,7 +853,8 @@ typedef enum
>>       WAIT_EVENT_REPLICATION_ORIGIN_DROP,
>>       WAIT_EVENT_REPLICATION_SLOT_DROP,
>>       WAIT_EVENT_SAFE_SNAPSHOT,
>> -    WAIT_EVENT_SYNC_REP
>> +    WAIT_EVENT_SYNC_REP,
>> +    WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE
>>   } WaitEventIPC;
> 
> It would be good to keep entries in alphabetical order in the header,
> the code and in the docs (see the effort from 5ef037c), and your patch
> is missing that concept for all three places where it matters for this
> new event.

Fixed. Patch attached.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

Вложения

Re: Wait event that should be reported while waiting for WALarchiving to finish

От
Michael Paquier
Дата:
On Fri, Feb 14, 2020 at 12:47:19PM +0900, Fujii Masao wrote:
> logical_rewrite_truncate_v1.patch adds the description of
> LogicalRewriteTruncate into the doc. This needs to be
> back-patched to v10 where commit 249cf070e3 introduced
> LogicalRewriteTruncate event.

Indeed.  You just be careful about the number of fields for morerows,
as that's not the same across branches.

> gss_open_server_v1.patch adds the description of GSSOpenServer
> into the doc and update the code in pgstat_get_wait_client().
> This needs to be applied in v12 where commit b0b39f72b9 introduced
> GSSOpenServer event.
>
> gss_open_server_for_master_v1.patch does not only what the above
> patch does but also update wait event enum into alphabetical order.
> This needs to be applied in the master.

Thanks for splitting things.  All that looks correct to me.
--
Michael

Вложения

Re: Wait event that should be reported while waiting for WALarchiving to finish

От
Robert Haas
Дата:
On Thu, Feb 13, 2020 at 10:47 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> Fixed. Thanks for the review!

I think it would be safer to just report the wait event during
pg_usleep(1000000L) rather than putting those calls around the whole
loop. It does not seem impossible that ereport() or
CHECK_FOR_INTERRUPTS() could do something that reports a wait event
internally.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Wait event that should be reported while waiting for WALarchiving to finish

От
Fujii Masao
Дата:

On 2020/02/14 15:45, Michael Paquier wrote:
> On Fri, Feb 14, 2020 at 12:47:19PM +0900, Fujii Masao wrote:
>> logical_rewrite_truncate_v1.patch adds the description of
>> LogicalRewriteTruncate into the doc. This needs to be
>> back-patched to v10 where commit 249cf070e3 introduced
>> LogicalRewriteTruncate event.
> 
> Indeed.  You just be careful about the number of fields for morerows,
> as that's not the same across branches.
> 
>> gss_open_server_v1.patch adds the description of GSSOpenServer
>> into the doc and update the code in pgstat_get_wait_client().
>> This needs to be applied in v12 where commit b0b39f72b9 introduced
>> GSSOpenServer event.
>>
>> gss_open_server_for_master_v1.patch does not only what the above
>> patch does but also update wait event enum into alphabetical order.
>> This needs to be applied in the master.
> 
> Thanks for splitting things.  All that looks correct to me.

Thanks for the review! Pushed the patches for
LogicalRewriteTruncate and GSSOpenServer.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: Wait event that should be reported while waiting for WALarchiving to finish

От
Fujii Masao
Дата:

On 2020/02/14 23:43, Robert Haas wrote:
> On Thu, Feb 13, 2020 at 10:47 PM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>> Fixed. Thanks for the review!
> 
> I think it would be safer to just report the wait event during
> pg_usleep(1000000L) rather than putting those calls around the whole
> loop. It does not seem impossible that ereport() or
> CHECK_FOR_INTERRUPTS() could do something that reports a wait event
> internally.

OK, so I attached the updated version of the patch.
Thanks for the review!

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

Вложения

Re: Wait event that should be reported while waiting for WALarchiving to finish

От
Michael Paquier
Дата:
On Mon, Feb 17, 2020 at 04:30:00PM +0900, Fujii Masao wrote:
> On 2020/02/14 23:43, Robert Haas wrote:
>> On Thu, Feb 13, 2020 at 10:47 PM Fujii Masao
>> <masao.fujii@oss.nttdata.com> wrote:
>>> Fixed. Thanks for the review!
>>
>> I think it would be safer to just report the wait event during
>> pg_usleep(1000000L) rather than putting those calls around the whole
>> loop. It does not seem impossible that ereport() or
>> CHECK_FOR_INTERRUPTS() could do something that reports a wait event
>> internally.

CHECK_FOR_INTERRUPTS() would reset the event wait state.  Hm..  You
may be right about the WARNING and it would be better to not rely on
that.  Do you remember the states which may be triggered?

> OK, so I attached the updated version of the patch.
> Thanks for the review!

Actually, I have some questions:
1) Should a new wait event be added in recoveryPausesHere()?  That
would be IMO useful.
2) Perhaps those two points should be replaced with WaitLatch(), where
we would use the new wait events introduced?
--
Michael

Вложения

Re: Wait event that should be reported while waiting for WALarchiving to finish

От
Fujii Masao
Дата:

On 2020/02/17 18:48, Michael Paquier wrote:
> On Mon, Feb 17, 2020 at 04:30:00PM +0900, Fujii Masao wrote:
>> On 2020/02/14 23:43, Robert Haas wrote:
>>> On Thu, Feb 13, 2020 at 10:47 PM Fujii Masao
>>> <masao.fujii@oss.nttdata.com> wrote:
>>>> Fixed. Thanks for the review!
>>>
>>> I think it would be safer to just report the wait event during
>>> pg_usleep(1000000L) rather than putting those calls around the whole
>>> loop. It does not seem impossible that ereport() or
>>> CHECK_FOR_INTERRUPTS() could do something that reports a wait event
>>> internally.
> 
> CHECK_FOR_INTERRUPTS() would reset the event wait state.  Hm..  You
> may be right about the WARNING and it would be better to not rely on
> that.  Do you remember the states which may be triggered?
> 
>> OK, so I attached the updated version of the patch.
>> Thanks for the review!
> 
> Actually, I have some questions:
> 1) Should a new wait event be added in recoveryPausesHere()?  That
> would be IMO useful.

Yes, it's useful, I think. But it's better to implement that
as a separate patch.

> 2) Perhaps those two points should be replaced with WaitLatch(), where
> we would use the new wait events introduced?

For what? Maybe it should, but I'm not sure it's worth the trouble.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: Wait event that should be reported while waiting for WALarchiving to finish

От
Michael Paquier
Дата:
On Mon, Feb 17, 2020 at 10:21:23PM +0900, Fujii Masao wrote:
> On 2020/02/17 18:48, Michael Paquier wrote:
>> Actually, I have some questions:
>> 1) Should a new wait event be added in recoveryPausesHere()?  That
>> would be IMO useful.
>
> Yes, it's useful, I think. But it's better to implement that
> as a separate patch.

No problem for me.

>> 2) Perhaps those two points should be replaced with WaitLatch(), where
>> we would use the new wait events introduced?
>
> For what? Maybe it should, but I'm not sure it's worth the trouble.

I don't have more to offer than signal handling consistency for both
without relying on pg_usleep()'s behavior depending on the platform,
and power consumption.  For the recovery pause, the second argument
may not be worth carrying, but we never had this argument for the
archiving wait, did we?  For both, on top of it you don't need to
worry about concurrent issues with the wait events attached around.
--
Michael

Вложения

Re: Wait event that should be reported while waiting for WALarchiving to finish

От
Fujii Masao
Дата:

On 2020/02/18 12:39, Michael Paquier wrote:
> On Mon, Feb 17, 2020 at 10:21:23PM +0900, Fujii Masao wrote:
>> On 2020/02/17 18:48, Michael Paquier wrote:
>>> Actually, I have some questions:
>>> 1) Should a new wait event be added in recoveryPausesHere()?  That
>>> would be IMO useful.
>>
>> Yes, it's useful, I think. But it's better to implement that
>> as a separate patch.
> 
> No problem for me.

On second thought, it's OK to add that event into the patch.
Attached is the updated version of the patch. This patch adds
two wait events for WAL archiving and recovery pause.


>>> 2) Perhaps those two points should be replaced with WaitLatch(), where
>>> we would use the new wait events introduced?
>>
>> For what? Maybe it should, but I'm not sure it's worth the trouble.
> 
> I don't have more to offer than signal handling consistency for both
> without relying on pg_usleep()'s behavior depending on the platform,
> and power consumption.  For the recovery pause, the second argument
> may not be worth carrying, but we never had this argument for the
> archiving wait, did we?

I have no idea about this. But I wonder how much that change
is helpful to reduce the power consumption because waiting
for WAL archive during the backup basically not so frequently
happens.

Regards,


-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

Вложения

Re: Wait event that should be reported while waiting for WALarchiving to finish

От
Atsushi Torikoshi
Дата:

On Wed, Feb 26, 2020 at 9:19 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
I have no idea about this. But I wonder how much that change
is helpful to reduce the power consumption because waiting
for WAL archive during the backup basically not so frequently
happens.

+1. 
And as far as I reviewed the patch,  I didn't find any problems.

Regards,

--
Atsushi Torikoshi

Re: Wait event that should be reported while waiting for WALarchiving to finish

От
Fujii Masao
Дата:

On 2020/03/19 19:39, Atsushi Torikoshi wrote:
> 
> On Wed, Feb 26, 2020 at 9:19 PM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>>
wrote:
> 
>     I have no idea about this. But I wonder how much that change
>     is helpful to reduce the power consumption because waiting
>     for WAL archive during the backup basically not so frequently
>     happens.
> 
> 
> +1.
> And as far as I reviewed the patch,  I didn't find any problems.

Thanks for the review!
Barring any objection, I will commit this patch.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: Wait event that should be reported while waiting for WALarchiving to finish

От
Fujii Masao
Дата:

On 2020/03/23 15:56, Fujii Masao wrote:
> 
> 
> On 2020/03/19 19:39, Atsushi Torikoshi wrote:
>>
>> On Wed, Feb 26, 2020 at 9:19 PM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>>
wrote:
>>
>>     I have no idea about this. But I wonder how much that change
>>     is helpful to reduce the power consumption because waiting
>>     for WAL archive during the backup basically not so frequently
>>     happens.
>>
>>
>> +1.
>> And as far as I reviewed the patch,  I didn't find any problems.
> 
> Thanks for the review!
> Barring any objection, I will commit this patch.

Pushed! Thanks!

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters