Обсуждение: pg_stat_progress_basebackup - progress reporting for pg_basebackup,in the server side

Поиск
Список
Период
Сортировка

pg_stat_progress_basebackup - progress reporting for pg_basebackup,in the server side

От
Fujii Masao
Дата:
Hi,

pg_basebackup reports the backup progress if --progress option is specified,
and we can monitor it in the client side. I think that it's useful if we can
monitor the progress information also in the server side because, for example,
we can easily check that by using SQL. So I'd like to propose
pg_stat_progress_basebackup view that allows us to monitor the progress
of pg_basebackup, in the server side. Thought?

POC patch is attached.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

Вложения

Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Kyotaro Horiguchi
Дата:
At Wed, 29 Jan 2020 23:16:08 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> Hi,
> 
> pg_basebackup reports the backup progress if --progress option is
> specified,
> and we can monitor it in the client side. I think that it's useful if
> we can
> monitor the progress information also in the server side because, for
> example,
> we can easily check that by using SQL. So I'd like to propose
> pg_stat_progress_basebackup view that allows us to monitor the
> progress
> of pg_basebackup, in the server side. Thought?
> 
> POC patch is attached.

| postgres=# \d pg_stat_progress_basebackup
|          View "pg_catalog.pg_stat_progress_basebackup"
|        Column        |  Type   | Collation | Nullable | Default 
| ---------------------+---------+-----------+----------+---------
|  pid                 | integer |           |          | 
|  phase               | text    |           |          | 
|  backup_total        | bigint  |           |          | 
|  backup_streamed     | bigint  |           |          | 
|  tablespace_total    | bigint  |           |          | 
|  tablespace_streamed | bigint  |           |          | 

I think the view needs client identity such like host/port pair
besides pid (host/port pair fails identify client in the case of
unix-sockets.).  Also elapsed time from session start might be
useful. pg_stat_progress_acuum has datid, datname and relid.

+    if (backup_total > 0 && backup_streamed > backup_total)
+    {
+        backup_total = backup_streamed;

Do we need the condition "backup_total > 0"?


+        int        tblspc_streamed = 0;
+
+        pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+                                     PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP);

This starts "streaming backup" phase with backup_total = 0. Coudln't
we move to that phase after setting backup total and tablespace total?
That is, just after calling SendBackupHeader().

+            WHEN 3 THEN 'stopping backup'::text

I'm not sure, but the "stop" seems suggesting the backup is terminated
before completion. If it is following the name of the function
pg_stop_backup, I think the name is suggesting to stop "the state for
performing backup", not a backup.

In the first place, the progress is about "backup" so it seems strange
that we have another phase after the "stopping backup" phase.  It
might be better that it is "finishing file transfer" or such.

   "initializing"
-> "starting file transfer"
-> "transferring files"
-> "finishing file transfer"
-> "transaferring WAL"

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/01/30 12:58, Kyotaro Horiguchi wrote:
> At Wed, 29 Jan 2020 23:16:08 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>> Hi,
>>
>> pg_basebackup reports the backup progress if --progress option is
>> specified,
>> and we can monitor it in the client side. I think that it's useful if
>> we can
>> monitor the progress information also in the server side because, for
>> example,
>> we can easily check that by using SQL. So I'd like to propose
>> pg_stat_progress_basebackup view that allows us to monitor the
>> progress
>> of pg_basebackup, in the server side. Thought?
>>
>> POC patch is attached.
> 
> | postgres=# \d pg_stat_progress_basebackup
> |          View "pg_catalog.pg_stat_progress_basebackup"
> |        Column        |  Type   | Collation | Nullable | Default
> | ---------------------+---------+-----------+----------+---------
> |  pid                 | integer |           |          |
> |  phase               | text    |           |          |
> |  backup_total        | bigint  |           |          |
> |  backup_streamed     | bigint  |           |          |
> |  tablespace_total    | bigint  |           |          |
> |  tablespace_streamed | bigint  |           |          |
> 
> I think the view needs client identity such like host/port pair
> besides pid (host/port pair fails identify client in the case of
> unix-sockets.).

I don't think this is job of a progress reporting. If those information
is necessary, we can join this view with pg_stat_replication using
pid column as the join key.

> Also elapsed time from session start might be
> useful.

+1
I think that not only pg_stat_progress_basebackup but also all the other
progress views should report the time when the target command was
started and the time when the phase was last changed. Those times
would be useful to estimate the remaining execution time from the
progress infromation.

> pg_stat_progress_acuum has datid, datname and relid.
> 
> +    if (backup_total > 0 && backup_streamed > backup_total)
> +    {
> +        backup_total = backup_streamed;
> 
> Do we need the condition "backup_total > 0"?

I added the condition for the case where --progress option is not supplied
in pg_basebackup, i.e., the case where the total amount of backup is not
estimated at the beginning of the backup. In this case, total_backup is
always 0.

> +        int        tblspc_streamed = 0;
> +
> +        pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
> +                                     PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP);
> 
> This starts "streaming backup" phase with backup_total = 0. Coudln't
> we move to that phase after setting backup total and tablespace total?
> That is, just after calling SendBackupHeader().

OK, that's a bit less confusing for users. I will change in that way.

> +            WHEN 3 THEN 'stopping backup'::text
> 
> I'm not sure, but the "stop" seems suggesting the backup is terminated
> before completion. If it is following the name of the function
> pg_stop_backup, I think the name is suggesting to stop "the state for
> performing backup", not a backup.
> 
> In the first place, the progress is about "backup" so it seems strange
> that we have another phase after the "stopping backup" phase.  It
> might be better that it is "finishing file transfer" or such.
> 
>     "initializing"
> -> "starting file transfer"
> -> "transferring files"
> -> "finishing file transfer"
> -> "transaferring WAL"

Better name is always welcome! If "stopping back" is confusing,
what about "performing pg_stop_backup"? So

initializing
performing pg_start_backup
streaming database files
performing pg_stop_backup
transfering WAL files

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Masahiko Sawada
Дата:
On Fri, 31 Jan 2020 at 02:29, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2020/01/30 12:58, Kyotaro Horiguchi wrote:
> > At Wed, 29 Jan 2020 23:16:08 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
> >> Hi,
> >>
> >> pg_basebackup reports the backup progress if --progress option is
> >> specified,
> >> and we can monitor it in the client side. I think that it's useful if
> >> we can
> >> monitor the progress information also in the server side because, for
> >> example,
> >> we can easily check that by using SQL. So I'd like to propose
> >> pg_stat_progress_basebackup view that allows us to monitor the
> >> progress
> >> of pg_basebackup, in the server side. Thought?
> >>
> >> POC patch is attached.
> >
> > | postgres=# \d pg_stat_progress_basebackup
> > |          View "pg_catalog.pg_stat_progress_basebackup"
> > |        Column        |  Type   | Collation | Nullable | Default
> > | ---------------------+---------+-----------+----------+---------
> > |  pid                 | integer |           |          |
> > |  phase               | text    |           |          |
> > |  backup_total        | bigint  |           |          |
> > |  backup_streamed     | bigint  |           |          |
> > |  tablespace_total    | bigint  |           |          |
> > |  tablespace_streamed | bigint  |           |          |
> >
> > I think the view needs client identity such like host/port pair
> > besides pid (host/port pair fails identify client in the case of
> > unix-sockets.).
>
> I don't think this is job of a progress reporting. If those information
> is necessary, we can join this view with pg_stat_replication using
> pid column as the join key.
>
> > Also elapsed time from session start might be
> > useful.
>
> +1
> I think that not only pg_stat_progress_basebackup but also all the other
> progress views should report the time when the target command was
> started and the time when the phase was last changed. Those times
> would be useful to estimate the remaining execution time from the
> progress infromation.
>
> > pg_stat_progress_acuum has datid, datname and relid.
> >
> > +     if (backup_total > 0 && backup_streamed > backup_total)
> > +     {
> > +             backup_total = backup_streamed;
> >
> > Do we need the condition "backup_total > 0"?
>
> I added the condition for the case where --progress option is not supplied
> in pg_basebackup, i.e., the case where the total amount of backup is not
> estimated at the beginning of the backup. In this case, total_backup is
> always 0.
>
> > +             int             tblspc_streamed = 0;
> > +
> > +             pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
> > +                                                                      PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP);
> >
> > This starts "streaming backup" phase with backup_total = 0. Coudln't
> > we move to that phase after setting backup total and tablespace total?
> > That is, just after calling SendBackupHeader().
>
> OK, that's a bit less confusing for users. I will change in that way.
>
> > +            WHEN 3 THEN 'stopping backup'::text
> >
> > I'm not sure, but the "stop" seems suggesting the backup is terminated
> > before completion. If it is following the name of the function
> > pg_stop_backup, I think the name is suggesting to stop "the state for
> > performing backup", not a backup.
> >
> > In the first place, the progress is about "backup" so it seems strange
> > that we have another phase after the "stopping backup" phase.  It
> > might be better that it is "finishing file transfer" or such.
> >
> >     "initializing"
> > -> "starting file transfer"
> > -> "transferring files"
> > -> "finishing file transfer"
> > -> "transaferring WAL"
>
> Better name is always welcome! If "stopping back" is confusing,
> what about "performing pg_stop_backup"? So
>
> initializing
> performing pg_start_backup
> streaming database files
> performing pg_stop_backup
> transfering WAL files

Another idea I came up with is to show steps that take time instead of
pg_start_backup/pg_stop_backup, for better understanding the
situation. That is, "performing checkpoint" and "performing WAL
archive" etc, which engage the most of the time of these functions.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/02/02 14:59, Masahiko Sawada wrote:
> On Fri, 31 Jan 2020 at 02:29, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>>
>>
>> On 2020/01/30 12:58, Kyotaro Horiguchi wrote:
>>> At Wed, 29 Jan 2020 23:16:08 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>>> Hi,
>>>>
>>>> pg_basebackup reports the backup progress if --progress option is
>>>> specified,
>>>> and we can monitor it in the client side. I think that it's useful if
>>>> we can
>>>> monitor the progress information also in the server side because, for
>>>> example,
>>>> we can easily check that by using SQL. So I'd like to propose
>>>> pg_stat_progress_basebackup view that allows us to monitor the
>>>> progress
>>>> of pg_basebackup, in the server side. Thought?
>>>>
>>>> POC patch is attached.
>>>
>>> | postgres=# \d pg_stat_progress_basebackup
>>> |          View "pg_catalog.pg_stat_progress_basebackup"
>>> |        Column        |  Type   | Collation | Nullable | Default
>>> | ---------------------+---------+-----------+----------+---------
>>> |  pid                 | integer |           |          |
>>> |  phase               | text    |           |          |
>>> |  backup_total        | bigint  |           |          |
>>> |  backup_streamed     | bigint  |           |          |
>>> |  tablespace_total    | bigint  |           |          |
>>> |  tablespace_streamed | bigint  |           |          |
>>>
>>> I think the view needs client identity such like host/port pair
>>> besides pid (host/port pair fails identify client in the case of
>>> unix-sockets.).
>>
>> I don't think this is job of a progress reporting. If those information
>> is necessary, we can join this view with pg_stat_replication using
>> pid column as the join key.
>>
>>> Also elapsed time from session start might be
>>> useful.
>>
>> +1
>> I think that not only pg_stat_progress_basebackup but also all the other
>> progress views should report the time when the target command was
>> started and the time when the phase was last changed. Those times
>> would be useful to estimate the remaining execution time from the
>> progress infromation.
>>
>>> pg_stat_progress_acuum has datid, datname and relid.
>>>
>>> +     if (backup_total > 0 && backup_streamed > backup_total)
>>> +     {
>>> +             backup_total = backup_streamed;
>>>
>>> Do we need the condition "backup_total > 0"?
>>
>> I added the condition for the case where --progress option is not supplied
>> in pg_basebackup, i.e., the case where the total amount of backup is not
>> estimated at the beginning of the backup. In this case, total_backup is
>> always 0.
>>
>>> +             int             tblspc_streamed = 0;
>>> +
>>> +             pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
>>> +                                                                      PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP);
>>>
>>> This starts "streaming backup" phase with backup_total = 0. Coudln't
>>> we move to that phase after setting backup total and tablespace total?
>>> That is, just after calling SendBackupHeader().
>>
>> OK, that's a bit less confusing for users. I will change in that way.

Fixed. Attached is the updated version of the patch.
I also fixed the regression test failure.

>>
>>> +            WHEN 3 THEN 'stopping backup'::text
>>>
>>> I'm not sure, but the "stop" seems suggesting the backup is terminated
>>> before completion. If it is following the name of the function
>>> pg_stop_backup, I think the name is suggesting to stop "the state for
>>> performing backup", not a backup.
>>>
>>> In the first place, the progress is about "backup" so it seems strange
>>> that we have another phase after the "stopping backup" phase.  It
>>> might be better that it is "finishing file transfer" or such.
>>>
>>>      "initializing"
>>> -> "starting file transfer"
>>> -> "transferring files"
>>> -> "finishing file transfer"
>>> -> "transaferring WAL"
>>
>> Better name is always welcome! If "stopping back" is confusing,
>> what about "performing pg_stop_backup"? So
>>
>> initializing
>> performing pg_start_backup
>> streaming database files
>> performing pg_stop_backup
>> transfering WAL files
> 
> Another idea I came up with is to show steps that take time instead of
> pg_start_backup/pg_stop_backup, for better understanding the
> situation. That is, "performing checkpoint" and "performing WAL
> archive" etc, which engage the most of the time of these functions.

Yeah, that's an idea. ISTM that "waiting for WAL archiving" sounds
better than "performing WAL archive". Thought?
I've not applied this change in the patch yet, but if there is no
other idea, I'd like to adopt this.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

Вложения

Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Amit Langote
Дата:
On Mon, Feb 3, 2020 at 1:17 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2020/02/02 14:59, Masahiko Sawada wrote:
> > On Fri, 31 Jan 2020 at 02:29, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >> On 2020/01/30 12:58, Kyotaro Horiguchi wrote:
> >>> +            WHEN 3 THEN 'stopping backup'::text
> >>>
> >>> I'm not sure, but the "stop" seems suggesting the backup is terminated
> >>> before completion. If it is following the name of the function
> >>> pg_stop_backup, I think the name is suggesting to stop "the state for
> >>> performing backup", not a backup.
> >>>
> >>> In the first place, the progress is about "backup" so it seems strange
> >>> that we have another phase after the "stopping backup" phase.  It
> >>> might be better that it is "finishing file transfer" or such.
> >>>
> >>>      "initializing"
> >>> -> "starting file transfer"
> >>> -> "transferring files"
> >>> -> "finishing file transfer"
> >>> -> "transaferring WAL"
> >>
> >> Better name is always welcome! If "stopping back" is confusing,
> >> what about "performing pg_stop_backup"? So
> >>
> >> initializing
> >> performing pg_start_backup
> >> streaming database files
> >> performing pg_stop_backup
> >> transfering WAL files
> >
> > Another idea I came up with is to show steps that take time instead of
> > pg_start_backup/pg_stop_backup, for better understanding the
> > situation. That is, "performing checkpoint" and "performing WAL
> > archive" etc, which engage the most of the time of these functions.
>
> Yeah, that's an idea. ISTM that "waiting for WAL archiving" sounds
> better than "performing WAL archive". Thought?
> I've not applied this change in the patch yet, but if there is no
> other idea, I'd like to adopt this.

If we are trying to "pg_stop_backup" in phase name, maybe we should
avoid "pg_start_backup"?  Then maybe:

initializing
starting backup / waiting for [ backup start ] checkpoint to finish
transferring database files
finishing backup / waiting for WAL archiving to finish
transferring WAL files

?

Some comments on documentation changes in v2 patch:

+      Amount of data already streamed.

"already" may be redundant.  For example, in pg_start_progress_vacuum,
heap_blks_scanned is described as "...blocks scanned", not "...blocks
already scanned".

+     <entry><structfield>tablespace_total</structfield></entry>
+     <entry><structfield>tablespace_streamed</structfield></entry>

Better to use plural tablespaces_total and tablespaces_streamed for consistency?

+       The WAL sender process is currently performing
+       <function>pg_start_backup</function> and setting up for
+       making a base backup.

How about "taking" instead of "making" in the above sentence?

-  <varlistentry>
+  <varlistentry id="protocol-replication-base-backup" xreflabel="BASE_BACKUP">

I don't see any new text in the documentation patch that uses above
xref, so no need to define it?

Thanks,
Amit



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Amit Langote
Дата:
On Mon, Feb 3, 2020 at 4:28 PM Amit Langote <amitlangote09@gmail.com> wrote:
> If we are trying to "pg_stop_backup" in phase name, maybe we should
> avoid "pg_start_backup"?  Then maybe:

Sorry, I meant to write:

If we are trying to avoid "pg_stop_backup" in phase name, maybe we
should avoid "pg_start_backup"?  Then maybe:

Thanks,
Amit



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/02/03 16:28, Amit Langote wrote:
> On Mon, Feb 3, 2020 at 1:17 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> On 2020/02/02 14:59, Masahiko Sawada wrote:
>>> On Fri, 31 Jan 2020 at 02:29, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>> On 2020/01/30 12:58, Kyotaro Horiguchi wrote:
>>>>> +            WHEN 3 THEN 'stopping backup'::text
>>>>>
>>>>> I'm not sure, but the "stop" seems suggesting the backup is terminated
>>>>> before completion. If it is following the name of the function
>>>>> pg_stop_backup, I think the name is suggesting to stop "the state for
>>>>> performing backup", not a backup.
>>>>>
>>>>> In the first place, the progress is about "backup" so it seems strange
>>>>> that we have another phase after the "stopping backup" phase.  It
>>>>> might be better that it is "finishing file transfer" or such.
>>>>>
>>>>>       "initializing"
>>>>> -> "starting file transfer"
>>>>> -> "transferring files"
>>>>> -> "finishing file transfer"
>>>>> -> "transaferring WAL"
>>>>
>>>> Better name is always welcome! If "stopping back" is confusing,
>>>> what about "performing pg_stop_backup"? So
>>>>
>>>> initializing
>>>> performing pg_start_backup
>>>> streaming database files
>>>> performing pg_stop_backup
>>>> transfering WAL files
>>>
>>> Another idea I came up with is to show steps that take time instead of
>>> pg_start_backup/pg_stop_backup, for better understanding the
>>> situation. That is, "performing checkpoint" and "performing WAL
>>> archive" etc, which engage the most of the time of these functions.
>>
>> Yeah, that's an idea. ISTM that "waiting for WAL archiving" sounds
>> better than "performing WAL archive". Thought?
>> I've not applied this change in the patch yet, but if there is no
>> other idea, I'd like to adopt this.
> 
> If we are trying to "pg_stop_backup" in phase name, maybe we should
> avoid "pg_start_backup"?  Then maybe:
> 
> initializing
> starting backup / waiting for [ backup start ] checkpoint to finish
> transferring database files
> finishing backup / waiting for WAL archiving to finish
> transferring WAL files

So we now have the following ideas about the phase names for pg_basebackup.

1.
initializing

2.
2-1. starting backup
2-2. starting file transfer
2-3. performing pg_start_backup
2-4. performing checkpoint
2-5. waiting for [ backup start ] checkpoint to finish

3.
3-1. streaming backup
3-2. transferring database files
3-3. streaming database files
3-4. transferring files

4.
4-1. stopping backup
4-2. finishing file transfer
4-3. performing pg_stop_backup
4-4. finishing backup
4-5. waiting for WAL archiving to finish
4-6. performing WAL archive

5.
1. transferring wal
2. transferring WAL files

What conbination of these do you prefer?
> Some comments on documentation changes in v2 patch:
> 
> +      Amount of data already streamed.

Ok, fixed.

> "already" may be redundant.  For example, in pg_start_progress_vacuum,
> heap_blks_scanned is described as "...blocks scanned", not "...blocks
> already scanned".
> 
> +     <entry><structfield>tablespace_total</structfield></entry>
> +     <entry><structfield>tablespace_streamed</structfield></entry>
> 
> Better to use plural tablespaces_total and tablespaces_streamed for consistency?

Fixed.

> +       The WAL sender process is currently performing
> +       <function>pg_start_backup</function> and setting up for
> +       making a base backup.
> 
> How about "taking" instead of "making" in the above sentence?

Fixed. Attached is the updated version of the patch.

> 
> -  <varlistentry>
> +  <varlistentry id="protocol-replication-base-backup" xreflabel="BASE_BACKUP">
> 
> I don't see any new text in the documentation patch that uses above
> xref, so no need to define it?

The following description that I added uses this.

     certain commands during command execution.  Currently, the only commands
     which support progress reporting are <command>ANALYZE</command>,
     <command>CLUSTER</command>,
-   <command>CREATE INDEX</command>, and <command>VACUUM</command>.
+   <command>CREATE INDEX</command>, <command>VACUUM</command>,
+   and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
+   command that <xref linkend="app-pgbasebackup"/> issues to take
+   a base backup).

Regards,


-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

Вложения

Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Amit Langote
Дата:
On Mon, Feb 3, 2020 at 11:04 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> So we now have the following ideas about the phase names for pg_basebackup.
>
> 1.
> initializing
>
> 2.
> 2-1. starting backup
> 2-2. starting file transfer
> 2-3. performing pg_start_backup
> 2-4. performing checkpoint
> 2-5. waiting for [ backup start ] checkpoint to finish
>
> 3.
> 3-1. streaming backup
> 3-2. transferring database files
> 3-3. streaming database files
> 3-4. transferring files
>
> 4.
> 4-1. stopping backup
> 4-2. finishing file transfer
> 4-3. performing pg_stop_backup
> 4-4. finishing backup
> 4-5. waiting for WAL archiving to finish
> 4-6. performing WAL archive
>
> 5.
> 1. transferring wal
> 2. transferring WAL files
>
> What conbination of these do you prefer?

I like:

1. initializing
2-5 waiting for backup start checkpoint to finish
3-3 streaming database files
4-5 waiting for wal archiving to finish
5-1 transferring wal (or streaming wal)

> > -  <varlistentry>
> > +  <varlistentry id="protocol-replication-base-backup" xreflabel="BASE_BACKUP">
> >
> > I don't see any new text in the documentation patch that uses above
> > xref, so no need to define it?
>
> The following description that I added uses this.
>
>      certain commands during command execution.  Currently, the only commands
>      which support progress reporting are <command>ANALYZE</command>,
>      <command>CLUSTER</command>,
> -   <command>CREATE INDEX</command>, and <command>VACUUM</command>.
> +   <command>CREATE INDEX</command>, <command>VACUUM</command>,
> +   and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
> +   command that <xref linkend="app-pgbasebackup"/> issues to take
> +   a base backup).

Sorry, I missed that.  I was mistakenly expecting a different value of linkend.

Some comments on v3:

+     <entry>Process ID of a WAL sender process.</entry>

"a" sounds redundant.  Maybe:

of this WAL sender process or
of WAL sender process

Reading this:

+     <entry><structfield>backup_total</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+      Total amount of data that will be streamed. If progress reporting
+      is not enabled in <application>pg_basebackup</application>
+      (i.e., <literal>--progress</literal> option is not specified),
+      this is <literal>0</literal>.

I wonder how hard would it be to change basebackup.c to always set
backup_total, irrespective of whether --progress is specified with
pg_basebackup or not?  It seems quite misleading to leave it set to 0,
because one may panic that they have lost their data, that is, if they
haven't first read the documentation.

Thanks,
Amit



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Amit Langote
Дата:
On Tue, Feb 4, 2020 at 10:34 AM Amit Langote <amitlangote09@gmail.com> wrote:
> Reading this:
>
> +     <entry><structfield>backup_total</structfield></entry>
> +     <entry><type>bigint</type></entry>
> +     <entry>
> +      Total amount of data that will be streamed. If progress reporting
> +      is not enabled in <application>pg_basebackup</application>
> +      (i.e., <literal>--progress</literal> option is not specified),
> +      this is <literal>0</literal>.
>
> I wonder how hard would it be to change basebackup.c to always set
> backup_total, irrespective of whether --progress is specified with
> pg_basebackup or not?  It seems quite misleading to leave it set to 0,
> because one may panic that they have lost their data, that is, if they
> haven't first read the documentation.

For example, the attached patch changes basebackup.c to always set
tablespaceinfo.size, irrespective of whether --progress was passed
with BASE_BACKUP command.  It passes make check-world, so maybe safe.
Maybe it would be a good idea to add a couple of more comments around
tablespaceinfo struct definition, such as how 'size' is to be
interpreted.

Thanks,
Amit

Вложения

Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/02/04 10:34, Amit Langote wrote:
> On Mon, Feb 3, 2020 at 11:04 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> So we now have the following ideas about the phase names for pg_basebackup.
>>
>> 1.
>> initializing
>>
>> 2.
>> 2-1. starting backup
>> 2-2. starting file transfer
>> 2-3. performing pg_start_backup
>> 2-4. performing checkpoint
>> 2-5. waiting for [ backup start ] checkpoint to finish
>>
>> 3.
>> 3-1. streaming backup
>> 3-2. transferring database files
>> 3-3. streaming database files
>> 3-4. transferring files
>>
>> 4.
>> 4-1. stopping backup
>> 4-2. finishing file transfer
>> 4-3. performing pg_stop_backup
>> 4-4. finishing backup
>> 4-5. waiting for WAL archiving to finish
>> 4-6. performing WAL archive
>>
>> 5.
>> 1. transferring wal
>> 2. transferring WAL files
>>
>> What conbination of these do you prefer?
> 
> I like:

Thanks for reviewing the patch!

> 1. initializing
> 2-5 waiting for backup start checkpoint to finish

Can we shorten this to "waiting for checkpoint"? IMO the simpler
phase name is better and "to finish" sounds a bit redundant. Also
in the description of pg_stat_progress_create_index, basically
"waiting for xxx" is used.

> 3-3 streaming database files
> 4-5 waiting for wal archiving to finish

Can we shorten this to "waiting for wal archiving" because of
the same reason as above?

> 5-1 transferring wal (or streaming wal)

IMO "transferring wal" is better because this phase happens only when
"--wal-method=fetch" is specified in pg_basebackup. "streaming wal"
seems to implie "--wal-method=stream", instead.

>>> -  <varlistentry>
>>> +  <varlistentry id="protocol-replication-base-backup" xreflabel="BASE_BACKUP">
>>>
>>> I don't see any new text in the documentation patch that uses above
>>> xref, so no need to define it?
>>
>> The following description that I added uses this.
>>
>>       certain commands during command execution.  Currently, the only commands
>>       which support progress reporting are <command>ANALYZE</command>,
>>       <command>CLUSTER</command>,
>> -   <command>CREATE INDEX</command>, and <command>VACUUM</command>.
>> +   <command>CREATE INDEX</command>, <command>VACUUM</command>,
>> +   and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
>> +   command that <xref linkend="app-pgbasebackup"/> issues to take
>> +   a base backup).
> 
> Sorry, I missed that.  I was mistakenly expecting a different value of linkend.
> 
> Some comments on v3:
> 
> +     <entry>Process ID of a WAL sender process.</entry>
> 
> "a" sounds redundant.  Maybe:
> 
> of this WAL sender process or
> of WAL sender process

I borrowed "Process ID of a WAL sender process" from the description
of pg_stat_replication.pid. But if it's better to get rid of "a",
I'm happy to do that!

> Reading this:
> 
> +     <entry><structfield>backup_total</structfield></entry>
> +     <entry><type>bigint</type></entry>
> +     <entry>
> +      Total amount of data that will be streamed. If progress reporting
> +      is not enabled in <application>pg_basebackup</application>
> +      (i.e., <literal>--progress</literal> option is not specified),
> +      this is <literal>0</literal>.
> 
> I wonder how hard would it be to change basebackup.c to always set
> backup_total, irrespective of whether --progress is specified with
> pg_basebackup or not?  It seems quite misleading to leave it set to 0,
> because one may panic that they have lost their data, that is, if they
> haven't first read the documentation.

Yeah, I understand your concern. The pg_basebackup document explains
the risk when --progress is specified, as follows. Since I imagined that
someone may explicitly disable --progress to avoid this risk, I made
the server estimate the total size only when --progress is specified.
But you think that this overhead by --progress is negligibly small?

--------------------
When this is enabled, the backup will start by enumerating the size of
the entire database, and then go back and send the actual contents.
This may make the backup take slightly longer, and in particular it will
take longer before the first data is sent.
--------------------

If we really can always estimate the total size, whether --progress is
specified or not, we should get rid of PROGRESS option from BASE_BACKUP
replication command because it will no longer be necessary, I think.

Regards,  

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Amit Langote
Дата:
On Wed, Feb 5, 2020 at 3:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2020/02/04 10:34, Amit Langote wrote:
> > I like:
>
> Thanks for reviewing the patch!
>
> > 1. initializing
> > 2-5 waiting for backup start checkpoint to finish
>
> Can we shorten this to "waiting for checkpoint"? IMO the simpler
> phase name is better and "to finish" sounds a bit redundant. Also
> in the description of pg_stat_progress_create_index, basically
> "waiting for xxx" is used.

"waiting for checkpoint" works for me.

> > 3-3 streaming database files
> > 4-5 waiting for wal archiving to finish
>
> Can we shorten this to "waiting for wal archiving" because of
> the same reason as above?

Yes.

> > 5-1 transferring wal (or streaming wal)
>
> IMO "transferring wal" is better because this phase happens only when
> "--wal-method=fetch" is specified in pg_basebackup. "streaming wal"
> seems to implie "--wal-method=stream", instead.

Ah, okay,

> > Reading this:
> >
> > +     <entry><structfield>backup_total</structfield></entry>
> > +     <entry><type>bigint</type></entry>
> > +     <entry>
> > +      Total amount of data that will be streamed. If progress reporting
> > +      is not enabled in <application>pg_basebackup</application>
> > +      (i.e., <literal>--progress</literal> option is not specified),
> > +      this is <literal>0</literal>.
> >
> > I wonder how hard would it be to change basebackup.c to always set
> > backup_total, irrespective of whether --progress is specified with
> > pg_basebackup or not?  It seems quite misleading to leave it set to 0,
> > because one may panic that they have lost their data, that is, if they
> > haven't first read the documentation.
>
> Yeah, I understand your concern. The pg_basebackup document explains
> the risk when --progress is specified, as follows. Since I imagined that
> someone may explicitly disable --progress to avoid this risk, I made
> the server estimate the total size only when --progress is specified.
> But you think that this overhead by --progress is negligibly small?
>
> --------------------
> When this is enabled, the backup will start by enumerating the size of
> the entire database, and then go back and send the actual contents.
> This may make the backup take slightly longer, and in particular it will
> take longer before the first data is sent.
> --------------------

Sorry, I hadn't read this before.  So, my proposal would make this a lie.

Still, if "streaming database files" is the longest phase, then not
having even an approximation of how much data is to be streamed over
doesn't much help estimating progress,  at least as long as one only
has this view to look at.

That said, the overhead of checking the size before sending any data
may be worse for some people than others, so having the option to
avoid that might be good after all.

Thanks,
Amit



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Kyotaro Horiguchi
Дата:
At Wed, 5 Feb 2020 16:29:54 +0900, Amit Langote <amitlangote09@gmail.com> wrote in 
> On Wed, Feb 5, 2020 at 3:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > On 2020/02/04 10:34, Amit Langote wrote:
> > > I like:
> >
> > Thanks for reviewing the patch!
> >
> > > 1. initializing
> > > 2-5 waiting for backup start checkpoint to finish
> >
> > Can we shorten this to "waiting for checkpoint"? IMO the simpler
> > phase name is better and "to finish" sounds a bit redundant. Also
> > in the description of pg_stat_progress_create_index, basically
> > "waiting for xxx" is used.
> 
> "waiting for checkpoint" works for me.

I'm not sure, but doesn't that mean "waiting for a checkpoint to
start"?  Sorry in advance if that is not the case.

> > > 3-3 streaming database files
> > > 4-5 waiting for wal archiving to finish
> >
> > Can we shorten this to "waiting for wal archiving" because of
> > the same reason as above?
> 
> Yes.
> 
> > > 5-1 transferring wal (or streaming wal)
> >
> > IMO "transferring wal" is better because this phase happens only when
> > "--wal-method=fetch" is specified in pg_basebackup. "streaming wal"
> > seems to implie "--wal-method=stream", instead.
> 
> Ah, okay,
> 
> > > Reading this:
> > >
> > > +     <entry><structfield>backup_total</structfield></entry>
> > > +     <entry><type>bigint</type></entry>
> > > +     <entry>
> > > +      Total amount of data that will be streamed. If progress reporting
> > > +      is not enabled in <application>pg_basebackup</application>
> > > +      (i.e., <literal>--progress</literal> option is not specified),
> > > +      this is <literal>0</literal>.
> > >
> > > I wonder how hard would it be to change basebackup.c to always set
> > > backup_total, irrespective of whether --progress is specified with
> > > pg_basebackup or not?  It seems quite misleading to leave it set to 0,
> > > because one may panic that they have lost their data, that is, if they
> > > haven't first read the documentation.
> >
> > Yeah, I understand your concern. The pg_basebackup document explains
> > the risk when --progress is specified, as follows. Since I imagined that
> > someone may explicitly disable --progress to avoid this risk, I made
> > the server estimate the total size only when --progress is specified.
> > But you think that this overhead by --progress is negligibly small?
> >
> > --------------------
> > When this is enabled, the backup will start by enumerating the size of
> > the entire database, and then go back and send the actual contents.
> > This may make the backup take slightly longer, and in particular it will
> > take longer before the first data is sent.
> > --------------------
> 
> Sorry, I hadn't read this before.  So, my proposal would make this a lie.
> 
> Still, if "streaming database files" is the longest phase, then not
> having even an approximation of how much data is to be streamed over
> doesn't much help estimating progress,  at least as long as one only
> has this view to look at.
> 
> That said, the overhead of checking the size before sending any data
> may be worse for some people than others, so having the option to
> avoid that might be good after all.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Amit Langote
Дата:
On Wed, Feb 5, 2020 at 5:32 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> At Wed, 5 Feb 2020 16:29:54 +0900, Amit Langote <amitlangote09@gmail.com> wrote in
> > On Wed, Feb 5, 2020 at 3:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > > On 2020/02/04 10:34, Amit Langote wrote:
> > > > I like:
> > >
> > > Thanks for reviewing the patch!
> > >
> > > > 1. initializing
> > > > 2-5 waiting for backup start checkpoint to finish
> > >
> > > Can we shorten this to "waiting for checkpoint"? IMO the simpler
> > > phase name is better and "to finish" sounds a bit redundant. Also
> > > in the description of pg_stat_progress_create_index, basically
> > > "waiting for xxx" is used.
> >
> > "waiting for checkpoint" works for me.
>
> I'm not sure, but doesn't that mean "waiting for a checkpoint to
> start"?  Sorry in advance if that is not the case.

No, I really meant "to finish".  As Sawada-san said upthread, we
should really use text that describes the activity that usually takes
long.  While it takes takes only a moment to actually start the
checkpoint, it might take long for it to finish.  As Fujii-san says
though we don't need the noise words "to finish".

Thanks,
Amit



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Amit Langote
Дата:
On Wed, Feb 5, 2020 at 18:25 Amit Langote <amitlangote09@gmail.com> wrote:
On Wed, Feb 5, 2020 at 6:15 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> 2020年2月5日(水) 17:54 Amit Langote <amitlangote09@gmail.com>:
>>
>> > I'm not sure, but doesn't that mean "waiting for a checkpoint to
>> > start"?  Sorry in advance if that is not the case.
>>
>> No, I really meant "to finish".  As Sawada-san said upthread, we
>> should really use text that describes the activity that usually takes
>> long.  While it takes takes only a moment to actually start the
>> checkpoint, it might take long for it to finish.
>
> I meant that the wording might sound as if it implies "to start", but..

Ah, I misunderstood then, sorry.

So, maybe you're saying that "waiting for checkpoint" is ambiguous and
most people will assume it means "...to start".  As for me, I assume
it ends with "...to finish".

>> As Fujii-san says
>> though we don't need the noise words "to finish".
>
> Understood, sorry for my noise.

Actually, that's an important point to consider and we should strive
to use words that are unambiguous.

Last two messages weren’t sent to the list.

Thanks,
Amit

Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Kyotaro Horiguchi
Дата:
At Wed, 5 Feb 2020 18:53:19 +0900, Amit Langote <amitlangote09@gmail.com> wrote in 
> Last two messages weren’t sent to the list.

Oops! Sorry, I made a mistake sending the mail.

> On Wed, Feb 5, 2020 at 18:25 Amit Langote <amitlangote09@gmail.com> wrote:
> 
> > On Wed, Feb 5, 2020 at 6:15 PM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> > > 2020年2月5日(水) 17:54 Amit Langote <amitlangote09@gmail.com>:
> > >>
> > So, maybe you're saying that "waiting for checkpoint" is ambiguous and
> > most people will assume it means "...to start".  As for me, I assume
> > it ends with "...to finish".

I'm not sure "most peple will assume" or not, so I said "I'm not
sure".  For example, I feel strangeness to use "I'm waiting for Amit"
to express that I'm waiting Amit to leave there.  That phrase gives me
such kind of uneasiness.

I thought of "establishing checkpoint" or "running a checkpoint" as
other candidates.

> > >> As Fujii-san says
> > >> though we don't need the noise words "to finish".
> > >
> > > Understood, sorry for my noise.
> >
> > Actually, that's an important point to consider and we should strive
> > to use words that are unambiguous.

I think it's not ambiguous if knowing what happens during backup so my
concern was not unambiguity, but that it might give feeling of
strangeness that that sentense appears in that context.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Amit Langote
Дата:
On Thu, Feb 6, 2020 at 9:51 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> > On Wed, Feb 5, 2020 at 18:25 Amit Langote <amitlangote09@gmail.com> wrote:
> > > So, maybe you're saying that "waiting for checkpoint" is ambiguous and
> > > most people will assume it means "...to start".  As for me, I assume
> > > it ends with "...to finish".
>
> I'm not sure "most peple will assume" or not, so I said "I'm not
> sure".  For example, I feel strangeness to use "I'm waiting for Amit"
> to express that I'm waiting Amit to leave there.  That phrase gives me
> such kind of uneasiness.
>
> I thought of "establishing checkpoint" or "running a checkpoint" as
> other candidates.

Okay, I understand.  I am fine with "running checkpoint", although I
think "waiting for checkpoint" isn't totally wrong either.

Thanks,
Amit



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Amit Langote
Дата:
On Wed, Feb 5, 2020 at 4:29 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Feb 5, 2020 at 3:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > Yeah, I understand your concern. The pg_basebackup document explains
> > the risk when --progress is specified, as follows. Since I imagined that
> > someone may explicitly disable --progress to avoid this risk, I made
> > the server estimate the total size only when --progress is specified.
> > But you think that this overhead by --progress is negligibly small?
> >
> > --------------------
> > When this is enabled, the backup will start by enumerating the size of
> > the entire database, and then go back and send the actual contents.
> > This may make the backup take slightly longer, and in particular it will
> > take longer before the first data is sent.
> > --------------------
>
> Sorry, I hadn't read this before.  So, my proposal would make this a lie.
>
> Still, if "streaming database files" is the longest phase, then not
> having even an approximation of how much data is to be streamed over
> doesn't much help estimating progress,  at least as long as one only
> has this view to look at.
>
> That said, the overhead of checking the size before sending any data
> may be worse for some people than others, so having the option to
> avoid that might be good after all.

By the way, if calculating backup total size can take significantly
long in some cases, that is when requested by specifying --progress,
then it might be a good idea to define a separate phase for that, like
"estimating backup size" or some such.  Currently, it's part of
"starting backup", which covers both running the checkpoint and size
estimation which run one after another.

I suspect people might never get stuck on "estimating backup size" as
they might on "running checkpoint", which perhaps only strengthens the
case for *always* calculating the size before sending the backup
header.

Thanks,
Amit



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/02/06 11:07, Amit Langote wrote:
> On Thu, Feb 6, 2020 at 9:51 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
>>> On Wed, Feb 5, 2020 at 18:25 Amit Langote <amitlangote09@gmail.com> wrote:
>>>> So, maybe you're saying that "waiting for checkpoint" is ambiguous and
>>>> most people will assume it means "...to start".  As for me, I assume
>>>> it ends with "...to finish".
>>
>> I'm not sure "most peple will assume" or not, so I said "I'm not
>> sure".  For example, I feel strangeness to use "I'm waiting for Amit"
>> to express that I'm waiting Amit to leave there.  That phrase gives me
>> such kind of uneasiness.
>>
>> I thought of "establishing checkpoint" or "running a checkpoint" as
>> other candidates.
> 
> Okay, I understand.  I am fine with "running checkpoint", although I
> think "waiting for checkpoint" isn't totally wrong either.

Yeah, but if "waiting for XXX" sounds a bit confusing to some people,
I'm OK to back to "waiting for XXX to finish" that you originally
proposed.

Attached the updated version of the patch. This patch uses the following
descriptions of the phases.

   waiting for checkpoint to finish
   estimating backup size
   streaming database files
   waiting for wal archiving to finish
   transferring wal files

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

Вложения

Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/02/06 11:35, Amit Langote wrote:
> On Wed, Feb 5, 2020 at 4:29 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> On Wed, Feb 5, 2020 at 3:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>> Yeah, I understand your concern. The pg_basebackup document explains
>>> the risk when --progress is specified, as follows. Since I imagined that
>>> someone may explicitly disable --progress to avoid this risk, I made
>>> the server estimate the total size only when --progress is specified.
>>> But you think that this overhead by --progress is negligibly small?
>>>
>>> --------------------
>>> When this is enabled, the backup will start by enumerating the size of
>>> the entire database, and then go back and send the actual contents.
>>> This may make the backup take slightly longer, and in particular it will
>>> take longer before the first data is sent.
>>> --------------------
>>
>> Sorry, I hadn't read this before.  So, my proposal would make this a lie.
>>
>> Still, if "streaming database files" is the longest phase, then not
>> having even an approximation of how much data is to be streamed over
>> doesn't much help estimating progress,  at least as long as one only
>> has this view to look at.
>>
>> That said, the overhead of checking the size before sending any data
>> may be worse for some people than others, so having the option to
>> avoid that might be good after all.
> 
> By the way, if calculating backup total size can take significantly
> long in some cases, that is when requested by specifying --progress,
> then it might be a good idea to define a separate phase for that, like
> "estimating backup size" or some such.  Currently, it's part of
> "starting backup", which covers both running the checkpoint and size
> estimation which run one after another.

OK, I added this phase in the latest patch that I posted upthread.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Amit Langote
Дата:
On Mon, Feb 17, 2020 at 10:00 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> On 2020/02/06 11:07, Amit Langote wrote:
> > On Thu, Feb 6, 2020 at 9:51 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> >> I thought of "establishing checkpoint" or "running a checkpoint" as
> >> other candidates.
> >
> > Okay, I understand.  I am fine with "running checkpoint", although I
> > think "waiting for checkpoint" isn't totally wrong either.
>
> Yeah, but if "waiting for XXX" sounds a bit confusing to some people,
> I'm OK to back to "waiting for XXX to finish" that you originally
> proposed.
>
> Attached the updated version of the patch. This patch uses the following
> descriptions of the phases.
>
>    waiting for checkpoint to finish
>    estimating backup size
>    streaming database files
>    waiting for wal archiving to finish
>    transferring wal files

Thanks for the new patch.

I noticed that there is missing </para> tag in the documentation changes:

+     <row>
+      <entry><literal>waiting for checkpoint to finish</literal></entry>
+      <entry>
+       The WAL sender process is currently performing
+       <function>pg_start_backup</function> to set up for
+       taking a base backup, and waiting for backup start
+       checkpoint to finish.
+      </entry>
+     <row>

There should be a </row> between </entry> and <row> at the end of the
hunk shown above.

Sorry for not saying it before, but the following text needs revisiting:

+  <para>
+   Whenever <application>pg_basebackup</application> is taking a base
+   backup, the <structname>pg_stat_progress_basebackup</structname>
+   view will contain a row for each WAL sender process that is currently
+   running <command>BASE_BACKUP</command> replication command
+   and streaming the backup.

I understand that you wrote "Whenever pg_basebackup is taking a
backup...", because description of other views contains a similar
starting line.  But, it may not only be pg_basebackup that would be
served by this view, no?  It could be any tool that speaks Postgres'
replication protocol and thus be able to send a BASE_BACKUP command.
If that is correct, I would write something like "When an application
is taking a backup" or some such without specific reference to
pg_basebackup.  Thoughts?

Thanks,
Amit



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/02/18 16:02, Amit Langote wrote:
> On Mon, Feb 17, 2020 at 10:00 PM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>> On 2020/02/06 11:07, Amit Langote wrote:
>>> On Thu, Feb 6, 2020 at 9:51 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>>>> I thought of "establishing checkpoint" or "running a checkpoint" as
>>>> other candidates.
>>>
>>> Okay, I understand.  I am fine with "running checkpoint", although I
>>> think "waiting for checkpoint" isn't totally wrong either.
>>
>> Yeah, but if "waiting for XXX" sounds a bit confusing to some people,
>> I'm OK to back to "waiting for XXX to finish" that you originally
>> proposed.
>>
>> Attached the updated version of the patch. This patch uses the following
>> descriptions of the phases.
>>
>>     waiting for checkpoint to finish
>>     estimating backup size
>>     streaming database files
>>     waiting for wal archiving to finish
>>     transferring wal files
> 
> Thanks for the new patch.

Thanks for reviewing the patch!

> I noticed that there is missing </para> tag in the documentation changes:

Could you tell me where I should add </para> tag?

> +     <row>
> +      <entry><literal>waiting for checkpoint to finish</literal></entry>
> +      <entry>
> +       The WAL sender process is currently performing
> +       <function>pg_start_backup</function> to set up for
> +       taking a base backup, and waiting for backup start
> +       checkpoint to finish.
> +      </entry>
> +     <row>
> 
> There should be a </row> between </entry> and <row> at the end of the
> hunk shown above.

Will fix. Thanks!

> Sorry for not saying it before, but the following text needs revisiting:

Of course, no problem. I'm happy to improve the patch!

> +  <para>
> +   Whenever <application>pg_basebackup</application> is taking a base
> +   backup, the <structname>pg_stat_progress_basebackup</structname>
> +   view will contain a row for each WAL sender process that is currently
> +   running <command>BASE_BACKUP</command> replication command
> +   and streaming the backup.
> 
> I understand that you wrote "Whenever pg_basebackup is taking a
> backup...", because description of other views contains a similar
> starting line.  But, it may not only be pg_basebackup that would be
> served by this view, no?  It could be any tool that speaks Postgres'
> replication protocol and thus be able to send a BASE_BACKUP command.
> If that is correct, I would write something like "When an application
> is taking a backup" or some such without specific reference to
> pg_basebackup.  Thoughts?

Yeah, there may be some such applications. But most users would
use pg_basebackup, so getting rid of the reference to pg_basebackup
would make the description a bit difficult-to-read. Also I can imagine
that an user of those backup applications would get to know
the progress reporting view from their documents. So I prefer
the existing one or something like "Whenever an application like
  pg_basebackup ...". Thought?

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Amit Langote
Дата:
On Tue, Feb 18, 2020 at 4:42 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2020/02/18 16:02, Amit Langote wrote:
> > I noticed that there is missing </para> tag in the documentation changes:
>
> Could you tell me where I should add </para> tag?
>
> > +     <row>
> > +      <entry><literal>waiting for checkpoint to finish</literal></entry>
> > +      <entry>
> > +       The WAL sender process is currently performing
> > +       <function>pg_start_backup</function> to set up for
> > +       taking a base backup, and waiting for backup start
> > +       checkpoint to finish.
> > +      </entry>
> > +     <row>
> >
> > There should be a </row> between </entry> and <row> at the end of the
> > hunk shown above.
>
> Will fix. Thanks!

Just to clarify, that's the missing </para> tag I am talking about above.

> > +  <para>
> > +   Whenever <application>pg_basebackup</application> is taking a base
> > +   backup, the <structname>pg_stat_progress_basebackup</structname>
> > +   view will contain a row for each WAL sender process that is currently
> > +   running <command>BASE_BACKUP</command> replication command
> > +   and streaming the backup.
> >
> > I understand that you wrote "Whenever pg_basebackup is taking a
> > backup...", because description of other views contains a similar
> > starting line.  But, it may not only be pg_basebackup that would be
> > served by this view, no?  It could be any tool that speaks Postgres'
> > replication protocol and thus be able to send a BASE_BACKUP command.
> > If that is correct, I would write something like "When an application
> > is taking a backup" or some such without specific reference to
> > pg_basebackup.  Thoughts?
>
> Yeah, there may be some such applications. But most users would
> use pg_basebackup, so getting rid of the reference to pg_basebackup
> would make the description a bit difficult-to-read. Also I can imagine
> that an user of those backup applications would get to know
> the progress reporting view from their documents. So I prefer
> the existing one or something like "Whenever an application like
>   pg_basebackup ...". Thought?

Sure, "an application like pg_basebackup" sounds fine to me.

Thanks,
Amit



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/02/18 16:53, Amit Langote wrote:
> On Tue, Feb 18, 2020 at 4:42 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> On 2020/02/18 16:02, Amit Langote wrote:
>>> I noticed that there is missing </para> tag in the documentation changes:
>>
>> Could you tell me where I should add </para> tag?
>>
>>> +     <row>
>>> +      <entry><literal>waiting for checkpoint to finish</literal></entry>
>>> +      <entry>
>>> +       The WAL sender process is currently performing
>>> +       <function>pg_start_backup</function> to set up for
>>> +       taking a base backup, and waiting for backup start
>>> +       checkpoint to finish.
>>> +      </entry>
>>> +     <row>
>>>
>>> There should be a </row> between </entry> and <row> at the end of the
>>> hunk shown above.
>>
>> Will fix. Thanks!
> 
> Just to clarify, that's the missing </para> tag I am talking about above.

OK, so I added </row> tag just after the above </entry>.

>>> +  <para>
>>> +   Whenever <application>pg_basebackup</application> is taking a base
>>> +   backup, the <structname>pg_stat_progress_basebackup</structname>
>>> +   view will contain a row for each WAL sender process that is currently
>>> +   running <command>BASE_BACKUP</command> replication command
>>> +   and streaming the backup.
>>>
>>> I understand that you wrote "Whenever pg_basebackup is taking a
>>> backup...", because description of other views contains a similar
>>> starting line.  But, it may not only be pg_basebackup that would be
>>> served by this view, no?  It could be any tool that speaks Postgres'
>>> replication protocol and thus be able to send a BASE_BACKUP command.
>>> If that is correct, I would write something like "When an application
>>> is taking a backup" or some such without specific reference to
>>> pg_basebackup.  Thoughts?
>>
>> Yeah, there may be some such applications. But most users would
>> use pg_basebackup, so getting rid of the reference to pg_basebackup
>> would make the description a bit difficult-to-read. Also I can imagine
>> that an user of those backup applications would get to know
>> the progress reporting view from their documents. So I prefer
>> the existing one or something like "Whenever an application like
>>    pg_basebackup ...". Thought?
> 
> Sure, "an application like pg_basebackup" sounds fine to me.

OK, I changed the doc that way. Attached the updated version of the patch.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

Вложения

Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Amit Langote
Дата:
On Tue, Feb 18, 2020 at 9:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> OK, I changed the doc that way. Attached the updated version of the patch.

Thank you.  Looks good to me.

Regards,
Amit



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/02/19 11:22, Amit Langote wrote:
> On Tue, Feb 18, 2020 at 9:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> OK, I changed the doc that way. Attached the updated version of the patch.
> 
> Thank you.  Looks good to me.

Thanks for the review!
You think that the patch can be marked as "ready for committer"?

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Amit Langote
Дата:
On Wed, Feb 19, 2020 at 9:49 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2020/02/19 11:22, Amit Langote wrote:
> > On Tue, Feb 18, 2020 at 9:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >> OK, I changed the doc that way. Attached the updated version of the patch.
> >
> > Thank you.  Looks good to me.
>
> Thanks for the review!
> You think that the patch can be marked as "ready for committer"?

As far as I am concerned, yes.  :)

Thanks,
Amit



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/02/18 21:31, Fujii Masao wrote:
> 
> 
> On 2020/02/18 16:53, Amit Langote wrote:
>> On Tue, Feb 18, 2020 at 4:42 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>> On 2020/02/18 16:02, Amit Langote wrote:
>>>> I noticed that there is missing </para> tag in the documentation changes:
>>>
>>> Could you tell me where I should add </para> tag?
>>>
>>>> +     <row>
>>>> +      <entry><literal>waiting for checkpoint to finish</literal></entry>
>>>> +      <entry>
>>>> +       The WAL sender process is currently performing
>>>> +       <function>pg_start_backup</function> to set up for
>>>> +       taking a base backup, and waiting for backup start
>>>> +       checkpoint to finish.
>>>> +      </entry>
>>>> +     <row>
>>>>
>>>> There should be a </row> between </entry> and <row> at the end of the
>>>> hunk shown above.
>>>
>>> Will fix. Thanks!
>>
>> Just to clarify, that's the missing </para> tag I am talking about above.
> 
> OK, so I added </row> tag just after the above </entry>.
> 
>>>> +  <para>
>>>> +   Whenever <application>pg_basebackup</application> is taking a base
>>>> +   backup, the <structname>pg_stat_progress_basebackup</structname>
>>>> +   view will contain a row for each WAL sender process that is currently
>>>> +   running <command>BASE_BACKUP</command> replication command
>>>> +   and streaming the backup.
>>>>
>>>> I understand that you wrote "Whenever pg_basebackup is taking a
>>>> backup...", because description of other views contains a similar
>>>> starting line.  But, it may not only be pg_basebackup that would be
>>>> served by this view, no?  It could be any tool that speaks Postgres'
>>>> replication protocol and thus be able to send a BASE_BACKUP command.
>>>> If that is correct, I would write something like "When an application
>>>> is taking a backup" or some such without specific reference to
>>>> pg_basebackup.  Thoughts?
>>>
>>> Yeah, there may be some such applications. But most users would
>>> use pg_basebackup, so getting rid of the reference to pg_basebackup
>>> would make the description a bit difficult-to-read. Also I can imagine
>>> that an user of those backup applications would get to know
>>> the progress reporting view from their documents. So I prefer
>>> the existing one or something like "Whenever an application like
>>>    pg_basebackup ...". Thought?
>>
>> Sure, "an application like pg_basebackup" sounds fine to me.
> 
> OK, I changed the doc that way. Attached the updated version of the patch.

Attached is the updated version of the patch.

The previous patch used only pgstat_progress_update_param()
even when updating multiple values. Since those updates are
not atomic, this can cause readers of the values to see
the intermediate states. To avoid this issue, the latest patch
uses pgstat_progress_update_multi_param(), instead.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

Вложения

Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/02/26 23:18, Fujii Masao wrote:
> 
> 
> On 2020/02/18 21:31, Fujii Masao wrote:
>>
>>
>> On 2020/02/18 16:53, Amit Langote wrote:
>>> On Tue, Feb 18, 2020 at 4:42 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>> On 2020/02/18 16:02, Amit Langote wrote:
>>>>> I noticed that there is missing </para> tag in the documentation changes:
>>>>
>>>> Could you tell me where I should add </para> tag?
>>>>
>>>>> +     <row>
>>>>> +      <entry><literal>waiting for checkpoint to finish</literal></entry>
>>>>> +      <entry>
>>>>> +       The WAL sender process is currently performing
>>>>> +       <function>pg_start_backup</function> to set up for
>>>>> +       taking a base backup, and waiting for backup start
>>>>> +       checkpoint to finish.
>>>>> +      </entry>
>>>>> +     <row>
>>>>>
>>>>> There should be a </row> between </entry> and <row> at the end of the
>>>>> hunk shown above.
>>>>
>>>> Will fix. Thanks!
>>>
>>> Just to clarify, that's the missing </para> tag I am talking about above.
>>
>> OK, so I added </row> tag just after the above </entry>.
>>
>>>>> +  <para>
>>>>> +   Whenever <application>pg_basebackup</application> is taking a base
>>>>> +   backup, the <structname>pg_stat_progress_basebackup</structname>
>>>>> +   view will contain a row for each WAL sender process that is currently
>>>>> +   running <command>BASE_BACKUP</command> replication command
>>>>> +   and streaming the backup.
>>>>>
>>>>> I understand that you wrote "Whenever pg_basebackup is taking a
>>>>> backup...", because description of other views contains a similar
>>>>> starting line.  But, it may not only be pg_basebackup that would be
>>>>> served by this view, no?  It could be any tool that speaks Postgres'
>>>>> replication protocol and thus be able to send a BASE_BACKUP command.
>>>>> If that is correct, I would write something like "When an application
>>>>> is taking a backup" or some such without specific reference to
>>>>> pg_basebackup.  Thoughts?
>>>>
>>>> Yeah, there may be some such applications. But most users would
>>>> use pg_basebackup, so getting rid of the reference to pg_basebackup
>>>> would make the description a bit difficult-to-read. Also I can imagine
>>>> that an user of those backup applications would get to know
>>>> the progress reporting view from their documents. So I prefer
>>>> the existing one or something like "Whenever an application like
>>>>    pg_basebackup ...". Thought?
>>>
>>> Sure, "an application like pg_basebackup" sounds fine to me.
>>
>> OK, I changed the doc that way. Attached the updated version of the patch.
> 
> Attached is the updated version of the patch.
> 
> The previous patch used only pgstat_progress_update_param()
> even when updating multiple values. Since those updates are
> not atomic, this can cause readers of the values to see
> the intermediate states. To avoid this issue, the latest patch
> uses pgstat_progress_update_multi_param(), instead.

Attached the updated version of the patch.
Barring any objections, I plan to commit this patch.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

Вложения

Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

От
Sergei Kornilov
Дата:
Hello

I reviewed a recently published patch. Looks good for me.
One small note: the values ​​for the new definitions in progress.h seems not to be aligned vertically. However,
pgindentdoesn't objects.
 

regards, Sergei



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Kyotaro Horiguchi
Дата:
At Mon, 2 Mar 2020 17:29:30 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> > Attached is the updated version of the patch.
> > The previous patch used only pgstat_progress_update_param()
> > even when updating multiple values. Since those updates are
> > not atomic, this can cause readers of the values to see
> > the intermediate states. To avoid this issue, the latest patch
> > uses pgstat_progress_update_multi_param(), instead.
> 
> Attached the updated version of the patch.
> Barring any objections, I plan to commit this patch.

It is working as designed and the status names are fine to me.

The last one comment from me.
The newly defined symbols have inconsistent indents.

===
#define PROGRESS_BASEBACKUP_PHASE                        0
#define PROGRESS_BASEBACKUP_BACKUP_TOTAL            1
#define PROGRESS_BASEBACKUP_BACKUP_STREAMED            2
#define PROGRESS_BASEBACKUP_TBLSPC_TOTAL                3
#define PROGRESS_BASEBACKUP_TBLSPC_STREAMED            4

/* Phases of pg_basebackup (as advertised via PROGRESS_BASEBACKUP_PHASE) */
#define PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT        1
#define PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE        2
#define PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP        3
#define PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE        4
#define PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL        5
====

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/03/02 19:27, Sergei Kornilov wrote:
> Hello
> 
> I reviewed a recently published patch. Looks good for me.

Thanks for the review! I pushed the patch.

> One small note: the values ​​for the new definitions in progress.h seems not to be aligned vertically. However,
pgindentdoesn't objects.
 

Yes, I fixed that.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/03/03 9:27, Kyotaro Horiguchi wrote:
> At Mon, 2 Mar 2020 17:29:30 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>> Attached is the updated version of the patch.
>>> The previous patch used only pgstat_progress_update_param()
>>> even when updating multiple values. Since those updates are
>>> not atomic, this can cause readers of the values to see
>>> the intermediate states. To avoid this issue, the latest patch
>>> uses pgstat_progress_update_multi_param(), instead.
>>
>> Attached the updated version of the patch.
>> Barring any objections, I plan to commit this patch.
> 
> It is working as designed and the status names are fine to me.

Thanks for the review! I pushed the patch.

> The last one comment from me.
> The newly defined symbols have inconsistent indents.

Yes, I fixed that.

Regards,


-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



RE: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
"Shinoda, Noriyoshi (PN Japan A&PS Delivery)"
Дата:
Hi, 

Thank you for developing good features.
The attached patch is a small fix to the committed documentation. This patch fixes the description literal for the
backup_streamedcolumn.
 

Regards,
Noriyoshi Shinoda

-----Original Message-----
From: Fujii Masao [mailto:masao.fujii@oss.nttdata.com] 
Sent: Tuesday, March 3, 2020 12:09 PM
To: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Cc: amitlangote09@gmail.com; masahiko.sawada@2ndquadrant.com; pgsql-hackers@postgresql.org
Subject: Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side



On 2020/03/03 9:27, Kyotaro Horiguchi wrote:
> At Mon, 2 Mar 2020 17:29:30 +0900, Fujii Masao 
> <masao.fujii@oss.nttdata.com> wrote in
>>> Attached is the updated version of the patch.
>>> The previous patch used only pgstat_progress_update_param() even 
>>> when updating multiple values. Since those updates are not atomic, 
>>> this can cause readers of the values to see the intermediate states. 
>>> To avoid this issue, the latest patch uses 
>>> pgstat_progress_update_multi_param(), instead.
>>
>> Attached the updated version of the patch.
>> Barring any objections, I plan to commit this patch.
> 
> It is working as designed and the status names are fine to me.

Thanks for the review! I pushed the patch.

> The last one comment from me.
> The newly defined symbols have inconsistent indents.

Yes, I fixed that.

Regards,


--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Вложения

Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/03/03 14:37, Shinoda, Noriyoshi (PN Japan A&PS Delivery) wrote:
> Hi,
> 
> Thank you for developing good features.
> The attached patch is a small fix to the committed documentation. This patch fixes the description literal for the
backup_streamedcolumn.
 

Thanks for the report and patch! Pushed.

Regards,


-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Magnus Hagander
Дата:
On Mon, Mar 2, 2020 at 10:03 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2020/03/03 14:37, Shinoda, Noriyoshi (PN Japan A&PS Delivery) wrote:
> > Hi,
> >
> > Thank you for developing good features.
> > The attached patch is a small fix to the committed documentation. This patch fixes the description literal for the
backup_streamedcolumn.
 
>
> Thanks for the report and patch! Pushed.

This patch requires, AIUI, that you add -P to the pg_basebackup
commandline in order to get the progress tracked in details
serverside. But this also generates output in the client that one
might not want.

Should we perhaps have a switch in pg_basebackup that enables the
server side tracking only, without generating output in the client?

//Magnus



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/03/05 9:31, Magnus Hagander wrote:
> On Mon, Mar 2, 2020 at 10:03 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>>
>>
>> On 2020/03/03 14:37, Shinoda, Noriyoshi (PN Japan A&PS Delivery) wrote:
>>> Hi,
>>>
>>> Thank you for developing good features.
>>> The attached patch is a small fix to the committed documentation. This patch fixes the description literal for the
backup_streamedcolumn.
 
>>
>> Thanks for the report and patch! Pushed.
> 
> This patch requires, AIUI, that you add -P to the pg_basebackup
> commandline in order to get the progress tracked in details
> serverside.

Whether --progress is enabled or not, the pg_stat_progress_basebackup
view report the progress of the backup in the server side. But the total
amount of data that will be streamed is estimated and reported only when
this option is enabled.

> But this also generates output in the client that one
> might not want.
> 
> Should we perhaps have a switch in pg_basebackup that enables the
> server side tracking only, without generating output in the client?

Yes, this sounds reasonable.

I have two ideas.

(1) Extend --progress option so that it accepts the setting values like
     none, server, both (better names?). If both is specified, PROGRESS
     option is specified in BASE_BACKUP replication command and
     the total backup size is estimated in the server side, but the progress
     is not reported in a client side. If none, PROGRESS option is not
     specified in BASE_BACKUP. The demerit of this idea is that --progress
     option without argument is not supported yet and the existing
     application using --progress option when using pg_basebackup needs
     to be updated when upgrading PostgreSQL version to v13.

(2) Add something like --estimate-backup-size (better name?) option
     to pg_basebackup. If it's specified, PROGRESS option is specified but
     the progress is not reported in a client side.

Thought?

Or, as another approach, it might be worth considering to make
the server always estimate the total backup size whether --progress is
specified or not, as Amit argued upthread. If the time required to
estimate the backup size is negligible compared to total backup time,
IMO this approach seems better. If we adopt this, we can also get
rid of PROGESS option from BASE_BACKUP replication command.

Regards,


-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Peter Eisentraut
Дата:
On 2020-03-05 05:53, Fujii Masao wrote:
> Or, as another approach, it might be worth considering to make
> the server always estimate the total backup size whether --progress is
> specified or not, as Amit argued upthread. If the time required to
> estimate the backup size is negligible compared to total backup time,
> IMO this approach seems better. If we adopt this, we can also get
> rid of PROGESS option from BASE_BACKUP replication command.

I think that would be preferable.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Julien Rouhaud
Дата:
On Thu, Mar 5, 2020 at 8:15 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2020-03-05 05:53, Fujii Masao wrote:
> > Or, as another approach, it might be worth considering to make
> > the server always estimate the total backup size whether --progress is
> > specified or not, as Amit argued upthread. If the time required to
> > estimate the backup size is negligible compared to total backup time,
> > IMO this approach seems better. If we adopt this, we can also get
> > rid of PROGESS option from BASE_BACKUP replication command.
>
> I think that would be preferable.

+1



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Jehan-Guillaume de Rorthais
Дата:
On Thu, 5 Mar 2020 10:32:45 +0100
Julien Rouhaud <rjuju123@gmail.com> wrote:

> On Thu, Mar 5, 2020 at 8:15 AM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> >
> > On 2020-03-05 05:53, Fujii Masao wrote:  
> > > Or, as another approach, it might be worth considering to make
> > > the server always estimate the total backup size whether --progress is
> > > specified or not, as Amit argued upthread. If the time required to
> > > estimate the backup size is negligible compared to total backup time,
> > > IMO this approach seems better. If we adopt this, we can also get
> > > rid of PROGESS option from BASE_BACKUP replication command.  
> >
> > I think that would be preferable.  
> 
> +1

+1



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Kyotaro Horiguchi
Дата:
At Thu, 5 Mar 2020 10:32:45 +0100, Julien Rouhaud <rjuju123@gmail.com> wrote in 
> On Thu, Mar 5, 2020 at 8:15 AM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> >
> > On 2020-03-05 05:53, Fujii Masao wrote:
> > > Or, as another approach, it might be worth considering to make
> > > the server always estimate the total backup size whether --progress is
> > > specified or not, as Amit argued upthread. If the time required to
> > > estimate the backup size is negligible compared to total backup time,
> > > IMO this approach seems better. If we adopt this, we can also get
> > > rid of PROGESS option from BASE_BACKUP replication command.
> >
> > I think that would be preferable.
> 
> +1

+1

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Magnus Hagander
Дата:
On Wed, Mar 4, 2020 at 11:15 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2020-03-05 05:53, Fujii Masao wrote:
> > Or, as another approach, it might be worth considering to make
> > the server always estimate the total backup size whether --progress is
> > specified or not, as Amit argued upthread. If the time required to
> > estimate the backup size is negligible compared to total backup time,
> > IMO this approach seems better. If we adopt this, we can also get
> > rid of PROGESS option from BASE_BACKUP replication command.
>
> I think that would be preferable.

From a UI perspective I definitely agree.

The problem with that one is that it can take a non-trivlal amount of
time, that's why it was made an option (in the protocol) in the first
place. Particularly if you have a database with many small objets.

Is it enough to care about? I'm not sure, but it's definitely
something to consider. It was not negligible in some tests I ran then,
but it is quite some time ago and reality has definitely changed since
then.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/03/06 0:45, Magnus Hagander wrote:
> On Wed, Mar 4, 2020 at 11:15 PM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>>
>> On 2020-03-05 05:53, Fujii Masao wrote:
>>> Or, as another approach, it might be worth considering to make
>>> the server always estimate the total backup size whether --progress is
>>> specified or not, as Amit argued upthread. If the time required to
>>> estimate the backup size is negligible compared to total backup time,
>>> IMO this approach seems better. If we adopt this, we can also get
>>> rid of PROGESS option from BASE_BACKUP replication command.
>>
>> I think that would be preferable.
> 
>  From a UI perspective I definitely agree.
> 
> The problem with that one is that it can take a non-trivlal amount of
> time, that's why it was made an option (in the protocol) in the first
> place. Particularly if you have a database with many small objets.

Yeah, this is why I made the server estimate the total backup size
only when --progress is specified.

Another idea is;
- Make pg_basebackup specify PROGRESS option in BASE_BACKUP command
   whether --progress is specified or not. This causes the server to estimate
   the total backup size even when users don't specify --progress.
- Change pg_basebackup so that it treats --progress option as just a knob to
   determine whether to report the progress in a client-side.
- Add new option like --no-estimate-backup-size (better name?) to
   pg_basebackup. If this option is specified, pg_basebackup doesn't use
   PROGRESS in BASE_BACKUP and the server doesn't estimate the backup size.

I believe that the time required to estimate the backup size is not so large
in most cases, so in the above idea, most users don't need to specify more
option for the estimation. This is good for UI perspective.

OTOH, users who are worried about the estimation time can use
--no-estimate-backup-size option and skip the time-consuming estimation.

Thought?

Regards,


-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Magnus Hagander
Дата:
On Fri, Mar 6, 2020 at 1:51 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2020/03/06 0:45, Magnus Hagander wrote:
> > On Wed, Mar 4, 2020 at 11:15 PM Peter Eisentraut
> > <peter.eisentraut@2ndquadrant.com> wrote:
> >>
> >> On 2020-03-05 05:53, Fujii Masao wrote:
> >>> Or, as another approach, it might be worth considering to make
> >>> the server always estimate the total backup size whether --progress is
> >>> specified or not, as Amit argued upthread. If the time required to
> >>> estimate the backup size is negligible compared to total backup time,
> >>> IMO this approach seems better. If we adopt this, we can also get
> >>> rid of PROGESS option from BASE_BACKUP replication command.
> >>
> >> I think that would be preferable.
> >
> >  From a UI perspective I definitely agree.
> >
> > The problem with that one is that it can take a non-trivlal amount of
> > time, that's why it was made an option (in the protocol) in the first
> > place. Particularly if you have a database with many small objets.
>
> Yeah, this is why I made the server estimate the total backup size
> only when --progress is specified.
>
> Another idea is;
> - Make pg_basebackup specify PROGRESS option in BASE_BACKUP command
>    whether --progress is specified or not. This causes the server to estimate
>    the total backup size even when users don't specify --progress.
> - Change pg_basebackup so that it treats --progress option as just a knob to
>    determine whether to report the progress in a client-side.
> - Add new option like --no-estimate-backup-size (better name?) to
>    pg_basebackup. If this option is specified, pg_basebackup doesn't use
>    PROGRESS in BASE_BACKUP and the server doesn't estimate the backup size.
>
> I believe that the time required to estimate the backup size is not so large
> in most cases, so in the above idea, most users don't need to specify more
> option for the estimation. This is good for UI perspective.
>
> OTOH, users who are worried about the estimation time can use
> --no-estimate-backup-size option and skip the time-consuming estimation.

Personally, I think this is the best idea. it brings a "reasonable
default", since most people are not going to have this problem, and
yet a good way to get out from the issue for those that potentially
have it. Especially since we are now already showing the state that
"walsender is estimating the size", it should be easy enugh for people
to determine if they need to use this flag or not.

In nitpicking mode, I'd just call the flag --no-estimate-size -- it's
pretty clear things are about backups when you call pg_basebackup, and
it keeps the option a bit more reasonable in length.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Kyotaro Horiguchi
Дата:
At Fri, 6 Mar 2020 09:54:09 -0800, Magnus Hagander <magnus@hagander.net> wrote in 
> On Fri, Mar 6, 2020 at 1:51 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > I believe that the time required to estimate the backup size is not so large
> > in most cases, so in the above idea, most users don't need to specify more
> > option for the estimation. This is good for UI perspective.
> >
> > OTOH, users who are worried about the estimation time can use
> > --no-estimate-backup-size option and skip the time-consuming estimation.
> 
> Personally, I think this is the best idea. it brings a "reasonable
> default", since most people are not going to have this problem, and
> yet a good way to get out from the issue for those that potentially
> have it. Especially since we are now already showing the state that
> "walsender is estimating the size", it should be easy enugh for people
> to determine if they need to use this flag or not.
> 
> In nitpicking mode, I'd just call the flag --no-estimate-size -- it's
> pretty clear things are about backups when you call pg_basebackup, and
> it keeps the option a bit more reasonable in length.

I agree to the negative option and the shortened name.  What if both
--no-estimate-size and -P are specifed?  Rejecting as conflicting
options or -P supercedes?  I would choose the former because we don't
know which of them has priority.

$ pg_basebackup --no-estimate-size -P
pg_basebackup: -P requires size estimate.
$ 

regads.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Magnus Hagander
Дата:
On Sun, Mar 8, 2020 at 10:13 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Fri, 6 Mar 2020 09:54:09 -0800, Magnus Hagander <magnus@hagander.net> wrote in
> > On Fri, Mar 6, 2020 at 1:51 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > > I believe that the time required to estimate the backup size is not so large
> > > in most cases, so in the above idea, most users don't need to specify more
> > > option for the estimation. This is good for UI perspective.
> > >
> > > OTOH, users who are worried about the estimation time can use
> > > --no-estimate-backup-size option and skip the time-consuming estimation.
> >
> > Personally, I think this is the best idea. it brings a "reasonable
> > default", since most people are not going to have this problem, and
> > yet a good way to get out from the issue for those that potentially
> > have it. Especially since we are now already showing the state that
> > "walsender is estimating the size", it should be easy enugh for people
> > to determine if they need to use this flag or not.
> >
> > In nitpicking mode, I'd just call the flag --no-estimate-size -- it's
> > pretty clear things are about backups when you call pg_basebackup, and
> > it keeps the option a bit more reasonable in length.
>
> I agree to the negative option and the shortened name.  What if both
> --no-estimate-size and -P are specifed?  Rejecting as conflicting
> options or -P supercedes?  I would choose the former because we don't
> know which of them has priority.

I would definitely prefer rejecting an invalid combination of options.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/03/09 14:21, Magnus Hagander wrote:
> On Sun, Mar 8, 2020 at 10:13 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
>>
>> At Fri, 6 Mar 2020 09:54:09 -0800, Magnus Hagander <magnus@hagander.net> wrote in
>>> On Fri, Mar 6, 2020 at 1:51 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>> I believe that the time required to estimate the backup size is not so large
>>>> in most cases, so in the above idea, most users don't need to specify more
>>>> option for the estimation. This is good for UI perspective.
>>>>
>>>> OTOH, users who are worried about the estimation time can use
>>>> --no-estimate-backup-size option and skip the time-consuming estimation.
>>>
>>> Personally, I think this is the best idea. it brings a "reasonable
>>> default", since most people are not going to have this problem, and
>>> yet a good way to get out from the issue for those that potentially
>>> have it. Especially since we are now already showing the state that
>>> "walsender is estimating the size", it should be easy enugh for people
>>> to determine if they need to use this flag or not.
>>>
>>> In nitpicking mode, I'd just call the flag --no-estimate-size -- it's
>>> pretty clear things are about backups when you call pg_basebackup, and
>>> it keeps the option a bit more reasonable in length.

+1

>> I agree to the negative option and the shortened name.  What if both
>> --no-estimate-size and -P are specifed?  Rejecting as conflicting
>> options or -P supercedes?  I would choose the former because we don't
>> know which of them has priority.
> 
> I would definitely prefer rejecting an invalid combination of options.

+1

So, I will make the patch adding support for --no-estimate-size option
in pg_basebackup.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/03/10 11:36, Fujii Masao wrote:
> 
> 
> On 2020/03/09 14:21, Magnus Hagander wrote:
>> On Sun, Mar 8, 2020 at 10:13 PM Kyotaro Horiguchi
>> <horikyota.ntt@gmail.com> wrote:
>>>
>>> At Fri, 6 Mar 2020 09:54:09 -0800, Magnus Hagander <magnus@hagander.net> wrote in
>>>> On Fri, Mar 6, 2020 at 1:51 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>>> I believe that the time required to estimate the backup size is not so large
>>>>> in most cases, so in the above idea, most users don't need to specify more
>>>>> option for the estimation. This is good for UI perspective.
>>>>>
>>>>> OTOH, users who are worried about the estimation time can use
>>>>> --no-estimate-backup-size option and skip the time-consuming estimation.
>>>>
>>>> Personally, I think this is the best idea. it brings a "reasonable
>>>> default", since most people are not going to have this problem, and
>>>> yet a good way to get out from the issue for those that potentially
>>>> have it. Especially since we are now already showing the state that
>>>> "walsender is estimating the size", it should be easy enugh for people
>>>> to determine if they need to use this flag or not.
>>>>
>>>> In nitpicking mode, I'd just call the flag --no-estimate-size -- it's
>>>> pretty clear things are about backups when you call pg_basebackup, and
>>>> it keeps the option a bit more reasonable in length.
> 
> +1
> 
>>> I agree to the negative option and the shortened name.  What if both
>>> --no-estimate-size and -P are specifed?  Rejecting as conflicting
>>> options or -P supercedes?  I would choose the former because we don't
>>> know which of them has priority.
>>
>> I would definitely prefer rejecting an invalid combination of options.
> 
> +1
> 
> So, I will make the patch adding support for --no-estimate-size option
> in pg_basebackup.

Patch attached.

Regards,


-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

Вложения

Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Amit Langote
Дата:
On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > So, I will make the patch adding support for --no-estimate-size option
> > in pg_basebackup.
>
> Patch attached.

Like the idea and the patch looks mostly good.

+      total size. If the estimation is disabled in
+      <application>pg_basebackup</application>
+      (i.e., <literal>--no-estimate-size</literal> option is specified),
+      this is always <literal>0</literal>.

"always" seems unnecessary.

+        This option prevents the server from estimating the total
+        amount of backup data that will be streamed. In other words,
+        <literal>backup_total</literal> column in the
+        <structname>pg_stat_progress_basebackup</structname>
+        view always indicates <literal>0</literal> if this option is enabled.

Here too.

Thanks,
Amit



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/03/10 22:43, Amit Langote wrote:
> On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>> So, I will make the patch adding support for --no-estimate-size option
>>> in pg_basebackup.
>>
>> Patch attached.
> 
> Like the idea and the patch looks mostly good.

Thanks for reviewing the patch!

> +      total size. If the estimation is disabled in
> +      <application>pg_basebackup</application>
> +      (i.e., <literal>--no-estimate-size</literal> option is specified),
> +      this is always <literal>0</literal>.
> 
> "always" seems unnecessary.

Fixed.

> +        This option prevents the server from estimating the total
> +        amount of backup data that will be streamed. In other words,
> +        <literal>backup_total</literal> column in the
> +        <structname>pg_stat_progress_basebackup</structname>
> +        view always indicates <literal>0</literal> if this option is enabled.
> 
> Here too.

Fixed.

Attached is the updated version of the patch.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

Вложения

Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Magnus Hagander
Дата:
On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2020/03/10 22:43, Amit Langote wrote:
> > On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>> So, I will make the patch adding support for --no-estimate-size option
> >>> in pg_basebackup.
> >>
> >> Patch attached.
> >
> > Like the idea and the patch looks mostly good.
>
> Thanks for reviewing the patch!
>
> > +      total size. If the estimation is disabled in
> > +      <application>pg_basebackup</application>
> > +      (i.e., <literal>--no-estimate-size</literal> option is specified),
> > +      this is always <literal>0</literal>.
> >
> > "always" seems unnecessary.
>
> Fixed.
>
> > +        This option prevents the server from estimating the total
> > +        amount of backup data that will be streamed. In other words,
> > +        <literal>backup_total</literal> column in the
> > +        <structname>pg_stat_progress_basebackup</structname>
> > +        view always indicates <literal>0</literal> if this option is enabled.
> >
> > Here too.
>
> Fixed.
>
> Attached is the updated version of the patch.

Would it perhaps be better to return NULL instead of 0 in the
statistics view if there is no data?

Also, should it really  be the server version that decides how this
feature behaves, and not the pg_basebackup version? Given that the
implementation is entirely in the client, it seems that's more
logical?


and a few docs nitpicks:

        <para>
         Whether this is enabled or not, the
         <structname>pg_stat_progress_basebackup</structname> view
-        report the progress of the backup in the server side. But note
-        that the total amount of data that will be streamed is estimated
-        and reported only when this option is enabled. In other words,
-        <literal>backup_total</literal> column in the view always
-        indicates <literal>0</literal> if this option is disabled.
+        report the progress of the backup in the server side.
+       </para>
+       <para>
+        This option is not allowed when using
+        <option>--no-estimate-size</option>.
        </para>

I think you should just remove that whole paragraph. The details are
now listed under the disable parameter.

+        This option prevents the server from estimating the total
+        amount of backup data that will be streamed. In other words,
+        <literal>backup_total</literal> column in the
+        <structname>pg_stat_progress_basebackup</structname>
+        view indicates <literal>0</literal> if this option is enabled.

I suggest just "This option prevents the server from estimating the
total amount of backup data that will be streamed, resulting in the
ackup_total column in pg_stat_progress_basebackup to be (zero or NULL
depending on above)".

(Markup needed on that of course ,but you get the idea)

+        When this is disabled, the backup will start by enumerating

I'd try to avoid the double negation, with something "without this
parameter, the backup will start..."



+  <para>
+   <application>pg_basebackup</application> asks the server to estimate
+   the total amount of data that will be streamed by default (unless
+   <option>--no-estimate-size</option> is specified) in version 13 or later,
+   and does that only when <option>--progress</option> is specified in
+   the older versions.
+  </para>

That's an item for the release notes, not for the reference page, I
think. It's already explained under the --disable parameter, so I
suggest removing this paragraph as well.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Amit Langote
Дата:
On Wed, Mar 11, 2020 at 3:39 AM Magnus Hagander <magnus@hagander.net> wrote:
> On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > Attached is the updated version of the patch.
>
> Would it perhaps be better to return NULL instead of 0 in the
> statistics view if there is no data?

NULL sounds better than 0.

Thank you,
Amit



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/03/11 13:44, Amit Langote wrote:
> On Wed, Mar 11, 2020 at 3:39 AM Magnus Hagander <magnus@hagander.net> wrote:
>> On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>> Attached is the updated version of the patch.
>>
>> Would it perhaps be better to return NULL instead of 0 in the
>> statistics view if there is no data?
> 
> NULL sounds better than 0.

Ok, I will make the patch that changes the view so that NULL is
returned instead of 0. I'm thinking to commit that patch
after applying the change of pg_basebackup client side first.

Regards,


-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/03/11 3:39, Magnus Hagander wrote:
> On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>>
>>
>> On 2020/03/10 22:43, Amit Langote wrote:
>>> On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>>> So, I will make the patch adding support for --no-estimate-size option
>>>>> in pg_basebackup.
>>>>
>>>> Patch attached.
>>>
>>> Like the idea and the patch looks mostly good.
>>
>> Thanks for reviewing the patch!
>>
>>> +      total size. If the estimation is disabled in
>>> +      <application>pg_basebackup</application>
>>> +      (i.e., <literal>--no-estimate-size</literal> option is specified),
>>> +      this is always <literal>0</literal>.
>>>
>>> "always" seems unnecessary.
>>
>> Fixed.
>>
>>> +        This option prevents the server from estimating the total
>>> +        amount of backup data that will be streamed. In other words,
>>> +        <literal>backup_total</literal> column in the
>>> +        <structname>pg_stat_progress_basebackup</structname>
>>> +        view always indicates <literal>0</literal> if this option is enabled.
>>>
>>> Here too.
>>
>> Fixed.
>>
>> Attached is the updated version of the patch.
> 
> Would it perhaps be better to return NULL instead of 0 in the
> statistics view if there is no data?
> 
> Also, should it really  be the server version that decides how this
> feature behaves, and not the pg_basebackup version? Given that the
> implementation is entirely in the client, it seems that's more
> logical?

Yeah, you're right. I changed the patch that way.
Attached is the updated version of the patch.
  
> and a few docs nitpicks:
> 
>          <para>
>           Whether this is enabled or not, the
>           <structname>pg_stat_progress_basebackup</structname> view
> -        report the progress of the backup in the server side. But note
> -        that the total amount of data that will be streamed is estimated
> -        and reported only when this option is enabled. In other words,
> -        <literal>backup_total</literal> column in the view always
> -        indicates <literal>0</literal> if this option is disabled.
> +        report the progress of the backup in the server side.
> +       </para>
> +       <para>
> +        This option is not allowed when using
> +        <option>--no-estimate-size</option>.
>          </para>
> 
> I think you should just remove that whole paragraph. The details are
> now listed under the disable parameter.

Fixed.

> +        This option prevents the server from estimating the total
> +        amount of backup data that will be streamed. In other words,
> +        <literal>backup_total</literal> column in the
> +        <structname>pg_stat_progress_basebackup</structname>
> +        view indicates <literal>0</literal> if this option is enabled.
> 
> I suggest just "This option prevents the server from estimating the
> total amount of backup data that will be streamed, resulting in the
> ackup_total column in pg_stat_progress_basebackup to be (zero or NULL
> depending on above)".
> 
> (Markup needed on that of course ,but you get the idea)

Yes, fixed.

> +        When this is disabled, the backup will start by enumerating
> 
> I'd try to avoid the double negation, with something "without this
> parameter, the backup will start..."

Fixed. I used "Without using this option ...".
  
> +  <para>
> +   <application>pg_basebackup</application> asks the server to estimate
> +   the total amount of data that will be streamed by default (unless
> +   <option>--no-estimate-size</option> is specified) in version 13 or later,
> +   and does that only when <option>--progress</option> is specified in
> +   the older versions.
> +  </para>
> 
> That's an item for the release notes, not for the reference page, I
> think. It's already explained under the --disable parameter, so I
> suggest removing this paragraph as well.

Fixed.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

Вложения

Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Magnus Hagander
Дата:
On Wed, Mar 11, 2020 at 5:53 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2020/03/11 3:39, Magnus Hagander wrote:
> > On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>
> >>
> >>
> >> On 2020/03/10 22:43, Amit Langote wrote:
> >>> On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>>>> So, I will make the patch adding support for --no-estimate-size option
> >>>>> in pg_basebackup.
> >>>>
> >>>> Patch attached.
> >>>
> >>> Like the idea and the patch looks mostly good.
> >>
> >> Thanks for reviewing the patch!
> >>
> >>> +      total size. If the estimation is disabled in
> >>> +      <application>pg_basebackup</application>
> >>> +      (i.e., <literal>--no-estimate-size</literal> option is specified),
> >>> +      this is always <literal>0</literal>.
> >>>
> >>> "always" seems unnecessary.
> >>
> >> Fixed.
> >>
> >>> +        This option prevents the server from estimating the total
> >>> +        amount of backup data that will be streamed. In other words,
> >>> +        <literal>backup_total</literal> column in the
> >>> +        <structname>pg_stat_progress_basebackup</structname>
> >>> +        view always indicates <literal>0</literal> if this option is enabled.
> >>>
> >>> Here too.
> >>
> >> Fixed.
> >>
> >> Attached is the updated version of the patch.
> >
> > Would it perhaps be better to return NULL instead of 0 in the
> > statistics view if there is no data?

Did you miss this comment, or not agree? :)


> > Also, should it really  be the server version that decides how this
> > feature behaves, and not the pg_basebackup version? Given that the
> > implementation is entirely in the client, it seems that's more
> > logical?
>
> Yeah, you're right. I changed the patch that way.
> Attached is the updated version of the patch.

The other changes in it look good!

//Magnus



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/03/19 0:37, Magnus Hagander wrote:
> On Wed, Mar 11, 2020 at 5:53 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>>
>>
>> On 2020/03/11 3:39, Magnus Hagander wrote:
>>> On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2020/03/10 22:43, Amit Langote wrote:
>>>>> On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>>>>> So, I will make the patch adding support for --no-estimate-size option
>>>>>>> in pg_basebackup.
>>>>>>
>>>>>> Patch attached.
>>>>>
>>>>> Like the idea and the patch looks mostly good.
>>>>
>>>> Thanks for reviewing the patch!
>>>>
>>>>> +      total size. If the estimation is disabled in
>>>>> +      <application>pg_basebackup</application>
>>>>> +      (i.e., <literal>--no-estimate-size</literal> option is specified),
>>>>> +      this is always <literal>0</literal>.
>>>>>
>>>>> "always" seems unnecessary.
>>>>
>>>> Fixed.
>>>>
>>>>> +        This option prevents the server from estimating the total
>>>>> +        amount of backup data that will be streamed. In other words,
>>>>> +        <literal>backup_total</literal> column in the
>>>>> +        <structname>pg_stat_progress_basebackup</structname>
>>>>> +        view always indicates <literal>0</literal> if this option is enabled.
>>>>>
>>>>> Here too.
>>>>
>>>> Fixed.
>>>>
>>>> Attached is the updated version of the patch.
>>>
>>> Would it perhaps be better to return NULL instead of 0 in the
>>> statistics view if there is no data?
> 
> Did you miss this comment, or not agree? :)

Oh, I forgot to attached the patch... Patch attached.
This patch needs to be applied after applying
add_no_estimate_size_v3.patch.

>>> Also, should it really  be the server version that decides how this
>>> feature behaves, and not the pg_basebackup version? Given that the
>>> implementation is entirely in the client, it seems that's more
>>> logical?
>>
>> Yeah, you're right. I changed the patch that way.
>> Attached is the updated version of the patch.
> 
> The other changes in it look good!

Thanks for the review!

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

Вложения

Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Magnus Hagander
Дата:
On Wed, Mar 18, 2020 at 5:14 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2020/03/19 0:37, Magnus Hagander wrote:
> > On Wed, Mar 11, 2020 at 5:53 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>
> >>
> >>
> >> On 2020/03/11 3:39, Magnus Hagander wrote:
> >>> On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2020/03/10 22:43, Amit Langote wrote:
> >>>>> On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>>>>>> So, I will make the patch adding support for --no-estimate-size option
> >>>>>>> in pg_basebackup.
> >>>>>>
> >>>>>> Patch attached.
> >>>>>
> >>>>> Like the idea and the patch looks mostly good.
> >>>>
> >>>> Thanks for reviewing the patch!
> >>>>
> >>>>> +      total size. If the estimation is disabled in
> >>>>> +      <application>pg_basebackup</application>
> >>>>> +      (i.e., <literal>--no-estimate-size</literal> option is specified),
> >>>>> +      this is always <literal>0</literal>.
> >>>>>
> >>>>> "always" seems unnecessary.
> >>>>
> >>>> Fixed.
> >>>>
> >>>>> +        This option prevents the server from estimating the total
> >>>>> +        amount of backup data that will be streamed. In other words,
> >>>>> +        <literal>backup_total</literal> column in the
> >>>>> +        <structname>pg_stat_progress_basebackup</structname>
> >>>>> +        view always indicates <literal>0</literal> if this option is enabled.
> >>>>>
> >>>>> Here too.
> >>>>
> >>>> Fixed.
> >>>>
> >>>> Attached is the updated version of the patch.
> >>>
> >>> Would it perhaps be better to return NULL instead of 0 in the
> >>> statistics view if there is no data?
> >
> > Did you miss this comment, or not agree? :)
>
> Oh, I forgot to attached the patch... Patch attached.
> This patch needs to be applied after applying
> add_no_estimate_size_v3.patch.

:)

Hmm. I'm slightly irked by doing the -1 -> NULL conversion in the SQL
view.  I wonder if it might be worth teaching
pg_stat_get_progress_info() about returning NULL?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/03/19 1:22, Magnus Hagander wrote:
> On Wed, Mar 18, 2020 at 5:14 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>>
>>
>> On 2020/03/19 0:37, Magnus Hagander wrote:
>>> On Wed, Mar 11, 2020 at 5:53 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2020/03/11 3:39, Magnus Hagander wrote:
>>>>> On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2020/03/10 22:43, Amit Langote wrote:
>>>>>>> On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>>>>>>> So, I will make the patch adding support for --no-estimate-size option
>>>>>>>>> in pg_basebackup.
>>>>>>>>
>>>>>>>> Patch attached.
>>>>>>>
>>>>>>> Like the idea and the patch looks mostly good.
>>>>>>
>>>>>> Thanks for reviewing the patch!
>>>>>>
>>>>>>> +      total size. If the estimation is disabled in
>>>>>>> +      <application>pg_basebackup</application>
>>>>>>> +      (i.e., <literal>--no-estimate-size</literal> option is specified),
>>>>>>> +      this is always <literal>0</literal>.
>>>>>>>
>>>>>>> "always" seems unnecessary.
>>>>>>
>>>>>> Fixed.
>>>>>>
>>>>>>> +        This option prevents the server from estimating the total
>>>>>>> +        amount of backup data that will be streamed. In other words,
>>>>>>> +        <literal>backup_total</literal> column in the
>>>>>>> +        <structname>pg_stat_progress_basebackup</structname>
>>>>>>> +        view always indicates <literal>0</literal> if this option is enabled.
>>>>>>>
>>>>>>> Here too.
>>>>>>
>>>>>> Fixed.
>>>>>>
>>>>>> Attached is the updated version of the patch.
>>>>>
>>>>> Would it perhaps be better to return NULL instead of 0 in the
>>>>> statistics view if there is no data?
>>>
>>> Did you miss this comment, or not agree? :)
>>
>> Oh, I forgot to attached the patch... Patch attached.
>> This patch needs to be applied after applying
>> add_no_estimate_size_v3.patch.
> 
> :)
> 
> Hmm. I'm slightly irked by doing the -1 -> NULL conversion in the SQL
> view.  I wonder if it might be worth teaching
> pg_stat_get_progress_info() about returning NULL?

That's possible by
- adding the boolean array like st_progress_null[PGSTAT_NUM_PROGRESS_PARAM]
    that indicates whether each column is NULL or not, into PgBackendStatus
- extending pgstat_progress_update_param() and pgstat_progress_update_multi_param()
    so that they can update the boolean array for NULL
- updating the progress reporting code so that the extended versions of
    function are used

I didn't adopt this idea because it looks a bit overkill for the purpose.
OTOH, this would be good improvement for the progress reporting
infrastructure and I'm fine to implement it.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Amit Langote
Дата:
Hello,

On Thu, Mar 19, 2020 at 1:47 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2020/03/19 1:22, Magnus Hagander wrote:
> >>>>> Would it perhaps be better to return NULL instead of 0 in the
> >>>>> statistics view if there is no data?
> >>>
> >>> Did you miss this comment, or not agree? :)
> >>
> >> Oh, I forgot to attached the patch... Patch attached.
> >> This patch needs to be applied after applying
> >> add_no_estimate_size_v3.patch.
> >
> > :)
> >
> > Hmm. I'm slightly irked by doing the -1 -> NULL conversion in the SQL
> > view.  I wonder if it might be worth teaching
> > pg_stat_get_progress_info() about returning NULL?
>
> That's possible by
> - adding the boolean array like st_progress_null[PGSTAT_NUM_PROGRESS_PARAM]
>     that indicates whether each column is NULL or not, into PgBackendStatus
> - extending pgstat_progress_update_param() and pgstat_progress_update_multi_param()
>     so that they can update the boolean array for NULL
> - updating the progress reporting code so that the extended versions of
>     function are used
>
> I didn't adopt this idea because it looks a bit overkill for the purpose.

I tend to agree that this would be too many changes for something only
one place currently needs to use.

> OTOH, this would be good improvement for the progress reporting
> infrastructure and I'm fine to implement it.

Magnus' idea of checking the values in pg_stat_get_progress_info() to
determine whether to return NULL seems fine to me.  We will need to
update the documentation of st_progress_param, because it currently
says:

     *  ...but the meaning of each element in the
     * st_progress_param array is command-specific.
     */
    ProgressCommandType st_progress_command;
    Oid         st_progress_command_target;
    int64       st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
} PgBackendStatus;

If we are to define -1 in st_progress_param[] as NULL to the users,
that must be mentioned here.

-- 
Thank you,
Amit



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Alvaro Herrera
Дата:
On 2020-Mar-19, Amit Langote wrote:

> Magnus' idea of checking the values in pg_stat_get_progress_info() to
> determine whether to return NULL seems fine to me.  We will need to
> update the documentation of st_progress_param, because it currently
> says:
> 
>      *  ...but the meaning of each element in the
>      * st_progress_param array is command-specific.
>      */
>     ProgressCommandType st_progress_command;
>     Oid         st_progress_command_target;
>     int64       st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
> } PgBackendStatus;
> 
> If we are to define -1 in st_progress_param[] as NULL to the users,
> that must be mentioned here.

Hmm, why -1?  It seems like a value that we might want to use for other
purposes in other params.  Maybe INT64_MIN is a better choice?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Amit Langote
Дата:
On Thu, Mar 19, 2020 at 11:24 AM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> On 2020-Mar-19, Amit Langote wrote:
>
> > Magnus' idea of checking the values in pg_stat_get_progress_info() to
> > determine whether to return NULL seems fine to me.  We will need to
> > update the documentation of st_progress_param, because it currently
> > says:
> >
> >      *  ...but the meaning of each element in the
> >      * st_progress_param array is command-specific.
> >      */
> >     ProgressCommandType st_progress_command;
> >     Oid         st_progress_command_target;
> >     int64       st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
> > } PgBackendStatus;
> >
> > If we are to define -1 in st_progress_param[] as NULL to the users,
> > that must be mentioned here.
>
> Hmm, why -1?  It seems like a value that we might want to use for other
> purposes in other params.  Maybe INT64_MIN is a better choice?

Yes, maybe.

-- 
Thank you,
Amit



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Alvaro Herrera
Дата:
On 2020-Mar-19, Amit Langote wrote:

> On Thu, Mar 19, 2020 at 11:24 AM Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > On 2020-Mar-19, Amit Langote wrote:
> >
> > > Magnus' idea of checking the values in pg_stat_get_progress_info() to
> > > determine whether to return NULL seems fine to me.  We will need to
> > > update the documentation of st_progress_param, because it currently
> > > says:
> > >
> > >      *  ...but the meaning of each element in the
> > >      * st_progress_param array is command-specific.
> > >      */
> > >     ProgressCommandType st_progress_command;
> > >     Oid         st_progress_command_target;
> > >     int64       st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
> > > } PgBackendStatus;
> > >
> > > If we are to define -1 in st_progress_param[] as NULL to the users,
> > > that must be mentioned here.
> >
> > Hmm, why -1?  It seems like a value that we might want to use for other
> > purposes in other params.  Maybe INT64_MIN is a better choice?
> 
> Yes, maybe.

Looking at the code involved, I think it's okay to use -1 in that
specific param and teach the SQL query to display null when it finds
that value.  We have plenty of magic numbers in the progress params, and
it's always the definition of the view that interprets them.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/03/19 11:32, Amit Langote wrote:
> On Thu, Mar 19, 2020 at 11:24 AM Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> On 2020-Mar-19, Amit Langote wrote:
>>
>>> Magnus' idea of checking the values in pg_stat_get_progress_info() to
>>> determine whether to return NULL seems fine to me.

So you think that the latest patch is good enough?

>>>  We will need to
>>> update the documentation of st_progress_param, because it currently
>>> says:
>>>
>>>       *  ...but the meaning of each element in the
>>>       * st_progress_param array is command-specific.
>>>       */
>>>      ProgressCommandType st_progress_command;
>>>      Oid         st_progress_command_target;
>>>      int64       st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
>>> } PgBackendStatus;
>>>
>>> If we are to define -1 in st_progress_param[] as NULL to the users,
>>> that must be mentioned here.
>>
>> Hmm, why -1?  It seems like a value that we might want to use for other
>> purposes in other params.  Maybe INT64_MIN is a better choice?
> 
> Yes, maybe.

I don't think that we need to define the specific value like -1 as NULL globally.
Which value should be used for that purpose may vary by each command. Only for
pg_stat_progress_basebackup.backup_total, IMO using -1 as special value for
NULL is not so bad idea.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Amit Langote
Дата:
On Thu, Mar 19, 2020 at 11:45 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> On 2020/03/19 11:32, Amit Langote wrote:
> > On Thu, Mar 19, 2020 at 11:24 AM Alvaro Herrera
> > <alvherre@2ndquadrant.com> wrote:
> >> On 2020-Mar-19, Amit Langote wrote:
> >>
> >>> Magnus' idea of checking the values in pg_stat_get_progress_info() to
> >>> determine whether to return NULL seems fine to me.
>
> So you think that the latest patch is good enough?

I see that the latest patch modifies pg_stat_progress_basebackup view
to return NULL, so not exactly.  IIUC, Magnus seems to be advocating
to *centralize* this in pg_stat_get_progress_info(), which all views
are based on, which means we need to globally define a NULL param
value, as Alvaro also pointed out.

But...

> >>>  We will need to
> >>> update the documentation of st_progress_param, because it currently
> >>> says:
> >>>
> >>>       *  ...but the meaning of each element in the
> >>>       * st_progress_param array is command-specific.
> >>>       */
> >>>      ProgressCommandType st_progress_command;
> >>>      Oid         st_progress_command_target;
> >>>      int64       st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
> >>> } PgBackendStatus;
> >>>
> >>> If we are to define -1 in st_progress_param[] as NULL to the users,
> >>> that must be mentioned here.
> >>
> >> Hmm, why -1?  It seems like a value that we might want to use for other
> >> purposes in other params.  Maybe INT64_MIN is a better choice?
> >
> > Yes, maybe.
>
> I don't think that we need to define the specific value like -1 as NULL globally.
> Which value should be used for that purpose may vary by each command. Only for
> pg_stat_progress_basebackup.backup_total, IMO using -1 as special value for
> NULL is not so bad idea.

This is the first instance of needing to display NULL in a progress
view, so a non-general solution may be enough for now.  IOW, your
latest patch is good enough for that. :)

--
Thank you,
Amit



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/03/19 1:13, Fujii Masao wrote:
> 
> 
> On 2020/03/19 0:37, Magnus Hagander wrote:
>> On Wed, Mar 11, 2020 at 5:53 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>
>>>
>>>
>>> On 2020/03/11 3:39, Magnus Hagander wrote:
>>>> On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2020/03/10 22:43, Amit Langote wrote:
>>>>>> On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>>>>>> So, I will make the patch adding support for --no-estimate-size option
>>>>>>>> in pg_basebackup.
>>>>>>>
>>>>>>> Patch attached.
>>>>>>
>>>>>> Like the idea and the patch looks mostly good.
>>>>>
>>>>> Thanks for reviewing the patch!
>>>>>
>>>>>> +      total size. If the estimation is disabled in
>>>>>> +      <application>pg_basebackup</application>
>>>>>> +      (i.e., <literal>--no-estimate-size</literal> option is specified),
>>>>>> +      this is always <literal>0</literal>.
>>>>>>
>>>>>> "always" seems unnecessary.
>>>>>
>>>>> Fixed.
>>>>>
>>>>>> +        This option prevents the server from estimating the total
>>>>>> +        amount of backup data that will be streamed. In other words,
>>>>>> +        <literal>backup_total</literal> column in the
>>>>>> +        <structname>pg_stat_progress_basebackup</structname>
>>>>>> +        view always indicates <literal>0</literal> if this option is enabled.
>>>>>>
>>>>>> Here too.
>>>>>
>>>>> Fixed.
>>>>>
>>>>> Attached is the updated version of the patch.
>>>>
>>>> Would it perhaps be better to return NULL instead of 0 in the
>>>> statistics view if there is no data?
>>
>> Did you miss this comment, or not agree? :)
> 
> Oh, I forgot to attached the patch... Patch attached.
> This patch needs to be applied after applying
> add_no_estimate_size_v3.patch.
> 
>>>> Also, should it really  be the server version that decides how this
>>>> feature behaves, and not the pg_basebackup version? Given that the
>>>> implementation is entirely in the client, it seems that's more
>>>> logical?
>>>
>>> Yeah, you're right. I changed the patch that way.
>>> Attached is the updated version of the patch.
>>
>> The other changes in it look good!
> 
> Thanks for the review!

Pushed! Thanks!

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/03/19 12:02, Amit Langote wrote:
> On Thu, Mar 19, 2020 at 11:45 AM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>> On 2020/03/19 11:32, Amit Langote wrote:
>>> On Thu, Mar 19, 2020 at 11:24 AM Alvaro Herrera
>>> <alvherre@2ndquadrant.com> wrote:
>>>> On 2020-Mar-19, Amit Langote wrote:
>>>>
>>>>> Magnus' idea of checking the values in pg_stat_get_progress_info() to
>>>>> determine whether to return NULL seems fine to me.
>>
>> So you think that the latest patch is good enough?
> 
> I see that the latest patch modifies pg_stat_progress_basebackup view
> to return NULL, so not exactly.  IIUC, Magnus seems to be advocating
> to *centralize* this in pg_stat_get_progress_info(), which all views
> are based on, which means we need to globally define a NULL param
> value, as Alvaro also pointed out.
> 
> But...
> 
>>>>>   We will need to
>>>>> update the documentation of st_progress_param, because it currently
>>>>> says:
>>>>>
>>>>>        *  ...but the meaning of each element in the
>>>>>        * st_progress_param array is command-specific.
>>>>>        */
>>>>>       ProgressCommandType st_progress_command;
>>>>>       Oid         st_progress_command_target;
>>>>>       int64       st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
>>>>> } PgBackendStatus;
>>>>>
>>>>> If we are to define -1 in st_progress_param[] as NULL to the users,
>>>>> that must be mentioned here.
>>>>
>>>> Hmm, why -1?  It seems like a value that we might want to use for other
>>>> purposes in other params.  Maybe INT64_MIN is a better choice?
>>>
>>> Yes, maybe.
>>
>> I don't think that we need to define the specific value like -1 as NULL globally.
>> Which value should be used for that purpose may vary by each command. Only for
>> pg_stat_progress_basebackup.backup_total, IMO using -1 as special value for
>> NULL is not so bad idea.
> 
> This is the first instance of needing to display NULL in a progress
> view, so a non-general solution may be enough for now.  IOW, your
> latest patch is good enough for that. :)

Ok, so barring any objection, I will commit the latest patch.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Andres Freund
Дата:
Hi,

On 2020-03-19 17:21:38 +0900, Fujii Masao wrote:
> Pushed! Thanks!

FWIW, I'm a bit doubtful that incuring the overhead of this by default
on everybody is a nice thing. On filesystems with high latency and with
a lot of small relations the overhead of stating a lot of files can be
almost as high as the actual base backup.

Greetings,

Andres Freund



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/03/20 3:39, Andres Freund wrote:
> Hi,
> 
> On 2020-03-19 17:21:38 +0900, Fujii Masao wrote:
>> Pushed! Thanks!
> 
> FWIW, I'm a bit doubtful that incuring the overhead of this by default
> on everybody is a nice thing. On filesystems with high latency and with
> a lot of small relations the overhead of stating a lot of files can be
> almost as high as the actual base backup.

Yeah, so if we receive lots of complaints like that during beta and
RC phases, we should consider to change the default behavior.

Also maybe I should measure how long the estimation takes on the env
where, for example, ten thousand tables (i.e., files) exist, in order to
whether the default behavior is really time-consuming or not?

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side

От
Fujii Masao
Дата:

On 2020/03/19 17:22, Fujii Masao wrote:
> 
> 
> On 2020/03/19 12:02, Amit Langote wrote:
>> On Thu, Mar 19, 2020 at 11:45 AM Fujii Masao
>> <masao.fujii@oss.nttdata.com> wrote:
>>> On 2020/03/19 11:32, Amit Langote wrote:
>>>> On Thu, Mar 19, 2020 at 11:24 AM Alvaro Herrera
>>>> <alvherre@2ndquadrant.com> wrote:
>>>>> On 2020-Mar-19, Amit Langote wrote:
>>>>>
>>>>>> Magnus' idea of checking the values in pg_stat_get_progress_info() to
>>>>>> determine whether to return NULL seems fine to me.
>>>
>>> So you think that the latest patch is good enough?
>>
>> I see that the latest patch modifies pg_stat_progress_basebackup view
>> to return NULL, so not exactly.  IIUC, Magnus seems to be advocating
>> to *centralize* this in pg_stat_get_progress_info(), which all views
>> are based on, which means we need to globally define a NULL param
>> value, as Alvaro also pointed out.
>>
>> But...
>>
>>>>>>   We will need to
>>>>>> update the documentation of st_progress_param, because it currently
>>>>>> says:
>>>>>>
>>>>>>        *  ...but the meaning of each element in the
>>>>>>        * st_progress_param array is command-specific.
>>>>>>        */
>>>>>>       ProgressCommandType st_progress_command;
>>>>>>       Oid         st_progress_command_target;
>>>>>>       int64       st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
>>>>>> } PgBackendStatus;
>>>>>>
>>>>>> If we are to define -1 in st_progress_param[] as NULL to the users,
>>>>>> that must be mentioned here.
>>>>>
>>>>> Hmm, why -1?  It seems like a value that we might want to use for other
>>>>> purposes in other params.  Maybe INT64_MIN is a better choice?
>>>>
>>>> Yes, maybe.
>>>
>>> I don't think that we need to define the specific value like -1 as NULL globally.
>>> Which value should be used for that purpose may vary by each command. Only for
>>> pg_stat_progress_basebackup.backup_total, IMO using -1 as special value for
>>> NULL is not so bad idea.
>>
>> This is the first instance of needing to display NULL in a progress
>> view, so a non-general solution may be enough for now.  IOW, your
>> latest patch is good enough for that. :)
> 
> Ok, so barring any objection, I will commit the latest patch.

Pushed! Thanks all!

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters