Обсуждение: Standby accepts recovery_target_timeline setting?

Поиск
Список
Период
Сортировка

Standby accepts recovery_target_timeline setting?

От
David Steele
Дата:
While testing against PG12 I noticed the documentation states that
recovery targets are not valid when standby.signal is present.

But surely the exception is recovery_target_timeline?  My testing
confirms that this works just as in prior versions with standy_mode=on.

Documentation patch is attached.

Regards,
-- 
-David
david@pgmasters.net

Вложения

Re: Standby accepts recovery_target_timeline setting?

От
Fujii Masao
Дата:
On Thu, Sep 26, 2019 at 5:22 AM David Steele <david@pgmasters.net> wrote:
>
> While testing against PG12 I noticed the documentation states that
> recovery targets are not valid when standby.signal is present.

Or that description in the doc is not true? Other recovery target
parameters seem to take effect even when standby.signal exists.

Regards,

-- 
Fujii Masao



Re: Standby accepts recovery_target_timeline setting?

От
David Steele
Дата:
On 9/26/19 5:55 AM, Fujii Masao wrote:
> On Thu, Sep 26, 2019 at 5:22 AM David Steele <david@pgmasters.net> wrote:
>>
>> While testing against PG12 I noticed the documentation states that
>> recovery targets are not valid when standby.signal is present.
> 
> Or that description in the doc is not true? Other recovery target
> parameters seem to take effect even when standby.signal exists.

Yes, and this is true for or any combination of recovery.signal and
standby signal as far as I can see.  We have been tracking down some
strange behaviors over the last few days as we have been adding PG12
support to pgBackRest.  Late in the day I know, but we just got the
relevant code migrated to C and we did not fancy coding it twice.

The main thing is if you set recovery_target_time in
postgresql.auto.conf then recovery will always try to hit that target
with any combination of recovery.signal and standby.signal.  But
target_action is only active when recovery.signal, standby.signal, or
both are present.

All these tests were done on 12rc1.

So given this postgresql.auto.conf:

recovery_target_time = '2019-09-26 14:39:51.280711+00'
restore_command = 'cp /home/vagrant/test/archive/%f "%p"'
recovery_target_timeline = current
recovery_target_action = promote

And these settings added to postgresql.conf:

wal_level = replica
archive_mode = on
archive_command = 'test ! -f /home/vagrant/test/archive/%f && cp %p
/home/vagrant/test/archive/%f'

And this backup_label:

START WAL LOCATION: 0/2000028 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/2000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2019-09-26 14:39:49 UTC
LABEL: pg_basebackup base backup
START TIMELINE: 1

The backup we are recovering contains a table that exists at the target
time but is dropped after that as an additional confirmation.  In all
the recovery scenarios below the table exists after recovery.

Here's what recovery looks like with recovery.signal:

2019-09-26 14:49:52.758 UTC [25353] LOG:  database system was
interrupted; last known up at 2019-09-26 14:39:49 UTC
2019-09-26 14:49:52.824 UTC [25353] LOG:  starting point-in-time
recovery to 2019-09-26 14:39:51.280711+00
2019-09-26 14:49:52.836 UTC [25353] LOG:  restored log file
"000000010000000000000002" from archive
2019-09-26 14:49:52.885 UTC [25353] LOG:  redo starts at 0/2000028
2019-09-26 14:49:52.894 UTC [25353] LOG:  consistent recovery state
reached at 0/2000100
2019-09-26 14:49:52.894 UTC [25352] LOG:  database system is ready to
accept read only connections
2019-09-26 14:49:52.905 UTC [25353] LOG:  restored log file
"000000010000000000000003" from archive
2019-09-26 14:49:52.940 UTC [25353] LOG:  recovery stopping before
commit of transaction 487, time 2019-09-26 14:39:54.981557+00
2019-09-26 14:49:52.940 UTC [25353] LOG:  redo done at 0/30096A0
cp: cannot stat '/home/vagrant/test/archive/00000002.history': No such
file or directory
2019-09-26 14:49:52.943 UTC [25353] LOG:  selected new timeline ID: 2
2019-09-26 14:49:52.998 UTC [25353] LOG:  archive recovery complete
2019-09-26 14:49:52.998 UTC [25353] LOG:  database system is ready to
accept connections

This is completely normal and what you would expect.

Now without recovery.signal from a fresh restore:

2019-09-26 14:52:29.491 UTC [25409] LOG:  database system was
interrupted; last known up at 2019-09-26 14:39:49 UTC
2019-09-26 14:52:29.574 UTC [25409] LOG:  restored log file
"000000010000000000000002" from archive
2019-09-26 14:52:29.622 UTC [25409] LOG:  redo starts at 0/2000028
2019-09-26 14:52:29.631 UTC [25409] LOG:  consistent recovery state
reached at 0/2000100
2019-09-26 14:52:29.642 UTC [25409] LOG:  restored log file
"000000010000000000000003" from archive
2019-09-26 14:52:29.666 UTC [25409] LOG:  recovery stopping before
commit of transaction 487, time 2019-09-26 14:39:54.981557+00
2019-09-26 14:52:29.666 UTC [25409] LOG:  redo done at 0/30096A0
2019-09-26 14:52:29.716 UTC [25408] LOG:  database system is ready to
accept connections

Now there is no "starting point-in-time recovery" message but we are
still stopping in the same place, "recovery stopping before commit of
transaction 487".  There is no promotion so now we are now logging on
timeline 1 (so there are duplicate WAL errors as soon as archive_command
runs).  In PG < 12 you could do this by shutting down, removing
recovery.conf and restarting, but it is now much easier to end up on the
same timeline.

Now with with standby.signal only from a fresh restore:

2019-09-26 14:59:36.889 UTC [25522] LOG:  database system was
interrupted; last known up at 2019-09-26 14:39:49 UTC
2019-09-26 14:59:36.983 UTC [25522] LOG:  entering standby mode
2019-09-26 14:59:36.994 UTC [25522] LOG:  restored log file
"000000010000000000000002" from archive
2019-09-26 14:59:37.038 UTC [25522] LOG:  redo starts at 0/2000028
2019-09-26 14:59:37.047 UTC [25522] LOG:  consistent recovery state
reached at 0/2000100
2019-09-26 14:59:37.047 UTC [25521] LOG:  database system is ready to
accept read only connections
2019-09-26 14:59:37.061 UTC [25522] LOG:  restored log file
"000000010000000000000003" from archive
2019-09-26 14:59:37.093 UTC [25522] LOG:  recovery stopping before
commit of transaction 487, time 2019-09-26 14:39:54.981557+00
2019-09-26 14:59:37.093 UTC [25522] LOG:  redo done at 0/30096A0
cp: cannot stat '/home/vagrant/test/archive/00000002.history': No such
file or directory
2019-09-26 14:59:37.097 UTC [25522] LOG:  selected new timeline ID: 2
2019-09-26 14:59:37.270 UTC [25522] LOG:  archive recovery complete
cp: cannot stat '/home/vagrant/test/archive/00000001.history': No such
file or directory
2019-09-26 14:59:37.338 UTC [25521] LOG:  database system is ready to
accept connections

The cluster starts in standby mode, hits the recovery target,
then promotes even though no recovery.signal is present.

And finally with both recovery.signal and standby.signal you get the
same as with standby.signal only.

I was able to get the same results using an xid target:

recovery_target_xid = 487
recovery_target_inclusive = false

All of this is roughly analogous to use cases that were possible
before, but there were fewer permutations then.  You had no standby and
no recovery target without recovery.conf so "recovery.signal" was always
there, more or less.

At the very least, according to the docs, none of the target options are
supposed to be active unless recovery.signal is in place.  Since
outdated entries in postgresql.auto.conf can have effect even in
the absence of recovery.signal, it seems pretty important to make sure
that mechanism is working correctly - or that the caveat is clearly
documented.

I do think this issue needs to be addressed before GA.

Fujii -- I just became aware of your email at [1] so I'll respond to
that as well.

-- 
-David
david@pgmasters.net

[1]
https://www.postgresql.org/message-id/CAHGQGwEYYg_Ng%2B03FtZczacCpYgJ2Pn%3DB_wPtWF%2BFFLYDgpa1g%40mail.gmail.com



Re: Standby accepts recovery_target_timeline setting?

От
Peter Eisentraut
Дата:
On 2019-09-25 22:21, David Steele wrote:
> While testing against PG12 I noticed the documentation states that
> recovery targets are not valid when standby.signal is present.
> 
> But surely the exception is recovery_target_timeline?  My testing
> confirms that this works just as in prior versions with standy_mode=on.

Or maybe we should move recovery_target_timeline to a different section?
 But which one?

I don't know if recovery_target_timeline is actually useful to change in
standby mode.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Standby accepts recovery_target_timeline setting?

От
David Steele
Дата:
On 9/26/19 4:48 PM, Peter Eisentraut wrote:
> On 2019-09-25 22:21, David Steele wrote:
>> While testing against PG12 I noticed the documentation states that
>> recovery targets are not valid when standby.signal is present.
>>
>> But surely the exception is recovery_target_timeline?  My testing
>> confirms that this works just as in prior versions with standy_mode=on.
> 
> Or maybe we should move recovery_target_timeline to a different section?
>   But which one?

Not sure.  I think just noting it as an exception is OK, if it is the 
only exception.  But currently that does not seem to be the case.

> I don't know if recovery_target_timeline is actually useful to change in
> standby mode.

It is.  I just dealt with a split-brain case that required the standbys 
to be rebuilt on a specific timeline (not latest).

Of course, you could do recovery on that timeline, shutdown, and then 
bring the cluster back up as a standby, but that seems like a lot of 
extra work.

But as Fujii noted and I've demonstrated in the follow-up pretty much 
all target options are allowed for standby recovery.  I don't think that 
makes sense, personally, but apparently it was allowed in prior versions 
so we'll need to think carefully before disallowing it.

-- 
-David
david@pgmasters.net



Re: Standby accepts recovery_target_timeline setting?

От
Peter Eisentraut
Дата:
On 2019-09-26 23:02, David Steele wrote:
> On 9/26/19 4:48 PM, Peter Eisentraut wrote:
>> On 2019-09-25 22:21, David Steele wrote:
>>> While testing against PG12 I noticed the documentation states that
>>> recovery targets are not valid when standby.signal is present.
>>>
>>> But surely the exception is recovery_target_timeline?  My testing
>>> confirms that this works just as in prior versions with standy_mode=on.
>>
>> Or maybe we should move recovery_target_timeline to a different section?
>>   But which one?
> 
> Not sure.  I think just noting it as an exception is OK, if it is the 
> only exception.  But currently that does not seem to be the case.
> 
>> I don't know if recovery_target_timeline is actually useful to change in
>> standby mode.

OK, I have committed your original documentation patch.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Standby accepts recovery_target_timeline setting?

От
David Steele
Дата:
Hi Peter,

On 9/27/19 10:36 AM, Peter Eisentraut wrote:
> On 2019-09-26 23:02, David Steele wrote:
>> On 9/26/19 4:48 PM, Peter Eisentraut wrote:
>>
>>> I don't know if recovery_target_timeline is actually useful to change in
>>> standby mode.
> 
> OK, I have committed your original documentation patch.

Thanks, that's a good start.

As Fujii noticed and I have demonstrated upthread, just about any target
setting can be used in a standby restore.  This matches the behavior of
prior versions so it's not exactly a regression, but the old docs made
no claim that standby_mode disabled targeted restore.

If fact, for both PG12 and before, setting a recovery target in standby
mode causes the cluster to drop out of standby mode.

Also, the presence or absence of recovery.signal does not seem to have
any effect on how targeted recovery proceeds, except as Fujii has
demonstrated in [1].

I'm not sure what the best thing to do is.  The docs are certainly
incorrect, but fixing them would be weird.  What do we say, setting
targets will exit standby mode?  That certainly what happens, though.

Also, the fact that target settings are being used when recovery.signal
is missing is contrary to the docs, and this is a new behavior in PG12.
 Prior to 12 you could not have target settings without recovery.conf
being present by definition.

I think, at the very least, the fact that targeted recovery proceeds in
the absence of recovery.signal represents a bug.

-- 
-David
david@pgmasters.net

[1]
https://www.postgresql.org/message-id/CAHGQGwEYYg_Ng%2B03FtZczacCpYgJ2Pn%3DB_wPtWF%2BFFLYDgpa1g%40mail.gmail.com



Re: Standby accepts recovery_target_timeline setting?

От
Fujii Masao
Дата:
On Sat, Sep 28, 2019 at 12:14 AM David Steele <david@pgmasters.net> wrote:
>
> Hi Peter,
>
> On 9/27/19 10:36 AM, Peter Eisentraut wrote:
> > On 2019-09-26 23:02, David Steele wrote:
> >> On 9/26/19 4:48 PM, Peter Eisentraut wrote:
> >>
> >>> I don't know if recovery_target_timeline is actually useful to change in
> >>> standby mode.
> >
> > OK, I have committed your original documentation patch.
>
> Thanks, that's a good start.
>
> As Fujii noticed and I have demonstrated upthread, just about any target
> setting can be used in a standby restore.  This matches the behavior of
> prior versions so it's not exactly a regression, but the old docs made
> no claim that standby_mode disabled targeted restore.
>
> If fact, for both PG12 and before, setting a recovery target in standby
> mode causes the cluster to drop out of standby mode.
>
> Also, the presence or absence of recovery.signal does not seem to have
> any effect on how targeted recovery proceeds, except as Fujii has
> demonstrated in [1].
>
> I'm not sure what the best thing to do is.  The docs are certainly
> incorrect, but fixing them would be weird.  What do we say, setting
> targets will exit standby mode?  That certainly what happens, though.
>
> Also, the fact that target settings are being used when recovery.signal
> is missing is contrary to the docs, and this is a new behavior in PG12.
>  Prior to 12 you could not have target settings without recovery.conf
> being present by definition.
>
> I think, at the very least, the fact that targeted recovery proceeds in
> the absence of recovery.signal represents a bug.

Yes, recovery target settings are used even when neither backup_label
nor recovery.signal exist, i.e., just a crash recovery, in v12. This is
completely different behavior from prior versions.

IMO, since v12 is RC1 now, it's not good idea to change the logic to new.
So at least for v12, we basically should change the recovery logic so that
it behaves in the same way as prior versions. That is,

- Stop the recovery with an error if any recovery target is set in
   crash recovery
- Use recovery target settings if set even when standby mode
- Do not enter an archive recovery mode if recovery.signal is missing

Thought?

If we want new behavior in recovery, we can change the logic for v13.

Regards,

-- 
Fujii Masao



Re: Standby accepts recovery_target_timeline setting?

От
David Steele
Дата:
On 9/27/19 11:58 AM, Fujii Masao wrote:
> On Sat, Sep 28, 2019 at 12:14 AM David Steele <david@pgmasters.net> wrote:
>>
>> I think, at the very least, the fact that targeted recovery proceeds in
>> the absence of recovery.signal represents a bug.
> 
> Yes, recovery target settings are used even when neither backup_label
> nor recovery.signal exist, i.e., just a crash recovery, in v12. This is
> completely different behavior from prior versions.

I'm not able to reproduce this.  I only see recovery settings being used
if backup_label, recovery.signal, or standby.signal is present.

Do you have an example?

> IMO, since v12 is RC1 now, it's not good idea to change the logic to new.
> So at least for v12, we basically should change the recovery logic so that
> it behaves in the same way as prior versions. That is,
> 
> - Stop the recovery with an error if any recovery target is set in
>    crash recovery

This seems reasonable.  I tried adding a recovery.signal and
restore_command for crash recovery and I just got an error that it
couldn't find 00000002.history in the archive.

> - Use recovery target settings if set even when standby mode

Yes, this is weird, but it is present in current versions.

> - Do not enter an archive recovery mode if recovery.signal is missing

Agreed.  Perhaps it's OK to use restore_command if a backup_label is
present, but we certainly should not be doing targeted recovery.

> If we want new behavior in recovery, we can change the logic for v13.

Agreed, but it's not at all clear to me how invasive these changes would be.

Regards,
-- 
-David
david@pgmasters.net



Re: Standby accepts recovery_target_timeline setting?

От
Fujii Masao
Дата:
On Sat, Sep 28, 2019 at 2:01 AM David Steele <david@pgmasters.net> wrote:
>
> On 9/27/19 11:58 AM, Fujii Masao wrote:
> > On Sat, Sep 28, 2019 at 12:14 AM David Steele <david@pgmasters.net> wrote:
> >>
> >> I think, at the very least, the fact that targeted recovery proceeds in
> >> the absence of recovery.signal represents a bug.
> >
> > Yes, recovery target settings are used even when neither backup_label
> > nor recovery.signal exist, i.e., just a crash recovery, in v12. This is
> > completely different behavior from prior versions.
>
> I'm not able to reproduce this.  I only see recovery settings being used
> if backup_label, recovery.signal, or standby.signal is present.
>
> Do you have an example?

Yes, here is the example:

initdb -D data
pg_ctl -D data start
psql -c "select pg_create_restore_point('hoge')"
psql -c "alter system set recovery_target_name to 'hoge'"
psql -c "create table test as select num from generate_series(1, 100) num"
pg_ctl -D data -m i stop
pg_ctl -D data start

After restarting the server at the above final step, you will see
the following log messages indicating that recovery stops at
recovery_target_name.

2019-09-28 22:42:04.849 JST [16944] LOG:  recovery stopping at restore
point "hoge", time 2019-09-28 22:42:03.86558+09
2019-09-28 22:42:04.849 JST [16944] FATAL:  requested recovery stop
point is before consistent recovery point

> > IMO, since v12 is RC1 now, it's not good idea to change the logic to new.
> > So at least for v12, we basically should change the recovery logic so that
> > it behaves in the same way as prior versions. That is,
> >
> > - Stop the recovery with an error if any recovery target is set in
> >    crash recovery
>
> This seems reasonable.  I tried adding a recovery.signal and
> restore_command for crash recovery and I just got an error that it
> couldn't find 00000002.history in the archive.

You added recovery.signal, so it means that you started an archive recovery,
not crash recovery. Right?

Anyway I'm thinking to apply something like attached patch, to emit an error
if recovery target is set in crash recovery.

> > - Use recovery target settings if set even when standby mode
>
> Yes, this is weird, but it is present in current versions.

Yes, and some users might be using this current behavior.
If we keep this behavior as it is in v12, the documentation
needs to be corrected.

> > - Do not enter an archive recovery mode if recovery.signal is missing
>
> Agreed.  Perhaps it's OK to use restore_command if a backup_label is
> present

Yeah, it's maybe OK, but differenet behavior from current version.
So, at least for v12, I'm inclined to prevent crash recovery with backup_label
from using restore_command, i.e., only WAL files in pg_wal will be replayed
in this case.

Regards,

-- 
Fujii Masao

Вложения

Re: Standby accepts recovery_target_timeline setting?

От
David Steele
Дата:
On 9/28/19 10:54 AM, Fujii Masao wrote:
> On Sat, Sep 28, 2019 at 2:01 AM David Steele <david@pgmasters.net> wrote:
>> On 9/27/19 11:58 AM, Fujii Masao wrote:
>>>
>>> Yes, recovery target settings are used even when neither backup_label
>>> nor recovery.signal exist, i.e., just a crash recovery, in v12. This is
>>> completely different behavior from prior versions.
>>
>> I'm not able to reproduce this.  I only see recovery settings being used
>> if backup_label, recovery.signal, or standby.signal is present.
>>
>> Do you have an example?
> 
> Yes, here is the example:
> 
> initdb -D data
> pg_ctl -D data start
> psql -c "select pg_create_restore_point('hoge')"
> psql -c "alter system set recovery_target_name to 'hoge'"
> psql -c "create table test as select num from generate_series(1, 100) num"
> pg_ctl -D data -m i stop
> pg_ctl -D data start
> 
> After restarting the server at the above final step, you will see
> the following log messages indicating that recovery stops at
> recovery_target_name.
> 
> 2019-09-28 22:42:04.849 JST [16944] LOG:  recovery stopping at restore
> point "hoge", time 2019-09-28 22:42:03.86558+09
> 2019-09-28 22:42:04.849 JST [16944] FATAL:  requested recovery stop
> point is before consistent recovery point

That's definitely not good behavior.

>>> IMO, since v12 is RC1 now, it's not good idea to change the logic to new.
>>> So at least for v12, we basically should change the recovery logic so that
>>> it behaves in the same way as prior versions. That is,
>>>
>>> - Stop the recovery with an error if any recovery target is set in
>>>    crash recovery
>>
>> This seems reasonable.  I tried adding a recovery.signal and
>> restore_command for crash recovery and I just got an error that it
>> couldn't find 00000002.history in the archive.
> 
> You added recovery.signal, so it means that you started an archive recovery,
> not crash recovery. Right?

Correct.

> Anyway I'm thinking to apply something like attached patch, to emit an error
> if recovery target is set in crash recovery.

The patch looks reasonable.

>>> - Do not enter an archive recovery mode if recovery.signal is missing
>>
>> Agreed.  Perhaps it's OK to use restore_command if a backup_label is
>> present
> 
> Yeah, it's maybe OK, but differenet behavior from current version.
> So, at least for v12, I'm inclined to prevent crash recovery with backup_label
> from using restore_command, i.e., only WAL files in pg_wal will be replayed
> in this case.

Agreed.  Seems like that could be added to the patch above easily
enough.  More checks would be needed to prevent the behaviors I've been
seeing in the other thread, but it should be possible to more or less
mimic the old behavior with sufficient checks.

Regards,
-- 
-David
david@pgmasters.net



Re: Standby accepts recovery_target_timeline setting?

От
Fujii Masao
Дата:
On Sun, Sep 29, 2019 at 12:51 AM David Steele <david@pgmasters.net> wrote:
>
> On 9/28/19 10:54 AM, Fujii Masao wrote:
> > On Sat, Sep 28, 2019 at 2:01 AM David Steele <david@pgmasters.net> wrote:
> >> On 9/27/19 11:58 AM, Fujii Masao wrote:
> >>>
> >>> Yes, recovery target settings are used even when neither backup_label
> >>> nor recovery.signal exist, i.e., just a crash recovery, in v12. This is
> >>> completely different behavior from prior versions.
> >>
> >> I'm not able to reproduce this.  I only see recovery settings being used
> >> if backup_label, recovery.signal, or standby.signal is present.
> >>
> >> Do you have an example?
> >
> > Yes, here is the example:
> >
> > initdb -D data
> > pg_ctl -D data start
> > psql -c "select pg_create_restore_point('hoge')"
> > psql -c "alter system set recovery_target_name to 'hoge'"
> > psql -c "create table test as select num from generate_series(1, 100) num"
> > pg_ctl -D data -m i stop
> > pg_ctl -D data start
> >
> > After restarting the server at the above final step, you will see
> > the following log messages indicating that recovery stops at
> > recovery_target_name.
> >
> > 2019-09-28 22:42:04.849 JST [16944] LOG:  recovery stopping at restore
> > point "hoge", time 2019-09-28 22:42:03.86558+09
> > 2019-09-28 22:42:04.849 JST [16944] FATAL:  requested recovery stop
> > point is before consistent recovery point
>
> That's definitely not good behavior.
>
> >>> IMO, since v12 is RC1 now, it's not good idea to change the logic to new.
> >>> So at least for v12, we basically should change the recovery logic so that
> >>> it behaves in the same way as prior versions. That is,
> >>>
> >>> - Stop the recovery with an error if any recovery target is set in
> >>>    crash recovery
> >>
> >> This seems reasonable.  I tried adding a recovery.signal and
> >> restore_command for crash recovery and I just got an error that it
> >> couldn't find 00000002.history in the archive.
> >
> > You added recovery.signal, so it means that you started an archive recovery,
> > not crash recovery. Right?
>
> Correct.
>
> > Anyway I'm thinking to apply something like attached patch, to emit an error
> > if recovery target is set in crash recovery.
>
> The patch looks reasonable.
>
> >>> - Do not enter an archive recovery mode if recovery.signal is missing
> >>
> >> Agreed.  Perhaps it's OK to use restore_command if a backup_label is
> >> present
> >
> > Yeah, it's maybe OK, but differenet behavior from current version.
> > So, at least for v12, I'm inclined to prevent crash recovery with backup_label
> > from using restore_command, i.e., only WAL files in pg_wal will be replayed
> > in this case.
>
> Agreed.  Seems like that could be added to the patch above easily
> enough.  More checks would be needed to prevent the behaviors I've been
> seeing in the other thread, but it should be possible to more or less
> mimic the old behavior with sufficient checks.

Yeah, more checks would be necessary. IMO easy fix is to forbid not only
recovery target parameters but also any recovery parameters (specified
in recovery.conf in previous versions) in crash recovery.

In v11 or before, any parameters in recovery.conf cannot take effect in
crash recovery because crash recovery always starts without recovery.conf.
But in v12, those parameters are specified in postgresql.conf,
so they may take effect even in crash recovery (i.e., when both
recovery.signal and standby.signal are missing). This would be the root
cause of the problems that we are discussing, I think.

There might be some recovery parameters that we can safely use
even in crash recovery, e.g., maybe recovery_end_command
(now, you can see that recovery_end_command is executed in
crash recovery in v12). But at this stage of v12, it's worth thinking to
just cause crash recovery to exit with an error when any recovery
parameter is set. Thought?

Or if that change is overkill, alternatively we can make crash recovery
"ignore" any recovery parameters, e.g., by forcibly disabling
the parameters.

Regards,

-- 
Fujii Masao



Re: Standby accepts recovery_target_timeline setting?

От
Tom Lane
Дата:
Fujii Masao <masao.fujii@gmail.com> writes:
>> Agreed.  Seems like that could be added to the patch above easily
>> enough.  More checks would be needed to prevent the behaviors I've been
>> seeing in the other thread, but it should be possible to more or less
>> mimic the old behavior with sufficient checks.

> Yeah, more checks would be necessary. IMO easy fix is to forbid not only
> recovery target parameters but also any recovery parameters (specified
> in recovery.conf in previous versions) in crash recovery.

> In v11 or before, any parameters in recovery.conf cannot take effect in
> crash recovery because crash recovery always starts without recovery.conf.
> But in v12, those parameters are specified in postgresql.conf,
> so they may take effect even in crash recovery (i.e., when both
> recovery.signal and standby.signal are missing). This would be the root
> cause of the problems that we are discussing, I think.

So ... what I'm wondering about here is what happens during *actual* crash
recovery, eg a postmaster-driven restart of the startup process after
a backend crash in hot standby.  The direction you guys are going in
seems likely to cause the startup process to refuse to function until
those parameters are removed from postgresql.conf, which seems quite
user-unfriendly.

Maybe I'm misunderstanding, but I think that rather than adding error
checks that were not there before, the right path to fixing this is
to cause these settings to be ignored if we're doing crash recovery.
Not make the user take them out (and possibly later put them back).

            regards, tom lane



Re: Standby accepts recovery_target_timeline setting?

От
David Steele
Дата:
On 9/28/19 1:26 PM, Fujii Masao wrote:
> On Sun, Sep 29, 2019 at 12:51 AM David Steele <david@pgmasters.net> wrote:
> 
> Yeah, more checks would be necessary. IMO easy fix is to forbid not only
> recovery target parameters but also any recovery parameters (specified
> in recovery.conf in previous versions) in crash recovery.
> 
> In v11 or before, any parameters in recovery.conf cannot take effect in
> crash recovery because crash recovery always starts without recovery.conf.
> But in v12, those parameters are specified in postgresql.conf,
> so they may take effect even in crash recovery (i.e., when both
> recovery.signal and standby.signal are missing). This would be the root
> cause of the problems that we are discussing, I think.
> 
> There might be some recovery parameters that we can safely use
> even in crash recovery, e.g., maybe recovery_end_command
> (now, you can see that recovery_end_command is executed in
> crash recovery in v12). But at this stage of v12, it's worth thinking to
> just cause crash recovery to exit with an error when any recovery
> parameter is set. Thought?

I dislike the idea of crash recovery throwing fatal errors because there
are recovery settings in postgresql.auto.conf.  Since there is no
defined mechanism for cleaning out old recovery settings we have to
assume that they will persist (and accumulate) more or less forever.

> Or if that change is overkill, alternatively we can make crash recovery
> "ignore" any recovery parameters, e.g., by forcibly disabling
> the parameters.

I'd rather load recovery settings *only* if recovery.signal or
standby.signal is present and do this only after crash recovery is
complete, i.e. in the absence of backup_label.

I think blindly loading recovery settings then trying to ignore them
later is pretty much why we are having these issues in the first place.
 I'd rather not extend that pattern if possible.

Regards,
-- 
-David
david@pgmasters.net



Re: Standby accepts recovery_target_timeline setting?

От
Peter Eisentraut
Дата:
On 2019-09-28 19:45, Tom Lane wrote:
> Maybe I'm misunderstanding, but I think that rather than adding error
> checks that were not there before, the right path to fixing this is
> to cause these settings to be ignored if we're doing crash recovery.

That makes sense to me.  Something like this (untested)?

diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index 0daab3ff4b..25cae57131 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5618,6 +5618,13 @@ recoveryStopsBefore(XLogReaderState *record)
     TimestampTz recordXtime = 0;
     TransactionId recordXid;

+    /*
+     * Ignore recovery target settings when not in archive recovery (meaning
+     * we are in crash recovery).
+     */
+    if (!InArchiveRecovery)
+        return false;
+
     /* Check if we should stop as soon as reaching consistency */
     if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE && reachedConsistency)
     {
@@ -5759,6 +5766,13 @@ recoveryStopsAfter(XLogReaderState *record)
     uint8        rmid;
     TimestampTz recordXtime;

+    /*
+     * Ignore recovery target settings when not in archive recovery (meaning
+     * we are in crash recovery).
+     */
+    if (!InArchiveRecovery)
+        return false;
+
     info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
     rmid = XLogRecGetRmid(record);



-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Standby accepts recovery_target_timeline setting?

От
Fujii Masao
Дата:
On Sun, Sep 29, 2019 at 6:08 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2019-09-28 19:45, Tom Lane wrote:
> > Maybe I'm misunderstanding, but I think that rather than adding error
> > checks that were not there before, the right path to fixing this is
> > to cause these settings to be ignored if we're doing crash recovery.
>
> That makes sense to me.

+1

> Something like this (untested)?

Yes, but ArchiveRecoveryRequested should be checked instead of
InArchiveRecovery, I think. Otherwise recovery targets would take effect
when recovery.signal is missing but backup_label exists. In this case,
InArchiveRecovery is set to true though ArchiveRecoveryRequested is
false because recovery.signal is missing.

With the attached patch, I checked that the steps that I described
upthread didn't reproduce the issue.

Regards,

-- 
Fujii Masao

Вложения

Re: Standby accepts recovery_target_timeline setting?

От
Peter Eisentraut
Дата:
On 2019-09-27 17:14, David Steele wrote:
> On 9/27/19 10:36 AM, Peter Eisentraut wrote:
>> On 2019-09-26 23:02, David Steele wrote:
>>> On 9/26/19 4:48 PM, Peter Eisentraut wrote:
>>>
>>>> I don't know if recovery_target_timeline is actually useful to change in
>>>> standby mode.
>> OK, I have committed your original documentation patch.
> Thanks, that's a good start.
> 
> As Fujii noticed and I have demonstrated upthread, just about any target
> setting can be used in a standby restore.  This matches the behavior of
> prior versions so it's not exactly a regression, but the old docs made
> no claim that standby_mode disabled targeted restore.

I have further fixed the documentation.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Standby accepts recovery_target_timeline setting?

От
Peter Eisentraut
Дата:
On 2019-09-29 18:36, Fujii Masao wrote:
> Yes, but ArchiveRecoveryRequested should be checked instead of
> InArchiveRecovery, I think. Otherwise recovery targets would take effect
> when recovery.signal is missing but backup_label exists. In this case,
> InArchiveRecovery is set to true though ArchiveRecoveryRequested is
> false because recovery.signal is missing.
> 
> With the attached patch, I checked that the steps that I described
> upthread didn't reproduce the issue.

Your patch looks correct to me.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Standby accepts recovery_target_timeline setting?

От
Fujii Masao
Дата:
On Mon, Sep 30, 2019 at 6:59 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2019-09-29 18:36, Fujii Masao wrote:
> > Yes, but ArchiveRecoveryRequested should be checked instead of
> > InArchiveRecovery, I think. Otherwise recovery targets would take effect
> > when recovery.signal is missing but backup_label exists. In this case,
> > InArchiveRecovery is set to true though ArchiveRecoveryRequested is
> > false because recovery.signal is missing.
> >
> > With the attached patch, I checked that the steps that I described
> > upthread didn't reproduce the issue.
>
> Your patch looks correct to me.

Thanks! So I committed the patch.

Also we need to do the same thing for other recovery options like
restore_command. Attached is the patch which makes crash recovery
ignore restore_command and recovery_end_command.

Regards,

-- 
Fujii Masao

Вложения

Re: Standby accepts recovery_target_timeline setting?

От
Stephen Frost
Дата:
Greetings,

* David Steele (david@pgmasters.net) wrote:
> On 9/28/19 1:26 PM, Fujii Masao wrote:
> > On Sun, Sep 29, 2019 at 12:51 AM David Steele <david@pgmasters.net> wrote:
> >
> > Yeah, more checks would be necessary. IMO easy fix is to forbid not only
> > recovery target parameters but also any recovery parameters (specified
> > in recovery.conf in previous versions) in crash recovery.
> >
> > In v11 or before, any parameters in recovery.conf cannot take effect in
> > crash recovery because crash recovery always starts without recovery.conf.
> > But in v12, those parameters are specified in postgresql.conf,
> > so they may take effect even in crash recovery (i.e., when both
> > recovery.signal and standby.signal are missing). This would be the root
> > cause of the problems that we are discussing, I think.
> >
> > There might be some recovery parameters that we can safely use
> > even in crash recovery, e.g., maybe recovery_end_command
> > (now, you can see that recovery_end_command is executed in
> > crash recovery in v12). But at this stage of v12, it's worth thinking to
> > just cause crash recovery to exit with an error when any recovery
> > parameter is set. Thought?
>
> I dislike the idea of crash recovery throwing fatal errors because there
> are recovery settings in postgresql.auto.conf.  Since there is no
> defined mechanism for cleaning out old recovery settings we have to
> assume that they will persist (and accumulate) more or less forever.
>
> > Or if that change is overkill, alternatively we can make crash recovery
> > "ignore" any recovery parameters, e.g., by forcibly disabling
> > the parameters.
>
> I'd rather load recovery settings *only* if recovery.signal or
> standby.signal is present and do this only after crash recovery is
> complete, i.e. in the absence of backup_label.
>
> I think blindly loading recovery settings then trying to ignore them
> later is pretty much why we are having these issues in the first place.
>  I'd rather not extend that pattern if possible.

Agreed.

Thanks,

Stephen

Вложения

Re: Standby accepts recovery_target_timeline setting?

От
Michael Paquier
Дата:
On Wed, Oct 02, 2019 at 03:30:38AM -0400, Stephen Frost wrote:
> * David Steele (david@pgmasters.net) wrote:
>> On 9/28/19 1:26 PM, Fujii Masao wrote:
>>> There might be some recovery parameters that we can safely use
>>> even in crash recovery, e.g., maybe recovery_end_command
>>> (now, you can see that recovery_end_command is executed in
>>> crash recovery in v12). But at this stage of v12, it's worth thinking to
>>> just cause crash recovery to exit with an error when any recovery
>>> parameter is set. Thought?
>>
>> I dislike the idea of crash recovery throwing fatal errors because there
>> are recovery settings in postgresql.auto.conf.  Since there is no
>> defined mechanism for cleaning out old recovery settings we have to
>> assume that they will persist (and accumulate) more or less forever.

Yeah, I don't think that's a good thing either.  That's a recipe to
make the user experience more confusing.

>>> Or if that change is overkill, alternatively we can make crash recovery
>>> "ignore" any recovery parameters, e.g., by forcibly disabling
>>> the parameters.
>>
>> I'd rather load recovery settings *only* if recovery.signal or
>> standby.signal is present and do this only after crash recovery is
>> complete, i.e. in the absence of backup_label.
>>
>> I think blindly loading recovery settings then trying to ignore them
>> later is pretty much why we are having these issues in the first place.
>>  I'd rather not extend that pattern if possible.
>
> Agreed.

That would mean that you need to create some new special handling,
while making sure that the process reading the parameters is not
confused either by default values.  It seems to me that if we are not
in recovery, the best thing was can do now is just to not process
anything those parameters trigger, instead of "ignoring" these (you
are referring to use SetConfigOption in the startup process here?).
--
Michael

Вложения

Re: Standby accepts recovery_target_timeline setting?

От
Fujii Masao
Дата:
On Fri, Oct 4, 2019 at 6:09 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Oct 02, 2019 at 03:30:38AM -0400, Stephen Frost wrote:
> > * David Steele (david@pgmasters.net) wrote:
> >> On 9/28/19 1:26 PM, Fujii Masao wrote:
> >>> There might be some recovery parameters that we can safely use
> >>> even in crash recovery, e.g., maybe recovery_end_command
> >>> (now, you can see that recovery_end_command is executed in
> >>> crash recovery in v12). But at this stage of v12, it's worth thinking to
> >>> just cause crash recovery to exit with an error when any recovery
> >>> parameter is set. Thought?
> >>
> >> I dislike the idea of crash recovery throwing fatal errors because there
> >> are recovery settings in postgresql.auto.conf.  Since there is no
> >> defined mechanism for cleaning out old recovery settings we have to
> >> assume that they will persist (and accumulate) more or less forever.
>
> Yeah, I don't think that's a good thing either.  That's a recipe to
> make the user experience more confusing.
>
> >>> Or if that change is overkill, alternatively we can make crash recovery
> >>> "ignore" any recovery parameters, e.g., by forcibly disabling
> >>> the parameters.
> >>
> >> I'd rather load recovery settings *only* if recovery.signal or
> >> standby.signal is present and do this only after crash recovery is
> >> complete, i.e. in the absence of backup_label.
> >>
> >> I think blindly loading recovery settings then trying to ignore them
> >> later is pretty much why we are having these issues in the first place.
> >>  I'd rather not extend that pattern if possible.
> >
> > Agreed.

Agreed, too. Do you have any idea to implement that? I've not found out
"smart" way to do that yet.

One idea is, as Michael suggested, to use SetConfigOption() for all the
archive recovery parameters at the beginning of the startup process as follows,
to forcibly set the default values if crash recovery is running. But this
seems not smart for me.

SetConfigOption("restore_command", ...);
SetConfigOption("archive_cleanup_command", ...);
SetConfigOption("recovery_end_command", ...);
...

Maybe we should make GUC mechanism notice signal files and ignore
archive recovery-related parameters when none of those files exist?
This change seems overkill at least in v12, though.

Regards,

-- 
Fujii Masao



Re: Standby accepts recovery_target_timeline setting?

От
Robert Haas
Дата:
On Mon, Oct 7, 2019 at 9:14 AM Fujii Masao <masao.fujii@gmail.com> wrote:
> Agreed, too. Do you have any idea to implement that? I've not found out
> "smart" way to do that yet.
>
> One idea is, as Michael suggested, to use SetConfigOption() for all the
> archive recovery parameters at the beginning of the startup process as follows,
> to forcibly set the default values if crash recovery is running. But this
> seems not smart for me.
>
> SetConfigOption("restore_command", ...);
> SetConfigOption("archive_cleanup_command", ...);
> SetConfigOption("recovery_end_command", ...);
> ...
>
> Maybe we should make GUC mechanism notice signal files and ignore
> archive recovery-related parameters when none of those files exist?
> This change seems overkill at least in v12, though.

I think this approach is going in the wrong direction. In every other
part of the system, it's the job of the code around the GUC system to
use parameters when they're relevant and ignore them when they should
be ignored. Deciding that the parameters that were formerly part of
recovery.conf are an exception to that rule and that the GUC system is
responsible for making sure they're set only when we pay attention to
them seems like it's bringing back or exacerbating a code-level split
between recovery.conf parameters and postgresql.conf parameters when,
meanwhile, we've been wanting to eradicate that split so that the
things we allow for postgresql.conf parameters -- e.g. changing them
while they are running -- can be applied to these parameters also.

I don't particularly like the use of SetConfigOption() either,
although it does have some precedent in autovacuum, for example.
Generally, it's right and proper that the GUC system sets the
variables to which the parameters it controls are tied -- and then the
rest of the code has to do the right thing around that. It sounds like
the patch that got rid of recovery.conf wasn't considered carefully
enough, and missed the fact that it was introducing some inadvertent
behavior changes. That's too bad, but let's not overreact. It seems
totally fine to me to just add ad-hoc checks that rule out
inappropriately relying on these parameters while performing crash
recovery - and be done with it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Standby accepts recovery_target_timeline setting?

От
Stephen Frost
Дата:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Mon, Oct 7, 2019 at 9:14 AM Fujii Masao <masao.fujii@gmail.com> wrote:
> > Agreed, too. Do you have any idea to implement that? I've not found out
> > "smart" way to do that yet.
> >
> > One idea is, as Michael suggested, to use SetConfigOption() for all the
> > archive recovery parameters at the beginning of the startup process as follows,
> > to forcibly set the default values if crash recovery is running. But this
> > seems not smart for me.
> >
> > SetConfigOption("restore_command", ...);
> > SetConfigOption("archive_cleanup_command", ...);
> > SetConfigOption("recovery_end_command", ...);
> > ...
> >
> > Maybe we should make GUC mechanism notice signal files and ignore
> > archive recovery-related parameters when none of those files exist?
> > This change seems overkill at least in v12, though.
>
> I think this approach is going in the wrong direction. In every other
> part of the system, it's the job of the code around the GUC system to
> use parameters when they're relevant and ignore them when they should
> be ignored. Deciding that the parameters that were formerly part of
> recovery.conf are an exception to that rule and that the GUC system is
> responsible for making sure they're set only when we pay attention to
> them seems like it's bringing back or exacerbating a code-level split
> between recovery.conf parameters and postgresql.conf parameters when,
> meanwhile, we've been wanting to eradicate that split so that the
> things we allow for postgresql.conf parameters -- e.g. changing them
> while they are running -- can be applied to these parameters also.

I don't think we necessairly need to be thinking about trying to
eliminate all differences between certain former recovery.conf settings
and things like work_mem, even as we make it such that those former
settings can be changed while we're running.

> I don't particularly like the use of SetConfigOption() either,
> although it does have some precedent in autovacuum, for example.

It's pretty explicitly the job of SetConfigOption to manage the fact
that only certain options can be set at certain times, as noted at the
top of guc.h where we're talking about GUC contexts (and which
SetConfigOption references as being what it's job is to manage-
guc.c:6776 currently).

> Generally, it's right and proper that the GUC system sets the
> variables to which the parameters it controls are tied -- and then the
> rest of the code has to do the right thing around that. It sounds like
> the patch that got rid of recovery.conf wasn't considered carefully
> enough, and missed the fact that it was introducing some inadvertent
> behavior changes. That's too bad, but let's not overreact. It seems
> totally fine to me to just add ad-hoc checks that rule out
> inappropriately relying on these parameters while performing crash
> recovery - and be done with it.

The patch that got rid of recovery.conf also removed the inherent
understanding and knowledge that there are certain options that can only
be set (and make sense ...) at certain times- namely, when we're doing
recovery.  Having these options set at other times is entirely wrong and
will be confusing to both users, and, as seen, code.  From a user
perspective, what happens when you've started up PG as a primary, since
you don't have a signal file in place to indicate that you're doing
recovery, and you have a recovery_target set, so some user does
"show recovery_target_name" and sees a value?  How is that sensible?

Those options should only be set when we're actually doing recovery,
which is governed by the signal file.  Recovery is absolutely a specific
kind of state that the system is in, not unlike postmaster, we've even
got a specific pg_is_in_recovery() function for it.

Having these options end up set but then hacking all of the other code
that looks at them to check if we're actually in recovery or not would
end up being both confusing to users as well as an ongoing source of
bugs (which has already been made clear by the fact that we're having
this discussion...).  Wouldn't that also mean we would need to hack the
'show' code, to blank out the recovery_target_name variable if we aren't
in recovery?  Ugh.

Thanks,

Stephen

Вложения

Re: Standby accepts recovery_target_timeline setting?

От
Fujii Masao
Дата:
On Tue, Oct 8, 2019 at 10:58 PM Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Robert Haas (robertmhaas@gmail.com) wrote:
> > On Mon, Oct 7, 2019 at 9:14 AM Fujii Masao <masao.fujii@gmail.com> wrote:
> > > Agreed, too. Do you have any idea to implement that? I've not found out
> > > "smart" way to do that yet.
> > >
> > > One idea is, as Michael suggested, to use SetConfigOption() for all the
> > > archive recovery parameters at the beginning of the startup process as follows,
> > > to forcibly set the default values if crash recovery is running. But this
> > > seems not smart for me.
> > >
> > > SetConfigOption("restore_command", ...);
> > > SetConfigOption("archive_cleanup_command", ...);
> > > SetConfigOption("recovery_end_command", ...);
> > > ...
> > >
> > > Maybe we should make GUC mechanism notice signal files and ignore
> > > archive recovery-related parameters when none of those files exist?
> > > This change seems overkill at least in v12, though.
> >
> > I think this approach is going in the wrong direction. In every other
> > part of the system, it's the job of the code around the GUC system to
> > use parameters when they're relevant and ignore them when they should
> > be ignored. Deciding that the parameters that were formerly part of
> > recovery.conf are an exception to that rule and that the GUC system is
> > responsible for making sure they're set only when we pay attention to
> > them seems like it's bringing back or exacerbating a code-level split
> > between recovery.conf parameters and postgresql.conf parameters when,
> > meanwhile, we've been wanting to eradicate that split so that the
> > things we allow for postgresql.conf parameters -- e.g. changing them
> > while they are running -- can be applied to these parameters also.
>
> I don't think we necessairly need to be thinking about trying to
> eliminate all differences between certain former recovery.conf settings
> and things like work_mem, even as we make it such that those former
> settings can be changed while we're running.
>
> > I don't particularly like the use of SetConfigOption() either,
> > although it does have some precedent in autovacuum, for example.
>
> It's pretty explicitly the job of SetConfigOption to manage the fact
> that only certain options can be set at certain times, as noted at the
> top of guc.h where we're talking about GUC contexts (and which
> SetConfigOption references as being what it's job is to manage-
> guc.c:6776 currently).
>
> > Generally, it's right and proper that the GUC system sets the
> > variables to which the parameters it controls are tied -- and then the
> > rest of the code has to do the right thing around that. It sounds like
> > the patch that got rid of recovery.conf wasn't considered carefully
> > enough, and missed the fact that it was introducing some inadvertent
> > behavior changes. That's too bad, but let's not overreact. It seems
> > totally fine to me to just add ad-hoc checks that rule out
> > inappropriately relying on these parameters while performing crash
> > recovery - and be done with it.

Yeah, I agree.

> The patch that got rid of recovery.conf also removed the inherent
> understanding and knowledge that there are certain options that can only
> be set (and make sense ...) at certain times- namely, when we're doing
> recovery.  Having these options set at other times is entirely wrong and
> will be confusing to both users, and, as seen, code.  From a user
> perspective, what happens when you've started up PG as a primary, since
> you don't have a signal file in place to indicate that you're doing
> recovery, and you have a recovery_target set, so some user does
> "show recovery_target_name" and sees a value?  How is that sensible?
>
> Those options should only be set when we're actually doing recovery,
> which is governed by the signal file.  Recovery is absolutely a specific
> kind of state that the system is in, not unlike postmaster, we've even
> got a specific pg_is_in_recovery() function for it.
>
> Having these options end up set but then hacking all of the other code
> that looks at them to check if we're actually in recovery or not would
> end up being both confusing to users as well as an ongoing source of
> bugs (which has already been made clear by the fact that we're having
> this discussion...).  Wouldn't that also mean we would need to hack the
> 'show' code, to blank out the recovery_target_name variable if we aren't
> in recovery?  Ugh.

Isn't this overkill? This doesn't seem the problem only for recovery-related
settings. We have already have the similar issue with other settings.
For example, log_directory parameter is ignored when logging_collector is
not enabled. But SHOW log_directory reports the setting value even when
logging_collector is disabled. This seems the similar issue and might be
confusing, but we could live with that.

Regards,

-- 
Fujii Masao



Re: Standby accepts recovery_target_timeline setting?

От
Stephen Frost
Дата:
Greetings,

* Fujii Masao (masao.fujii@gmail.com) wrote:
> On Tue, Oct 8, 2019 at 10:58 PM Stephen Frost <sfrost@snowman.net> wrote:
> > Having these options end up set but then hacking all of the other code
> > that looks at them to check if we're actually in recovery or not would
> > end up being both confusing to users as well as an ongoing source of
> > bugs (which has already been made clear by the fact that we're having
> > this discussion...).  Wouldn't that also mean we would need to hack the
> > 'show' code, to blank out the recovery_target_name variable if we aren't
> > in recovery?  Ugh.
>
> Isn't this overkill? This doesn't seem the problem only for recovery-related
> settings. We have already have the similar issue with other settings.
> For example, log_directory parameter is ignored when logging_collector is
> not enabled. But SHOW log_directory reports the setting value even when
> logging_collector is disabled. This seems the similar issue and might be
> confusing, but we could live with that.

I agree it's a similar issue.  I disagree that it's actually sensible
for us to do so and would rather contend that it's confusing and not
good.

We certainly do a lot of smart things in PG, but showing the value of
variables that aren't accurate, and we *know* they aren't, hardly seems
like something we should be saying "this is good and ok, so let's do
more of this."

I'd rather argue that this just shows that we need to come up with a
solution in this area.  I don't think it's *as* big of a deal when it
comes to logging_collector/log_directory because, at least there, you
don't even start to get into the same code paths where it matters, like
you end up doing with the recovery targets and crash recovery, so the
chances of bugs creeping in are less in the log_directory case.

I still don't think it's great though and, yes, would prefer that we
avoid having log_directory set when logging_collector is in use.

Thanks,

Stephen

Вложения

Re: Standby accepts recovery_target_timeline setting?

От
Fujii Masao
Дата:
On Wed, Oct 9, 2019 at 1:02 AM Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Fujii Masao (masao.fujii@gmail.com) wrote:
> > On Tue, Oct 8, 2019 at 10:58 PM Stephen Frost <sfrost@snowman.net> wrote:
> > > Having these options end up set but then hacking all of the other code
> > > that looks at them to check if we're actually in recovery or not would
> > > end up being both confusing to users as well as an ongoing source of
> > > bugs (which has already been made clear by the fact that we're having
> > > this discussion...).  Wouldn't that also mean we would need to hack the
> > > 'show' code, to blank out the recovery_target_name variable if we aren't
> > > in recovery?  Ugh.
> >
> > Isn't this overkill? This doesn't seem the problem only for recovery-related
> > settings. We have already have the similar issue with other settings.
> > For example, log_directory parameter is ignored when logging_collector is
> > not enabled. But SHOW log_directory reports the setting value even when
> > logging_collector is disabled. This seems the similar issue and might be
> > confusing, but we could live with that.
>
> I agree it's a similar issue.  I disagree that it's actually sensible
> for us to do so and would rather contend that it's confusing and not
> good.
>
> We certainly do a lot of smart things in PG, but showing the value of
> variables that aren't accurate, and we *know* they aren't, hardly seems
> like something we should be saying "this is good and ok, so let's do
> more of this."
>
> I'd rather argue that this just shows that we need to come up with a
> solution in this area.  I don't think it's *as* big of a deal when it
> comes to logging_collector/log_directory because, at least there, you
> don't even start to get into the same code paths where it matters, like
> you end up doing with the recovery targets and crash recovery, so the
> chances of bugs creeping in are less in the log_directory case.
>
> I still don't think it's great though and, yes, would prefer that we
> avoid having log_directory set when logging_collector is in use.

There are other parameters having the similar issue, for example,
- parameters for SSL connection when ssl is disabled
- parameters for autovacuum activity when autovacuum is disabled
- parameters for Hot Standby when hot_standby is disabled
etc

Yeah, it's better to make SHOW command handle these parameters
"less confusing". But I cannot wait for the solution for them before
fixing the original issue in v12 (i.e., the issue where restore_command
can be executed even in crash recovery). So, barring any objection,
I'd like to commit the patch that I attached upthread, soon.
The patch prevents restore_command and recovery_end_command
from being executed in crash recovery. Thought?

Regards,

-- 
Fujii Masao



Re: Standby accepts recovery_target_timeline setting?

От
Robert Haas
Дата:
On Tue, Oct 8, 2019 at 9:58 AM Stephen Frost <sfrost@snowman.net> wrote:
> From a user
> perspective, what happens when you've started up PG as a primary, since
> you don't have a signal file in place to indicate that you're doing
> recovery, and you have a recovery_target set, so some user does
> "show recovery_target_name" and sees a value?  How is that sensible?

I'd argue that not only is it sensible, but it's the only correct
answer.  If I put a value in postgresql.conf and it doesn't show up in
the output of SHOW, I'm going to be confused. That just seems flat
wrong to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Standby accepts recovery_target_timeline setting?

От
Stephen Frost
Дата:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Tue, Oct 8, 2019 at 9:58 AM Stephen Frost <sfrost@snowman.net> wrote:
> > From a user
> > perspective, what happens when you've started up PG as a primary, since
> > you don't have a signal file in place to indicate that you're doing
> > recovery, and you have a recovery_target set, so some user does
> > "show recovery_target_name" and sees a value?  How is that sensible?
>
> I'd argue that not only is it sensible, but it's the only correct
> answer.  If I put a value in postgresql.conf and it doesn't show up in
> the output of SHOW, I'm going to be confused. That just seems flat
> wrong to me.

You're going to be really confused when you realize that, sure, it's
set, but we just completely ignored it ...

How about we look at things like listen_addresses or shared_buffers?
Let's make a similar argument there- some day, in the future, we make PG
automagically realize when shared_buffers is too high to be able to
start up, so we lower it to some other value just to get the database
online, with the hope that the user will realize and fix the setting
(this isn't a joke- having shared_buffers be too high through an ALTER
SYSTEM setting is a real problem and it'd be nice if we had a way to
deal with that...), you think we should keep the shared_buffers variable
showing whatever was in the config file because, well, that's what was
in the config file?

If anything, what we should be doing here is throwing a WARNING or
similar that these settings don't make sense, because we aren't going
through recovery, and blank them out.  If we want to be really cute, we
could have the show return something like: <not in recovery>, or
similar, but just returning an invalid value because that's what was in
the config file is nonsense.  SHOW isn't a view of what's in
postgresql.conf, it's telling the user what the current system state is.

Thanks,

Stephen

Вложения

Re: Standby accepts recovery_target_timeline setting?

От
Stephen Frost
Дата:
Greetings,

* Fujii Masao (masao.fujii@gmail.com) wrote:
> On Wed, Oct 9, 2019 at 1:02 AM Stephen Frost <sfrost@snowman.net> wrote:
> > * Fujii Masao (masao.fujii@gmail.com) wrote:
> > > On Tue, Oct 8, 2019 at 10:58 PM Stephen Frost <sfrost@snowman.net> wrote:
> > > > Having these options end up set but then hacking all of the other code
> > > > that looks at them to check if we're actually in recovery or not would
> > > > end up being both confusing to users as well as an ongoing source of
> > > > bugs (which has already been made clear by the fact that we're having
> > > > this discussion...).  Wouldn't that also mean we would need to hack the
> > > > 'show' code, to blank out the recovery_target_name variable if we aren't
> > > > in recovery?  Ugh.
> > >
> > > Isn't this overkill? This doesn't seem the problem only for recovery-related
> > > settings. We have already have the similar issue with other settings.
> > > For example, log_directory parameter is ignored when logging_collector is
> > > not enabled. But SHOW log_directory reports the setting value even when
> > > logging_collector is disabled. This seems the similar issue and might be
> > > confusing, but we could live with that.
> >
> > I agree it's a similar issue.  I disagree that it's actually sensible
> > for us to do so and would rather contend that it's confusing and not
> > good.
> >
> > We certainly do a lot of smart things in PG, but showing the value of
> > variables that aren't accurate, and we *know* they aren't, hardly seems
> > like something we should be saying "this is good and ok, so let's do
> > more of this."
> >
> > I'd rather argue that this just shows that we need to come up with a
> > solution in this area.  I don't think it's *as* big of a deal when it
> > comes to logging_collector/log_directory because, at least there, you
> > don't even start to get into the same code paths where it matters, like
> > you end up doing with the recovery targets and crash recovery, so the
> > chances of bugs creeping in are less in the log_directory case.
> >
> > I still don't think it's great though and, yes, would prefer that we
> > avoid having log_directory set when logging_collector is in use.
>
> There are other parameters having the similar issue, for example,
> - parameters for SSL connection when ssl is disabled
> - parameters for autovacuum activity when autovacuum is disabled
> - parameters for Hot Standby when hot_standby is disabled

I agree that those would also be nice to improve with some indication
that those features are disabled, but, again, as I said above, while
they're confusing they at least don't *also* lead to bugs where the code
itself is confused about the state of the system, because we don't have
two different major ways of getting to the same code.

> Yeah, it's better to make SHOW command handle these parameters
> "less confusing". But I cannot wait for the solution for them before
> fixing the original issue in v12 (i.e., the issue where restore_command
> can be executed even in crash recovery). So, barring any objection,
> I'd like to commit the patch that I attached upthread, soon.
> The patch prevents restore_command and recovery_end_command
> from being executed in crash recovery. Thought?

I'm not suggesting that we fix everything in this area in a patch that
we back-patch, and I haven't intended to imply that at all throughout
this, so this argument doesn't really hold.  I do think we should fix
this issue, where we've seen bugs from the confusion, in the right way
by realizing that this is a direction that's prone to cause bugs.

Thanks,

Stephen

Вложения

Re: Standby accepts recovery_target_timeline setting?

От
Robert Haas
Дата:
On Wed, Oct 9, 2019 at 8:42 AM Stephen Frost <sfrost@snowman.net> wrote:
> You're going to be really confused when you realize that, sure, it's
> set, but we just completely ignored it ...

No, I'm not, because I expect that settings will only take effect for
operations to which they apply. As Fujii Masao also pointed out, there
are lots of other settings that are ignored (but still shown as set)
in situations where they don't apply.

> How about we look at things like listen_addresses or shared_buffers?
> Let's make a similar argument there- some day, in the future, we make PG
> automagically realize when shared_buffers is too high to be able to
> start up, so we lower it to some other value just to get the database
> online, with the hope that the user will realize and fix the setting
> (this isn't a joke- having shared_buffers be too high through an ALTER
> SYSTEM setting is a real problem and it'd be nice if we had a way to
> deal with that...), you think we should keep the shared_buffers variable
> showing whatever was in the config file because, well, that's what was
> in the config file?

Yes. I mean, I assume that if we did such a thing, we might rename the
GUC in the config file to max_shared_buffers or shared_buffers_limit
or something like that to make it more clear, and shared_buffers
itself might become a PGC_INTERNAL setting that users can't modify but
which can still be viewed using SHOW. But if a value is set in the
configuration file and not overridden somewhere else (ALTER USER,
ALTER FUNCTION, etc.) I expect the SHOW command to display that value.

> If anything, what we should be doing here is throwing a WARNING or
> similar that these settings don't make sense, because we aren't going
> through recovery, and blank them out.  If we want to be really cute, we
> could have the show return something like: <not in recovery>, or
> similar, but just returning an invalid value because that's what was in
> the config file is nonsense.  SHOW isn't a view of what's in
> postgresql.conf, it's telling the user what the current system state is.

SHOW is telling the user the value that is configured in the current
session, which may or may not be the value configured in
postgresql.conf, but is the value that was configured somewhere. For
the most part, it's not trying to tell the user what the current
system state is, although we have a few weird exceptions that behave
otherwise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Standby accepts recovery_target_timeline setting?

От
Stephen Frost
Дата:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Oct 9, 2019 at 8:42 AM Stephen Frost <sfrost@snowman.net> wrote:
> > If anything, what we should be doing here is throwing a WARNING or
> > similar that these settings don't make sense, because we aren't going
> > through recovery, and blank them out.  If we want to be really cute, we
> > could have the show return something like: <not in recovery>, or
> > similar, but just returning an invalid value because that's what was in
> > the config file is nonsense.  SHOW isn't a view of what's in
> > postgresql.conf, it's telling the user what the current system state is.
>
> SHOW is telling the user the value that is configured in the current
> session, which may or may not be the value configured in
> postgresql.conf, but is the value that was configured somewhere. For
> the most part, it's not trying to tell the user what the current
> system state is, although we have a few weird exceptions that behave
> otherwise.

I don't understand this argument and so it seems pretty likely that
we're just not going to agree here.  The idea that the GUC system isn't
something that someone can depend on to find out what the current state
of a variable is is just utter nonsense to me.  Yes, we have exceptions
to that, and I don't think they're good ones, but the argument that SHOW
doesn't actually return what the state is and instead returns "this is
what this variable was configured to at some point" does not make any
sense to me.

Thanks,

Stephen

Вложения

Re: Standby accepts recovery_target_timeline setting?

От
Peter Eisentraut
Дата:
On 2019-09-30 03:48, Fujii Masao wrote:
> Also we need to do the same thing for other recovery options like
> restore_command. Attached is the patch which makes crash recovery
> ignore restore_command and recovery_end_command.

This patch looks correct to me.

Do we need to handle archive_cleanup_command?  Perhaps not.

A check in recoveryApplyDelay() might be necessary.

That should cover everything then.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Standby accepts recovery_target_timeline setting?

От
Fujii Masao
Дата:
On Thu, Oct 10, 2019 at 5:52 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2019-09-30 03:48, Fujii Masao wrote:
> > Also we need to do the same thing for other recovery options like
> > restore_command. Attached is the patch which makes crash recovery
> > ignore restore_command and recovery_end_command.
>
> This patch looks correct to me.

Thanks for the review! I committed the patch.

> Do we need to handle archive_cleanup_command?  Perhaps not.

No, because archive_command is basically executed by checkpointer
and this process cannot be invoked in crash recovery case.

> A check in recoveryApplyDelay() might be necessary.

Yes! We are discussing this issue at
https://www.postgresql.org/message-id/CAHGQGwEyD6HdZLfdWc+95g=VQFPR4zQL4n+yHxQgGEGjaSVheQ@mail.gmail.com

Regards,

-- 
Fujii Masao