Обсуждение: Nicer error when connecting to standby with hot_standby=off

Поиск
Список
Период
Сортировка

Nicer error when connecting to standby with hot_standby=off

От
James Coleman
Дата:
I recently noticed while setting up a test environment that attempting to connect to a standby running without hot_standby=on results in a fairly generic error (I believe "the database system is starting up"). I don't have my test setup running right now, so can't confirm with a repro case at the moment, but with a little bit of spelunking I noticed that error text only shows up in src/backend/postmaster/postmaster.c when port->canAcceptConnections has the value CAC_STARTUP.

Ideally the error message would include something along the lines of "The server is running as a standby but cannot accept connections with hot_standby=off".

I wanted to get some initial feedback on the idea before writing a patch: does that seem like a reasonable change? Is it actually plausible to distinguish between this state and "still recovering" (i.e., when starting up a hot standby but initial recovery hasn't completed so it legitimately can't accept connections yet)? If so, should we include the possibility if hot_standby isn't on, just in case?

The enum value CAC_STARTUP is defined in src/include/libpq/libpq-be.h, which makes me wonder if changing this value would result in a wire protocol change/something the client wants to know about? If so, I assume it's not reasonable to change the value, but would it still be reasonable to change the error text?

Thanks,
James Coleman

Re: Nicer error when connecting to standby with hot_standby=off

От
Andres Freund
Дата:
Hi,

On 2020-03-08 20:12:21 -0400, James Coleman wrote:
> I recently noticed while setting up a test environment that attempting to
> connect to a standby running without hot_standby=on results in a fairly
> generic error (I believe "the database system is starting up"). I don't
> have my test setup running right now, so can't confirm with a repro case at
> the moment, but with a little bit of spelunking I noticed that error text
> only shows up in src/backend/postmaster/postmaster.c when
> port->canAcceptConnections has the value CAC_STARTUP.
> 
> Ideally the error message would include something along the lines of "The
> server is running as a standby but cannot accept connections with
> hot_standby=off".

Yea, something roughly like that would be good.


> I wanted to get some initial feedback on the idea before writing a patch:
> does that seem like a reasonable change? Is it actually plausible to
> distinguish between this state and "still recovering" (i.e., when starting
> up a hot standby but initial recovery hasn't completed so it legitimately
> can't accept connections yet)? If so, should we include the possibility if
> hot_standby isn't on, just in case?

Yes, it is feasible to distinguish those cases. And we should, if we're
going to change things around.


> The enum value CAC_STARTUP is defined in src/include/libpq/libpq-be.h,
> which makes me wonder if changing this value would result in a wire
> protocol change/something the client wants to know about? If so, I assume
> it's not reasonable to change the value, but would it still be reasonable
> to change the error text?

The value shouldn't be visible to clients in any way. While not obvious
from the name, there's this comment at the top of the header:

 *      Note that this is backend-internal and is NOT exported to clients.
 *      Structs that need to be client-visible are in pqcomm.h.


Greetings,

Andres Freund



Re: Nicer error when connecting to standby with hot_standby=off

От
James Coleman
Дата:
On Mon, Mar 9, 2020 at 6:28 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2020-03-08 20:12:21 -0400, James Coleman wrote:
> > I recently noticed while setting up a test environment that attempting to
> > connect to a standby running without hot_standby=on results in a fairly
> > generic error (I believe "the database system is starting up"). I don't
> > have my test setup running right now, so can't confirm with a repro case at
> > the moment, but with a little bit of spelunking I noticed that error text
> > only shows up in src/backend/postmaster/postmaster.c when
> > port->canAcceptConnections has the value CAC_STARTUP.
> >
> > Ideally the error message would include something along the lines of "The
> > server is running as a standby but cannot accept connections with
> > hot_standby=off".
>
> Yea, something roughly like that would be good.

Awesome, thanks for the early feedback!

> > I wanted to get some initial feedback on the idea before writing a patch:
> > does that seem like a reasonable change? Is it actually plausible to
> > distinguish between this state and "still recovering" (i.e., when starting
> > up a hot standby but initial recovery hasn't completed so it legitimately
> > can't accept connections yet)? If so, should we include the possibility if
> > hot_standby isn't on, just in case?
>
> Yes, it is feasible to distinguish those cases. And we should, if we're
> going to change things around.

I'll look into this hopefully soon, but it's helpful to know that it's
possible. Is it basically along the lines of checking to see if the
LSN is past the minimum recovery point?

> > The enum value CAC_STARTUP is defined in src/include/libpq/libpq-be.h,
> > which makes me wonder if changing this value would result in a wire
> > protocol change/something the client wants to know about? If so, I assume
> > it's not reasonable to change the value, but would it still be reasonable
> > to change the error text?
>
> The value shouldn't be visible to clients in any way. While not obvious
> from the name, there's this comment at the top of the header:
>
>  *        Note that this is backend-internal and is NOT exported to clients.
>  *        Structs that need to be client-visible are in pqcomm.h.

This is also helpful.

Thanks,
James



Re: Nicer error when connecting to standby with hot_standby=off

От
Andres Freund
Дата:
Hi,

On 2020-03-09 18:40:32 -0400, James Coleman wrote:
> On Mon, Mar 9, 2020 at 6:28 PM Andres Freund <andres@anarazel.de> wrote:
> > > I wanted to get some initial feedback on the idea before writing a patch:
> > > does that seem like a reasonable change? Is it actually plausible to
> > > distinguish between this state and "still recovering" (i.e., when starting
> > > up a hot standby but initial recovery hasn't completed so it legitimately
> > > can't accept connections yet)? If so, should we include the possibility if
> > > hot_standby isn't on, just in case?
> >
> > Yes, it is feasible to distinguish those cases. And we should, if we're
> > going to change things around.
> 
> I'll look into this hopefully soon, but it's helpful to know that it's
> possible. Is it basically along the lines of checking to see if the
> LSN is past the minimum recovery point?

No, I don't think that's the right approach. IIRC the startup process
(i.e. the one doing the WAL replay) signals postmaster once consistency
has been achieved. So you can just use that state.

Greetings,

Andres Freund



Re: Nicer error when connecting to standby with hot_standby=off

От
James Coleman
Дата:
On Mon, Mar 9, 2020 at 8:06 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2020-03-09 18:40:32 -0400, James Coleman wrote:
> > On Mon, Mar 9, 2020 at 6:28 PM Andres Freund <andres@anarazel.de> wrote:
> > > > I wanted to get some initial feedback on the idea before writing a patch:
> > > > does that seem like a reasonable change? Is it actually plausible to
> > > > distinguish between this state and "still recovering" (i.e., when starting
> > > > up a hot standby but initial recovery hasn't completed so it legitimately
> > > > can't accept connections yet)? If so, should we include the possibility if
> > > > hot_standby isn't on, just in case?
> > >
> > > Yes, it is feasible to distinguish those cases. And we should, if we're
> > > going to change things around.
> >
> > I'll look into this hopefully soon, but it's helpful to know that it's
> > possible. Is it basically along the lines of checking to see if the
> > LSN is past the minimum recovery point?
>
> No, I don't think that's the right approach. IIRC the startup process
> (i.e. the one doing the WAL replay) signals postmaster once consistency
> has been achieved. So you can just use that state.

I've taken that approach in the attached patch (I'd expected to wait
until later to work on this...but it seemed pretty small so I ended up
hacking on it this evening).

I don't have tests included: I tried intentionally breaking the
existing behavior (returning no error when hot_standby=off), but
running make check-world (including tap tests) didn't find any
breakages. I can look into that more deeply at some point, but if you
happen to know a place we test similar things, then I'd be happy to
hear it.

One other question: how is error message translation handled? I
haven't added entries to the relevant files, but also I'm obviously
not qualified to write them.

Thanks,
James

Вложения

Re: Nicer error when connecting to standby with hot_standby=off

От
David Zhang
Дата:
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

I applied the patch to the latest master branch and run a test below. The error messages have been separated. Below is
thetest steps.
 

### setup primary server
initdb -D /tmp/primary/data
mkdir /tmp/archive_dir
echo "archive_mode='on'" >> /tmp/primary/data/postgresql.conf
echo "archive_command='cp %p /tmp/archive_dir/%f'" >> /tmp/primary/data/postgresql.conf
pg_ctl -D /tmp/primary/data -l /tmp/primary-logs start

### setup host standby server
pg_basebackup -p 5432 -w -R -D /tmp/hotstandby
echo "primary_conninfo='host=127.0.0.1 port=5432 user=pgdev'" >> /tmp/hotstandby/postgresql.conf
echo "restore_command='cp /tmp/archive_dir/%f %p'" >> /tmp/hotstandby/postgresql.conf
echo "hot_standby = off" >> /tmp/hotstandby/postgresql.conf
pg_ctl -D /tmp/hotstandby -l /tmp/hotstandby-logs -o "-p 5433" start

### keep trying to connect to hot standby server in order to get the error messages in different stages.
while true; do echo "`date`"; psql postgres -p 5433 -c "SELECT txid_current_snapshot();" sleep 0.2; done

### before the patch
psql: error: could not connect to server: FATAL:  the database system is starting up
...

### after the patch, got different messages, one message indicates hot_standby is off
psql: error: could not connect to server: FATAL:  the database system is starting up
...
psql: error: could not connect to server: FATAL:  the database system is up, but hot_standby is off
...

Re: Nicer error when connecting to standby with hot_standby=off

От
James Coleman
Дата:
On Thu, Apr 2, 2020 at 5:53 PM David Zhang <david.zhang@highgo.ca> wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world:  not tested
> Implements feature:       tested, passed
> Spec compliant:           not tested
> Documentation:            not tested
>
> I applied the patch to the latest master branch and run a test below. The error messages have been separated. Below
isthe test steps.
 
>
> ### setup primary server
> initdb -D /tmp/primary/data
> mkdir /tmp/archive_dir
> echo "archive_mode='on'" >> /tmp/primary/data/postgresql.conf
> echo "archive_command='cp %p /tmp/archive_dir/%f'" >> /tmp/primary/data/postgresql.conf
> pg_ctl -D /tmp/primary/data -l /tmp/primary-logs start
>
> ### setup host standby server
> pg_basebackup -p 5432 -w -R -D /tmp/hotstandby
> echo "primary_conninfo='host=127.0.0.1 port=5432 user=pgdev'" >> /tmp/hotstandby/postgresql.conf
> echo "restore_command='cp /tmp/archive_dir/%f %p'" >> /tmp/hotstandby/postgresql.conf
> echo "hot_standby = off" >> /tmp/hotstandby/postgresql.conf
> pg_ctl -D /tmp/hotstandby -l /tmp/hotstandby-logs -o "-p 5433" start
>
> ### keep trying to connect to hot standby server in order to get the error messages in different stages.
> while true; do echo "`date`"; psql postgres -p 5433 -c "SELECT txid_current_snapshot();" sleep 0.2; done
>
> ### before the patch
> psql: error: could not connect to server: FATAL:  the database system is starting up
> ...
>
> ### after the patch, got different messages, one message indicates hot_standby is off
> psql: error: could not connect to server: FATAL:  the database system is starting up
> ...
> psql: error: could not connect to server: FATAL:  the database system is up, but hot_standby is off
> ...

Thanks for the review and testing!

James



Re: Nicer error when connecting to standby with hot_standby=off

От
Fujii Masao
Дата:

On 2020/04/03 22:49, James Coleman wrote:
> On Thu, Apr 2, 2020 at 5:53 PM David Zhang <david.zhang@highgo.ca> wrote:
>>
>> The following review has been posted through the commitfest application:
>> make installcheck-world:  not tested
>> Implements feature:       tested, passed
>> Spec compliant:           not tested
>> Documentation:            not tested
>>
>> I applied the patch to the latest master branch and run a test below. The error messages have been separated. Below
isthe test steps.
 
>>
>> ### setup primary server
>> initdb -D /tmp/primary/data
>> mkdir /tmp/archive_dir
>> echo "archive_mode='on'" >> /tmp/primary/data/postgresql.conf
>> echo "archive_command='cp %p /tmp/archive_dir/%f'" >> /tmp/primary/data/postgresql.conf
>> pg_ctl -D /tmp/primary/data -l /tmp/primary-logs start
>>
>> ### setup host standby server
>> pg_basebackup -p 5432 -w -R -D /tmp/hotstandby
>> echo "primary_conninfo='host=127.0.0.1 port=5432 user=pgdev'" >> /tmp/hotstandby/postgresql.conf
>> echo "restore_command='cp /tmp/archive_dir/%f %p'" >> /tmp/hotstandby/postgresql.conf
>> echo "hot_standby = off" >> /tmp/hotstandby/postgresql.conf
>> pg_ctl -D /tmp/hotstandby -l /tmp/hotstandby-logs -o "-p 5433" start
>>
>> ### keep trying to connect to hot standby server in order to get the error messages in different stages.
>> while true; do echo "`date`"; psql postgres -p 5433 -c "SELECT txid_current_snapshot();" sleep 0.2; done
>>
>> ### before the patch
>> psql: error: could not connect to server: FATAL:  the database system is starting up
>> ...
>>
>> ### after the patch, got different messages, one message indicates hot_standby is off
>> psql: error: could not connect to server: FATAL:  the database system is starting up
>> ...
>> psql: error: could not connect to server: FATAL:  the database system is up, but hot_standby is off
>> ...
> 
> Thanks for the review and testing!

Thanks for the patch! Here is the comment from me.

+        else if (!FatalError && pmState == PM_RECOVERY)
+            return CAC_STANDBY; /* connection disallowed on non-hot standby */

Even if hot_standby is enabled, pmState seems to indicate PM_RECOVERY
until recovery has reached a consistent state. No? So, if my understanding
is right, "FATAL:  the database system is up, but hot_standby is off" can
be logged even when hot_standby is on. Which sounds very confusing.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Nicer error when connecting to standby with hot_standby=off

От
James Coleman
Дата:
On Wed, Jul 29, 2020 at 11:24 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2020/04/03 22:49, James Coleman wrote:
> > On Thu, Apr 2, 2020 at 5:53 PM David Zhang <david.zhang@highgo.ca> wrote:
> >>
> >> The following review has been posted through the commitfest application:
> >> make installcheck-world:  not tested
> >> Implements feature:       tested, passed
> >> Spec compliant:           not tested
> >> Documentation:            not tested
> >>
> >> I applied the patch to the latest master branch and run a test below. The error messages have been separated.
Belowis the test steps.
 
> >>
> >> ### setup primary server
> >> initdb -D /tmp/primary/data
> >> mkdir /tmp/archive_dir
> >> echo "archive_mode='on'" >> /tmp/primary/data/postgresql.conf
> >> echo "archive_command='cp %p /tmp/archive_dir/%f'" >> /tmp/primary/data/postgresql.conf
> >> pg_ctl -D /tmp/primary/data -l /tmp/primary-logs start
> >>
> >> ### setup host standby server
> >> pg_basebackup -p 5432 -w -R -D /tmp/hotstandby
> >> echo "primary_conninfo='host=127.0.0.1 port=5432 user=pgdev'" >> /tmp/hotstandby/postgresql.conf
> >> echo "restore_command='cp /tmp/archive_dir/%f %p'" >> /tmp/hotstandby/postgresql.conf
> >> echo "hot_standby = off" >> /tmp/hotstandby/postgresql.conf
> >> pg_ctl -D /tmp/hotstandby -l /tmp/hotstandby-logs -o "-p 5433" start
> >>
> >> ### keep trying to connect to hot standby server in order to get the error messages in different stages.
> >> while true; do echo "`date`"; psql postgres -p 5433 -c "SELECT txid_current_snapshot();" sleep 0.2; done
> >>
> >> ### before the patch
> >> psql: error: could not connect to server: FATAL:  the database system is starting up
> >> ...
> >>
> >> ### after the patch, got different messages, one message indicates hot_standby is off
> >> psql: error: could not connect to server: FATAL:  the database system is starting up
> >> ...
> >> psql: error: could not connect to server: FATAL:  the database system is up, but hot_standby is off
> >> ...
> >
> > Thanks for the review and testing!
>
> Thanks for the patch! Here is the comment from me.
>
> +               else if (!FatalError && pmState == PM_RECOVERY)
> +                       return CAC_STANDBY; /* connection disallowed on non-hot standby */
>
> Even if hot_standby is enabled, pmState seems to indicate PM_RECOVERY
> until recovery has reached a consistent state. No? So, if my understanding
> is right, "FATAL:  the database system is up, but hot_standby is off" can
> be logged even when hot_standby is on. Which sounds very confusing.

That's a good point. I've attached a corrected version.

I still don't have a good idea for how to add a test for this change.
If a test for this warranted, I'd be interested in any ideas.

James

Вложения

Re: Nicer error when connecting to standby with hot_standby=off

От
Fujii Masao
Дата:

On 2020/08/01 5:18, James Coleman wrote:
> On Wed, Jul 29, 2020 at 11:24 AM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>>
>>
>>
>> On 2020/04/03 22:49, James Coleman wrote:
>>> On Thu, Apr 2, 2020 at 5:53 PM David Zhang <david.zhang@highgo.ca> wrote:
>>>>
>>>> The following review has been posted through the commitfest application:
>>>> make installcheck-world:  not tested
>>>> Implements feature:       tested, passed
>>>> Spec compliant:           not tested
>>>> Documentation:            not tested
>>>>
>>>> I applied the patch to the latest master branch and run a test below. The error messages have been separated.
Belowis the test steps.
 
>>>>
>>>> ### setup primary server
>>>> initdb -D /tmp/primary/data
>>>> mkdir /tmp/archive_dir
>>>> echo "archive_mode='on'" >> /tmp/primary/data/postgresql.conf
>>>> echo "archive_command='cp %p /tmp/archive_dir/%f'" >> /tmp/primary/data/postgresql.conf
>>>> pg_ctl -D /tmp/primary/data -l /tmp/primary-logs start
>>>>
>>>> ### setup host standby server
>>>> pg_basebackup -p 5432 -w -R -D /tmp/hotstandby
>>>> echo "primary_conninfo='host=127.0.0.1 port=5432 user=pgdev'" >> /tmp/hotstandby/postgresql.conf
>>>> echo "restore_command='cp /tmp/archive_dir/%f %p'" >> /tmp/hotstandby/postgresql.conf
>>>> echo "hot_standby = off" >> /tmp/hotstandby/postgresql.conf
>>>> pg_ctl -D /tmp/hotstandby -l /tmp/hotstandby-logs -o "-p 5433" start
>>>>
>>>> ### keep trying to connect to hot standby server in order to get the error messages in different stages.
>>>> while true; do echo "`date`"; psql postgres -p 5433 -c "SELECT txid_current_snapshot();" sleep 0.2; done
>>>>
>>>> ### before the patch
>>>> psql: error: could not connect to server: FATAL:  the database system is starting up
>>>> ...
>>>>
>>>> ### after the patch, got different messages, one message indicates hot_standby is off
>>>> psql: error: could not connect to server: FATAL:  the database system is starting up
>>>> ...
>>>> psql: error: could not connect to server: FATAL:  the database system is up, but hot_standby is off
>>>> ...
>>>
>>> Thanks for the review and testing!
>>
>> Thanks for the patch! Here is the comment from me.
>>
>> +               else if (!FatalError && pmState == PM_RECOVERY)
>> +                       return CAC_STANDBY; /* connection disallowed on non-hot standby */
>>
>> Even if hot_standby is enabled, pmState seems to indicate PM_RECOVERY
>> until recovery has reached a consistent state. No? So, if my understanding
>> is right, "FATAL:  the database system is up, but hot_standby is off" can
>> be logged even when hot_standby is on. Which sounds very confusing.
> 
> That's a good point. I've attached a corrected version.

Thanks for updating the patch! But it failed to be applied to the master branch
cleanly because of the recent commit 0038f94387. So could you update the patch
again?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Nicer error when connecting to standby with hot_standby=off

От
James Coleman
Дата:
On Tue, Aug 18, 2020 at 12:25 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> Thanks for updating the patch! But it failed to be applied to the master branch
> cleanly because of the recent commit 0038f94387. So could you update the patch
> again?

Updated patch attached.

James

Вложения

Re: Nicer error when connecting to standby with hot_standby=off

От
James Coleman
Дата:
On Tue, Mar 9, 2021 at 9:27 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2021/03/09 23:19, James Coleman wrote:
> > On Tue, Mar 9, 2021 at 9:17 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >>
> >> On 2021-Mar-09, James Coleman wrote:
> >>
> >>> Yes, I think they both agreed on the "DETAIL:  Hot standby mode is
> >>> disabled." message, but that alternative meant not needing to add any
> >>> new signals and pm states, correct?
> >>
> >> Ah, I see!  I was thinking that you still needed the state and signal in
> >> order to print the correct message in hot-standby mode, but that's
> >> (obviously!) wrong.  So you're right that no signal/state are needed.
> >
> > Cool. And yes, I'm planning to update the patch soon.
>
> +1. Thanks!

Here's an updated patch; I think I've gotten what Alvaro suggested.

Thanks,
James

Вложения

Re: Nicer error when connecting to standby with hot_standby=off

От
Fujii Masao
Дата:

On 2021/03/19 23:35, James Coleman wrote:
> Here's an updated patch; I think I've gotten what Alvaro suggested.

Thanks for updating the patch! But I was thinking that our consensus is
something like the attached patch. Could you check this patch?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Вложения

Re: Nicer error when connecting to standby with hot_standby=off

От
James Coleman
Дата:
On Mon, Mar 22, 2021 at 1:49 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2021/03/19 23:35, James Coleman wrote:
> > Here's an updated patch; I think I've gotten what Alvaro suggested.
>
> Thanks for updating the patch! But I was thinking that our consensus is
> something like the attached patch. Could you check this patch?

As far as I can tell (I might be missing something) your v5 patch does
the same thing, albeit with different code organization. It did catch
though that I'd neglected to add the DETAIL line as separate from the
errmsg line.

Is the attached (in the style of my v4, since I'm not following why we
need to move the standby determination logic into a new
CAC_NOCONSISTENT block) what you're thinking? Or is there something
else I'm missing?

Thanks,
James

Вложения

Re: Nicer error when connecting to standby with hot_standby=off

От
Fujii Masao
Дата:

On 2021/03/23 3:25, James Coleman wrote:
> On Mon, Mar 22, 2021 at 1:49 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>>
>>
>> On 2021/03/19 23:35, James Coleman wrote:
>>> Here's an updated patch; I think I've gotten what Alvaro suggested.
>>
>> Thanks for updating the patch! But I was thinking that our consensus is
>> something like the attached patch. Could you check this patch?
> 
> As far as I can tell (I might be missing something) your v5 patch does
> the same thing, albeit with different code organization. It did catch
> though that I'd neglected to add the DETAIL line as separate from the
> errmsg line.
> 
> Is the attached (in the style of my v4, since I'm not following why we
> need to move the standby determination logic into a new
> CAC_NOCONSISTENT block) what you're thinking? Or is there something
> else I'm missing?

I just did that to avoid adding more CAC_state. But basically it's
ok to check hot standby at either canAcceptConnections() or
ProcessStartupPacket().

          case CAC_STARTUP:
              ereport(FATAL,
                      (errcode(ERRCODE_CANNOT_CONNECT_NOW),
-                     errmsg("the database system is starting up")));
+                     errmsg("the database system is not accepting connections"),
+                     errdetail("Consistent recovery state has not been yet reached.")));

Do you want to report this message even in crash recovery? Since crash
recovery is basically not so much related to "consistent recovery state",
at least for me the original message seems more suitable for crash recovery.

Also if we adopt this message, the server with hot_standby=off reports
"Consistent recovery state has not been yet reached." in PM_STARTUP,
but stops reporting this message at PM_RECOVERY even if the consistent
recovery state has not been reached yet. Instead, it reports "Hot standby
mode is disabled." at PM_RECOVERY. Isn't this transition of message confusing?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Nicer error when connecting to standby with hot_standby=off

От
James Coleman
Дата:
On Mon, Mar 22, 2021 at 2:52 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2021/03/23 3:25, James Coleman wrote:
> > On Mon, Mar 22, 2021 at 1:49 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>
> >>
> >>
> >> On 2021/03/19 23:35, James Coleman wrote:
> >>> Here's an updated patch; I think I've gotten what Alvaro suggested.
> >>
> >> Thanks for updating the patch! But I was thinking that our consensus is
> >> something like the attached patch. Could you check this patch?
> >
> > As far as I can tell (I might be missing something) your v5 patch does
> > the same thing, albeit with different code organization. It did catch
> > though that I'd neglected to add the DETAIL line as separate from the
> > errmsg line.
> >
> > Is the attached (in the style of my v4, since I'm not following why we
> > need to move the standby determination logic into a new
> > CAC_NOCONSISTENT block) what you're thinking? Or is there something
> > else I'm missing?
>
> I just did that to avoid adding more CAC_state. But basically it's
> ok to check hot standby at either canAcceptConnections() or
> ProcessStartupPacket().
>
>                 case CAC_STARTUP:
>                         ereport(FATAL,
>                                         (errcode(ERRCODE_CANNOT_CONNECT_NOW),
> -                                        errmsg("the database system is starting up")));
> +                                        errmsg("the database system is not accepting connections"),
> +                                        errdetail("Consistent recovery state has not been yet reached.")));
>
> Do you want to report this message even in crash recovery? Since crash
> recovery is basically not so much related to "consistent recovery state",
> at least for me the original message seems more suitable for crash recovery.
>
> Also if we adopt this message, the server with hot_standby=off reports
> "Consistent recovery state has not been yet reached." in PM_STARTUP,
> but stops reporting this message at PM_RECOVERY even if the consistent
> recovery state has not been reached yet. Instead, it reports "Hot standby
> mode is disabled." at PM_RECOVERY. Isn't this transition of message confusing?

Are you saying we should only change the message for a single case:
the case where we'd otherwise allow connections but EnableHotStandby
is false? I believe that's what the original patch did, but then
Alvaro's proposal included changing additional messages.

James Coleman



Re: Nicer error when connecting to standby with hot_standby=off

От
Fujii Masao
Дата:

On 2021/03/23 3:59, James Coleman wrote:
> Are you saying we should only change the message for a single case:
> the case where we'd otherwise allow connections but EnableHotStandby
> is false?

No. Let me clarify my opinion.

At PM_STARTUP, "the database system is starting up" should be logged
whatever the setting of hot_standby is. This is the same as the original
behavior. During crash recovery, this message is output. Also at archive
recovery or standby server, until the startup process sends
PMSIGNAL_RECOVERY_STARTED, this message is logged.

At PM_RECOVERY, originally "the database system is starting up" was logged
whatever the setting of hot_standby is. My opinion is the same as our
consensus, i.e., "the database system is not accepting connections" and
"Hot standby mode is disabled." are logged if hot_standby is disabled.
"the database system is not accepting connections" and "Consistent
  recovery state has not been yet reached." are logged if hot_standby is
  enabled.

After the consistent recovery state is reached, if hot_standby is disabled,
the postmaster state is still PM_RECOVERY. So "Hot standby mode is disabled."
is still logged in this case. This is also different behavior from the original.
If hot_standby is enabled, read-only connections can be accepted because
the consistent state is reached. So no message needs to be logged.

Therefore for now what we've not reached the consensus is what message
should be logged at PM_STARTUP. I'm thinking it's better to log
"the database system is starting up" in that case because of the reasons
that I explained upthread.

Thought?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Nicer error when connecting to standby with hot_standby=off

От
James Coleman
Дата:
On Tue, Mar 23, 2021 at 1:46 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2021/03/23 3:59, James Coleman wrote:
> > Are you saying we should only change the message for a single case:
> > the case where we'd otherwise allow connections but EnableHotStandby
> > is false?
>
> No. Let me clarify my opinion.
>
> At PM_STARTUP, "the database system is starting up" should be logged
> whatever the setting of hot_standby is. This is the same as the original
> behavior. During crash recovery, this message is output. Also at archive
> recovery or standby server, until the startup process sends
> PMSIGNAL_RECOVERY_STARTED, this message is logged.
>
> At PM_RECOVERY, originally "the database system is starting up" was logged
> whatever the setting of hot_standby is. My opinion is the same as our
> consensus, i.e., "the database system is not accepting connections" and
> "Hot standby mode is disabled." are logged if hot_standby is disabled.
> "the database system is not accepting connections" and "Consistent
>   recovery state has not been yet reached." are logged if hot_standby is
>   enabled.
>
> After the consistent recovery state is reached, if hot_standby is disabled,
> the postmaster state is still PM_RECOVERY. So "Hot standby mode is disabled."
> is still logged in this case. This is also different behavior from the original.
> If hot_standby is enabled, read-only connections can be accepted because
> the consistent state is reached. So no message needs to be logged.
>
> Therefore for now what we've not reached the consensus is what message
> should be logged at PM_STARTUP. I'm thinking it's better to log
> "the database system is starting up" in that case because of the reasons
> that I explained upthread.
>
> Thought?

I understand your point now, and I agree, that makes sense.

The attached takes a similar approach to your v5, but I've used
CAC_NOTCONSISTENT instead of CAC_NOCONSISTENT because I think it reads
better (CAC_INCONSISTENT would technically be better English,
but...also it doesn't parallel the code and error message).

Thoughts?

James Coleman

Вложения

Re: Nicer error when connecting to standby with hot_standby=off

От
Alvaro Herrera
Дата:
On 2021-Mar-23, James Coleman wrote:

> On Tue, Mar 23, 2021 at 1:46 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

> > Therefore for now what we've not reached the consensus is what message
> > should be logged at PM_STARTUP. I'm thinking it's better to log
> > "the database system is starting up" in that case because of the reasons
> > that I explained upthread.

> I understand your point now, and I agree, that makes sense.

Please note that PM_STARTUP mode is very very short-lived.  It only
starts happening when postmaster launches the startup process, and
before the startup process begins WAL replay (as changed by
sigusr1_handler in postmaster.c).  Once WAL replay begins, the PM status
changes to PM_RECOVERY.  So I don't think we really care all that much
what message is logged in this case.  It changes very quickly into the
CAC_NOTCONSISTENT message anyway.  For this state, it seems okay with
either what James submitted in v7, or what Fujii said.

However, for this one

+       case CAC_NOTCONSISTENT:
+           if (EnableHotStandby)
+               ereport(FATAL,
+                       (errcode(ERRCODE_CANNOT_CONNECT_NOW),
+                        errmsg("the database system is not accepting connections"),
+                        errdetail("Consistent recovery state has not been yet reached.")));

Maybe it makes sense to say "... is not accepting connections *yet*".
That'd be a tad redundant with what the DETAIL says, but that seems
acceptable.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W



Re: Nicer error when connecting to standby with hot_standby=off

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> However, for this one

> +       case CAC_NOTCONSISTENT:
> +           if (EnableHotStandby)
> +               ereport(FATAL,
> +                       (errcode(ERRCODE_CANNOT_CONNECT_NOW),
> +                        errmsg("the database system is not accepting connections"),
> +                        errdetail("Consistent recovery state has not been yet reached.")));

> Maybe it makes sense to say "... is not accepting connections *yet*".

+1, but I think "... is not yet accepting connections" is slightly
better style.

            regards, tom lane



Re: Nicer error when connecting to standby with hot_standby=off

От
James Coleman
Дата:
On Tue, Mar 23, 2021 at 12:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > However, for this one
>
> > +       case CAC_NOTCONSISTENT:
> > +           if (EnableHotStandby)
> > +               ereport(FATAL,
> > +                       (errcode(ERRCODE_CANNOT_CONNECT_NOW),
> > +                        errmsg("the database system is not accepting connections"),
> > +                        errdetail("Consistent recovery state has not been yet reached.")));
>
> > Maybe it makes sense to say "... is not accepting connections *yet*".
>
> +1, but I think "... is not yet accepting connections" is slightly
> better style.

All right, see attached v8.

James Coleman

Вложения

Re: Nicer error when connecting to standby with hot_standby=off

От
Fujii Masao
Дата:

On 2021/03/24 1:20, Alvaro Herrera wrote:
> On 2021-Mar-23, James Coleman wrote:
> 
>> On Tue, Mar 23, 2021 at 1:46 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> 
>>> Therefore for now what we've not reached the consensus is what message
>>> should be logged at PM_STARTUP. I'm thinking it's better to log
>>> "the database system is starting up" in that case because of the reasons
>>> that I explained upthread.
> 
>> I understand your point now, and I agree, that makes sense.
> 
> Please note that PM_STARTUP mode is very very short-lived.  It only
> starts happening when postmaster launches the startup process, and
> before the startup process begins WAL replay (as changed by
> sigusr1_handler in postmaster.c).  Once WAL replay begins, the PM status
> changes to PM_RECOVERY.

True if archive recovery or standby server. But during crash recovery
postmaster sits in PM_STARTUP mode. So I guess that we still see
the log messages for PM_STARTUP lots of times.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Nicer error when connecting to standby with hot_standby=off

От
Alvaro Herrera
Дата:
On 2021-Mar-24, Fujii Masao wrote:

> On 2021/03/24 1:20, Alvaro Herrera wrote:

> > Please note that PM_STARTUP mode is very very short-lived.  It only
> > starts happening when postmaster launches the startup process, and
> > before the startup process begins WAL replay (as changed by
> > sigusr1_handler in postmaster.c).  Once WAL replay begins, the PM status
> > changes to PM_RECOVERY.
> 
> True if archive recovery or standby server. But during crash recovery
> postmaster sits in PM_STARTUP mode. So I guess that we still see
> the log messages for PM_STARTUP lots of times.

Hmm ... true, and I had missed that this is what you had already said
upthread.  In this case, should we add a DETAIL line for this message?

FATAL:  the database system is starting up
DETAIL:  WAL is being applied to recover from a system crash.
or
DETAIL:  The system is applying WAL to recover from a system crash.
or
DETAIL:  The startup process is applying WAL to recover from a system crash.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"La conclusión que podemos sacar de esos estudios es que
no podemos sacar ninguna conclusión de ellos" (Tanenbaum)



Re: Nicer error when connecting to standby with hot_standby=off

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> FATAL:  the database system is starting up
> DETAIL:  WAL is being applied to recover from a system crash.
> or
> DETAIL:  The system is applying WAL to recover from a system crash.
> or
> DETAIL:  The startup process is applying WAL to recover from a system crash.

I don't think the postmaster has enough context to know if that's
actually true.  It just launches the startup process and waits for
results.  If somebody saw this during a normal (non-crash) startup,
they'd be justifiably alarmed.

            regards, tom lane



Re: Nicer error when connecting to standby with hot_standby=off

От
Fujii Masao
Дата:

On 2021/03/24 5:59, Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>> FATAL:  the database system is starting up
>> DETAIL:  WAL is being applied to recover from a system crash.
>> or
>> DETAIL:  The system is applying WAL to recover from a system crash.
>> or
>> DETAIL:  The startup process is applying WAL to recover from a system crash.
> 
> I don't think the postmaster has enough context to know if that's
> actually true.  It just launches the startup process and waits for
> results.  If somebody saw this during a normal (non-crash) startup,
> they'd be justifiably alarmed.

Yes, so logging "the database system is starting up" seems enough to me.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Nicer error when connecting to standby with hot_standby=off

От
Alvaro Herrera
Дата:
On 2021-Mar-24, Fujii Masao wrote:

> On 2021/03/24 5:59, Tom Lane wrote:
> > Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > > FATAL:  the database system is starting up
> > > DETAIL:  WAL is being applied to recover from a system crash.
> > > or
> > > DETAIL:  The system is applying WAL to recover from a system crash.
> > > or
> > > DETAIL:  The startup process is applying WAL to recover from a system crash.
> > 
> > I don't think the postmaster has enough context to know if that's
> > actually true.  It just launches the startup process and waits for
> > results.  If somebody saw this during a normal (non-crash) startup,
> > they'd be justifiably alarmed.
> 
> Yes, so logging "the database system is starting up" seems enough to me.

No objection.

-- 
Álvaro Herrera       Valdivia, Chile
"Cuando no hay humildad las personas se degradan" (A. Christie)



Re: Nicer error when connecting to standby with hot_standby=off

От
Fujii Masao
Дата:

On 2021/03/24 16:59, Alvaro Herrera wrote:
> On 2021-Mar-24, Fujii Masao wrote:
> 
>> On 2021/03/24 5:59, Tom Lane wrote:
>>> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>>>> FATAL:  the database system is starting up
>>>> DETAIL:  WAL is being applied to recover from a system crash.
>>>> or
>>>> DETAIL:  The system is applying WAL to recover from a system crash.
>>>> or
>>>> DETAIL:  The startup process is applying WAL to recover from a system crash.
>>>
>>> I don't think the postmaster has enough context to know if that's
>>> actually true.  It just launches the startup process and waits for
>>> results.  If somebody saw this during a normal (non-crash) startup,
>>> they'd be justifiably alarmed.
>>
>> Yes, so logging "the database system is starting up" seems enough to me.
> 
> No objection.

Thanks! So I changed the message reported at PM_STARTUP to that one,
based on v8 patch that James posted upthread. I also ran pgindent for
the patch. Attached is the updated version of the patch.

Barring any objection, I will commit this.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Вложения

Re: Nicer error when connecting to standby with hot_standby=off

От
James Coleman
Дата:
On Wed, Mar 24, 2021 at 5:55 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2021/03/24 16:59, Alvaro Herrera wrote:
> > On 2021-Mar-24, Fujii Masao wrote:
> >
> >> On 2021/03/24 5:59, Tom Lane wrote:
> >>> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> >>>> FATAL:  the database system is starting up
> >>>> DETAIL:  WAL is being applied to recover from a system crash.
> >>>> or
> >>>> DETAIL:  The system is applying WAL to recover from a system crash.
> >>>> or
> >>>> DETAIL:  The startup process is applying WAL to recover from a system crash.
> >>>
> >>> I don't think the postmaster has enough context to know if that's
> >>> actually true.  It just launches the startup process and waits for
> >>> results.  If somebody saw this during a normal (non-crash) startup,
> >>> they'd be justifiably alarmed.
> >>
> >> Yes, so logging "the database system is starting up" seems enough to me.
> >
> > No objection.
>
> Thanks! So I changed the message reported at PM_STARTUP to that one,
> based on v8 patch that James posted upthread. I also ran pgindent for
> the patch. Attached is the updated version of the patch.
>
> Barring any objection, I will commit this.

That looks good to me. Thanks for working on this.

James Coleman



Re: Nicer error when connecting to standby with hot_standby=off

От
Fujii Masao
Дата:

On 2021/03/24 22:06, James Coleman wrote:
> That looks good to me. Thanks for working on this.

Thanks! I pushed the patch.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Nicer error when connecting to standby with hot_standby=off

От
James Coleman
Дата:
On Wed, Mar 24, 2021 at 9:43 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2021/03/24 22:06, James Coleman wrote:
> > That looks good to me. Thanks for working on this.
>
> Thanks! I pushed the patch.
>
> Regards,
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION

Thanks for reviewing and committing!

James