Thread: [HACKERS] pg_ctl wait exit code (was Re: [COMMITTERS] pgsql: Additional tests for subtransactions in recovery)
On 4/27/17 08:41, Michael Paquier wrote:
> +$node_slave->promote;
> +$node_slave->poll_query_until('postgres',
> + "SELECT NOT pg_is_in_recovery()")
> + or die "Timed out while waiting for promotion of standby";
>
> This reminds me that we should really switch PostgresNode::promote to
> use the wait mode of pg_ctl promote, and remove all those polling
> queries...
I was going to say: This should all be obsolete already, because pg_ctl
promote waits by default.
However: Failure to complete promotion within the waiting time does not
lead to an error exit, so you will not get a failure if the promotion
does not finish. This is probably a mistake. Looking around pg_ctl, I
found that this was handled seemingly inconsistently in do_start(), but
do_stop() errors when it does not complete.
Possible patches for this attached.
Perhaps we need a separate exit code in pg_ctl to distinguish general
errors from did not finish within timeout?
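To illustrate the behavior under discussion, here is a minimal sketch of a bounded wait loop that reports a timeout through its return value instead of returning success. This is not the pg_ctl code: the names wait_for_condition, ready_after, and the WAIT_* values are hypothetical, and the dedicated timeout status is only an assumption about what a separate exit code could look like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical statuses; pg_ctl itself may use different values. */
enum wait_status
{
	WAIT_OK = 0,
	WAIT_GENERAL_ERROR = 1,
	WAIT_TIMED_OUT = 2			/* assumption: a dedicated "timed out" status */
};

/* Callback type: returns true once the awaited condition holds. */
typedef bool (*done_check) (void *arg);

/*
 * Poll "check" at most max_attempts times.  Return WAIT_OK as soon as it
 * succeeds; otherwise return WAIT_TIMED_OUT so the caller sees a failure
 * rather than a silent success.
 */
static int
wait_for_condition(done_check check, void *arg, int max_attempts)
{
	assert(check != NULL);

	for (int attempt = 0; attempt < max_attempts; attempt++)
	{
		if (check(arg))
			return WAIT_OK;
		/* a real tool would sleep between attempts here */
	}
	return WAIT_TIMED_OUT;
}

/* Toy condition for demonstration: becomes true after *arg calls. */
static bool
ready_after(void *arg)
{
	int		   *remaining = arg;

	return --(*remaining) <= 0;
}
```

A caller would pass the result on as the process exit status, so that a script or test can tell "finished" from "gave up waiting".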
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 5/1/17 12:19, Peter Eisentraut wrote:
> On 4/27/17 08:41, Michael Paquier wrote:
>> +$node_slave->promote;
>> +$node_slave->poll_query_until('postgres',
>> + "SELECT NOT pg_is_in_recovery()")
>> + or die "Timed out while waiting for promotion of standby";
>>
>> This reminds me that we should really switch PostgresNode::promote to
>> use the wait mode of pg_ctl promote, and remove all those polling
>> queries...
>
> I was going to say: This should all be obsolete already, because pg_ctl
> promote waits by default.
>
> However: Failure to complete promotion within the waiting time does not
> lead to an error exit, so you will not get a failure if the promotion
> does not finish. This is probably a mistake. Looking around pg_ctl, I
> found that this was handled seemingly inconsistently in do_start(), but
> do_stop() errors when it does not complete.
>
> Possible patches for this attached.
>
> Perhaps we need a separate exit code in pg_ctl to distinguish general
> errors from did not finish within timeout?
I was going to hold this back for PG11, but since we're now doing some
other tweaks in pg_ctl, it might be useful to add this too. Thoughts?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Jul 1, 2017 at 4:47 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 5/1/17 12:19, Peter Eisentraut wrote:
>> However: Failure to complete promotion within the waiting time does not
>> lead to an error exit, so you will not get a failure if the promotion
>> does not finish. This is probably a mistake. Looking around pg_ctl, I
>> found that this was handled seemingly inconsistently in do_start(), but
>> do_stop() errors when it does not complete.
This inconsistency could be treated like a bug, though changing such an
old behavior in back-branches would be risky. So +1 for only HEAD with
such a change, and pg_ctl promote -w is new in 10.
>> Possible patches for this attached.
>>
>> Perhaps we need a separate exit code in pg_ctl to distinguish general
>> errors from did not finish within timeout?
I would treat that as a separate item for 11, but that's as far as my
opinion goes. Per this link in pg_ctl.c the error code ought to be 4:
https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
> I was going to hold this back for PG11, but since we're now doing some
> other tweaks in pg_ctl, it might be useful to add this too. Thoughts?
The use of 0 as exit code for the new promote -w if timeout is reached
looks like an open item to me. Cleaning up the polling queries after
promotion would be nice to see as well.
--
Michael
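For context on why the exit code matters to callers, here is a rough sketch of how a wrapper might rely on a command's exit status instead of polling afterwards. The helper names run_command_status and command_succeeded are hypothetical. Only the POSIX system() call and the wait-status macros are assumed, and the command string is whatever the caller supplies (for instance a pg_ctl promote invocation).

```c
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Run a shell command and return its exit status, or -1 if it could not
 * be run or did not exit normally (for example, killed by a signal).
 */
static int
run_command_status(const char *command)
{
	int			raw = system(command);

	if (raw == -1 || !WIFEXITED(raw))
		return -1;
	return WEXITSTATUS(raw);
}

/* Returns 1 if the command exited with status 0, and 0 otherwise. */
static int
command_succeeded(const char *command)
{
	return run_command_status(command) == 0;
}
```

If a timed-out wait exits with 0, a check like command_succeeded cannot detect it, which is why callers fall back to polling queries.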
On 7/2/17 20:28, Michael Paquier wrote:
>> I was going to hold this back for PG11, but since we're now doing some
>> other tweaks in pg_ctl, it might be useful to add this too. Thoughts?
>
> The use of 0 as exit code for the new promote -w if timeout is reached
> looks like an open item to me. Cleaning up the polling queries after
> promotion would be nice to see as well.
committed
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jul 6, 2017 at 2:41 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 7/2/17 20:28, Michael Paquier wrote:
>>> I was going to hold this back for PG11, but since we're now doing some
>>> other tweaks in pg_ctl, it might be useful to add this too. Thoughts?
>>
>> The use of 0 as exit code for the new promote -w if timeout is reached
>> looks like an open item to me. Cleaning up the polling queries after
>> promotion would be nice to see as well.
>
> committed
Thanks for finishing the cleanup.
--
Michael