Discussion: BUG #15331: Please check if recovery.conf can be renamed


BUG #15331: Please check if recovery.conf can be renamed

From: PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15331
Logged by:          Phil Endecott
Email address:      spam_from_pgsql_lists@chezphil.org
PostgreSQL version: 9.6.10
Operating system:   Debian Stretch
Description:

When a standby server is promoted, it renames recovery.conf to
recovery.done. That will not be possible if the file is owned by root or
otherwise has the wrong permissions. It's unusual for a program to modify
its own configuration files like this.
It would be great if PostgreSQL could check at startup that the
permissions are suitable, and emit a warning if not. Currently it only
fails when asked to promote, with this log message:
FATAL:  could not open file "recovery.conf": Permission denied
(Note that it only says "could not open", not "could not rename".)
This means that promotion fails, and for me, even after fixing the
permissions, the system was left in an odd state that took some work to fix.
Failover is hard to get right; emitting a warning earlier in this case would
mean one less thing to go wrong.
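A minimal sketch of the kind of early check requested above, written as a hypothetical Python pre-flight script rather than as PostgreSQL code; the name check_recovery_conf_renamable is made up for illustration. The log line reports a failure to open rather than to rename; presumably the server opens the file read-write (for example, to fsync it) before renaming it, so the sketch checks read/write access on the file as well as write and search permission on the data directory, which POSIX rename() needs. It assumes it runs as the same OS user as the server: os.access uses the real user ID, and a root caller passes almost every check.

```python
import os


def check_recovery_conf_renamable(data_dir: str, name: str = "recovery.conf") -> list:
    """Return a list of problem descriptions; an empty list means no problem found.

    Hypothetical pre-flight check, not part of PostgreSQL.
    """
    problems = []
    path = os.path.join(data_dir, name)
    if not os.path.exists(path):
        # No recovery.conf: nothing will be renamed at promotion time.
        return problems
    # Assumption: the server opens the file read-write before renaming it.
    if not os.access(path, os.R_OK | os.W_OK):
        problems.append(f"{path}: not readable and writable by this user")
    # rename() requires write and search (execute) permission on the directory.
    if not os.access(data_dir, os.W_OK | os.X_OK):
        problems.append(f"{data_dir}: directory not writable and searchable by this user")
    return problems
```

A failover script could, for instance, call it on the standby's data directory and print each returned string as a warning before attempting promotion.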


Re: BUG #15331: Please check if recovery.conf can be renamed

From: Michael Paquier
Date:
On Thu, Aug 16, 2018 at 11:30:09AM +0000, PG Bug reporting form wrote:
> This means that promotion fails, and for me even after fixing the
> permissions the system was in an odd state that took some work to fix.
> Failover is hard to get right; emitting a warning earlier in this case would
> mean one less thing to go wrong.

I think you would be interested in this recent commit (included in
the last round of minor releases):
commit: cbc55da556bbcb649e059804009c38100ee98884
committer: Michael Paquier <michael@paquier.xyz>
date: Mon, 9 Jul 2018 10:22:34 +0900
Rework order of end-of-recovery actions to delay timeline history write

And this thread:
https://postgr.es/m/CABUevEz09XY2EevA2dLjPCY-C5UO4Hq=XxmXLmF6ipNFecbShQ@mail.gmail.com

To summarize: after recovery finished but before recovery.conf was
renamed, some on-disk actions took place, which could leave the cluster
in a weird state, perhaps similar to what you saw.
--
Michael
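The ordering idea summarized in the message above (and in the commit title, "delay timeline history write") can be illustrated with a toy Python model. This is not PostgreSQL's code: the function end_recovery, the history file name and its contents are hypothetical. The only point is that the step likely to fail for permission reasons, renaming recovery.conf, runs before anything new is written, so a failed rename leaves the directory as it was.

```python
import os


def end_recovery(data_dir: str, history_name: str, history_text: str) -> None:
    """Toy model of end-of-recovery ordering; illustrative only."""
    conf = os.path.join(data_dir, "recovery.conf")
    done = os.path.join(data_dir, "recovery.done")
    # Step 1: the fallible rename comes first. On failure it raises an OSError
    # (e.g. PermissionError or FileNotFoundError) and nothing else has changed.
    os.replace(conf, done)
    # Step 2: only after the rename succeeded, write the new (toy) history file.
    with open(os.path.join(data_dir, history_name), "w") as f:
        f.write(history_text)
```

With the opposite order, a failed rename would leave a freshly written history file behind, which is the kind of half-finished state described above.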


Re: BUG #15331: Please check if recovery.conf can be renamed

From: Andres Freund
Date:
On 2018-08-16 20:50:55 +0900, Michael Paquier wrote:
> On Thu, Aug 16, 2018 at 11:30:09AM +0000, PG Bug reporting form wrote:
> > This means that promotion fails, and for me even after fixing the
> > permissions the system was in an odd state that took some work to fix.
> > Failover is hard to get right; emitting a warning earlier in this case would
> > mean one less thing to go wrong.
> 
> I think that you would be interested in this recent commit (fixed as of
> the last round of minor releases):
> commit: cbc55da556bbcb649e059804009c38100ee98884
> committer: Michael Paquier <michael@paquier.xyz>
> date: Mon, 9 Jul 2018 10:22:34 +0900
> Rework order of end-of-recovery actions to delay timeline history write
> 
> And this thread:
> https://postgr.es/m/CABUevEz09XY2EevA2dLjPCY-C5UO4Hq=XxmXLmF6ipNFecbShQ@mail.gmail.com
> 
> To give you a summary, once recovery finished and before recovery.conf
> was renamed, some on-disk actions happened, which could put the cluster
> in a weird state, perhaps similarly to what you saw.

How would this address the OP's concern? You still wouldn't learn
meaningfully earlier that your attempted promotion failed (as opposed to
learning of the problem before you ever promote).

Greetings,

Andres Freund


Re: BUG #15331: Please check if recovery.conf can be renamed

From: Michael Paquier
Date:
On Thu, Aug 16, 2018 at 05:09:43AM -0700, Andres Freund wrote:
> How would this address OP's concern? You'd still not learn meaningfully
> earlier that your attempted promotion failed (instead of learning of the
> problem before you ever promote).

What the previous commit fixes is this: even if renaming recovery.conf
fails, the cluster does not end up in a weird state and remains usable
later on, so the OP would not see the later problems reported after the
failed promotion.  I am not sure that an early warning would actually be
useful, as I doubt that many users would notice it, but there is indeed
an argument for making sure that recovery.conf has correct permissions,
and failing hard before entering recovery if it does not.  I am not sure
how far we want to restrict things, though; for example, group read
access to data directories was introduced recently...
--
Michael
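A minimal sketch of the "fail hard before entering recovery" option, again as hypothetical Python rather than server code; RecoveryConfPermissionError and require_recovery_conf_ok are made-up names. It checks ownership and the owner permission bits instead of calling os.access, so the result does not depend on who runs it (it would catch the root-owned file from the original report), and it deliberately ignores the group bits so that a group-readable mode such as 0640 still passes, in line with the concern about group read access. Requiring owner write permission is an assumption carried over from the "could not open" error in the report.

```python
import os
import stat


class RecoveryConfPermissionError(RuntimeError):
    """Hypothetical error raised to refuse entering recovery."""


def require_recovery_conf_ok(data_dir: str, server_uid: int) -> None:
    """Raise unless recovery.conf is owned by server_uid and owner read/write.

    Group and other bits are ignored, so modes 0600 and 0640 both pass.
    """
    path = os.path.join(data_dir, "recovery.conf")
    if not os.path.exists(path):
        return  # no recovery.conf: no recovery, nothing to check
    st = os.stat(path)
    if st.st_uid != server_uid:
        raise RecoveryConfPermissionError(
            f"{path} is owned by uid {st.st_uid}, expected uid {server_uid}")
    owner_rw = stat.S_IRUSR | stat.S_IWUSR
    if (st.st_mode & owner_rw) != owner_rw:
        raise RecoveryConfPermissionError(
            f"{path} is not readable and writable by its owner")
```

A startup wrapper could call require_recovery_conf_ok(data_dir, os.geteuid()) and refuse to continue if it raises; whether that is too strict is exactly the open question in the message above.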
