Discussion: BUG #15331: Please check if recovery.conf can be renamed
The following bug has been logged on the website:

Bug reference:      15331
Logged by:          Phil Endecott
Email address:      spam_from_pgsql_lists@chezphil.org
PostgreSQL version: 9.6.10
Operating system:   Debian Stretch
Description:

When a standby server is promoted it renames recovery.conf to recovery.done. That will not be possible if that file is owned by root or otherwise has the wrong permissions. It's unusual for a program to modify its own configuration files like this. It would be great if PostgreSQL could check that the permissions are suitable when it starts, and emit a warning if not.

Currently it only fails when asked to promote, with this log message:

FATAL: could not open file "recovery.conf": Permission denied

(Note that it only says "could not open", not "could not rename".)

This means that promotion fails, and for me even after fixing the permissions the system was in an odd state that took some work to fix. Failover is hard to get right; emitting a warning earlier in this case would mean one less thing to go wrong.
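Until the server performs such a check itself, an operator could run a pre-flight check before attempting a failover. The following is a minimal sketch, not part of PostgreSQL: `check_recovery_conf` is a hypothetical helper, and it assumes (1) that renaming recovery.conf needs write and search permission on the data directory, as for any POSIX `rename()`, and (2), judging only by the "could not open file" message above, that the server also opens recovery.conf itself before renaming it, so the file must be readable and writable by the server's OS user.

```python
import os
from typing import List


def check_recovery_conf(data_dir: str, name: str = "recovery.conf") -> List[str]:
    """Return problems likely to make the rename at promotion time fail.

    Hypothetical pre-flight helper, not part of PostgreSQL. Run it as the
    same OS user as the server: os.access() checks the real user ID, and
    root passes these checks regardless of the permission bits.
    """
    problems: List[str] = []
    path = os.path.join(data_dir, name)
    if not os.path.exists(path):
        # No recovery.conf: not a standby, so there is nothing to rename.
        return problems
    # Assumption based on the "could not open file" message: the server opens
    # recovery.conf itself before renaming it, so require read and write access.
    if not os.access(path, os.R_OK | os.W_OK):
        problems.append(f"{path}: not readable and writable by this user")
    # rename() needs write and search permission on the containing directory.
    if not os.access(data_dir, os.W_OK | os.X_OK):
        problems.append(f"{data_dir}: directory not writable by this user")
    return problems
```

For example, `check_recovery_conf("/var/lib/postgresql/9.6/main")` (the default Debian cluster layout, assumed here) returns an empty list when nothing obvious is wrong. The helper only reports; whether a problem should produce a warning or make startup fail hard is exactly what the rest of the thread debates.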
On Thu, Aug 16, 2018 at 11:30:09AM +0000, PG Bug reporting form wrote:
> This means that promotion fails, and for me even after fixing the
> permissions the system was in an odd state that took some work to fix.
> Failover is hard to get right; emitting a warning earlier in this case would
> mean one less thing to go wrong.

I think that you would be interested in this recent commit (fixed as of
the last round of minor releases):
commit: cbc55da556bbcb649e059804009c38100ee98884
committer: Michael Paquier <michael@paquier.xyz>
date: Mon, 9 Jul 2018 10:22:34 +0900
Rework order of end-of-recovery actions to delay timeline history write

And this thread:
https://postgr.es/m/CABUevEz09XY2EevA2dLjPCY-C5UO4Hq=XxmXLmF6ipNFecbShQ@mail.gmail.com

To give you a summary, once recovery finished and before recovery.conf
was renamed, some on-disk actions happened, which could put the cluster
in a weird state, perhaps similarly to what you saw.
--
Michael
On 2018-08-16 20:50:55 +0900, Michael Paquier wrote:
> On Thu, Aug 16, 2018 at 11:30:09AM +0000, PG Bug reporting form wrote:
> > This means that promotion fails, and for me even after fixing the
> > permissions the system was in an odd state that took some work to fix.
> > Failover is hard to get right; emitting a warning earlier in this case would
> > mean one less thing to go wrong.
>
> I think that you would be interested in this recent commit (fixed as of
> the last round of minor releases):
> commit: cbc55da556bbcb649e059804009c38100ee98884
> committer: Michael Paquier <michael@paquier.xyz>
> date: Mon, 9 Jul 2018 10:22:34 +0900
> Rework order of end-of-recovery actions to delay timeline history write
>
> And this thread:
> https://postgr.es/m/CABUevEz09XY2EevA2dLjPCY-C5UO4Hq=XxmXLmF6ipNFecbShQ@mail.gmail.com
>
> To give you a summary, once recovery finished and before recovery.conf
> was renamed, some on-disk actions happened, which could put the cluster
> in a weird state, perhaps similarly to what you saw.

How would this address OP's concern? You'd still not learn meaningfully
earlier that your attempted promotion failed (instead of learning of the
problem before you ever promote).

Greetings,

Andres Freund
On Thu, Aug 16, 2018 at 05:09:43AM -0700, Andres Freund wrote:
> How would this address OP's concern? You'd still not learn meaningfully
> earlier that your attempted promotion failed (instead of learning of the
> problem before you ever promote).

The previous commit makes sure that even if renaming recovery.conf fails, the cluster does not get into a weird state and can be reused later on, so the OP would not hit the later problems reported after the failed promotion.

I am not sure that a warning at an early stage would actually be useful, as I doubt that any user would notice it. There could, however, be an argument for making sure that recovery.conf has correct permissions, and failing hard before entering recovery if that's not the case. I am not sure how much we want to restrict things, though; group read access to data folders, for example, was introduced only recently...
--
Michael