Discussion: pg_standby: Question about truncation of trigger file in fast failover
I was trying to understand (and then perhaps mimic) how pg_standby does a fast failover.
My current understanding is that when a secondary db is in standby mode, it will first replay all the archived WAL available from the primary and then start streaming. It is only at this point that xlog.c checks for the existence of a trigger file to promote the secondary. This has been a cause of some irritation for some of our customers, who do not really care about catching up all the way. I want to achieve the exact semantics of pg_standby's fast failover option.
I manipulated the restore command to return 'failure' when the word "fast" is present in the trigger file (see below), hoping that when I want a secondary database to come out fast, I can just echo the word "fast" into the trigger file, thereby simulating pg_standby's fast failover behavior. However, that did not work. Technically, I did not truncate the trigger file the way pg_standby does.
<New restore_command> = ! fgrep -qsi fast <trigger_file> && <Old restore_command>
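For illustration, the one-liner above could be packaged as a small shell function. This is only a minimal sketch: the function name, the argument order, and the plain `cp` standing in for the old restore command are assumptions made for the example, not part of my actual setup.

```shell
# Hypothetical wrapper: refuse to restore once the trigger file contains "fast".
# Arguments (all assumed for this sketch):
#   $1 trigger file path, $2 archive directory, $3 WAL file name, $4 destination path
restore_unless_fast() {
    trigger_file=$1
    archive_dir=$2
    wal_name=$3
    dest_path=$4

    # grep -F is the POSIX spelling of fgrep; -q is quiet, -s hides errors
    # for a missing trigger file, -i ignores case.
    if grep -Fqsi fast "$trigger_file"; then
        return 1    # report "not found" to the server
    fi

    # Stand-in for the old restore_command.
    cp "$archive_dir/$wal_name" "$dest_path"
}
```

Saved as a script, it might be referenced from restore_command with the server's %f and %p placeholders as the last two arguments; the script and directory paths would be site-specific.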
And that is where I have a question. I noticed that in pg_standby.c when we detect the word "fast" in the trigger file we truncate the file.
https://github.com/postgres/postgres/blob/REL9_1_11/contrib/pg_standby/pg_standby.c#L456
There is also a comment above it about not "upsetting" the server.
https://github.com/postgres/postgres/blob/REL9_1_11/contrib/pg_standby/pg_standby.c#L454
What is the purpose of truncating the file? To do a smart failover once you come out of standby? But, when I look at xlog.c, when we come out of standby due to a failure returned by restore_command, we call CheckForStandbyTrigger() here:
https://github.com/postgres/postgres/blob/REL9_1_11/src/backend/access/transam/xlog.c#L10441
Now, CheckForStandbyTrigger() unlinks the trigger file. I noticed through the debugger that the unlinking happens before xlog.c makes a call to the next restore_command. So, what is the reason for truncating the "fast" word from the trigger file if the file is going to be deleted soon after it is discovered? How will we "upset" the server if we don't?
Assuming this question is answered and I get a better understanding, I have a follow-up question. If truncation is indeed necessary, can I simulate the truncation by manipulating restore_command and achieve the same effect as a fast failover in pg_standby?
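To make the follow-up question concrete, here is a rough sketch of what "simulating the truncation" from restore_command might look like. The function name, arguments, and the `cp` stand-in are hypothetical, and I have not verified that the server reacts to this the same way it does to pg_standby; in particular, CheckForStandbyTrigger() may already have unlinked the file by the time the next call happens.

```shell
# Hypothetical variant: on seeing "fast", empty the trigger file in place
# (similar in spirit to the truncation in pg_standby.c) and report failure once.
# Arguments (assumed): $1 trigger file, $2 archive dir, $3 WAL name, $4 destination
restore_truncating_trigger() {
    trigger_file=$1
    archive_dir=$2
    wal_name=$3
    dest_path=$4

    if grep -Fqsi fast "$trigger_file"; then
        # Truncate to zero length so later invocations no longer see "fast".
        : > "$trigger_file"
        return 1
    fi

    # Stand-in for the old restore_command.
    cp "$archive_dir/$wal_name" "$dest_path"
}
```

With this version only the first call after writing "fast" fails; a later request for the same segment would go through to the archive again.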
Thanks in advance for the help.
Neil
Re: pg_standby: Question about truncation of trigger file in fast failover
On 02/19/2014 11:15 PM, Neil Thombre wrote:
> And that is where I have a question. I noticed that in pg_standby.c when we
> detect the word "fast" in the trigger file we truncate the file.
>
> https://github.com/postgres/postgres/blob/REL9_1_11/contrib/pg_standby/pg_standby.c#L456
>
> There is also a comment above it about not "upsetting" the server.
>
> https://github.com/postgres/postgres/blob/REL9_1_11/contrib/pg_standby/pg_standby.c#L454
>
> What is the purpose of truncating the file? To do a smart failover once you
> come out of standby? But, when I look at xlog.c, when we come out of
> standby due to a failure returned by restore_command, we call
> CheckForStandbyTrigger() here:
>
> https://github.com/postgres/postgres/blob/REL9_1_11/src/backend/access/transam/xlog.c#L10441
>
> Now, CheckForStandbyTrigger() unlinks the trigger file. I noticed through
> the debugger that the unlinking happens before xlog.c makes a call to the
> next restore_command. So, what is the reason for truncating the "fast"
> word from the trigger file if the file is going to be deleted soon after it
> is discovered? How will we "upset" the server if we don't?

At end-of-recovery, the server will fetch again the last WAL file that was replayed. If it can no longer find it, because restore_command now returns an error even though it succeeded for the same file a few seconds earlier, it will throw an error and refuse to start up.

That's the way it used to be until 9.2, anyway. In 9.2, the behavior was changed, so that the server keeps all the files restored from archive, in pg_xlog, so that it can access them again. I haven't tried, but it's possible that the truncation is no longer necessary. Try it, with 9.1 and 9.3, and see what happens.

- Heikki
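The pre-9.2 behavior described above can be illustrated with a toy model. This is not PostgreSQL code, only a sketch under the assumption that recovery replays segments until the restore step fails and then asks once more for the last replayed segment; the function name and its arguments are made up for the example.

```shell
# Toy model of the pre-9.2 end-of-recovery re-fetch (not PostgreSQL code).
#   $1            name of a restore function taking (segment, destination)
#   remaining     segment names, in replay order
simulate_recovery() {
    restore_fn=$1
    shift
    last=""
    for seg in "$@"; do
        if "$restore_fn" "$seg" /dev/null; then
            last=$seg
        else
            break    # a restore failure ends archive recovery
        fi
    done
    [ -n "$last" ] || { echo "nothing replayed"; return 1; }

    # End of recovery: the last replayed segment is requested once more.
    if "$restore_fn" "$last" /dev/null; then
        echo "startup ok after $last"
    else
        echo "error: cannot re-fetch $last"
        return 1
    fi
}
```

A restore function that keeps failing once it has started to fail makes the final re-fetch fail as well. One that fails only once, as a truncated trigger file would allow, lets the re-fetch succeed.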
> On 02/19/2014 11:15 PM, Neil Thombre wrote:
>> And that is where I have a question. I noticed that in pg_standby.c when we
>> detect the word "fast" in the trigger file we truncate the file.
>>
>> https://github.com/postgres/postgres/blob/REL9_1_11/contrib/pg_standby/pg_standby.c#L456
>>
>> There is also a comment above it about not "upsetting" the server.
>>
>> https://github.com/postgres/postgres/blob/REL9_1_11/contrib/pg_standby/pg_standby.c#L454
>>
>> What is the purpose of truncating the file? To do a smart failover once you
>> come out of standby? But, when I look at xlog.c, when we come out of
>> standby due to a failure returned by restore_command, we call
>> CheckForStandbyTrigger() here:
>>
>> https://github.com/postgres/postgres/blob/REL9_1_11/src/backend/access/transam/xlog.c#L10441
>>
>> Now, CheckForStandbyTrigger() unlinks the trigger file. I noticed through
>> the debugger that the unlinking happens before xlog.c makes a call to the
>> next restore_command. So, what is the reason for truncating the "fast"
>> word from the trigger file if the file is going to be deleted soon after it
>> is discovered? How will we "upset" the server if we don't?
>
> At end-of-recovery, the server will fetch again the last WAL file that was replayed. If it can no longer find it, because restore_command now returns an error even though it succeeded for the same file a few seconds earlier, it will throw an error and refuse to start up.

Feb 7 00:37:45 XXXXXXXXXXXXXXXXXXXXXXXX LOG: restored log file "0000000300000C3100000099" from archive
Feb 7 00:37:45 XXXXXXXXXXXXXXXXXXXXXXXX FATAL: WAL ends before consistent recovery point
This error comes from:
https://github.com/postgres/postgres/blob/REL9_1_11/src/backend/access/transam/xlog.c#L6782-L6783
Therefore, I feel that something was amiss in my setup. And I wanted to understand the motive/tribal-knowledge behind the truncation part of pg_standby's fast failover so as not to upset the server. In other words, I have a feeling that by not truncating the trigger file I am inadvertently upsetting the server which is the cause of my FATAL error.

> That's the way it used to be until 9.2, anyway. In 9.2, the behavior was changed, so that the server keeps all the files restored from archive, in pg_xlog, so that it can access them again. I haven't tried, but it's possible that the truncation is no longer necessary. Try it, with 9.1 and 9.3, and see what happens.
>
> - Heikki