Discussion: pgsql: Fast promote mode skips checkpoint at end of recovery.

pgsql: Fast promote mode skips checkpoint at end of recovery.

From
Simon Riggs
Date:
Fast promote mode skips checkpoint at end of recovery.
pg_ctl promote -m fast will skip the checkpoint at end of recovery so that we
can achieve very fast failover when the apply delay is low. Write new WAL record
XLOG_END_OF_RECOVERY to allow us to switch timeline correctly for downstream log
readers. If we skip synchronous end of recovery checkpoint we request a normal
spread checkpoint so that the window of re-recovery is low.

Simon Riggs and Kyotaro Horiguchi, with input from Fujii Masao.
Review by Heikki Linnakangas

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/fd4ced5230162b50a5c9d33b4bf9cfb1231aa62e

Modified Files
--------------
src/backend/access/rmgrdesc/xlogdesc.c |   10 ++
src/backend/access/transam/xlog.c      |  192 +++++++++++++++++++++++++++-----
src/bin/pg_ctl/pg_ctl.c                |   18 +++-
src/include/access/xlog_internal.h     |    6 +
src/include/catalog/pg_control.h       |    1 +
5 files changed, 195 insertions(+), 32 deletions(-)


Re: pgsql: Fast promote mode skips checkpoint at end of recovery.

From
Heikki Linnakangas
Date:
On 29.01.2013 02:07, Simon Riggs wrote:
> +            /*
> +             * If we've been explicitly promoted with fast option,
> +             * end of recovery without a checkpoint if possible.
> +             */
> +            if (fast_promote)
> +            {
> +                checkPointLoc = ControlFile->prevCheckPoint;
> +                record = ReadCheckpointRecord(xlogreader, checkPointLoc, 2, false);
> +                if (record != NULL)
> +                {
> +                    checkpoint_wait = false;
> +                    CreateEndOfRecoveryRecord();
> +                }
> +            }

If we must have this ReadCheckpointRecord check, it needs more than zero
comments. Also, if it ever fails for some reason, I'd like to have a big
fat warning in the log to caution that something went badly wrong.

Why does it insist that we still have not only the latest checkpoint,
but the previous one too? At recovery, we fall back to the previous
checkpoint if we can't access the latest one, but that's just a
desperate measure to try to recover something if things have gone badly
wrong. It's OK to not have the WAL containing the previous checkpoint
still around. In particular, right after restoring from a base backup,
e.g. with pg_basebackup -x, or with good old pg_start/stop_backup, the
WAL included with the backup won't stretch back to previous checkpoint.

- Heikki
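
The control flow under review can be modelled in a few lines of C. This is a minimal sketch, not the committed xlog.c code: the enum, the function names and the warning text are hypothetical, and the boolean parameter merely stands in for the result of the quoted ReadCheckpointRecord() call. It shows the silent fallback the review objects to, with the requested warning added as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical result type for this sketch; PostgreSQL has no such enum. */
typedef enum
{
    END_WITH_CHECKPOINT,    /* wait for a synchronous end-of-recovery checkpoint */
    END_WITH_RECORD         /* write an end-of-recovery record, checkpoint later */
} EndOfRecoveryAction;

/*
 * prev_checkpoint_readable stands in for the quoted test
 * "ReadCheckpointRecord(...) != NULL" on ControlFile->prevCheckPoint.
 */
EndOfRecoveryAction
choose_end_of_recovery(bool fast_promote, bool prev_checkpoint_readable)
{
    if (fast_promote)
    {
        if (prev_checkpoint_readable)
            return END_WITH_RECORD;

        /* Assumption: the warning requested in review; the quoted hunk logs nothing. */
        fprintf(stderr, "WARNING: previous checkpoint record is not readable; "
                        "falling back to a synchronous checkpoint\n");
    }
    return END_WITH_CHECKPOINT;
}

/* Exercises the three cases discussed in the thread. */
void
check_examples(void)
{
    /* Fast promote requested and the previous checkpoint is readable. */
    assert(choose_end_of_recovery(true, true) == END_WITH_RECORD);
    /* Fast promote requested right after a base backup restore. */
    assert(choose_end_of_recovery(true, false) == END_WITH_CHECKPOINT);
    /* Ordinary promotion always waits for the checkpoint. */
    assert(choose_end_of_recovery(false, true) == END_WITH_CHECKPOINT);
}
```

The second case is the one in dispute: a user who asked for fast promotion after restoring a backup gets the slow path, and in the quoted hunk nothing in the log says so.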


Re: pgsql: Fast promote mode skips checkpoint at end of recovery.

From
Simon Riggs
Date:
On 29 January 2013 11:31, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 29.01.2013 02:07, Simon Riggs wrote:
>>
>> +            /*
>> +             * If we've been explicitly promoted with fast option,
>> +             * end of recovery without a checkpoint if possible.
>> +             */
>> +            if (fast_promote)
>> +            {
>> +                checkPointLoc = ControlFile->prevCheckPoint;
>> +                record = ReadCheckpointRecord(xlogreader, checkPointLoc, 2, false);
>> +                if (record != NULL)
>> +                {
>> +                    checkpoint_wait = false;
>> +                    CreateEndOfRecoveryRecord();
>> +                }
>> +            }
>
>
> If we must have this ReadCheckpointRecord check, it needs more than zero
> comments. Also, if it ever fails for some reason, I'd like to have a big fat
> warning in the log to caution that something went badly wrong.
>
> Why does it insist that we still have not only the latest checkpoint, but
> the previous one too? At recovery, we fall back to the previous checkpoint
> if we can't access the latest one, but that's just a desperate measure to
> try to recover something if things have gone badly wrong. It's OK to not
> have the WAL containing the previous checkpoint still around. In particular,
> right after restoring from a base backup, e.g. with pg_basebackup -x, or with
> good old pg_start/stop_backup, the WAL included with the backup won't
> stretch back to previous checkpoint.

As you say, there are cases where the lack of a secondary checkpoint
could be considered normal, hence no message to confuse the user.

We don't actually need a fast promotion when restoring from backup, so
we don't do it.

I want this to work for the cases we need it, and not break when we
don't need it.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pgsql: Fast promote mode skips checkpoint at end of recovery.

From
Heikki Linnakangas
Date:
On 29.01.2013 13:46, Simon Riggs wrote:
> On 29 January 2013 11:31, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>> On 29.01.2013 02:07, Simon Riggs wrote:
>>>
>>> +            /*
>>> +             * If we've been explicitly promoted with fast option,
>>> +             * end of recovery without a checkpoint if possible.
>>> +             */
>>> +            if (fast_promote)
>>> +            {
>>> +                checkPointLoc = ControlFile->prevCheckPoint;
>>> +                record = ReadCheckpointRecord(xlogreader, checkPointLoc, 2, false);
>>> +                if (record != NULL)
>>> +                {
>>> +                    checkpoint_wait = false;
>>> +                    CreateEndOfRecoveryRecord();
>>> +                }
>>> +            }
>>
>>
>> If we must have this ReadCheckpointRecord check, it needs more than zero
>> comments. Also, if it ever fails for some reason, I'd like to have a big fat
>> warning in the log to caution that something went badly wrong.
>>
>> Why does it insist that we still have not only the latest checkpoint, but
>> the previous one too? At recovery, we fall back to the previous checkpoint
>> if we can't access the latest one, but that's just a desperate measure to
>> try to recover something if things have gone badly wrong. It's OK to not
>> have the WAL containing the previous checkpoint still around. In particular,
>> right after restoring from a base backup, e.g. with pg_basebackup -x, or with
>> good old pg_start/stop_backup, the WAL included with the backup won't
>> stretch back to previous checkpoint.
>
> As you say, there are cases where the lack of a secondary checkpoint
> could be considered normal, hence no message to confuse the user.
>
> We don't actually need a fast promotion when restoring from backup, so
> we don't do it.

You might want to bring the database up ASAP after restoring. If the
user requests that, the system shouldn't second-guess that.

PS. I think the implicit judgment you made that "pg_ctl promote" is now
the preferred method of promoting the server, over the trigger file
method, needs more discussion. I'm not sure I agree with that, but if we
do that, the docs should emphasize pg_ctl promote more than the
trigger file. Also, I don't like conflating the shutdown mode argument
with the promotion mode in pg_ctl. Perhaps it would be best to
revert this and take some more time to discuss the right behavior and
user interface for this (if it needs one).

PPS. doc changes are missing...

- Heikki


Re: pgsql: Fast promote mode skips checkpoint at end of recovery.

From
Simon Riggs
Date:
On 29 January 2013 12:19, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

> You might want to bring the database up ASAP after restoring. If the user
> requests that, the system shouldn't second-guess that.

In later releases we can relax further, if that is justified. I call
this acting conservatively in the interests of robustness.

> PS. I think the implicit judgment you made that "pg_ctl promote" is now the
> preferred method of promoting the server, over the trigger file method,
> needs more discussion. I'm not sure I agree with that, but if we do that,
> the docs should emphasize the pg_ctl promote more than the trigger file.

So why did you commit a second method?

> Also, I don't like conflating the shutdown mode argument with promotion mode
> either, in pg_ctl. Perhaps it would be best to revert this and take some
> more time to discuss the right behavior and user interface for this (if it
> needs one).

I don't think so. This was discussed on list. You are asking for
additional features, which I've explained why they aren't added by me.
We have time left to add them, but these minor points aren't more
important than other patches in the queue.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pgsql: Fast promote mode skips checkpoint at end of recovery.

From
Fujii Masao
Date:
On Tue, Jan 29, 2013 at 9:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Fast promote mode skips checkpoint at end of recovery.
> pg_ctl promote -m fast will skip the checkpoint at end of recovery so that we
> can achieve very fast failover when the apply delay is low. Write new WAL record
> XLOG_END_OF_RECOVERY to allow us to switch timeline correctly for downstream log
> readers. If we skip synchronous end of recovery checkpoint we request a normal
> spread checkpoint so that the window of re-recovery is low.

When I tested this feature, I encountered the following FATAL message.

    FATAL:  highest timeline 1 of the primary is behind recovery timeline 2

Is this intentional behavior or a bug? What I did in my test is:

1. Set up one master (A), one standby (B), one cascade standby (C)
2. After running pgbench -i -s 10, I promoted the standby (B) with fast mode
3. Then, I shut down the server (B) with immediate mode after it had been
    brought up as the master but before the end-of-recovery checkpoint had
    completed.
4. Restart the server (B).
5. After the standby (C) established the replication connection with (B),
    I got the above FATAL message repeatedly.

Promoting (B) increments the timeline ID to 2 and generates the timeline
history file. But after restarting (B), its timeline ID is unexpectedly
reset to 1. This seems to be the cause of the problem.

To address this problem, should we switch to the new timeline ID whenever
we read an XLOG_END_OF_RECOVERY record, even during crash recovery?

Regards,

--
Fujii Masao
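
The FATAL message in this report appears to reduce to a comparison of two timeline IDs: the highest timeline the upstream server (B) reports versus the timeline the standby (C) is recovering on. A minimal sketch of that comparison follows; the function name is hypothetical and this is not the walreceiver code, only an illustration of why (B) falling back to timeline 1 makes (C), already on timeline 2, refuse to continue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TimeLineID;    /* timeline IDs are 32-bit unsigned integers */

/*
 * Hypothetical helper: returns true when a standby recovering on
 * recovery_tli can keep streaming from a primary whose highest known
 * timeline is primary_highest_tli. A false result corresponds to
 * "highest timeline N of the primary is behind recovery timeline M".
 */
bool
primary_timeline_is_usable(TimeLineID primary_highest_tli,
                           TimeLineID recovery_tli)
{
    /* Timeline IDs start at 1; zero would indicate an uninitialized value. */
    assert(primary_highest_tli >= 1 && recovery_tli >= 1);

    return primary_highest_tli >= recovery_tli;
}
```

With the values from the report, `primary_timeline_is_usable(1, 2)` is false, which matches the message (C) logged after (B) was restarted.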


Re: pgsql: Fast promote mode skips checkpoint at end of recovery.

From
Fujii Masao
Date:
On Wed, Jan 30, 2013 at 1:27 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Jan 29, 2013 at 9:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Fast promote mode skips checkpoint at end of recovery.
>> pg_ctl promote -m fast will skip the checkpoint at end of recovery so that we
>> can achieve very fast failover when the apply delay is low. Write new WAL record
>> XLOG_END_OF_RECOVERY to allow us to switch timeline correctly for downstream log
>> readers. If we skip synchronous end of recovery checkpoint we request a normal
>> spread checkpoint so that the window of re-recovery is low.
>
> When I tested this feature, I encountered the following FATAL message.
>
>     FATAL:  highest timeline 1 of the primary is behind recovery timeline 2
>
> Is this an intentional behavior or bug? What I did in my test is:
>
> 1. Set up one master (A), one standby (B), one cascade standby (C)
> 2. After running pgbench -i -s 10, I promoted the standby (B) with fast mode
> 3. Then, I shut down the server (B) with immediate mode after it has been
>     brought up to the master before end-of-recovery checkpoint has not been
>     completed.
> 4. Restart the server (B).
> 5. After the standby (C) established the replication connection with (B),
>     I got the above FATAL messages repeatedly.
>
> Promoting (B) increments the timeline ID to 2 and generates the timeline
> history file. But after restarting (B), its timeline ID is reset to 1
> unexpectedly.
> This seems to be the cause of the problem.
>
> To address this problem, we should switch to new timeline ID whenever
> we read the XLOG_END_OF_RECOVERY even if it's a crash recovery?

On second thought, we don't need such a complicated test case to reproduce
a problem that derives from the same cause as the reported one. The
procedure to reproduce it is:

1. Set up one master (A) and one standby (B)
2. Promote (B) with fast mode after running pgbench -i -s 10
3. Execute a write transaction on the new master (B)
4. Shut down (B) with immediate mode before the end-of-recovery checkpoint
   has completed
5. Restart (B)

Then you can confirm that the write transaction executed in step 3 has
been lost.

Regards,

--
Fujii Masao
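
The hypothesis in the two reports above (crash recovery stays on timeline 1 and therefore never reaches the WAL written after promotion) can be illustrated with a toy replay loop. This is a sketch under loud assumptions: the record layout, the names and the rule "stop at the first record from another timeline" are invented for illustration, and real recovery locates WAL by segment files and timeline history rather than by a per-record tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TimeLineID;

/* Hypothetical record kinds for this toy model. */
typedef enum { REC_DATA, REC_END_OF_RECOVERY } ToyRecordKind;

typedef struct
{
    ToyRecordKind kind;
    TimeLineID    tli;  /* timeline written on; for END_OF_RECOVERY, the new timeline */
} ToyRecord;

/*
 * Replays n toy records starting on start_tli and returns how many data
 * records were applied. Replay stops at the first data record that belongs
 * to a timeline other than the one currently being followed.
 * switch_on_end_of_recovery models the fix proposed in the thread.
 */
size_t
toy_replay(const ToyRecord *wal, size_t n, TimeLineID start_tli,
           bool switch_on_end_of_recovery, TimeLineID *final_tli)
{
    TimeLineID cur = start_tli;
    size_t     applied = 0;

    assert(final_tli != NULL);
    assert(wal != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (wal[i].kind == REC_END_OF_RECOVERY)
        {
            if (switch_on_end_of_recovery)
                cur = wal[i].tli;   /* follow the promotion onto the new timeline */
            continue;
        }
        if (wal[i].tli != cur)
            break;                  /* record is on a timeline we are not following */
        applied++;
    }

    *final_tli = cur;
    return applied;
}
```

Feeding it one data record on timeline 1, an end-of-recovery record for timeline 2, and one data record on timeline 2 (the write from step 3) shows the difference: without the switch the last record is never applied and replay ends on timeline 1; with the switch both data records are applied and replay ends on timeline 2.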


Re: pgsql: Fast promote mode skips checkpoint at end of recovery.

From
Simon Riggs
Date:
On 29 January 2013 16:38, Fujii Masao <masao.fujii@gmail.com> wrote:

> On second thought, we don't need such a complicated test case to produce
> the problem which derives from the same cause of reported problem. The
> procedure to produce the problem is:
>
> 1. Set up one master (A) and one standby (B)
> 2. Promote (B) with fast mode after running pgbench -i -s 10
> 3. Execute the write transaction on new master (B)
> 4. Shut down (B) with immediate mode before end-of-recovery checkpoint
> has been completed
> 5. Restart (B)
>
> Then you can confirm that the write transaction that you executed in #3 has
> been lost.

Thanks for the test case, that was quick!

It looks like my caution was justified about this.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pgsql: Fast promote mode skips checkpoint at end of recovery.

From
Simon Riggs
Date:
On 29 January 2013 16:27, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Jan 29, 2013 at 9:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Fast promote mode skips checkpoint at end of recovery.
>> pg_ctl promote -m fast will skip the checkpoint at end of recovery so that we
>> can achieve very fast failover when the apply delay is low. Write new WAL record
>> XLOG_END_OF_RECOVERY to allow us to switch timeline correctly for downstream log
>> readers. If we skip synchronous end of recovery checkpoint we request a normal
>> spread checkpoint so that the window of re-recovery is low.
>
> When I tested this feature, I encountered the following FATAL message.
>
>     FATAL:  highest timeline 1 of the primary is behind recovery timeline 2
>
> Is this an intentional behavior or bug?

Tough one that.

> What I did in my test is:
>
> 1. Set up one master (A), one standby (B), one cascade standby (C)
> 2. After running pgbench -i -s 10, I promoted the standby (B) with fast mode
> 3. Then, I shut down the server (B) with immediate mode after it has been
>     brought up to the master before end-of-recovery checkpoint has not been
>     completed.
> 4. Restart the server (B).
> 5. After the standby (C) established the replication connection with (B),
>     I got the above FATAL messages repeatedly.

Where do you get the errors, on which server? The above doesn't contain a
promote command, so how does this make it fail?

Please show me the test case in more detail.

> Promoting (B) increments the timeline ID to 2 and generates the timeline
> history file. But after restarting (B), its timeline ID is reset to 1
> unexpectedly.
> This seems to be the cause of the problem.
>
> To address this problem, we should switch to new timeline ID whenever
> we read the XLOG_END_OF_RECOVERY even if it's a crash recovery?

We do. Do you see a problem with that code? There is no conditional recovery.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pgsql: Fast promote mode skips checkpoint at end of recovery.

From
Devrim Gündüz
Date:
Simon Riggs <simon@2ndQuadrant.com> wrote:

> On 29 January 2013 16:27, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Tue, Jan 29, 2013 at 9:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Fast promote mode skips checkpoint at end of recovery.
>>> pg_ctl promote -m fast will skip the checkpoint at end of recovery so that we
>>> can achieve very fast failover when the apply delay is low. Write new WAL record
>>> XLOG_END_OF_RECOVERY to allow us to switch timeline correctly for downstream log
>>> readers. If we skip synchronous end of recovery checkpoint we request a normal
>>> spread checkpoint so that the window of re-recovery is low.
>>
>> When I tested this feature, I encountered the following FATAL message.
>>
>>     FATAL:  highest timeline 1 of the primary is behind recovery timeline 2
>>
>> Is this an intentional behavior or bug?
>
> Tough one that.
>
>> What I did in my test is:
>>
>> 1. Set up one master (A), one standby (B), one cascade standby (C)
>> 2. After running pgbench -i -s 10, I promoted the standby (B) with fast mode
>> 3. Then, I shut down the server (B) with immediate mode after it has been
>>     brought up to the master before end-of-recovery checkpoint has not been
>>     completed.
>> 4. Restart the server (B).
>> 5. After the standby (C) established the replication connection with (B),
>>     I got the above FATAL messages repeatedly.
>
> Where do you get the errors, which server? The above doesn't contain a
> promote command, so how does this make it fail.
>
> Please show me the test case in more detail.
>
>> Promoting (B) increments the timeline ID to 2 and generates the timeline
>> history file. But after restarting (B), its timeline ID is reset to 1
>> unexpectedly.
>> This seems to be the cause of the problem.
>>
>> To address this problem, we should switch to new timeline ID whenever
>> we read the XLOG_END_OF_RECOVERY even if it's a crash recovery?
>
> We do. Do you see a problem with that code? There is no conditional recovery.

Hi,

Could you please move this to -hackers, for archives' sake?

Regards, Devrim
--
Devrim Gündüz


Re: pgsql: Fast promote mode skips checkpoint at end of recovery.

From
Simon Riggs
Date:
On 29 January 2013 16:51, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 29 January 2013 16:38, Fujii Masao <masao.fujii@gmail.com> wrote:
>
>> On second thought, we don't need such a complicated test case to produce
>> the problem which derives from the same cause of reported problem. The
>> procedure to produce the problem is:
>>
>> 1. Set up one master (A) and one standby (B)
>> 2. Promote (B) with fast mode after running pgbench -i -s 10
>> 3. Execute the write transaction on new master (B)
>> 4. Shut down (B) with immediate mode before end-of-recovery checkpoint
>> has been completed
>> 5. Restart (B)
>>
>> Then you can confirm that the write transaction that you executed in #3 has
>> been lost.
>
> Thanks for the test case, that was quick!

OK, I can confirm this bug.

This needs more work as is, so I'll revert and re-post.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pgsql: Fast promote mode skips checkpoint at end of recovery.

From
Simon Riggs
Date:
On 30 January 2013 17:26, Simon Riggs <simon@2ndquadrant.com> wrote:

>> Thanks for the test case, that was quick!
>
> OK, I can confirm this bug.
>
> This needs more work as is, so I'll revert and re-post.

The fix was pretty simple in the end, so I've not reverted, just
applied the fix.

If anyone really wants me to revert, please start a new -hackers thread to
discuss, or comment on the changes.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services