Обсуждение: Clean switchover

Поиск
Список
Период
Сортировка

Clean switchover

От
Fujii Masao
Дата:
Hi,

In streaming replication, when we shutdown the master, walsender tries to
send all the outstanding WAL records including the shutdown checkpoint
record to the standby, and then to exit. This basically means that all the
WAL records are fully synced between two servers after the clean shutdown
of the master. So, after promoting the standby to new master, we can
restart the stopped master as new standby without the need for a fresh
backup from new master.

But there is one problem: though walsender tries to send all the outstanding
WAL records, it doesn't wait for them to be replicated to the standby. IOW,
walsender closes the replication connection as soon as it sends WAL records.
Then, before receiving all the WAL records, walreceiver can detect
the closure of connection and exit. We cannot guarantee that there is no
missing WAL in the standby after clean shutdown of the master. In this case,
backup from new master is required when restarting the stopped master as
new standby. I have experienced this case several times, especially when
enabling WAL archiving.

The attached patch fixes this problem. It just changes walsender so that it
waits for all the outstanding WAL records to be replicated to the standby
before closing the replication connection.

You may be concerned the case where the standby gets stuck and the
walsender keeps waiting for the reply from that standby. In this case,
wal_sender_timeout detects such inactive standby and then walsender
ends. So even in that case, the shutdown can end.

Thought?

Regards,

--
Fujii Masao

Вложения

Re: Clean switchover

От
Stephen Frost
Дата:
* Fujii Masao (masao.fujii@gmail.com) wrote:
> The attached patch fixes this problem. It just changes walsender so that it
> waits for all the outstanding WAL records to be replicated to the standby
> before closing the replication connection.

Seems like a good idea to me..  Rather surprised that we're not doing
this already, to be honest.
Thanks,
    Stephen

Re: Clean switchover

От
Mark Kirkwood
Дата:
On 12/06/13 13:15, Stephen Frost wrote:
> * Fujii Masao (masao.fujii@gmail.com) wrote:
>> The attached patch fixes this problem. It just changes walsender so that it
>> waits for all the outstanding WAL records to be replicated to the standby
>> before closing the replication connection.
>
> Seems like a good idea to me..  Rather surprised that we're not doing
> this already, to be honest.
>

Yeah +1 from here too. This would make clean switchovers for (typically) 
testing scenarios a lot less complex and resource intensive (rebuilding 
of the old master as a slave when you know it is ok is despairing on a 
huge db).

On the related note (but not actually to do with this patch), 
clarifying/expanding the docs about the various methods for standby 
promotion:

1/ trigger file creation
2/ pg_ctl promote
3/ renaming/removing recovery.conf

and the differences between them would be great. For instance I only 
recently realized that method 3) means the promoted standby does not 
start a new timeline (incidentally - could this be an option to pg_ctl 
promote) which is very useful for (again) controlled/clean switchovers.

regards

Mark





Re: Clean switchover

От
Amit Kapila
Дата:
On Wednesday, June 12, 2013 4:23 AM Fujii Masao wrote:
> Hi,
> 
> In streaming replication, when we shutdown the master, walsender tries
> to send all the outstanding WAL records including the shutdown
> checkpoint record to the standby, and then to exit. This basically
> means that all the WAL records are fully synced between two servers
> after the clean shutdown of the master. So, after promoting the standby
> to new master, we can restart the stopped master as new standby without
> the need for a fresh backup from new master.
> 
> But there is one problem: though walsender tries to send all the
> outstanding WAL records, it doesn't wait for them to be replicated to
> the standby. IOW, walsender closes the replication connection as soon
> as it sends WAL records.
> Then, before receiving all the WAL records, walreceiver can detect the
> closure of connection and exit. We cannot guarantee that there is no
> missing WAL in the standby after clean shutdown of the master. In this
> case, backup from new master is required when restarting the stopped
> master as new standby. I have experienced this case several times,
> especially when enabling WAL archiving.
> 
> The attached patch fixes this problem. It just changes walsender so
> that it waits for all the outstanding WAL records to be replicated to
> the standby before closing the replication connection.
>
> You may be concerned the case where the standby gets stuck and the
> walsender keeps waiting for the reply from that standby. In this case,
> wal_sender_timeout detects such inactive standby and then walsender
> ends. So even in that case, the shutdown can end.

Do you think it can impact time to complete shutdown?
After completing shutdown, user will promote standby to master, so if there
is delay in shutdown, it can cause delay in switchover.


With Regards,
Amit Kapila.




Re: Clean switchover

От
Magnus Hagander
Дата:
On Wed, Jun 12, 2013 at 6:41 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
> On Wednesday, June 12, 2013 4:23 AM Fujii Masao wrote:
>> Hi,
>>
>> In streaming replication, when we shutdown the master, walsender tries
>> to send all the outstanding WAL records including the shutdown
>> checkpoint record to the standby, and then to exit. This basically
>> means that all the WAL records are fully synced between two servers
>> after the clean shutdown of the master. So, after promoting the standby
>> to new master, we can restart the stopped master as new standby without
>> the need for a fresh backup from new master.
>>
>> But there is one problem: though walsender tries to send all the
>> outstanding WAL records, it doesn't wait for them to be replicated to
>> the standby. IOW, walsender closes the replication connection as soon
>> as it sends WAL records.
>> Then, before receiving all the WAL records, walreceiver can detect the
>> closure of connection and exit. We cannot guarantee that there is no
>> missing WAL in the standby after clean shutdown of the master. In this
>> case, backup from new master is required when restarting the stopped
>> master as new standby. I have experienced this case several times,
>> especially when enabling WAL archiving.
>>
>> The attached patch fixes this problem. It just changes walsender so
>> that it waits for all the outstanding WAL records to be replicated to
>> the standby before closing the replication connection.
>>
>> You may be concerned the case where the standby gets stuck and the
>> walsender keeps waiting for the reply from that standby. In this case,
>> wal_sender_timeout detects such inactive standby and then walsender
>> ends. So even in that case, the shutdown can end.
>
> Do you think it can impact time to complete shutdown?
> After completing shutdown, user will promote standby to master, so if there
> is delay in shutdown, it can cause delay in switchover.

I'd expect a controlled switchover to happen without dataloss. Yes,
this could make it take a bit longer time, but it guarantees you don't
loose data. ISTM that if you don't care about the potential dataloss,
you can just use a faster shutdown method (e.g. immediate)

--Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/



Re: Clean switchover

От
Andres Freund
Дата:
Hi,

On 2013-06-12 07:53:29 +0900, Fujii Masao wrote:
> The attached patch fixes this problem. It just changes walsender so that it
> waits for all the outstanding WAL records to be replicated to the standby
> before closing the replication connection.

Imo this is a fix that needs to get backpatched... The code tried to do
this but failed, I don't think it really gives grounds for valid *new*
concerns.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Clean switchover

От
Magnus Hagander
Дата:
On Wed, Jun 12, 2013 at 1:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hi,
>
> On 2013-06-12 07:53:29 +0900, Fujii Masao wrote:
>> The attached patch fixes this problem. It just changes walsender so that it
>> waits for all the outstanding WAL records to be replicated to the standby
>> before closing the replication connection.
>
> Imo this is a fix that needs to get backpatched... The code tried to do
> this but failed, I don't think it really gives grounds for valid *new*
> concerns.

+1 (without having looked at the code itself, it's definitely a
behaviour that needs to be fixed)

--Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/



Re: Clean switchover

От
Stephen Frost
Дата:
* Magnus Hagander (magnus@hagander.net) wrote:
> On Wed, Jun 12, 2013 at 1:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2013-06-12 07:53:29 +0900, Fujii Masao wrote:
> >> The attached patch fixes this problem. It just changes walsender so that it
> >> waits for all the outstanding WAL records to be replicated to the standby
> >> before closing the replication connection.
> >
> > Imo this is a fix that needs to get backpatched... The code tried to do
> > this but failed, I don't think it really gives grounds for valid *new*
> > concerns.
>
> +1 (without having looked at the code itself, it's definitely a
> behaviour that needs to be fixed)

Yea, I was also thinking it would be reasonable to backpatch this; it
really looks like a bug that we're allowing this to happen today.

So, +1 on a backpatch for me.  I've looked at the patch (it's a
one-liner, plus some additional comments) but havn't looked through the
overall code surrounding it.
Thanks,
    Stephen

Re: Clean switchover

От
Andres Freund
Дата:
On 2013-06-12 08:48:39 -0400, Stephen Frost wrote:
> * Magnus Hagander (magnus@hagander.net) wrote:
> > On Wed, Jun 12, 2013 at 1:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > > On 2013-06-12 07:53:29 +0900, Fujii Masao wrote:
> > >> The attached patch fixes this problem. It just changes walsender so that it
> > >> waits for all the outstanding WAL records to be replicated to the standby
> > >> before closing the replication connection.
> > >
> > > Imo this is a fix that needs to get backpatched... The code tried to do
> > > this but failed, I don't think it really gives grounds for valid *new*
> > > concerns.
> > 
> > +1 (without having looked at the code itself, it's definitely a
> > behaviour that needs to be fixed)
> 
> Yea, I was also thinking it would be reasonable to backpatch this; it
> really looks like a bug that we're allowing this to happen today.
> 
> So, +1 on a backpatch for me.  I've looked at the patch (it's a
> one-liner, plus some additional comments) but havn't looked through the
> overall code surrounding it.

I've read most of the surrounding code and I think the patch is as
sensible as it can be without reworking the whole walsender main loop
which seems like a job for another day.

I'd personally write     if (caughtup && !pq_is_send_pending() &&         sentPtr == MyWalSnd->flush)
as     if (caughtup && sentPtr == MyWalSnd->flush &&         !pq_is_send_pending())

Since pq_is_send_pending() basically can only be false if the flush
comparison is true. There's the tiny chance that we were sending a
message out just before which is why we should include the
!pq_is_send_pending() condition at all in that if().

But that's fairly, fairly minor.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Clean switchover

От
Fujii Masao
Дата:
On Wed, Jun 12, 2013 at 12:55 PM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz> wrote:
> On 12/06/13 13:15, Stephen Frost wrote:
>>
>> * Fujii Masao (masao.fujii@gmail.com) wrote:
>>>
>>> The attached patch fixes this problem. It just changes walsender so that
>>> it
>>> waits for all the outstanding WAL records to be replicated to the standby
>>> before closing the replication connection.
>>
>>
>> Seems like a good idea to me..  Rather surprised that we're not doing
>> this already, to be honest.
>>
>
> Yeah +1 from here too. This would make clean switchovers for (typically)
> testing scenarios a lot less complex and resource intensive (rebuilding of
> the old master as a slave when you know it is ok is despairing on a huge
> db).
>
> On the related note (but not actually to do with this patch),
> clarifying/expanding the docs about the various methods for standby
> promotion:
>
> 1/ trigger file creation
> 2/ pg_ctl promote
> 3/ renaming/removing recovery.conf
>
> and the differences between them would be great. For instance I only
> recently realized that method 3) means the promoted standby does not start a
> new timeline (incidentally - could this be an option to pg_ctl promote)
> which is very useful for (again) controlled/clean switchovers.

In 9.3, you no longer need to worry about the increment of timeline
after the promotion because the standby can automatically follow
the timeline switch.

Regards,

-- 
Fujii Masao



Re: Clean switchover

От
Fujii Masao
Дата:
On Wed, Jun 12, 2013 at 9:55 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-06-12 08:48:39 -0400, Stephen Frost wrote:
>> * Magnus Hagander (magnus@hagander.net) wrote:
>> > On Wed, Jun 12, 2013 at 1:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > > On 2013-06-12 07:53:29 +0900, Fujii Masao wrote:
>> > >> The attached patch fixes this problem. It just changes walsender so that it
>> > >> waits for all the outstanding WAL records to be replicated to the standby
>> > >> before closing the replication connection.
>> > >
>> > > Imo this is a fix that needs to get backpatched... The code tried to do
>> > > this but failed, I don't think it really gives grounds for valid *new*
>> > > concerns.
>> >
>> > +1 (without having looked at the code itself, it's definitely a
>> > behaviour that needs to be fixed)
>>
>> Yea, I was also thinking it would be reasonable to backpatch this; it
>> really looks like a bug that we're allowing this to happen today.
>>
>> So, +1 on a backpatch for me.  I've looked at the patch (it's a
>> one-liner, plus some additional comments) but havn't looked through the
>> overall code surrounding it.
>
> I've read most of the surrounding code and I think the patch is as
> sensible as it can be without reworking the whole walsender main loop
> which seems like a job for another day.
>
> I'd personally write
>       if (caughtup && !pq_is_send_pending() &&
>           sentPtr == MyWalSnd->flush)
> as
>       if (caughtup && sentPtr == MyWalSnd->flush &&
>           !pq_is_send_pending())
>
> Since pq_is_send_pending() basically can only be false if the flush
> comparison is true. There's the tiny chance that we were sending a
> message out just before which is why we should include the
> !pq_is_send_pending() condition at all in that if().

Yep, I updated the patch that way. Thanks for the comment!

Regards,

--
Fujii Masao

Вложения

Re: Clean switchover

От
Fujii Masao
Дата:
On Wed, Jun 12, 2013 at 9:48 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Magnus Hagander (magnus@hagander.net) wrote:
>> On Wed, Jun 12, 2013 at 1:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > On 2013-06-12 07:53:29 +0900, Fujii Masao wrote:
>> >> The attached patch fixes this problem. It just changes walsender so that it
>> >> waits for all the outstanding WAL records to be replicated to the standby
>> >> before closing the replication connection.
>> >
>> > Imo this is a fix that needs to get backpatched... The code tried to do
>> > this but failed, I don't think it really gives grounds for valid *new*
>> > concerns.
>>
>> +1 (without having looked at the code itself, it's definitely a
>> behaviour that needs to be fixed)
>
> Yea, I was also thinking it would be reasonable to backpatch this; it
> really looks like a bug that we're allowing this to happen today.
>
> So, +1 on a backpatch for me.

+1. I think that we can backpatch to 9.1, 9.2 and 9.3.

In 9.0, the standby doesn't send back any message to the master and
there is no way to know whether replication has been done up to
the specified location, so I don't think that we can backpatch.

One note is, even if we backpatch, controlled switchover may require
the backup in order to follow the timeline switch, in 9.1 and 9.2.
If we want to avoid the backup in that case, we need to set up
the shared archive area between the master and the standby and
set recovery_target_timeline to 'latest'.

Regards,

-- 
Fujii Masao



Re: Clean switchover

От
Mark Kirkwood
Дата:
On 14/06/13 07:38, Fujii Masao wrote:
> On Wed, Jun 12, 2013 at 12:55 PM, Mark Kirkwood
> <mark.kirkwood@catalyst.net.nz> wrote:
>> On 12/06/13 13:15, Stephen Frost wrote:
>>>
>>> * Fujii Masao (masao.fujii@gmail.com) wrote:
>>>>
>>>> The attached patch fixes this problem. It just changes walsender so that
>>>> it
>>>> waits for all the outstanding WAL records to be replicated to the standby
>>>> before closing the replication connection.
>>>
>>>
>>> Seems like a good idea to me..  Rather surprised that we're not doing
>>> this already, to be honest.
>>>
>>
>> Yeah +1 from here too. This would make clean switchovers for (typically)
>> testing scenarios a lot less complex and resource intensive (rebuilding of
>> the old master as a slave when you know it is ok is despairing on a huge
>> db).
>>
>> On the related note (but not actually to do with this patch),
>> clarifying/expanding the docs about the various methods for standby
>> promotion:
>>
>> 1/ trigger file creation
>> 2/ pg_ctl promote
>> 3/ renaming/removing recovery.conf
>>
>> and the differences between them would be great. For instance I only
>> recently realized that method 3) means the promoted standby does not start a
>> new timeline (incidentally - could this be an option to pg_ctl promote)
>> which is very useful for (again) controlled/clean switchovers.
>
> In 9.3, you no longer need to worry about the increment of timeline
> after the promotion because the standby can automatically follow
> the timeline switch.
>
> Regards,
>

Yes - and that will be awesome. Nice work!

However for those systems still on 9.1/9.2for the time being, some 
clarification of the details/differences uisg promotion via the 1) - 3) 
would be useful I think.

Note I'm not demanding that you do it - I just happened to be thinking 
about this when I saw your patch, so though I'd mention it (I've seen 
some brief discussion about this before, but nothing concrete came out 
of that in terms of documentation additions).

If folks are keen, I'm willing to attempt to find a suitable place in 
the docs to add a discussion about 1) - 3) above. However I was kinda 
hoping that someone who knows all the details would... fro instance I've 
skimmed the code and note that the system "knows" when it is promoted 
via trigger file vs pg_ctl... but as to what if any differences that 
makes to the resulting actions (other than removing the trigger file) - 
I am not clear on.

Cheers

Mark



Re: Clean switchover

От
Andres Freund
Дата:
On 2013-06-14 04:56:15 +0900, Fujii Masao wrote:
> On Wed, Jun 12, 2013 at 9:48 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Magnus Hagander (magnus@hagander.net) wrote:
> >> On Wed, Jun 12, 2013 at 1:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> > On 2013-06-12 07:53:29 +0900, Fujii Masao wrote:
> >> >> The attached patch fixes this problem. It just changes walsender so that it
> >> >> waits for all the outstanding WAL records to be replicated to the standby
> >> >> before closing the replication connection.
> >> >
> >> > Imo this is a fix that needs to get backpatched... The code tried to do
> >> > this but failed, I don't think it really gives grounds for valid *new*
> >> > concerns.
> >>
> >> +1 (without having looked at the code itself, it's definitely a
> >> behaviour that needs to be fixed)
> >
> > Yea, I was also thinking it would be reasonable to backpatch this; it
> > really looks like a bug that we're allowing this to happen today.
> >
> > So, +1 on a backpatch for me.
> 
> +1. I think that we can backpatch to 9.1, 9.2 and 9.3.

I marked the patch as ready for committer.

> In 9.0, the standby doesn't send back any message to the master and
> there is no way to know whether replication has been done up to
> the specified location, so I don't think that we can backpatch.

Agreed. 9.0 doesn't have enough infrastructure for this.

> One note is, even if we backpatch, controlled switchover may require
> the backup in order to follow the timeline switch, in 9.1 and 9.2.
> If we want to avoid the backup in that case, we need to set up
> the shared archive area between the master and the standby and
> set recovery_target_timeline to 'latest'.

Fixing this seems outside the scope of this patch... - and rather
unlikely to be backpatchable.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Clean switchover

От
Fujii Masao
Дата:
On Mon, Jun 24, 2013 at 3:41 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-06-14 04:56:15 +0900, Fujii Masao wrote:
>> On Wed, Jun 12, 2013 at 9:48 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> > * Magnus Hagander (magnus@hagander.net) wrote:
>> >> On Wed, Jun 12, 2013 at 1:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> >> > On 2013-06-12 07:53:29 +0900, Fujii Masao wrote:
>> >> >> The attached patch fixes this problem. It just changes walsender so that it
>> >> >> waits for all the outstanding WAL records to be replicated to the standby
>> >> >> before closing the replication connection.
>> >> >
>> >> > Imo this is a fix that needs to get backpatched... The code tried to do
>> >> > this but failed, I don't think it really gives grounds for valid *new*
>> >> > concerns.
>> >>
>> >> +1 (without having looked at the code itself, it's definitely a
>> >> behaviour that needs to be fixed)
>> >
>> > Yea, I was also thinking it would be reasonable to backpatch this; it
>> > really looks like a bug that we're allowing this to happen today.
>> >
>> > So, +1 on a backpatch for me.
>>
>> +1. I think that we can backpatch to 9.1, 9.2 and 9.3.
>
> I marked the patch as ready for committer.

Committed. Thanks a lot!

Regards,

-- 
Fujii Masao