Обсуждение: [ADMIN] Shutdown Order with Primary/Standby?

Поиск
Список
Период
Сортировка

[ADMIN] Shutdown Order with Primary/Standby?

От
Don Seiler
Дата:
Afternoon,

Just wondering if there's a best practice in regards to shutdown of primary and standbys, as I approach my first planned maintenance task as a newbie PostgreSQL DBA. I found one older post that suggested doing a fast-mode shutdown on primary first so that all transaction info can be cleanly pushed to standby, then shutdown the standby. Our big app group runs on 9.2.18 and is being upgraded to 9.2.22.

Extending that scenario, if you had a cascading standby, would you go primary -> upstream standby -> downstream standby?

Curious what folks think.

Don.

--
Don Seiler
www.seiler.us

Re: [ADMIN] Shutdown Order with Primary/Standby?

От
Scott Whitney
Дата:

Exactly what I would do, yes.


From: pgsql-admin-owner@postgresql.org <pgsql-admin-owner@postgresql.org> on behalf of Don Seiler <don@seiler.us>
Sent: Thursday, August 10, 2017 12:59 PM
To: pgsql-admin@postgresql.org
Subject: [ADMIN] Shutdown Order with Primary/Standby?
 

Extending that scenario, if you had a cascading standby, would you go primary -> upstream standby -> downstream standby?





Journyx, Inc.
7600 Burnet Road #300
Austin, TX 78757
www.journyx.com

p 512.834.8888 
f 512-834-8858 

Do you receive our promotional emails? Click here or visit http://journyx.com/communication-preferences to unsubscribe.

Re: [ADMIN] Shutdown Order with Primary/Standby?

От
Scott Marlowe
Дата:
On Thu, Aug 10, 2017 at 11:59 AM, Don Seiler <don@seiler.us> wrote:
> Afternoon,
>
> Just wondering if there's a best practice in regards to shutdown of primary
> and standbys, as I approach my first planned maintenance task as a newbie
> PostgreSQL DBA. I found one older post that suggested doing a fast-mode
> shutdown on primary first so that all transaction info can be cleanly pushed
> to standby, then shutdown the standby. Our big app group runs on 9.2.18 and
> is being upgraded to 9.2.22.
>
> Extending that scenario, if you had a cascading standby, would you go
> primary -> upstream standby -> downstream standby?

Set keep wal segments to something largish (1000 or so) well before
the upgrade etc. Make sure the volume holding pg_xlog can hold
1000*16MB of data. This ensures the streaming replicant can catch up
if some stuff happens before it's back up.

Best practices are to first eliminate client access to the db cluster
so there's no updates going on up to the last second and all that. A
great tool for this is pgbouncer, which can pause its connections
while you do the upgrade.

After that, it really doesn't matter the order, but yes, I generally
shut down the source, then the destination machines.


Re: [ADMIN] Shutdown Order with Primary/Standby?

От
Don Seiler
Дата:
Yes we'll be shutting down the app servers in advance.

On Aug 10, 2017 1:13 PM, "Scott Marlowe" <scott.marlowe@gmail.com> wrote:
On Thu, Aug 10, 2017 at 11:59 AM, Don Seiler <don@seiler.us> wrote:
> Afternoon,
>
> Just wondering if there's a best practice in regards to shutdown of primary
> and standbys, as I approach my first planned maintenance task as a newbie
> PostgreSQL DBA. I found one older post that suggested doing a fast-mode
> shutdown on primary first so that all transaction info can be cleanly pushed
> to standby, then shutdown the standby. Our big app group runs on 9.2.18 and
> is being upgraded to 9.2.22.
>
> Extending that scenario, if you had a cascading standby, would you go
> primary -> upstream standby -> downstream standby?

Set keep wal segments to something largish (1000 or so) well before
the upgrade etc. Make sure the volume holding pg_xlog can hold
1000*16MB of data. This ensures the streaming replicant can catch up
if some stuff happens before it's back up.

Best practices are to first eliminate client access to the db cluster
so there's no updates going on up to the last second and all that. A
great tool for this is pgbouncer, which can pause its connections
while you do the upgrade.

After that, it really doesn't matter the order, but yes, I generally
shut down the source, then the destination machines.

Re: [ADMIN] Shutdown Order with Primary/Standby?

От
Don Seiler
Дата:
On Thu, Aug 10, 2017 at 1:13 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
Set keep wal segments to something largish (1000 or so) well before
the upgrade etc. Make sure the volume holding pg_xlog can hold
1000*16MB of data. This ensures the streaming replicant can catch up
if some stuff happens before it's back up.

If we have both primary and standby down at the same time, would this really still be necessary? FWIW right now ours is set to keep 128.

Also, going back to my original question. Once both are down, is it best practice to perform patching/upgrades on the standby first (starting furthest downstream if cascading)? e.g. patch/upgrade the standby (via standard CentOS7 yum from the repo), then start the standby DB and verify nothing has broken, then do the same to the upstream or primary?

Don.

--
Don Seiler
www.seiler.us

Re: [ADMIN] Shutdown Order with Primary/Standby?

От
Scott Marlowe
Дата:
On Fri, Aug 11, 2017 at 8:10 AM, Don Seiler <don@seiler.us> wrote:
> On Thu, Aug 10, 2017 at 1:13 PM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> Set keep wal segments to something largish (1000 or so) well before
>> the upgrade etc. Make sure the volume holding pg_xlog can hold
>> 1000*16MB of data. This ensures the streaming replicant can catch up
>> if some stuff happens before it's back up.
>
>
> If we have both primary and standby down at the same time, would this really
> still be necessary? FWIW right now ours is set to keep 128.

It's nice in case the replica hangs or acts up, you won't have to
resubscribe it should it run through the first 128 wal segments etc.

> Also, going back to my original question. Once both are down, is it best
> practice to perform patching/upgrades on the standby first (starting
> furthest downstream if cascading)? e.g. patch/upgrade the standby (via
> standard CentOS7 yum from the repo), then start the standby DB and verify
> nothing has broken, then do the same to the upstream or primary?

Lots of ways to do this. One way is to make a replicant, remove it
from the cluster and fail it over and then upgrade it and test the
crap out of it distinctly from production. Load test, proper behavior
testing, and so on.

If you're doing minor upgrades, you don't need to bring down the whole
cluster, you can take a single replica out and upgrade pg on it, then
put it back in the cluster. Again, if you have enough kept wal
segments, or you use replication slots, then it will just catch right
up.


Re: [ADMIN] Shutdown Order with Primary/Standby?

От
Don Seiler
Дата:
On Fri, Aug 11, 2017 at 11:07 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
If you're doing minor upgrades, you don't need to bring down the whole
cluster, you can take a single replica out and upgrade pg on it, then
put it back in the cluster. Again, if you have enough kept wal
segments, or you use replication slots, then it will just catch right
up.

What do you mean by "take it out of the cluster" and "put it back in the cluster"? Shut it down and rename the recovery.conf? Or more complexity than that? Why would that be needed at all?

--
Don Seiler
www.seiler.us

Re: [ADMIN] Shutdown Order with Primary/Standby?

От
Scott Marlowe
Дата:
On Fri, Aug 11, 2017 at 2:20 PM, Don Seiler <don@seiler.us> wrote:
> On Fri, Aug 11, 2017 at 11:07 AM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> If you're doing minor upgrades, you don't need to bring down the whole
>> cluster, you can take a single replica out and upgrade pg on it, then
>> put it back in the cluster. Again, if you have enough kept wal
>> segments, or you use replication slots, then it will just catch right
>> up.
>
>
> What do you mean by "take it out of the cluster" and "put it back in the
> cluster"? Shut it down and rename the recovery.conf? Or more complexity than
> that? Why would that be needed at all?

Yeah, basically touch the the file defined to make it fail over so it
becomes its own master.

Then test upgrade it and test the crap out of it separate from
production. If all works fine, then resubscribe it and now you know
your upgrades in production should work.