Обсуждение: adding more space to the existing server
Hello postgres folks,
We're tossing around the idea of upgrading a replicated postgres cluster (37 dbs) by breaking the replication, adding different size (larger) data disks to the hot-spare, then turning replication back on, letting it fully populate, then breaking replication, making the standby the primary, upgrade the disks on the other system, bring it back up, replicate backwards until fully replicated then failing-back to the original primary. Is this feasible?
Our current size is 22 tb, and it is 97% full
(PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.3.1-14ubuntu2) 5.3.1 20160413, 64-bit) 
Thank you for your suggestions
			
		
			
				 On 7/31/19 5:21 PM, Julie Nishimura wrote:
Can we correctly infer that you aren't using LVM? (Or in there no more room on the rack/controller for new drives?)
		
	P {margin-top:0;margin-bottom:0;} Hello postgres folks,We're tossing around the idea of upgrading a replicated postgres cluster (37 dbs) by breaking the replication, adding different size (larger) data disks to the hot-spare, then turning replication back on, letting it fully populate, then breaking replication, making the standby the primary, upgrade the disks on the other system, bring it back up, replicate backwards until fully replicated then failing-back to the original primary. Is this feasible?Our current size is 22 tb, and it is 97% full(PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.3.1-14ubuntu2) 5.3.1 20160413, 64-bit)Thank you for your suggestions
Can we correctly infer that you aren't using LVM? (Or in there no more room on the rack/controller for new drives?)
-- 
Angular momentum makes the world go 'round.
			
		Angular momentum makes the world go 'round.
On 7/31/19 3:21 PM, Julie Nishimura wrote: > Hello postgres folks, > > We're tossing around the idea of upgrading a replicated postgres cluster > (37 dbs) by breaking the replication, adding different size (larger) > data disks to the hot-spare, then turning replication back on, letting > it fully populate, then breaking replication, making the standby the > primary, upgrade the disks on the other system, bring it back up, > replicate backwards until fully replicated then failing-back to the > original primary. Is this feasible? To make a determination more information would be helpful: 1) What is the method of replication? 2) What is the expected downtime for the disk upgrades? 3) Dependent on 1). Is there sufficient storage to hold the data until it can be replicated back? 4) How is the data on the existing disks going to get to the new disks? > > Our current size is 22 tb, and it is 97% full > (PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu > 5.3.1-14ubuntu2) 5.3.1 20160413, 64-bit) > > > Thank you for your suggestions -- Adrian Klaver adrian.klaver@aklaver.com
Thanks for your replies.
1) We use streaming replication, and due to hardware limitation, we cannot add more drives to the existing host. That is why we thought by breaking the existing streaming replication (from a->b), instead of currently identical standby (b), we can introduce twice larger host, then start the replication to the newly larger host, and when it is caught up, break it again. Then break rep again, make modification to 'a" host, making it larger, then replicate b->a. After it is caught up, break the rep again, switch master->standby (if necessary).
2) I am not sure about the time, but it is understood it is required 2 full replication cycles, and might be up to 2 weeks with no standby situation
3) Yes, we will clean up whatever we can to buy us time
4) by pg_basebackup and restore
As of now, we are thinking about possibly other solutions, as of splitting existing 37 databases on the cluster into 2 hosts with their own standbys. This solution requires breaking up existing replication as well. Can you please point me to some document which lists all steps describing breaking up the existing replication properly? we are using 9.6 postgres
Thank you!
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Thursday, August 1, 2019 7:28 AM
To: Julie Nishimura <juliezain@hotmail.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: adding more space to the existing server 
Sent: Thursday, August 1, 2019 7:28 AM
To: Julie Nishimura <juliezain@hotmail.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: adding more space to the existing server
On 7/31/19 3:21 PM, Julie Nishimura wrote:
> Hello postgres folks,
>
> We're tossing around the idea of upgrading a replicated postgres cluster
> (37 dbs) by breaking the replication, adding different size (larger)
> data disks to the hot-spare, then turning replication back on, letting
> it fully populate, then breaking replication, making the standby the
> primary, upgrade the disks on the other system, bring it back up,
> replicate backwards until fully replicated then failing-back to the
> original primary. Is this feasible?
To make a determination more information would be helpful:
1) What is the method of replication?
2) What is the expected downtime for the disk upgrades?
3) Dependent on 1). Is there sufficient storage to hold the data until
it can be replicated back?
4) How is the data on the existing disks going to get to the new disks?
>
> Our current size is 22 tb, and it is 97% full
> (PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
> 5.3.1-14ubuntu2) 5.3.1 20160413, 64-bit)
>
>
> Thank you for your suggestions
--
Adrian Klaver
adrian.klaver@aklaver.com
> Hello postgres folks,
>
> We're tossing around the idea of upgrading a replicated postgres cluster
> (37 dbs) by breaking the replication, adding different size (larger)
> data disks to the hot-spare, then turning replication back on, letting
> it fully populate, then breaking replication, making the standby the
> primary, upgrade the disks on the other system, bring it back up,
> replicate backwards until fully replicated then failing-back to the
> original primary. Is this feasible?
To make a determination more information would be helpful:
1) What is the method of replication?
2) What is the expected downtime for the disk upgrades?
3) Dependent on 1). Is there sufficient storage to hold the data until
it can be replicated back?
4) How is the data on the existing disks going to get to the new disks?
>
> Our current size is 22 tb, and it is 97% full
> (PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
> 5.3.1-14ubuntu2) 5.3.1 20160413, 64-bit)
>
>
> Thank you for your suggestions
--
Adrian Klaver
adrian.klaver@aklaver.com
On 8/2/19 12:45 PM, Julie Nishimura wrote:
> Thanks for your replies.
> 
> 1) We use streaming replication, and due to hardware limitation, we 
> cannot add more drives to the existing host. That is why we thought by 
> breaking the existing streaming replication (from a->b), instead of 
> currently identical standby (b), we can introduce twice larger host, 
> then start the replication to the newly larger host, and when it is 
> caught up, break it again. Then break rep again, make modification to 
> 'a" host, making it larger, then replicate b->a. After it is caught up, 
> break the rep again, switch master->standby (if necessary).
Is this new host going to be an entirely new machine?
If so and there is a place for it why not something like:
1)
    a) --> b) --> c)(new host)
                   Seed using pg_basebackup
                    Replicate in background
2)
    Take down a)
    Promote either b) or c) to primary and the other as standby.
3)
    Bring up a) and have it replicate off the standby in 2) and then 
decide which of a) or c) is the primary or standby.
> 
> 2) I am not sure about the time, but it is understood it is required 2 
> full replication cycles, and might be up to 2 weeks with no standby 
> situation
> 
> 3) Yes, we will clean up whatever we can to buy us time
> 
> 4) by pg_basebackup and restore
> 
> As of now, we are thinking about possibly other solutions, as of 
> splitting existing 37 databases on the cluster into 2 hosts with their 
> own standbys. This solution requires breaking up existing replication as 
> well. Can you please point me to some document which lists all steps 
> describing breaking up the existing replication properly? we are using 
> 9.6 postgres
> 
> 
> Thank you!
> 
-- 
Adrian Klaver
adrian.klaver@aklaver.com
			
		> On 2 Aug 2019, at 21:45, Julie Nishimura <juliezain@hotmail.com> wrote: > 1) We use streaming replication, and due to hardware limitation, we cannot add more drives to the existing host. That iswhy we thought by breaking the existing streaming replication (from a->b), instead of currently identical standby (b),we can introduce twice larger host, then start the replication to the newly larger host, and when it is caught up, breakit again. Then break rep again, make modification to 'a" host, making it larger, then replicate b->a. After it is caughtup, break the rep again, switch master->standby (if necessary). Let’s be clear, I’m in no way an expert on replication. In fact, I’ve only done (streaming) replication once and I managedto mess that up in a minor way (disabled the wrong service during failover, so data still went to the database I wasattempting to replace for, like, 15 minutes). > 2) I am not sure about the time, but it is understood it is required 2 full replication cycles, and might be up to 2 weekswith no standby situation No standby situation? Murphy is probably just waiting for that to strike… I recall a fairly recent story on the FreeBSD ML about someone on Malta doing a migration of a couple dozen terabytes fromher main server (because of some failing disks in her RAID set) using her backup server to move data around (with backupsremoved to make room), when, due to an accident outside the building, an aerial 10KV power line hit another powerline in the ground, causing a fire in one UPS and frying the other one. Losing power at that point meant that the filesystems (ZFS) on both servers ended up in an unrecoverable state with no backups. It didn’t help that the UPS’s wereat the bottom of the rack, with the heat and smoke going up into the servers. What are the chances, right? (And thenit turned out that it is really hard to try to recover data from a ZFS file system in such a state, which is what heractual inquiry was about) I would definitely prefer to add a 3rd machine into the mix, even if it were just a temporary machine - a rental perhaps? From there, I’m certain Adrian knows more about replication than I do. I’d go with the approach he suggested. > 4) by pg_basebackup and restore > > As of now, we are thinking about possibly other solutions, as of splitting existing 37 databases on the cluster into 2hosts with their own standbys. This solution requires breaking up existing replication as well. Can you please point meto some document which lists all steps describing breaking up the existing replication properly? we are using 9.6 postgres I’m going to assume that you will have data coming in while this split is taking place and that you therefore cannot offlinethe entire set of databases for as long as this takes. If not, that would probably allow for a simpler (faster) scenario. I think the easiest for this scenario would be to add two more machines (c and d) and replicate them off the current setup.You want that to happen as parallel as possible, so perhaps replicate c off a and d off b. If you aren’t already using “replication slots”, I found that to make things both easier to understand and more reliable.You can query their status, for one thing. Those replicas will take extra time of course (about double) because you’re replicating twice what you need, but I don’tthink you can replicate parts of a cluster with your setup unless you go for a different replication approach (I thinkper database replication requires statement level replication?). After that, decouple both sets into: a —> b (your current machine) c —> d (the new ones) (Although any order should be fine, really, as long as they have caught up.) At that point I would probably (temporarily) pause replication in at least one set and create a backup of that. This is the point to start removing superfluous databases from a and c (so that a+c make up the complete set again). After verifying that no databases are missing, unpause replication. If instead you find that you accidentally removed a database from both a and c, you still have replicas to recover it from.And the backups, of course, but that will not contain the data that came in after replication was paused. I do hope the remaining 3% disk space is enough to cover all that, though... Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Alban,  thank you for your reply. Your suggestion makes sense, and I will be talking to our engineers about it. Currently we need to understand:
a)       How do we break A -> B replication such that both can become independent primaries
b)      How do we reassign C from B->C replication to A->C replication
c)       Thoughts on why this isn’t a good plan
Current: 
 A replicates to B
all requests go to A
Soon: 
  A replicates to B -> cascading to C and D
Transition: 
  break A replication to B such that both can become primary
  stop B replication to C  then setup A to replicate to C
End state:
A replicates to C
B replicates to D
I hope it all makes sense...
Thank you
From: Alban Hertroys <haramrae@gmail.com>
Sent: Saturday, August 3, 2019 3:15 AM
To: Julie Nishimura <juliezain@hotmail.com>
Cc: Adrian Klaver <adrian.klaver@aklaver.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: adding more space to the existing server 
Sent: Saturday, August 3, 2019 3:15 AM
To: Julie Nishimura <juliezain@hotmail.com>
Cc: Adrian Klaver <adrian.klaver@aklaver.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: adding more space to the existing server
> On 2 Aug 2019, at 21:45, Julie Nishimura <juliezain@hotmail.com> wrote:
> 1) We use streaming replication, and due to hardware limitation, we cannot add more drives to the existing host. That is why we thought by breaking the existing streaming replication (from a->b), instead of currently identical standby (b), we can introduce twice larger host, then start the replication to the newly larger host, and when it is caught up, break it again. Then break rep again, make modification to 'a" host, making it larger, then replicate b->a. After it is caught up, break the rep again, switch master->standby (if necessary).
Let’s be clear, I’m in no way an expert on replication. In fact, I’ve only done (streaming) replication once and I managed to mess that up in a minor way (disabled the wrong service during failover, so data still went to the database I was attempting to replace for, like, 15 minutes).
> 2) I am not sure about the time, but it is understood it is required 2 full replication cycles, and might be up to 2 weeks with no standby situation
No standby situation? Murphy is probably just waiting for that to strike…
I recall a fairly recent story on the FreeBSD ML about someone on Malta doing a migration of a couple dozen terabytes from her main server (because of some failing disks in her RAID set) using her backup server to move data around (with backups removed to make room), when, due to an accident outside the building, an aerial 10KV power line hit another power line in the ground, causing a fire in one UPS and frying the other one. Losing power at that point meant that the file systems (ZFS) on both servers ended up in an unrecoverable state with no backups. It didn’t help that the UPS’s were at the bottom of the rack, with the heat and smoke going up into the servers. What are the chances, right? (And then it turned out that it is really hard to try to recover data from a ZFS file system in such a state, which is what her actual inquiry was about)
I would definitely prefer to add a 3rd machine into the mix, even if it were just a temporary machine - a rental perhaps?
From there, I’m certain Adrian knows more about replication than I do. I’d go with the approach he suggested.
> 4) by pg_basebackup and restore
>
> As of now, we are thinking about possibly other solutions, as of splitting existing 37 databases on the cluster into 2 hosts with their own standbys. This solution requires breaking up existing replication as well. Can you please point me to some document which lists all steps describing breaking up the existing replication properly? we are using 9.6 postgres
I’m going to assume that you will have data coming in while this split is taking place and that you therefore cannot offline the entire set of databases for as long as this takes. If not, that would probably allow for a simpler (faster) scenario.
I think the easiest for this scenario would be to add two more machines (c and d) and replicate them off the current setup. You want that to happen as parallel as possible, so perhaps replicate c off a and d off b.
If you aren’t already using “replication slots”, I found that to make things both easier to understand and more reliable. You can query their status, for one thing.
Those replicas will take extra time of course (about double) because you’re replicating twice what you need, but I don’t think you can replicate parts of a cluster with your setup unless you go for a different replication approach (I think per database replication requires statement level replication?).
After that, decouple both sets into:
a —> b (your current machine)
c —> d (the new ones)
(Although any order should be fine, really, as long as they have caught up.)
At that point I would probably (temporarily) pause replication in at least one set and create a backup of that.
This is the point to start removing superfluous databases from a and c (so that a+c make up the complete set again).
After verifying that no databases are missing, unpause replication.
If instead you find that you accidentally removed a database from both a and c, you still have replicas to recover it from. And the backups, of course, but that will not contain the data that came in after replication was paused.
I do hope the remaining 3% disk space is enough to cover all that, though...
Regards,
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
On 5 Aug 2019, at 0:39, Julie Nishimura <juliezain@hotmail.com> wrote:
Alban, thank you for your reply. Your suggestion makes sense, and I will be talking to our engineers about it. Currently we need to understand:
a) How do we break A -> B replication such that both can become independent primaries
That is pretty much like normal failover from A to B, except that you don’t reverse replication.
You will need to stop your clients from sending data for a bit (if it’s continuous data, having a buffer in between is a big help - at our company we’re looking into Apache Kafka for that), so that you can switch half of them to connect to B instead of A.
Next, you promote B to master. I used the docs for that last time, and they were pretty clear on the subject.
b) How do we reassign C from B->C replication to A->C replication
I don’t think you need to. If you indeed already have A->B->C, after promoting B to master, you end up with B->C, which is alright.
You just need to add A->D for the other set.
c) Thoughts on why this isn’t a good plan
That depends on your clients and how you decide which database in the current cluster they connect to. If you connect specific clients to specific databases, then all you need to do is to configure half your clients to connect to B instead.
Another option is to put a virtual database layer in front, such that both clusters still look like a single database to the outside world. We have some experience with Dremio for similar purposes (although for read-only reporting). Mind that the community edition doesn’t do authorization.
Current:Correct.
A replicates to B
all requests go to A
Soon:
A replicates to B -> cascading to C and D
Transition:
break A replication to B such that both can become primary
stop B replication to C then setup A to replicate to CI would change this in:
   setup A to replicate to D
End state:
End state:
A replicates to C
B replicates to D
A replicates to D
B replicates to C
we remove some of the dbs from A and B, then reassign the traffic based on db selections
I hope it all makes sense...
Thank you
It does to me. Now would be a good time for people to chime in if they don't agree ;)
From: Alban Hertroys <haramrae@gmail.com>
Sent: Saturday, August 3, 2019 3:15 AM
To: Julie Nishimura <juliezain@hotmail.com>
Cc: Adrian Klaver <adrian.klaver@aklaver.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: adding more space to the existing server
> On 2 Aug 2019, at 21:45, Julie Nishimura <juliezain@hotmail.com> wrote:
> 1) We use streaming replication, and due to hardware limitation, we cannot add more drives to the existing host. That is why we thought by breaking the existing streaming replication (from a->b), instead of currently identical standby (b), we can introduce twice larger host, then start the replication to the newly larger host, and when it is caught up, break it again. Then break rep again, make modification to 'a" host, making it larger, then replicate b->a. After it is caught up, break the rep again, switch master->standby (if necessary).
Let’s be clear, I’m in no way an expert on replication. In fact, I’ve only done (streaming) replication once and I managed to mess that up in a minor way (disabled the wrong service during failover, so data still went to the database I was attempting to replace for, like, 15 minutes).
> 2) I am not sure about the time, but it is understood it is required 2 full replication cycles, and might be up to 2 weeks with no standby situation
No standby situation? Murphy is probably just waiting for that to strike…
I recall a fairly recent story on the FreeBSD ML about someone on Malta doing a migration of a couple dozen terabytes from her main server (because of some failing disks in her RAID set) using her backup server to move data around (with backups removed to make room), when, due to an accident outside the building, an aerial 10KV power line hit another power line in the ground, causing a fire in one UPS and frying the other one. Losing power at that point meant that the file systems (ZFS) on both servers ended up in an unrecoverable state with no backups. It didn’t help that the UPS’s were at the bottom of the rack, with the heat and smoke going up into the servers. What are the chances, right? (And then it turned out that it is really hard to try to recover data from a ZFS file system in such a state, which is what her actual inquiry was about)
I would definitely prefer to add a 3rd machine into the mix, even if it were just a temporary machine - a rental perhaps?
From there, I’m certain Adrian knows more about replication than I do. I’d go with the approach he suggested.
> 4) by pg_basebackup and restore
>
> As of now, we are thinking about possibly other solutions, as of splitting existing 37 databases on the cluster into 2 hosts with their own standbys. This solution requires breaking up existing replication as well. Can you please point me to some document which lists all steps describing breaking up the existing replication properly? we are using 9.6 postgres
I’m going to assume that you will have data coming in while this split is taking place and that you therefore cannot offline the entire set of databases for as long as this takes. If not, that would probably allow for a simpler (faster) scenario.
I think the easiest for this scenario would be to add two more machines (c and d) and replicate them off the current setup. You want that to happen as parallel as possible, so perhaps replicate c off a and d off b.
If you aren’t already using “replication slots”, I found that to make things both easier to understand and more reliable. You can query their status, for one thing.
Those replicas will take extra time of course (about double) because you’re replicating twice what you need, but I don’t think you can replicate parts of a cluster with your setup unless you go for a different replication approach (I think per database replication requires statement level replication?).
After that, decouple both sets into:
a —> b (your current machine)
c —> d (the new ones)
(Although any order should be fine, really, as long as they have caught up.)
At that point I would probably (temporarily) pause replication in at least one set and create a backup of that.
This is the point to start removing superfluous databases from a and c (so that a+c make up the complete set again).
After verifying that no databases are missing, unpause replication.
If instead you find that you accidentally removed a database from both a and c, you still have replicas to recover it from. And the backups, of course, but that will not contain the data that came in after replication was paused.
I do hope the remaining 3% disk space is enough to cover all that, though...
Regards,
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
Alban Hertroys
--
There is always an exception to always.
--
There is always an exception to always.
 Thanks for your reply Alban. Currently we only have A->B replication. Is adding B->C replication difficult? I remember in the past I tried to seed pg_basebackup from hot standby, and it was erroring out after awhile, so needed to switch to run from master.
From: Alban Hertroys <haramrae@gmail.com>
Sent: Monday, August 5, 2019 5:01 AM
To: Julie Nishimura <juliezain@hotmail.com>
Cc: Adrian Klaver <adrian.klaver@aklaver.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: adding more space to the existing server 
Sent: Monday, August 5, 2019 5:01 AM
To: Julie Nishimura <juliezain@hotmail.com>
Cc: Adrian Klaver <adrian.klaver@aklaver.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: adding more space to the existing server
On 5 Aug 2019, at 0:39, Julie Nishimura <juliezain@hotmail.com> wrote:
Alban, thank you for your reply. Your suggestion makes sense, and I will be talking to our engineers about it. Currently we need to understand:
a) How do we break A -> B replication such that both can become independent primaries
That is pretty much like normal failover from A to B, except that you don’t reverse replication.
You will need to stop your clients from sending data for a bit (if it’s continuous data, having a buffer in between is a big help - at our company we’re looking into Apache Kafka for that), so that you can switch half of them to connect to B instead of A.
Next, you promote B to master. I used the docs for that last time, and they were pretty clear on the subject.
b) How do we reassign C from B->C replication to A->C replication
I don’t think you need to. If you indeed already have A->B->C, after promoting B to master, you end up with B->C, which is alright.
You just need to add A->D for the other set.
c) Thoughts on why this isn’t a good plan
That depends on your clients and how you decide which database in the current cluster they connect to. If you connect specific clients to specific databases, then all you need to do is to configure half your clients to connect to B instead.
Another option is to put a virtual database layer in front, such that both clusters still look like a single database to the outside world. We have some experience with Dremio for similar purposes (although for read-only reporting). Mind that the community edition doesn’t do authorization.
Current:Correct.
A replicates to B
all requests go to A
Soon:
A replicates to B -> cascading to C and D
Transition:
break A replication to B such that both can become primary
stop B replication to C then setup A to replicate to CI would change this in:
    setup A to replicate to D
End state:
End state:
A replicates to C
B replicates to D
 A replicates to D
 B replicates to C
we remove some of the dbs from A and B, then reassign the traffic based on db selections
I hope it all makes sense...
Thank you
 It does to me. Now would be a good time for people to chime in if they don't agree ;)
From: Alban Hertroys <haramrae@gmail.com>
Sent: Saturday, August 3, 2019 3:15 AM
To: Julie Nishimura <juliezain@hotmail.com>
Cc: Adrian Klaver <adrian.klaver@aklaver.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: adding more space to the existing server
> On 2 Aug 2019, at 21:45, Julie Nishimura <juliezain@hotmail.com> wrote:
> 1) We use streaming replication, and due to hardware limitation, we cannot add more drives to the existing host. That is why we thought by breaking the existing streaming replication (from a->b), instead of currently identical standby (b), we can introduce twice larger host, then start the replication to the newly larger host, and when it is caught up, break it again. Then break rep again, make modification to 'a" host, making it larger, then replicate b->a. After it is caught up, break the rep again, switch master->standby (if necessary).
Let’s be clear, I’m in no way an expert on replication. In fact, I’ve only done (streaming) replication once and I managed to mess that up in a minor way (disabled the wrong service during failover, so data still went to the database I was attempting to replace for, like, 15 minutes).
> 2) I am not sure about the time, but it is understood it is required 2 full replication cycles, and might be up to 2 weeks with no standby situation
No standby situation? Murphy is probably just waiting for that to strike…
I recall a fairly recent story on the FreeBSD ML about someone on Malta doing a migration of a couple dozen terabytes from her main server (because of some failing disks in her RAID set) using her backup server to move data around (with backups removed to make room), when, due to an accident outside the building, an aerial 10KV power line hit another power line in the ground, causing a fire in one UPS and frying the other one. Losing power at that point meant that the file systems (ZFS) on both servers ended up in an unrecoverable state with no backups. It didn’t help that the UPS’s were at the bottom of the rack, with the heat and smoke going up into the servers. What are the chances, right? (And then it turned out that it is really hard to try to recover data from a ZFS file system in such a state, which is what her actual inquiry was about)
I would definitely prefer to add a 3rd machine into the mix, even if it were just a temporary machine - a rental perhaps?
From there, I’m certain Adrian knows more about replication than I do. I’d go with the approach he suggested.
> 4) by pg_basebackup and restore
>
> As of now, we are thinking about possibly other solutions, as of splitting existing 37 databases on the cluster into 2 hosts with their own standbys. This solution requires breaking up existing replication as well. Can you please point me to some document which lists all steps describing breaking up the existing replication properly? we are using 9.6 postgres
I’m going to assume that you will have data coming in while this split is taking place and that you therefore cannot offline the entire set of databases for as long as this takes. If not, that would probably allow for a simpler (faster) scenario.
I think the easiest for this scenario would be to add two more machines (c and d) and replicate them off the current setup. You want that to happen as parallel as possible, so perhaps replicate c off a and d off b.
If you aren’t already using “replication slots”, I found that to make things both easier to understand and more reliable. You can query their status, for one thing.
Those replicas will take extra time of course (about double) because you’re replicating twice what you need, but I don’t think you can replicate parts of a cluster with your setup unless you go for a different replication approach (I think per database replication requires statement level replication?).
After that, decouple both sets into:
a —> b (your current machine)
c —> d (the new ones)
(Although any order should be fine, really, as long as they have caught up.)
At that point I would probably (temporarily) pause replication in at least one set and create a backup of that.
This is the point to start removing superfluous databases from a and c (so that a+c make up the complete set again).
After verifying that no databases are missing, unpause replication.
If instead you find that you accidentally removed a database from both a and c, you still have replicas to recover it from. And the backups, of course, but that will not contain the data that came in after replication was paused.
I do hope the remaining 3% disk space is enough to cover all that, though...
Regards,
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
Alban Hertroys
--
There is always an exception to always.
--
There is always an exception to always.
> On 5 Aug 2019, at 17:27, Julie Nishimura <juliezain@hotmail.com> wrote: > > Thanks for your reply Alban. Currently we only have A->B replication. Is adding B->C replication difficult? I rememberin the past I tried to seed pg_basebackup from hot standby, and it was erroring out after awhile, so needed to switchto run from master. I’ve never used multi-level replication (or cascading replication, as it’s called in the PG docs). I expect that replication slots (w/ streaming replication) would be resilient to streaming to multiple levels, provided thatyou have the disk space left to keep the logs that your farthest-behind replica is at, but I do not know that. I’m prettysure that’s being done though and not just theoretically possible. The basic problem you have is a bit like Towers of Hanoi. The details of setting up each replica may be a little complicated,but if you look at it from a high-level design phase, you’re just shuffling around clusters. The details comelater, when you’re actually designing how to apply those replicas/clusters. One of the main problems is that creating a new replica takes a lot of time, you want to minimise the total time that takes.Apparently, you have neither C nor D ready yet, so you need to create two replicas - being able to do those in parallelinstead of sequentially would save you time. The other problem is that, to be on the safe side, you want to have a full replica of A at any point in time. If you disconnectB from A before either C or D is complete, you run a risk: If A fails, you don’t have the data that accumulatedwhile B was ‘offline’. So that’s not the best scenario. That is why I think your initial set of replicas should look like: A — B — C \ D IIRC, streaming replication uses pg_basebackup to create the initial replica and then it streams what’s needed to catch up.With replication slots, the master knows what the slaves still need, so it won’t clean up too early. Apparently, the slave(B) knows that it needs to retain data for C as well. It looks perfectly safe on paper, except for the replication lagbetween A — B. You can, according to the docs, cascade replicas from each other and that even allows replication (among the slaves) to goon after the master gets disconnected - quite what you need for the B — C chain, I would think. Take a look at: https://www.postgresql.org/docs/11/warm-standby.html#STREAMING-REPLICATION And note the sections on Replication Slots and on Cascading Replication. The replication lag between A — B can be solved by pausing the clients connecting to A (half of them need to be changed toB anyway) while B is catching up on its lag. You probably have at least that much down-time to change the connections anyway. Regards, Alban. > From: Alban Hertroys <haramrae@gmail.com> > Sent: Monday, August 5, 2019 5:01 AM > To: Julie Nishimura <juliezain@hotmail.com> > Cc: Adrian Klaver <adrian.klaver@aklaver.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>;pgsql-general <pgsql-general@postgresql.org> > Subject: Re: adding more space to the existing server > > >> On 5 Aug 2019, at 0:39, Julie Nishimura <juliezain@hotmail.com> wrote: >> >> Alban, thank you for your reply. Your suggestion makes sense, and I will be talking to our engineers about it. Currentlywe need to understand: >> >> a) How do we break A -> B replication such that both can become independent primaries > > That is pretty much like normal failover from A to B, except that you don’t reverse replication. > > You will need to stop your clients from sending data for a bit (if it’s continuous data, having a buffer in between isa big help - at our company we’re looking into Apache Kafka for that), so that you can switch half of them to connect toB instead of A. > > Next, you promote B to master. I used the docs for that last time, and they were pretty clear on the subject. > > >> b) How do we reassign C from B->C replication to A->C replication > > I don’t think you need to. If you indeed already have A->B->C, after promoting B to master, you end up with B->C, whichis alright. > You just need to add A->D for the other set. > >> c) Thoughts on why this isn’t a good plan > > That depends on your clients and how you decide which database in the current cluster they connect to. If you connect specificclients to specific databases, then all you need to do is to configure half your clients to connect to B instead. > > Another option is to put a virtual database layer in front, such that both clusters still look like a single database tothe outside world. We have some experience with Dremio for similar purposes (although for read-only reporting). Mind thatthe community edition doesn’t do authorization. > >> Current: >> A replicates to B >> all requests go to A >> >> Soon: >> A replicates to B -> cascading to C and D >> >> Transition: >> break A replication to B such that both can become primary > Correct. >> stop B replication to C then setup A to replicate to C > I would change this in: > setup A to replicate to D >> >> End state: >> A replicates to C >> B replicates to D >> > End state: > A replicates to D > B replicates to C >> >> we remove some of the dbs from A and B, then reassign the traffic based on db selections >> >> I hope it all makes sense... >> >> Thank you > It does to me. Now would be a good time for people to chime in if they don't agree ;) > >> >> From: Alban Hertroys <haramrae@gmail.com> >> Sent: Saturday, August 3, 2019 3:15 AM >> To: Julie Nishimura <juliezain@hotmail.com> >> Cc: Adrian Klaver <adrian.klaver@aklaver.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>;pgsql-general <pgsql-general@postgresql.org> >> Subject: Re: adding more space to the existing server >> >> >> > On 2 Aug 2019, at 21:45, Julie Nishimura <juliezain@hotmail.com> wrote: >> >> > 1) We use streaming replication, and due to hardware limitation, we cannot add more drives to the existing host. Thatis why we thought by breaking the existing streaming replication (from a->b), instead of currently identical standby(b), we can introduce twice larger host, then start the replication to the newly larger host, and when it is caughtup, break it again. Then break rep again, make modification to 'a" host, making it larger, then replicate b->a. Afterit is caught up, break the rep again, switch master->standby (if necessary). >> >> Let’s be clear, I’m in no way an expert on replication. In fact, I’ve only done (streaming) replication once and I managedto mess that up in a minor way (disabled the wrong service during failover, so data still went to the database I wasattempting to replace for, like, 15 minutes). >> >> > 2) I am not sure about the time, but it is understood it is required 2 full replication cycles, and might be up to 2weeks with no standby situation >> >> No standby situation? Murphy is probably just waiting for that to strike… >> I recall a fairly recent story on the FreeBSD ML about someone on Malta doing a migration of a couple dozen terabytesfrom her main server (because of some failing disks in her RAID set) using her backup server to move data around(with backups removed to make room), when, due to an accident outside the building, an aerial 10KV power line hit anotherpower line in the ground, causing a fire in one UPS and frying the other one. Losing power at that point meant thatthe file systems (ZFS) on both servers ended up in an unrecoverable state with no backups. It didn’t help that the UPS’swere at the bottom of the rack, with the heat and smoke going up into the servers. What are the chances, right? (Andthen it turned out that it is really hard to try to recover data from a ZFS file system in such a state, which is whather actual inquiry was about) >> >> I would definitely prefer to add a 3rd machine into the mix, even if it were just a temporary machine - a rental perhaps? >> >> From there, I’m certain Adrian knows more about replication than I do. I’d go with the approach he suggested. >> >> > 4) by pg_basebackup and restore >> > >> > As of now, we are thinking about possibly other solutions, as of splitting existing 37 databases on the cluster into2 hosts with their own standbys. This solution requires breaking up existing replication as well. Can you please pointme to some document which lists all steps describing breaking up the existing replication properly? we are using 9.6postgres >> >> I’m going to assume that you will have data coming in while this split is taking place and that you therefore cannot offlinethe entire set of databases for as long as this takes. If not, that would probably allow for a simpler (faster) scenario. >> >> I think the easiest for this scenario would be to add two more machines (c and d) and replicate them off the current setup.You want that to happen as parallel as possible, so perhaps replicate c off a and d off b. >> >> If you aren’t already using “replication slots”, I found that to make things both easier to understand and more reliable.You can query their status, for one thing. >> >> Those replicas will take extra time of course (about double) because you’re replicating twice what you need, but I don’tthink you can replicate parts of a cluster with your setup unless you go for a different replication approach (I thinkper database replication requires statement level replication?). >> >> After that, decouple both sets into: >> a —> b (your current machine) >> c —> d (the new ones) >> >> (Although any order should be fine, really, as long as they have caught up.) >> >> At that point I would probably (temporarily) pause replication in at least one set and create a backup of that. >> >> This is the point to start removing superfluous databases from a and c (so that a+c make up the complete set again). >> After verifying that no databases are missing, unpause replication. >> >> If instead you find that you accidentally removed a database from both a and c, you still have replicas to recover itfrom. And the backups, of course, but that will not contain the data that came in after replication was paused. >> >> I do hope the remaining 3% disk space is enough to cover all that, though... >> >> Regards, >> >> Alban Hertroys >> -- >> If you can't see the forest for the trees, >> cut the trees and you'll find there is no forest. > > Alban Hertroys > -- > There is always an exception to always. Alban Hertroys -- There is always an exception to always.
Thanks a lot! Very helpful Sent from my iPhone > On Aug 5, 2019, at 1:50 PM, Alban Hertroys <haramrae@gmail.com> wrote: > > >> On 5 Aug 2019, at 17:27, Julie Nishimura <juliezain@hotmail.com> wrote: >> >> Thanks for your reply Alban. Currently we only have A->B replication. Is adding B->C replication difficult? I rememberin the past I tried to seed pg_basebackup from hot standby, and it was erroring out after awhile, so needed to switchto run from master. > > I’ve never used multi-level replication (or cascading replication, as it’s called in the PG docs). > > I expect that replication slots (w/ streaming replication) would be resilient to streaming to multiple levels, providedthat you have the disk space left to keep the logs that your farthest-behind replica is at, but I do not know that.I’m pretty sure that’s being done though and not just theoretically possible. > > The basic problem you have is a bit like Towers of Hanoi. The details of setting up each replica may be a little complicated,but if you look at it from a high-level design phase, you’re just shuffling around clusters. The details comelater, when you’re actually designing how to apply those replicas/clusters. > > One of the main problems is that creating a new replica takes a lot of time, you want to minimise the total time that takes.Apparently, you have neither C nor D ready yet, so you need to create two replicas - being able to do those in parallelinstead of sequentially would save you time. > > The other problem is that, to be on the safe side, you want to have a full replica of A at any point in time. If you disconnectB from A before either C or D is complete, you run a risk: If A fails, you don’t have the data that accumulatedwhile B was ‘offline’. So that’s not the best scenario. > > That is why I think your initial set of replicas should look like: > > A — B — C > \ > D > > IIRC, streaming replication uses pg_basebackup to create the initial replica and then it streams what’s needed to catchup. With replication slots, the master knows what the slaves still need, so it won’t clean up too early. Apparently,the slave (B) knows that it needs to retain data for C as well. It looks perfectly safe on paper, except for thereplication lag between A — B. > > You can, according to the docs, cascade replicas from each other and that even allows replication (among the slaves) togo on after the master gets disconnected - quite what you need for the B — C chain, I would think. > > Take a look at: https://www.postgresql.org/docs/11/warm-standby.html#STREAMING-REPLICATION > And note the sections on Replication Slots and on Cascading Replication. > > The replication lag between A — B can be solved by pausing the clients connecting to A (half of them need to be changedto B anyway) while B is catching up on its lag. You probably have at least that much down-time to change the connectionsanyway. > > Regards, > > Alban. > > >> From: Alban Hertroys <haramrae@gmail.com> >> Sent: Monday, August 5, 2019 5:01 AM >> To: Julie Nishimura <juliezain@hotmail.com> >> Cc: Adrian Klaver <adrian.klaver@aklaver.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>;pgsql-general <pgsql-general@postgresql.org> >> Subject: Re: adding more space to the existing server >> >> >>> On 5 Aug 2019, at 0:39, Julie Nishimura <juliezain@hotmail.com> wrote: >>> >>> Alban, thank you for your reply. Your suggestion makes sense, and I will be talking to our engineers about it. Currentlywe need to understand: >>> >>> a) How do we break A -> B replication such that both can become independent primaries >> >> That is pretty much like normal failover from A to B, except that you don’t reverse replication. >> >> You will need to stop your clients from sending data for a bit (if it’s continuous data, having a buffer in between isa big help - at our company we’re looking into Apache Kafka for that), so that you can switch half of them to connect toB instead of A. >> >> Next, you promote B to master. I used the docs for that last time, and they were pretty clear on the subject. >> >> >>> b) How do we reassign C from B->C replication to A->C replication >> >> I don’t think you need to. If you indeed already have A->B->C, after promoting B to master, you end up with B->C, whichis alright. >> You just need to add A->D for the other set. >> >>> c) Thoughts on why this isn’t a good plan >> >> That depends on your clients and how you decide which database in the current cluster they connect to. If you connectspecific clients to specific databases, then all you need to do is to configure half your clients to connect to Binstead. >> >> Another option is to put a virtual database layer in front, such that both clusters still look like a single databaseto the outside world. We have some experience with Dremio for similar purposes (although for read-only reporting).Mind that the community edition doesn’t do authorization. >> >>> Current: >>> A replicates to B >>> all requests go to A >>> >>> Soon: >>> A replicates to B -> cascading to C and D >>> >>> Transition: >>> break A replication to B such that both can become primary >> Correct. >>> stop B replication to C then setup A to replicate to C >> I would change this in: >> setup A to replicate to D >>> >>> End state: >>> A replicates to C >>> B replicates to D >>> >> End state: >> A replicates to D >> B replicates to C >>> >>> we remove some of the dbs from A and B, then reassign the traffic based on db selections >>> >>> I hope it all makes sense... >>> >>> Thank you >> It does to me. Now would be a good time for people to chime in if they don't agree ;) >> >>> >>> From: Alban Hertroys <haramrae@gmail.com> >>> Sent: Saturday, August 3, 2019 3:15 AM >>> To: Julie Nishimura <juliezain@hotmail.com> >>> Cc: Adrian Klaver <adrian.klaver@aklaver.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>;pgsql-general <pgsql-general@postgresql.org> >>> Subject: Re: adding more space to the existing server >>> >>> >>>>> On 2 Aug 2019, at 21:45, Julie Nishimura <juliezain@hotmail.com> wrote: >>>> >>>> 1) We use streaming replication, and due to hardware limitation, we cannot add more drives to the existing host. Thatis why we thought by breaking the existing streaming replication (from a->b), instead of currently identical standby(b), we can introduce twice larger host, then start the replication to the newly larger host, and when it is caughtup, break it again. Then break rep again, make modification to 'a" host, making it larger, then replicate b->a. Afterit is caught up, break the rep again, switch master->standby (if necessary). >>> >>> Let’s be clear, I’m in no way an expert on replication. In fact, I’ve only done (streaming) replication once and I managedto mess that up in a minor way (disabled the wrong service during failover, so data still went to the database I wasattempting to replace for, like, 15 minutes). >>> >>>> 2) I am not sure about the time, but it is understood it is required 2 full replication cycles, and might be up to 2weeks with no standby situation >>> >>> No standby situation? Murphy is probably just waiting for that to strike… >>> I recall a fairly recent story on the FreeBSD ML about someone on Malta doing a migration of a couple dozen terabytesfrom her main server (because of some failing disks in her RAID set) using her backup server to move data around(with backups removed to make room), when, due to an accident outside the building, an aerial 10KV power line hit anotherpower line in the ground, causing a fire in one UPS and frying the other one. Losing power at that point meant thatthe file systems (ZFS) on both servers ended up in an unrecoverable state with no backups. It didn’t help that the UPS’swere at the bottom of the rack, with the heat and smoke going up into the servers. What are the chances, right? (Andthen it turned out that it is really hard to try to recover data from a ZFS file system in such a state, which is whather actual inquiry was about) >>> >>> I would definitely prefer to add a 3rd machine into the mix, even if it were just a temporary machine - a rental perhaps? >>> >>> From there, I’m certain Adrian knows more about replication than I do. I’d go with the approach he suggested. >>> >>>> 4) by pg_basebackup and restore >>>> >>>> As of now, we are thinking about possibly other solutions, as of splitting existing 37 databases on the cluster into2 hosts with their own standbys. This solution requires breaking up existing replication as well. Can you please pointme to some document which lists all steps describing breaking up the existing replication properly? we are using 9.6postgres >>> >>> I’m going to assume that you will have data coming in while this split is taking place and that you therefore cannotoffline the entire set of databases for as long as this takes. If not, that would probably allow for a simpler (faster)scenario. >>> >>> I think the easiest for this scenario would be to add two more machines (c and d) and replicate them off the currentsetup. You want that to happen as parallel as possible, so perhaps replicate c off a and d off b. >>> >>> If you aren’t already using “replication slots”, I found that to make things both easier to understand and more reliable.You can query their status, for one thing. >>> >>> Those replicas will take extra time of course (about double) because you’re replicating twice what you need, but I don’tthink you can replicate parts of a cluster with your setup unless you go for a different replication approach (I thinkper database replication requires statement level replication?). >>> >>> After that, decouple both sets into: >>> a —> b (your current machine) >>> c —> d (the new ones) >>> >>> (Although any order should be fine, really, as long as they have caught up.) >>> >>> At that point I would probably (temporarily) pause replication in at least one set and create a backup of that. >>> >>> This is the point to start removing superfluous databases from a and c (so that a+c make up the complete set again). >>> After verifying that no databases are missing, unpause replication. >>> >>> If instead you find that you accidentally removed a database from both a and c, you still have replicas to recover itfrom. And the backups, of course, but that will not contain the data that came in after replication was paused. >>> >>> I do hope the remaining 3% disk space is enough to cover all that, though... >>> >>> Regards, >>> >>> Alban Hertroys >>> -- >>> If you can't see the forest for the trees, >>> cut the trees and you'll find there is no forest. >> >> Alban Hertroys >> -- >> There is always an exception to always. > > Alban Hertroys > -- > There is always an exception to always. > > > >