Thread: [ADMIN] Standby Mechanics: WAL vs Streaming

[ADMIN] Standby Mechanics: WAL vs Streaming

From:
Don Seiler
Date:
Primary is 9.2.18, standby is 9.2.10. I know, I know. We're upgrading both to 9.2.22 in the next maintenance window. The standby is configured for streaming replication with the primary_conninfo parameter set. The primary has a bash script for archive_command that copies the WAL file to NFS and then rsyncs the file to the replicas.
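
(Roughly, that archive script is shaped like the sketch below; the script name, paths, and hostname are placeholders rather than our real ones.)

    #!/bin/bash
    # archive_command = '/usr/local/bin/archive_wal.sh %p %f'   (placeholder script name)
    # NOTE: paths and hostname below are placeholders.
    set -e
    WAL_PATH="$1"    # %p: path to the WAL file, relative to the data directory (the cwd for archive_command)
    WAL_NAME="$2"    # %f: the file name only

    # Copy to the NFS archive first, then push a copy out to the standby.
    cp "$WAL_PATH" /mnt/nfs/wal_archive/"$WAL_NAME"
    rsync -a "$WAL_PATH" standby-host:/var/lib/pgsql/wal_archive/"$WAL_NAME"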

After some index rebuilds and a manual vacuum today, the standby started lagging by a couple of hours. After some time I noticed that the WAL files being rsynced were still way behind where the standby had recovered to. For example, pg_last_xact_replay_timestamp would be at 2PM but the WAL files being rsynced hadn't even made it past noon yet.
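
(For reference, the replay check I was watching on the standby was essentially this; the delay column is just the obvious convenience calculation:)

    psql -c "SELECT pg_last_xact_replay_timestamp(),
                    now() - pg_last_xact_replay_timestamp() AS replay_delay;"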

That's when I remembered that this is streaming replication, so the standby is getting the xlog records directly from the primary. The rsynced WAL files themselves aren't being used at all by the standby, yet the standby was cleaning them up, presumably because it saw that the data in them had already been applied hours earlier.

So I'm writing just to get a confirmation or correction of my understanding of the process here, as well as some suggestions. Here's my first swing:
  1. Transaction completes on primary, xlog record is created. When the standby needs it, it pulls it from the primary (using the primary_conninfo credentials).
  2. When a WAL file is archived, our own script runs an rsync command to push it to the standby.
  3. The standby will recover from the streaming xlog data if local WAL files are not available. (So the standby prefers local WAL files to streaming?)
  4. If WAL files are available in an archive dir, restore_command is used to restore those files to the pg_xlog directory, and the archive file is removed.
  5. When a WAL file is no longer needed for recovery, the standby deletes it from the pg_xlog directory.
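
(For context, the standby's recovery.conf is along these lines; the host, user, and paths shown are placeholders rather than our real values:)

    standby_mode = 'on'
    primary_conninfo = 'host=primary-host port=5432 user=replication'   # credentials omitted; placeholder host/user
    restore_command = 'cp /var/lib/pgsql/wal_archive/%f "%p"'           # placeholder archive path
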
Assuming I'm right so far, I have these questions:
  1. I saw that the WAL archives were being removed from the archive directory. If the standby was already ahead of the WAL files because of streaming replication, would it still use the restore_command to restore WAL files from archive to the pg_xlog directory? Or would it just delete the archive files?
  2. As far as the lag goes, what can we do to increase the throughput for streaming replication? We tried to give it almost all of our 100Mb pipe between data centers and there was still significant lag until it recovered past the point of my manual vacuums.
  3. The archive_command on the primary seemed to run serially and wait, since it took 4-5 seconds to rsync each 16MB WAL file from primary to standby. Is there any way to parallelize this? I guess we could just have archive_command do a local copy and then have a separate script in cron handle multiple rsync calls (with a max concurrency, of course)? There's a rough sketch of what I'm picturing below this list.
  4. I'm going to assume that streaming replication and WAL rsyncing are going to compete for bandwidth. Would it be advisable to only do streaming most of the day and then rsync WAL files in a nightly batch? Now that I think of it, we already do hourly snapmirrors of that WAL storage (also using the same bandwidth); perhaps we could just use that for the WAL replication.
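
(Here's that rough sketch for #3; the outbox path, hostname, and concurrency are placeholders, and a real version would also need some locking so overlapping cron runs don't step on each other:)

    #!/bin/bash
    # archive_command just does a local copy into an "outbox" directory; this
    # script, run from cron, drains the outbox with up to 4 parallel rsyncs and
    # removes each file only after it has been shipped successfully.
    # NOTE: OUTBOX, the destination host, and -P 4 are all placeholders.
    OUTBOX=/var/lib/pgsql/wal_outbox

    find "$OUTBOX" -maxdepth 1 -type f -print0 \
      | xargs -0 -P 4 -I {} \
          sh -c 'rsync -a "{}" standby-host:/var/lib/pgsql/wal_archive/ && rm "{}"'
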
I'd love to know how others might be handling this. If there's any important information that I've left out, please let me know.

Don.
--
Don Seiler
www.seiler.us

Re: [ADMIN] Standby Mechanics: WAL vs Streaming

From:
MichaelDBA
Date:
pgbackrest archives WALs faster than a custom archive_command script.  I use pgbackrest to automate archiving of WALs from primary to pgbackrest server.  From there, my restore_command on the slave pulls WALs if needed to catch up to SR (streaming replication).
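
The relevant settings end up looking roughly like this (the stanza name is just an example):

    # postgresql.conf on the primary
    archive_command = 'pgbackrest --stanza=main archive-push %p'

    # recovery.conf on the standby
    restore_command = 'pgbackrest --stanza=main archive-get %f "%p"'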

Some of your other questions....

The standby will recover from the streaming xlog data if local WAL files are not available. (So the standby prefers local WAL files to streaming?)
The other way around: The standby prefers streaming replication.  It only pulls WALs (restore_command) if it is too far behind.

I saw that the WAL archives were being removed from the archive directory. If the standby was already ahead of the WAL files because of streaming replication, would it still use the restore_command to restore WAL files from archive to the pg_xlog directory? Or would it just delete the archive files?
The archived files in your case (stored on the standby) would only be removed if you also had the pg_archivecleanup command in your recovery.conf file for WAL cleanup (archive_cleanup_command).
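
In recovery.conf that is something along these lines (the archive path is just an example):

    archive_cleanup_command = 'pg_archivecleanup /var/lib/pgsql/wal_archive %r'   # placeholder archive path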

As far as the lag goes, what can we do to increase the throughput for streaming replication? We tried to give it almost all of our 100Mb pipe between data centers and there was still significant lag until it recovered past the point of my manual vacuums.
What do you expect?  Manual vacuum full on big tables will generate a lot of WAL thereby causing SR lag.

The archive_command on the primary seemed to run serially and wait, since it took 4-5 seconds to rsync each 16MB WAL file from primary to standby. Is there any way to parallelize this? I guess we could just have archive_command do a local copy and then have a separate script in cron handle multiple rsync calls (with a max concurrency, of course)?
See my solution using pgbackrest above.

I'm going to assume that streaming replication and WAL rsyncing are going to compete for bandwidth. Would it be advisable to only do streaming most of the day and then rsync WAL files in a nightly batch? Now that I think of it, we already do hourly snapmirrors of that WAL storage (also using the same bandwidth); perhaps we could just use that for the WAL replication.
How about doing your manual vacuums at night time during low load times?

Regards,
Michael Vitale

Re: [ADMIN] Standby Mechanics: WAL vs Streaming

From:
Don Seiler
Date:
On Sat, Aug 26, 2017 at 9:42 PM, MichaelDBA <MichaelDBA@sqlexec.com> wrote:
pgbackrest archives WALs faster than a custom archive_command script.  I use pgbackrest to automate archiving of WALs from primary to pgbackrest server.  From there, my restore_command on the slave pulls WALs if needed to catch up to SR (streaming replication).

Interesting, I'll look into pgbackrest. Thanks!
 
The standby will recover from the streaming xlog data if local WAL files are not available. (So the standby prefers local WAL files to streaming?)
The other way around: The standby prefers streaming replication.  It only pulls WALs (restore_command) if it is too far behind.

OK. That's what I had thought initially, but for some reason I convinced myself that it would think it's easier to get the data locally when available.

 
I saw that the WAL archives were being removed from the archive directory. If the standby was already ahead of the WAL files because of streaming replication, would it still use the restore_command to restore WAL files from archive to the pg_xlog directory? Or would it just delete the archive files?
The archived files in your case (stored on the standby) would only be removed if you also had the pg_archivecleanup command in your recovery.conf file for WAL cleanup (archive_cleanup_command).

Yes, we do have that; sorry I didn't specify. I just wanted to confirm that it would be used to clean up archives that arrive well after streaming replication has already made them redundant.

 
As far as the lag goes, what can we do to increase the throughput for streaming replication? We tried to give it almost all of our 100Mb pipe between data centers and there was still significant lag until it recovered past the point of my manual vacuums.
What do you expect?  Manual vacuum full on big tables will generate a lot of WAL thereby causing SR lag.

As I've said in other recent emails to this list, I'm relatively new to these waters, having spent the last 15+ years as an Oracle DBA. I figured that a manual vacuum would generate a lot of changes for replication. That's why I'm asking whether I can improve the throughput (if possible), given the additional activity spike.
 
I'm going to assume that streaming replication and WAL rsyncing are going to compete for bandwidth. Would it be advisable to only do streaming most of the day and then rsync WAL files in a nightly batch? Now that I think of it, we already do hourly snapmirrors of that WAL storage (also using the same bandwidth); perhaps we could just use that for the WAL replication.
How about doing your manual vacuums at night time during low load times?

You're assuming that night time is our off-peak time (although I did say "nightly batch", that doesn't mean it's a better time for maintenance work in our case). There's a reason I was running the vacuum when I did this morning, and it isn't something we regularly do. We had a table with 3 corrupt indexes that had prevented autovacuum from running on that table for many months. Today I recreated those indexes and ran a manual vac freeze (it was well past the min freeze age), during an off-peak time negotiated with the business side of the shop.

Thanks again for the pgbackrest information. I'll check it out. I'm thinking at the very least we could just present that NFS snapmirror as a read-only volume to the standby and point restore_command at it. That snapmirror is taking place regardless, so the additional rsync we're doing in our archive_command script on the primary is just piling on and eating up more bandwidth.
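
That is, something along these lines on the standby (the mount point below is made up):

    # recovery.conf on the standby; /mnt/wal_snapmirror is a placeholder mount point
    restore_command = 'cp /mnt/wal_snapmirror/%f "%p"'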

Don.
--
Don Seiler
www.seiler.us

Re: [ADMIN] Standby Mechanics: WAL vs Streaming

From:
Guillaume Lelarge
Date:
2017-08-27 5:16 GMT+02:00 Don Seiler <don@seiler.us>:
On Sat, Aug 26, 2017 at 9:42 PM, MichaelDBA <MichaelDBA@sqlexec.com> wrote:
pgbackrest archives WALs faster than a custom archive_command script.  I use pgbackrest to automate archiving of WALs from primary to pgbackrest server.  From there, my restore_command on the slave pulls WALs if needed to catch up to SR (streaming replication).

Interesting, I'll look into pgbackrest. Thanks!
 
The standby will recover from the streaming xlog data if local WAL files are not available. (So the standby prefers local WAL files to streaming?)
The other way around: The standby prefers streaming replication.  It only pulls WALs (restore_command) if it is too far behind.

OK. That's what I had thought initially, but for some reason I convinced myself that it would think it's easier to get the data locally when available.


The way PostgreSQL does it is a lot simpler.

When you start the standby, it first looks for archived WAL to replay. When there's none left, it tries to connect to the primary. Once connected, it asks for a specific WAL location. If the primary has it, streaming starts, and it stays that way (streaming) as long as it can. If there's an error, for example because the primary had a lot of writing to do and recycled its WAL too quickly, the WAL the standby needs may be gone before it could ever be streamed. In that case, the primary and the standby will start logging messages like: FATAL: requested WAL segment XXXX has already been removed. When this happens, the streaming connection between the primary and the standby is dropped, and the standby goes back to replaying whichever archived WAL files it needs and can find. When it has finally replayed them all, it tries to connect to the primary again (which hopefully will have the WAL it needs by then).

So, that's kind of simple:
* first replaying archived WAL
* when no more archived WAL, streaming
* when error, back to WAL replay
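
While it is streaming, you can also see the connection from the primary's side in pg_stat_replication, for example:

    psql -x -c "SELECT application_name, state, sent_location, replay_location
                FROM pg_stat_replication;"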

 

--
Guillaume.

Re: [ADMIN] Standby Mechanics: WAL vs Streaming

From:
Don Seiler
Date:
So, that's kind of simple:
* first replaying archived WAL
* when no more archived WAL, streaming
* when error, back to WAL replay

After this last scenario, will the standby periodically retry connecting to the primary to resume streaming replication? Will it at least try when it runs out of WAL files?

Don.

Re: [ADMIN] Standby Mechanics: WAL vs Streaming

From:
Guillaume Lelarge
Date:
On 28 August 2017 at 6:52 AM, "Don Seiler" <don@seiler.us> wrote:
So, that's kind of simple:
* first replaying archived WAL
* when no more archived WAL, streaming
* when error, back to WAL replay

After this last scenario, will the standby periodically retry connecting to the primary to resume streaming replication? Will it at least try when it runs out of WAL files?

It will resume streaming when it runs out of WAL files to replay.

Re: [ADMIN] Standby Mechanics: WAL vs Streaming

From:
Don Seiler
Date:
On Mon, Aug 28, 2017 at 12:16 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote:
On 28 August 2017 at 6:52 AM, "Don Seiler" <don@seiler.us> wrote:
After this last scenario, will the standby periodically retry connecting to the primary to resume streaming replication? Will it at least try when it runs out of WAL files?

It will resume streaming when it runs out of WAL files to replay.

OK thanks. Good to know!

Don.

--
Don Seiler
www.seiler.us