Thread: Streaming Replication Networking Best Practices?

Streaming Replication Networking Best Practices?

From:
Don Seiler
Date:
Postgres 9.6.6. Primary has a local (HA) replica and a remote (DR) replica.

I've done a couple of big data purges the past few weeks. This past weekend I ran a DELETE & VACUUM that took all of 8 minutes. The local replica kept up just fine, but the remote replica lagged and broke streaming replication after just a few minutes. We have WAL archives sent via NetApp mirroring to back that up.

However, I'd like to know if there are any optimal networking settings on the host or network that we may be missing. My manager says that the circuit between data centers was only 60% utilized at its peak.

In the past I've tried increasing wal_keep_segments, which keeps the WAL files available for streaming, but the fact remains that they stream very slowly, so the lag just gets worse than if we fell back to the archives every 30 minutes or so.

I have no basis for this other than my previous experience with Oracle physical standbys, but I would think that streaming replication should be able to push more than it seems to be doing in my prod environment. The fact that the local replica keeps up just fine without breaking streaming replication tells me that the problem is in the cross-datacenter circuit, not in postgres recovery performance.

If anyone has any advice on host networking setup, tuning or testing, I'd love to hear it.

Thanks,
Don.

--
Don Seiler
www.seiler.us

Re: Streaming Replication Networking Best Practices?

From:
Flavio Henrique Araque Gurgel
Date:


On Mon, May 14, 2018 at 6:12 PM, Don Seiler <don@seiler.us> wrote:
Postgres 9.6.6. Primary has a local (HA) replica and a remote (DR) replica.

I've done a couple of big data purges the past few weeks. This past weekend I ran a DELETE & VACUUM that took all of 8 minutes. The local replica kept up just fine, but the remote replica lagged and broke streaming replication after just a few minutes. We have WAL archives sent via NetApp mirroring to back that up.

However, I'd like to know if there are any optimal networking settings on the host or network that we may be missing. My manager says that the circuit between data centers was only 60% utilized at its peak.

In the past I've tried increasing wal_keep_segments, which keeps the WAL files available for streaming, but the fact remains that they stream very slowly, so the lag just gets worse than if we fell back to the archives every 30 minutes or so.

I have no basis for this other than my previous experience with Oracle physical standbys, but I would think that streaming replication should be able to push more than it seems to be doing in my prod environment. The fact that the local replica keeps up just fine without breaking streaming replication tells me that the problem is in the cross-datacenter circuit, not in postgres recovery performance.

If anyone has any advice on host networking setup, tuning or testing, I'd love to hear it.

If you're running 9.6, you can use replication slots to avoid having to mess with wal_keep_segments [1].
Be aware that network bandwidth and latency are not the only things responsible for that behaviour; the WAL receiver and the disk write capability on your standby can be bottlenecks too.
It has happened to me on local networks with 10 Gbps-capable hardware.

[1] https://www.postgresql.org/docs/9.6/static/warm-standby.html#STREAMING-REPLICATION-SLOTS
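
For illustration, setting one up is a single function call on the primary plus one line in the standby's recovery.conf (the slot name below is just an example):

  # on the primary (run once)
  psql -c "SELECT pg_create_physical_replication_slot('dr_replica');"

Then add primary_slot_name = 'dr_replica' to recovery.conf on the standby and restart it; the primary will retain WAL until that standby has consumed it.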

Flavio Gurgel

Re: Streaming Replication Networking Best Practices?

From:
Don Seiler
Date:
On Mon, May 14, 2018 at 11:17 AM, Flavio Henrique Araque Gurgel <fhagur@gmail.com> wrote:

If you're running 9.6, you can use replication slots to avoid having to mess with wal_keep_segments [1].

I don't think replication slots would help here. As I mentioned in my original post, I increased wal_keep_segments so that the WAL files weren't deleted too soon, but the streaming was still slow and lagged even further behind. My understanding is that replication slots just serve to keep the WAL files in place. If I did that, then they'd probably fill up the primary disk, since the DR replica would take too long to process them all.
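
(If a slot were in place, the WAL it pins on the primary could at least be watched with something along these lines, so you'd know when the disk is at risk:)

  psql -c "SELECT slot_name, active,
                  pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn)) AS retained_wal
           FROM pg_replication_slots;"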
 
Be aware that network bandwidth and latency are not the only things responsible for that behaviour; the WAL receiver and the disk write capability on your standby can be bottlenecks too.
It has happened to me on local networks with 10 Gbps-capable hardware.

Worth looking into. I would have assumed that since our local replica handles the storage I/O just fine, the DR replica would as well, since they use the same model of hardware for the server and storage array. I'll make sure that my assumptions are correct, though, and see if anything else is up there.

Thanks!
Don.

--
Don Seiler
www.seiler.us

Re: Streaming Replication Networking Best Practices?

From:
Flavio Henrique Araque Gurgel
Date:


On Mon, May 14, 2018 at 6:22 PM, Don Seiler <don@seiler.us> wrote:
On Mon, May 14, 2018 at 11:17 AM, Flavio Henrique Araque Gurgel <fhagur@gmail.com> wrote:

If you're running 9.6, you can use replication slots to avoid having to mess with wal_keep_segments [1].

I don't think replication slots would help here. As I mentioned in my original post, I increased wal_keep_segments so that the WAL files weren't deleted too soon, but the streaming was still slow and lagged even further behind. My understanding is that replication slots just serve to keep the WAL files in place. If I did that, then they'd probably fill up the primary disk, since the DR replica would take too long to process them all.

Are you sure that WAL streaming from the primary is the main cause of the replication lag?
Take a look at the pg_stat_replication view and compare the values of the sent, write and flush locations. If the flush location lags further behind than the sent or write locations, queries running on your standby server may still need rows that have been cleaned up by your vacuum process on the master, and replication is held up until those queries finish. If that's the case, you may consider increasing parameters like vacuum_defer_cleanup_age (be aware that already deleted/updated rows will remain dead longer on your master) or not vacuuming so soon (you may need to adjust autovacuum parameters).
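
For example, on the primary, something like this shows how far each location is behind the current WAL insert position (9.6 column names):

  psql -x -c "SELECT application_name, state,
                     pg_xlog_location_diff(pg_current_xlog_location(), sent_location)   AS sent_lag_bytes,
                     pg_xlog_location_diff(pg_current_xlog_location(), write_location)  AS write_lag_bytes,
                     pg_xlog_location_diff(pg_current_xlog_location(), flush_location)  AS flush_lag_bytes,
                     pg_xlog_location_diff(pg_current_xlog_location(), replay_location) AS replay_lag_bytes
              FROM pg_stat_replication;"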
 
 
Be aware that network bandwidth and latency are not the only things responsible for that behaviour; the WAL receiver and the disk write capability on your standby can be bottlenecks too.
It has happened to me on local networks with 10 Gbps-capable hardware.

Worth looking into. I would have assumed that since our local replica handles the storage I/O just fine, the DR replica would as well, since they use the same model of hardware for the server and storage array. I'll make sure that my assumptions are correct, though, and see if anything else is up there.

If that's the case, the sent location will be the one lagging the most.

Flavio Gurgel

Re: Streaming Replication Networking Best Practices?

From:
Don Seiler
Date:
On Mon, May 14, 2018 at 11:33 AM, Flavio Henrique Araque Gurgel <fhagur@gmail.com> wrote:

Are you sure that WAL streaming from the primary is the main cause of the replication lag?
Take a look at the pg_stat_replication view and compare the values of the sent, write and flush locations. If the flush location lags further behind than the sent or write locations, queries running on your standby server may still need rows that have been cleaned up by your vacuum process on the master, and replication is held up until those queries finish. If that's the case, you may consider increasing parameters like vacuum_defer_cleanup_age (be aware that already deleted/updated rows will remain dead longer on your master) or not vacuuming so soon (you may need to adjust autovacuum parameters).

Pretty sure. The DR replica had no other activity, and by HBA rule we don't allow any user activity there. There is a second cascading replica that developers can connect to for queries. It seemed very plain at the time that the DR replica was just not receiving the WAL fast enough. Once it has the WAL, it recovers the changes very quickly.

wal_keep_segments is set to 128, and we seemed to blow through that in a minute or two when I ran the purge.
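
Back of the envelope, assuming the default 16 MB segment size and our 200 Mbps circuit:

  # 128 segments * 16 MB  = 2 GB of retained WAL
  # 200 Mbps             ~= 25 MB/s at best
  # 2 GB / 25 MB/s       ~= 80 seconds

So even a fully utilized circuit only buys a minute or two of headroom during a purge like that.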


--
Don Seiler
www.seiler.us

Re: Streaming Replication Networking Best Practices?

From:
Johannes Truschnigg
Date:
On Mon, May 14, 2018 at 11:11:40AM -0500, Don Seiler wrote:
> Postgres 9.6.6. Primary has a local (HA) replica and a remote (DR) replica.
> [...]
> However, I'd like to know if there are any optimal networking settings on
> the host or network that we may be missing. My manager says that the circuit
> between data centers was only 60% utilized at its peak.

That actually hints at your network link/TCP performance _not_ being the
problem, I think.


Do you happen to have historical host-monitoring data available for when the
replication interruption happened? You should definitely check for CPU (on
both sides) and I/O (on the receiver/secondary) saturation.
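
If live numbers are easier to come by than history, a rough sketch of what to watch during the next purge (the pgrep pattern matches the 9.x process titles):

  # CPU saturation, on both primary and standby
  sar -u 5

  # I/O saturation on the standby's WAL/data volumes
  iostat -x 5

  # CPU use of the wal sender (primary) / wal receiver (standby) processes
  pidstat -u -p "$(pgrep -d, -f 'wal sender|wal receiver')" 5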

I remember when we first set up streaming replication, back then under
postgres 9.0, the replication connection defaulted to using TLS/SSL; at the
time with SSL/TLS compression enabled. The huge extra work that this incurred
on the CPUs involved regularly made the WAL sender on the primary break
streaming replication because it couldn't possibly keep up with the data that
was being pushed into its encrypted & compressed TCP connection over a 10G
link. (Linux's excellent perf tool proved invaluable in determining the exact
cause for the high CPU load inside the postgres processes; once we had
re-compiled OpenSSL without compression, the problem went away.)

Now of course modern TLS library versions don't implement compression any
more, and the streaming ciphers are most probably hardware accelerated for
your combination of hard- and software, but the lesson we learned back then
may still be worth keeping in mind...
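
On 9.5 and later you can also confirm from the server side (rather than from the config files) whether the walsender connections are encrypted or compressed at all; join against pg_stat_replication on pid if you only want the replication connections:

  psql -x -c "SELECT pid, ssl, compression, version, cipher FROM pg_stat_ssl;"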


Other than that... have you verified that the network link between your hosts
can actually live up to your and your manager's expectations in terms of
bandwidth delivered? iperf3 could help verify that; if the measured bandwidth
for a single TCP stream lives up to what you'd expect, you can probably rule
out network-related concerns and concentrate on looking at other potential
bottlenecks.
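
Something along these lines, run during a quiet window (the hostname is a placeholder):

  # on one end, start a listener
  iperf3 -s

  # on the other end: a single TCP stream for 30 seconds, then a few in parallel
  iperf3 -c dr-standby.example.com -t 30
  iperf3 -c dr-standby.example.com -t 30 -P 4

If the single-stream figure is far below the parallel one, the per-connection TCP window (rather than the circuit itself) is probably the limit.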

--
with best regards:
- Johannes Truschnigg ( johannes@truschnigg.info )

www:   https://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp:  johannes@truschnigg.info

Please do not bother me with HTML-email or attachments. Thank you.


Re: Streaming Replication Networking Best Practices?

From:
Don Seiler
Date:
On Mon, May 14, 2018 at 1:31 PM, Johannes Truschnigg <johannes@truschnigg.info> wrote:

Do you happen to have historical host-monitoring data available for when the
replication interruption happened? You should definitely check for CPU (on
both sides) and I/O (on the receiver/secondary) saturation.

We do have grafana and zenoss info going way back, I'll see if I can get a login there.
 
I remember when we first set up streaming replication, back then under
postgres 9.0, the replication connection defaulted to using TLS/SSL; at the
time with SSL/TLS compression enabled. The huge extra work that this incurred
on the CPUs involved regularly made the WAL sender on the primary break
streaming replication because it couldn't possibly keep up with the data that
was being pushed into its encrypted & compressed TCP connection over a 10G
link. (Linux's excellent perf tool proved invaluable in determining the exact
cause for the high CPU load inside the postgres processes; once we had
re-compiled OpenSSL without compression, the problem went away.)

Now of course modern TLS library versions don't implement compression any
more, and the streaming ciphers are most probably hardware accelerated for
your combination of hard- and software, but the lesson we learned back then
may still be worth keeping in mind...

Very interesting read. I just re-examined all of our settings in postgresql.conf, pg_hba.conf and recovery.conf, and we don't have SSL enabled anywhere there. I'm going to assume that this isn't a bottleneck in our case then.
 
Other than that... have you verified that the network link between your hosts
can actually live up to your and your manager's expectations in terms of
bandwidth delivered? iperf3 could help verify that; if the measured bandwidth
for a single TCP stream lives up to what you'd expect, you can probably rule
out network-related concerns and concentrate on looking at other potential
bottlenecks.

Thanks, I'll play around with some of these tools.

Don.

--
Don Seiler
www.seiler.us

Re: Streaming Replication Networking Best Practices?

From:
Rui DeSousa
Date:
What is the latency between the two sites?  That is going to dictate how much potential throughput a given TCP/IP
stream could use.  You may need to tune your TCP/IP window to increase the throughput.
I would check the network to ensure there are no errors occurring, i.e. dropped packets and retransmits.

Do you have QoS enabled, and if so, how is that handled? Does it just drop the packets? That would cause TCP/IP to
back off and retransmit.
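
A quick first pass on both hosts (the interface name is a placeholder):

  # per-interface errors and drops
  ip -s link show dev eth0

  # TCP retransmission counters since boot
  netstat -s | egrep -i 'retrans|timeout'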




Re: Streaming Replication Networking Best Practices?

From:
Don Seiler
Date:
On Mon, May 14, 2018 at 2:53 PM, Rui DeSousa <rui.desousa@icloud.com> wrote:
What is the latency between the two sites?  That is going to dictate how much potential throughput a given TCP/IP stream could use.  You may need to tune your TCP/IP window to increase the throughput.

Ping shows a pretty solid latency of 49-50 ms.

47 packets transmitted, 47 received, 0% packet loss, time 46057ms
rtt min/avg/max/mdev = 49.125/49.260/49.682/0.289 ms
 
I would check the network to ensure there are no errors occurring; i.e. dropped packets and retransmits.

Looking at ifconfig, I do see dropped RX packets on the iface that talks to the replica. I don't see any dropped packets on the replica iface that talks to the primary. The local LAN iface for both also has many dropped packets.
 
Do you have QOS enabled and if so how is that handled? Does it just drop the packets? As that would cause the TCP/IP to backoff and retransmit.

I'll follow-up with my sys admin team here. I believe there is QoS enabled but don't have details yet.

Don.

--
Don Seiler
www.seiler.us

Re: Streaming Replication Networking Best Practices?

From:
Peter Eisentraut
Date:
On 5/14/18 16:22, Don Seiler wrote:
> On Mon, May 14, 2018 at 2:53 PM, Rui DeSousa <rui.desousa@icloud.com
> <mailto:rui.desousa@icloud.com>> wrote:
> 
>     What is the latency between the two sites?  That is going to dictate
>     how much potential throughput a given TCP/IP stream could use.  You
>     may need to tune your TCP/IP window to increase the throughput.
> 
> 
> Ping shows a pretty solid latency of 49-50 ms.

You might find that increasing the TCP receive buffer on the receiving
side will help.

Here is a bit of an explanation:

https://stackoverflow.com/questions/259553/tcp-is-it-possible-to-achieve-higher-transfer-rate-with-multiple-connections

I believe this mechanism is the reason (or at least one of them) why
using parallel rsync (for base backup) or parallel copying of archive
segments is sometimes faster than straight streaming.
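
Before changing anything, it's worth seeing what the walreceiver's connection is actually negotiating on the standby (the primary address is a placeholder; fields shown vary by kernel):

  # rtt, cwnd and rcv_space for the connection back to the primary
  ss -tin dst 10.0.0.10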

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Streaming Replication Networking Best Practices?

From:
Rui DeSousa
Date:

> On May 14, 2018, at 4:22 PM, Don Seiler <don@seiler.us> wrote:
>
> Ping shows a pretty solid latency of 49-50 ms.
>

Depending on your window size, that will determine the throughput (not to exceed the link speed).

TCP window / latency = throughput; i.e. if the window is 16 KB with 50 ms latency, the max throughput is 320 KB/s, or
about 2.56 Mbps:

16 KB / 0.05 s = 320 KB/s

The window size will vary during the connection; it will start off small and then progress to your max setting.  Now
you need to determine how large your window can get and what it is actually doing; that will require observing the
connection via tcpdump or similar tools.  Dropped packets will prevent the window size from increasing and also cause
retransmits.
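
For example, capture a minute or two of the replication traffic on the standby during a purge and look at the advertised window and any retransmissions in Wireshark or tcptrace (interface and address are placeholders):

  tcpdump -ni eth0 -s 96 -w walstream.pcap 'host 10.0.0.10 and tcp port 5432'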



Re: Streaming Replication Networking Best Practices?

From:
Don Seiler
Date:
On Mon, May 14, 2018 at 3:33 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
You might find that increasing the TCP receive buffer on the receiving
side will help.

This got me wondering. I know Oracle requires minimums for some kernel settings like net.core.rmem_default and rmem_max, and similar for wmem, i.e.:

  • net.core.rmem_default = 262144
  • net.core.rmem_max = 4194304
  • net.core.wmem_default = 262144
  • net.core.wmem_max = 1048576

Looking at my primary and standby, they still look to be the stock defaults for CentOS 7 and CentOS 6, respectively.

Primary (CentOS7)
  • net.core.rmem_default = 212992
  • net.core.rmem_max = 212992
  • net.ipv4.tcp_rmem = 4096        87380   6291456
  • net.core.wmem_default = 212992
  • net.core.wmem_max = 212992
  • net.ipv4.tcp_wmem = 4096        16384   4194304

Standby (CentOS6)
  • net.core.rmem_default = 124928
  • net.core.rmem_max = 124928
  • net.ipv4.tcp_rmem = 4096        87380   4194304
  • net.core.wmem_default = 124928
  • net.core.wmem_max = 124928
  • net.ipv4.tcp_wmem = 4096        16384   4194304
FWIW: these machines are VMware VMs with 8 cores and over 100 GB of memory. I'm assuming we have gigabit Ethernet within the datacenter, but the circuit between the two data centers is 200 Mbps.
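
If it helps as a starting point, a rough bandwidth-delay product for that circuit and the sort of buffer bump it implies (illustrative values only; they would need to go into /etc/sysctl.conf to persist):

  # 200 Mbit/s * 0.050 s RTT = 10 Mbit ~= 1.25 MB in flight to fill the circuit
  sysctl -w net.core.rmem_max=8388608
  sysctl -w net.core.wmem_max=8388608
  sysctl -w net.ipv4.tcp_rmem="4096 262144 8388608"
  sysctl -w net.ipv4.tcp_wmem="4096 262144 8388608"
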
--
Don Seiler
www.seiler.us