Discussion: Issue with pg_basebackup v.11
Hello experts,
I am facing an issue with a customer's production server while trying to take backup using pg_basebackup.
Below is the log from pg_basebackup execution.
115338208/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.1)
115355616/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.1)
115372640/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.1)
115389568/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.1)
115405792/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.1)
115423776/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.1)
115440640/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.2)
115454656/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.2)
pg_basebackup: could not read COPY data: could not receive data from server: Connection timed out
pg_basebackup: removing contents of data directory "/u01/PostgreSQL/11/datastaging"
It copied nearly 110 GB of data and then exited. Initially, we suspected a network/OS issue; however, copying a 150 GB file over the network as a test completed successfully.
What I observed is that a couple of hours pass between the two lines below.
115454656/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.2)
pg_basebackup: could not read COPY data: could not receive data from server: Connection timed out
In other words, it ran for about an hour, and then it took another 2 hours before it timed out.
Can someone please help me out here?
Regards,
Ninad Shah
Ninad Shah <nshah.postgres@gmail.com> writes:
> What I observed is that it takes a couple of hours between below 2 lines.
> 115454656/1304172127 kB (8%), 0/1 tablespace
> (...atastaging/base/115868/154220.2)
> pg_basebackup: could not read COPY data: could not receive data from server:
> Connection timed out

We have heard reports of network connections dropping while pg_basebackup
is busy doing something disk-intensive such as fsync'ing. The apparent
2-hour delay here does not mean that pg_basebackup was out to lunch for
2 hours; more likely that reflects the TCP timeout delay before the kernel
realizes that the connection is lost. The actual blame probably resides
with some firewall or router that has a short timeout for idle connections.

I'd try turning on fairly aggressive TCP keepalive settings for the
connection, say keepalives_idle=30 or so.

			regards, tom lane
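[Editor's note: the keepalive parameters Tom mentions are libpq connection parameters, so they can be passed to pg_basebackup through its -d/--dbname connection string. A minimal sketch, assuming a hypothetical host and replication user (none of these names come from this thread):]

```shell
# Sketch only: host, user, and target directory are placeholders.
# keepalives=1 enables TCP keepalives; keepalives_idle=30 starts probing
# after 30 s of idleness, so a stateful firewall with a short idle timeout
# sees traffic even while the server is busy with disk-intensive work.
pg_basebackup \
  -d "host=primary.example.com user=replicator keepalives=1 keepalives_idle=30 keepalives_interval=10 keepalives_count=3" \
  -D /u01/PostgreSQL/11/datastaging \
  -P -v
```

The same parameters can alternatively be set via the PGOPTIONS-independent environment variables (e.g. PGKEEPALIVESIDLE=30) if changing the command line is inconvenient.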
Hey Tom,
Thank you for your response. Actually, when we copy data using scp/rsync, it works without any issue. But, it fails while attempting to transfer using pg_basebackup.
Would the keepalive setting mitigate the issue?
Regards,
Ninad Shah
On Fri, 22 Oct 2021 at 21:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ninad Shah <nshah.postgres@gmail.com> writes:
> What I observed is that it takes a couple of hours between below 2 lines.
> 115454656/1304172127 kB (8%), 0/1 tablespace
> (...atastaging/base/115868/154220.2)
> pg_basebackup: could not read COPY data: could not receive data from server:
> Connection timed out
We have heard reports of network connections dropping while pg_basebackup
is busy doing something disk-intensive such as fsync'ing. The apparent
2-hour delay here does not mean that pg_basebackup was out to lunch for
2 hours; more likely that reflects the TCP timeout delay before the kernel
realizes that the connection is lost. The actual blame probably resides
with some firewall or router that has a short timeout for idle
connections.
I'd try turning on fairly aggressive TCP keepalive settings for the
connection, say keepalives_idle=30 or so.
regards, tom lane
Ninad Shah <nshah.postgres@gmail.com> writes:
> Would keepalive setting address and mitigate the issue?

[ shrug... ] Maybe; nobody else has more information about this
situation than you do. I suggested something to experiment with.

			regards, tom lane
Thanks Tom.
Regards,
Ninad Shah
On Sat, 23 Oct 2021 at 20:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ninad Shah <nshah.postgres@gmail.com> writes:
> Would keepalive setting address and mitigate the issue?
[ shrug... ] Maybe; nobody else has more information about this
situation than you do. I suggested something to experiment with.
regards, tom lane