Thread: Failed archive_command copy - number of attempts configurable?
Hi all,

I'm doing some testing of Postgres 9.0 archiving and streaming replication between a couple of Solaris 10 servers. Recently I was trying to test how well the standby server catches up after an outage, and a question arose.

It seems that if the standby is uncontactable by the primary when it is attempting WAL archiving, the primary will attempt the copy three times, then log that the log file could not be archived, as there were too many failures. See:

    ssh: connect to host 172.18.131.212 port 22: Connection timed out^M
    lost connection
    LOG: archive command failed with exit code 1
    DETAIL: The failed archive command was: scp pg_xlog/000000010000000000000006 postgres@172.18.131.212:/postgres/postgres/9.0-pgdg/primary_archive
    ssh: connect to host 172.18.131.212 port 22: Connection timed out^M
    lost connection
    LOG: archive command failed with exit code 1
    DETAIL: The failed archive command was: scp pg_xlog/000000010000000000000006 postgres@172.18.131.212:/postgres/postgres/9.0-pgdg/primary_archive
    ssh: connect to host 172.18.131.212 port 22: Connection timed out^M
    lost connection
    LOG: archive command failed with exit code 1
    DETAIL: The failed archive command was: scp pg_xlog/000000010000000000000006 postgres@172.18.131.212:/postgres/postgres/9.0-pgdg/primary_archive
    WARNING: transaction log file "000000010000000000000006" could not be archived: too many failures

But then the primary retries this another 49 times! So 150 attempts in all.

What I need to know is whether these numbers are configurable? Can they be timed? How long before the primary stops retrying altogether?

Any help appreciated.

Thanks!
Dan
On Tue, Nov 9, 2010 at 4:01 AM, dan.m.harris <daniel.harris@metaswitch.com> wrote:
> But then the primary retries this another 49 times! So 150 attempts in all.
>
> What I need to know is whether these numbers are configurable?

No.

> Can they be
> timed? How long before the primary stops retrying altogether?

It retries forever, until the archive becomes available again.

BTW, since the primary cannot remove unarchived WAL files from the pg_xlog directory, unless you fix the archive soon, the primary might run out of disk space and fail with a PANIC error.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
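Since the primary keeps every unarchived segment in pg_xlog, it can help to watch how large that backlog is getting while the archive is unreachable. The following is only a minimal sketch, not something posted in this thread. It assumes the server marks segments awaiting archival with `.ready` files under `pg_xlog/archive_status`, and that the WAL segment size is the default 16 MB. The function names `pending_wal_segments` and `backlog_bytes` are hypothetical.

```python
import os
import string
import sys

# Assumption: default WAL segment size of 16 MB; adjust if the server
# was built with a different segment size.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024
READY_SUFFIX = ".ready"


def _is_segment_name(name):
    # WAL segment file names are 24 hexadecimal characters. This skips
    # other marker files such as backup history entries.
    return len(name) == 24 and all(c in string.hexdigits for c in name)


def pending_wal_segments(archive_status_dir):
    """Return the sorted names of WAL segments still waiting to be archived."""
    pending = []
    for entry in os.listdir(archive_status_dir):
        if entry.endswith(READY_SUFFIX):
            stem = entry[: -len(READY_SUFFIX)]
            if _is_segment_name(stem):
                pending.append(stem)
    return sorted(pending)


def backlog_bytes(archive_status_dir, segment_bytes=WAL_SEGMENT_BYTES):
    """Estimate the disk space held by segments that cannot be removed yet."""
    return len(pending_wal_segments(archive_status_dir)) * segment_bytes


if __name__ == "__main__":
    # Hypothetical default path, relative to the data directory.
    status_dir = sys.argv[1] if len(sys.argv) > 1 else "pg_xlog/archive_status"
    segments = pending_wal_segments(status_dir)
    megabytes = backlog_bytes(status_dir) // (1024 * 1024)
    print("%d segment(s) waiting to be archived, roughly %d MB" % (len(segments), megabytes))
```

Run periodically (from cron, for example) with the path to the archive_status directory as its argument, the script gives an early warning before pg_xlog fills the disk. The threshold at which to alert depends on the free space available and is left to the operator.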
Fujii, thank you very much for this clarification.

Regards,
Dan