Thread: Failed archive_command copy - number of attempts configurable?

Failed archive_command copy - number of attempts configurable?

From
"dan.m.harris"
Date:
Hi all,

I'm doing some testing of Postgres 9.0 archiving and streaming replication
between a couple of Solaris 10 servers. Recently I was trying to test how
well the standby server catches up after an outage, and a question arose.

It seems that if the primary cannot reach the standby when it attempts
WAL archiving, it tries the copy three times and then logs that the WAL
file could not be archived because there were too many failures. See:

ssh: connect to host 172.18.131.212 port 22: Connection timed out
lost connection
LOG:  archive command failed with exit code 1
DETAIL:  The failed archive command was: scp
pg_xlog/000000010000000000000006
postgres@172.18.131.212:/postgres/postgres/9.0-pgdg/primary_archive
ssh: connect to host 172.18.131.212 port 22: Connection timed out
lost connection
LOG:  archive command failed with exit code 1
DETAIL:  The failed archive command was: scp
pg_xlog/000000010000000000000006
postgres@172.18.131.212:/postgres/postgres/9.0-pgdg/primary_archive
ssh: connect to host 172.18.131.212 port 22: Connection timed out
lost connection
LOG:  archive command failed with exit code 1
DETAIL:  The failed archive command was: scp
pg_xlog/000000010000000000000006
postgres@172.18.131.212:/postgres/postgres/9.0-pgdg/primary_archive
WARNING:  transaction log file "000000010000000000000006" could not be
archived: too many failures


But then the primary retries this another 49 times! So 150 attempts in all.
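
The pattern in the log above can be modelled as batches of three attempts, each batch ending with the "too many failures" warning. The following is only a sketch of that arithmetic, not PostgreSQL's archiver code: the function name, its parameters, and the `max_batches` bound are hypothetical and exist only so the simulation terminates. The batch size of three is taken from the log.

```python
from typing import Callable, Tuple


def count_archive_attempts(try_archive: Callable[[], bool],
                           attempts_per_batch: int = 3,
                           max_batches: int = 50) -> Tuple[int, int]:
    """Simulate batched retries; return (attempts made, warnings logged)."""
    attempts = 0
    warnings = 0
    for _ in range(max_batches):
        for _ in range(attempts_per_batch):
            attempts += 1
            if try_archive():
                # The copy succeeded, so retrying stops here.
                return attempts, warnings
        # A whole batch failed: this is where the WARNING line would appear.
        warnings += 1
    return attempts, warnings


# Standby unreachable for the whole observation window: every attempt fails.
print(count_archive_attempts(lambda: False))  # (150, 50)
```

With every attempt failing, 50 batches of 3 give the 150 attempts counted above.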

What I need to know is whether these numbers are configurable? Can they be
timed? How long before the primary stops retrying altogether?

Any help appreciated. Thanks!
Dan
--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Failed-archive-command-copy-number-of-attempts-configurable-tp3255563p3255563.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: Failed archive_command copy - number of attempts configurable?

From
Fujii Masao
Date:
On Tue, Nov 9, 2010 at 4:01 AM, dan.m.harris
<daniel.harris@metaswitch.com> wrote:
> But then the primary retries this another 49 times! So 150 attempts in all.
>
> What I need to know is whether these numbers are configurable?

No.

> Can they be
> timed? How long before the primary stops retrying altogether?

It retries forever, until the archive becomes available again.

BTW, the primary cannot remove an unarchived WAL file from the pg_xlog
directory. So unless you fix the archive soon, the primary might run out
of disk space and fail with a PANIC error.
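
One way to watch for this is to count the segments still waiting to be archived. The sketch below assumes the usual layout, in which the server marks each segment awaiting archiving with a `.ready` file under `pg_xlog/archive_status`, and assumes the default 16 MB segment size. The function name and the returned keys are hypothetical, and the size estimate is approximate.

```python
import os
import shutil

# Assumption: default WAL segment size of 16 MB (a compile-time setting).
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def archive_backlog(xlog_dir: str) -> dict:
    """Report how many WAL segments await archiving and how much disk is free."""
    status_dir = os.path.join(xlog_dir, "archive_status")
    # Each "<segment>.ready" file marks a segment that has not been archived yet.
    ready = [name for name in os.listdir(status_dir) if name.endswith(".ready")]
    # Free space on the filesystem that holds pg_xlog.
    free_bytes = shutil.disk_usage(xlog_dir).free
    return {
        "pending_segments": len(ready),
        "pending_bytes": len(ready) * WAL_SEGMENT_BYTES,
        "free_bytes": free_bytes,
    }
```

Calling `archive_backlog` on the primary's pg_xlog directory from a cron job, and alerting when `pending_segments` keeps growing or `free_bytes` gets low, would give some warning before the disk fills.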

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Failed archive_command copy - number of attempts configurable?

From
Daniel Harris
Date:
Fujii, thank you very much for this clarification.

Regards,
Dan
