Thread: Error messages in a hot standby server's logfiles

Error messages in a hot standby server's logfiles

From
John Scalia
Date:
Hi all,

My setup is PostgreSQL 9.3.3 running on CentOS 6.5 (kernel 2.6.32-358.18.1.el6.x86_64), with 3 servers: one
primary and two hot standbys. In our failover and loss-of-communications testing, I have seen a couple of
issues that I find hard to explain. For instance, we took one hot standby out of service by shutting down
PostgreSQL on it. Now, we're running hot standby with log shipping as an insurance policy, so the WAL
segments continued to be copied onto that out-of-service standby for a few minutes. On restart, I see:

cp: cannot stat '/mnt/wallogs/archive/0000000C.history': No such file or directory
cp: cannot stat '/mnt/wallogs/archive/0000000B0000001900000077': No such file or directory

Later in the logfile, I see another failure for 0000000B.history.

Looking at the /mnt/wallogs/archive directory, those files aren't there. But since the primary never had an
issue and continued to copy WAL segments to this directory, why was the standby looking for them? What
triggered this? Also, in that directory I often see files generated by the pg_basebackup command used to
build the standby: files like "0000000B0000001900000000.00000028.backup", or generally files ending with
.backup in their names. These never get removed automatically by the standby server; we have to remove them
manually. I'm guessing they weren't necessary, so why did the primary copy them here using its
archive_command? Why aren't they removed by some mechanism on the standby?

--
Jay


Re: Error messages in a hot standby server's logfiles

From
Jerry Sievers
Date:
John Scalia <jayknowsunix@gmail.com> writes:

> Hi all,
>
> My setup is: postgresql V9.3.3 running on a CentOS 6.5 (kernel
> 2.6.32-358.18.1.el6.x86_64) and I have 3 servers, one primary and two
> hot standbys. In our failover and loss of communications testing, I
> have seen a couple of issues that I'm hard to explain. For instance,
> we took one hot standby out of service by shutting down postgresql on
> it. Now, we're hot standby with log shipping as an insurance policy,
> so the WAL segments continued to be copied onto that out of service
> standby for a few minutes. On restart, I see:
>
> cp: cannot stat '/mnt/wallogs/archive/0000000C.history': No such file or directory
> cp: cannot stat '/mnt/wallogs/archive/0000000B0000001900000077': No such file or directory
>
> Later in the logfile, I see another failure for 000000B.history.

These .history files may or may not exist depending on whether timeline
branching has been done.
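
As a reading aid for the names in the log lines above: a WAL segment file name is 24 hex digits (8 for the
timeline ID, 8 for the log number, 8 for the segment number), a timeline history file is the 8-digit
timeline ID plus ".history", and a backup history file is a segment name plus an 8-digit offset and
".backup". The Python sketch below only decodes such names; the function name classify_archive_file is
made up for illustration and is not part of PostgreSQL.

```python
import re

# Name layouts found in a PostgreSQL WAL archive (every field is upper-case hex).
_HISTORY = re.compile(r"^([0-9A-F]{8})\.history$")
_BACKUP = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})\.([0-9A-F]{8})\.backup$")
_SEGMENT = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})$")


def classify_archive_file(name):
    """Return (kind, fields) for an archive file name, or ("unknown", {})."""
    m = _HISTORY.match(name)
    if m:
        return "history", {"timeline": int(m.group(1), 16)}
    m = _BACKUP.match(name)
    if m:
        return "backup", {
            "timeline": int(m.group(1), 16),
            "log": int(m.group(2), 16),
            "segment": int(m.group(3), 16),
            "offset": int(m.group(4), 16),
        }
    m = _SEGMENT.match(name)
    if m:
        return "segment", {
            "timeline": int(m.group(1), 16),
            "log": int(m.group(2), 16),
            "segment": int(m.group(3), 16),
        }
    # Anything else (wrong length, stray suffix) is not a name the server asks for.
    return "unknown", {}
```

Applied to the names from the log, 0000000B0000001900000077 is a segment on timeline 11 (hex 0B), and
0000000C.history is the history file for timeline 12, the one that would exist only if a newer timeline
had been branched off.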

The only way Pg knows how to check for them is by invoking archive_command,
which in your case is cp, hence the message.

HTH

>
> In looking at the /mnt/wallogs/archive directory, those files aren't
> there, but as the primary never had an issue and continued to copy WAL
> segments to this directory, why was the standby looking for them? What
> triggered this? Also, in that directory, I often see files generated
> by the pg_basebackup command used to build the standby, files like
> "0000000B0000001900000000.00000028.backup" or generally files ending
> with .backup in their names. These never get removed automatically by
> the standby server. We have to manually remove them. So, I'm guessing
> they weren't necessary, so why did the primary copy them here using
> its archive_command? Why aren't they removed by some mechanism on the
> standby?
>
> --
> Jay

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800


Re: Error messages in a hot standby server's logfiles

From
Jerry Sievers
Date:
Sorry, meant 'restore_command' not archive_command.  See below...

John Scalia <jayknowsunix@gmail.com> writes:

> Hi all,
>
> My setup is: postgresql V9.3.3 running on a CentOS 6.5 (kernel
> 2.6.32-358.18.1.el6.x86_64) and I have 3 servers, one primary and two
> hot standbys. In our failover and loss of communications testing, I
> have seen a couple of issues that I'm hard to explain. For instance,
> we took one hot standby out of service by shutting down postgresql on
> it. Now, we're hot standby with log shipping as an insurance policy,
> so the WAL segments continued to be copied onto that out of service
> standby for a few minutes. On restart, I see:
>
> cp: cannot stat '/mnt/wallogs/archive/0000000C.history': No such file or directory
> cp: cannot stat '/mnt/wallogs/archive/0000000B0000001900000077': No such file or directory
>
> Later in the logfile, I see another failure for 000000B.history.

These .history files may or may not exist depending on whether timeline
branching has been done.

The only way Pg knows how to check for them is by invoking archive_command,
which in your case is cp, hence the message.

HTH
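
If the cp noise is a problem for whoever reads the logs, one option is to point restore_command at a small
wrapper that checks for the file before copying. A minimal sketch in Python follows, assuming the archive
lives in /mnt/wallogs/archive as in the log lines above and that a Python 3 interpreter is available on the
standby; the script name restore_wal.py and the functions restore_file and main are hypothetical. What the
server looks at is the exit status: zero only when the file was really copied, non-zero when the file is
not in the archive, which is a normal answer and not an error.

```python
import os
import shutil
import sys

ARCHIVE_DIR = "/mnt/wallogs/archive"  # assumption: path taken from the log lines


def restore_file(fname, dest, archive_dir=ARCHIVE_DIR):
    """Copy fname from the archive to dest and return a process exit status."""
    src = os.path.join(archive_dir, fname)
    if not os.path.isfile(src):
        # Not in the archive (yet). Say so through the exit status only,
        # instead of letting cp print "cannot stat" to the server log.
        return 1
    shutil.copyfile(src, dest)
    return 0


def main(argv):
    """Entry point: argv is [script, file name (%f), destination path (%p)]."""
    if len(argv) != 3:
        return 2
    return restore_file(argv[1], argv[2])


# Hypothetical recovery.conf entry on the standby:
#   restore_command = 'python3 /usr/local/bin/restore_wal.py %f %p'
# Only act when arguments were passed, so importing the file does nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

This changes nothing about recovery itself: the standby still asks for files such as 0000000C.history, it
just gets a quiet non-zero status back when they are absent.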

>
> In looking at the /mnt/wallogs/archive directory, those files aren't
> there, but as the primary never had an issue and continued to copy WAL
> segments to this directory, why was the standby looking for them? What
> triggered this? Also, in that directory, I often see files generated
> by the pg_basebackup command used to build the standby, files like
> "0000000B0000001900000000.00000028.backup" or generally files ending
> with .backup in their names. These never get removed automatically by
> the standby server. We have to manually remove them. So, I'm guessing
> they weren't necessary, so why did the primary copy them here using
> its archive_command? Why aren't they removed by some mechanism on the
> standby?
>
> --
> Jay

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800


Re: Error messages in a hot standby server's logfiles

From
jayknowsunix@gmail.com
Date:
Thanks for confirming what I thought might be happening. Now to try to convince the QA folks that this is
what's going on.

> On Oct 7, 2014, at 1:56 PM, Jerry Sievers <gsievers19@comcast.net> wrote:
>
> Sorry, meant 'restore_command' not archive_command.  See below...
>
> John Scalia <jayknowsunix@gmail.com> writes:
>
>> Hi all,
>>
>> My setup is: postgresql V9.3.3 running on a CentOS 6.5 (kernel
>> 2.6.32-358.18.1.el6.x86_64) and I have 3 servers, one primary and two
>> hot standbys. In our failover and loss of communications testing, I
>> have seen a couple of issues that I'm hard to explain. For instance,
>> we took one hot standby out of service by shutting down postgresql on
>> it. Now, we're hot standby with log shipping as an insurance policy,
>> so the WAL segments continued to be copied onto that out of service
>> standby for a few minutes. On restart, I see:
>>
>> cp: cannot stat '/mnt/wallogs/archive/0000000C.history': No such file or directory
>> cp: cannot stat '/mnt/wallogs/archive/0000000B0000001900000077': No such file or directory
>>
>> Later in the logfile, I see another failure for 000000B.history.
>
> These .history files may or may not exist depending on whether timeline
> branching has been done.
>
> The only way Pg knows how to check for them is invoking archive_command
> which is your case is cp and thus the message.
>
> HTH
>
>>
>> In looking at the /mnt/wallogs/archive directory, those files aren't
>> there, but as the primary never had an issue and continued to copy WAL
>> segments to this directory, why was the standby looking for them? What
>> triggered this? Also, in that directory, I often see files generated
>> by the pg_basebackup command used to build the standby, files like
>> "0000000B0000001900000000.00000028.backup" or generally files ending
>> with .backup in their names. These never get removed automatically by
>> the standby server. We have to manually remove them. So, I'm guessing
>> they weren't necessary, so why did the primary copy them here using
>> its archive_command? Why aren't they removed by some mechanism on the
>> standby?
>>
>> --
>> Jay
>
> --
> Jerry Sievers
> Postgres DBA/Development Consulting
> e: postgres.consulting@comcast.net
> p: 312.241.7800