Thread: postgresql says archive is failing but its not

postgresql says archive is failing but its not

From
"Mathis, Jason"
Date:
Hi All,

I am getting some weird "archiving failed" messages in the logs, but nothing is failing. I *think* it's just the script exiting with a nonzero code.

Sounds OK so far, but here is the weird part: archiving is NOT failing, it's working. There are no *.ready files in the archive_status directory. In fact, the WAL file named below is not on the server. Yet postgres complains about this file every time the archiver runs. I have over a thousand of these messages from the past 24 hours, all about the same file, with no *.ready files and everything archiving. We have other servers doing the same thing. So I suspect the script that runs the archive is to blame, although we tried redirecting its output for debugging and haven't found anything yet. As if this were not strange enough, the error messages appear in hour-long blocks. For example, from 2:00pm to 2:59pm there are errors on what looks like every run, but then the errors stop and there is nothing until another random hour, like 11:00pm.

Does anyone have a clue as to what is going on? 

Thanks! 

Related info:

2015-01-09 02:00:50.478 CST,,,31084,,54a2e24d.796c,49091,,2014-12-30 11:35:09 CST,,0,LOG,00000,"archive command failed with exit code 1","The failed archive command was: /path/to/archive/script pg_xlog/0000000100002ED0000000CB 0000000100002ED0000000CB 9.1/clustername",,,,,,,,""
2015-01-09 02:00:50.479 CST,,,31084,,54a2e24d.796c,49092,,2014-12-30 11:35:09 CST,,0,WARNING,01000,"transaction log file ""0000000100002ED0000000CB"" could not be archived: too many failures",,,,,,,,,""
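
To check whether the failures really cluster into hour-long blocks, the csvlog lines can be bucketed by hour. The following is only a minimal sketch in Python: the function name is hypothetical, and the column positions assume the 23-column csvlog layout seen in the two sample lines above (log time first, message in the fourteenth column).

```python
import csv
from collections import Counter

LOG_TIME_COL = 0   # e.g. "2015-01-09 02:00:50.478 CST"
MESSAGE_COL = 13   # assumed position of the message column in this csvlog layout


def archive_failures_per_hour(lines):
    """Count 'archive command failed' log entries per 'YYYY-MM-DD HH' bucket."""
    counts = Counter()
    for row in csv.reader(lines):
        if len(row) <= MESSAGE_COL:
            continue  # skip rows that do not match the assumed layout
        if row[MESSAGE_COL].startswith("archive command failed"):
            hour = row[LOG_TIME_COL][:13]  # keep date plus two-digit hour
            counts[hour] += 1
    return counts
```

Feeding it an open csvlog file (for example `archive_failures_per_hour(open(path, newline=""))`, where `path` is wherever the server writes its CSV logs) gives one count per hour, which makes gaps between error blocks easy to spot.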

postgres=# select version();
                                            version
-----------------------------------------------------------------------------------------------
 PostgreSQL 9.1.13 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit
(1 row)

Re: postgresql says archive is failing but its not

From
Kevin Grittner
Date:
"Mathis, Jason" <jmathis@enova.com> wrote:

> I am getting some weird archiving failed messages in the logs but
> nothing is failing. I *think* its just exiting with a non zero
> code from the script.

The archive command (script or not) must exit with 0 if successful.
A nonzero exit code indicates failure.
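
As an illustration of that contract, here is a minimal sketch in Python of an archive step that reports success only when the segment has actually been copied. It is not the script discussed in this thread (which was never shown); the function name and the temporary-file suffix are assumptions made for the example.

```python
import os
import shutil


def archive_wal(src_path, dest_dir):
    """Copy one WAL segment into dest_dir and return a process exit code.

    Returns 0 only when the copy completed; any other outcome returns 1.
    """
    dest_path = os.path.join(dest_dir, os.path.basename(src_path))
    if os.path.exists(dest_path):
        # Refuse to overwrite an existing archived file; report failure.
        return 1
    tmp_path = dest_path + ".tmp"  # hypothetical temporary name
    try:
        shutil.copyfile(src_path, tmp_path)
        os.rename(tmp_path, dest_path)  # expose the file only once fully copied
    except OSError:
        return 1
    return 0
```

A wrapper script would pass the return value straight to `sys.exit()`, so that PostgreSQL sees 0 on success and a nonzero code on any failure.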

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: postgresql says archive is failing but its not

From
"Mathis, Jason"
Date:
But if it were failing, I would have a stack of *.ready files, right? In fact I should have a "0000000100002ED0000000CB.ready" file, but I don't. It's weird, man, so weird.

Re: postgresql says archive is failing but its not

From
Kevin Grittner
Date:
"Mathis, Jason" <jmathis@enova.com> wrote:
> On Fri, Jan 9, 2015 at 11:58 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
>> "Mathis, Jason" <jmathis@enova.com> wrote:
>>
>>> I am getting some weird archiving failed messages in the logs
>>> but nothing is failing. I *think* its just exiting with a non
>>> zero code from the script.
>>
>> The archive command (script or not) must exit with 0 if
>> successful.  A nonzero exit code indicates failure.
>
> But if it was failing then I would be having a stack of *.ready
> files right? In fact I should have a
> "0000000100002ED0000000CB.ready" file but I don't. It weird man,
> so weird.

Well, you didn't show this script.  Is it perhaps directly messing
with the *.ready files instead of leaving them to PostgreSQL to
manage?  If it's doing one thing wrong (i.e., using a nonzero exit
code after copying to the archive directory) perhaps it's doing
something else wrong (like removing the copied WAL file or messing
with the *.ready files).  If you mess with internals that you
shouldn't be touching, you can expect weird.
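
For reference, PostgreSQL tracks archiving with marker files in the archive_status directory: `<segment>.ready` while a segment is waiting to be archived and `<segment>.done` once it has been. A read-only check along these lines can confirm what state a given segment is in without touching those files. This is a sketch in Python with hypothetical function names; the directory path is whatever archive_status directory the cluster uses.

```python
import os


def archive_status(status_dir, segment):
    """Return 'ready', 'done', or None for one WAL segment name."""
    for suffix in ("ready", "done"):
        if os.path.exists(os.path.join(status_dir, f"{segment}.{suffix}")):
            return suffix
    return None


def pending_segments(status_dir):
    """List segment names that still have a .ready marker, sorted by name."""
    return sorted(
        name[: -len(".ready")]
        for name in os.listdir(status_dir)
        if name.endswith(".ready")
    )
```

For the segment in the log lines above, `archive_status(status_dir, "0000000100002ED0000000CB")` returning `None` would match the report that neither marker exists for it.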

Perhaps the script was initially only getting one of these things
wrong and it was causing failures, so the author went further down
the rabbit hole in an attempt to get it sorta working?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: postgresql says archive is failing but its not

From
"Mathis, Jason"
Date:
It is a script, although I am not the author, just a lonely DBA with weird problems :)

But it does not appear to be messing with any WAL files in the xlog or xlog/archive_status dirs. I guess I am just confused about why it's complaining about a file that is not there, yet everything seems to be working as it should.

On Fri, Jan 9, 2015 at 12:50 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Well, you didn't show this script.  Is it perhaps directly messing
> with the *.ready files instead of leaving them to PostgreSQL to
> manage?

Re: postgresql says archive is failing but its not

From
"Mathis, Jason"
Date:
And it also doesn't explain why I go for hours without an error. Then suddenly, for an hour "block", I get the error, and it's the same file from hours before. With hours in between with no errors!



Re: postgresql says archive is failing but its not

From
David G Johnston
Date:
Mathis, Jason wrote
> On Fri, Jan 9, 2015 at 12:57 PM, Mathis, Jason <jmathis@...> wrote:
>> It is a script although I am not the author just a lonely dba with weird
>> problems:)
>
> And it also doesn't explain why for hours I don't get an error. Then
> suddenly for an hour "block" I get the error and it's the same file from
> hours before. With hours in between with no errors!

You either need to share the script or ask the script author for help.
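
Short of sharing the script, one way to collect evidence is to wrap it so that every run records its exit code and anything it writes to stderr. The sketch below is in Python and is only an illustration: the function name and log format are hypothetical, and the command passed in would be whatever archive_command currently runs.

```python
import subprocess
import time


def run_and_log(cmd, log_path):
    """Run cmd, append its exit code and stderr to log_path, return the code."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    with open(log_path, "a") as log:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        log.write(f"{stamp} exit={result.returncode} cmd={cmd!r}\n")
        if result.stderr:
            log.write(result.stderr.rstrip("\n") + "\n")
    # Propagate the original exit code unchanged so the caller sees it.
    return result.returncode
```

Because the wrapper returns the script's own exit code, it does not change what PostgreSQL sees; it only leaves a trail showing which invocations exited nonzero and why.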

David J.





--
View this message in context:
http://postgresql.nabble.com/postgresql-says-archive-is-failing-but-its-not-tp5833429p5833445.html
Sent from the PostgreSQL - admin mailing list archive at Nabble.com.


Re: postgresql says archive is failing but its not

From
"Mathis, Jason"
Date:
We don't need to get into the details of the script. I suppose I was hoping someone would have known of an issue with this version or distro. Maybe it is the script, or maybe someone else did mess with the WAL/status files and I am just seeing the aftermath of that. Sooner or later the server will be restarted, and that will resolve this issue.

Anyway, thanks everyone for your time. But please let me know if something comes up at a later date.

-jason



On Fri, Jan 9, 2015 at 1:16 PM, David G Johnston <david.g.johnston@gmail.com> wrote:
> You either need to share the script or ask the script author for help.

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin