Discussion: WAL Archiving problem

WAL Archiving problem

From
Norberto Dellê
Date:
Greetings to everyone

I have a PostgreSQL 8.2.4 installation running under Windows XP with WAL
archiving activated.
But at some point Postgres began to ask to archive a WAL segment that
isn't in the pg_xlog directory. I thought that a segment that isn't
successfully archived should remain in the pg_xlog directory, or am I wrong?

Any ideas about how this can happen?

Thanks in advance, Norberto

Re: WAL Archiving problem

From
Tom Lane
Date:
Norberto Dellê <betodelle@gmail.com> writes:
> I have a PostgreSQL 8.2.4 installation running under Windows XP with WAL
> archiving activated.
> But at some point Postgres began to ask to archive a WAL segment that
> isn't in the pg_xlog directory. I thought that a segment that isn't
> successfully archived should remain in the pg_xlog directory, or am I wrong?

Do you have the postmaster log from around the time that this started
happening?  I'm wondering about a file rename() failing, or some such.

What files do you have, exactly, in pg_xlog and pg_xlog/archive_status?
It'd be useful to see their modification timestamps as well as their
names.
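
A listing like the one requested here could be produced with a short script. This is only a minimal sketch in Python: the function name `list_with_mtime` is made up for illustration, and the data directory path has to be supplied by whoever runs it.

```python
import os
import time

def list_with_mtime(directory):
    """Return (file name, modification time) pairs for the regular files
    in `directory`, sorted by name. Subdirectories are skipped."""
    rows = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            mtime = time.localtime(os.path.getmtime(path))
            rows.append((name, time.strftime("%Y-%m-%d %H:%M:%S", mtime)))
    return rows

# Example use (the path is a placeholder, not taken from this thread):
#   for sub in ("pg_xlog", os.path.join("pg_xlog", "archive_status")):
#       for name, stamp in list_with_mtime(os.path.join(data_dir, sub)):
#           print(sub, name, stamp)
```

Running it once for pg_xlog and once for pg_xlog/archive_status gives both names and timestamps in a form that can be pasted into a reply.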

            regards, tom lane

Re: WAL Archiving problem

From
Norberto Delle
Date:
Tom Lane writes:
> Norberto Delle <betodelle@gmail.com> writes:
>> I have a PostgreSQL 8.2.4 installation running under Windows XP with WAL
>> archiving activated.
>> But at some point Postgres began to ask to archive a WAL segment that
>> isn't in the pg_xlog directory. I thought that a segment that isn't
>> successfully archived should remain in the pg_xlog directory, or am I wrong?
>
> Do you have the postmaster log from around the time that this started
> happening?  I'm wondering about a file rename() failing, or some such.
>
> What files do you have, exactly, in pg_xlog and pg_xlog/archive_status?
> It'd be useful to see their modification timestamps as well as their
> names.
Hi all

Thank you, Tom, for the quick answer.

Here is the part of the postmaster log where something wrong happened:

-- This sequence of WAL files was generated by a restore (COPY FROM stdin)

2007-08-20 09:09:40 LOG:  archived transaction log file
"0000000100000002000000DC"
2007-08-20 09:10:27 LOG:  archived transaction log file
"0000000100000002000000DD"
2007-08-20 09:11:07 LOG:  archived transaction log file
"0000000100000002000000DE"
2007-08-20 09:11:33 LOG:  archived transaction log file
"0000000100000002000000DF"
2007-08-20 09:11:38 LOG:  archived transaction log file
"0000000100000002000000E0"
2007-08-20 09:11:42 LOG:  archived transaction log file
"0000000100000002000000E1"
2007-08-20 09:11:46 LOG:  archived transaction log file
"0000000100000002000000E2"
2007-08-20 09:11:50 LOG:  archived transaction log file
"0000000100000002000000E3"
2007-08-20 09:11:53 LOG:  archived transaction log file
"0000000100000002000000E4"
2007-08-20 09:11:57 LOG:  archived transaction log file
"0000000100000002000000E5"
2007-08-20 09:12:01 LOG:  archived transaction log file
"0000000100000002000000E6"
2007-08-20 09:12:09 LOG:  archived transaction log file
"0000000100000002000000E7"
2007-08-20 09:12:20 LOG:  archived transaction log file
"0000000100000002000000E8"
2007-08-20 09:12:21 LOG:  could not receive data from client: Unknown
winsock error 10061
2007-08-20 09:12:21 LOG:  could not receive data from client: Unknown
winsock error 10061
2007-08-20 09:12:21 LOG:  unexpected EOF on client connection
2007-08-20 09:12:21 LOG:  unexpected EOF on client connection
2007-08-20 09:12:21 LOG:  could not receive data from client: Unknown
winsock error 10061
2007-08-20 09:12:21 LOG:  unexpected EOF on client connection

-- Note that here the WAL file '0000000100000002000000E9' was reported as archived
-- (Postgres thinks it was, but it is not present in the backup directory)

2007-08-20 09:12:33 LOG:  archived transaction log file
"0000000100000002000000E9"
2007-08-20 09:12:46 LOG:  archived transaction log file
"0000000100000002000000EA"
2007-08-20 09:12:57 LOG:  archived transaction log file
"0000000100000002000000EB"

-- And here Postgres is asking to archive '0000000100000002000000E9' again

2007-08-20 09:22:29 LOG:  archive command "C:\Imob\IMOBBackup\bbp.exe
-wal="pg_xlog\0000000100000002000000E9"" failed: return code 13
2007-08-20 09:22:31 LOG:  archive command "C:\Imob\IMOBBackup\bbp.exe
-wal="pg_xlog\0000000100000002000000E9"" failed: return code 13
2007-08-20 09:22:32 LOG:  archive command "C:\Imob\IMOBBackup\bbp.exe
-wal="pg_xlog\0000000100000002000000E9"" failed: return code 13
2007-08-20 09:22:32 WARNING:  transaction log file
"0000000100000002000000E9" could not be archived: too many failures


Looking in the bbp.exe log, I realized that the archive command fails because
pg_xlog\0000000100000002000000E9 is not found,
and looking in the pg_xlog\archive_status directory there is a file
named '0000000100000002000000E9.XXXXX.ready'.
More information will be difficult to obtain because I don't have direct
access to the server.
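
The mismatch described above (a `.ready` status file with no matching file in pg_xlog) can be checked mechanically. The following is a minimal sketch, assuming the usual layout in which `archive_status/<name>.ready` refers to `pg_xlog/<name>`. The function name `orphaned_ready_files` is hypothetical.

```python
import os

def orphaned_ready_files(xlog_dir):
    """Return the names that have a .ready status file in
    xlog_dir/archive_status but no corresponding file in xlog_dir."""
    status_dir = os.path.join(xlog_dir, "archive_status")
    suffix = ".ready"
    orphans = []
    for name in sorted(os.listdir(status_dir)):
        if name.endswith(suffix):
            wal_name = name[:-len(suffix)]
            # The archiver will be asked to archive xlog_dir/wal_name,
            # so that file has to exist for archive_command to succeed.
            if not os.path.exists(os.path.join(xlog_dir, wal_name)):
                orphans.append(wal_name)
    return orphans
```

Any name it returns is one that the archiver would keep retrying without ever finding the source file.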

I hope this information helps



Re: WAL Archiving problem

From
Tom Lane
Date:
Norberto Delle <betodelle@gmail.com> writes:
> 2007-08-20 09:12:09 LOG:  archived transaction log file
> "0000000100000002000000E7"
> 2007-08-20 09:12:20 LOG:  archived transaction log file
> "0000000100000002000000E8"
> 2007-08-20 09:12:21 LOG:  could not receive data from client: Unknown
> winsock error 10061
> 2007-08-20 09:12:21 LOG:  could not receive data from client: Unknown
> winsock error 10061
> 2007-08-20 09:12:21 LOG:  unexpected EOF on client connection
> 2007-08-20 09:12:21 LOG:  unexpected EOF on client connection
> 2007-08-20 09:12:21 LOG:  could not receive data from client: Unknown
> winsock error 10061
> 2007-08-20 09:12:21 LOG:  unexpected EOF on client connection

> -- Note that here the WAL file '0000000100000002000000E9' was archived
> (Postgres thinks it was,
> -- because it's not present in the backup directory)

> 2007-08-20 09:12:33 LOG:  archived transaction log file
> "0000000100000002000000E9"
> 2007-08-20 09:12:46 LOG:  archived transaction log file
> "0000000100000002000000EA"

Hmm.  The broken client connections should in theory be unrelated to
anything happening with WAL files, but it does seem mighty suspicious
that they happened in the same time period that that was the active
WAL file.  Do you see a lot of those "error 10061" entries elsewhere
in your logs, or was this an unusual occurrence?  Also, what exactly
is your archiving script doing --- does it send the file over a network
connection?  If the messages we can see above indicate a transient
network problem, as seems likely, that might possibly have affected
the archiving process as well.  Are you sure your archiving script
would have noticed a network-related failure?
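
To illustrate what "noticing a failure" would mean for an archiving script, here is a minimal sketch in Python. It is not a reconstruction of bbp.exe, whose internals are not shown in this thread. The function names and the copy-then-verify approach are assumptions chosen for illustration. The point is that the script returns zero only after it has confirmed that a complete copy exists at the destination.

```python
import hashlib
import os
import shutil
import sys

def sha256_of(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def archive_segment(src, dest_dir):
    """Copy `src` into `dest_dir`. Return 0 only if a verified copy
    now exists there; return 1 on any failure."""
    dest = os.path.join(dest_dir, os.path.basename(src))
    tmp = dest + ".tmp"
    try:
        if os.path.exists(dest):
            return 1  # never overwrite an already archived file
        shutil.copyfile(src, tmp)
        if sha256_of(src) != sha256_of(tmp):
            os.remove(tmp)
            return 1  # copy is incomplete or corrupted
        os.rename(tmp, dest)  # publish under the final name last
        return 0
    except OSError as exc:
        # Covers a missing source file, an unreachable destination
        # (for example a dropped network share), and similar errors.
        print("archive failed: %s" % exc, file=sys.stderr)
        return 1
```

A wrapper built around this would pass the return value to `sys.exit()`, so that the server sees a nonzero exit status whenever the copy could not be verified and keeps the segment for a later retry.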

> -- And here Postgres is asking to archive '0000000100000002000000E9' again

> 2007-08-20 09:22:29 LOG:  archive command "C:\Imob\IMOBBackup\bbp.exe
> -wal="pg_xlog\0000000100000002000000E9"" failed: return code 13

Ten minutes later --- that's a heck of a long time when you're finishing
a WAL file every ten or fifteen seconds.  Please check exactly what
timestamp is on the .ready file.

            regards, tom lane