Discussion: BUG #5176: database recovery produces infinite loop
The following bug has been logged online:
Bug reference: 5176
Logged by: Helge Milde
Email address: helge@monsternett.no
PostgreSQL version: 8.1.18
Operating system: Debian 2.6.18-6-686
Description: database recovery produces infinite loop
Details:
Our database recently crashed, and since then, we haven't been able to start
it again.
Here's the postmaster log with debug level 5:
--- log start ---
LOG: could not load root certificate file "root.crt": No SSL error reported
DETAIL: Will not verify client certificates.
DEBUG: invoking IpcMemoryCreate(size=10469376)
DEBUG: max_safe_fds = 985, usable_fds = 1000, already_open = 5
LOG: database system was interrupted while in recovery at 2009-11-10 13:07:47 CET
HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
LOG: checkpoint record is at 8/931B8558
LOG: redo record is at 8/931B3544; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 566148231; next OID: 25210
LOG: next MultiXactId: 18; next MultiXactOffset: 37
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 8/931B3544
--- log end ---
Nothing else of interest follows.
It seems the recovery process is failing; doing a strace on the 'postgres:
startup process' pid, I see this:
-- strace log start ---
Process 29888 attached - interrupt to quit
close(14) = 0
open("pg_clog/021B", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 14
_llseek(14, 245760, [245760], SEEK_SET) = 0
write(14, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
fsync(14) = 0
close(14) = 0
open("pg_clog/021B", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 14
_llseek(14, 245760, [245760], SEEK_SET) = 0
write(14, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
fsync(14) = 0
close(14) = 0
-- strace log end ---
This repeats indefinitely.
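For reference, the file name and offset in the strace output can be mapped back to a range of transaction IDs. The following is only a rough sketch: it assumes a default build (8 kB blocks, 2 commit-status bits per transaction, 32 pages per pg_clog segment file), and the helper names are made up for illustration. It shows which clog page is being rewritten and how that compares with the "next transaction ID" from the log above; it does not by itself explain why the loop occurs.

```python
# Assumptions (default build, not confirmed for this installation):
BLCKSZ = 8192                    # bytes per page
CLOG_XACTS_PER_BYTE = 4          # 2 status bits per transaction
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE
SLRU_PAGES_PER_SEGMENT = 32      # pages per pg_clog segment file


def clog_page_for_xid(xid):
    """Return the clog page number that holds the status of this XID."""
    return xid // CLOG_XACTS_PER_PAGE


def clog_page_for_file_offset(segment_name, offset):
    """Return the clog page number for a byte offset in a segment file.

    segment_name is the hexadecimal file name, e.g. "021B".
    """
    segment_number = int(segment_name, 16)
    return segment_number * SLRU_PAGES_PER_SEGMENT + offset // BLCKSZ


def xid_range_for_page(page):
    """Return (first_xid, last_xid) covered by a clog page."""
    first = page * CLOG_XACTS_PER_PAGE
    return first, first + CLOG_XACTS_PER_PAGE - 1


# Values taken from the strace output and the postmaster log above.
written_page = clog_page_for_file_offset("021B", 245760)
next_xid_page = clog_page_for_xid(566148231)

print("page being written:", written_page, xid_range_for_page(written_page))
print("page of next transaction ID:", next_xid_page)
```

Under these assumptions, the page being zeroed is the one immediately after the page that holds the checkpoint's next transaction ID.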
We do have some database dumps, but they are not very recent, so any help on
fixing this would be much appreciated.
Thanks,
Helge Milde
"Helge Milde" <helge@monsternett.no> writes:
> It seems the recovery process is failing; doing a strace on the 'postgres:
> startup process' pid, I see this:
> -- strace log start ---
> Process 29888 attached - interrupt to quit
> close(14) = 0
> open("pg_clog/021B", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 14
> _llseek(14, 245760, [245760], SEEK_SET) = 0
> write(14, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
> fsync(14) = 0
> close(14) = 0
> open("pg_clog/021B", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 14
> _llseek(14, 245760, [245760], SEEK_SET) = 0
> write(14, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
> fsync(14) = 0
> close(14) = 0
> -- strace log end ---
> This repeats indefinitely.
Hmm, can you attach to that process with gdb and collect a few stack
traces to show where it's looping? This isn't a symptom we've seen
before, AFAIR.
regards, tom lane
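One possible way to collect several stack traces from the looping process is sketched below. It assumes gdb is installed and that the script runs as a user permitted to attach to the process (typically the postgres user or root). The function names are hypothetical, and the PID in the example comment is the one from the strace output above.

```python
import subprocess
import time


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches, prints a backtrace, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]


def collect_backtraces(pid, samples=5, delay=1.0):
    """Attach to the process several times and return the backtrace texts.

    Taking a few samples with a pause in between gives a better picture of
    where a looping process spends its time than a single trace would.
    """
    traces = []
    for _ in range(samples):
        result = subprocess.run(
            gdb_backtrace_command(pid),
            capture_output=True,
            text=True,
        )
        traces.append(result.stdout)
        time.sleep(delay)
    return traces


# Example (not run here):
#   for trace in collect_backtraces(29888):
#       print(trace)
```

The traces are more readable if debugging symbols for the server binary are installed.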
I'm sorry, but the issue has been fixed now; we've upgraded to PostgreSQL 8.3, so I can't reproduce this anymore.
I'm not completely sure how it got fixed (it wasn't me), but I can ask the person who fixed it and get back to you
tomorrow; perhaps that will help narrow things down a little.
On Tue, Nov 10, 2009 at 09:31:06AM -0500, Tom Lane wrote:
>Hmm, can you attach to that process with gdb and collect a few stack
>traces to show where it's looping? This isn't a symptom we've seen
>before, AFAIR.
>
> regards, tom lane
--
Helge Milde, 69701808
www.monsternett.no