Problem with pg_compresslog'd archives

Поиск
Список
Период
Сортировка
От Karl Denninger
Тема Problem with pg_compresslog'd archives
Дата
Msg-id 4B6F1A8F.6000104@denninger.net
обсуждение исходный текст
Ответы Re: Problem with pg_compresslog'd archives  (Oleg Bartunov <oleg@sai.msu.su>)
Re: Problem with pg_compresslog'd archives  (Koichi Suzuki <koichi.szk@gmail.com>)
Список pgsql-general
This may belong in a bug report, but I'll post it here first...

There appears to be a **SERIOUS** problem with using pg_compresslog and
pg_uncompresslog with Postgresql 8.4.2.

Here's my configuration snippet:

full_page_writes = on                   # recover from partial page writes
wal_buffers = 256kB                     # min 32kB
                                        # (change requires restart)
#wal_writer_delay = 200ms               # 1-10000 milliseconds

#commit_delay = 0                       # range 0-100000, in microseconds
#commit_siblings = 5                    # range 1-1000

# - Checkpoints -

checkpoint_segments = 64                # in logfile segments, min 1,
16MB each
#checkpoint_timeout = 5min              # range 30s-1h
checkpoint_completion_target = 0.9      # checkpoint target duration,
0.0 - 1.0
#checkpoint_warning = 30s               # 0 disables

archive_command = 'test ! -f /dbms/pg_archive/%f.bz2 && pg_compresslog
%p | bzip2 - >/dbms/pg_archive/%f.bz2'           #command to use to
archive a logfile segment

All appears to be fine with the writes, and they are being saved off on
the nightly backups without incident.

I take a full dump using the instructions in the documentation and make
sure I copy the proper "must have" file for consistency to be reached.

The problem comes when I try to restore.

recovery_conf contains:

restore_command = '/usr/local/pgsql/recovery.sh %f %p'

And that file contains:


#! /bin/sh

infile=$1
outfile=$2

if test -f /dbms/pg_archive/$infile.bz2
then
        bunzip2 -c /dbms/pg_archive/$infile.bz2 |
/usr/local/pgsql/bin/pg_decompresslog - $outfile
        exit 0
else
        exit 1
fi

==============

The problem is that it appears that some of the segments being saved are
no good!  On occasion I get this when trying to restore...

Feb  7 12:43:51 dbms2 postgres[2001]: [210-1] LOG:  restored log file
"00000001000001710000009A" from archive
Feb  7 12:43:52 dbms2 postgres[2001]: [211-1] LOG:  restored log file
"00000001000001710000009B" from archive
Feb  7 12:43:52 dbms2 postgres[2001]: [212-1] LOG:  restored log file
"00000001000001710000009C" from archive
Feb  7 12:43:52 dbms2 postgres[2001]: [213-1] LOG:  restored log file
"00000001000001710000009D" from archive
Feb  7 12:43:53 dbms2 postgres[2001]: [214-1] LOG:  restored log file
"00000001000001710000009E" from archive
Feb  7 12:43:53 dbms2 postgres[2001]: [215-1] LOG:  restored log file
"00000001000001710000009F" from archive
Feb  7 12:43:54 dbms2 postgres[2001]: [216-1] LOG:  restored log file
"0000000100000171000000A0" from archive
Feb  7 12:43:54 dbms2 postgres[2001]: [217-1] LOG:  restored log file
"0000000100000171000000A1" from archive
Feb  7 12:43:55 dbms2 postgres[2001]: [218-1] LOG:  restored log file
"0000000100000171000000A2" from archive
Feb  7 12:43:55 dbms2 postgres[2001]: [219-1] LOG:  restored log file
"0000000100000171000000A3" from archive
Feb  7 12:43:56 dbms2 postgres[2001]: [220-1] LOG:  restored log file
"0000000100000171000000A4" from archive
Feb  7 12:43:56 dbms2 postgres[2001]: [221-1] LOG:  restored log file
"0000000100000171000000A5" from archive
Feb  7 12:43:57 dbms2 postgres[2001]: [222-1] LOG:  restored log file
"0000000100000171000000A6" from archive
Feb  7 12:43:57 dbms2 postgres[2001]: [223-1] PANIC:  corrupted page
pointers: lower = 772, upper = 616, special = 0
Feb  7 12:43:57 dbms2 postgres[2001]: [223-2] CONTEXT:  xlog redo
hot_update: rel 1663/616245/1193269; tid 53/93; new 53/4
Feb  7 12:43:57 dbms2 postgres[2000]: [1-1] LOG:  startup process (PID
2001) was terminated by signal 6: Abort trap
Feb  7 12:43:57 dbms2 postgres[2000]: [2-1] LOG:  terminating any other
active server processes

Eek.

I assume this means that either A6 or A7 is corrupt.  But I have the
file both in the restore AND ON THE MACHINE WHERE IT ORIGINATED:

On the SOURCE machine (which is running just fine):
tickerforum# cksum *171*A[67]*
172998591 830621 0000000100000171000000A6.bz2
1283345296 1541006 0000000100000171000000A7.bz2

And off the BACKUP archive, which is what I'm trying to restore:

# cksum *171*A[67]*
172998591 830621 0000000100000171000000A6.bz2
1283345296 1541006 0000000100000171000000A7.bz2

Identical, says the checksums.

This is VERY BAD - if pg_compresslog is damaging the files in some
instances then ANY BACKUP TAKEN USING THEM IS SUSPECT AND MAY NOT
RESTORE!!!!!!

Needless to say this is a MAJOR problem.

-- Karl Denninger


Вложения

В списке pgsql-general по дате отправления:

Предыдущее
От: zhong ming wu
Дата:
Сообщение: weird bug in rebuilding RPMs
Следующее
От: Oleg Bartunov
Дата:
Сообщение: Re: Problem with pg_compresslog'd archives