Discussion: Database corruption in RH 6.2/prepackaged PG

Database corruption in RH 6.2/prepackaged PG

From:
pgsql-bugs@postgresql.org
Date:
Saku Airila (saku@bitblit.fi) reports a bug with a severity of 1
The lower the number the more severe it is.

Short Description
Database corruption in RH 6.2/prepackaged PG

Long Description
Random database corruption. The system I'm running has ~10 databases
online on a single server without other load. Sometimes one of the
databases corrupts itself beyond repair. I have vacuumdb and pg_dump
running nightly for cleaning the db and making a backup.
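
For reference, a minimal sketch of such a nightly maintenance job (the
database names and the backup directory are placeholders, not the actual
ones on this system) could look like:

    #!/bin/sh
    # Nightly maintenance sketch: vacuum each database, then dump it to a file.
    # Database names and the backup directory are illustrative placeholders.
    for db in db1 db2 db3
    do
        vacuumdb "$db"
        pg_dump "$db" > /var/backups/pgsql/"$db".dump
    done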

Postgres version:
PostgreSQL 6.5.3 on i686-pc-linux-gnu, compiled by gcc egcs-2.91.66

System:
Red Hat Linux release 6.2 (Zoot)
Kernel 2.2.14-5.0 on an i686
650 MHz AMD Duron, MSI K7T Pro mainboard, 64 MB RAM + 100 MB swap,
ASUS 53C896 U2 SCSI, IBM DDYS-T09170N 9 GB disk,
D-Link DFE500TX (DEC tulip) ethernet.

Problem description:
The nightly cron jobs return me the following message:
----
dumpClasses(): command failed.  Explanation from backend: 'pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
'.
----
I get the following message in the syslog when trying to dump or
vacuum the db manually:
----
Jan 13 16:32:38 db kernel: Unable to handle kernel paging request at virtual address 0003000b
Jan 13 16:32:38 db kernel: current->tss.cr3 = 00e99000, %cr3 = 00e99000
Jan 13 16:32:38 db kernel: *pde = 00000000
Jan 13 16:32:38 db kernel: Oops: 0000
Jan 13 16:32:38 db kernel: CPU:    0
Jan 13 16:32:38 db kernel: EIP:    0010:[update_vm_cache_conditional+111/284]
Jan 13 16:32:38 db kernel: EFLAGS: 00010206
Jan 13 16:32:38 db kernel: eax: 00000000   ebx: 00030003   ecx: c2250220   edx: 00001050
Jan 13 16:32:38 db kernel: esi: 00000000   edi: 00001000   ebp: 0002e000   esp: c0e9be9c
Jan 13 16:32:38 db kernel: ds: 0018   es: 0018   ss: 0018
Jan 13 16:32:38 db kernel: Process postmaster (pid: 28997, process nr: 33, stack page=c0e9b000)
Jan 13 16:32:38 db kernel: Stack: 0002e000 c099a000 c3f30000 0c225022 c013bf19 c2250220 0002e000 c099a000
Jan 13 16:32:38 db kernel:        00001000 40251c40 c33570e0 ffffffea c225026c 00002000 c1a8d7e0 c1a8d7e0
Jan 13 16:32:38 db kernel:        c1a8d7e0 0002e000 00000000 c0e9bf08 00000000 00000000 c33efa00 00000000
Jan 13 16:32:38 db kernel: Call Trace: [ext2_file_write+1042/1559] [refile_buffer+82/178] [__brelse+19/82] [ext2_update_inode+825/840] [sys_recv+30/35] [sys_write+214/248] [ext2_file_write+0/1559]
Jan 13 16:32:38 db kernel:        [system_call+52/56]
Jan 13 16:32:38 db kernel: Code: 39 4b 08 75 f0 39 6b 0c 75 eb ff 43 14 b8 02 00 00 00 0f ab
----

I don't know if this is really a PostgreSQL problem or a Linux problem, but I'm quite sure the hardware itself is ok.

If more information is needed, I'm happy to send the database dump,
although since this is a confidential production system I need to
make sure the dump will not be disclosed to any third parties.

Thanks,

Saku Airila, saku@bitblit.fi
Systems Engineer, Bitblit Oy, Helsinki, Finland

Sample Code


No file was uploaded with this report

Re: Database corruption in RH 6.2/prepackaged PG

From:
Tom Lane
Date:
pgsql-bugs@postgresql.org writes:
> Problem description:
> The nightly cron jobs return me the following message:
> ----
> dumpClasses(): command failed.  Explanation from backend: 'pqReadData() -- backend closed the channel unexpectedly.
>         This probably means the backend terminated abnormally
>         before or while processing the request.

Can't tell much from this.  What would be useful is to look at the
postmaster log and a stack backtrace from the crashed backend.

The default startup script for your RH probably sends the postmaster
log file to the bit-bucket, so you'll have to change it.  Make sure
the postmaster is invoked without the -S switch, and redirect its
stdout and stderr to some handy log file, e.g.

    postmaster -i -D wherever >/full/path/to/logfile 2>&1 &

(The extra & at the end is needed if you don't use -S.)

If you don't see a core file in $PGDATA/base/yourdb/core, then you
probably also need to add "ulimit -c unlimited" to the postmaster
start script, to allow dumping core from the postmaster and its
child processes.
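
A rough sketch of those start-script additions, plus how to pull a
backtrace out of the resulting core file with gdb (the binary and
data-directory paths are assumptions based on the stock Red Hat RPM
layout, adjust as needed):

    # In the postmaster start script, before launching it:
    ulimit -c unlimited
    postmaster -i -D /var/lib/pgsql/data >/var/lib/pgsql/pgsql.log 2>&1 &

    # After the next crash, inspect the core file (paths are examples):
    gdb /usr/bin/postmaster /var/lib/pgsql/data/base/yourdb/core
    (gdb) bt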

Let us know when you have more detail ...

            regards, tom lane

PS: BTW, it would probably save time all around if you first update
to Postgres 7.0.3 and then see if the bug is still there.