Thread: postgres unable to start

postgres unable to start

From
tommaso
Date:
Hi All,

One of our users killed a postgres process with kill -9 "PID".
After that, the DB server is no longer able to start.

Here is the log:

Jul 28 13:06:46 hdmisv03 postgres[7916]: [1790-1] user=,db=,host= DEBUG: 00000: server process (PID 7918) exited with exit code 0
Jul 28 13:06:46 hdmisv03 postgres[7916]: [1790-2] user=,db=,host= LOCATION: LogChildExit, postmaster.c:2707
Jul 28 13:06:46 hdmisv03 postgres[7917]: [1795-1] user=,db=,host= WARNING: 01000: could not open directory "base/34728840": No such file or directory
Jul 28 13:06:46 hdmisv03 postgres[7917]: [1795-2] user=,db=,host= CONTEXT: xlog redo drop db: dir 34728840/1663
Jul 28 13:06:46 hdmisv03 postgres[7917]: [1795-3] user=,db=,host= LOCATION: pgfnames, dirmod.c:323
Jul 28 13:06:46 hdmisv03 postgres[7917]: [1796-1] user=,db=,host= WARNING: 01000: some useless files may be left behind in old database directory "base/34728840"
Jul 28 13:06:46 hdmisv03 postgres[7917]: [1796-2] user=,db=,host= CONTEXT: xlog redo drop db: dir 34728840/1663
Jul 28 13:06:46 hdmisv03 postgres[7917]: [1796-3] user=,db=,host= LOCATION: dbase_redo, dbcommands.c:2058
Jul 28 13:06:46 hdmisv03 postgres[7917]: [1797-1] user=,db=,host= DEBUG: 00000: page 1 of relation base/35531019/35671296 is uninitialized
Jul 28 13:06:46 hdmisv03 postgres[7917]: [1797-2] user=,db=,host= CONTEXT: xlog redo Insert item, node: 1663/35531019/35671296 blkno: 1 offset: 4 nitem: 1 isdata: F isleaf F isdelete F updateBlkno:11
Jul 28 13:06:46 hdmisv03 postgres[7917]: [1797-3] user=,db=,host= LOCATION: log_invalid_page, xlogutils.c:74
Jul 28 13:06:46 hdmisv03 postgres[7917]: [1798-1] user=,db=,host= PANIC: XX001: corrupted page pointers: lower = 59234, upper = 3, special = 59235
Jul 28 13:06:46 hdmisv03 postgres[7917]: [1798-2] user=,db=,host= CONTEXT: xlog redo Insert item, node: 1663/35531019/35671296 blkno: 1 offset: 4 nitem: 1 isdata: F isleaf F isdelete F updateBlkno:11
Jul 28 13:06:46 hdmisv03 postgres[7917]: [1798-3] user=,db=,host= LOCATION: PageAddItem, bufpage.c:144
Jul 28 13:06:46 hdmisv03 postgres[7916]: [1791-1] user=,db=,host= DEBUG: 00000: reaping dead processes
Jul 28 13:06:46 hdmisv03 postgres[7916]: [1791-2] user=,db=,host= LOCATION: reaper, postmaster.c:2238
Jul 28 13:06:46 hdmisv03 postgres[7916]: [1792-1] user=,db=,host= LOG: 00000: startup process (PID 7917) was terminated by signal 6: Aborted
Jul 28 13:06:46 hdmisv03 postgres[7916]: [1792-2] user=,db=,host= LOCATION: LogChildExit, postmaster.c:2727
Jul 28 13:06:46 hdmisv03 postgres[7916]: [1793-1] user=,db=,host= LOG: 00000: aborting startup due to startup process failure


Does anybody have an idea how to fix this error?


TIA
Tommaso

Re: postgres unable to start

From
Craig Ringer
Date:
On 28/07/2011 7:51 PM, tommaso wrote:
Hi All,

one of our users killed a postgres process with kill -9 "PID"
after that the DB server is not longer able to start.


There is a lot of information missing from this question, though you did include the log entries (thanks). Please see this for the rest of the info you need:

http://wiki.postgresql.org/wiki/Guide_to_reporting_problems

Why did the user "kill -9" the DB, anyway? Was it perhaps not responding for an extended period, and maybe in the "D" state in ps? Are there / were there any messages in "dmesg" or in the kernel log files? Perhaps related to I/O errors or file system errors? Was there a recent fsck on the file system, a recent reboot of the server, or a recent power loss/interruption?
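The "D" state question above can also be checked without ps by reading /proc directly on Linux. The following is only a rough sketch: the function names parse_stat and uninterruptible_processes are made up for illustration, and it assumes the usual Linux /proc/<pid>/stat layout of "pid (comm) state ...".

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the text of a /proc/<pid>/stat file.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so we split on the first '(' and the LAST ')'.
    """
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen])
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for processes currently in state 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if state == "D":
            found.append((pid, comm))
    return found
```

A process that stays in this list for a long time is usually waiting on I/O, which is why it pairs naturally with a look at dmesg.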

--
Craig Ringer

POST Newspapers 276 Onslow Rd, Shenton Park Ph: 08 9381 3088 Fax: 08 9388 2258 ABN: 50 008 917 717 http://www.postnewspapers.com.au/

Re: postgres unable to start

From
tommaso
Date:
Hello,

some info:

Postgres 8.4
Ubuntu 10.04.1
Linux hdmisv03 2.6.32-24-server #39-Ubuntu SMP Wed Jul 28 06:21:40 UTC 2010 x86_64 GNU/Linux

and we installed Postgres through apt.

The user wanted just to kill a connection from another user on a database. He did not kill the whole server process but only one connection on a single database.
The filesystem and the postgres server were OK. There was no reboot or power interruption; however, fsck has not been run recently.
The cluster includes about 450 databases. Perhaps too many?

If we identify the corrupted database (I think the one with OID 35531019), is there a way to remove it manually from the file system?
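For reference, the OID guess above can be read straight from the log: the "node:" field in the redo CONTEXT lines has the form tablespace/database/relfilenode, and the DEBUG line shows the same relation as base/35531019/35671296. Below is a minimal sketch of extracting it, assuming 1663 is the default tablespace (pg_default) so that files live under base/<database>/<relfilenode>. The function name relation_from_context is hypothetical. This only identifies the relation; it does not say whether removing anything is safe.

```python
import re

# Matches the "node: <tablespace>/<database>/<relfilenode>" field that
# appears in the CONTEXT line of an xlog redo message.
NODE_RE = re.compile(r"node:\s*(\d+)/(\d+)/(\d+)")

# Assumption: 1663 is the OID of the default tablespace (pg_default).
DEFAULT_TABLESPACE_OID = 1663


def relation_from_context(line):
    """Return the OIDs named in a redo CONTEXT line, or None if absent."""
    m = NODE_RE.search(line)
    if m is None:
        return None
    spc, db, rel = (int(g) for g in m.groups())
    # Only build a path for the default tablespace; other tablespaces
    # live elsewhere and are not covered by this sketch.
    path = f"base/{db}/{rel}" if spc == DEFAULT_TABLESPACE_OID else None
    return {"tablespace": spc, "database": db, "relfilenode": rel, "path": path}


if __name__ == "__main__":
    ctx = ("CONTEXT: xlog redo Insert item, node: 1663/35531019/35671296 "
           "blkno: 1 offset: 4 nitem: 1 isdata: F isleaf F isdelete F updateBlkno:11")
    print(relation_from_context(ctx))
```

Applied to the PANIC context line, this gives database 35531019 and relfilenode 35671296, which matches the relation path in the DEBUG message just before the PANIC.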

Tommaso


On Thu, 2011-07-28 at 20:02 +0800, Craig Ringer wrote:
On 28/07/2011 7:51 PM, tommaso wrote:
Hi All,

one of our users killed a postgres process with kill -9 "PID"
after that the DB server is not longer able to start.


There is a lot of information missing from this question, though you did include the log entries (thanks). Please see this for the rest of the info you need:

http://wiki.postgresql.org/wiki/Guide_to_reporting_problems

Why did the user "kill -9" the DB, anyway? Was it perhaps not responding for an extended period, and maybe in the "D" state in PS? Are there / were there any messages in "dmesg" or in the kernel log files? Perhaps related to I/O errors or file system errors? Was there a recent fsck on the file system, a recent reboot of the server, or recent power loss/interruption?

--
Craig Ringer