Discussion: recovery after segmentation fault
postgresql suddenly died... during recovery

2009-04-08 16:35:34 CEST FATAL: the database system is starting up
^^^ several
2009-04-08 16:35:34 CEST LOG: incomplete startup packet
2009-04-08 16:36:53 CEST FATAL: the database system is starting up
2009-04-08 16:36:53 CEST LOG: startup process (PID 3176) was terminated by signal 11: Segmentation fault
2009-04-08 16:36:53 CEST LOG: aborting startup due to startup process failure

It could be something wrong with the recovery process in an aborted transaction that is causing the segfault...

How can I resurrect the server and load a backup?
It was serving more than one DB and I assume that only one is causing problems.
Can I skip just that one from recovery and start from backup?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it
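Before chasing a core file it helps to pin down which process crashed and with which signal. The following is a minimal Python sketch that scans server log lines for the message quoted above; the regular expression is written against the 8.3 wording shown in this thread and may not match other versions, and `find_startup_crashes` is a hypothetical helper name. On Linux, whether a resulting core file carries the PID in its name depends on the kernel's `core_pattern` / `core_uses_pid` settings.

```python
import re

# Matches the postmaster's report of a crashed startup process, e.g.
#   LOG: startup process (PID 3176) was terminated by signal 11: Segmentation fault
STARTUP_CRASH = re.compile(
    r"startup process \(PID (\d+)\) was terminated by signal (\d+)(?::\s*(.*))?"
)


def find_startup_crashes(log_lines):
    """Return (pid, signal_number, description) for every startup crash found."""
    crashes = []
    for line in log_lines:
        match = STARTUP_CRASH.search(line)
        if match:
            pid, signum, description = match.groups()
            crashes.append((int(pid), int(signum), (description or "").strip()))
    return crashes


if __name__ == "__main__":
    excerpt = [
        "2009-04-08 16:36:53 CEST FATAL: the database system is starting up",
        "2009-04-08 16:36:53 CEST LOG: startup process (PID 3176) was "
        "terminated by signal 11: Segmentation fault",
        "2009-04-08 16:36:53 CEST LOG: aborting startup due to startup process failure",
    ]
    print(find_startup_crashes(excerpt))  # [(3176, 11, 'Segmentation fault')]
```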
Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
> 2009-04-08 16:36:53 CEST LOG: startup process (PID 3176) was
> terminated by signal 11: Segmentation fault
> 2009-04-08 16:36:53 CEST LOG: aborting startup due to startup process failure

Hmm, what Postgres version is this? Can you get a stack trace from the startup process crash?

The only simple way out of this is to delete the presumably-corrupt WAL log by running pg_resetxlog. That will destroy the evidence about what went wrong, though, so if you'd like to contribute to preventing such problems in future you need to save a copy of everything beforehand (eg, tar up all of $PGDATA). Also you might have a corrupt database afterwards :-(

regards, tom lane
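Tom's advice to "tar up all of $PGDATA" before running pg_resetxlog can be sketched with Python's standard `tarfile` module. This is a minimal illustration, not a backup tool: `snapshot_data_dir` is a hypothetical name, it assumes the server is stopped, and `/var/lib/postgresql/8.3/main` in the usage line is only the usual Debian location, so check where your cluster actually lives.

```python
import os
import tarfile
import time


def snapshot_data_dir(pgdata, dest_dir):
    """Write a gzipped tar of the whole data directory and return its path.

    Assumes the server is stopped, so files are not changing while they
    are being archived.
    """
    pgdata = os.path.abspath(pgdata)
    if not os.path.isdir(pgdata):
        raise FileNotFoundError(pgdata)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = os.path.join(dest_dir, "pgdata-%s.tar.gz" % stamp)
    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores paths relative to the directory name ("main/...")
        # instead of absolute paths; directories are added recursively.
        tar.add(pgdata, arcname=os.path.basename(pgdata))
    return archive


if __name__ == "__main__":
    # Typical Debian 8.3 data directory (assumption).
    print(snapshot_data_dir("/var/lib/postgresql/8.3/main", "/root"))
```

Two caveats: `tarfile` stores symlinks as links by default, so tablespaces symlinked from `pg_tblspc` are not copied, and Debian keeps `postgresql.conf` under `/etc/postgresql/8.3/main`, outside the data directory, so you may want to archive that too.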
On Wed, 08 Apr 2009 10:59:54 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
> > 2009-04-08 16:36:53 CEST LOG: startup process (PID 3176) was
> > terminated by signal 11: Segmentation fault 2009-04-08 16:36:53
> > CEST LOG: aborting startup due to startup process failure
>
> Hmm, what Postgres version is this? Can you get a stack trace from
> the startup process crash?

How, on Debian?
Debian does all its automagic stuff in init. I never learned how to start pg manually.

> The only simple way out of this is to delete the presumably-corrupt
> WAL log by running pg_resetxlog. That will destroy the evidence

I couldn't find it... mmm, what a strange place for an executable:
/usr/lib/postgresql/8.3/bin/pg_resetxlog

> about what went wrong, though, so if you'd like to contribute to
> preventing such problems in future you need to save a copy of
> everything beforehand (eg, tar up all of $PGDATA). Also you might
> have a corrupt database afterwards :-(

What if I just don't care about recovery of *one* DB (that is maybe the culprit) and just see the server restart, then just do a restore from a VERY recent backup?

Is there a way to just kill recovery for one DB? Just don't start it at all?

This is the same DB having problems with recreation of the gin index BTW... and I've the feeling that the problem is related to that index once more... I was vacuuming full, I aborted...

I think the DB is trying to recreate the index but due to some problem (can I say bug or is it too early?) it segfaults.

I think this could be of some help:

2009-04-08 16:47:13 CEST LOG: database system was not properly shut down; automatic recovery in progress
2009-04-08 16:47:13 CEST LOG: redo starts at 72/9200EBC8

BTW: Linux amd64, Debian stock kernel
Debian etch/backport: Version: 8.3.4-1~bpo40+1

Now let's learn how to use pg_resetxlog

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it
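Since pg_resetxlog throws away WAL, it is worth looking before leaping. The sketch below only builds the argument list for the binary path found above and defaults to a dry run. Per the pg_resetxlog documentation, `-n` prints the control values it would use without changing anything, and `-f` forces the reset; after a crash like this one the tool normally refuses to proceed without `-f`, because the server was not shut down cleanly. The helper names are hypothetical, and it must run as the database owner with the server stopped.

```python
import subprocess

# Location of the binary on Debian's 8.3 packages, as found in this thread.
PG_RESETXLOG = "/usr/lib/postgresql/8.3/bin/pg_resetxlog"


def resetxlog_command(datadir, dry_run=True, force=False):
    """Build a pg_resetxlog argument list.

    -n : show the values that would be used, change nothing (takes precedence).
    -f : force the reset, e.g. when the server was not shut down cleanly.
    """
    cmd = [PG_RESETXLOG]
    if dry_run:
        cmd.append("-n")
    elif force:
        cmd.append("-f")
    cmd.append(datadir)
    return cmd


def run_resetxlog(datadir, dry_run=True, force=False):
    # Run as the postgres user, with the server stopped and $PGDATA saved first.
    return subprocess.run(resetxlog_command(datadir, dry_run, force), check=True)
```

A cautious sequence would be `run_resetxlog(d)` to inspect the output, then `run_resetxlog(d, dry_run=False, force=True)` only after the data directory has been archived.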
On Wed, Apr 08, 2009 at 05:24:08PM +0200, Ivan Sergio Borgonovo wrote:
> How, on Debian?
> Debian does all its automagic stuff in init. I never learned how to
> start pg manually.

What might be easier is turning on core dumps (ulimit -S -c unlimited) and then start postgres and see if it drops a core dump, which you can then feed to gdb.

All the binaries are in /usr/lib/postgresql/8.3/bin/ (Debian supports parallel installs of multiple versions of postgres).

> What if I just don't care about recovery of *one* DB (that is maybe
> the culprit) and just see the server restart, then just do a restore
> from a VERY recent backup?
>
> Is there a way to just kill recovery for one DB? Just don't start it
> at all?

Unfortunately, the XLOG is shared between all databases on one cluster.

> This is the same DB having problems with recreation of the gin index
> BTW... and I've the feeling that the problem is related to that
> index once more... I was vacuuming full, I aborted...
>
> I think the DB is trying to recreate the index but due to some
> problem (can I say bug or is it too early?) it segfaults.

Interesting, hope you can get a good backtrace.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
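Once a core file exists, feeding it to gdb can be scripted. This is a sketch under assumptions: the postmaster and its children use the data directory as their working directory, so with the kernel's default `core_pattern` a core file should appear there, but a custom `core_pattern` sends it elsewhere. `gdb -batch -ex "bt full"` loads the binary and core, prints the backtrace with local variables, and exits. The helper names are hypothetical, and with stripped binaries the trace may contain little more than addresses.

```python
import glob
import os
import subprocess

# Debian's 8.3 server binary (directory from the thread).
POSTGRES_BIN = "/usr/lib/postgresql/8.3/bin/postgres"


def find_core_files(directory):
    """Return regular files named 'core' or 'core.<suffix>' in directory, sorted."""
    candidates = glob.glob(os.path.join(directory, "core"))
    candidates += glob.glob(os.path.join(directory, "core.*"))
    return sorted(path for path in candidates if os.path.isfile(path))


def gdb_backtrace_command(binary, core_file):
    """gdb argument list: load binary and core, print a full backtrace, exit."""
    return ["gdb", "-batch", "-ex", "bt full", binary, core_file]


def print_backtraces(data_dir, binary=POSTGRES_BIN):
    # With the default core_pattern, a crashing backend dumps core in its
    # working directory, which for PostgreSQL is the data directory.
    for core in find_core_files(data_dir):
        subprocess.run(gdb_backtrace_command(binary, core), check=False)
```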
On Wed, 8 Apr 2009 23:59:43 +0200
Martijn van Oosterhout <kleptog@svana.org> wrote:

> What might be easier is turning on core dumps (ulimit -S -c
> unlimited) and then start postgres and see if it drops a core

thanks.

> > Is there a way to just kill recovery for one DB? Just don't
> > start it at all?
>
> Unfortunately, the XLOG is shared between all databases on one
> cluster.

bwaaa. That's a bit of a pain.
I'm trying to understand this a bit better...
I think nothing terrible really happened since:
a) the DB that has the highest write load was actually the one that caused the problem, and I restored it from a backup.
b) the other DBs have some writes too... but the software using them doesn't have any idea about transactions, so it is built with atomic statements in mind... No operation I can think of was writing to more than one table, and I'd think most (all?) of the operations were atomic at the statement level.

So if I lost some writes in the logs for the other DBs... that shouldn't be a problem, right? I just lost some data... not coherency? Right?

> > This is the same DB having problems with recreation of the gin index
> > BTW... and I've the feeling that the problem is related to that
> > index once more... I was vacuuming full, I aborted...
> > I think the DB is trying to recreate the index but due to some
> > problem (can I say bug or is it too early?) it segfaults.
> Interesting, hope you can get a good backtrace.

I backed up the whole data dir. I'm currently transferring it to my dev box.
I already have the same DB there... but it is on lenny, and it never gave me a problem.
The versions are slightly different anyway:
Version: 8.3.6-1 (working)
Version: 8.3.4-1~bpo40+1 (sometimes problematic[1])

8.4 is at the door... and the only choices I have to fix the problem on that box are:
- upgrade to lenny
- build postgresql from source, which is going to be a maintenance pain.

Could anything related to vacuum and/or gin indexes have been fixed between 8.3.4 and 8.3.6?
I think that if I stick with some rituals I can live with it: avoid vacuum full when there is load, and restart the server before doing it.

[1] slow vacuum full and gin index update

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it
Martijn van Oosterhout wrote:
> On Wed, Apr 08, 2009 at 05:24:08PM +0200, Ivan Sergio Borgonovo wrote:
>> How, on Debian?
>> Debian does all its automagic stuff in init. I never learned how to
>> start pg manually.
>
> What might be easier is turning on core dumps (ulimit -S -c unlimited)
> and then start postgres and see if it drops a core dump, which you can
> then feed to gdb.

Note that ulimit is inherited by child processes; it doesn't apply system-wide. You'll need to set the ulimit somewhere like the postgresql init script, where the postmaster is a child of the shell in which the ulimit command is run.

Also, because Debian strips its binaries by default, you might need to rebuild the postgresql packages with debugging enabled and without stripping to get a useful backtrace. Worth a try anyway, though.

Does Debian have a repository full of debug symbol packages like Ubuntu does?

--
Craig Ringer
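Craig's point, that a resource limit belongs to a process and is copied to its children rather than applying system-wide, can be demonstrated with Python's `resource` module, which wraps the same getrlimit/setrlimit calls the shell's `ulimit` builtin uses. This is an illustration only, with hypothetical helper names; the practical fix is still to run `ulimit -S -c unlimited` in the script that launches the postmaster. An unprivileged process can raise its soft limit only up to the hard limit, which is why the sketch raises it to `hard` instead of to unlimited.

```python
import resource
import subprocess
import sys


def child_core_soft_limit():
    """Spawn a fresh child and report the RLIMIT_CORE soft limit it inherited."""
    code = "import resource; print(resource.getrlimit(resource.RLIMIT_CORE)[0])"
    out = subprocess.run([sys.executable, "-c", code],
                         capture_output=True, text=True, check=True)
    return int(out.stdout)


def raise_core_limit():
    """Roughly `ulimit -S -c <hard limit>`, for this process and its future children."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return hard


if __name__ == "__main__":
    hard = raise_core_limit()
    # The child sees the raised limit because it inherited it at fork time;
    # processes started elsewhere (e.g. by init) are unaffected.
    print(child_core_soft_limit() == hard)
```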