Обсуждение: Help! PostgreSQL stuck at starting up after crash

Поиск
Список
Период
Сортировка

Help! PostgreSQL stuck at starting up after crash

От
Samuel Hwang
Дата:
version Postgresql 9.1.1 on centos5 x64

We experience slow performance and found the server is running 3 vacuum process on the same db which use up 99% of CPU. 
Then we kill -9 one of those process which cause postgresql to crash and it tried to restart after the crash
However when the starting process reach the last WAL files, it just stuck there

pg_controldata shows the db is in Archive Recovery mode and when using psql to connect the db, it says FATAL: the database system starting up.

I took a chance and upgrade to PostgreSql 9.1.2 and see if anything changed
it still stuck at the end of recovery.
pg_controldata shows db is in Crash recovery, but that probably different wording I think
using psql to connect the db, it says FATAL: the database system is starting up.

I pretty much run out of idea here.
Can anyone help what to go from here?

Samuel

Re: Help! PostgreSQL stuck at starting up after crash

От
Samuel Hwang
Дата:

Thanks, Rick.

I think we learned the lesson not to do kill -9.  One weird thing was that the three stuck automatically vacuum processes had been running for probably more than a day and they were all working on the same pretty static database. I doubt there were something wrong with our db. However it would be nice postgtrsql would be tolerant to that.

The symptoms we experienced was postgresql was extremely slow and failed all the db tests.  The slowness came from the 3 vacuum process which use up all the cpu.

Apparently killing is not the right solution. Is a proper/safer way to recover the performance of postgresql?

On Thu, Jan 19, 2012 at 8:15 AM, Rick Dicaire <kritek@gmail.com> wrote:
On Wed, Jan 18, 2012 at 7:33 PM, Samuel Hwang <samuel@replicon.com> wrote:
> We experience slow performance and found the server is running 3 vacuum
> process on the same db which use up 99% of CPU.
> Then we kill -9 one of those process which cause postgresql to crash and it
> tried to restart after the crash

While I cannot help with restarting, you shouldn't ever use kill -9
unless no other kill signal works.
kill default signal is SIGTERM (15), refer to man 7 signal (in linux).

vacuum processes can be set on a per table basis, perhaps you need to
closely examine that part of configuration.

Sorry I can't help further.

--
aRDy Music and Rick Dicaire present:
http://www.ardynet.com
http://www.ardynet.com:9000/ardymusic.ogg.m3u



--

Shian-Miin Samuel Hwang | Software Developer | Phone 1-403-2626519 (ext. 276) | Fax 1-403-233-8046

Replicon | Hassle-Free Time & Expense Management Software - 7,300 Customers - 70 Countries
www.replicon.com | facebook | twitter | blog | contact us

We are hiring! | search jobs

Re: Help! PostgreSQL stuck at starting up after crash

От
"David Hornsby"
Дата:
Sounds like you have a corrupt wal files that you will have to reset the
wal logs with pgresetxlog.

http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html

This will result in missing transactions so before you do this shutdown
postgres and make a copy of the database files first. That way if you
don't like what happens you can always go back to the way things were.

Also right now would be a good time to evaluate your backup strategy,
which is a different topic for a different thread, but I can certainly
help with that as well.

-David Hornsby

> version Postgresql 9.1.1 on centos5 x64
>
> We experience slow performance and found the server is running 3 vacuum
> process on the same db which use up 99% of CPU.
> Then we kill -9 one of those process which cause postgresql to crash and
> it
> tried to restart after the crash
> However when the starting process reach the last WAL files, it just stuck
> there
>
> pg_controldata shows the db is in Archive Recovery mode and when using
> psql
> to connect the db, it says FATAL: the database system starting up.
>
> I took a chance and upgrade to PostgreSql 9.1.2 and see if anything
> changed
> it still stuck at the end of recovery.
> pg_controldata shows db is in Crash recovery, but that probably different
> wording I think
> using psql to connect the db, it says FATAL: the database system is
> starting up.
>
> I pretty much run out of idea here.
> Can anyone help what to go from here?
>
> Samuel
>



Re: Help! PostgreSQL stuck at starting up after crash

От
Samuel Hwang
Дата:
I don't know how to make sure if WAL logs corrupted. 
At the end of the recovery in postgresql log I saw 

2012-01-18 18:30:58.570 MST     3666 - LOG:  consistent recovery state reached at 56C/CD0AFE00
2012-01-18 18:30:58.587 MST     3666 - LOG:  recovery stopping before abort of transaction 541802043, time 2012-01-18 12:50:08.531615-07
2012-01-18 18:30:58.587 MST     3666 - LOG:  redo done at 56C/CD226C58
2012-01-18 18:30:58.587 MST     3666 - LOG:  last completed transaction was at log time 2012-01-18 12:49:28.321605-07
2012-01-18 18:30:58.589 MST     3666 - LOG:  selected new timeline ID: 2
2012-01-18 18:30:59.187 MST     3666 - LOG:  archive recovery complete

just nothing happened after that and postgresql is stuck at starting up and not getting out of archive recovery mode.
at that time there is no cpu/disk activities and it seemed like it's waiting for something? 

Fortunately this is a development/test database and we don't have any backup plan on it as data loss is not a big issue. 
In production environment we do set up stream replication and on going backup for both db and WAL logs.
However, this do raise some worries, especially my impression is that postgresql shall protect the data pretty well.

I will give it a try on pg_resetxlog, Thanks for the heads-up.


On Thu, Jan 19, 2012 at 7:30 AM, David Hornsby <david@beechglen.com> wrote:
Sounds like you have a corrupt wal files that you will have to reset the
wal logs with pgresetxlog.

http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html

This will result in missing transactions so before you do this shutdown
postgres and make a copy of the database files first. That way if you
don't like what happens you can always go back to the way things were.

Also right now would be a good time to evaluate your backup strategy,
which is a different topic for a different thread, but I can certainly
help with that as well.

-David Hornsby

> version Postgresql 9.1.1 on centos5 x64
>
> We experience slow performance and found the server is running 3 vacuum
> process on the same db which use up 99% of CPU.
> Then we kill -9 one of those process which cause postgresql to crash and
> it
> tried to restart after the crash
> However when the starting process reach the last WAL files, it just stuck
> there
>
> pg_controldata shows the db is in Archive Recovery mode and when using
> psql
> to connect the db, it says FATAL: the database system starting up.
>
> I took a chance and upgrade to PostgreSql 9.1.2 and see if anything
> changed
> it still stuck at the end of recovery.
> pg_controldata shows db is in Crash recovery, but that probably different
> wording I think
> using psql to connect the db, it says FATAL: the database system is
> starting up.
>
> I pretty much run out of idea here.
> Can anyone help what to go from here?
>
> Samuel
>





--

Shian-Miin Samuel Hwang | Software Developer | Phone 1-403-2626519 (ext. 276) | Fax 1-403-233-8046

Replicon | Hassle-Free Time & Expense Management Software - 7,300 Customers - 70 Countries
www.replicon.com | facebook | twitter | blog | contact us

We are hiring! | search jobs

Re: Help! PostgreSQL stuck at starting up after crash

От
"Kevin Grittner"
Дата:
Samuel Hwang <samuel@replicon.com> wrote:
> I don't know how to make sure if WAL logs corrupted.
> At the end of the recovery in postgresql log I saw
>
> 2012-01-18 18:30:58.570 MST     3666 - LOG:  consistent recovery
>   state reached at 56C/CD0AFE00
> 2012-01-18 18:30:58.587 MST     3666 - LOG:  recovery stopping
>   before abort of transaction 541802043, time 2012-01-18
>   12:50:08.531615-07
> 2012-01-18 18:30:58.587 MST     3666 - LOG:  redo done at
>   56C/CD226C58
> 2012-01-18 18:30:58.587 MST     3666 - LOG:  last completed
>   transaction was at log time 2012-01-18 12:49:28.321605-07
> 2012-01-18 18:30:58.589 MST     3666 - LOG:  selected new timeline
>   ID: 2
> 2012-01-18 18:30:59.187 MST     3666 - LOG:  archive recovery
>   complete
>
> just nothing happened after that and postgresql is stuck at
> starting up and not getting out of archive recovery mode.\

What do you base that on?  (copy/paste)

> at that time there is no cpu/disk activities and it seemed like
> it's waiting for something?

That looks like normal recovery.  I would expect it to be waiting
for clients to connect at that point.

What happens when you try to connect to the database after that
above has been logged?  (Copy/paste the psql command line and any
errors, please.)

-Kevin

Re: Help! PostgreSQL stuck at starting up after crash

От
Samuel Hwang
Дата:
> just nothing happened after that and postgresql is stuck at
> starting up and not getting out of archive recovery mode.\

What do you base that on?  (copy/paste) 

==> I have waited for at least 2 hours, but the database is still in starting up state. In the mean time, there is not cpu/disk activities.

> at that time there is no cpu/disk activities and it seemed like
> it's waiting for something?

That looks like normal recovery.  I would expect it to be waiting
for clients to connect at that point.

What happens when you try to connect to the database after that
above has been logged?  (Copy/paste the psql command line and any
errors, please.)

==> when I tried to use psql to connect the db, I got 

FATAL:  the database system is starting up

and it doesn't allow connection.

On Thu, Jan 19, 2012 at 10:16 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
Samuel Hwang <samuel@replicon.com> wrote:
> I don't know how to make sure if WAL logs corrupted.
> At the end of the recovery in postgresql log I saw
>
> 2012-01-18 18:30:58.570 MST     3666 - LOG:  consistent recovery
>   state reached at 56C/CD0AFE00
> 2012-01-18 18:30:58.587 MST     3666 - LOG:  recovery stopping
>   before abort of transaction 541802043, time 2012-01-18
>   12:50:08.531615-07
> 2012-01-18 18:30:58.587 MST     3666 - LOG:  redo done at
>   56C/CD226C58
> 2012-01-18 18:30:58.587 MST     3666 - LOG:  last completed
>   transaction was at log time 2012-01-18 12:49:28.321605-07
> 2012-01-18 18:30:58.589 MST     3666 - LOG:  selected new timeline
>   ID: 2
> 2012-01-18 18:30:59.187 MST     3666 - LOG:  archive recovery
>   complete
>
> just nothing happened after that and postgresql is stuck at
> starting up and not getting out of archive recovery mode.\

What do you base that on?  (copy/paste)

> at that time there is no cpu/disk activities and it seemed like
> it's waiting for something?

That looks like normal recovery.  I would expect it to be waiting
for clients to connect at that point.

What happens when you try to connect to the database after that
above has been logged?  (Copy/paste the psql command line and any
errors, please.)

-Kevin



--

Shian-Miin Samuel Hwang | Software Developer | Phone 1-403-2626519 (ext. 276) | Fax 1-403-233-8046

Replicon | Hassle-Free Time & Expense Management Software - 7,300 Customers - 70 Countries
www.replicon.com | facebook | twitter | blog | contact us

We are hiring! | search jobs

Re: Help! PostgreSQL stuck at starting up after crash

От
Samuel Hwang
Дата:
pg_resetxlog does the trick and db can be started and readable.

I am dumping the data out and import to a newly created database cluster.

We pretty much lost the data for the last two days, but since our postgresql were running well, it is fewer than it looks.

Thanks a lot for the help.

On Thu, Jan 19, 2012 at 7:30 AM, David Hornsby <david@beechglen.com> wrote:
Sounds like you have a corrupt wal files that you will have to reset the
wal logs with pgresetxlog.

http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html

This will result in missing transactions so before you do this shutdown
postgres and make a copy of the database files first. That way if you
don't like what happens you can always go back to the way things were.

Also right now would be a good time to evaluate your backup strategy,
which is a different topic for a different thread, but I can certainly
help with that as well.

-David Hornsby

> version Postgresql 9.1.1 on centos5 x64
>
> We experience slow performance and found the server is running 3 vacuum
> process on the same db which use up 99% of CPU.
> Then we kill -9 one of those process which cause postgresql to crash and
> it
> tried to restart after the crash
> However when the starting process reach the last WAL files, it just stuck
> there
>
> pg_controldata shows the db is in Archive Recovery mode and when using
> psql
> to connect the db, it says FATAL: the database system starting up.
>
> I took a chance and upgrade to PostgreSql 9.1.2 and see if anything
> changed
> it still stuck at the end of recovery.
> pg_controldata shows db is in Crash recovery, but that probably different
> wording I think
> using psql to connect the db, it says FATAL: the database system is
> starting up.
>
> I pretty much run out of idea here.
> Can anyone help what to go from here?
>
> Samuel
>





--

Shian-Miin Samuel Hwang | Software Developer | Phone 1-403-2626519 (ext. 276) | Fax 1-403-233-8046

Replicon | Hassle-Free Time & Expense Management Software - 7,300 Customers - 70 Countries
www.replicon.com | facebook | twitter | blog | contact us

We are hiring! | search jobs

Re: Help! PostgreSQL stuck at starting up after crash

От
Samuel Hwang
Дата:
correct typo. 
We pretty much lost the data for the last two days, but since our postgresql wereN'T running well, it is fewer than it looks.


On Thu, Jan 19, 2012 at 3:45 PM, Samuel Hwang <samuel@replicon.com> wrote:
pg_resetxlog does the trick and db can be started and readable.

I am dumping the data out and import to a newly created database cluster.

We pretty much lost the data for the last two days, but since our postgresql were running well, it is fewer than it looks.

Thanks a lot for the help.

On Thu, Jan 19, 2012 at 7:30 AM, David Hornsby <david@beechglen.com> wrote:
Sounds like you have a corrupt wal files that you will have to reset the
wal logs with pgresetxlog.

http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html

This will result in missing transactions so before you do this shutdown
postgres and make a copy of the database files first. That way if you
don't like what happens you can always go back to the way things were.

Also right now would be a good time to evaluate your backup strategy,
which is a different topic for a different thread, but I can certainly
help with that as well.

-David Hornsby

> version Postgresql 9.1.1 on centos5 x64
>
> We experience slow performance and found the server is running 3 vacuum
> process on the same db which use up 99% of CPU.
> Then we kill -9 one of those process which cause postgresql to crash and
> it
> tried to restart after the crash
> However when the starting process reach the last WAL files, it just stuck
> there
>
> pg_controldata shows the db is in Archive Recovery mode and when using
> psql
> to connect the db, it says FATAL: the database system starting up.
>
> I took a chance and upgrade to PostgreSql 9.1.2 and see if anything
> changed
> it still stuck at the end of recovery.
> pg_controldata shows db is in Crash recovery, but that probably different
> wording I think
> using psql to connect the db, it says FATAL: the database system is
> starting up.
>
> I pretty much run out of idea here.
> Can anyone help what to go from here?
>
> Samuel
>





--

Shian-Miin Samuel Hwang | Software Developer | Phone 1-403-2626519 (ext. 276) | Fax 1-403-233-8046

Replicon | Hassle-Free Time & Expense Management Software - 7,300 Customers - 70 Countries
www.replicon.com | facebook | twitter | blog | contact us

We are hiring! | search jobs




--

Shian-Miin Samuel Hwang | Software Developer | Phone 1-403-2626519 (ext. 276) | Fax 1-403-233-8046

Replicon | Hassle-Free Time & Expense Management Software - 7,300 Customers - 70 Countries
www.replicon.com | facebook | twitter | blog | contact us

We are hiring! | search jobs

Re: Help! PostgreSQL stuck at starting up after crash

От
David Hornsby
Дата:
Glad to hear that you can spin up the database and get to your data. Sorry to hear that you did end up losing data on that server.

-David Hornsby

On 1/19/2012 5:46 PM, Samuel Hwang wrote:
correct typo. 
We pretty much lost the data for the last two days, but since our postgresql wereN'T running well, it is fewer than it looks.


On Thu, Jan 19, 2012 at 3:45 PM, Samuel Hwang <samuel@replicon.com> wrote:
pg_resetxlog does the trick and db can be started and readable.

I am dumping the data out and import to a newly created database cluster.

We pretty much lost the data for the last two days, but since our postgresql were running well, it is fewer than it looks.

Thanks a lot for the help.

On Thu, Jan 19, 2012 at 7:30 AM, David Hornsby <david@beechglen.com> wrote:
Sounds like you have a corrupt wal files that you will have to reset the
wal logs with pgresetxlog.

http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html

This will result in missing transactions so before you do this shutdown
postgres and make a copy of the database files first. That way if you
don't like what happens you can always go back to the way things were.

Also right now would be a good time to evaluate your backup strategy,
which is a different topic for a different thread, but I can certainly
help with that as well.

-David Hornsby

> version Postgresql 9.1.1 on centos5 x64
>
> We experience slow performance and found the server is running 3 vacuum
> process on the same db which use up 99% of CPU.
> Then we kill -9 one of those process which cause postgresql to crash and
> it
> tried to restart after the crash
> However when the starting process reach the last WAL files, it just stuck
> there
>
> pg_controldata shows the db is in Archive Recovery mode and when using
> psql
> to connect the db, it says FATAL: the database system starting up.
>
> I took a chance and upgrade to PostgreSql 9.1.2 and see if anything
> changed
> it still stuck at the end of recovery.
> pg_controldata shows db is in Crash recovery, but that probably different
> wording I think
> using psql to connect the db, it says FATAL: the database system is
> starting up.
>
> I pretty much run out of idea here.
> Can anyone help what to go from here?
>
> Samuel
>





--

Shian-Miin Samuel Hwang | Software Developer | Phone 1-403-2626519 (ext. 276) | Fax 1-403-233-8046

Replicon | Hassle-Free Time & Expense Management Software - 7,300 Customers - 70 Countries
www.replicon.com | facebook | twitter | blog | contact us

We are hiring! | search jobs




--

Shian-Miin Samuel Hwang | Software Developer | Phone 1-403-2626519 (ext. 276) | Fax 1-403-233-8046

Replicon | Hassle-Free Time & Expense Management Software - 7,300 Customers - 70 Countries
www.replicon.com | facebook | twitter | blog | contact us

We are hiring! | search jobs



-- 
David Hornsby
Beechglen Development Inc.
P: (513) 922 - 0509 x432
C: (513) 254 - 0605
F: (513) 347 - 2834
W: beechglen.com