Re: "PANIC: could not open critical system index 2662" - twice

From	Laurenz Albe
Subject	Re: "PANIC: could not open critical system index 2662" - twice
Date	7 April 2023, 14:04:34
Msg-id	4f4ab16320d7b3a4e9950eaabde752dce204dc77.camel@cybertec.at
In response to	"PANIC: could not open critical system index 2662" - twice (Evgeny Morozov <postgresql3@realityexists.net>)
Responses	Re: "PANIC: could not open critical system index 2662" - twice (Michael Paquier <michael@paquier.xyz>); Re: "PANIC: could not open critical system index 2662" - twice ("Peter J. Holzer" <hjp-pgsql@hjp.at>); Re: "PANIC: could not open critical system index 2662" - twice (Evgeny Morozov <postgresql3@realityexists.net>)
List	pgsql-general

On Thu, 2023-04-06 at 16:41 +0000, Evgeny Morozov wrote:
> Our PostgreSQL 15.2 instance running on Ubuntu 18.04 has crashed with this error:
>
> 2023-04-05 09:24:03.448 UTC [15227] ERROR:  index "pg_class_oid_index" contains unexpected zero page at block 0
> [...]
>
> We had the same thing happen about a month ago on a different database on the same cluster.
> For a while PG actually ran OK as long as you didn't access that specific DB, but when trying
> to back up that DB with pg_dump it would crash every time. At that time one of the disks
> hosting the ZFS dataset with the PG data directory on it was reporting errors, so we thought
> it was likely due to that.
>
> Unfortunately, before we could replace the disks, PG crashed completely and would not start
> again at all, so I had to rebuild the cluster from scratch and restore from pg_dump backups
> (still onto the old, bad disks). Once the disks were replaced (all of them) I just copied
> the data to them using zfs send | zfs receive and didn't bother restoring pg_dump backups
> again - which was perhaps foolish in hindsight.
>
> Well, yesterday it happened again. The server still restarted OK, so I took fresh pg_dump
> backups of the databases we care about (which ran fine), rebuilt the cluster and restored
> the pg_dump backups again - now onto the new disks, which are not reporting any problems.
>
> So while everything is up and running now this error has me rather concerned. Could the
> error we're seeing now have been caused by some corruption in the PG data that's been there
> for a month (so it could still be attributed to the bad disk), which should now be fixed by
> having restored from backups onto good disks?

Yes, that is entirely possible.

> Could this be a PG bug?

It could be, but data corruption caused by bad hardware is much more likely.

> What can I do to figure out why this is happening and prevent it from happening again?

No idea about the former, but bad hardware is a good enough explanation.

As to keeping it from happening: use good hardware.

Yours,
Laurenz Albe
