Re: Segfault leading to crash, recovery mode, and TOAST corruption

From: Tom Lane
Subject: Re: Segfault leading to crash, recovery mode, and TOAST corruption
Date:
Msg-id: 25981.1528243651@sss.pgh.pa.us
In reply to: Segfault leading to crash, recovery mode, and TOAST corruption  (Jonathan Marks <jonathanaverymarks@gmail.com>)
Responses: Re: Segfault leading to crash, recovery mode, and TOAST corruption
List: pgsql-general

Jonathan Marks <jonathanaverymarks@gmail.com> writes:
> We had two issues today (once this morning and once a few minutes ago)
> with our primary database (RDS running 10.1, 32 cores, 240 GB RAM, 5TB
> total disk space, 20k PIOPS) where the database suddenly crashed and
> went into recovery mode.

I'd suggest updating to 10.4 ... see below.

> Both times that the server crashed, we saw this in the logs:
> 2018-06-05 23:08:44 UTC:172.31.7.89(36224):production@OURDB:[12173]:ERROR:  canceling statement due to statement timeout
> 2018-06-05 23:08:44 UTC::@:[48863]:LOG:  worker process: parallel worker for PID 12173 (PID 20238) exited with exit code 1
> 2018-06-05 23:08:49 UTC::@:[48863]:LOG:  server process (PID 12173) was terminated by signal 11: Segmentation fault

This looks to be a parallel leader process getting confused when a worker
process exits unexpectedly.  There were some related fixes in 10.2, which
might resolve the issue, though it's also possible we have more to do there.

> After the first crash, we then started getting errors like:
> 2018-06-05 23:08:45 UTC:172.31.6.84(33392):production@OURDB:[11888]:ERROR:  unexpected chunk number 0 (expected 1) for toast value 1592283014 in pg_toast_26656

This definitely looks to be the "reuse of TOAST OIDs immediately after
crash" issue that was fixed in 10.4.  AFAIK it's recoverable corruption;
I believe you'll find that VACUUMing the parent table will make the
errors stop, and all will be well.  But an update would be prudent to
prevent it from happening again.
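
To find which parent table to VACUUM, the TOAST relation named in the error
can be mapped back to its owner through pg_class.reltoastrelid.  The sketch
below is only an illustration: it pulls the TOAST relation name out of a log
line like the one quoted above and builds the SQL text to run by hand (for
example in psql).  The function names are hypothetical, and the script does
not connect to any database.

```python
import re

# Matches the "unexpected chunk number" error and captures the TOAST value
# and the TOAST relation name.  "for\s*toast" also tolerates log lines where
# wrapping has swallowed the space between the two words.
TOAST_ERROR_RE = re.compile(
    r"unexpected chunk number \d+ \(expected \d+\)\s*"
    r"for\s*toast value (\d+) in (pg_toast_\d+)"
)


def parent_lookup_sql(log_line):
    """Return SQL that finds the table owning the TOAST relation named in
    log_line, or None if the line is not a TOAST chunk error."""
    match = TOAST_ERROR_RE.search(log_line)
    if match is None:
        return None
    toast_rel = match.group(2)
    # TOAST relations live in the pg_toast schema; pg_class.reltoastrelid on
    # the parent table's row points at the TOAST relation.
    return (
        "SELECT oid::regclass AS parent_table FROM pg_class "
        f"WHERE reltoastrelid = 'pg_toast.{toast_rel}'::regclass;"
    )


def vacuum_sql(parent_table):
    """Return the VACUUM statement for the table name reported by the lookup
    query (regclass output is already quoted where quoting is needed)."""
    return f"VACUUM {parent_table};"


if __name__ == "__main__":
    line = (
        "ERROR:  unexpected chunk number 0 (expected 1) "
        "for toast value 1592283014 in pg_toast_26656"
    )
    print(parent_lookup_sql(line))
```

Run the printed query first, then feed the table name it reports to
vacuum_sql() (or simply type the VACUUM yourself).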

            regards, tom lane

