Re: ERROR: unexpected chunk number 0 (expected 1) for toast value76753264 in pg_toast_10920100

Поиск

Список

Период

Сортировка

От	Pavan Deolasee
Тема	Re: ERROR: unexpected chunk number 0 (expected 1) for toast value76753264 in pg_toast_10920100
Дата	10 апреля 2018 г. 16:42:47
Msg-id	CABOikdPJUv5EyxRZnJY2oA7EKi=bZdoryXNmUGK3dFAQ08tZdw@mail.gmail.com обсуждение исходный текст
Ответ на	Re: ERROR: unexpected chunk number 0 (expected 1) for toast value76753264 in pg_toast_10920100 (Pavan Deolasee <pavan.deolasee@gmail.com>)
Список	pgsql-general

Дерево обсуждения

On Fri, Apr 6, 2018 at 8:55 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:

On Fri, Apr 6, 2018 at 2:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
adsj@novozymes.com (Adam Sjøgren) writes:
>> [... still waiting for the result, I will return with what it said
>> when the server does ...]

> It did eventually finish, with the same result:

Huh. So what we have here, apparently, is that regular MVCC snapshots
think there is exactly one copy of the 1698936148/0 row, but TOAST fetches
think there is more than one. This is darn odd, not least because we
never do UPDATEs in toast tables, only inserts and deletes, so there
certainly shouldn't be update chains there.

It seems like you've got some corner case wherein SnapshotToast sees a row
that isn't visible according to MVCC --- probably a row left over from
some previous cycle of life. That is, I'm imagining the OID counter
wrapped around and we've reused a toast OID, but for some reason there's
still a row in the table with that OID. I'm not sure offhand how we could
get into such a state. Alvaro, does this ring any bells (remembering that
this is 9.3)?

FWIW one of our support customers reported a very similar TOAST table corruption issue last week which nearly caused an outage. After a lot of analysis, I think I've now fully understood the reasons behind the corruption, the underlying bug(s) and possible remedy. I am currently working on writing a reproducible test case to demonstrate the problem and writing the fix. More details on that soon.

I've posted a reproducer and a proposed fix to -hackers [1]

In the particular case that I investigated, a database crash/recovery was involved. But I think we should be able to create a scenario where OID wrap-around or a standby promotion triggers the problem. I don't know if any of that was involved in the cases reported on this thread, but I've a strong suspicion that the underlying bug is probably the same.

Thanks,

Pavan

[1] https://www.postgresql.org/message-id/CABOikdOgWT2hHkYG3Wwo2cyZJq2zfs1FH0FgX-%3Dh4OLosXHf9w%40mail.gmail.com

Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-general по дате отправления:

Предыдущее

От: hmidi slim
Дата: 10 апреля 2018 г., 16:24:49
Сообщение: Using enum instead of join tables

Следующее

От: Alexandre Arruda
Дата: 10 апреля 2018 г., 17:31:20
Сообщение: Re: ERROR: found multixact from before relminmxid

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: ERROR: unexpected chunk number 0 (expected 1) for toast value76753264 in pg_toast_10920100

Предыдущее

Следующее