Re: Bugs in TOAST handling, OID assignment and redo recovery

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Bugs in TOAST handling, OID assignment and redo recovery
Дата
Msg-id d68b7715-d25b-7a0f-477d-6c4816a15ac0@iki.fi
обсуждение исходный текст
Ответ на Bugs in TOAST handling, OID assignment and redo recovery  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Ответы Re: Bugs in TOAST handling, OID assignment and redo recovery  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Список pgsql-hackers
On 10/04/18 13:29, Pavan Deolasee wrote:
> Hello,
> 
> One of our 2ndQuadrant support customers recently reported a sudden rush of
> TOAST errors post a crash recovery, nearly causing an outage. Most errors
> read like this:
> 
> ERROR: unexpected chunk number 0 (expected 1) for toast value nnnn
> 
> While we could bring back the cluster to normal quickly using some
> workarounds, I investigated this in more detail and identified two long
> standing bugs in TOAST as well as redo recovery.

Great detective work!

> ISTM that the right fix is to teach toast_save_datum() to check for
> existence of duplicate chunk_id by scanning the table with the same
> SnapshotToast that it later uses to fetch the tuples. We already do that in
> case of toast rewrite, but not for regular inserts. I propose to do that
> for regular path too, as done in the attached patch.

It would seem more straightforward to add a snapshot parameter to 
GetNewOidWithIndex(), so that the this one caller could pass 
SnapshotToast, while others pass SnapshotDirty. With your patch, you 
check the index twice: first with SnapshotDirty, in 
GetNewOidWithIndex(), and then with SnapshotToast, in the caller.

If I'm reading the rewrite case correctly, it's a bit different and 
quite special. In the loop with GetNewOidWithIndex(), it needs to check 
that the OID is unused in two tables, the old TOAST table, and the new 
one. You can only pass one index to GetNewOidWithIndex(), so it needs to 
check the second index manually. It's not because of the snapshot issue. 
Although I wonder if we should be using SnapshotToast in that 
GetNewOidWithIndex() call, too. I.e. if we should be checking both the 
old and the new toast table with SnapshotToast.

> I think the right fix is to simply ignore the nextOid counter while
> replaying ONLINE checkpoint record. We must have already initialised
> ShmemVariableCache->nextOid
> to the value stored in this (or some previous) checkpoint record when redo
> recovery is started. As and when we see XLOG_NEXTOID record, we would
> maintain the shared memory counter correctly. If we don't see any
> XLOG_NEXTOID record, the value we started with must be the correct value. I
> see no problem even when OID wraps around during redo recovery.
> XLOG_NEXTOID should record that correctly.

Agreed. With nextXid, we advance ShmemVariableCache->nextXid if the 
value in the online checkpoint record is greater than 
ShmemVariableCache->nextXid. But we don't have such a wraparound-aware 
concept of "greater than" for OIDs. Ignoring the online checkpoint 
record's nextOid value seem fine to me.

- Heikki


В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Steele
Дата:
Сообщение: Re: [PATCH] Add missing type conversion functions for PL/Python
Следующее
От: Tom Lane
Дата:
Сообщение: Re: pgsql: Merge catalog/pg_foo_fn.h headers back into pg_foo.h headers.