Re: Truncation failure in autovacuum results in data corruption (duplicate keys)

From Alexander Korotkov
Subject Re: Truncation failure in autovacuum results in data corruption (duplicate keys)
Date
Msg-id CAPpHfdvqWECmi6SWt8K3p16GtObpRgyAGuKzan4w2HGRoFiK=Q@mail.gmail.com
In reply to Re: Truncation failure in autovacuum results in data corruption (duplicate keys)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Apr 18, 2018 at 11:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > Relation truncation throws away the page image in memory without ever
> > writing it to disk.  Then, if the subsequent file truncate step fails,
> > we have a problem, because anyone who goes looking for that page will
> > fetch it afresh from disk and see the tuples as live.
>
> > There are WAL entries recording the row deletions, but that doesn't
> > help unless we crash and replay the WAL.
>
> > It's hard to see a way around this that isn't fairly catastrophic for
> > performance :-(.
>
> Just to throw out a possibly-crazy idea: maybe we could fix this by
> PANIC'ing if truncation fails, so that we replay the row deletions from
> WAL.  Obviously this would be intolerable if the case were frequent,
> but we've had only two such complaints in the last nine years, so maybe
> it's tolerable.  It seems more attractive than taking a large performance
> hit on truncation speed in normal cases, anyway.

We have had only two complaints of data corruption in nine years.  But I
suspect that in the vast majority of cases the truncation error either
didn't cause corruption or the corruption went unnoticed.  So, once we
introduce a PANIC here, we would get far more complaints.
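
To make the proposal concrete, here is a minimal sketch of the escalation
being discussed.  This is not the actual md.c/storage.c code: the
variables fd, path and nblocks, the exact call site, and the error text
are assumptions; only the idea of turning a failed physical truncate into
a PANIC (so that crash recovery replays the row deletions from WAL) comes
from the discussion above.

    /*
     * Hypothetical sketch only, not PostgreSQL source: if the physical
     * truncate fails, escalate to PANIC instead of a softer error, so the
     * server restarts and WAL replay removes the dead tuples again.
     */
    if (ftruncate(fd, (off_t) nblocks * BLCKSZ) < 0)
        ereport(PANIC,
                (errcode_for_file_access(),
                 errmsg("could not truncate file \"%s\" to %u blocks: %m",
                        path, nblocks)));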

> A gotcha to be concerned about is what happens if we replay from WAL,
> come to the XLOG_SMGR_TRUNCATE WAL record, and get the same truncation
> failure again, which is surely not unlikely.  PANIC'ing again will not
> do.  I think we could probably handle that by having the replay code
> path zero out all the pages it was unable to delete; as long as that
> succeeds, we can call it good and move on.
>
> Or maybe just do that in the mainline case too?  That is, if ftruncate
> fails, handle it by zeroing the undeletable pages and pressing on?

I've just started really digging into this set of problems, but so far
this idea looks good to me.
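
For the fallback of zeroing the undeletable pages, here is a
self-contained sketch in plain POSIX C, outside the PostgreSQL tree, of
what the control flow could look like.  The function name
truncate_or_zero, the 8192-byte constant standing in for BLCKSZ, and the
error handling are all made up for illustration; in the server this would
of course have to go through the smgr/buffer-manager layer rather than
raw file descriptors.

    #include <fcntl.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define BLOCK_SIZE 8192         /* stands in for PostgreSQL's BLCKSZ */

    /*
     * Shorten "path" to keep_blocks blocks.  If the truncate itself fails,
     * fall back to zero-filling every block past the truncation point, so
     * that a later reader cannot see the stale (logically deleted) rows.
     * Assumes the file size is block-aligned, as relation segments are.
     * Returns 0 on success, -1 if neither truncation nor zeroing worked.
     */
    static int
    truncate_or_zero(const char *path, unsigned keep_blocks)
    {
        off_t       keep_bytes = (off_t) keep_blocks * BLOCK_SIZE;
        struct stat st;
        int         fd = open(path, O_RDWR);

        if (fd < 0)
            return -1;

        if (ftruncate(fd, keep_bytes) == 0)
        {
            close(fd);
            return 0;               /* normal case: file actually shortened */
        }

        /* Truncation failed: zero out the undeletable tail instead. */
        if (fstat(fd, &st) == 0)
        {
            char  zeroes[BLOCK_SIZE];
            off_t off;

            memset(zeroes, 0, sizeof(zeroes));
            for (off = keep_bytes; off < st.st_size; off += BLOCK_SIZE)
            {
                if (pwrite(fd, zeroes, sizeof(zeroes), off) != sizeof(zeroes))
                {
                    close(fd);
                    return -1;      /* zeroing failed too; caller escalates */
                }
            }
            if (fsync(fd) == 0)
            {
                close(fd);
                return 0;           /* tail is durably zeroed; call it good */
            }
        }

        close(fd);
        return -1;
    }

A caller would presumably still WAL-log the truncation beforehand exactly
as now, and only escalate (PANIC, or retry later) if both paths fail.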

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

