Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Craig Ringer
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Message-ID: CAMsr+YE0hvvaeAz2GbfzHYgPfZeN4KK+bCo6yMZTrNTsfCcTzg@mail.gmail.com
In reply to: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Anthony Iliopoulos <ailiop@altatus.com>)
Responses: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Anthony Iliopoulos <ailiop@altatus.com>)
Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Robert Haas <robertmhaas@gmail.com>)
Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Andreas Karlsson <andreas@proxel.se>)
List: pgsql-hackers

On 9 April 2018 at 18:50, Anthony Iliopoulos <ailiop@altatus.com> wrote:

> There is a clear responsibility of the application to keep
> its buffers around until a successful fsync(). The kernels
> do report the error (albeit with all the complexities of
> dealing with the interface), at which point the application
> may not assume that the write()s were ever even buffered
> in the kernel page cache in the first place.
>
> What you seem to be asking for is the capability of dropping
> buffers over the (kernel) fence and indemnifying the application
> from any further responsibility, i.e. a hard assurance
> that either the kernel will persist the pages or it will
> keep them around till the application recovers them
> asynchronously, the filesystem is unmounted, or the system
> is rebooted.

That's what Pg appears to assume now, yes.
 
Whether that's reasonable is a whole different topic.

I'd like a middle ground where the kernel lets us register our interest and tells us if it lost something, without us having to keep eight million FDs open for some long period. "Tell us about anything that happens under pgdata/" or an inotify-style per-directory-registration option. I'd even say that's ideal.

In the meantime, I propose that we fsync() each FD before we close() it when we age it out of a backend's LRU. Yes, that will hurt throughput and cause stalls, but we don't seem to have many better options. At least it'll only flush what we actually wrote to the OS buffers, not what we may have in shared_buffers. If the bgwriter does the same thing, we should be 100% safe from this problem on Linux 4.13+, and it'd be trivial to make it a GUC, much like the fsync or full_page_writes options, that people can turn off if they know the risks / know their storage is safe / don't care.
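
To make the fsync-before-close idea concrete, here is a minimal sketch. The helper name fsync_then_close() is hypothetical and is not an existing PostgreSQL function; a real change would presumably live in the fd.c/md.c layer and report errors through ereport() rather than errno. It only shows the ordering: flush while we still hold the descriptor, then close, and surface whichever call failed first.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): flush an fd before closing it,
 * so that a writeback error is reported while we still hold the descriptor.
 *
 * Returns 0 on success, or -1 with errno set by the first call that failed.
 * The descriptor is closed in either case.
 */
static int
fsync_then_close(int fd)
{
    int saved_errno = 0;

    /* Flush first: once the fd is closed we can no longer ask about it. */
    if (fsync(fd) != 0)
        saved_errno = errno;

    /* Always close, but do not let a close() error mask an fsync() error. */
    if (close(fd) != 0 && saved_errno == 0)
        saved_errno = errno;

    if (saved_errno != 0)
    {
        errno = saved_errno;
        return -1;
    }
    return 0;
}
```

A caller would use this in place of a bare close() when evicting a descriptor it has written through, and treat a -1 return as a durability failure rather than something to retry.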

Some keen person could later optimise it by adding an fsync worker thread pool in backends, so we don't block the main thread. Frankly, that might be a nice thing to have in the checkpointer anyway. But it's out of scope for fixing this in durability terms.

I'm partway through a patch that makes fsync panic on errors now. Once that's done, the next step will be to force fsync on close() in md and see how we go with that.
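
For illustration only, the "panic on fsync error" behaviour could look roughly like the sketch below. This is not the patch itself: fsync_or_panic() is a hypothetical name, and abort() stands in for a PANIC-level error report. The point is that a failed fsync() is treated as fatal instead of being retried.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the actual patch): treat any fsync() failure
 * as fatal. abort() is only a stand-in for a PANIC-level report.
 */
static void
fsync_or_panic(int fd, const char *path)
{
    if (fsync(fd) != 0)
    {
        /* Report the file and the OS error, then stop immediately. */
        fprintf(stderr, "PANIC: could not fsync file \"%s\": %s\n",
                path, strerror(errno));
        abort();
    }
}
```

The design choice is to never call fsync() again on the same data after a failure and hope for a clean result, since a later success would not prove that the earlier writes reached storage.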

Thoughts?

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
