Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From	Craig Ringer
Subject	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date	31 March 2018, 22:13:09
Msg-id	CAMsr+YHczzQJPGr94Y_Zw34Yzuw8UkzmxEB9eWuFaALRSxY-pA@mail.gmail.com
In reply to	PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS (Craig Ringer <craig@2ndquadrant.com>)
Replies	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS (Tom Lane <tgl@sss.pgh.pa.us>) Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS (Anthony Iliopoulos <ailiop@altatus.com>)
List	pgsql-hackers

On 31 March 2018 at 21:24, Anthony Iliopoulos <ailiop@altatus.com> wrote:

On Fri, Mar 30, 2018 at 10:18:14AM +1300, Thomas Munro wrote:

> >> Yeah, I see why you want to PANIC.
> >
> > Indeed. Even doing that leaves question marks about all the kernel
> > versions before v4.13, which at this point is pretty much everything
> > out there, not even detecting this reliably. This is messy.

There may still be a way to reliably detect this on older kernel
versions from userspace, but it will be messy whatsoever. On EIO
errors, the kernel will not restore the dirty page flags, but it
will flip the error flags on the failed pages. One could mmap()
the file in question, obtain the PFNs (via /proc/pid/pagemap)
and enumerate those to match the ones with the error flag switched
on (via /proc/kpageflags). This could serve at least as a detection
mechanism, but one could also further use this info to logically
map the pages that failed IO back to the original file offsets,
and potentially retry IO just for those file ranges that cover
the failed pages. Just an idea, not tested.
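To make the lookup chain in the quoted idea concrete, here is a minimal, untested-on-real-hardware sketch of the pagemap-to-kpageflags walk. It assumes the documented layouts of the two files (64-bit entries; pagemap bit 63 is "present" and bits 0-54 hold the PFN; kpageflags bit 1 is the error flag) and a fixed 4096-byte page. The names `read_entry` and `failed_offsets` are hypothetical, and the caller is assumed to have already mmap()ed the file and touched every page so that the mapping is populated.

```python
import struct

PAGE_SIZE = 4096             # assumed; use os.sysconf("SC_PAGE_SIZE") in real use
ENTRY = struct.Struct("=Q")  # both files hold native-endian 64-bit entries

PM_PRESENT = 1 << 63         # pagemap bit 63: page is present in RAM
PM_PFN_MASK = (1 << 55) - 1  # pagemap bits 0-54: page frame number
KPF_ERROR = 1 << 1           # kpageflags bit 1: I/O error recorded on the page


def read_entry(f, index):
    """Read the index-th 64-bit entry from a pagemap- or kpageflags-like file."""
    f.seek(index * ENTRY.size)
    data = f.read(ENTRY.size)
    if len(data) != ENTRY.size:
        raise EOFError(f"no entry at index {index}")
    return ENTRY.unpack(data)[0]


def failed_offsets(pagemap, kpageflags, base_vaddr, npages):
    """Return offsets (relative to the mapping start) of pages with the error flag set.

    pagemap and kpageflags are seekable binary file objects, e.g.
    /proc/self/pagemap and /proc/kpageflags opened with buffering=0.
    base_vaddr is the page-aligned virtual address where the file is mapped.
    """
    failed = []
    for i in range(npages):
        entry = read_entry(pagemap, base_vaddr // PAGE_SIZE + i)
        if not entry & PM_PRESENT:
            continue  # page not mapped in this process: nothing to inspect
        pfn = entry & PM_PFN_MASK
        if pfn == 0:
            continue  # PFN hidden from us (reader lacks privileges)
        if read_entry(kpageflags, pfn) & KPF_ERROR:
            failed.append(i * PAGE_SIZE)
    return failed
```

Reading /proc/kpageflags requires root, and recent kernels hide PFNs in pagemap from unprivileged processes, so this would only be usable from a privileged helper. Whether the error flag is set and kept on failed pages is kernel-dependent, which is the uncertainty raised in the reply below.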

That sounds like a huge amount of complexity, with uncertainty as to how it'll behave from kernel to kernel, for negligible benefit.

I was exploring the idea of doing selective recovery of one relfilenode, based on the assumption that we know the filenode related to the fd that failed to fsync(). We could redo only the WAL for that relation. But it fails the same test: it's too complex for a niche case that shouldn't happen in the first place, so it'll probably have bugs, or grow them through bitrot over time.

Remember, if you're on ext4 with errors=remount-ro, you get shut down even harder than a PANIC. So we should just use the big hammer here.
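As a rough illustration of the "big hammer", the sketch below treats any fsync() failure as fatal instead of retrying. This is not the patch mentioned in this mail; `fsync_or_panic` and its `panic` parameter are hypothetical names. The reasoning it encodes is the one from this thread: because the kernel does not re-dirty the failed pages, a retried fsync() may report success even though the data never reached disk, so the only safe reaction is to stop and recover from WAL.

```python
import os
import sys


def fsync_or_panic(fd, panic=os.abort):
    """Flush fd to stable storage; on any failure, call panic() and do not retry.

    panic defaults to os.abort(), which terminates the process immediately.
    The parameter exists only so the behaviour can be exercised in tests.
    """
    try:
        os.fsync(fd)
    except OSError as exc:
        # Do NOT loop and call fsync() again here: a second call may succeed
        # without the lost writes ever having been persisted.
        sys.stderr.write(f"PANIC: fsync failed on fd {fd}: {exc}\n")
        panic()
```

In a server, the crash would be followed by normal crash recovery, which replays WAL from the last checkpoint and so rewrites the pages whose writeback failed.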

I'll send a patch this week.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
