Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Thomas Munro
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Msg-id: CAEepm=2KSqu-fj8gEbLSE=uNcWWWpZ4bcjFtqYTGSCp0Lr_cSw@mail.gmail.com
In reply to: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS (Craig Ringer <craig@2ndquadrant.com>)
Responses: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-hackers
On Tue, Apr 3, 2018 at 3:03 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> I see little benefit to not just PANICing unconditionally on EIO, really. It
> shouldn't happen, and if it does, we want to be pretty conservative and
> adopt a data-protective approach.
>
> I'm rather more worried by doing it on ENOSPC. Which looks like it might be
> necessary from what I recall finding in my test case + kernel code reading.
> I really don't want to respond to a possibly-transient ENOSPC by PANICing
> the whole server unnecessarily.

Yeah, it'd be nice to give an administrator the chance to free up some
disk space after ENOSPC is reported, and stay up.  Running out of
space really shouldn't take down the database without warning!  The
question is whether the data remains in the kernel's cache and still
marked dirty, so that retrying is a safe option (since it's potentially
gone from our own buffers; if the OS doesn't have it either, the only
place your committed data can definitely still be found is the WAL...
at recovery time).  Who can tell us?  Do we need a per-filesystem answer?  Delayed
allocation is a somewhat filesystem-specific thing, so maybe.
Interestingly, there don't seem to be many operating systems that can
report ENOSPC from fsync(), based on a quick scan through some
documentation:

POSIX, AIX, HP-UX, FreeBSD, OpenBSD, NetBSD: no
Illumos/Solaris, Linux, macOS: yes
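
To make the policy question concrete, here is a minimal C sketch of an fsync() wrapper that PANICs (modelled here as abort()) on EIO and on anything unexpected, and treats ENOSPC as retryable only when told that is safe. The names classify_fsync_result(), fsync_with_policy() and the enospc_retry_is_safe flag are hypothetical, not PostgreSQL code; whether such a flag could ever be set truthfully for a given filesystem is exactly the open question.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical outcome categories; not PostgreSQL's actual API. */
typedef enum {
    FSYNC_OK,          /* kernel reported the data durable */
    FSYNC_PANIC,       /* conservative path: crash, replay WAL on restart */
    FSYNC_MAYBE_RETRY  /* possibly transient (ENOSPC) */
} fsync_outcome;

/* Map an fsync() return code and errno value to an outcome. */
static fsync_outcome classify_fsync_result(int rc, int err)
{
    if (rc == 0)
        return FSYNC_OK;
    if (err == ENOSPC)
        return FSYNC_MAYBE_RETRY;
    /* EIO and anything unexpected: assume the dirty data may be gone. */
    return FSYNC_PANIC;
}

/*
 * Returns 0 on success, -1 if the caller should retry later.
 * enospc_retry_is_safe is a hypothetical per-filesystem flag: set it only
 * if the kernel is known to keep the pages dirty after reporting ENOSPC.
 */
static int fsync_with_policy(int fd, int enospc_retry_is_safe)
{
    int rc = fsync(fd);
    int err = (rc == 0) ? 0 : errno;  /* save errno before any other call */

    switch (classify_fsync_result(rc, err)) {
    case FSYNC_OK:
        return 0;
    case FSYNC_MAYBE_RETRY:
        if (enospc_retry_is_safe) {
            fprintf(stderr, "WARNING: fsync: %s; retrying later\n",
                    strerror(err));
            return -1;
        }
        /* fall through: without that guarantee, be conservative */
    case FSYNC_PANIC:
    default:
        fprintf(stderr, "PANIC: fsync: %s\n", strerror(err));
        abort();
    }
}
```

The design choice is that "unknown" defaults to PANIC: only the one errno that might be transient gets a softer path, and only behind an explicit assertion about the filesystem's behaviour.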

I don't know if macOS really means it or not; it just tells you to see
the errors for read(2) and write(2).  By the way, speaking of macOS, I
was curious to see if the common BSD heritage would show here.  Yeah,
somewhat.  It doesn't appear to keep buffers on writeback error, if
this is the right code[1] (though it could be handling it somewhere
else for all I know).
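
A toy model may help show why retry safety hinges on the dirty flag. This is a hypothetical simulation, not kernel code: one cached page and two assumed policies for a failed writeback. If the kernel drops the page on error (as the Darwin code in [1] appears to, going by a quick read), the retried fsync() finds nothing dirty and reports success even though the data never reached disk.

```c
#include <stdbool.h>

/* Toy model of one cached page; not real kernel code. */
typedef struct {
    bool dirty;    /* page still queued for writeback */
    bool on_disk;  /* latest contents actually persisted */
} page_t;

/* Hypothetical kernel policies after a failed writeback. */
typedef enum { KEEP_DIRTY_ON_ERROR, DROP_ON_ERROR } error_policy;

/* Simulated fsync(): writes back the page if dirty; returns 0 or -1. */
static int toy_fsync(page_t *pg, bool device_ok, error_policy policy)
{
    if (!pg->dirty)
        return 0;              /* nothing to write, so "success" */
    if (device_ok) {
        pg->on_disk = true;
        pg->dirty = false;
        return 0;
    }
    if (policy == DROP_ON_ERROR)
        pg->dirty = false;     /* data discarded without being written */
    return -1;
}

/* Fail the first fsync, retry once the device recovers.
   Returns true only if the retry's success reflects data on disk. */
static bool retry_after_error_is_safe(error_policy policy)
{
    page_t pg = { .dirty = true, .on_disk = false };
    int first = toy_fsync(&pg, false, policy);   /* e.g. EIO or ENOSPC */
    int second = toy_fsync(&pg, true, policy);   /* the retry */
    return first == -1 && second == 0 && pg.on_disk;
}
```

Under KEEP_DIRTY_ON_ERROR the retry writes the page; under DROP_ON_ERROR the retry returns 0 while on_disk stays false, which is the silent loss a retry-based ENOSPC policy would have to rule out.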

[1] https://github.com/apple/darwin-xnu/blob/master/bsd/vfs/vfs_bio.c#L2695

-- 
Thomas Munro
http://www.enterprisedb.com

