Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

Поиск
Список
Период
Сортировка
От Craig Ringer
Тема Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Дата
Msg-id CAMsr+YHXQyRAf6YVDUN_MbHvjLFVPpzh83c1HYKnUw+UM5=RRA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Andres Freund <andres@anarazel.de>)
Ответы Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Andres Freund <andres@anarazel.de>)
Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Greg Stark <stark@mit.edu>)
Список pgsql-hackers
On 10 April 2018 at 04:37, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2018-04-09 22:30:00 +0200, Tomas Vondra wrote:
>> Maybe. I'd certainly prefer automated recovery from an temporary I/O
>> issues (like full disk on thin-provisioning) without the database
>> crashing and restarting. But I'm not sure it's worth the effort.
>
> Oh, I agree on that one. But that's more a question of how we force the
> kernel's hand on allocating disk space. In most cases the kernel
> allocates the disk space immediately, even if delayed allocation is in
> effect. For the cases where that's not the case (if there are current
> ones, rather than just past bugs), we should be able to make sure that's
> not an issue by pre-zeroing the data and/or using fallocate.

Nitpick: In most cases the kernel reserves disk space immediately,
before returning from write(). NFS seems to be the main exception
here.

EXT4 and XFS don't allocate until later, it by performing actual
writes to FS metadata, initializing disk blocks, etc. So we won't
notice errors that are only detectable at actual time of allocation,
like thin provisioning problems, until after write() returns and we
face the same writeback issues.

So I reckon you're safe from space-related issues if you're not on NFS
(and whyyy would you do that?) and not thinly provisioned. I'm sure
there are other corner cases, but I don't see any reason to expect
space-exhaustion-related corruption problems on a sensible FS backed
by a sensible block device. I haven't tested things like quotas,
verified how reliable space reservation is under concurrency, etc as
yet.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: Excessive PostmasterIsAlive calls slow down WAL redo
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Excessive PostmasterIsAlive calls slow down WAL redo