Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From Andres Freund
Subject Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Msg-id 9CE3ABD7-72DD-4D11-A940-7B56E090D11C@anarazel.de
In reply to Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers

On April 9, 2018 6:59:03 PM PDT, Craig Ringer <craig@2ndquadrant.com> wrote:
>On 10 April 2018 at 04:37, Andres Freund <andres@anarazel.de> wrote:
>> Hi,
>>
>> On 2018-04-09 22:30:00 +0200, Tomas Vondra wrote:
>>> Maybe. I'd certainly prefer automated recovery from temporary I/O
>>> issues (like full disk on thin-provisioning) without the database
>>> crashing and restarting. But I'm not sure it's worth the effort.
>>
>> Oh, I agree on that one. But that's more a question of how we force the
>> kernel's hand on allocating disk space. In most cases the kernel
>> allocates the disk space immediately, even if delayed allocation is in
>> effect. For the cases where that's not the case (if there are current
>> ones, rather than just past bugs), we should be able to make sure that's
>> not an issue by pre-zeroing the data and/or using fallocate.
>
>Nitpick: In most cases the kernel reserves disk space immediately,
>before returning from write(). NFS seems to be the main exception
>here.
>
>EXT4 and XFS don't allocate until later, by performing actual
>writes to FS metadata, initializing disk blocks, etc. So we won't
>notice errors that are only detectable at the actual time of allocation,
>like thin provisioning problems, until after write() returns and we
>face the same writeback issues.
>
>So I reckon you're safe from space-related issues if you're not on NFS
>(and whyyy would you do that?) and not thinly provisioned. I'm sure
>there are other corner cases, but I don't see any reason to expect
>space-exhaustion-related corruption problems on a sensible FS backed
>by a sensible block device. I haven't tested things like quotas,
>verified how reliable space reservation is under concurrency, etc as
>yet.

How's that not solved by pre-zeroing and/or fallocate as I suggested above?
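
To make the suggestion concrete, here is a rough sketch of what "pre-zeroing and/or fallocate" could look like for a new file. It is an illustration in Python, not PostgreSQL code: the names preallocate, zero_fill and CHUNK are made up for this example, and the set of errno values treated as "fallocate not supported here" is an assumption. The idea is that space problems such as ENOSPC show up as an error at preallocation time instead of later during writeback.

```python
import errno
import os

CHUNK = 8192  # hypothetical block size, chosen only for this sketch


def zero_fill(fd, size, chunk=CHUNK):
    """Pre-zero the first `size` bytes of fd with explicit writes."""
    zeros = bytes(chunk)
    os.lseek(fd, 0, os.SEEK_SET)
    remaining = size
    while remaining > 0:
        # os.write may write fewer bytes than requested, so loop on the count.
        written = os.write(fd, zeros[:min(chunk, remaining)])
        remaining -= written


def preallocate(path, size, chunk=CHUNK):
    """Reserve `size` bytes for a NEW file and return the method used.

    Tries posix_fallocate first and falls back to zero-filling when the
    platform or filesystem does not support it. Any other error
    (for example ENOSPC) is raised to the caller.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        method = "zero-fill"
        if hasattr(os, "posix_fallocate"):
            try:
                os.posix_fallocate(fd, 0, size)
                method = "fallocate"
            except OSError as exc:
                # Assumed "unsupported" errors; everything else propagates.
                if exc.errno not in (errno.EOPNOTSUPP, errno.ENOSYS, errno.EINVAL):
                    raise
        if method == "zero-fill":
            zero_fill(fd, size, chunk)
        # Flush now, so a failure is reported here rather than much later.
        os.fsync(fd)
        return method
    finally:
        os.close(fd)
```

Usage would be something like preallocate("segment.tmp", 16 * 1024 * 1024) before any real data is written to the file. Note that the zero-fill path overwrites from offset 0, so it is only meant for freshly created files.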

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

