Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Antonis Iliopoulos
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Msg-id: CAN+tDYwLHyXKCMDk_DKV1Lujt5qmNxm1AStqBLwhFNQ6ov25pg@mail.gmail.com
In response to: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Craig Ringer <craig@2ndquadrant.com>)
List: pgsql-hackers


On Wed, Apr 4, 2018 at 4:42 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>
> On 4 April 2018 at 22:25, Bruce Momjian <bruce@momjian.us> wrote:
>>
>> On Wed, Apr  4, 2018 at 10:09:09PM +0800, Craig Ringer wrote:
>> > On 4 April 2018 at 22:00, Craig Ringer <craig@2ndquadrant.com> wrote:
>> >  
>> >
>> >     It's the error reporting issues around closing and reopening files with
>> >     outstanding buffered I/O that's really going to hurt us here. I'll be
>> >     expanding my test case to cover that shortly.
>> >
>> >
>> >
>> > Also, just to be clear, this is not in any way confined to xfs and/or lvm as I
>> > originally thought it might be.
>> >
>> > Nor is ext3/ext4's errors=remount-ro protective. data_err=abort doesn't help
>> > either (so what does it do?).
>>
>> Anthony Iliopoulos reported in this thread that errors=remount-ro is
>> only affected by metadata writes.
>
>
> Yep, I gathered. I was referring to data_err.  

As far as I recall, data_err=abort pertains to the jbd2 handling of
potential writeback errors. jbd2 will internally attempt to drain
the data upon txn commit (and it's even kind enough to restore
the EIO at the address space level, which would otherwise get eaten).

When data_err=abort is set, jbd2 forcibly shuts down the
entire journal, with the error being propagated upwards to ext4.
I am not sure at which point this would be manifested to userspace
and how, but in principle any subsequent fs operations would get
some filesystem error due to the journal being down (I would
assume similar to remounting the fs read-only).
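
One way to observe what userspace actually gets back is a small probe
that writes a file, calls fsync(), and records which step failed and
with which errno. The following is only a minimal sketch: the function
name probe_write and the path used are hypothetical, and it makes no
assumption about which error ext4 reports once the journal has been
aborted, since that is exactly the open question here.

```python
import errno
import os


def probe_write(path, payload):
    """Write payload to path and fsync it.

    Returns None on success, or a (step, errno_name) tuple naming the
    first call that failed. probe_write is a hypothetical helper, not
    part of PostgreSQL or of any test case posted in this thread.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    except OSError as exc:
        return ("open", errno.errorcode.get(exc.errno, str(exc.errno)))

    try:
        step = "write"
        view = memoryview(payload)
        while len(view) > 0:
            # os.write() may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        step = "fsync"
        os.fsync(fd)
    except OSError as exc:
        return (step, errno.errorcode.get(exc.errno, str(exc.errno)))
    finally:
        try:
            os.close(fd)
        except OSError:
            # Ignored in this sketch; a real test should report it too.
            pass
    return None


if __name__ == "__main__":
    # Hypothetical path; point it at the filesystem under test.
    print(probe_write("/tmp/fsync_probe.dat", b"x" * 8192))
```

Running it repeatedly against the affected mount, before and after the
injected I/O error, would show whether the failure surfaces at open(),
write(), or fsync(), and with which errno.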

Since you are using data=journal, I would indeed expect to see
something more than what you saw in dmesg.

I can have a look later; I also plan to respond to some of the other
interesting issues that you guys raised in the thread.

Best regards,
Anthony
