Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

Поиск
Список
Период
Сортировка
От Craig Ringer
Тема Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Дата
Msg-id CAMsr+YHtosoQKzHh-nAmyG75cAPTzTtwyk871d+1O-sNQRdeyg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Thomas Munro <thomas.munro@enterprisedb.com>)
Ответы Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Thomas Munro <thomas.munro@enterprisedb.com>)
Список pgsql-hackers
On 2 April 2018 at 02:24, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
 

Maybe my drive-by assessment of those kernel routines is wrong and
someone will correct me, but I'm starting to think you might be better
to assume the worst on all systems.  Perhaps a GUC that defaults to
panicking, so that users on those rare OSes could turn that off?  Even
then I'm not sure if the failure mode will be that great anyway or if
it's worth having two behaviours.  Thoughts?


I see little benefit to not just PANICing unconditionally on EIO, really. It shouldn't happen, and if it does, we want to be pretty conservative and adopt a data-protective approach.

I'm rather more worried by doing it on ENOSPC. Which looks like it might be necessary from what I recall finding in my test case + kernel code reading. I really don't want to respond to a possibly-transient ENOSPC by PANICing the whole server unnecessarily.

BTW, the support team at 2ndQ is presently working on two separate issues where ENOSPC resulted in DB corruption, though neither of them involve logs of lost page writes. I'm planning on taking some time tomorrow to write a torture tester for Pg's ENOSPC handling and to verify ENOSPC handling in the test case I linked to in my original StackOverflow post.

If this is just an EIO issue then I see no point doing anything other than PANICing unconditionally.

If it's a concern for ENOSPC too, we should try harder to fail more nicely whenever we possibly can.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: [HACKERS] Partition-wise aggregation/grouping
Следующее
От: "Bossart, Nathan"
Дата:
Сообщение: Re: Change RangeVarGetRelidExtended() to take flags argument?