Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

Поиск

Список

Период

Сортировка

От	Mark Dilger
Тема	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Дата	10 апреля 2018 г. 03:33:29
Msg-id	2CA8B3CF-673B-4A93-A3C0-B0F3FA43040E@gmail.com обсуждение исходный текст
Ответ на	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список	pgsql-hackers

Дерево обсуждения

> On Apr 9, 2018, at 2:25 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
>
>
> On 04/09/2018 11:08 PM, Andres Freund wrote:
>> Hi,
>>
>> On 2018-04-09 13:55:29 -0700, Mark Dilger wrote:
>>> I can also imagine a master and standby that are similarly provisioned,
>>> and thus hit an out of disk error at around the same time, resulting in
>>> corruption on both, even if not the same corruption.
>>
>> I think it's a grave mistake conflating ENOSPC issues (which we should
>> solve by making sure there's always enough space pre-allocated), with
>> EIO type errors.  The problem is different, the solution is different.

I'm happy to take your word for that.

> In any case, that certainly does not count as data corruption spreading
> from the master to standby.

Maybe not from the point of view of somebody looking at the code.  But a
user might see it differently.  If the data being loaded into the master
and getting replicated to the standby "causes" both to get corrupt, then
it seems like corruption spreading.  I put "causes" in quotes because there
is some argument to be made about "correlation does not prove cause" and so
forth, but it still feels like causation from an arms length perspective.
If there is a pattern of standby servers tending to fail more often right
around the time that the master fails, you'll have a hard time comforting
users, "hey, it's not technically causation."  If loading data into the
master causes the master to hit ENOSPC, and replicating that data to the
standby causes the standby to hit ENOSPC, and if the bug abound ENOSPC has
not been fixed, then this looks like corruption spreading.

I'm certainly planning on taking a hard look at the disk allocation on my
standby servers right soon now.

mark

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Michael Paquier
Дата: 10 апреля 2018 г., 03:26:13
Сообщение: Re: Fix pg_rewind which can be run as root user

Следующее

От: Daniel Gustafsson
Дата: 10 апреля 2018 г., 03:50:15
Сообщение: Re: [HACKERS] Optional message to user when terminating/cancelling backend

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

Предыдущее

Следующее