Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

Поиск
Список
Период
Сортировка
От Gasper Zejn
Тема Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Дата
Msg-id 75bfe2e2-90b0-d411-56b3-14d440c6b5b0@owca.info
обсуждение исходный текст
Ответ на Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
On 09. 04. 2018 15:42, Tomas Vondra wrote:
> On 04/09/2018 12:29 AM, Bruce Momjian wrote:
>> An crazy idea would be to have a daemon that checks the logs and
>> stops Postgres when it seems something wrong.
>>
> That doesn't seem like a very practical way. It's better than nothing,
> of course, but I wonder how would that work with containers (where I
> think you may not have access to the kernel log at all). Also, I'm
> pretty sure the messages do change based on kernel version (and possibly
> filesystem) so parsing it reliably seems rather difficult. And we
> probably don't want to PANIC after I/O error on an unrelated device, so
> we'd need to understand which devices are related to PostgreSQL.
>
> regards
>

For a bit less (or more) crazy idea, I'd imagine creating a Linux kernel
module with kprobe/kretprobe capturing the file passed to fsync or even
byte range within file and corresponding return value shouldn't be that
hard. Kprobe has been a part of Linux kernel for a really long time, and
from first glance it seems like it could be backported to 2.6 too.

Then you could have stable log messages or implement some kind of "fsync
error log notification" via whatever is the most sane way to get this
out of kernel.

If the kernel is new enough and has eBPF support (seems like >=4.4),
using bcc-tools[1] should enable you to write a quick script to get
exactly that info via perf events[2].

Obviously, that's a stopgap solution ...


Kind regards,
Gasper


[1] https://github.com/iovisor/bcc
[2]
https://blog.yadutaf.fr/2016/03/30/turn-any-syscall-into-event-introducing-ebpf-kernel-probes/


В списке pgsql-hackers по дате отправления:

Предыдущее
От: John Naylor
Дата:
Сообщение: Re: Documentation for bootstrap data conversion
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)