Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"

From Alexander Lakhin
Subject Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"
Date
Msg-id 11523fe8-7614-9d57-1ad5-c12a4c4ec9cf@gmail.com
In reply to Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-bugs
Hello,
22.06.2021 16:00, Thomas Munro wrote:
> On Tue, Jun 22, 2021 at 9:30 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> Hmm, the simplest explanation would be that the read() or write() on the
>> relmapper file is not atomic. We assume that it is, and don't use a lock
>> in load_relmap_file() because of that. Is there anything unusual about
>> the filesystem, mount options or the kernel you're using? I could not
>> reproduce this on my laptop. Does the attached patch fix it for you?
> I have managed to reproduce this twice on a laptop running Linux
> 5.10.0-2-amd64, after trying many things for several hours.  Both
> times I was using ext4 in a loopback file (underlying is xfs, I had no
> luck there hence hunch that I should try ext4, may not be significant
> though) with fsync=off (ditto).
I'm sorry, I forgot that I had set "fsync=off" in my postgresql.conf (to
avoid an NVMe-specific slowdown on fsyncs).
It really does matter: with fsync=on the demo script passes 20
iterations successfully.
I can reproduce the issue on Ubuntu 20.04 with kernel 5.9.15, ext4
(without any specific options) on NVMe storage, and a Ryzen 3700X.
It was first encountered on Debian 10 with kernel 4.19.0, ext4 on
software RAID, also built on NVMe storage, and a Xeon 5220.

The attached patch fixes it for me (with fsync=off): three runs of 20
iterations each completed without the error (without the patch I get the
error on the first iteration).
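
To illustrate the hypothesis quoted above (a read() that is not atomic with
respect to a concurrent write()), here is a minimal Python sketch of why a
reader that sees a mix of the old and new file images would report an
incorrect checksum. It is not PostgreSQL code: the payload size, the helper
names and the use of zlib.crc32 as a stand-in checksum are all assumptions
made for the example, and the torn read is simulated rather than provoked.

```python
import struct
import zlib

PAYLOAD_SIZE = 508  # hypothetical payload size; a 4-byte checksum follows it


def build_map(fill: int) -> bytes:
    """Return a payload of repeated `fill` bytes followed by its CRC-32."""
    payload = bytes([fill]) * PAYLOAD_SIZE
    return payload + struct.pack("<I", zlib.crc32(payload))


def checksum_ok(buf: bytes) -> bool:
    """Recompute the CRC-32 of the payload and compare it with the stored one."""
    payload = buf[:-4]
    (stored,) = struct.unpack("<I", buf[-4:])
    return zlib.crc32(payload) == stored


def torn_read(old: bytes, new: bytes, split: int) -> bytes:
    """Simulate a reader that sees the first `split` bytes of the new image
    and the remaining bytes of the old image."""
    return new[:split] + old[split:]


old_image = build_map(0x11)  # file contents before the update
new_image = build_map(0x22)  # file contents after the update
mixed_image = torn_read(old_image, new_image, 256)

print("old image valid:  ", checksum_ok(old_image))
print("new image valid:  ", checksum_ok(new_image))
print("mixed image valid:", checksum_ok(mixed_image))
```

Each complete image validates on its own; only the mixed one fails, which
is the symptom a reader would report if it raced with a writer. Serializing
the reader and the writer (for example with a lock) would rule out the
mixed image, under the assumption that the race is indeed the cause.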

Best regards,
Alexander


