Changing default value of wal_sync_method to open_datasync on Linux

Hello,

I propose changing the default value of wal_sync_method from fdatasync to open_datasync on Linux.  The patch is
attached.  I suspect this may be controversial, so I'd like to hear your opinions.
 

The reason for the change is better performance.  Robert Haas said open_datasync was much faster than fdatasync with
NVRAM in this thread:
 

https://www.postgresql.org/message-id/flat/C20D38E97BCB33DAD59E3A1@lab.ntt.co.jp#C20D38E97BCB33DAD59E3A1@lab.ntt.co.jp

pg_test_fsync shows higher figures for open_datasync:

[SSD on bare metal, ext4 volume mounted with noatime,nobarrier,data=ordered]
--------------------------------------------------
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                     50829.597 ops/sec      20 usecs/op
        fdatasync                         42094.381 ops/sec      24 usecs/op
        fsync                             42209.972 ops/sec      24 usecs/op
        fsync_writethrough                              n/a
        open_sync                         48669.605 ops/sec      21 usecs/op
--------------------------------------------------


[HDD on VM, ext4 volume mounted with noatime,nobarrier,data=writeback]
(the figures seem oddly high, though; this may be due to some VM configuration)
--------------------------------------------------
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                     34648.778 ops/sec      29 usecs/op
        fdatasync                         31570.947 ops/sec      32 usecs/op
        fsync                             27783.283 ops/sec      36 usecs/op
        fsync_writethrough                              n/a
        open_sync                         35238.866 ops/sec      28 usecs/op
--------------------------------------------------
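
For readers who want to see what the two methods do at the system call level, here is a rough sketch in Python (Linux only).  It is not how pg_test_fsync is implemented; the function names and the loop count are made up for illustration, and O_DIRECT is deliberately left out because it requires aligned buffers.  The fdatasync method issues write() and then fdatasync(), while open_datasync opens the file with O_DSYNC so that each write() returns only after the data has been synced.

```python
import os
import tempfile
import time

BLOCK = b"\0" * 8192  # one 8kB write, like the first pg_test_fsync test


def time_fdatasync(path, n):
    """write() followed by an explicit fdatasync() for every block."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(n):
            os.pwrite(fd, BLOCK, 0)  # rewrite the same block at offset 0
            os.fdatasync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def time_open_datasync(path, n):
    """File opened with O_DSYNC: each write() returns after the data is synced."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(n):
            os.pwrite(fd, BLOCK, 0)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def compare(directory, n=200):
    """Return ops/sec for both methods, using scratch files in `directory`."""
    results = {}
    methods = (("fdatasync", time_fdatasync), ("open_datasync", time_open_datasync))
    for name, fn in methods:
        fd, path = tempfile.mkstemp(dir=directory)
        os.close(fd)
        try:
            results[name] = n / fn(path, n)
        finally:
            os.unlink(path)
    return results
```

Calling compare() on a directory that lives on the WAL volume returns a dict of ops/sec per method.  The absolute numbers will not match pg_test_fsync, which remains the tool to trust; the sketch only shows where the one-system-call-per-write saving comes from.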


pgbench shows only marginally better results, and the difference is within the margin of error.  The following is the
tps of the default read/write workload of pgbench.  I ran the test with all the tables and indexes preloaded with
pg_prewarm (except pgbench_history), and with no checkpoint occurring.  I ran a write workload before the benchmark
so that no new WAL file would be created during the benchmark run.
 

[SSD on bare metal, ext4 volume mounted with noatime,nobarrier,data=ordered]
--------------------------------------------------
                   1      2      3    avg
fdatasync      17610  17164  16678  17150
open_datasync  17847  17457  17958  17754 (+3%)

[HDD on VM, ext4 volume mounted with noatime,nobarrier,data=writeback]
(the figures seem oddly high, though; this may be due to some VM configuration)
--------------------------------------------------
                  1     2     3   avg
fdatasync      4911  5225  5198  5111
open_datasync  4996  5284  5317  5199 (+1%)
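
To make the "within the margin of error" reading concrete, the tps figures above can be summarized with a few lines of Python.  This is only a sketch over the three runs per method listed in the tables; the function and key names are made up for illustration.  The gain works out to roughly 3.5% on the SSD and 1.7% on the VM, and in both cases the spread between individual fdatasync runs (932 and 314 tps) is larger than the difference between the averages (about 603 and 88 tps).

```python
from statistics import mean

# tps figures copied from the two pgbench tables above
RUNS = {
    "SSD": {
        "fdatasync": [17610, 17164, 16678],
        "open_datasync": [17847, 17457, 17958],
    },
    "HDD": {
        "fdatasync": [4911, 5225, 5198],
        "open_datasync": [4996, 5284, 5317],
    },
}


def summarize(runs):
    """Compare the mean gain with the run-to-run spread of the baseline."""
    base = mean(runs["fdatasync"])
    new = mean(runs["open_datasync"])
    return {
        "gain_pct": 100.0 * (new - base) / base,
        "mean_diff": new - base,
        "fdatasync_spread": max(runs["fdatasync"]) - min(runs["fdatasync"]),
    }


for device, runs in RUNS.items():
    print(device, summarize(runs))
```

With only three runs per configuration this is not a significance test, just a quick check that the noise between runs is larger than the measured gain.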


As the removed comment describes, when wal_sync_method is open_datasync (or open_sync), open() fails with errno=EINVAL
if the ext4 volume is mounted with data=journal.  That's because open() specifies O_DIRECT in that case.  I don't think
that's a problem in practice, because data=journal is not a setting chosen for performance, and O_DIRECT is used only
if wal_level is changed from its default, replica, to minimal and max_wal_senders is set to 0.
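
Anyone who wants to check a particular volume could probe it with something like the following Python sketch (Linux only).  The helper name is hypothetical and this is not code from the patch; it simply tries to open a scratch file with O_DSYNC | O_DIRECT and reports whether the kernel rejects the open with EINVAL, which is the failure described above.

```python
import errno
import os
import tempfile


def accepts_o_direct_dsync(directory):
    """Return True if a file in `directory` opens with O_DSYNC | O_DIRECT."""
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        try:
            fd = os.open(path, os.O_WRONLY | os.O_DSYNC | os.O_DIRECT)
        except OSError as exc:
            if exc.errno == errno.EINVAL:
                # The filesystem (or its mount options) refuses O_DIRECT.
                return False
            raise  # any other error is unexpected, so let it propagate
        os.close(fd)
        return True
    finally:
        os.unlink(path)
```

A False result for the directory holding pg_wal would suggest that open_datasync with O_DIRECT cannot be used on that volume.  Other filesystems that lack O_DIRECT support may also return False, so the result says nothing specific about data=journal.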
 


Regards
Takayuki Tsunakawa

