Re: Changing default value of wal_sync_method to open_datasync on Linux

From: Mark Kirkwood
Subject: Re: Changing default value of wal_sync_method to open_datasync on Linux
Msg-id: b7422f6a-91dc-c562-7315-aa6ca64cab5c@catalyst.net.nz
In reply to: Changing default value of wal_sync_method to open_datasync on Linux ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>)
Responses: RE: Changing default value of wal_sync_method to open_datasync on Linux
List: pgsql-hackers
On 20/02/18 13:27, Tsunakawa, Takayuki wrote:

> Hello,
>
> I propose changing the default value of wal_sync_method from fdatasync to open_datasync on Linux.  The patch is
> attached. I'm feeling this may be controversial, so I'd like to hear your opinions.
>
> The reason for change is better performance.  Robert Haas said open_datasync was much faster than fdatasync with
> NVRAM in this thread:
>
>
> https://www.postgresql.org/message-id/flat/C20D38E97BCB33DAD59E3A1@lab.ntt.co.jp#C20D38E97BCB33DAD59E3A1@lab.ntt.co.jp
>
> pg_test_fsync shows higher figures for open_datasync:
>
> [SSD on bare metal, ext4 volume mounted with noatime,nobarrier,data=ordered]
> --------------------------------------------------
> 5 seconds per test
> O_DIRECT supported on this platform for open_datasync and open_sync.
>
> Compare file sync methods using one 8kB write:
> (in wal_sync_method preference order, except fdatasync is Linux's default)
>          open_datasync                     50829.597 ops/sec      20 usecs/op
>          fdatasync                         42094.381 ops/sec      24 usecs/op
>          fsync                             42209.972 ops/sec      24 usecs/op
>          fsync_writethrough                            n/a
>          open_sync                         48669.605 ops/sec      21 usecs/op
> --------------------------------------------------
>
>
> [HDD on VM, ext4 volume mounted with noatime,nobarrier,data=writeback]
> (the figures seem oddly high, though; this may be due to some VM configuration)
> --------------------------------------------------
> 5 seconds per test
> O_DIRECT supported on this platform for open_datasync and open_sync.
>
> Compare file sync methods using one 8kB write:
> (in wal_sync_method preference order, except fdatasync is Linux's default)
>          open_datasync                     34648.778 ops/sec      29 usecs/op
>          fdatasync                         31570.947 ops/sec      32 usecs/op
>          fsync                             27783.283 ops/sec      36 usecs/op
>          fsync_writethrough                            n/a
>          open_sync                         35238.866 ops/sec      28 usecs/op
> --------------------------------------------------
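
To make the two methods being compared concrete, here is a rough Python sketch of what each does per operation. It is a hypothetical approximation loosely modeled on pg_test_fsync's "one 8kB write" test, not its actual code: the function name `ops_per_sec` and the test file path are illustrative, it assumes Linux (O_DSYNC, pwrite, fdatasync), and unlike pg_test_fsync it does not add O_DIRECT for open_datasync.

```python
import os
import tempfile
import time

BLOCK = b"\0" * 8192  # a single 8 kB write per operation


def ops_per_sec(path: str, method: str, seconds: float = 1.0) -> float:
    """Approximate sync operations per second for one method (sketch only)."""
    flags = os.O_WRONLY | os.O_CREAT
    if method == "open_datasync":
        # O_DSYNC: each write() waits until the data (and any metadata
        # needed to read it back) has been flushed, so no separate sync call
        flags |= os.O_DSYNC
    elif method != "fdatasync":
        raise ValueError(f"unsupported method: {method}")
    fd = os.open(path, flags, 0o600)
    try:
        ops = 0
        start = time.monotonic()
        while time.monotonic() - start < seconds:
            os.pwrite(fd, BLOCK, 0)  # overwrite the same 8 kB at offset 0
            if method == "fdatasync":
                # plain buffered write, then an explicit flush of the file's data
                os.fdatasync(fd)
            ops += 1
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
    return ops / elapsed


if __name__ == "__main__":
    # Hypothetical demo in the temp directory; point it at the volume under test instead.
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "sync_test")
        for m in ("open_datasync", "fdatasync"):
            print(f"{m:15s} {ops_per_sec(target, m, 0.5):10.1f} ops/sec")
```

The open_datasync path saves one system call per operation, which is one plausible reason it can score higher; how much that matters depends on whether the device honors cache flushes, which mount options such as nobarrier affect.
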
>
>
> pgbench only shows marginally better results, although the difference is within an error range.  The following is the
> tps of the default read/write workload of pgbench.  I ran the test with all the tables and indexes preloaded with
> pg_prewarm (except pgbench_history), and the checkpoint not happening.  I ran a write workload before running the
> benchmark so that no new WAL file would be created during the benchmark run.
>
> [SSD on bare metal, ext4 volume mounted with noatime,nobarrier,data=ordered]
> --------------------------------------------------
>                     1      2      3    avg
> fdatasync      17610  17164  16678  17150
> open_datasync  17847  17457  17958  17754 (+3%)
>
> [HDD on VM, ext4 volume mounted with noatime,nobarrier,data=writeback]
> (the figures seem oddly high, though; this may be due to some VM configuration)
> --------------------------------------------------
>                    1     2     3   avg
> fdatasync      4911  5225  5198  5111
> open_datasync  4996  5284  5317  5199 (+1%)
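
The averages and percentages in these tables can be rechecked with a few lines of Python. This is only a sketch: it assumes the averages were truncated to integers (which reproduces every avg figure shown), and it uses overlapping min/max ranges across the three runs as a crude stand-in for "within an error range", not a proper statistical test.

```python
# tps figures copied from the two pgbench tables above (three runs each)
RUNS = {
    "SSD": {"fdatasync": [17610, 17164, 16678],
            "open_datasync": [17847, 17457, 17958]},
    "HDD on VM": {"fdatasync": [4911, 5225, 5198],
                  "open_datasync": [4996, 5284, 5317]},
}


def summarize(base: list[int], new: list[int]) -> tuple[int, int, float, bool]:
    """Return (base avg, new avg, % gain, whether the per-run ranges overlap)."""
    base_avg = sum(base) // len(base)  # truncated, as the table's averages appear to be
    new_avg = sum(new) // len(new)
    gain = 100.0 * (new_avg - base_avg) / base_avg
    overlap = min(new) <= max(base) and min(base) <= max(new)
    return base_avg, new_avg, gain, overlap


for setup, r in RUNS.items():
    b, n, g, o = summarize(r["fdatasync"], r["open_datasync"])
    print(f"{setup}: {b} -> {n} tps ({g:+.1f}%), ranges overlap: {o}")
```

This gives about +3.5% for the SSD and about +1.7% for the HDD, so the "+1%" label appears truncated rather than rounded. In both setups the per-run ranges overlap, which is consistent with the caveat that the difference is within the noise.
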
>
>
> As the removed comment describes, when wal_sync_method is open_datasync (or open_sync), open() fails with
> errno=EINVAL if the ext4 volume is mounted with data=journal.  That's because open() specifies O_DIRECT in that case.  I
> don't think that's a problem in practice, because data=journal will not be used for performance, and wal_level needs to
> be changed from its default replica to minimal and max_wal_senders must be set to 0 for O_DIRECT to be used.
>
>
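
The O_DIRECT condition and the EINVAL failure described above can be illustrated with a small Python sketch. Both helpers are hypothetical: `wal_open_flags` encodes only the conditions stated in the mail (it is not PostgreSQL's source), and `o_direct_open_works` is a probe that reports whether open() with O_DIRECT | O_DSYNC succeeds in a given directory. It assumes Linux; where O_DIRECT is unavailable the probe only exercises O_DSYNC.

```python
import errno
import os
import tempfile

O_DIRECT = getattr(os, "O_DIRECT", 0)  # Linux-only flag; 0 where unavailable


def wal_open_flags(wal_level: str, max_wal_senders: int) -> int:
    """Hypothetical: open flags for open_datasync under the mail's stated conditions."""
    flags = os.O_WRONLY | os.O_DSYNC
    # O_DIRECT only when WAL need not be read back for archiving or streaming
    if wal_level == "minimal" and max_wal_senders == 0:
        flags |= O_DIRECT
    return flags


def o_direct_open_works(directory: str) -> bool:
    """Probe whether open() with O_DIRECT | O_DSYNC succeeds in `directory`.

    Per the mail, this fails with EINVAL on ext4 mounted with data=journal.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        probe = os.open(path, os.O_WRONLY | os.O_DSYNC | O_DIRECT)
    except OSError as exc:
        if exc.errno == errno.EINVAL:
            return False
        raise  # other errors (permissions etc.) are not what we are probing for
    else:
        os.close(probe)
        return True
    finally:
        os.unlink(path)


if __name__ == "__main__":
    # Illustrative: probe the temp directory; use the pg_wal directory in practice.
    print(o_direct_open_works(tempfile.gettempdir()))
```

Running the probe against the WAL directory before switching the default would show whether a given mount is affected.
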

I think the use of 'nobarrier' is probably disabling most/all reliable
writing to the devices. What do the numbers look like if you remove this
option?
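
Whether a given benchmark volume has barriers disabled can be checked from /proc/mounts. The Python sketch below is a hypothetical helper (function names and paths are illustrative): it finds the mount with the longest mount point containing a path and treats the ext4 options `nobarrier` or `barrier=0` as barriers being disabled. It takes the mounts text as an argument so it can be tested without a real system.

```python
import os
import re


def _unescape(field: str) -> str:
    # /proc/mounts encodes spaces, tabs etc. in paths as octal escapes such as \040
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def mount_options_for(path: str, mounts_text: str) -> list[str]:
    """Options of the mount containing `path` (longest matching mount point)."""
    path = os.path.abspath(path)  # use os.path.realpath if pg_wal is a symlink
    best, best_opts = "", []
    for line in mounts_text.splitlines():
        fields = line.split()  # device, mount point, fstype, options, dump, pass
        if len(fields) < 4:
            continue
        mnt = _unescape(fields[1])
        prefix = mnt.rstrip("/") + "/"
        if (path == mnt or path.startswith(prefix)) and len(mnt) >= len(best):
            best, best_opts = mnt, fields[3].split(",")
    return best_opts


def barriers_disabled(path: str, mounts_text: str) -> bool:
    opts = mount_options_for(path, mounts_text)
    return "nobarrier" in opts or "barrier=0" in opts


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        print(barriers_disabled(os.getcwd(), f.read()))
```
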

regards

Mark

