Discussion: wal_synch_method = open_sync safe on RHEL 5.5?

wal_synch_method = open_sync safe on RHEL 5.5?

From
Mark Kirkwood
Date:
Some more on the RHEL 5.5 system I'm helping to set up. Some benchmarking using pgbench appeared to suggest that wal_sync_method=open_sync was a little faster than fdatasync [1]. Now I recall some discussion about this enabling direct io and the general flakiness of this on Linux, so is the option regarded as safe?

[1] The workout:

$ pgbench -i -s 1000 bench
$ pgbench -c [1,2,4,8,32,64,128] -t 10000

Performance peaked around 2500 tps @32 clients using open_sync and 2200 with fdatasync. However the disk arrays are on a SAN and I suspect that when testing with fdatasync later in the day there may have been workload 'leakage' from other hosts hitting the SAN.
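
The bracketed client list in the workout above is shorthand for one pgbench run per client count. A minimal Python sketch of that sweep follows. It assumes the target database is `bench`, as in the initialization step (the original second command names no database). The helper names `pgbench_argv` and `sweep` are made up for illustration. The commands are only built and printed here, not executed.

```python
import shlex

# Client counts from the workout; "bench" is assumed to be the database
# created by `pgbench -i -s 1000 bench`.
CLIENT_COUNTS = [1, 2, 4, 8, 32, 64, 128]


def pgbench_argv(clients, transactions=10000, dbname="bench"):
    """Build the argv for one pgbench run (hypothetical helper)."""
    return ["pgbench", "-c", str(clients), "-t", str(transactions), dbname]


def sweep(client_counts=CLIENT_COUNTS):
    """Return one shell-quoted command line per client count, in order."""
    return [shlex.join(pgbench_argv(c)) for c in client_counts]


for line in sweep():
    print(line)
```

Each printed line can be run by hand or passed to a process runner; keeping the list explicit makes it easy to repeat exactly the same sweep for each wal_sync_method setting.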

Re: wal_synch_method = open_sync safe on RHEL 5.5?

From
Greg Smith
Date:
Mark Kirkwood wrote:
> Now I recall some discussion about this enabling direct io and the general flakiness of this on Linux, so is the option regarded as safe?

No one has ever refuted the claims in http://archives.postgresql.org/pgsql-hackers/2007-10/msg01310.php that it can be unsafe under a heavy enough level of mixed load on RHEL5.  Given that the performance benefits are marginal on ext3, I haven't ever considered it worth the risk.  (I've seen much larger gains on Linux+Veritas VxFS).  From what I've seen, recent Linux kernel work to improve that area has reinforced that the old O_SYNC implementation was full of bugs.  My suspicion (based on no particular data, just what I've seen it tested with) is that it only really worked before in the very specific way that Oracle does O_SYNC writes, which is different from what PostgreSQL does.

P.S. Be wary of expecting pgbench to give you useful numbers on a single run.  For the default write-heavy test, I recommend three runs of 10 minutes each (-T 600 on recent PostgreSQL versions) before I trust any results it gives.  You can get useful data from the select-only test in only a few seconds, but not the one that writes a bunch.
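
One way to act on the advice above is to repeat the timed run and compare the results instead of trusting a single number. A minimal sketch follows, assuming each pgbench report contains a line of the form `tps = <number> ...`. The sample reports are invented for illustration and are not measurements from this thread; `extract_tps` and `summarize` are hypothetical helper names.

```python
import re
import statistics

# Matches the start of a pgbench "tps = ..." report line.
TPS_RE = re.compile(r"^tps = ([0-9.]+)", re.MULTILINE)


def extract_tps(report):
    """Return the first tps figure found in one pgbench report."""
    match = TPS_RE.search(report)
    if match is None:
        raise ValueError("no 'tps = ...' line found")
    return float(match.group(1))


def summarize(reports):
    """Median tps and relative spread ((max - min) / median) across runs."""
    values = [extract_tps(r) for r in reports]
    median = statistics.median(values)
    return {
        "runs": len(values),
        "median": median,
        "spread": (max(values) - min(values)) / median,
    }


# Invented reports standing in for three `pgbench -T 600` runs.
samples = [
    "number of clients: 32\ntps = 2400.0 (including connections establishing)\n",
    "number of clients: 32\ntps = 2500.0 (including connections establishing)\n",
    "number of clients: 32\ntps = 2300.0 (including connections establishing)\n",
]
print(summarize(samples))
```

If the spread between runs is comparable to the difference between two settings, the comparison probably says little about either setting.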

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

Re: wal_synch_method = open_sync safe on RHEL 5.5?

From
Mark Mielke
Date:
The conclusion I read was that Linux O_SYNC behaves like O_DSYNC on other systems. For WAL, this seems satisfactory?

Personally, I use fdatasync(). I wasn't able to measure a reliable difference for my far smaller databases, and fdatasync() seems reliable and fast enough that fighting with O_SYNC doesn't seem to be worth it. Also, technically speaking, fdatasync() appeals more to me, as it allows the system to buffer while it can, and the application to tell it across which boundaries it must not buffer. O_SYNC / O_DSYNC seem to imply a requirement to sync on every block. My gut tells me that fdatasync() gives the operating system more opportunities to optimize (whether it does or not is a different issue :-) ).
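
The difference described above can be sketched with the two system-call patterns. This is an illustration in Python on Linux, not PostgreSQL's WAL code. With `O_DSYNC`, every `write()` has to reach stable storage before it returns. With `fdatasync()`, writes may be buffered and the application flushes once, at a boundary it chooses. The function names, block size and file names are assumptions made for the example.

```python
import os
import tempfile

BLOCK = b"x" * 8192  # 8 kB, an arbitrary block size for the example


def write_with_open_dsync(path, blocks):
    """Open with O_DSYNC: each write() is synchronous on its own."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DSYNC
    fd = os.open(path, flags, 0o600)
    try:
        for _ in range(blocks):
            os.write(fd, BLOCK)
    finally:
        os.close(fd)


def write_then_fdatasync(path, blocks):
    """Plain writes may be buffered; one fdatasync() marks the boundary."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(blocks):
            os.write(fd, BLOCK)
        os.fdatasync(fd)  # flush file data once, after the whole batch
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    dsync_path = os.path.join(tmp, "dsync.bin")
    fdatasync_path = os.path.join(tmp, "fdatasync.bin")
    write_with_open_dsync(dsync_path, 4)
    write_then_fdatasync(fdatasync_path, 4)
    sizes = (os.path.getsize(dsync_path), os.path.getsize(fdatasync_path))
    print(sizes)
```

Both variants end with the same bytes on disk; what differs is how many times the caller waits for storage, which is where the operating system may or may not find room to optimize.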

Cheers,
mark


-- 
Mark Mielke <mark@mielke.cc>

Re: wal_synch_method = open_sync safe on RHEL 5.5?

From
Mark Kirkwood
Date:
On 18/06/10 15:29, Greg Smith wrote:
>
> P.S. Be wary of expecting pgbench to give you useful numbers on a
> single run.  For the default write-heavy test, I recommend three runs
> of 10 minutes each (-T 600 on recent PostgreSQL versions) before I
> trust any results it gives.  You can get useful data from the
> select-only test in only a few seconds, but not the one that writes a
> bunch.
>

Yeah, I did several runs of each, and a couple with -c 128 and -t 100000
to give the setup a good workout (also 2000-2400 tps, nice to see a
well-behaved SAN).


Cheers

Mark

Re: wal_synch_method = open_sync safe on RHEL 5.5?

From
Greg Smith
Date:
Mark Mielke wrote:
> The conclusion I read was that Linux O_SYNC behaves like O_DSYNC on
> other systems. For WAL, this seems satisfactory?

It would be if it didn't have any bugs or limitations, but it does.
The one pointed out in the message I linked to suggests that a mix of
buffered and O_SYNC direct I/O can cause a write error, with the exact
behavior you get depending on the kernel version.  That's a path better
not explored as I see it.

On newer Linux systems, the kernels that have made some effort to
implement this correctly actually expose O_DSYNC.  My current opinion is
that if you only have Linux O_SYNC, don't use it.  The ones with O_DSYNC
haven't been around long enough to be proven or disproven as
effective yet.
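
Whether a given environment distinguishes the two flags can be checked from the constants it exposes. A small sketch follows, assuming a Unix build of Python. On older Linux headers `O_DSYNC` had the same value as `O_SYNC`, while newer ones define them separately. The result reflects the C library headers Python was compiled against and says nothing certain about how the running kernel behaves. `sync_flag_report` is a made-up helper name.

```python
import os


def sync_flag_report():
    """Report which synchronous-open flags this Python build exposes."""
    o_sync = getattr(os, "O_SYNC", None)
    o_dsync = getattr(os, "O_DSYNC", None)
    return {
        "O_SYNC": o_sync,
        "O_DSYNC": o_dsync,
        # True only when both flags exist and have different values.
        "distinct": (
            o_sync is not None and o_dsync is not None and o_sync != o_dsync
        ),
    }


print(sync_flag_report())
```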

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us