Discussion: Direct I/O issues

Direct I/O issues

From
Greg Smith
Date:
I've been trying to optimize a Linux system where benchmarking suggests
large performance differences between the various wal_sync_method options
(with o_sync being the big winner).  I started that by using
src/tools/fsync/test_fsync to get an idea what I was dealing with (and to
spot which drives had write caching turned on).  Since those results
didn't match what I was seeing in the benchmarks, I've been browsing the
backend source to figure out why.  I noticed test_fsync appears to be,
ahem, out of sync with what the engine is doing.

It looks like V8.1 introduced O_DIRECT writes to the WAL, determined at
compile time by a series of preprocessor tests in
src/backend/access/transam/xlog.c.  When O_DIRECT is available,
O_SYNC/O_FSYNC/O_DSYNC writes use it.  test_fsync doesn't do that.
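
The compile-time selection described above can be sketched roughly as
follows.  This is an illustration of the pattern only: the macro and
function names (SKETCH_O_DIRECT, SKETCH_OPEN_SYNC_FLAG,
sketch_open_sync_flags) are hypothetical and not copied from xlog.c, and
the real file tests more cases, such as O_DSYNC.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* Linux exposes O_DIRECT only with this */
#endif
#include <fcntl.h>

/* Use O_DIRECT if the platform has it; otherwise contribute nothing. */
#ifdef O_DIRECT
#define SKETCH_O_DIRECT O_DIRECT
#else
#define SKETCH_O_DIRECT 0
#endif

/* Pick whichever synchronous-open flag exists, then OR in O_DIRECT. */
#if defined(O_SYNC)
#define SKETCH_OPEN_SYNC_FLAG (O_SYNC | SKETCH_O_DIRECT)
#elif defined(O_FSYNC)
#define SKETCH_OPEN_SYNC_FLAG (O_FSYNC | SKETCH_O_DIRECT)
#else
#define SKETCH_OPEN_SYNC_FLAG 0
#endif

/* Flags a benchmark would pass to open() to mimic this selection. */
static int
sketch_open_sync_flags(void)
{
    return O_RDWR | SKETCH_OPEN_SYNC_FLAG;
}
```

A benchmark that opens its file with plain O_SYNC measures a different
code path from one that opens it with these combined flags, which is the
mismatch being described here.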

I moved the new code (in 8.2 beta 3, lines 61-92 in xlog.c) into
test_fsync; all the flags had the same name so it dropped right in.  You
can get the version I made at http://www.westnet.com/~gsmith/test_fsync.c
(fixed a compiler warning, too)

The results I get now look fishy.  I'm not sure if I screwed up a step, or
if I'm seeing a real problem.  The system here is running RedHat Linux,
RHEL ES 4.0 kernel 2.6.9, and the disk I'm writing to is a standard
7200RPM IDE drive.  I turned off write caching with hdparm -W 0.

Here's an excerpt from the stock test_fsync:

Compare one o_sync write to two:
         one 16k o_sync write     8.717944
         two 8k o_sync writes    17.501980

Compare file sync methods with 2 8k writes:
         (o_dsync unavailable)
         open o_sync, write      17.018495
         write, fdatasync         8.842473
         write, fsync,            8.809117

And here's the version I tried to modify to include O_DIRECT support:

Compare one o_sync write to two:
         one 16k o_sync write     0.004995
         two 8k o_sync writes     0.003027

Compare file sync methods with 2 8k writes:
         (o_dsync unavailable)
         open o_sync, write       0.004978
         write, fdatasync         8.845498
         write, fsync,            8.834037

Obviously the o_sync writes aren't waiting for the disk.  Is this a
problem with O_DIRECT under Linux?  Or is my code just not correctly
testing this behavior?
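
A rough plausibility check on the numbers above: a 7200RPM drive needs
60/7200 seconds, about 8.33 ms, per revolution, so a synchronous rewrite
of the same block should cost on the order of one revolution.  The sketch
below does that arithmetic.  It assumes test_fsync repeats each operation
1000 times, which is an assumption and not something stated in this
thread; the helper names are hypothetical.

```c
/* Time for one platter revolution, in milliseconds. */
static double
sketch_ms_per_revolution(double rpm)
{
    return 60.0 * 1000.0 / rpm;
}

/*
 * Average time per operation in milliseconds, given the total seconds
 * reported and the number of loops (assumed, see above).
 */
static double
sketch_ms_per_op(double total_seconds, int loops)
{
    return total_seconds * 1000.0 / loops;
}
```

Under that assumption, 8.717944 seconds is roughly 8.7 ms per write, close
to one revolution, while 0.004995 seconds is about 5 microseconds per
write, far too fast for data that actually reached the platter.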

Just as a sanity check, I did try this on another system, running SuSE
with drives connected to a cciss SCSI device, and I got exactly the same
results.  I'm concerned that Linux users who use O_SYNC because they
notice it's faster will be losing their WAL integrity without being aware
of the problem, especially as the whole O_DIRECT business isn't even
mentioned in the WAL documentation--it really deserves to be brought up in
the wal_sync_method notes at
http://developer.postgresql.org/pgdocs/postgres/runtime-config-wal.html

And while I'm mentioning improvements to that particular documentation
page...the wal_buffers notes there are so sparse they misled me initially.
They suggest only bumping it up for situations with very large
transactions; since I was testing with small ones I left it woefully
undersized initially.  I would suggest copying the text from
http://developer.postgresql.org/pgdocs/postgres/wal-configuration.html to
here: "When full_page_writes is set and the system is very busy, setting
this value higher will help smooth response times during the period
immediately following each checkpoint."  That seems to match what I found
in testing.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Direct I/O issues

From
Tom Lane
Date:
Greg Smith <gsmith@gregsmith.com> writes:
> The results I get now look fishy.

There are at least two things wrong with this program:

* It does not respect the alignment requirement for O_DIRECT buffers
  (reportedly either 512 or 4096 bytes depending on filesystem).

* It does not check for errors (if it had, you might have realized the
  other problem).

            regards, tom lane
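
The two points above can be sketched in C as follows.  This is a minimal
illustration, not the test_fsync code: the helper names and the
SKETCH_ALIGN and SKETCH_BLOCK constants are hypothetical, and 4096 is
assumed as the alignment because it satisfies both of the sizes mentioned.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* Linux exposes O_DIRECT only with this */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_ALIGN 4096       /* assumed: satisfies both 512 and 4096 */
#define SKETCH_BLOCK 8192       /* one 8k write, as in the test output */

/* Return a zeroed buffer aligned to SKETCH_ALIGN, or NULL on failure. */
static void *
sketch_alloc_aligned(size_t len)
{
    void *buf = NULL;
    int   rc = posix_memalign(&buf, SKETCH_ALIGN, len);

    if (rc != 0)
    {
        /* posix_memalign returns the error code instead of setting errno */
        fprintf(stderr, "posix_memalign: %s\n", strerror(rc));
        return NULL;
    }
    memset(buf, 0, len);
    return buf;
}

/* Write exactly len bytes; return 0 on success, -1 on error or short write. */
static int
sketch_checked_write(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);

    if (n < 0)
    {
        fprintf(stderr, "write: %s\n", strerror(errno));
        return -1;
    }
    if ((size_t) n != len)
    {
        fprintf(stderr, "short write: %zd of %zu bytes\n", n, len);
        return -1;
    }
    return 0;
}
```

With a file descriptor opened using O_DIRECT, passing an unaligned buffer
to write() on Linux typically fails with EINVAL.  A timing loop that
ignores the return value then measures failed calls, not disk writes.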

Re: Direct I/O issues

From
Greg Smith
Date:
On Thu, 23 Nov 2006, Tom Lane wrote:

> * It does not check for errors (if it had, you might have realized the
>  other problem).

All the test_fsync code needs to check for errors better; there have been
multiple occasions where I've run that with questionable input and it
didn't complain, it just happily ran and reported times that were almost
0.

Thanks for the note about alignment, I had seen something about that in
the xlog.c but wasn't sure if that was important in this case.

It's very important to the project I'm working on that I get this cleared
up, and I think I'm in a good position to fix it myself now.  I just
wanted to report the issue and get some initial feedback on what's wrong.
I'll try to rewrite that code with an eye toward the "Determine optimal
fdatasync/fsync, O_SYNC/O_DSYNC options" to-do item, which is what I'd
really like to have.
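
As a sketch of what a stricter timing loop might look like, the function
below times repeated 8k writes followed by fsync or fdatasync and refuses
to report a time if any call fails.  The function name and its parameters
are hypothetical and this is not the test_fsync implementation; it assumes
a platform that provides fdatasync, such as Linux.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Time `loops` rounds of "write 8k, then fsync or fdatasync" against path.
 * Returns elapsed seconds, or -1.0 if any system call fails.
 */
static double
sketch_time_write_sync(const char *path, int loops, int use_fdatasync)
{
    char           buf[8192];
    struct timeval start, stop;
    int            fd, i;

    memset(buf, 'a', sizeof(buf));
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
    {
        fprintf(stderr, "open %s: %s\n", path, strerror(errno));
        return -1.0;
    }

    gettimeofday(&start, NULL);
    for (i = 0; i < loops; i++)
    {
        /* Rewrite the same block each time and check every step. */
        if (lseek(fd, 0, SEEK_SET) < 0 ||
            write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf) ||
            (use_fdatasync ? fdatasync(fd) : fsync(fd)) != 0)
        {
            /* errno is meaningful only if a call returned -1 */
            fprintf(stderr, "I/O failed in loop %d (errno %d)\n", i, errno);
            close(fd);
            return -1.0;
        }
    }
    gettimeofday(&stop, NULL);
    close(fd);

    return (stop.tv_sec - start.tv_sec) +
           (stop.tv_usec - start.tv_usec) / 1000000.0;
}
```

Returning a sentinel on failure means a caller cannot mistake a run of
failed writes for a very fast sync method.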

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Direct I/O issues

From
Bruce Momjian
Date:
Greg Smith wrote:
> On Thu, 23 Nov 2006, Tom Lane wrote:
>
> > * It does not check for errors (if it had, you might have realized the
> >  other problem).
>
> All the test_fsync code needs to check for errors better; there have been
> multiple occasions where I've run that with questionable input and it
> didn't complain, it just happily ran and reported times that were almost
> 0.
>
> Thanks for the note about alignment, I had seen something about that in
> the xlog.c but wasn't sure if that was important in this case.
>
> It's very important to the project I'm working on that I get this cleared
> up, and I think I'm in a good position to fix it myself now.  I just
> wanted to report the issue and get some initial feedback on what's wrong.
> I'll try to rewrite that code with an eye toward the "Determine optimal
> fdatasync/fsync, O_SYNC/O_DSYNC options" to-do item, which is what I'd
> really like to have.

Please send an updated patch for test_fsync.c so we can get it working
for 8.2.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +