Thread: Re: [HACKERS] fsync method checking

Re: [HACKERS] fsync method checking

From
Bruce Momjian
Date:
Kurt Roeckx wrote:
> On Thu, Mar 18, 2004 at 01:50:32PM -0500, Bruce Momjian wrote:
> > > I'm not sure I believe these numbers at all... my experience is that
> > > getting trustworthy disk I/O numbers is *not* easy.
> >
> > These numbers were reproducible on all the platforms I tested.
>
> Just because they are reproducible doesn't mean they mean anything in
> the real world.

OK, what better test do you suggest?  Right now, there has been no
testing of these.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [HACKERS] fsync method checking

From
Tom Lane
Date:
Kurt Roeckx <Q@ping.be> writes:
> I have no idea what the access pattern is for normal WAL
> operations or how many times it gets synched.  Does it only do
> f(data)sync() at commit time, or for every block it writes?

If we are using fsync/fdatasync, we issue those at commit time or when
completing a WAL segment.  If we are using the open flags, then of
course there's no separate sync call.

My previous point about checking different fsync spacings corresponds to
different assumptions about average transaction size.  I think a useful
tool for determining wal_sync_method has got to be able to reflect that
range of possibilities.

            regards, tom lane
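
A minimal sketch of the "different fsync spacings" idea above, assuming a POSIX system. The function name time_fsync_spacing, the path, and the block counts are hypothetical and are not taken from src/tools/fsync. It writes 8 kB blocks and calls fsync() after every `spacing` blocks, so a small spacing stands in for many small transactions and a large one for fewer big ones.

```c
#define _XOPEN_SOURCE 600   /* expose fsync() and gettimeofday() */
#include <fcntl.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

#define BLOCK_SIZE 8192

/*
 * Hypothetical helper: write nblocks 8 kB blocks to path, calling fsync()
 * after every `spacing` blocks and once more after the last block.
 * Returns elapsed seconds, or -1.0 on bad arguments or any I/O error.
 */
double time_fsync_spacing(const char *path, int nblocks, int spacing)
{
    char block[BLOCK_SIZE];
    struct timeval start, stop;
    int fd, i;

    if (nblocks <= 0 || spacing <= 0)
        return -1.0;
    memset(block, 'a', sizeof(block));

    /* No O_TRUNC: an existing (preallocated) file is overwritten in place. */
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1.0;

    gettimeofday(&start, NULL);
    for (i = 1; i <= nblocks; i++)
    {
        if (write(fd, block, sizeof(block)) != (ssize_t) sizeof(block))
        {
            close(fd);
            return -1.0;
        }
        if ((i % spacing == 0 || i == nblocks) && fsync(fd) != 0)
        {
            close(fd);
            return -1.0;
        }
    }
    gettimeofday(&stop, NULL);
    close(fd);

    return (double) (stop.tv_sec - start.tv_sec) +
           (stop.tv_usec - start.tv_usec) / 1000000.0;
}
```

Calling it with spacings such as 1, 2, and 8 against a file on the WAL volume would give one timing per assumed transaction size; the same loop could be repeated with fdatasync() or the open flags to compare methods.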

Re: [HACKERS] fsync method checking

From
Josh Berkus
Date:
Tom, Bruce,

> My previous point about checking different fsync spacings corresponds to
> different assumptions about average transaction size.  I think a useful
> tool for determining wal_sync_method has got to be able to reflect that
> range of possibilities.

Questions:
1) This is an OSS project.   Why not just recruit a bunch of people on
PERFORMANCE and GENERAL to test the 4 different synch methods using real
databases?   No test like reality, I say ....

2) Won't Jan's work on 7.5 memory and I/O management mean that we have to
re-evaluate synching anyway?

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: [HACKERS] fsync method checking

From
Bruce Momjian
Date:
Josh Berkus wrote:
> Tom, Bruce,
>
> > My previous point about checking different fsync spacings corresponds to
> > different assumptions about average transaction size.  I think a useful
> > tool for determining wal_sync_method has got to be able to reflect that
> > range of possibilities.
>
> Questions:
> 1) This is an OSS project.   Why not just recruit a bunch of people on
> PERFORMANCE and GENERAL to test the 4 different synch methods using real
> databases?   No test like reality, I say ....

Well, I wrote the program to allow testing.  I don't see a complex test
as being that much better than a simple one.  We don't need accurate
numbers.  We just need to know if fsync or O_SYNC is faster.

>
> 2) Won't Jan's work on 7.5 memory and I/O management mean that we have to
> re-evaluate synching anyway?

No, it should not change sync issues.


Re: [HACKERS] fsync method checking

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> 1) This is an OSS project.   Why not just recruit a bunch of people on
> PERFORMANCE and GENERAL to test the 4 different synch methods using real
> databases?   No test like reality, I say ....

I agree --- that is likely to yield *far* more useful results than
any standalone test program, for the purpose of finding out what
wal_sync_method to use in real databases.  However, there's a second
issue here: we would like to move sync/checkpoint responsibility into
the bgwriter, and that requires knowing whether it's valid to let one
process fsync on behalf of writes that were done by other processes.
That's got nothing to do with WAL sync performance.  I think that it
would be sensible to make a test program that focuses on this one
specific question.  (There has been some handwaving to the effect that
everybody knows this is safe on Unixen, but I question whether the
handwavers have seen the internals of HPUX or AIX for instance; and
besides we need to worry about Windows now.)

A third reason for having a simple test program is to confirm whether
your drives are syncing at all (cf. hdparm discussion).

> 2) Won't Jan's work on 7.5 memory and I/O management mean that we have to
> re-evaluate synching anyway?

So far nothing's been done that touches WAL writing.  However, I am
thinking about making the bgwriter process take some of the load of
writing WAL buffers (right now it only writes data-file buffers).
And you're right, after that happens we will need to re-measure.
The open flags will probably become considerably more attractive than
they are now, if the bgwriter handles most non-commit writes of WAL.
(We might also think of letting the bgwriter use a different sync method
than the backends do.)

            regards, tom lane
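
A minimal sketch of the narrow test described above, assuming a POSIX system: write through one descriptor, close it, then fsync() through a second descriptor opened on the same file. The function name write_close_then_fsync is hypothetical. It runs in a single process, so it only approximates the cross-process case, and a successful return shows only that the calls did not fail, not that the data reached the platter.

```c
#define _XOPEN_SOURCE 600   /* expose fsync() */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 8192

/*
 * Hypothetical helper: write one 8 kB block on a first descriptor, close
 * it without syncing, then reopen the file and fsync() the new descriptor.
 * Returns 0 if every call succeeded, -1 otherwise.
 */
int write_close_then_fsync(const char *path)
{
    char block[BLOCK_SIZE];
    int fd, rc;

    memset(block, 'a', sizeof(block));

    /* Descriptor 1: write and close, no sync. */
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, block, sizeof(block)) != (ssize_t) sizeof(block))
    {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* Descriptor 2: reopen the same file and sync it. */
    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;

    return rc == 0 ? 0 : -1;
}
```

Timing this against a plain write, fsync, close sequence is one way to judge whether the second descriptor's fsync() does comparable work; a cross-process variant would need the write and the sync in separate processes.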

Re: [HACKERS] fsync method checking

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Well, I wrote the program to allow testing.  I don't see a complex test
> as being that much better than a simple one.  We don't need accurate
> numbers.  We just need to know if fsync or O_SYNC is faster.

Faster than what?  The thing everyone is trying to point out here is
that it depends on context, and we have little faith that this test
program creates a context similar to a live Postgres database.

            regards, tom lane

Re: [HACKERS] fsync method checking

From
Kevin Brown
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Well, I wrote the program to allow testing.  I don't see a complex test
> > as being that much better than a simple one.  We don't need accurate
> > numbers.  We just need to know if fsync or O_SYNC is faster.
>
> Faster than what?  The thing everyone is trying to point out here is
> that it depends on context, and we have little faith that this test
> program creates a context similar to a live Postgres database.

Note, too, that the preferred method isn't likely to depend just on the
operating system, it's likely to depend also on the filesystem type
being used.

Linux provides quite a few of them: ext2, ext3, jfs, xfs, and reiserfs,
and that's just off the top of my head.  I imagine the performance of
the various syncing methods will vary significantly between them.


It seems reasonable to me that decisions such as which sync method to
use should initially be made at installation time: have the test program
run on the target filesystem as part of the installation process, and
build the initial postgresql.conf based on the results.  You might even
be able to do some additional testing such as measuring the difference
between random block access and sequential access, and again feed the
results into the postgresql.conf file.  This is no substitute for
experience with the platform, but I expect it's likely to get you closer
to something optimal than doing nothing.  The only question, of course,
is whether or not it's worth going to the effort when it may or may not
gain you a whole lot.  Answering that is going to require some
experimentation with such an automatic configuration system.



--
Kevin Brown                          kevin@sysexperts.com
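
A minimal sketch of the install-time selection step suggested above: given timings measured on the target filesystem, pick the fastest available method so that its name can be written into the initial postgresql.conf. The struct and function names are hypothetical; the method strings are assumed to be the four wal_sync_method values under discussion (fsync, fdatasync, open_sync, open_datasync).

```c
#include <stddef.h>

/* One measurement per candidate sync method (hypothetical layout). */
struct sync_timing
{
    const char *method;   /* a wal_sync_method value, e.g. "fdatasync" */
    double      seconds;  /* measured time; negative means unavailable */
};

/*
 * Returns the method name with the smallest non-negative time,
 * or NULL if no method was measured successfully.
 */
const char *fastest_sync_method(const struct sync_timing *t, size_t n)
{
    const char *best = NULL;
    double best_time = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
    {
        if (t[i].seconds < 0.0)
            continue;               /* method not available on this platform */
        if (best == NULL || t[i].seconds < best_time)
        {
            best = t[i].method;
            best_time = t[i].seconds;
        }
    }
    return best;
}
```

An installer could then emit a line of the form `wal_sync_method = <name>` using the returned string. Whether the fastest method in a micro-benchmark is also the fastest under a live database is exactly the open question in this thread, so the result would be a starting point rather than a recommendation.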

Re: [HACKERS] fsync method checking

From
Kevin Brown
Date:
I wrote:
> Note, too, that the preferred method isn't likely to depend just on the
> operating system, it's likely to depend also on the filesystem type
> being used.
>
> Linux provides quite a few of them: ext2, ext3, jfs, xfs, and reiserfs,
> and that's just off the top of my head.  I imagine the performance of
> the various syncing methods will vary significantly between them.

For what it's worth, my database throughput for transactions involving
a lot of inserts, updates, and deletes is about 12% faster using
fdatasync() than O_SYNC under Linux using JFS.

I'll run the test program and report my results with it as well, so
we'll be able to see if there's any consistency between it and the live
database.




--
Kevin Brown                          kevin@sysexperts.com

Re: [HACKERS] fsync method checking

From
Kurt Roeckx
Date:
On Thu, Mar 18, 2004 at 02:22:10PM -0500, Bruce Momjian wrote:
>
> OK, what better test do you suggest?  Right now, there has been no
> testing of these.

I suggest you start by at least preallocating a 16 MB file
and doing the tests on that, to be at least somewhat similar to what
WAL does.

I have no idea what the access pattern is for normal WAL
operations or how many times it gets synched.  Does it only do
f(data)sync() at commit time, or for every block it writes?

I think if you write more data you'll see more differences
between O_(D)SYNC and f(data)sync().

I guess it can depend on whether you have lots of small transactions
or fewer big ones.

At least try to make something that covers different access
patterns.


Kurt
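
A minimal sketch of the preallocation step suggested above, assuming a POSIX system. The function name preallocate_segment is hypothetical; the 16 MB size and 8 kB block size follow the figures mentioned in this thread. The file is zero-filled and synced before any timing starts, so later test writes overwrite existing blocks instead of extending the file.

```c
#define _XOPEN_SOURCE 600   /* expose fsync() */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE   8192
#define SEGMENT_SIZE (16 * 1024 * 1024)

/*
 * Hypothetical helper: create path as a zero-filled 16 MB file, sync it,
 * and rewind to offset 0.  Returns an open read/write descriptor ready
 * for timing runs, or -1 on error.
 */
int preallocate_segment(const char *path)
{
    char zeros[BLOCK_SIZE];
    long written;
    int fd;

    memset(zeros, 0, sizeof(zeros));

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    for (written = 0; written < SEGMENT_SIZE; written += BLOCK_SIZE)
    {
        if (write(fd, zeros, sizeof(zeros)) != (ssize_t) sizeof(zeros))
        {
            close(fd);
            return -1;
        }
    }

    /* Flush the allocation itself so it is not charged to the test. */
    if (fsync(fd) != 0 || lseek(fd, 0, SEEK_SET) != 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}
```

The returned descriptor can be handed to whatever write-and-sync loop is being timed; closing and reopening with different flags would be needed to test the O_SYNC variants.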


Re: [HACKERS] fsync method checking

From
markw@osdl.org
Date:
On 18 Mar, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> 1) This is an OSS project.   Why not just recruit a bunch of people on
>> PERFORMANCE and GENERAL to test the 4 different synch methods using real
>> databases?   No test like reality, I say ....
>
> I agree --- that is likely to yield *far* more useful results than
> any standalone test program, for the purpose of finding out what
> wal_sync_method to use in real databases.  However, there's a second
> issue here: we would like to move sync/checkpoint responsibility into
> the bgwriter, and that requires knowing whether it's valid to let one
> process fsync on behalf of writes that were done by other processes.
> That's got nothing to do with WAL sync performance.  I think that it
> would be sensible to make a test program that focuses on this one
> specific question.  (There has been some handwaving to the effect that
> everybody knows this is safe on Unixen, but I question whether the
> handwavers have seen the internals of HPUX or AIX for instance; and
> besides we need to worry about Windows now.)

I could certainly do some testing if you want to see how DBT-2 does.
Just tell me what to do. ;)

Mark

Re: [HACKERS] fsync method checking

From
Tom Lane
Date:
markw@osdl.org writes:
> I could certainly do some testing if you want to see how DBT-2 does.
> Just tell me what to do. ;)

Just do some runs that are identical except for the wal_sync_method
setting.  Note that this should not have any impact on SELECT
performance, only insert/update/delete performance.

            regards, tom lane

Re: [HACKERS] fsync method checking

From
Bruce Momjian
Date:
markw@osdl.org wrote:
> On 18 Mar, Tom Lane wrote:
> > Josh Berkus <josh@agliodbs.com> writes:
> >> 1) This is an OSS project.   Why not just recruit a bunch of people on
> >> PERFORMANCE and GENERAL to test the 4 different synch methods using real
> >> databases?   No test like reality, I say ....
> >
> > I agree --- that is likely to yield *far* more useful results than
> > any standalone test program, for the purpose of finding out what
> > wal_sync_method to use in real databases.  However, there's a second
> > issue here: we would like to move sync/checkpoint responsibility into
> > the bgwriter, and that requires knowing whether it's valid to let one
> > process fsync on behalf of writes that were done by other processes.
> > That's got nothing to do with WAL sync performance.  I think that it
> > would be sensible to make a test program that focuses on this one
> > specific question.  (There has been some handwaving to the effect that
> > everybody knows this is safe on Unixen, but I question whether the
> > handwavers have seen the internals of HPUX or AIX for instance; and
> > besides we need to worry about Windows now.)
>
> I could certainly do some testing if you want to see how DBT-2 does.
> Just tell me what to do. ;)

To test, you would run src/tools/fsync from the CVS version, find the
fastest fsync method from the last group of outputs, then try the
wal_sync_method setting to see if the one that tools/fsync says is
fastest is actually fastest.  However, it might be better to run your
tests and get some indication of how frequently writes and fsyncs are
going to WAL, and modify tools/fsync to match what your DBT-2 test does.


Re: [HACKERS] fsync method checking

From
markw@osdl.org
Date:
On 22 Mar, Tom Lane wrote:
> markw@osdl.org writes:
>> I could certainly do some testing if you want to see how DBT-2 does.
>> Just tell me what to do. ;)
>
> Just do some runs that are identical except for the wal_sync_method
> setting.  Note that this should not have any impact on SELECT
> performance, only insert/update/delete performance.

OK, here are the results I have from my 4-way Xeon system, with a 14-disk
volume for the log and a 52-disk volume for everything else:
    http://developer.osdl.org/markw/pgsql/wal_sync_method.html

7.5devel-200403222

wal_sync_method         metric
default (fdatasync)     1935.28
fsync                   1613.92

# ./test_fsync -f /opt/pgdb/dbt2/pg_xlog/test.out
Simple write timing:
        write                    0.018787

Compare fsync times on write() and non-write() descriptor:
(If the times are similar, fsync() can sync data written
 on a different descriptor.)
        write, fsync, close     13.057781
        write, close, fsync     13.311313

Compare one o_sync write to two:
        one 16k o_sync write     6.515122
        two 8k o_sync writes    12.455124

Compare file sync methods with one 8k write:
        (o_dsync unavailable)
        open o_sync, write       6.270724
        write, fdatasync        13.275225
        write, fsync,           13.359847

Compare file sync methods with 2 8k writes:
        (o_dsync unavailable)
        open o_sync, write      12.479563
        write, fdatasync        13.651709
        write, fsync,           14.000240
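
A minimal sketch of the kind of comparison shown in the output above, assuming a Linux or other POSIX system that provides O_SYNC and fdatasync(). It is not the test_fsync source; the enum, the function name time_sync_mode, and the loop count are hypothetical. Each iteration rewrites the same 8 kB block and makes it durable using the chosen method.

```c
#define _XOPEN_SOURCE 600   /* expose O_SYNC, fdatasync(), gettimeofday() */
#include <fcntl.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

#define BLOCK_SIZE 8192

enum sync_mode { MODE_OPEN_SYNC, MODE_FDATASYNC, MODE_FSYNC };

/*
 * Hypothetical helper: rewrite one 8 kB block `loops` times, syncing each
 * write with the given method.  Returns elapsed seconds, or -1.0 on error.
 */
double time_sync_mode(const char *path, enum sync_mode mode, int loops)
{
    char block[BLOCK_SIZE];
    struct timeval start, stop;
    int flags = O_RDWR | O_CREAT;
    int fd, i, ok = 1;

    if (loops <= 0)
        return -1.0;
    memset(block, 'a', sizeof(block));

    /* With O_SYNC, write() itself does not return until data is synced. */
    if (mode == MODE_OPEN_SYNC)
        flags |= O_SYNC;
    fd = open(path, flags, 0600);
    if (fd < 0)
        return -1.0;

    gettimeofday(&start, NULL);
    for (i = 0; i < loops && ok; i++)
    {
        if (lseek(fd, 0, SEEK_SET) < 0 ||
            write(fd, block, sizeof(block)) != (ssize_t) sizeof(block))
            ok = 0;
        else if (mode == MODE_FDATASYNC && fdatasync(fd) != 0)
            ok = 0;
        else if (mode == MODE_FSYNC && fsync(fd) != 0)
            ok = 0;
    }
    gettimeofday(&stop, NULL);
    close(fd);

    if (!ok)
        return -1.0;
    return (double) (stop.tv_sec - start.tv_sec) +
           (stop.tv_usec - start.tv_usec) / 1000000.0;
}
```

Running all three modes against a file on the pg_xlog volume gives numbers that can be compared only with each other; as the DBT-2 figures above suggest, they may not rank the methods the same way a live database does.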

Re: [HACKERS] fsync method checking

From
Manfred Spraul
Date:
Tom Lane wrote:

> markw@osdl.org writes:
>> I could certainly do some testing if you want to see how DBT-2 does.
>> Just tell me what to do. ;)
>
> Just do some runs that are identical except for the wal_sync_method
> setting.  Note that this should not have any impact on SELECT
> performance, only insert/update/delete performance.

I've made a test run that compares fsync and fdatasync.  The performance
was identical:
- with fdatasync:

http://khack.osdl.org/stp/290607/

- with fsync:
http://khack.osdl.org/stp/290483/

I don't understand why. Mark - is there a battery-backed write cache in
the RAID controller, or something similar that might skew the results?
The test generates quite a lot of WAL traffic - around 1.5 MB/sec.
Perhaps the writes are so large that the added overhead of syncing the
inode is not noticeable?
Is the pg_xlog directory on a separate drive?

Btw, it's possible to request such tests through the web-interface, see
http://www.osdl.org/lab_activities/kernel_testing/stp/script_param.html

--
    Manfred


Re: [HACKERS] fsync method checking

From
Manfred Spraul
Date:
markw@osdl.org wrote:

> Compare file sync methods with one 8k write:
>         (o_dsync unavailable)
>         open o_sync, write       6.270724
>         write, fdatasync        13.275225
>         write, fsync,           13.359847

Odd. Which filesystem, which kernel? It seems fdatasync is broken and
syncs the inode, too.

--
    Manfred


Re: [HACKERS] fsync method checking

From
Steve Atkins
Date:
On Fri, Mar 26, 2004 at 07:25:53AM +0100, Manfred Spraul wrote:

> >Compare file sync methods with one 8k write:
> >       (o_dsync unavailable)
> >       open o_sync, write       6.270724
> >       write, fdatasync        13.275225
> >       write, fsync,           13.359847
> >
> >
> Odd. Which filesystem, which kernel? It seems fdatasync is broken and
> syncs the inode, too.

This may be relevant.

From the man page for fdatasync on a moderately recent RedHat installation:

  BUGS
       Currently (Linux 2.2) fdatasync is equivalent to fsync.

Cheers,
  Steve