Re: fsync or fdatasync

From: Ragnar Kjørstad
Subject: Re: fsync or fdatasync
Message-ID: 20020910224830.A30625@vestdata.no
In reply to: Re: fsync or fdatasync  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: fsync or fdatasync
List: pgsql-admin
On Tue, Sep 10, 2002 at 03:17:00PM -0400, Tom Lane wrote:
> Ragnar Kjørstad <postgres@ragnark.vestdata.no> writes:
> > On Tue, Sep 10, 2002 at 11:40:24AM -0400, Bruce Momjian wrote:
> >> We use fdatasync where available, and fsync when it is not.
>
> > Makes sense.
>
> >> We also use O_SYNC on open if it is available.
>
> s/also/instead/ ...

Yes, I understood that...

> open_datasync is the first choice if available.

I assume open_datasync means open() with the O_SYNC flag.
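
For reference, here is a minimal sketch of how the four sync methods map onto system calls, assuming a POSIX system such as Linux. In PostgreSQL's naming, open_datasync corresponds to the O_DSYNC open flag and open_sync to O_SYNC. The enum and function names below are hypothetical and are not PostgreSQL's own code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names mirroring the four methods discussed in this thread. */
typedef enum {
    SYNC_FSYNC,
    SYNC_FDATASYNC,
    SYNC_OPEN_SYNC,
    SYNC_OPEN_DATASYNC
} sync_method;

/* Open a log file; the open_* methods put the sync request in the flags. */
int open_log(const char *path, sync_method m)
{
    int flags = O_WRONLY | O_CREAT;

    if (m == SYNC_OPEN_SYNC)
        flags |= O_SYNC;    /* each write() waits for data and all metadata */
    else if (m == SYNC_OPEN_DATASYNC)
        flags |= O_DSYNC;   /* each write() waits for data, plus only the
                             * metadata needed to read it back */
    return open(path, flags, 0600);
}

/* Write one record and make it durable. Returns 0 on success, -1 on error. */
int write_record(int fd, sync_method m, const void *buf, size_t len)
{
    assert(len > 0);
    if (write(fd, buf, len) != (ssize_t) len)
        return -1;
    if (m == SYNC_FSYNC)
        return fsync(fd);
    if (m == SYNC_FDATASYNC)
        return fdatasync(fd);
    return 0;               /* O_SYNC / O_DSYNC: write() already waited */
}
```

With the two open_* methods the flush is implicit in every write(); with fsync and fdatasync the caller chooses when to flush.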

> > Why? That will slow things down...
>
> On what evidence do you assert that?
>
> In theory open_datasync can be the fastest alternative for WAL writing,
> because it should cause the kernel to force each WAL write() request
> down to disk immediately.  fdatasync will result in the same amount of
> I/O, but it will also require the kernel to scan its disk cache to see
> if there are any other dirty blocks that need to be written.  On many
> kernels this check is not very efficient and can chew substantial
> amounts of CPU time.

Yes, I see your argument.
However, I've just checked the Linux implementation of fsync() and I
can't really see how it could chew substantial amounts of CPU time. The
way it works, every inode has a list of dirty data buffers; all it does
is traverse that list and write each one.

Anyway - I'm sure this is not enough to convince you, so I'll have to
set up a test instead. But not tonight.


> The tradeoff is that open_datasync syncs each WAL
> block individually, which is unnecessary if you are committing
> multiple blocks worth of WAL entries at once --- but there's no hard
> evidence that that slows things down, especially not when the WAL logs
> are on their own disk spindle.

Well, in theory fsync() will allow the disk to reorder the writes, and
that should give significantly better performance, because it will
reduce the required number of seeks. If the WAL is on a separate spindle
there will be very few seeks in the first place, so there is less to gain,
but for the case with the WAL on the same disk as something else there
is probably some gain. But it makes sense to optimize for the
WAL-on-separate-disk case...

Another advantage is that fsync() would allow the elevator to merge
multiple I/O requests. Still the same number of bytes to write, but fewer,
bigger requests are typically faster.
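
To make the tradeoff concrete, the sketch below commits several blocks in the two ways under discussion. It assumes a POSIX system and an 8 kB block size; the function names are hypothetical. With O_DSYNC each write() must reach the disk before the next one is issued, while with fdatasync() the kernel sees all the dirty blocks at once and is free to merge or reorder them.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192   /* assumed WAL block size for this sketch */

/* Write nblocks blocks to fd. Returns 0 on success, -1 on error. */
static int write_blocks(int fd, int nblocks)
{
    char block[BLOCK_SIZE];

    memset(block, 'x', sizeof block);
    for (int i = 0; i < nblocks; i++)
        if (write(fd, block, sizeof block) != (ssize_t) sizeof block)
            return -1;
    return 0;
}

/*
 * Commit with O_DSYNC: every write() blocks until its block is on disk.
 * Returns how many times the caller waited for the disk, or -1 on error.
 */
int commit_open_datasync(const char *path, int nblocks)
{
    assert(nblocks > 0);
    int fd = open(path, O_WRONLY | O_CREAT | O_DSYNC, 0600);
    if (fd < 0)
        return -1;
    int rc = write_blocks(fd, nblocks);
    close(fd);
    return rc == 0 ? nblocks : -1;
}

/*
 * Commit with fdatasync(): the writes only dirty the cache, then a single
 * call flushes them together. Returns the number of waits (1), or -1.
 */
int commit_fdatasync(const char *path, int nblocks)
{
    assert(nblocks > 0);
    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    int rc = write_blocks(fd, nblocks);
    if (rc == 0)
        rc = fdatasync(fd);
    close(fd);
    return rc == 0 ? 1 : -1;
}
```

For a one-block commit the two functions wait the same number of times, so any difference should only show up when a commit spans several blocks.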

But again, numbers speak. I'll get back to you once I find the time to
test it.
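
A test of this kind could look roughly like the sketch below: a small timing harness, assuming a POSIX system with clock_gettime(). The function name, block size and parameters are hypothetical, and no results are implied here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define BLOCK_SIZE 8192   /* assumed block size */

/*
 * Time ncommits commits of blocks_per_commit blocks each.
 *   extra_open_flags: 0, O_SYNC or O_DSYNC
 *   flush:            NULL, or fsync / fdatasync, called once per commit
 * Returns elapsed wall-clock seconds, or -1.0 on error.
 */
double time_commits(const char *path, int extra_open_flags,
                    int (*flush)(int), int ncommits, int blocks_per_commit)
{
    char block[BLOCK_SIZE];
    struct timespec t0, t1;

    assert(ncommits > 0 && blocks_per_commit > 0);
    memset(block, 'x', sizeof block);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_open_flags, 0600);
    if (fd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int c = 0; c < ncommits; c++) {
        for (int b = 0; b < blocks_per_commit; b++) {
            if (write(fd, block, sizeof block) != (ssize_t) sizeof block) {
                close(fd);
                return -1.0;
            }
        }
        if (flush != NULL && flush(fd) != 0) {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    return (double) (t1.tv_sec - t0.tv_sec)
         + (double) (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

One would compare, for example, time_commits(path, O_DSYNC, NULL, n, k) against time_commits(path, 0, fdatasync, n, k) for several values of k. Note that this harness extends the file as it writes, whereas PostgreSQL writes into preallocated WAL segments, so a fairer test would write the file once first and then overwrite it.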


> Check the pghackers archives (a year or two back) for lots and lots of
> discussion, but I recall we demonstrated that the current default
> choices are reasonable for at least some set of Unixen.  If you've got
> more information showing that the present default is wrong on some
> kernel, let's have it ... but don't waste our time with blanket
> assertions that "X is the right (or wrong) choice", because we know
> that's not so across all the platforms we support.  We'd not have
> bothered with four sync methods if there weren't good evidence that each
> is the best available choice on some platforms.

No argument there; I'm sure there are applications for all of them.
My point is that I think fdatasync() would be the fastest choice for the
Linux kernel.



--
Ragnar Kjørstad
