Thread: AW: AW: WAL does not recover gracefully from out-of-disk-space


AW: AW: WAL does not recover gracefully from out-of-disk-space

From
Zeugswetter Andreas SB
Date:
> > Even with true fdatasync it's not obviously good for performance - it takes
> > too long time to write 16Mb files and fills OS buffer cache with trash-:(
> >>
> >> True.  But at least the write is (hopefully) being done at a
> >> non-performance-critical time.
>
> > So you have non critical time every five minutes ?
> > Those platforms that don't have fdatasync won't profit anyway.
>
> Yes they will; you're forgetting the cost of updating
> filesystem overhead.

I did have that in mind, but I thought that in effect the OS would
optimize sparse file allocation somehow.
Some tests showed, however, that while your variant is really good
and saves 12 seconds, the performance is *very* poor for either variant.

A short test shows that opening the file with O_SYNC, and thus avoiding fsync(),
would cut the effective time needed to sync-write the xlog by more than half.
Of course we would need to buffer >= 1 xlog page before each write (or commit)
to gain the full advantage.
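
For readers without the attachment, the two sync strategies being compared can be sketched roughly as below. This is only an illustration, not the attached test program: the function name `write_pages`, the 8 kB `PAGE_SZ`, and the fill pattern are assumptions made for the example.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed page size; the thread only speaks of "an xlog page". */
#define PAGE_SZ 8192

/*
 * Write npages pages to path, forcing each page to disk.
 * use_osync == 0: plain write() followed by an explicit fsync().
 * use_osync != 0: file opened with O_SYNC, so write() itself is
 *                 synchronous and no fsync() call is issued.
 * Returns 0 on success, -1 on error.
 */
static int write_pages(const char *path, int npages, int use_osync)
{
    char page[PAGE_SZ];
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    int fd;

    if (use_osync)
        flags |= O_SYNC;
    fd = open(path, flags, 0600);
    if (fd < 0)
        return -1;

    memset(page, 'x', sizeof page);
    for (int i = 0; i < npages; i++) {
        if (write(fd, page, sizeof page) != (ssize_t) sizeof page) {
            close(fd);
            return -1;
        }
        if (!use_osync && fsync(fd) != 0) {
            close(fd);
            return -1;
        }
    }
    return close(fd);
}
```

Timing `write_pages(path, n, 0)` against `write_pages(path, n, 1)` on the same filesystem gives the kind of comparison reported here; the absolute numbers will depend heavily on the OS and filesystem.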

prewrite 0 + write and fsync:       60.4 sec
sparse file + write with O_SYNC:    37.5 sec
no prewrite + write with O_SYNC:    36.8 sec
prewrite 0 + write with O_SYNC:     24.0 sec

These times include the prewrite where applicable, measured on AIX with JFS.
Test program attached. I may be overlooking something, though.
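
As a rough illustration of the two preallocation variants named in the table, "prewrite 0" and "sparse file" might look like the sketch below. How the attached program actually creates its sparse file is not stated in this message, so the seek-and-write-one-byte approach, the function names, and `BLOCK_SZ` are assumptions; the 16 MB segment size is the figure quoted earlier in the thread.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SEG_SIZE (16 * 1024 * 1024) /* 16 MB log file, as quoted in the thread */
#define BLOCK_SZ 8192               /* assumed write granularity */

/* "prewrite 0": physically write zeros over the whole segment. */
static int prewrite_zero(int fd)
{
    char zeros[BLOCK_SZ];

    memset(zeros, 0, sizeof zeros);
    if (lseek(fd, 0, SEEK_SET) < 0)
        return -1;
    for (off_t done = 0; done < SEG_SIZE; done += BLOCK_SZ)
        if (write(fd, zeros, sizeof zeros) != (ssize_t) sizeof zeros)
            return -1;
    return fsync(fd);
}

/*
 * "sparse file": set the file length by writing a single byte at the
 * last offset. Most filesystems allocate no data blocks for the hole,
 * so later writes into it still have to update allocation metadata.
 */
static int prewrite_sparse(int fd)
{
    char zero = 0;

    if (lseek(fd, (off_t) SEG_SIZE - 1, SEEK_SET) < 0)
        return -1;
    if (write(fd, &zero, 1) != 1)
        return -1;
    return fsync(fd);
}
```

Both functions leave a file of the same logical length; the difference is whether the blocks already exist on disk when the timed synchronous writes begin.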

Andreas


Attachments

Re: AW: AW: WAL does not recover gracefully from out-of-disk-space

From
Tom Lane
Date:
Zeugswetter Andreas SB  <ZeugswetterA@wien.spardat.at> writes:
> A short test shows that opening the file with O_SYNC, and thus avoiding fsync(),
> would cut the effective time needed to sync-write the xlog by more than half.
> Of course we would need to buffer >= 1 xlog page before each write (or commit)
> to gain the full advantage.

> prewrite 0 + write and fsync:       60.4 sec
> sparse file + write with O_SYNC:    37.5 sec
> no prewrite + write with O_SYNC:    36.8 sec
> prewrite 0 + write with O_SYNC:     24.0 sec

This seems odd.  As near as I can tell, O_SYNC is simply a command to do
fsync implicitly during each write call.  It cannot save any I/O unless
I'm missing something significant.  Where is the performance difference
coming from?

The reason I'm inclined to question this is that what we want is not an
fsync per write but an fsync per transaction, and we can't easily buffer
all of a transaction's XLOG writes...
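
To make the "fsync per transaction" pattern concrete, here is a minimal sketch: records are appended with plain write() calls and a single fsync() is issued at commit. The names `xlog_append`, `xlog_commit`, `run_transaction`, and the `sync_calls` counter are hypothetical and exist only for this illustration; this is not the actual XLOG code.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical counter so a caller can see how many syncs were issued. */
static int sync_calls = 0;

/* Append one log record with a plain write(); nothing is synced here. */
static int xlog_append(int fd, const void *rec, size_t len)
{
    return write(fd, rec, len) == (ssize_t) len ? 0 : -1;
}

/* Commit: one fsync() covers every record written since the last sync. */
static int xlog_commit(int fd)
{
    sync_calls++;
    return fsync(fd);
}

/*
 * Example transaction: three records, one sync at commit.
 * Returns the number of fsync() calls made, or -1 on error.
 */
static int run_transaction(const char *path)
{
    static const char *recs[] = { "insert", "update", "commit" };
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);

    if (fd < 0)
        return -1;
    sync_calls = 0;
    for (int i = 0; i < 3; i++) {
        if (xlog_append(fd, recs[i], strlen(recs[i])) != 0) {
            close(fd);
            return -1;
        }
    }
    if (xlog_commit(fd) != 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return sync_calls;
}
```

With the file opened O_SYNC instead, each of the three write() calls would wait for the disk on its own unless the records were first collected in a buffer and written in one call.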
        regards, tom lane