Re: Experimental patch for inter-page delay in VACUUM

From Tom Lane
Subject Re: Experimental patch for inter-page delay in VACUUM
Msg-id 1969.1068044941@sss.pgh.pa.us
In reply to Re: Experimental patch for inter-page delay in VACUUM (Greg Stark <gsstark@mit.edu>)
Responses Re: Experimental patch for inter-page delay in VACUUM
List pgsql-hackers
Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> You want to find, open, and fsync() every file in the database cluster
>> for every checkpoint?  Sounds like a non-starter to me.

> Except a) this is outside any critical path, and b) only done every few
> minutes and c) the fsync calls on files with no dirty buffers ought to be
> cheap, at least as far as i/o.

The directory search and opening of the files is in itself nontrivial
overhead ... particularly on systems where open(2) isn't speedy, such
as Solaris.  I also disbelieve your assumption that fsync'ing a file
that doesn't need it will be free.  That depends entirely on what sort
of indexes the OS keeps on its buffer cache.  There are Unixen where
fsync requires a scan through the entire buffer cache because there is
no data structure that permits finding associated buffers any more
efficiently than that.  (IIRC, the HP-UX system I'm typing this on is
like that.)  On those sorts of systems, we'd be way better off to use
O_SYNC or O_DSYNC on all our writes than to invoke multiple fsyncs.
Check the archives --- this was all gone into in great detail when we
were testing alternative methods for fsyncing the WAL files.
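
For illustration only, the two strategies contrasted above can be sketched
in C roughly as follows.  This is not code from PostgreSQL; the function
names (write_all, write_then_fsync, write_with_dsync) are hypothetical, and
the sketch assumes a POSIX system where O_DSYNC (or at least O_SYNC) is
available.  It shows the call patterns only and says nothing about which
one is cheaper on a given kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: fall back to O_SYNC on platforms that lack O_DSYNC. */
#ifndef O_DSYNC
#define O_DSYNC O_SYNC
#endif

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Strategy 1: buffered write, then an explicit fsync() on the same FD. */
int write_then_fsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    int ok = (write_all(fd, buf, len) == 0) && (fsync(fd) == 0);
    if (close(fd) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Strategy 2: open with O_DSYNC so each write() is synchronous;
 * no separate fsync() call is issued. */
int write_with_dsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_DSYNC, 0600);
    if (fd < 0)
        return -1;

    int ok = (write_all(fd, buf, len) == 0);
    if (close(fd) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Both functions return 0 on success and -1 on failure.  On a kernel where
fsync has to scan the whole buffer cache, the second form avoids that scan
at the price of making every write wait for the disk.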

> So the NetBSD and Sun developers I checked with both asserted fsync does in
> fact guarantee this. And SUSv2 seems to back them up:

>     The fsync() function can be used by an application to indicate that all
>     data for the open file description named by fildes is to be transferred to
>     the storage device associated with the file described by fildes in an
>     implementation-dependent manner.

The question here is what is meant by "data for the open file
description".  If it said "all data for the file referenced by the open
FD" then I would agree that the spec says what you claim.  As is, I
think it would be entirely within the spec for the OS to dump only
buffers that had been dirtied through that particular FD.  Notice that
the last part of the sentence is careful to respect the distinction
between the FD and the file; why isn't the first part?
        regards, tom lane
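
The scenario in dispute above (data dirtied through one descriptor, fsync
issued through another) can be sketched as the call pattern below.  This is
an illustrative sketch, not code from the thread or from PostgreSQL; the
function name dirty_then_fsync_other_fd is hypothetical.  Note that a
successful return only means every system call succeeded.  It cannot show
whether the data written through the first descriptor actually reached the
disk, which is exactly the question about the spec's wording.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: dirty a file through one descriptor, then call
 * fsync() through a second, independently opened descriptor.
 * Returns 0 if every system call succeeded, -1 otherwise.
 */
int dirty_then_fsync_other_fd(const char *path, const void *buf, size_t len)
{
    int result = -1;
    int fd_sync = -1;

    /* First open file description: the one that dirties the buffers. */
    int fd_writer = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd_writer < 0)
        return -1;

    ssize_t n = write(fd_writer, buf, len);
    if (n < 0 || (size_t) n != len)
        goto done;

    /* Second open file description on the same file.  O_RDWR is used
     * because some systems reject fsync() on a read-only descriptor. */
    fd_sync = open(path, O_RDWR);
    if (fd_sync < 0)
        goto done;

    /* Under the strict reading of "data for the open file description",
     * the OS would be allowed to flush nothing here, since fd_sync itself
     * never dirtied any buffers. */
    if (fsync(fd_sync) == 0)
        result = 0;

done:
    if (fd_sync >= 0)
        close(fd_sync);
    close(fd_writer);
    return result;
}
```

A checkpoint process that reopens and fsyncs files written by other
backends would rely on this pattern flushing the other backends' writes.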

