Re: fsync alternatives (was: Re: [HACKERS] TODO item)

From: Alfred Perlstein
Subject: Re: fsync alternatives (was: Re: [HACKERS] TODO item)
Msg-id: 20000207111736.D25520@fw.wintelcom.net
In reply to: Re: fsync alternatives (was: Re: [HACKERS] TODO item)  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses: Re: fsync alternatives (was: Re: [HACKERS] TODO item)  (Bruce Momjian <pgman@candle.pha.pa.us>)
List: pgsql-hackers

* Bruce Momjian <pgman@candle.pha.pa.us> [000207 11:00] wrote:
> > > So, I think we are safe if we can either keep that file descriptor open
> > > until commit, or re-open it and fsync it on commit.  That assumes a
> > > re-open is hitting the same file.  My opinion is that we should just
> > > fsync it on close and not worry about a reopen.
> > 
> > I'm pretty sure that the standard is that a close on a file _should_
> > fsync it.
> 
> This is not true.  close flushes the user buffers to kernel buffers.  It
> does not force to physical disk in all cases, I think.  There is really
> no need to force them to disk on close.  The only time they have to be
> forced to disk is when the system shuts down, or on an fsync call.
> 
> > 
> > In re the fsync problems...
> > 
> > I came across this option when investigating implementing range fsync()
> > for FreeBSD, 'O_FSYNC'/'O_SYNC'.
> > 
> > Why not keep 2 file descriptors open for each datafile, one opened
> > with O_FSYNC (exists but not documented in FreeBSD) and one normal?
> > This guarantees sync writes for all write operations on that fd.
> 
> We actually don't want this.  We like to just fsync the file descriptor
> and retroactively fsync all our writes.  fsync allows us to decouple the
> write and the fsync, which is what we really are attempting to do.  Our
> current behavior is to do write/fsync together, which is wasteful.

Yes, the way I understand it, one backend doing the fsync will sync
the entire file, perhaps forcing a sync in the middle of a somewhat
critical update being done by another instance of the backend.

Since the current behavior seems to be write/fsync/write/fsync...
instead of write/write/write/fsync, you may as well try opening the
file descriptor with O_FSYNC on operating systems that support it, to
avoid the cross-fsync problem.
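
To make the suggestion concrete, here is a minimal sketch in C of a
descriptor opened for synchronous writes. The helper names
open_sync_fd and write_all and the SYNC_WRITE_FLAG macro are
hypothetical, not taken from the PostgreSQL source. The sketch assumes
a platform that defines O_SYNC (the POSIX spelling) or O_FSYNC (the
FreeBSD spelling mentioned above).

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* FreeBSD spells the flag O_FSYNC; POSIX spells it O_SYNC. */
#if defined(O_SYNC)
#define SYNC_WRITE_FLAG O_SYNC
#elif defined(O_FSYNC)
#define SYNC_WRITE_FLAG O_FSYNC
#else
#error "no synchronous-write open flag on this platform"
#endif

/* Hypothetical helper: open a data file so that each write() on the
 * returned descriptor is synchronous, with no separate fsync() call. */
int open_sync_fd(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | SYNC_WRITE_FLAG, 0600);
}

/* Hypothetical helper: write the whole buffer, retrying short writes.
 * Returns 0 on success, -1 on error (errno is set by write()). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }
    return 0;
}
```

Only the writes issued on this descriptor are forced out, so a backend
using it never triggers a whole-file sync in the middle of another
backend's update.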

Another option is to use O_FSYNC descriptors and aio_write to
allow sync writes to be 'backgrounded'.  More and more Unix OSes
are supporting aio nowadays.

I'm aware of the performance implications of sync writes, but
calling fsync after every write seems to cause massive amounts of
unnecessary disk I/O.  That could be avoided by using explicit
sync descriptors, with little increase in complexity considering
what I understand of the current implementation.

Basically it would seem to be a good hack until you get the algorithm
to batch fsyncs working (write/write/write.../fsync).  At that point
you may want to window over the files using msync(), but there may
be a better way, one that allows a vector of I/O to be scheduled for
sync write in one go, rather than a buffer at a time.

-Alfred

