Re: [HACKERS] TODO item

From: Tom Lane
Subject: Re: [HACKERS] TODO item
Msg-id 24481.949852063@sss.pgh.pa.us
In reply to: Re: [HACKERS] TODO item  (Tatsuo Ishii <t-ishii@sra.co.jp>)
Responses: Re: [HACKERS] TODO item  (Bruce Momjian <pgman@candle.pha.pa.us>)
Re: [HACKERS] TODO item  (Tatsuo Ishii <t-ishii@sra.co.jp>)
List: pgsql-hackers
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>>>> BTW, I have worked a little bit on this item. The idea is pretty
>>>> simple. Instead of doing a real fsync() in pg_fsync(), just marking it
>>>> so that we remember to do fsync() at the commit time. Following
>>>> patches illustrate the idea.

In the form you have shown it, it would be completely useless, for
two reasons:

1. It doesn't guarantee that the right files are fsync'd.  Kernel file
descriptor numbers get reused, so at the close of the transaction it
would in fact fsync whichever files happen to occupy the fd numbers
that your intended files had at the time fsync was requested.

2. It doesn't guarantee that the files are fsync'd in the right order.
Per my discussion a few days ago, the only reason for doing fsync at all
is to guarantee that the data pages touched by a transaction get flushed
to disk before the pg_log update claiming that the transaction is done
gets flushed to disk.  A change like this completely destroys that
ordering, since pg_fsync_pending has no idea which fd is pg_log.
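
To make point 1 concrete, here is a minimal sketch of fd-number reuse.
It is not PostgreSQL code: the function name and the two paths passed to
it are hypothetical, and it assumes a single-threaded POSIX process, where
open() returns the lowest-numbered descriptor not currently open.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical demo, not PostgreSQL code: remember the fd number of file A,
 * close A (as the vfd layer may do to free up a kernel fd), then open file B.
 * Returns 1 if B landed on the remembered number, 0 if not, -1 on error.
 */
int fd_number_is_reused(const char *path_a, const char *path_b)
{
    assert(path_a != NULL && path_b != NULL);

    int fd_a = open(path_a, O_RDWR | O_CREAT, 0600);
    if (fd_a < 0)
        return -1;
    int remembered = fd_a;      /* "fsync this one at commit" */
    close(fd_a);                /* kernel fd released before commit */

    int fd_b = open(path_b, O_RDWR | O_CREAT, 0600);
    if (fd_b < 0) {
        unlink(path_a);
        return -1;
    }

    int reused = (fd_b == remembered);
    /* If reused, this call succeeds but flushes file B, not file A. */
    int rc = reused ? fsync(remembered) : 0;

    close(fd_b);
    unlink(path_a);
    unlink(path_b);
    return rc == 0 ? reused : -1;
}
```

Calling it with two scratch paths shows the deferred fsync landing on
the wrong file without any error being reported, which is why a bare
fd number is not a safe thing to remember until commit.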

You could possibly fix #1 by logging fsync requests at the vfd level;
then, whenever a vfd is closed to free up a kernel fd, check the fsync
flag and execute the pending fsync before closing the file.  You could
possibly fix #2 by having transaction commit invoke the pg_fsync_pending
scan before it updates pg_log (and then fsyncing pg_log itself again
after).

(Actually, you could probably eliminate the notion of "fsync request"
entirely, and simply have each vfd get marked "dirty" automatically when
written to.  Both closing a vfd and the scan at xact commit would look
at the dirty bit to decide to do fsync.)
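
A rough sketch of the dirty-bit variant, under stated assumptions: the
Vfd struct and every function name below are hypothetical stand-ins, not
the real fd.c definitions, and the log file is passed separately from the
data files so that the ordering is explicit.  It only covers one backend
in isolation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical, simplified stand-in for a virtual file descriptor entry. */
typedef struct Vfd {
    int  fd;     /* kernel fd, or -1 if not currently open */
    bool dirty;  /* written to since the last successful fsync */
} Vfd;

Vfd vfd_open(const char *path)
{
    assert(path != NULL);
    Vfd v = { open(path, O_RDWR | O_CREAT, 0600), false };
    return v;
}

/* Every successful write marks the vfd dirty; no separate fsync request. */
ssize_t vfd_write(Vfd *v, const void *buf, size_t len)
{
    assert(v != NULL);
    ssize_t n = write(v->fd, buf, len);
    if (n > 0)
        v->dirty = true;
    return n;
}

/* fsync only if dirty; clear the bit only when fsync succeeded. */
int vfd_sync(Vfd *v)
{
    assert(v != NULL);
    if (v->fd < 0 || !v->dirty)
        return 0;
    if (fsync(v->fd) != 0)
        return -1;
    v->dirty = false;
    return 0;
}

/* Closing a vfd to free a kernel fd flushes it first (fix for #1). */
int vfd_release_kernel_fd(Vfd *v)
{
    assert(v != NULL);
    if (v->fd < 0)
        return 0;
    int rc = vfd_sync(v);
    if (close(v->fd) != 0)
        rc = -1;
    v->fd = -1;
    return rc;
}

/* Commit: flush data files, then update the log, then flush the log (#2). */
int commit_sync(Vfd *data, size_t ndata, Vfd *log,
                const void *rec, size_t len)
{
    for (size_t i = 0; i < ndata; i++)
        if (vfd_sync(&data[i]) != 0)
            return -1;
    if (vfd_write(log, rec, len) != (ssize_t) len)
        return -1;
    return vfd_sync(log);
}
```

The point of the shape is that the data-file scan finishes before the
log record is even written, and the dirty bit travels with the vfd rather
than with a kernel fd number.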

What would still need to be thought about is whether this scheme
preserves the ordering guarantee when a group of concurrent backends
is considered, rather than one backend in isolation.  (I believe that
fsync() will apply to all dirty kernel buffers for a file, not just
those dirtied by the requesting process, so each backend's fsyncs can
affect the order in which other backends' writes hit the disk.)
Offhand I do not see any problems there, but it's the kind of thing
that requires more than offhand thought...
        regards, tom lane

