Re: [HACKERS] TODO item

From: Tom Lane
Subject: Re: [HACKERS] TODO item
Date:
Msg-id: 20018.949941617@sss.pgh.pa.us
In reply to: Re: [HACKERS] TODO item  (Tatsuo Ishii <t-ishii@sra.co.jp>)
List: pgsql-hackers
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> possibly fix #2 by having transaction commit invoke the pg_fsync_pending
>> scan before it updates pg_log (and then fsyncing pg_log itself again
>> after).

> I do not understand #2. I call pg_fsync_pending twice in
> RecordTransactionCommit: once after FlushBufferPool, and again
> after TransactionIdCommit and FlushBufferPool. Or am I missing
> something?

Oh, OK.  That's what I meant.  The snippet you posted didn't show where
you were calling the fsync routine from.
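To make the ordering in #2 concrete, here is a minimal C sketch, assuming plain POSIX write(2) and fsync(2). The function commit_with_ordering and its parameters are hypothetical and are not the actual RecordTransactionCommit code; the array of open descriptors merely stands in for whatever the pending-fsync list records. It shows only the order of operations: data files first, then the commit record, then pg_log itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical commit path illustrating the ordering only.
 * data_fds: descriptors of data files this xact wrote (stand-in for the
 * pending-fsync list). log_fd: descriptor of pg_log.
 * Returns 0 on success, -1 on any failure (a short write counts as failure). */
static int commit_with_ordering(const int *data_fds, size_t n_data,
                                int log_fd, const char *commit_record)
{
    assert(commit_record != NULL);

    /* Step 1: make every data file written by this xact durable first. */
    for (size_t i = 0; i < n_data; i++)
        if (fsync(data_fds[i]) != 0)
            return -1;

    /* Step 2: only now append the commit record to the log. */
    size_t len = strlen(commit_record);
    if (write(log_fd, commit_record, len) != (ssize_t) len)
        return -1;

    /* Step 3: fsync the log so the commit record itself is durable. */
    if (fsync(log_fd) != 0)
        return -1;
    return 0;
}
```

If a crash lands between step 1 and step 3, the data is on disk but the commit is not recorded, which is the safe direction; the reverse order could leave pg_log claiming a commit whose data never reached disk.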

> I thought about that too. If the ordering was that important, a
> database managed by backends with -F on could be seriously
> corrupted. I've never heard of such disasters caused by -F.

This is why I think that fsync actually offers very little extra
protection ;-)

> BTW, Hiroshi pointed out an excellent point to me, #3:

>> This backend has to force a buffer page out to free
>> it. Unfortunately the page was dirtied by the
>> above operation of Session-1, so this backend calls
>> pg_fsync() for table A. However, the fsync() is
>> postponed until this backend commits.
>> 
>> Session-1
>> commit;
>> There's no dirty buffer page for table A.
>> So pg_fsync() isn't called for table A.

Oooh, right.  Backend A dirties the page, but leaves it sitting in
shared buffers.  Backend B needs the buffer space, so it does the
fwrite of the page.  Now if backend A wants to commit, it can fsync
everything it's written --- but does that guarantee the page that
was actually written by B will get flushed to disk?  Not sure.

If the pending-fsync logic is based on either physical fds or vfds
then it definitely *won't* work; A might have found the desired page
sitting in buffer cache to begin with, and never have opened the
underlying file at all!

So it seems you would need to keep a list of all the relation files (and
segments) you've written to in the current xact, and open and fsync each
one just before writing/fsyncing pg_log.  Even then, you're assuming
that fsync applied to a file via an fd belonging to one backend will
flush disk buffers written to the same file via *other* fds belonging
to *other* processes.  I'm not sure that that is true on all Unixes...
heck, I'm not sure it's true on any.  The fsync(2) man page here isn't
real specific.
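A hedged C sketch of the per-transaction list suggested above, keyed by file path rather than by physical fd or vfd, so a backend that never opened a file itself can still flush it at commit. PendingFsyncList, pending_remember, pending_flush_all, and the fixed size limits are all hypothetical. The sketch deliberately rests on the very assumption questioned above: that fsync() through a freshly opened descriptor also flushes pages another process wrote through its own descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_PENDING  64   /* hypothetical limit on files per xact */
#define MAX_PATH_LEN 256  /* hypothetical limit on path length */

/* Hypothetical per-xact list of relation files (and segments) written,
 * recorded by path so it does not depend on which fds this backend has. */
typedef struct {
    char paths[MAX_PENDING][MAX_PATH_LEN];
    size_t count;
} PendingFsyncList;

/* Record a path once. Returns 0 on success (including duplicates),
 * -1 if the list is full or the path is too long. */
static int pending_remember(PendingFsyncList *list, const char *path)
{
    assert(list != NULL && path != NULL);
    for (size_t i = 0; i < list->count; i++)
        if (strcmp(list->paths[i], path) == 0)
            return 0;               /* already recorded in this xact */
    if (list->count == MAX_PENDING || strlen(path) >= MAX_PATH_LEN)
        return -1;
    strcpy(list->paths[list->count++], path);
    return 0;
}

/* At commit, before pg_log is written: open each recorded file fresh and
 * fsync it. ASSUMPTION: fsync via this new fd also flushes data other
 * processes wrote via their own fds. Returns 0 on success, -1 on the
 * first failure (the list is left intact in that case). */
static int pending_flush_all(PendingFsyncList *list)
{
    for (size_t i = 0; i < list->count; i++) {
        int fd = open(list->paths[i], O_RDWR);
        if (fd < 0)
            return -1;
        int rc = fsync(fd);
        close(fd);
        if (rc != 0)
            return -1;
    }
    list->count = 0;                /* list is per-xact: reset after flush */
    return 0;
}
```

The caller would run pending_flush_all() just before writing and fsyncing pg_log; if it returns -1, the commit record should not be written.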
        regards, tom lane

