Re: [HACKERS] TODO item

From Alfred Perlstein
Subject Re: [HACKERS] TODO item
Date
Msg-id 20000209020448.P17536@fw.wintelcom.net
In response to Re: [HACKERS] TODO item  (Tatsuo Ishii <t-ishii@sra.co.jp>)
Responses Re: [HACKERS] TODO item  (Tatsuo Ishii <t-ishii@sra.co.jp>)
List pgsql-hackers
* Tatsuo Ishii <t-ishii@sra.co.jp> [000209 00:51] wrote:
> > BTW, Hiroshi has pointed out to me an excellent point #3:
> > 
> > >Session-1
> > >begin;
> > >update A ...;
> > >
> > >Session-2
> > >begin;
> > >select * from B ..;
> > >    There's no PostgreSQL shared buffer available.
> > >    This backend has to force the flush of a free buffer
> > >    page. Unfortunately the page was dirtied by the
> > >    above operation of Session-1 and calls pg_fsync()
> > >    for the table A. However fsync() is postponed until
> > >    commit of this backend.
> > >
> > >Session-1
> > >commit;
> > >    There's no dirty buffer page for the table A.
> > >    So pg_fsync() isn't called for the table A.
> > 
> > Seems there's no easy solution for this. Maybe now is the time to give
> > up my idea...
> 
> Thinking about it a little bit more, I have come across yet another
> possible solution. It is actually *very* simple. Details as follows.
> 
> In xact.c:RecordTransactionCommit() there are two FlushBufferPool
> calls. One is for relation files and the other is for pg_log. I add
> sync() right after each of these FlushBufferPool calls. It will force
> any pending kernel buffers to be physically written to disk, and thus
> should guarantee the ACID properties of the transaction (see attached
> code fragment).
> 
> There are two things that we should worry about sync, however.
> 
> 1. Does sync() really wait until the data has been written to
> disk?
> 
> I looked into the man page of sync(2) on Linux 2.0.36:
> 
>        According to  the  standard  specification  (e.g.,  SVID),
>        sync()  schedules  the  writes,  but may return before the
>        actual writing is done.   However,  since  version  1.3.20
>        Linux  does actually wait.  (This still does not guarantee
>        data integrity: modern disks have large caches.)
> 
> It seems that sync(2) blocks until data is written. So it would be ok
> at least with Linux. I'm not sure about other platforms, though.

It is incorrect to assume that sync() waits until all buffers are
flushed on any platform other than Linux. I didn't think
that Linux even did so, but the kernel sources say it does.

Solaris doesn't do this, and neither do FreeBSD and NetBSD.

I guess if you wanted to implement this for Linux only then it would
work, but you ought to then also warn people that a non-dedicated db
server could experience different performance using this code, since
sync() flushes every dirty buffer on the system, not just PostgreSQL's.
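
To make the portability point concrete: POSIX only requires sync() to
schedule the writes, whereas fsync() on a file descriptor is specified
not to return until the data for that file has been transferred to the
storage device (or an error occurs). Below is a minimal sketch of the
per-file approach. The helper name write_durably() and the file path in
the usage note are assumptions made up for illustration; this is not
PostgreSQL code and not the fragment attached to the quoted message.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PostgreSQL): write len bytes to path
 * and return only after fsync() has reported completion for that file.
 * Returns 0 on success, -1 on any failure.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* write() may transfer fewer bytes than requested, so loop. */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t) n;
    }

    /*
     * Unlike sync(), fsync() covers only this one file and is specified
     * to block until the transfer to the device has completed. The disk's
     * own write cache can still hold the data, as the man page notes.
     */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A caller would use it as write_durably("/tmp/example.dat", "commit", 6)
and treat a return value of 0 as "the kernel has handed the data to the
device". Because it touches a single file, it also avoids the
system-wide flush that sync() triggers on a shared machine.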

-Alfred

