Re: [HACKERS] TODO item

From Bruce Momjian
Subject Re: [HACKERS] TODO item
Date
Msg-id 200002071735.MAA08496@candle.pha.pa.us
In response to Re: [HACKERS] TODO item  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses fsync alternatives (was: Re: [HACKERS] TODO item)  (Alfred Perlstein <bright@wintelcom.net>)
Re: [HACKERS] TODO item  (Tom Lane <tgl@sss.pgh.pa.us>)
RE: [HACKERS] TODO item  ("Hiroshi Inoue" <Inoue@tpf.co.jp>)
List pgsql-hackers
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Don't tell me we fsync on every buffer write, and not just at
> > transaction commit?  That is terrible.
> 
> If you don't have -F set, yup.  Why did you think fsync mode was
> so slow?
> 
> > What if we set a flag on the file descriptor stating we dirtied/wrote
> > one of its buffers during the transaction, and cycle through the file
> > descriptors on buffer commit and fsync all involved in the transaction. 
> 
> That's exactly what Tatsuo was describing, I believe.  I think Hiroshi
> has pointed out a serious problem that would make it unreliable when
> multiple backends are running: if some *other* backend fwrites the page
> instead of your backend, and it doesn't fsync until *its* transaction is
> done (possibly long after yours), then you lose the ordering guarantee
> that is the point of the whole exercise...

OK, I understand now.  You are asking: if my backend dirties a buffer,
but another backend does the write, would my backend's fsync() flush the
buffer that the other backend wrote?

I can't imagine how fsync could flush _only_ the file descriptor buffers
modified by the current process.  It would have to affect all buffers
for the file descriptor.

BSDI says:
    Fsync() causes all modified data and attributes of fd to be moved to a
    permanent storage device.  This normally results in all in-core modified
    copies of buffers for the associated file to be written to a disk.
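
To make the scenario concrete, here is a minimal sketch in Python, whose
os.open/os.write/os.fsync calls wrap the corresponding POSIX calls.  One
process writes a page and never syncs; a different process, holding a
different descriptor, calls fsync().  The function name and the use of a
temporary file are assumptions for illustration.  A user-level script
cannot observe what actually reached the disk; the read only shows that
the kernel buffers are shared between the two processes.

```python
import os
import tempfile


def write_in_child_sync_in_parent(payload: bytes) -> bytes:
    """Write in one process, fsync from another, and read the data back."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        pid = os.fork()
        if pid == 0:
            # "Other backend": writes through its own descriptor, no fsync.
            wfd = os.open(path, os.O_WRONLY)
            os.write(wfd, payload)
            os.close(wfd)
            os._exit(0)
        os.waitpid(pid, 0)

        # "My backend": a separate descriptor that never wrote anything.
        sfd = os.open(path, os.O_RDWR)
        try:
            # Asks the kernel to flush the file's modified buffers; this
            # script has no way to verify what was physically written.
            os.fsync(sfd)
            return os.read(sfd, len(payload))
        finally:
            os.close(sfd)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(write_in_child_sync_in_parent(b"page"))
```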
 

Looking at the BSDI kernel, there is a user-mode file descriptor table,
which maps to a kernel file descriptor table.  Entries in that kernel
table can be shared, so one entry can be held open multiple times, as
happens after a fork() call.  The kernel table maps to an actual file
inode/vnode that maps to a file.
The only thing that is kept in the file descriptor table is the current
offset in the file (struct file in BSD).  There is no mapping of who
wrote which blocks.
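
The sharing described above can be observed from user space.  The
following is a small sketch, assuming a POSIX system where os.fork() is
available; the function name is made up for illustration.  A descriptor
inherited across fork() shares its file offset with the parent, while a
fresh open() of the same file gets its own offset.

```python
import os
import tempfile


def shared_offset_demo():
    """Return (offset of the inherited fd, offset of a freshly opened fd)."""
    fd, path = tempfile.mkstemp()
    try:
        pid = os.fork()
        if pid == 0:
            # Child: the inherited descriptor refers to the same kernel
            # file entry as the parent's, so this write moves the offset
            # for both processes.
            os.write(fd, b"child")
            os._exit(0)
        os.waitpid(pid, 0)
        inherited = os.lseek(fd, 0, os.SEEK_CUR)

        # A separate open() creates a new kernel file entry with its own
        # offset, even though it reaches the same inode/vnode.
        other = os.open(path, os.O_RDWR)
        try:
            independent = os.lseek(other, 0, os.SEEK_CUR)
        finally:
            os.close(other)
        return inherited, independent
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(shared_offset_demo())  # (5, 0)
```

Nothing in either result says which process wrote which bytes; the
offset is the only per-entry state that shows up here.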

In fact, I would suggest that any kernel implementation that could track
such things would be pretty broken.  I can imagine some cases where the
use of that mapping of blocks to file descriptors would cause
compatibility problems.  Those buffers have to be shared by all processes.

So, I think we are safe if we can either keep that file descriptor open
until commit, or re-open it and fsync it on commit.  That assumes a
re-open is hitting the same file.  My opinion is that we should just
fsync it on close and not worry about a reopen.
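
The two options can be sketched roughly as follows.  This is Python for
brevity, not backend code; TransactionFiles, close_synced and their
methods are hypothetical names, not anything in the PostgreSQL source.
It assumes, as argued above, that fsync() on any descriptor for a file
flushes that file's modified buffers no matter which descriptor wrote
them.  It does not address the multi-backend ordering problem raised in
the quoted text.

```python
import os


class TransactionFiles:
    """Hypothetical bookkeeping of the files one transaction dirtied."""

    def __init__(self):
        self.dirty = set()

    def write(self, path, offset, data):
        fd = os.open(path, os.O_WRONLY)
        try:
            os.pwrite(fd, data, offset)
        finally:
            os.close(fd)  # the descriptor may be gone long before commit
        self.dirty.add(path)

    def commit(self):
        """Option 1: re-open every dirtied file and fsync it at commit."""
        synced = []
        for path in sorted(self.dirty):
            fd = os.open(path, os.O_RDWR)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
            synced.append(path)
        self.dirty.clear()
        return synced


def close_synced(fd, was_dirtied):
    """Option 2: fsync on close, so no re-open is needed later."""
    if was_dirtied:
        os.fsync(fd)
    os.close(fd)
```

Option 1 depends on the re-open reaching the same file; option 2 avoids
that question at the cost of an fsync() on every close of a dirtied file.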

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 

