RE: [HACKERS] TODO item

From: Hiroshi Inoue
Subject: RE: [HACKERS] TODO item
Msg-id: 000001bf71c4$79e14b80$2801007e@tpf.co.jp
In reply to: Re: [HACKERS] TODO item  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses: Re: [HACKERS] TODO item  (Tom Lane <tgl@sss.pgh.pa.us>)
           Re: [HACKERS] TODO item  (Bruce Momjian <pgman@candle.pha.pa.us>)
List: pgsql-hackers
> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Bruce Momjian
>
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > Don't tell me we fsync on every buffer write, and not just at
> > > transaction commit?  That is terrible.
> >
> > If you don't have -F set, yup.  Why did you think fsync mode was
> > so slow?
> >
> > > What if we set a flag on the file descriptor stating we dirtied/wrote
> > > one of its buffers during the transaction, and cycle through the file
> > > descriptors on buffer commit and fsync all involved in the
> > > transaction.
> >
> > That's exactly what Tatsuo was describing, I believe.  I think Hiroshi
> > has pointed out a serious problem that would make it unreliable when
> > multiple backends are running: if some *other* backend fwrites the page
> > instead of your backend, and it doesn't fsync until *its* transaction is
> > done (possibly long after yours), then you lose the ordering guarantee
> > that is the point of the whole exercise...
>
> OK, I understand now.  You are saying: if my backend dirties a buffer,
> but another backend does the write, would my backend's fsync() flush the
> buffer that the other backend wrote?
>
> I can't imagine how fsync could flush _only_ the file descriptor buffers
> modified by the current process.  It would have to affect all buffers
> for the file descriptor.
>
> BSDI says:
>
>      Fsync() causes all modified data and attributes of fd to be moved to a
>      permanent storage device.  This normally results in all in-core modified
>      copies of buffers for the associated file to be written to a disk.
>
> Looking at the BSDI kernel, there is a user-mode file descriptor table,
> which maps to a kernel file descriptor table.  An entry in the kernel table
> can be shared, so several file descriptors can refer to it, as after a
> fork() call.  The kernel table maps to an actual file inode/vnode that
> maps to a file.
> The only thing that is kept in the file descriptor table is the current
> offset in the file (struct file in BSD).  There is no mapping of who
> wrote which blocks.
>
> In fact, I would suggest that any kernel implementation that could track
> such things would be pretty broken.  I can imagine some cases where the use
> of that mapping of blocks to file descriptors would cause compatibility
> problems.  Those buffers have to be shared by all processes.
>
> So, I think we are safe if we can either keep that file descriptor open
> until commit, or re-open it and fsync it on commit.  That assumes a
> re-open is hitting the same file.  My opinion is that we should just
> fsync it on close and not worry about a reopen.
>
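
As a rough illustration of the "re-open it and fsync it on commit" option quoted above, here is a minimal C sketch. It assumes, as the quoted BSDI man page text suggests, that fsync() flushes every modified buffer of the file and not only those written through the calling descriptor. The function name write_then_fsync_via_reopen is hypothetical; this is not PostgreSQL code.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write data through one descriptor, close it,
 * then re-open the same path and fsync through the new descriptor.
 * Returns 0 on success, -1 on any failure.
 */
int write_then_fsync_via_reopen(const char *path, const char *data, size_t len)
{
    /* First descriptor: stands in for whichever process wrote the page. */
    int wfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (wfd < 0)
        return -1;
    if (write(wfd, data, len) != (ssize_t) len) {
        close(wfd);
        return -1;
    }
    if (close(wfd) != 0)
        return -1;

    /* Second descriptor: opened later, only to request the flush. */
    int sfd = open(path, O_RDWR);
    if (sfd < 0)
        return -1;
    int rc = fsync(sfd);    /* flushes the file's buffers, not just "ours" */
    close(sfd);
    return rc;
}
```

Whether the re-opened path still names the same file (for example after a rename or unlink) is exactly the assumption flagged in the quoted text, and the sketch does not check it.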

I asked this question 4 months ago but got no answer.
Obviously this needs changes not only in the md/fd code but also in
bufmgr.  Keeping a list of dirtied segments for each backend seems
to work, but I'm afraid of other oversights.
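
The per-backend list of dirtied segments could look roughly like the following C sketch. Everything here is an assumption made for illustration: the names remember_dirty_segment and fsync_dirty_segments, the fixed-size array, and the use of file paths to identify segments are all hypothetical and do not reflect the actual md/fd or bufmgr code. It also assumes that every dirtied buffer has already been handed to the kernel with write(), by this backend or another, before commit runs.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_DIRTY_SEGS 64     /* arbitrary limit for the sketch */
#define SEG_PATH_LEN   256

/* Segments in which this backend dirtied a buffer during the transaction. */
static char dirty_segs[MAX_DIRTY_SEGS][SEG_PATH_LEN];
static int  n_dirty_segs = 0;

/*
 * Remember a segment when a buffer in it is dirtied (not when it is
 * written).  Duplicates are ignored.  Returns 0 on success, -1 if the
 * path is too long or the list is full.
 */
int remember_dirty_segment(const char *path)
{
    if (strlen(path) >= SEG_PATH_LEN)
        return -1;
    for (int i = 0; i < n_dirty_segs; i++)
        if (strcmp(dirty_segs[i], path) == 0)
            return 0;
    if (n_dirty_segs == MAX_DIRTY_SEGS)
        return -1;
    strcpy(dirty_segs[n_dirty_segs++], path);
    return 0;
}

/*
 * At commit: re-open and fsync every remembered segment, then clear the
 * list.  Returns the number of segments synced, or -1 on the first
 * failure (the list is kept so the caller can retry or abort).
 */
int fsync_dirty_segments(void)
{
    int synced = 0;
    for (int i = 0; i < n_dirty_segs; i++) {
        int fd = open(dirty_segs[i], O_RDWR);
        if (fd < 0)
            return -1;
        int rc = fsync(fd);
        close(fd);
        if (rc != 0)
            return -1;
        synced++;
    }
    n_dirty_segs = 0;
    return synced;
}
```

Because the list is keyed on who dirtied the buffer and not on who wrote it, the committing backend syncs the segment even when another backend issued the write. The sketch does nothing to guarantee that the write has happened before commit, which is where the bufmgr changes would come in.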

The problem is that this feature is very difficult to verify.
In addition, WAL would solve this item naturally.

Is it still worthwhile to solve this item in the current spec?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


