Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I am suggesting opening and marking a file descriptor as needing fsync
> even if I only dirty the buffer and not write it. I understand another
> backend may write my buffer and remove it before I commit my
> transaction. However, I will be the one to fsync it. I am also
> suggesting that such file descriptors never get recycled until
> transaction commit.
> Is that wrong?
I see where you're going, and you could possibly make it work, but
there are a bunch of problems. One objection is that kernel FDs
are a scarce resource on a lot of platforms --- you don't really
want to tie up one FD for every dirty buffer, and you *certainly*
don't want to get into a situation where you can't release kernel
FDs until end of xact. You might be able to get around that by
associating the fsync-needed bit with VFDs instead of FDs.
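
To illustrate the VFD idea, here is a minimal sketch, not fd.c's actual
data structures. The Vfd struct and the vfd_* function names are
hypothetical. It also assumes that fsync() on a freshly reopened
descriptor flushes writes that went through other descriptors for the
same file, which is platform behavior and not something the sketch
can guarantee.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical virtual FD: it outlives the kernel FD it wraps. */
typedef struct Vfd
{
    const char *path;        /* kept so the file can be reopened later */
    int         kernel_fd;   /* real descriptor, or -1 once recycled */
    bool        needs_fsync; /* set when this backend dirties a buffer */
} Vfd;

/* Record that the current transaction dirtied a buffer of this file. */
void
vfd_mark_dirty(Vfd *v)
{
    v->needs_fsync = true;
}

/* Give the kernel FD back under FD pressure; the flag survives. */
void
vfd_release_kernel_fd(Vfd *v)
{
    if (v->kernel_fd >= 0)
    {
        close(v->kernel_fd);
        v->kernel_fd = -1;
    }
}

/*
 * At commit: fsync every flagged file, reopening it first if its
 * kernel FD was recycled. Returns 0 on success, -1 on first failure.
 */
int
vfd_sync_at_commit(Vfd *vfds, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        Vfd *v = &vfds[i];

        if (!v->needs_fsync)
            continue;
        if (v->kernel_fd < 0)
        {
            assert(v->path != NULL);
            v->kernel_fd = open(v->path, O_RDWR);
            if (v->kernel_fd < 0)
                return -1;
        }
        if (fsync(v->kernel_fd) != 0)
            return -1;
        v->needs_fsync = false;
    }
    return 0;
}
```

The point of the sketch is only that the flag lives in the VFD, so the
kernel FD can be closed before end of xact without losing track of the
pending fsync.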
What may turn out to be a nastier problem is the circular dependency
this creates between shared-buffer management and md.c/fd.c. Right now
(IIRC at 3am) md/fd are clearly at a lower level than bufmgr, but that
would stop being true if you make FDs be proxies for dirtied buffers.
Here is one off-the-top-of-the-head trouble scenario: bufmgr wants to
dump a buffer that was dirtied by another backend -> needs to open FD ->
fd.c has no free FDs, needs to close one -> needs to dump and fsync a
buffer so it can forget the FD -> bufmgr needs to get I/O lock on two
different buffers at once -> potential deadlock against another backend
doing the reverse. (Assuming you even get that far, and don't hang up
at the recursive entry to bufmgr trying to get a spinlock you already
hold...)
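
The scenario above can be modeled as a cycle in a wait-for graph. The
following is a toy, single-process model and not bufmgr code: LockState,
lock_state_init and would_deadlock are hypothetical names, and the fixed
array sizes are arbitrary. It only checks whether a backend that blocks
on a buffer's I/O lock would end up waiting, directly or indirectly, on
itself.

```c
#include <assert.h>
#include <stdbool.h>

#define NBUFFERS  4
#define NBACKENDS 4
#define NO_ONE    (-1)

typedef struct LockState
{
    /* backend holding the I/O lock on each buffer, or NO_ONE */
    int io_lock_owner[NBUFFERS];
    /* buffer each backend is currently blocked on, or NO_ONE */
    int waiting_for[NBACKENDS];
} LockState;

void
lock_state_init(LockState *s)
{
    for (int i = 0; i < NBUFFERS; i++)
        s->io_lock_owner[i] = NO_ONE;
    for (int i = 0; i < NBACKENDS; i++)
        s->waiting_for[i] = NO_ONE;
}

/*
 * Would backend `who` deadlock if it blocked on buffer `buf`?
 * Follows holder -> buffer it waits on -> that buffer's holder, and
 * reports true if the chain leads back to `who`. This also covers the
 * recursive case where `who` already holds the lock on `buf`.
 */
bool
would_deadlock(const LockState *s, int who, int buf)
{
    assert(buf >= 0 && buf < NBUFFERS);

    int holder = s->io_lock_owner[buf];
    int steps = 0;

    while (holder != NO_ONE && steps++ < NBACKENDS)
    {
        if (holder == who)
            return true;

        int next = s->waiting_for[holder];

        if (next == NO_ONE)
            return false;   /* holder is running and will release */
        holder = s->io_lock_owner[next];
    }
    return false;
}
```

With backend 0 holding buffer 1 and backend 1 holding buffer 2, backend 0
asking for buffer 2 is harmless until backend 1 starts waiting on
buffer 1; at that point the chain closes and would_deadlock() reports a
cycle.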
Possibly with close study you can prove that no such problem can happen.
My point is just that this isn't a trivial change. Is it worth
investing substantial effort on what will ultimately be a dead end?
regards, tom lane