I've been looking into Mergl's "update" performance problem. With
current sources, on a sequential-scan update of about 10,000 out of
1,000,000 records, I observe 33712 read() calls and 34107 write() calls.
The table occupies 33334 disk blocks, so the number of reads looks about
right -- but the number of writes is at least a factor of 3 higher than
it should be!
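
The message does not say how the expected write count was estimated. One pessimistic back-of-the-envelope bound, under assumptions that are not in the original (every updated tuple sits in a different block, and the new tuple versions are packed densely into fresh blocks), can be sketched as follows. The function names are hypothetical and serve only to illustrate the arithmetic.

```c
#include <assert.h>

/* Tuples per block, rounded down, from the figures quoted in the message. */
long tuples_per_block(long tuples, long blocks)
{
    assert(blocks > 0);
    return tuples / blocks;
}

/*
 * Pessimistic write estimate (an assumption, not the author's stated method):
 *   - each updated tuple lives in its own block, so each old version
 *     dirties one block (capped at the table size), and
 *   - the new versions are packed densely into additional blocks.
 */
long worst_case_writes(long tuples, long blocks, long updated)
{
    long per_block = tuples_per_block(tuples, blocks);
    assert(per_block > 0);

    long old_blocks = updated < blocks ? updated : blocks;
    long new_blocks = (updated + per_block - 1) / per_block; /* ceiling division */

    return old_blocks + new_blocks;
}
```

With the numbers above, worst_case_writes(1000000, 33334, 10000) gives 10345 blocks, and the observed 34107 writes are a little more than three times that, which is consistent with the "factor of 3" remark even under this pessimistic assumption.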
It looks to me like something is broken such that bufmgr.c *always*
thinks that a buffer is dirty (and needs to be written out) when it is
released.
Poking around for the cause, I find that heapgettup() calls
SetBufferCommitInfoNeedsSave() for every single tuple read from the
table:

                7.14   42.15 1000055/1000055     heap_getnext [9]
[10]    18.8    7.14   42.15 1000055             heapgettup [10]
                1.53   30.10 1000020/1000020     HeapTupleSatisfiesSnapshot [11]
                1.68    3.27 1000055/1000055     RelationGetBufferWithBuffer [50]
                4.31    0.00 2066832/4129472     LockBuffer [45]
                0.25    0.56   33361/33698       ReleaseAndReadBuffer [76]
                0.44    0.00 1000000/1000000     SetBufferCommitInfoNeedsSave [92]
                0.01    0.00   33371/33371       nextpage [240]
                0.00    0.00      10/2033992     ReleaseBuffer [46]
                0.00    0.00      45/201         HeapTupleSatisfiesNow [647]
                0.00    0.00       5/9           nocachegetattr [730]

This could only be from the call to SetBufferCommitInfoNeedsSave in
the HeapTupleSatisfies macro. If I'm reading the code correctly,
that means that HeapTupleSatisfiesSnapshot() always changes the
t_infomask field of the tuple.
I don't understand this code well enough to fix it, but I assert that
it's broken. Most of these tuples are *not* being modified, and there
is no reason to have to rewrite the buffer.
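
To make the expected behavior concrete, here is a minimal sketch of the rule argued for above: flag the buffer as needing a save only when a tuple's status bits actually change. This is not the PostgreSQL code. The types ToyTuple and ToyBuffer, the flag TOY_HINT_XMIN_COMMITTED, and the function toy_set_hint are all hypothetical stand-ins for t_infomask, the buffer's dirty state, and whatever the real fix would touch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

#define TOY_HINT_XMIN_COMMITTED 0x0100  /* hypothetical flag value */

typedef struct { uint16_t infomask; } ToyTuple;   /* stand-in for t_infomask */
typedef struct { bool needs_save; } ToyBuffer;    /* stand-in for the dirty flag */

/*
 * OR the given hint bits into the tuple's mask. The buffer is flagged
 * only if the mask really changed; returns true in that case.
 */
bool toy_set_hint(ToyBuffer *buf, ToyTuple *tup, uint16_t hint)
{
    assert(buf != NULL && tup != NULL);

    uint16_t updated = (uint16_t)(tup->infomask | hint);
    if (updated == tup->infomask)
        return false;           /* nothing changed: leave the buffer clean */

    tup->infomask = updated;
    buf->needs_save = true;     /* a real change: the buffer must be written */
    return true;
}
```

Under this rule, a scan over tuples whose bits are already set leaves every buffer clean, so only the blocks that were genuinely modified would be written back.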
regards, tom lane