Here's a rebased version of the patch, including the above-mentioned fixes. Nothing else new.
I've applied this to 0892ecbc015930d, the last commit to which it applies cleanly.
My test repeatedly increments a counter in a randomly chosen row, then queries the whole table and compares the results to what my driver knows they should be. With this patch applied, I get discrepancies.
No crash/recovery needs to be done to get the behavior.
The number of rows is correct, so one version of every row is visible, but it is sometimes the wrong version.
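
For readers who want the shape of the test without downloading anything, here is a minimal sketch of the check described above. It is not the uploaded script. The function names, the `placeholder` argument and the `seed` argument are made up for illustration, and it assumes a Python DB-API connection and a table foo with columns "index" and count that already holds one row per index with count starting at 0.

```python
import random


def run_increments(conn, n_rows, n_updates, placeholder="%s", seed=None):
    """Increment randomly chosen rows; return the counts the driver expects.

    Assumes conn is a DB-API connection and that foo already contains
    rows with "index" 0..n_rows-1 and count = 0 (hypothetical setup).
    """
    rng = random.Random(seed)
    expected = {i: 0 for i in range(n_rows)}
    cur = conn.cursor()
    # "index" is quoted here only so the statement is portable across databases.
    sql = 'update foo set count = count + 1 where "index" = ' + placeholder
    for _ in range(n_updates):
        i = rng.randrange(n_rows)
        cur.execute(sql, (i,))
        conn.commit()
        # Record the increment only after the commit has succeeded.
        expected[i] += 1
    return expected


def find_discrepancies(conn, expected):
    """Return {index: (expected_count, seen_count)} for every mismatch.

    seen_count is None when the row is not visible at all.
    """
    cur = conn.cursor()
    cur.execute('select "index", count from foo')
    seen = dict(cur.fetchall())
    return {
        i: (want, seen.get(i))
        for i, want in expected.items()
        if seen.get(i) != want
    }
```

In the failure reported here, every index would still be present in the query result, but some entries would carry a count from an older row version, so they would show up in the returned dictionary with a seen count different from the expected one.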
The discrepancy arises shortly after the first time this type of message appears:
6930 UPDATE 2013-09-18 12:36:34.519 PDT:LOG: started new XID range, XIDs 1000033-, MultiXIDs 1-, tentative LSN 0/FA517F8
6930 UPDATE 2013-09-18 12:36:34.519 PDT:STATEMENT: update foo set count=count+1 where index=$1
6928 UPDATE 2013-09-18 12:36:34.521 PDT:LOG: closed old XID range at 1000193 (LSN 0/FA58A08)
6928 UPDATE 2013-09-18 12:36:34.521 PDT:STATEMENT: update foo set count=count+1 where index=$1
I'll work on getting the driver to shut down the database the first time it finds a problem, so that autovac doesn't destroy the evidence.
I have uploaded the script to reproduce the problem, along with a tarball of the data directory. (When started, it will go through recovery. The table "foo" is in the jjanes database, under the jjanes role.)