I wrote:
> I was able to partially reproduce whelk's failure here. I got a
> couple of cases of "cannot freeze committed xmax", which then leads
> to the second NOTICE diff; but I couldn't reproduce the first
> NOTICE diff. That was out of about a thousand tries :-( so it's not
> looking like a promising thing to reproduce without modifying the test.
... however, it's trivial to reproduce via manual interference,
using the same strategy discussed recently for another case:
add a pg_sleep at the start of the heap_surgery.sql script,
run "make installcheck", and while that's running start another
session in which you begin a serializable transaction, execute
any old SELECT, and wait. AFAICT this reproduces all of whelk's
symptoms with 100% reliability.
With a little more effort, this could be automated by putting
some long-running transaction (likely, it needn't be any more
complicated than "select pg_sleep(10)") in a second test script
launched in parallel with heap_surgery.sql.
So this confirms the suspicion that the cause of the buildfarm
failures is a concurrently-open transaction, presumably from
autovacuum. I don't have time to poke further right now.
regards, tom lane