I’ve found a bug in Postgres which causes old row versions to appear in a table. DEAD rows in a hot chain are getting frozen and becoming visible. I’ve repro’d this in both 9.6.1 and 11-devel.
The repro consists of two short psql scripts.
While the repro does an explicit VACUUM FREEZE, this bug also happens with autovacuum.
FILE: lock.sql
begin;
select id from t where id=3 for key share;
select pg_sleep(1);
update t set x=x+1 where id=3;
commit;
vacuum freeze t;
select ctid, xmin, xmax, id from t;
FILE: repro.sql
drop table t;
create table t (id int primary key, name char(3), x integer);
insert into t values (1, '111', 0);
insert into t values (3, '333', 0);
\! psql -p 5432 postgres -f lock.sql &
\! psql -p 5432 postgres -f lock.sql &
\! psql -p 5432 postgres -f lock.sql &
\! psql -p 5432 postgres -f lock.sql &
\! psql -p 5432 postgres -f lock.sql &
It’s about 50-50 whether any given run of repro.sql will produce output like:
…
ctid | xmin | xmax | id | x
-------+------+------+----+---
(0,1) | 984 | 0 | 1 | 0
(0,7) | 990 | 0 | 3 | 5
(2 rows)
ctid | xmin | xmax | id | x
-------+------+------+----+---
(0,1) | 984 | 0 | 1 | 0
(0,3) | 986 | 0 | 3 | 1 // This, and x = 2, 3 and 4 came back from the DEAD
(0,4) | 987 | 0 | 3 | 2
(0,5) | 988 | 0 | 3 | 3
(0,6) | 989 | 0 | 3 | 4
(0,7) | 990 | 0 | 3 | 5
(6 rows)
Root cause analysis: lazy_scan_heap() deletes DEAD tuples in heap_page_prune(). However, it is possible for concurrent commits/rollbacks to render a tuple DEAD by the time we reach the switch statement on HeapTupleSatisfiesVacuum(). If such a row IsHotUpdated or IsHeapOnly we can’t delete it below, and must allow a later prune to take care of it.
if (HeapTupleIsHotUpdated(&tuple) || HeapTupleIsHeapOnly(&tuple))
nkeep += 1; // Don't delete, allow later prune to delete it
else
tupgone = true; // We can delete it below
Because tupgone is false we freeze instead of deleting. Freezing a DEAD tuple makes it visible. Here is a comment in heap_prepare_freeze_tuple()
* It is assumed that the caller has checked the tuple with
* HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD
* (else we should be removing the tuple, not freezing it).
It is rare that we run into a DEAD tuple in this way during a freeze. More often RECENTLY_DEAD is returned. But we did see this with a more realistic long running test and I was able to create the simplified test case above. Skipping the Freeze on a DEAD tuple that IsHotUpdated or IsHeapOnly does fix the problem. I’ve attached a patch with this fix.