pg_check_frozen() reports corrupted VM freeze state.
Found with one of my stress tests. Simplified to the repro below.
The reason for the 33 rows (one per page) is that I wanted to test whether a second VACUUM FREEZE repaired the
situation. I was confounded until I discovered that SKIP_PAGES_THRESHOLD is 32.
My analysis is that heap_prepare_freeze_tuple() -> FreezeMultiXactId() returns FRM_NOOP if the transactions holding
the MultiXact row locks haven't committed. This results in changed = false and totally_frozen = true (as
initialized). When this returns to lazy_scan_heap(), no rows are added to the frozen[] array, yet
tuple_totally_frozen is true. This means the page is marked all-frozen in the VM even though the tuple's MultiXact
xmax was left untouched.
A fix to heap_prepare_freeze_tuple() that seems to do the trick is:
else
{
Assert(flags & FRM_NOOP);
+ totally_frozen = false;
}
BASH script repro below:
#!/bin/bash
p="psql -h 127.0.0.1 -p 5432 postgres"
echo "create extension if not exists pg_visibility;" | $p
$p <<XXX
drop table if exists t;
create table t (i int primary key, c char(7777));
alter table t alter column c set storage plain;
insert into t select generate_series(0, 32, 1), 'XXX';
XXX
# Start two share lockers in the background
$p <<XXX >/dev/null &
begin;
select i, length(c) from t for share;
select pg_sleep(2);
commit;
XXX
$p <<XXX >/dev/null &
begin;
select i, length(c) from t for share;
select pg_sleep(2);
commit;
XXX
# Freeze while multixact locks are held
echo "vacuum freeze t;" | $p
echo "select pg_check_frozen('t');" | $p
sleep 4; # Wait for share locks to be released
# See if another freeze corrects the problem
echo "vacuum freeze t;" | $p
echo "select pg_check_frozen('t');" | $p