On Thu, 28 Oct 2021 at 14:36, Peter Geoghegan <pg@bowt.ie> wrote:
> I'll take another guess: I wonder if commit 56788d21 ("Allocate
> consecutive blocks during parallel seqscans") is somehow causing
> parallel CREATE INDEX to produce wrong results. The author (David
> Rowley) is CC'd. Does a bug in that commit seem like it might explain
> this problem, David? Might parallel workers in a parallel index build
> somehow become confused about which worker is supposed to scan which
> heap block, leading to duplicate TIDs in the final index?
I stared at that code for a while and didn't really see how it could
be broken, unless the atomics implementation on that machine were
broken. Thomas and I had a look at that earlier and on his FreeBSD
machine, it uses the arch-x64.h implementation of
pg_atomic_fetch_add_u64_impl().
I also noted that pg_atomic_fetch_add_u64() is not really getting much
use: regress.c is the only other caller. However, I really doubt
that this is the issue here.
Just to see what it would look like if this was broken, I went and
mocked up such a bug by adding the following code just above "return
page;" at the end of table_block_parallelscan_nextpage():
    if (page == 1000)
        page = 999;
I then did:
create table b (b int not null, t text not null);
insert into b select x,x::text from generate_series(1,200000)x;
set max_parallel_workers_per_gather=0;
select sum(b), sum(length(t)) from b;
set max_parallel_workers_per_gather=2;
select sum(b), sum(length(t)) from b;
I noted that I get different results between the parallel and
non-parallel query due to page 999 being read twice. I then created
the following index:
set max_parallel_maintenance_workers = 2;
create index on b(t);
If I run a query to perform an index scan to find any value of "t"
that's on page 999, then I get:
postgres=# select ctid,* from b where t = '185050';
   ctid   |   b    |   t
----------+--------+--------
 (999,54) | 185050 | 185050
 (999,54) | 185050 | 185050
(2 rows)
We just get the same TID twice in the index. That's quite different
from a TID for some other value of "t" ending up in the list of TIDs
for '185050'.
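
For illustration only, the effect of the mocked-up bug on block
allocation can be sketched in standalone C. This is not the PostgreSQL
code: SketchScan, sketch_nextpage(), sketch_count_anomalies() and
NBLOCKS are hypothetical names, the real function's chunked allocation
and start-block handling are left out, and the loop is single-threaded.
It only assumes that each caller claims the next block number with an
atomic fetch-add.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define NBLOCKS 2000u             /* hypothetical relation size, in pages */
#define INVALID_BLOCK UINT32_MAX  /* stand-in for "no more blocks" */

/* Simplified shared scan state: only the allocation counter. */
typedef struct
{
    atomic_uint_fast64_t nallocated;
} SketchScan;

/*
 * Claim the next block with an atomic fetch-add. When inject_bug is
 * non-zero, apply the same mock-up as above: block 1000 is reported
 * as block 999.
 */
static uint32_t
sketch_nextpage(SketchScan *scan, int inject_bug)
{
    uint64_t    n = atomic_fetch_add(&scan->nallocated, 1);
    uint32_t    page;

    if (n >= NBLOCKS)
        return INVALID_BLOCK;

    page = (uint32_t) n;
    if (inject_bug && page == 1000)
        page = 999;
    return page;
}

/*
 * Drain the scan and record how often each block was handed out.
 * visits must have room for NBLOCKS entries. Returns the number of
 * blocks that were not handed out exactly once.
 */
static unsigned
sketch_count_anomalies(int inject_bug, unsigned *visits)
{
    SketchScan  scan;
    uint32_t    page;
    unsigned    anomalies = 0;

    atomic_init(&scan.nallocated, 0);
    memset(visits, 0, NBLOCKS * sizeof(unsigned));

    while ((page = sketch_nextpage(&scan, inject_bug)) != INVALID_BLOCK)
    {
        assert(page < NBLOCKS);
        visits[page]++;
    }

    for (uint32_t i = 0; i < NBLOCKS; i++)
    {
        if (visits[i] != 1)
            anomalies++;
    }
    return anomalies;
}
```

Calling sketch_count_anomalies(1, visits) with an array of NBLOCKS
counters leaves block 999 handed out twice and block 1000 never handed
out, while sketch_count_anomalies(0, visits) hands out every block
exactly once. In this sketch a misallocated block can only duplicate or
drop whole pages; it cannot attach a tuple to a different key.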
David