Re: BUG #17245: Index corruption involving deduplicated entries

From	David Rowley
Subject	Re: BUG #17245: Index corruption involving deduplicated entries
Date	28 October 2021, 09:49:57
Msg-id	CAApHDvoeu9rtEwdawqeiFtyMaDbaTgEHu1Ns-2YSo+A+_5v5GA@mail.gmail.com
In reply to	Re: BUG #17245: Index corruption involving deduplicated entries (Peter Geoghegan <pg@bowt.ie>)
Responses	Re: BUG #17245: Index corruption involving deduplicated entries
List	pgsql-bugs

On Thu, 28 Oct 2021 at 14:36, Peter Geoghegan <pg@bowt.ie> wrote:
> I'll take another guess: I wonder if commit 56788d21 ("Allocate
> consecutive blocks during parallel seqscans") is somehow causing
> parallel CREATE INDEX to produce wrong results. The author (David
> Rowley) is CC'd. Does a bug in that commit seem like it might explain
> this problem, David? Might parallel workers in a parallel index build
> somehow become confused about which worker is supposed to scan which
> heap block, leading to duplicate TIDs in the final index?

I stared at that code for a while and didn't really see how it could
be broken, unless the atomics implementation on that machine were
broken.  Thomas and I had a look at that earlier and on his FreeBSD
machine, it uses the arch-x86.h implementation of
pg_atomic_fetch_add_u64_impl().

I also noted that pg_atomic_fetch_add_u64() is not really getting much
use.  regress.c is the only other location.  However, I really doubt
that this is the issue here.

Just to see what it would look like if this was broken, I went and
mocked up such a bug by adding the following code just above "return
page;" at the end of table_block_parallelscan_nextpage:

if (page == 1000)
    page = 999;
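
To make the effect of that change concrete outside of PostgreSQL, here is a
minimal standalone sketch.  It is not the real
table_block_parallelscan_nextpage (which hands out chunks of blocks to
several workers); it assumes a single caller and one block per call, and
every demo_* name and DEMO_* constant is made up for illustration.  It only
shows how a fetch-add based allocator plus the injected condition ends up
visiting block 999 twice and block 1000 never:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_NBLOCKS 1100               /* hypothetical table size in blocks */
#define DEMO_INVALID_BLOCK UINT64_MAX   /* hypothetical "scan finished" marker */

/* Shared counter that hands out the next block number to scan. */
static _Atomic uint64_t demo_next_block;

/* Return the next block to scan, or DEMO_INVALID_BLOCK when done. */
static uint64_t
demo_nextpage(int inject_bug)
{
    uint64_t page = atomic_fetch_add(&demo_next_block, 1);

    if (page >= DEMO_NBLOCKS)
        return DEMO_INVALID_BLOCK;

    /* The mocked-up bug: block 1000 is misreported as block 999. */
    if (inject_bug && page == 1000)
        page = 999;

    return page;
}

/* Scan the whole "table" once and record how often each block is visited. */
static void
demo_count_visits(int inject_bug, int visits[DEMO_NBLOCKS])
{
    uint64_t page;

    atomic_store(&demo_next_block, 0);
    for (int i = 0; i < DEMO_NBLOCKS; i++)
        visits[i] = 0;

    while ((page = demo_nextpage(inject_bug)) != DEMO_INVALID_BLOCK)
    {
        assert(page < DEMO_NBLOCKS);
        visits[page]++;
    }
}
```

Calling demo_count_visits(0, visits) leaves every entry at 1.  With
inject_bug set to 1, visits[999] is 2 and visits[1000] is 0, which is the
"page 999 read twice" situation described here.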

I then did:

create table b (b int not null, t text not null);
insert into b select x,x::text from generate_series(1,200000)x;
set max_parallel_workers_per_gather=0;
select sum(b), sum(length(t)) from b;
set max_parallel_workers_per_gather=2;
select sum(b), sum(length(t)) from b;

I noted that I get different results between the parallel and
non-parallel query due to page 999 being read twice.  I then created
the following index:

set max_parallel_maintenance_workers = 2;
create index on b(t);

If I run a query to perform an index scan to find any value of "t"
that's on page 999, then I get:

postgres=# select ctid,* from b where t = '185050';
   ctid   |   b    |   t
----------+--------+--------
 (999,54) | 185050 | 185050
 (999,54) | 185050 | 185050
(2 rows)

We just get the same tid twice in the index. That's quite different
from another value of "t" getting into the list of tids for '185050'.
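
The signature of this mocked-up failure is an exact repeat of one tid
among the tids stored for a key.  As a rough sketch of how one might
check for that pattern, assuming the tids for a key are available as an
array sorted by (block, offset), something like the following would do.
DemoTid and both functions are hypothetical names for this example and
are not PostgreSQL's ItemPointerData or any nbtree API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a heap tid: (block number, line pointer offset). */
typedef struct
{
    uint32_t block;
    uint16_t offset;
} DemoTid;

/* Two tids are equal only if both block and offset match. */
static int
demo_tid_equal(DemoTid a, DemoTid b)
{
    return a.block == b.block && a.offset == b.offset;
}

/*
 * Count entries that repeat their predecessor in an array of tids that is
 * already sorted by (block, offset).  A nonzero result means the same heap
 * tuple is referenced more than once.
 */
static size_t
demo_count_repeated_tids(const DemoTid *tids, size_t ntids)
{
    size_t repeats = 0;

    assert(tids != NULL || ntids == 0);

    for (size_t i = 1; i < ntids; i++)
    {
        if (demo_tid_equal(tids[i - 1], tids[i]))
            repeats++;
    }

    return repeats;
}
```

For the two-row result above, the tids are (999,54) and (999,54), so the
count is 1.  A check like this would not flag the other kind of
corruption, where a distinct tid belonging to a different value of "t"
shows up in the list.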

David
