Re: BUG #17245: Index corruption involving deduplicated entries

Поиск
Список
Период
Сортировка
От Alexander Kukushkin
Тема Re: BUG #17245: Index corruption involving deduplicated entries
Дата
Msg-id CAFh8B=k8WtQwRGVADGCXjU_ftwbM8reTfEAdsu_1BHRH_a1tpg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #17245: Index corruption involving deduplicated entries  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-bugs
Hi Peter,

On Fri, 12 Nov 2021 at 03:15, Peter Geoghegan <pg@bowt.ie> wrote:

You have said elsewhere that you're sure that this isn't the parallel
VACUUM bug, since you know that you didn't run a manual VACUUM, even
once. So I wonder what the issue might be. Since you deleted duplicate
rows from a unique index, there probably weren't very many affected
rows in total.

It is a sharded setup, 16 shards in total. The table structure is similar between shards.
Also, more or less the same tables were affected across all shards.
Depending on the table the number of affected rows was from one to a few hundreds.
Picture looked similar across all shards, although, the set of tables was a bit varying from shard to shard.
The first duplicates I resolved manually, but after rules became clear I simply wrote queries that remove either everything but the first or the last version of the row based on the timestamp.
 
It sounds like a pretty subtle issue to me
(particularly compared to the parallel VACUUM bug, which wasn't all
that subtle when it hit at all).

If I had to guess, I'd guess that it has something to do with the
snapshot scalability work. Specifically, a recently reported issue
involving confusion about the structure of HOT chains during pruning:

https://www.postgresql.org/message-id/flat/20211110192010.ckvfzz352hsba5xf%40alap3.anarazel.de#4c3d9c9988164f5ea3c15999bcf50ce7

Please join in on the other thread if you have anything more to add.

Unfortunately, not really. I only ran bt_index_check(index => r.oid, heapallindexed => true), because bt_index_parent_check takes SharedLock on the index and blocks DML, which we have a lot.
All errors looked similarly:
ERROR: heap tuple (314222,250) from table ""${table_name}"" lacks matching index tuple within index ""${table_name}_pkey""
CONTEXT:
DETAIL: SQL statement ""SELECT bt_index_check(index => r.oid, heapallindexed => true)""
HINT: Retrying verification using the function bt_index_parent_check() might provide a more specific error.
 

I could easily be wrong about that, though. You upgraded using
pg_upgrade, right? That's certainly a confounding factor here.

Correct, pg_upgrade --link, from 9.6 to 14. Any other upgrade method is resource and time consuming.
 
--
Alexander Kukushkin

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: BUG #17280: global-buffer-overflow on select from pg_stat_slru
Следующее
От: Sandeep Thakkar
Дата:
Сообщение: Re: Tenable Report Issue even after upgrading to correct Postgres version