Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY
Дата
Msg-id 20220524181542.cmec7rcv3vvvtze4@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY  (Andres Freund <andres@anarazel.de>)
Ответы Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY  (Andres Freund <andres@anarazel.de>)
Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY  (Andrey Borodin <x4mmm@yandex-team.ru>)
Список pgsql-bugs
Hi,

On 2022-05-24 09:37:02 -0700, Andres Freund wrote:
> On 2022-05-24 11:02:12 +0200, Alvaro Herrera wrote:
> > I approached it yesterday by running the test case with each
> > set_indexsafe_procflags() callsite commented out, see which one breaks
> > things.  Didn't reach any conclusion because I ran into thermal problems
> > with my laptop, which got me angry and couldn't make any further
> > progress.
>
> Do we have any idea what really causes the corruption?

Reproed the problem with the pgbench script, against an existing cluster. One
note: With fsync=on, it's much harder to reproduce.

One instance:

pgbench: error: client 0 script 2 aborted in command 3 query 0: ERROR:  heap tuple (37,56) from table "tbl" lacks
matchingindex tuple within index "idx"
 
HINT:  Retrying verification using the function bt_index_parent_check() might provide a more specific error.

the last record for that offset is (with waldump enhanced to print offsets):

rmgr: Heap2       len (rec/tot):     60/    60, tx:          0, lsn: 0/8E56D218, prev 0/8E56D1F0, desc: PRUNE
latestRemovedXid1878329 nredirected 1 ndead 0 nunused 1, unused: [178], redirected: [56->53], blkref #0: rel
1663/5/27905blk 37
 

the tuple originally was inserted:
rmgr: Heap        len (rec/tot):     95/    95, tx:    1802927, lsn: 0/89776380, prev 0/89776358, desc: INSERT off 56
flags0x04, blkref #0: rel 1663/5/27905 blk 37
 
rmgr: Heap        len (rec/tot):     48/    48, tx:    1802927, lsn: 0/89776460, prev 0/89776420, desc: HEAP_CONFIRM
off56, blkref #0: rel 1663/5/27905 blk 37
 

and updated a "while" back:
rmgr: Heap        len (rec/tot):     54/    54, tx:    1804510, lsn: 0/898B5E68, prev 0/898B5E40, desc: LOCK off 56:
xid1804510: flags 0x00 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/5/27905 blk 37
 
rmgr: Heap        len (rec/tot):     73/    73, tx:    1804510, lsn: 0/898B5EA0, prev 0/898B5E68, desc: HOT_UPDATE off
56xmax 1804510 flags 0x60 ; new off 80 xmax 1804510, blkref #0: rel 1663/5/27905 blk 37
 

since which there have been numerous redirections to other tuples:
rmgr: Heap2       len (rec/tot):    212/   212, tx:          0, lsn: 0/89AC6B40, prev 0/89AC6B00, desc: PRUNE
latestRemovedXid1806705 nredirected 16 ndead 0 nunused 47, unused: [47, 61, 12, 14, 38, 52, 53, 60, 75, 16, 17, 31, 50,
33,59, 76, 49, 51, 54, 63, 89, 95, 18, 19, 29, 32, 48, 62, 81, 85, 88, 106, 28, 39, 40, 44, 26, 27, 45, 58, 73, 91, 34,
41,80, 86, 96], redirected: [1->74, 3->93, 4->37, 7->94, 8->72, 9->87, 10->105, 15->108, 21->71, 23->98, 30->79,
35->102,55->92, 56->107, 67->103, 84->104], blkref #0: rel 1663/5/27905 blk 37
 
rmgr: Heap2       len (rec/tot):    220/   220, tx:          0, lsn: 0/89DC6240, prev 0/89DC6218, desc: PRUNE
latestRemovedXid1810120 nredirected 27 ndead 0 nunused 29, unused: [74, 88, 93, 19, 94, 53, 59, 105, 16, 29, 118, 108,
71,26, 38, 98, 79, 52, 102, 31, 54, 92, 107, 50, 113, 103, 104, 60, 85], redirected: [1->122, 3->62, 5->49, 7->91,
10->121,15->28, 21->51, 23->119, 24->58, 30->73, 33->81, 34->86, 35->89, 36->14, 42->76, 45->112, 48->75, 55->80,
56->116,67->17, 68->115, 70->61, 78->117, 83->27, 84->18, 97->120, 99->114], blkref #0: rel 1663/5/27905 blk 37
 
....
rmgr: Heap2       len (rec/tot):     60/    60, tx:          0, lsn: 0/8E56D218, prev 0/8E56D1F0, desc: PRUNE
latestRemovedXid1878329 nredirected 1 ndead 0 nunused 1, unused: [178], redirected: [56->53], blkref #0: rel
1663/5/27905blk 37
 


the target of the redirection that's pruned away in 0/8E56D218 is more recent:
rmgr: Heap        len (rec/tot):     54/    54, tx:    1878329, lsn: 0/8E556070, prev 0/8E556020, desc: LOCK off 178:
xid1878329: flags 0x00 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/5/27905 blk 37
 
rmgr: Heap        len (rec/tot):     74/    74, tx:    1878329, lsn: 0/8E5560A8, prev 0/8E556070, desc: HOT_UPDATE off
178xmax 1878329 flags 0x60 ; new off 53 xmax 1878329, blkref #0: rel 1663/5/27905 blk 37
 


I suspect the problem might be related to pruning done during the validation
scan. Once PROC_IN_SAFE_IC is set, the backend itself will not preserve tids
its own snapshot might need. Which will wreak havoc during the validation
scan.

Greetings,

Andres Freund



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY
Следующее
От: Andres Freund
Дата:
Сообщение: Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY