Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY
От | Andres Freund |
---|---|
Тема | Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY |
Дата | |
Msg-id | 20220524181542.cmec7rcv3vvvtze4@alap3.anarazel.de обсуждение исходный текст |
Ответ на | Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY (Andres Freund <andres@anarazel.de>) |
Ответы |
Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY
(Andres Freund <andres@anarazel.de>)
Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY (Andrey Borodin <x4mmm@yandex-team.ru>) |
Список | pgsql-bugs |
Hi, On 2022-05-24 09:37:02 -0700, Andres Freund wrote: > On 2022-05-24 11:02:12 +0200, Alvaro Herrera wrote: > > I approached it yesterday by running the test case with each > > set_indexsafe_procflags() callsite commented out, see which one breaks > > things. Didn't reach any conclusion because I ran into thermal problems > > with my laptop, which got me angry and couldn't make any further > > progress. > > Do we have any idea what really causes the corruption? Reproed the problem with the pgbench script, against an existing cluster. One note: With fsync=on, it's much harder to reproduce. One instance: pgbench: error: client 0 script 2 aborted in command 3 query 0: ERROR: heap tuple (37,56) from table "tbl" lacks matchingindex tuple within index "idx" HINT: Retrying verification using the function bt_index_parent_check() might provide a more specific error. the last record for that offset is (with waldump enhanced to print offsets): rmgr: Heap2 len (rec/tot): 60/ 60, tx: 0, lsn: 0/8E56D218, prev 0/8E56D1F0, desc: PRUNE latestRemovedXid1878329 nredirected 1 ndead 0 nunused 1, unused: [178], redirected: [56->53], blkref #0: rel 1663/5/27905blk 37 the tuple originally was inserted: rmgr: Heap len (rec/tot): 95/ 95, tx: 1802927, lsn: 0/89776380, prev 0/89776358, desc: INSERT off 56 flags0x04, blkref #0: rel 1663/5/27905 blk 37 rmgr: Heap len (rec/tot): 48/ 48, tx: 1802927, lsn: 0/89776460, prev 0/89776420, desc: HEAP_CONFIRM off56, blkref #0: rel 1663/5/27905 blk 37 and updated a "while" back: rmgr: Heap len (rec/tot): 54/ 54, tx: 1804510, lsn: 0/898B5E68, prev 0/898B5E40, desc: LOCK off 56: xid1804510: flags 0x00 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/5/27905 blk 37 rmgr: Heap len (rec/tot): 73/ 73, tx: 1804510, lsn: 0/898B5EA0, prev 0/898B5E68, desc: HOT_UPDATE off 56xmax 1804510 flags 0x60 ; new off 80 xmax 1804510, blkref #0: rel 1663/5/27905 blk 37 since which there have been numerous redirections to other tuples: rmgr: Heap2 len (rec/tot): 212/ 212, tx: 0, lsn: 0/89AC6B40, prev 0/89AC6B00, desc: PRUNE latestRemovedXid1806705 nredirected 16 ndead 0 nunused 47, unused: [47, 61, 12, 14, 38, 52, 53, 60, 75, 16, 17, 31, 50, 33,59, 76, 49, 51, 54, 63, 89, 95, 18, 19, 29, 32, 48, 62, 81, 85, 88, 106, 28, 39, 40, 44, 26, 27, 45, 58, 73, 91, 34, 41,80, 86, 96], redirected: [1->74, 3->93, 4->37, 7->94, 8->72, 9->87, 10->105, 15->108, 21->71, 23->98, 30->79, 35->102,55->92, 56->107, 67->103, 84->104], blkref #0: rel 1663/5/27905 blk 37 rmgr: Heap2 len (rec/tot): 220/ 220, tx: 0, lsn: 0/89DC6240, prev 0/89DC6218, desc: PRUNE latestRemovedXid1810120 nredirected 27 ndead 0 nunused 29, unused: [74, 88, 93, 19, 94, 53, 59, 105, 16, 29, 118, 108, 71,26, 38, 98, 79, 52, 102, 31, 54, 92, 107, 50, 113, 103, 104, 60, 85], redirected: [1->122, 3->62, 5->49, 7->91, 10->121,15->28, 21->51, 23->119, 24->58, 30->73, 33->81, 34->86, 35->89, 36->14, 42->76, 45->112, 48->75, 55->80, 56->116,67->17, 68->115, 70->61, 78->117, 83->27, 84->18, 97->120, 99->114], blkref #0: rel 1663/5/27905 blk 37 .... rmgr: Heap2 len (rec/tot): 60/ 60, tx: 0, lsn: 0/8E56D218, prev 0/8E56D1F0, desc: PRUNE latestRemovedXid1878329 nredirected 1 ndead 0 nunused 1, unused: [178], redirected: [56->53], blkref #0: rel 1663/5/27905blk 37 the target of the redirection that's pruned away in 0/8E56D218 is more recent: rmgr: Heap len (rec/tot): 54/ 54, tx: 1878329, lsn: 0/8E556070, prev 0/8E556020, desc: LOCK off 178: xid1878329: flags 0x00 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/5/27905 blk 37 rmgr: Heap len (rec/tot): 74/ 74, tx: 1878329, lsn: 0/8E5560A8, prev 0/8E556070, desc: HOT_UPDATE off 178xmax 1878329 flags 0x60 ; new off 53 xmax 1878329, blkref #0: rel 1663/5/27905 blk 37 I suspect the problem might be related to pruning done during the validation scan. Once PROC_IN_SAFE_IC is set, the backend itself will not preserve tids its own snapshot might need. Which will wreak havoc during the validation scan. Greetings, Andres Freund
В списке pgsql-bugs по дате отправления:
Предыдущее
От: Peter GeogheganДата:
Сообщение: Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY
Следующее
От: Andres FreundДата:
Сообщение: Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY