Re: BUG #17245: Index corruption involving deduplicated entries
From | Peter Geoghegan |
---|---|
Subject | Re: BUG #17245: Index corruption involving deduplicated entries |
Date | |
Msg-id | CAH2-Wz=WOp0mtu6so+4yjMaCUu+2hAmY8g-7AuGmkyX51iSCDA@mail.gmail.com |
In reply to | Re: BUG #17245: Index corruption involving deduplicated entries (Peter Geoghegan <pg@bowt.ie>) |
Responses | Re: BUG #17245: Index corruption involving deduplicated entries |
List | pgsql-bugs |
On Thu, Oct 28, 2021 at 10:16 AM Peter Geoghegan <pg@bowt.ie> wrote:
> Now this is looking like a problem in VACUUM (pruning?), not a CREATE
> INDEX thing. It looks like somehow an item that should be LP_DEAD ends
> up being LP_UNUSED during pruning. I have CC'd Andres, to get his
> thoughts on this.

Further analysis based on lots more heap pages provided by Kamigishi Rei (thanks a lot!) strongly supports this theory. The problem looks like an interaction between the snapshot scalability work, VACUUM, and shared row locks. I cannot tell for sure if the table in question has actually allocated MultiXacts in its lifetime, but it has certainly had quite a few shared locks. Note that the other people that had similar complaints about Postgres 14 all used foreign keys on affected tables.

I'm attaching my personal notes on this. They have a little commentary, but are mostly useful because they outline the exact ways in which the data is corrupt, which is pretty tedious to put together manually.

There are some very clear patterns here:

* Most of the heap pages I've looked at have rows that were never updated or locked. There are usually 2 or 3 such tuples on each heap page, at least among those known to be corrupt -- 1 or 2 of them usually tie back to corruption in the index.

* Most individual duplicated-in-index heap TIDs point to heap tuples that are HEAP_XMAX_KEYSHR_LOCK|HEAP_XMAX_LOCK_ONLY. These heap tuples have the same xmin and xmax.

* The transaction ID 365637 is very over-represented, appearing in several corrupt heap tuple headers, located across several heap pages.

* Its "neighbor" transaction ID is 365638, which appears once more. To me this suggests some kind of confusion with an OldestXmin style cutoff during VACUUM.

* As suspected, there are a smaller number of TIDs in the index that point to LP_UNUSED items in the heap -- a distinct form of corruption to the more noticeable duplicate TIDs (superficially distinct, at least). These aren't usually duplicated in the index, though they can be.

This all but confirms that the original complaint was in fact just a result of a TID/item pointer being recycled in the heap "too early". It also explains why amcheck's heapallindexed option didn't ever complain about any index in the whole database (only the index structure itself looked corrupt).

--
Peter Geoghegan
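
For readers who want to check tuples for the HEAP_XMAX_KEYSHR_LOCK|HEAP_XMAX_LOCK_ONLY pattern mentioned above, the following is a minimal sketch of decoding the xmax-related bits of a raw `t_infomask` value (for example, one read from a page dump). It is not part of the original message. The bit values are assumed to match `src/include/access/htup_details.h` in PostgreSQL 14 and should be verified against your own source tree; the function names `decode_xmax_flags` and `is_keyshare_lock_only` are hypothetical helpers introduced only for illustration.

```python
# Assumed bit values, copied from PostgreSQL 14's htup_details.h.
# Verify against your source tree before relying on them.
HEAP_XMAX_KEYSHR_LOCK = 0x0010  # xmax is a key-shared locker
HEAP_XMAX_EXCL_LOCK = 0x0040    # xmax is an exclusive locker
HEAP_XMAX_LOCK_ONLY = 0x0080    # xmax, if valid, is only a locker
HEAP_XMAX_INVALID = 0x0800      # xmax is invalid or aborted
HEAP_XMAX_IS_MULTI = 0x1000     # xmax is a MultiXactId

XMAX_FLAGS = {
    "HEAP_XMAX_KEYSHR_LOCK": HEAP_XMAX_KEYSHR_LOCK,
    "HEAP_XMAX_EXCL_LOCK": HEAP_XMAX_EXCL_LOCK,
    "HEAP_XMAX_LOCK_ONLY": HEAP_XMAX_LOCK_ONLY,
    "HEAP_XMAX_INVALID": HEAP_XMAX_INVALID,
    "HEAP_XMAX_IS_MULTI": HEAP_XMAX_IS_MULTI,
}


def decode_xmax_flags(t_infomask):
    """Return the names of the xmax-related bits set in t_infomask."""
    return [name for name, bit in XMAX_FLAGS.items() if t_infomask & bit]


def is_keyshare_lock_only(t_infomask):
    """True if both KEYSHR_LOCK and LOCK_ONLY are set (hypothetical helper)."""
    wanted = HEAP_XMAX_KEYSHR_LOCK | HEAP_XMAX_LOCK_ONLY
    return (t_infomask & wanted) == wanted


if __name__ == "__main__":
    sample = HEAP_XMAX_KEYSHR_LOCK | HEAP_XMAX_LOCK_ONLY  # 0x0090
    print(decode_xmax_flags(sample))
    print(is_keyshare_lock_only(sample))
```

A tuple whose `t_infomask` passes `is_keyshare_lock_only` was only locked (for instance by a foreign key check), not updated or deleted, which is the state the message associates with most of the duplicated-in-index TIDs.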