Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?
From | Peter Geoghegan |
---|---|
Subject | Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM? |
Date | |
Msg-id | CAH2-Wz=PqOziyRSrnN5jAtfXWXY7-BJcHz9S355LH8Dt=5qxWQ@mail.gmail.com |
List | pgsql-hackers |
The code in gistvacuum.c is closely based on similar code in nbtree.c, except that it only acquires an exclusive lock -- not a super-exclusive lock. I suspect that that's because it seemed unnecessary; nbtree plain index scans have their own special reasons for this, that don't apply to GiST. Namely: nbtree plain index scans that don't use an MVCC snapshot clearly need some other interlock to protect against concurrent recycling of pointed-to-by-leaf-page TIDs. And so as a general rule nbtree VACUUM needs a full super-exclusive/cleanup lock, just in case there is a plain index scan that uses some other kind of snapshot (logical replication, say).

To say the same thing another way: nbtree follows "the third rule" described by "62.4. Index Locking Considerations" in the docs [1], but GiST does not. The idea that GiST's behavior is okay here does seem consistent with what the same docs go on to say about it: "When using an MVCC-compliant snapshot, there is no problem because the new occupant of the slot is certain to be too new to pass the snapshot test".

But what about index-only scans, which GiST also supports? I think that the rules are different there, even though index-only scans use an MVCC snapshot.

The (admittedly undocumented) reason why we can never drop the leaf page pin for an index-only scan in nbtree (but can do so for plain index scans) also relates to heap interlocking -- but with a twist. Here's the twist: the second heap pass by VACUUM can set visibility map bits independently of the first (once LP_DEAD items from the first pass over the heap are set to LP_UNUSED, which renders the page all-visible) -- this all happens at the end of lazy_vacuum_heap_page(). That's why index-only scans can't just assume that VACUUM won't have deleted the TID from the leaf page they're scanning immediately after they're done reading it.
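A toy model may make the interleaving easier to follow. The Python sketch below is not PostgreSQL code: `ToyState`, `try_vacuum`, and `replay` are hypothetical names, the model has a single heap page and a single leaf page, and it assumes the one LP_NORMAL item is visible to every snapshot. It replays one fixed ordering: the scan copies TIDs from the leaf page, VACUUM then gets a chance to run its index pass and second heap pass, and only afterwards does the scan consult the visibility map. Two switches control whether the scan keeps its leaf page pin and whether VACUUM insists on a cleanup lock.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ItemState(Enum):
    LP_NORMAL = auto()
    LP_DEAD = auto()
    LP_UNUSED = auto()


@dataclass
class ToyState:
    """One heap page plus one index leaf page (hypothetical toy model)."""
    heap_items: dict           # heap offset -> ItemState
    leaf_tids: list            # heap offsets referenced from the leaf page
    all_visible: bool = False  # visibility map bit for the heap page
    leaf_pins: int = 0         # pins that scans hold on the leaf page


def try_vacuum(state, needs_cleanup_lock):
    """Run the index pass and the second heap pass, unless blocked by a pin."""
    if needs_cleanup_lock and state.leaf_pins > 0:
        return False  # a cleanup lock must wait for other pins to go away
    dead = {off for off, st in state.heap_items.items()
            if st is ItemState.LP_DEAD}
    # Index pass: remove leaf entries that point at LP_DEAD items.
    state.leaf_tids = [tid for tid in state.leaf_tids if tid not in dead]
    # Second heap pass: LP_DEAD -> LP_UNUSED, then set the visibility map bit.
    for off in dead:
        state.heap_items[off] = ItemState.LP_UNUSED
    state.all_visible = True
    return True


def replay(scan_keeps_pin, vacuum_needs_cleanup_lock):
    """Return the heap offsets that the toy index-only scan reports."""
    state = ToyState(
        heap_items={1: ItemState.LP_NORMAL, 2: ItemState.LP_DEAD},
        leaf_tids=[1, 2],
    )
    # Scan: pin the leaf page and copy its TIDs into local memory.
    state.leaf_pins += 1
    local_tids = list(state.leaf_tids)
    if not scan_keeps_pin:
        state.leaf_pins -= 1
    # VACUUM gets its chance before the scan looks at the visibility map.
    try_vacuum(state, vacuum_needs_cleanup_lock)
    # Scan: trust the visibility map bit, otherwise check the heap item.
    returned = []
    for tid in local_tids:
        if state.all_visible:
            returned.append(tid)  # answered from the index alone
        elif state.heap_items[tid] is ItemState.LP_NORMAL:
            returned.append(tid)  # heap fetch confirms the item is live
    if scan_keeps_pin:
        state.leaf_pins -= 1
    return returned
```

In this model, `replay(True, True)` is the only combination that leaves out heap offset 2. The other three combinations let VACUUM set the visibility map bit first, so the scan reports the dead item without ever visiting the heap. The blocked VACUUM simply running later, after the scan releases its pin, is not modeled.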
VACUUM could even manage to set the visibility map bit for a relevant heap page inside lazy_vacuum_heap_page(), before the index-only scan can read the visibility map. If that is allowed to happen, the index-only scan would give wrong answers if one of the TID references held in local memory by the index-only scan happens to be marked LP_UNUSED inside lazy_vacuum_heap_page(). IOW, it looks like we run the risk of a concurrently recycled dead-to-everybody TID becoming visible during GiST index-only scans, just because we have no interlock.

In summary: IIUC this is only safe for nbtree because 1.) It acquires a super-exclusive lock when vacuuming leaf pages, and 2.) Index-only scans never drop their pin on the leaf page when accessing the visibility map "in-sync" with the scan (of course we hope not to access the heap proper at all for index-only scans). These precautions are both necessary to make the race condition I describe impossible, because they ensure that VACUUM cannot reach lazy_vacuum_heap_page() until after our index-only scan reads the visibility map (and then has to read the heap for at least that one dead-to-all TID, discovering that the TID is dead to its snapshot).

Why wouldn't GiST need to take the same precautions, though?

[1] https://www.postgresql.org/docs/devel/index-locking.html

--
Peter Geoghegan