Обсуждение: HOT vs freezing issue causing "cannot freeze committed xmax"
Hi,

In a development branch of mine, Thomas / the CF bot found relatively rare
regression failures. Those turned out to be caused by an edge case in which
heap_page_prune() was a bit more pessimistic than lazy_scan_heap(). But I
wonder if this isn't an issue more broadly:

The issue I am concerned about is lazy_scan_heap()'s logic for DEAD
HOT-updated tuples:

    /*
     * Ordinarily, DEAD tuples would have been removed by
     * heap_page_prune(), but it's possible that the tuple
     * state changed since heap_page_prune() looked.  In
     * particular an INSERT_IN_PROGRESS tuple could have
     * changed to DEAD if the inserter aborted.  So this
     * cannot be considered an error condition.
     *
     * If the tuple is HOT-updated then it must only be
     * removed by a prune operation; so we keep it just as if
     * it were RECENTLY_DEAD.  Also, if it's a heap-only
     * tuple, we choose to keep it, because it'll be a lot
     * cheaper to get rid of it in the next pruning pass than
     * to treat it like an indexed tuple.  Finally, if index
     * cleanup is disabled, the second heap pass will not
     * execute, and the tuple will not get removed, so we must
     * treat it like any other dead tuple that we choose to
     * keep.
     *
     * If this were to happen for a tuple that actually needed
     * to be deleted, we'd be in trouble, because it'd
     * possibly leave a tuple below the relation's xmin
     * horizon alive.  heap_prepare_freeze_tuple() is prepared
     * to detect that case and abort the transaction,
     * preventing corruption.
     */
    if (HeapTupleIsHotUpdated(&tuple) ||
        HeapTupleIsHeapOnly(&tuple) ||
        params->index_cleanup == VACOPT_TERNARY_DISABLED)
        nkeep += 1;
    else
        tupgone = true; /* we can delete the tuple */
    all_visible = false;
    break;

In the case the HOT logic triggers, we'll call heap_prepare_freeze_tuple()
even when the tuple is dead. Which then can lead us to

    if (TransactionIdPrecedes(xid, cutoff_xid))
    {
        /*
         * If we freeze xmax, make absolutely sure that it's not an XID
         * that is important.  (Note, a lock-only xmax can be removed
         * independent of committedness, since a committed lock holder has
         * released the lock).
         */
        if (!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) &&
            TransactionIdDidCommit(xid))
            ereport(PANIC,
                    (errcode(ERRCODE_DATA_CORRUPTED),
                     errmsg_internal("cannot freeze committed xmax %u",
                                     xid)));
        freeze_xmax = true;

(before those errors we'd just have unset xmax)

Now obviously the question is whether it's possible that heap_page_prune()
left alive anything that could be seen as DEAD by the check in
lazy_scan_heap(), and that additionally is older than the cutoff passed to
heap_prepare_freeze_tuple(). I'm not sure - it seems like it could be possible
in some corner cases, when transactions abort after heap_page_prune() but
before the second HeapTupleSatisfiesVacuum().

But regardless of whether it's possible today, it seems extremely fragile.
ISTM we should at least have a bunch of additional error checks in the HOT
branch for HEAPTUPLE_DEAD.

Greetings,

Andres Freund
On Thu, Jul 23, 2020 at 2:10 PM Andres Freund <andres@anarazel.de> wrote:
> In the case the HOT logic triggers, we'll call
> heap_prepare_freeze_tuple() even when the tuple is dead.

I think this is very bad. I've always been confused about these comments, but
I couldn't quite put my finger on the problem. Now I think I can: the comments
here imagine that we have the option either to set tupgone, causing the line
pointer to be marked unused by an eventual call to lazy_vacuum_page(), or that
we can decline to set tupgone, which will leave the tuple around to be handled
by the next vacuum.

However, we don't really have a choice at all. A choice implies that either
option is correct, and therefore we can elect the one we prefer. But here,
it's not just that one option is incorrect, but that both options are
incorrect. Setting tupgone controls whether or not the tuple is considered for
freezing. If we DON'T consider freezing it, then we might manage to advance
relfrozenxid while an older XID still exists in the table. If we DO consider
freezing it, we will correctly conclude that it needs to be frozen, but then
the freezing code is in an impossible situation, because it has no provision
for getting rid of tuples, only for keeping them. Its logic assumes that
whenever we are freezing xmin or xmax we do that in a way that causes the
tuple to be visible to everyone, but this tuple should be visible to no one.
So with your changes it now errors out instead of corrupting data, but that's
just replacing one bad thing (data corruption) with another (VACUUM failures).

I think the actual correct behavior here is to mark the line pointer as dead.
The easiest way to accomplish that is probably to have lazy_scan_heap() just
emit an extra XLOG_HEAP2_CLEAN record beyond whatever HOT-pruning already did,
if it's necessary. A better solution would probably be to merge HOT-pruning
with setting things all-visible and have a single function that does both, but
that seems a lot more invasive, and definitely unsuitable for back-patching.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2020-07-24 11:06:58 -0400, Robert Haas wrote:
> On Thu, Jul 23, 2020 at 2:10 PM Andres Freund <andres@anarazel.de> wrote:
> > In the case the HOT logic triggers, we'll call
> > heap_prepare_freeze_tuple() even when the tuple is dead.
>
> I think this is very bad. I've always been confused about these
> comments, but I couldn't quite put my finger on the problem. Now I
> think I can: the comments here imagine that we have the option either
> to set tupgone, causing the line pointer to be marked unused by an
> eventual call to lazy_vacuum_page(), or that we can decline to set
> tupgone, which will leave the tuple around to be handled by the next
> vacuum.

Yea. I think the only saving grace is that it's not obvious when the situation
can arise without prior corruption. But even if that's actually impossible, it
seems extremely fragile. I stared at heap_prune_chain() for quite a while and
couldn't convince myself either way.

> However, we don't really have a choice at all. A choice implies that
> either option is correct, and therefore we can elect the one we
> prefer. But here, it's not just that one option is incorrect, but that
> both options are incorrect. Setting tupgone controls whether or not
> the tuple is considered for freezing. If we DON'T consider freezing
> it, then we might manage to advance relfrozenxid while an older XID
> still exists in the table. If we DO consider freezing it, we will
> correctly conclude that it needs to be frozen, but then the freezing
> code is in an impossible situation, because it has no provision for
> getting rid of tuples, only for keeping them. Its logic assumes that
> whenever we are freezing xmin or xmax we do that in a way that causes
> the tuple to be visible to everyone, but this tuple should be visible
> to no one. So with your changes it now errors out instead of
> corrupting data, but that's just replacing one bad thing (data
> corruption) with another (VACUUM failures).

I suspect that the legitimate cases hitting this branch won't error out,
because then xmin/xmax aren't old enough to need to be frozen.

> I think the actual correct behavior here is to mark the line pointer
> as dead.

That's not trivial, because just doing that naively will break HOT.

> The easiest way to accomplish that is probably to have
> lazy_scan_heap() just emit an extra XLOG_HEAP2_CLEAN record beyond
> whatever HOT-pruning already did, if it's necessary. A better solution
> would probably be to merge HOT-pruning with setting things all-visible
> and have a single function that does both, but that seems a lot more
> invasive, and definitely unsuitable for back-patching.

I suspect that merging pruning and this logic in lazy_scan_heap() really is
the only proper way to solve this kind of issue.

I wonder if, given that we don't know whether this issue can be hit in a real
database, and given that it already triggers an error, the right way to deal
with this in the back-branches is to emit a more precise error message. I.e.
if we hit this branch, and either xmin or xmax is older than the cutoff, then
we issue a more specific ERROR.

Greetings,

Andres Freund