Обсуждение: Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains
Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains
От
Alvaro Herrera
Дата:
Peter Geoghegan wrote: > On Fri, Oct 6, 2017 at 8:29 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > /* > > * When a tuple is frozen, the original Xmin is lost, but we know it's a > > * committed transaction. So unless the Xmax is InvalidXid, we don't know > > * for certain that there is a match, but there may be one; and we must > > * return true so that a HOT chain that is half-frozen can be walked > > * correctly. > > * > > * We no longer freeze tuples this way, but we must keep this in order to > > * interpret pre-pg_upgrade pages correctly. > > */ > > if (TransactionIdEquals(xmin, FrozenTransactionId) && > > TransactionIdIsValid(xmax)) > > return true; > > > > return false; > > } > > Wouldn't this last "if" test, to cover the pg_upgrade case, be better > targeted by comparing *raw* xmin to FrozenTransactionId? You're using > the potentially distinct xmin value returned by > HeapTupleHeaderGetXmin() for the test here. I think we should be > directly targeting tuples frozen on or before 9.4 (prior to > pg_upgrade) instead. Yes, agreed, I should change that. Thanks for continuing to think about this. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Oct 12, 2017 at 7:31 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> Wouldn't this last "if" test, to cover the pg_upgrade case, be better >> targeted by comparing *raw* xmin to FrozenTransactionId? You're using >> the potentially distinct xmin value returned by >> HeapTupleHeaderGetXmin() for the test here. I think we should be >> directly targeting tuples frozen on or before 9.4 (prior to >> pg_upgrade) instead. > > Yes, agreed, I should change that. Thanks for continuing to think about > this. I haven't really followed this thread in depth, but I'm very nervous about the idea that we should ever be examining the raw-xmin of a tuple that has been marked HEAP_XMIN_FROZEN for anything other than forensic purposes. The design principle I followed in writing the original patch was that marking a tuple HEAP_XMIN_FROZEN was identical to setting the xmin to 2 except that the original xmin was still available for manual inspection. If we're finding that we need the raw xmin after freezing, doesn't that mean that our freezing algorithm was flat busted prior to that commit? And maybe still busted when things wrap around multiple times and raw-xmins are reused? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains
От
Peter Geoghegan
Дата:
On Fri, Oct 13, 2017 at 4:54 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I haven't really followed this thread in depth, but I'm very nervous > about the idea that we should ever be examining the raw-xmin of a > tuple that has been marked HEAP_XMIN_FROZEN for anything other than > forensic purposes. I'm nervous about it as well. These problems are very difficult to reason about. > The design principle I followed in writing the > original patch was that marking a tuple HEAP_XMIN_FROZEN was identical > to setting the xmin to 2 except that the original xmin was still > available for manual inspection. I did point this out myself. > If we're finding that we need the > raw xmin after freezing, doesn't that mean that our freezing algorithm > was flat busted prior to that commit? I wouldn't put it that way at all. That commit provided us with the opportunity to put in a better fix for a problem with update chain traversal, a problem that was particularly critical when certain HOT pruning + freezing races occur (the "failed to find parent tuple" thing). It's clear that freezing was just as busted before and after that commit, though. > And maybe still busted when > things wrap around multiple times and raw-xmins are reused? I'm more concerned about the situation that will remain (or has now been created) on 9.3, where we don't have raw xmin, and so must use very forgiving FrozenTransactionId matching within HeapTupleUpdateXmaxMatchesXmin(). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas wrote: > I haven't really followed this thread in depth, but I'm very nervous > about the idea that we should ever be examining the raw-xmin of a > tuple that has been marked HEAP_XMIN_FROZEN for anything other than > forensic purposes. Yeah, me too. If you see another way to fix the problem, let's discuss it. I think a possible way is to avoid considering that the relfrozenxid value computed by the caller is final. Something like this: if page freezing sees that there is a HOT chain which would end up half-frozen because of that freeze age, decrease the freeze xid enough that the whole chain remains unfrozen; communicate the new value to caller so that it is used as the true new relfrozenxid going forwards. That preserves the HOT chain intact so that it can be pruned and frozen correctly in a later VACUUM call. Need to be careful about HOT chains containing members that are old enough to cause a long term problem (i.e. a table where relfrozenxid doesn't move forward enough). One thing I didn't quite investigate is why this bug only shows up with multixacts so far. Is it just because multixacts provide an easy way to reproduce it, and that there are others, more difficult ways to cause the same problem without involving multixacts? If so, then the problem is likely present in 9.2 as well. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains
От
Peter Geoghegan
Дата:
On Tue, Oct 17, 2017 at 3:02 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Yeah, me too. If you see another way to fix the problem, let's discuss > it. I doubt that there is a better way. > I think a possible way is to avoid considering that the relfrozenxid > value computed by the caller is final. While that alternative seems possible, it also seems riskier. > One thing I didn't quite investigate is why this bug only shows up with > multixacts so far. Is it just because multixacts provide an easy way to > reproduce it, and that there are others, more difficult ways to cause > the same problem without involving multixacts? If so, then the problem > is likely present in 9.2 as well. The obvious explanation (although not necessarily the correct one) is that freezing didn't have a MultiXactIdGetUpdateXid() call in 9.2. The way we pass down both cutoff_xid and cutoff_multi to FreezeMultiXactId() seems like it might be involved in the data corruption that we saw (the incorrect pruning/failed to find parent tuple thing). I might spend some time figuring this out later in the week. It's hard to pin down, and I've only really started to learn about MultiXacts in the past few months. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Oct 17, 2017 at 6:02 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Robert Haas wrote: >> I haven't really followed this thread in depth, but I'm very nervous >> about the idea that we should ever be examining the raw-xmin of a >> tuple that has been marked HEAP_XMIN_FROZEN for anything other than >> forensic purposes. > > Yeah, me too. If you see another way to fix the problem, let's discuss > it. Well, I guess what I don't understand is, suppose we have a HOT chain that looks like this, where [x,y] denotes a tuple with an xmin of x and an xmax of y. [a,b]->[b,c]->[c,d] If b is eligible to be frozen, then b is committed and all-visible, which means that the [a,b] tuple should be pruned altogether. IOW, I don't think that a non-root tuple should ever have a frozen xmin; if that happens, I think we've already screwed up. So I don't quite understand how this problem arises in the first place. I think that the way we are supposed to be guaranteeing this is to first prune the page and then freeze it. If we ever freeze without first pruning, I think that's a bug. Now it could be that the problem is that there's a race: after we prune the page, somehow the xmin horizon advances before we freeze. But that also seems like something that shouldn't happen - our notion of the freeze threshold should be frozen at the beginning of the scan and should not advance, and it should be prior to our freeze horizon, which could be allowed to advance but not retreat in the course of the scan. Apologies if this is stupid; please clue me on what I'm missing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains
От
Peter Geoghegan
Дата:
On Wed, Oct 18, 2017 at 1:54 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Well, I guess what I don't understand is, suppose we have a HOT chain > that looks like this, where [x,y] denotes a tuple with an xmin of x > and an xmax of y. > > [a,b]->[b,c]->[c,d] > > If b is eligible to be frozen, then b is committed and all-visible, > which means that the [a,b] tuple should be pruned altogether. IOW, I > don't think that a non-root tuple should ever have a frozen xmin; if > that happens, I think we've already screwed up. So I don't quite > understand how this problem arises in the first place. Technically, we don't freeze the heap-only tuples here, because we cannot. because freezing tuples renders them visible (we don't set XMIN_FROZEN). Instead, we set hint bits with WAL-logging, on the assumption that nobody will ever *need* to interrogate the clog -- see commit 20b65522 from several weeks back. To be clear, this *does* mean that hint bits stop being mere hints, which I find troubling. I described these heap-only tuples as "logically frozen" over on the "heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead bug" thread, which is where I brought up my general concern. > I think that the way we are supposed to be guaranteeing this is to > first prune the page and then freeze it. There is a race where we cannot prune the page, though. That's why we had to revisit what I suppose was a tacit assumption, and address its problems in the commit that started this thread (commit a5736bf7). > But that also seems like something > that shouldn't happen - our notion of the freeze threshold should be > frozen at the beginning of the scan and should not advance, and it > should be prior to our freeze horizon, which could be allowed to > advance but not retreat in the course of the scan. It is not allowed retreat -- kind of. (Alvaro mentioned something in passing about an *alternative* fix where it really was allowed to retreat in the simplest sense [1], but I don't think that that's going to happen.) > Apologies if this is stupid; please clue me on what I'm missing. I don't think that your questions are stupid at all. It intuitively seems bad that xmin freezing is only something we can do to HOT root and non-HOT tuples, while xmax freezing needs to happen to heap-only tuples, as well as HOT root tuples and non-HOT tuples. But, as I said, I'm still playing catch-up on MultiXacts, and I too feel like I might still be missing important details. [1] https://postgr.es/m/20171017100200.ruszi2c6qqwetce5@alvherre.pgsql -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Oct 18, 2017 at 5:52 PM, Peter Geoghegan <pg@bowt.ie> wrote: > There is a race where we cannot prune the page, though. That's why we > had to revisit what I suppose was a tacit assumption, and address its > problems in the commit that started this thread (commit a5736bf7). The commit message for a5736bf7 doesn't say anything about a race; it just claims that it is fixing traversal of half-frozen update chains, without making any reference to how such a thing as a half-frozen update chain came to exist in the first place. I think the commit that talks about that race condition is 20b65522. /me studies the problem for a while. What's bothering me about this is: how is cutoff_xid managing to be a new enough transaction ID for this to happen in the first place? The cutoff XID should certainly be older than anything any current snapshot can see, because it's computed using GetOldestXmin() -- which means that XIDs older than the cutoff can't still be running (except maybe if old_snapshot_threshold is set to a non-default value). And that means that the state of any tuple to which heap_prepare_freeze_tuple() does anything shouldn't be able to change between the time the HOT-prune is done and the time freezing is completed. To put that the other way around, an xmin or xmax that was still running when we did the HOT prune should also be too new to get frozen. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains
От
Peter Geoghegan
Дата:
On Thu, Oct 19, 2017 at 7:21 AM, Robert Haas <robertmhaas@gmail.com> wrote: > The commit message for a5736bf7 doesn't say anything about a race; it > just claims that it is fixing traversal of half-frozen update chains, > without making any reference to how such a thing as a half-frozen > update chain came to exist in the first place. I think the commit > that talks about that race condition is 20b65522. Well, the race was that the wrong tuple is pruned, because of naivety in the heap_prune_chain() update chain handling, which is what started work that became commit a5736bf7. That's by far the worst consequence of the traversal of half-frozen update chain business that a5736bf7's commit message describes, since we know it leads to wrong answers to index scans (and REINDEX fails with "failed to find parent tuple"). It's the only consequence that's been directly observed, but probably not the only one that's possible. Most other places that suffer the problem are in obscure-to-users paths like EvalPlanQual(). > /me studies the problem for a while. > > What's bothering me about this is: how is cutoff_xid managing to be a > new enough transaction ID for this to happen in the first place? The > cutoff XID should certainly be older than anything any current > snapshot can see, because it's computed using GetOldestXmin() -- which > means that XIDs older than the cutoff can't still be running (except > maybe if old_snapshot_threshold is set to a non-default value). The problem that we directly observed during pruning, which caused "failed to find parent tuple" even after the first freeze-the-dead commit (commit 20b65522), was that the mixture of frozen and unfrozen tuples in a hot chain caused heap_prune_chain() to make an incorrect conclusion about the continuity of a HOT chain. We'd prune the wrong tuples -- it failed to find the visible tuple at the end of the HOT chain, and respect that that meant it could not prune down to a stub ItemId (blow away what it *incorrectly* thought was an entire HOT chain). Now that the problem is fixed by a5736bf7, this test case will prune and get an LP_REDIRECT ItemId (not an LP_DEAD one), as we're no longer confused about the continuity of HOT chains within heap_prune_chain(). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains
От
Peter Geoghegan
Дата:
On Thu, Oct 19, 2017 at 9:03 AM, Peter Geoghegan <pg@bowt.ie> wrote: >> /me studies the problem for a while. >> >> What's bothering me about this is: how is cutoff_xid managing to be a >> new enough transaction ID for this to happen in the first place? The >> cutoff XID should certainly be older than anything any current >> snapshot can see, because it's computed using GetOldestXmin() -- which >> means that XIDs older than the cutoff can't still be running (except >> maybe if old_snapshot_threshold is set to a non-default value). > Now that the problem is fixed by a5736bf7, this test case will prune > and get an LP_REDIRECT ItemId (not an LP_DEAD one), as we're no longer > confused about the continuity of HOT chains within heap_prune_chain(). In case I was unclear: the key point here is that it wasn't wrong to prune the tuples because they might still be visible to somebody's snapshot. It was wrong to do so in a specific way that could still happen prior to a5736bf7, leaving only a stub LP_DEAD line pointer, because index scans needed to do HOT chain traversal in order to get to the visible tuple that wasn't pruned (the one that was still frozen). You might say that the HOT chain "appeared to split in two" from heap_prune_chain()'s perspective, a problem that has nothing to do with an XID cutoff. That's why after 20b65522 there was no problem with sequential scans. There were remaining problems with index scans, though, purely because of this HOT pruning business -- that was fixed in a5736bf7. (And, as I've said, problems with "traversal of half-frozen update chains" are presumed to have existed for routines like EvalPlanQual() -- even more subtle problems.) One thing I'm not clear on is if we could or should have limited this new HeapTupleUpdateXmaxMatchesXmin() stuff to HOT chains, and so not altered behavior for cross-page cases (for routines like EvalPlanQualFetch() -- not routines like heap_prune_chain(), that only ever deal with a single page at a time). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers