Обсуждение: Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains

Поиск
Список
Период
Сортировка

Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains

От
Alvaro Herrera
Дата:
Peter Geoghegan wrote:
> On Fri, Oct 6, 2017 at 8:29 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> >     /*
> >      * When a tuple is frozen, the original Xmin is lost, but we know it's a
> >      * committed transaction.  So unless the Xmax is InvalidXid, we don't know
> >      * for certain that there is a match, but there may be one; and we must
> >      * return true so that a HOT chain that is half-frozen can be walked
> >      * correctly.
> >      *
> >      * We no longer freeze tuples this way, but we must keep this in order to
> >      * interpret pre-pg_upgrade pages correctly.
> >      */
> >     if (TransactionIdEquals(xmin, FrozenTransactionId) &&
> >         TransactionIdIsValid(xmax))
> >         return true;
> >
> >     return false;
> > }
> 
> Wouldn't this last "if" test, to cover the pg_upgrade case, be better
> targeted by comparing *raw* xmin to FrozenTransactionId? You're using
> the potentially distinct xmin value returned by
> HeapTupleHeaderGetXmin() for the test here. I think we should be
> directly targeting tuples frozen on or before 9.4 (prior to
> pg_upgrade) instead.

Yes, agreed, I should change that.  Thanks for continuing to think about
this.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains

От
Robert Haas
Дата:
On Thu, Oct 12, 2017 at 7:31 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> Wouldn't this last "if" test, to cover the pg_upgrade case, be better
>> targeted by comparing *raw* xmin to FrozenTransactionId? You're using
>> the potentially distinct xmin value returned by
>> HeapTupleHeaderGetXmin() for the test here. I think we should be
>> directly targeting tuples frozen on or before 9.4 (prior to
>> pg_upgrade) instead.
>
> Yes, agreed, I should change that.  Thanks for continuing to think about
> this.

I haven't really followed this thread in depth, but I'm very nervous
about the idea that we should ever be examining the raw-xmin of a
tuple that has been marked HEAP_XMIN_FROZEN for anything other than
forensic purposes.  The design principle I followed in writing the
original patch was that marking a tuple HEAP_XMIN_FROZEN was identical
to setting the xmin to 2 except that the original xmin was still
available for manual inspection.  If we're finding that we need the
raw xmin after freezing, doesn't that mean that our freezing algorithm
was flat busted prior to that commit?  And maybe still busted when
things wrap around multiple times and raw-xmins are reused?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains

От
Peter Geoghegan
Дата:
On Fri, Oct 13, 2017 at 4:54 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I haven't really followed this thread in depth, but I'm very nervous
> about the idea that we should ever be examining the raw-xmin of a
> tuple that has been marked HEAP_XMIN_FROZEN for anything other than
> forensic purposes.

I'm nervous about it as well. These problems are very difficult to reason about.

> The design principle I followed in writing the
> original patch was that marking a tuple HEAP_XMIN_FROZEN was identical
> to setting the xmin to 2 except that the original xmin was still
> available for manual inspection.

I did point this out myself.

> If we're finding that we need the
> raw xmin after freezing, doesn't that mean that our freezing algorithm
> was flat busted prior to that commit?

I wouldn't put it that way at all. That commit provided us with the
opportunity to put in a better fix for a problem with update chain
traversal, a problem that was particularly critical when certain HOT
pruning + freezing races occur (the "failed to find parent tuple"
thing). It's clear that freezing was just as busted before and after
that commit, though.

> And maybe still busted when
> things wrap around multiple times and raw-xmins are reused?

I'm more concerned about the situation that will remain (or has now
been created) on 9.3, where we don't have raw xmin, and so must use
very forgiving FrozenTransactionId matching within
HeapTupleUpdateXmaxMatchesXmin().

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozenupdate chains

От
Alvaro Herrera
Дата:
Robert Haas wrote:

> I haven't really followed this thread in depth, but I'm very nervous
> about the idea that we should ever be examining the raw-xmin of a
> tuple that has been marked HEAP_XMIN_FROZEN for anything other than
> forensic purposes.

Yeah, me too.  If you see another way to fix the problem, let's discuss
it.

I think a possible way is to avoid considering that the relfrozenxid
value computed by the caller is final.  Something like this: if page
freezing sees that there is a HOT chain which would end up half-frozen
because of that freeze age, decrease the freeze xid enough that the
whole chain remains unfrozen; communicate the new value to caller so
that it is used as the true new relfrozenxid going forwards.  That
preserves the HOT chain intact so that it can be pruned and frozen
correctly in a later VACUUM call.  Need to be careful about HOT chains
containing members that are old enough to cause a long term problem
(i.e. a table where relfrozenxid doesn't move forward enough).

One thing I didn't quite investigate is why this bug only shows up with
multixacts so far.  Is it just because multixacts provide an easy way to
reproduce it, and that there are others, more difficult ways to cause
the same problem without involving multixacts?  If so, then the problem
is likely present in 9.2 as well.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains

От
Peter Geoghegan
Дата:
On Tue, Oct 17, 2017 at 3:02 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Yeah, me too.  If you see another way to fix the problem, let's discuss
> it.

I doubt that there is a better way.

> I think a possible way is to avoid considering that the relfrozenxid
> value computed by the caller is final.

While that alternative seems possible, it also seems riskier.

> One thing I didn't quite investigate is why this bug only shows up with
> multixacts so far.  Is it just because multixacts provide an easy way to
> reproduce it, and that there are others, more difficult ways to cause
> the same problem without involving multixacts?  If so, then the problem
> is likely present in 9.2 as well.

The obvious explanation (although not necessarily the correct one) is
that freezing didn't have a MultiXactIdGetUpdateXid() call in 9.2. The
way we pass down both cutoff_xid and cutoff_multi to
FreezeMultiXactId() seems like it might be involved in the data
corruption that we saw (the incorrect pruning/failed to find parent
tuple thing).

I might spend some time figuring this out later in the week. It's hard
to pin down, and I've only really started to learn about MultiXacts in
the past few months.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains

От
Robert Haas
Дата:
On Tue, Oct 17, 2017 at 6:02 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Robert Haas wrote:
>> I haven't really followed this thread in depth, but I'm very nervous
>> about the idea that we should ever be examining the raw-xmin of a
>> tuple that has been marked HEAP_XMIN_FROZEN for anything other than
>> forensic purposes.
>
> Yeah, me too.  If you see another way to fix the problem, let's discuss
> it.

Well, I guess what I don't understand is, suppose we have a HOT chain
that looks like this, where [x,y] denotes a tuple with an xmin of x
and an xmax of y.

[a,b]->[b,c]->[c,d]

If b is eligible to be frozen, then b is committed and all-visible,
which means that the [a,b] tuple should be pruned altogether.  IOW, I
don't think that a non-root tuple should ever have a frozen xmin; if
that happens, I think we've already screwed up.  So I don't quite
understand how this problem arises in the first place.

I think that the way we are supposed to be guaranteeing this is to
first prune the page and then freeze it.  If we ever freeze without
first pruning, I think that's a bug.  Now it could be that the problem
is that there's a race: after we prune the page, somehow the xmin
horizon advances before we freeze.  But that also seems like something
that shouldn't happen - our notion of the freeze threshold should be
frozen at the beginning of the scan and should not advance, and it
should be prior to our freeze horizon, which could be allowed to
advance but not retreat in the course of the scan.

Apologies if this is stupid; please clue me on what I'm missing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains

От
Peter Geoghegan
Дата:
On Wed, Oct 18, 2017 at 1:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Well, I guess what I don't understand is, suppose we have a HOT chain
> that looks like this, where [x,y] denotes a tuple with an xmin of x
> and an xmax of y.
>
> [a,b]->[b,c]->[c,d]
>
> If b is eligible to be frozen, then b is committed and all-visible,
> which means that the [a,b] tuple should be pruned altogether.  IOW, I
> don't think that a non-root tuple should ever have a frozen xmin; if
> that happens, I think we've already screwed up.  So I don't quite
> understand how this problem arises in the first place.

Technically, we don't freeze the heap-only tuples here, because we
cannot. because freezing tuples renders them visible (we don't set
XMIN_FROZEN). Instead, we set hint bits with WAL-logging, on the
assumption that nobody will ever *need* to interrogate the clog -- see
commit 20b65522 from several weeks back. To be clear, this *does* mean
that hint bits stop being mere hints, which I find troubling.

I described these heap-only tuples as "logically frozen" over on the
"heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead
bug" thread, which is where I brought up my general concern.

> I think that the way we are supposed to be guaranteeing this is to
> first prune the page and then freeze it.

There is a race where we cannot prune the page, though. That's why we
had to revisit what I suppose was a tacit assumption, and address its
problems in the commit that started this thread (commit a5736bf7).

> But that also seems like something
> that shouldn't happen - our notion of the freeze threshold should be
> frozen at the beginning of the scan and should not advance, and it
> should be prior to our freeze horizon, which could be allowed to
> advance but not retreat in the course of the scan.

It is not allowed retreat -- kind of. (Alvaro mentioned something in
passing about an *alternative* fix where it really was allowed to
retreat in the simplest sense [1], but I don't think that that's going
to happen.)

> Apologies if this is stupid; please clue me on what I'm missing.

I don't think that your questions are stupid at all. It intuitively
seems bad that xmin freezing is only something we can do to HOT root
and non-HOT tuples, while xmax freezing needs to happen to heap-only
tuples, as well as HOT root tuples and non-HOT tuples. But, as I said,
I'm still playing catch-up on MultiXacts, and I too feel like I might
still be missing important details.

[1] https://postgr.es/m/20171017100200.ruszi2c6qqwetce5@alvherre.pgsql
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains

От
Robert Haas
Дата:
On Wed, Oct 18, 2017 at 5:52 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> There is a race where we cannot prune the page, though. That's why we
> had to revisit what I suppose was a tacit assumption, and address its
> problems in the commit that started this thread (commit a5736bf7).

The commit message for a5736bf7 doesn't say anything about a race; it
just claims that it is fixing traversal of half-frozen update chains,
without making any reference to how such a thing as a half-frozen
update chain came to exist in the first place.  I think the commit
that talks about that race condition is 20b65522.

/me studies the problem for a while.

What's bothering me about this is: how is cutoff_xid managing to be a
new enough transaction ID for this to happen in the first place?  The
cutoff XID should certainly be older than anything any current
snapshot can see, because it's computed using GetOldestXmin() -- which
means that XIDs older than the cutoff can't still be running (except
maybe if old_snapshot_threshold is set to a non-default value).  And
that means that the state of any tuple to which
heap_prepare_freeze_tuple() does anything shouldn't be able to change
between the time the HOT-prune is done and the time freezing is
completed.  To put that the other way around, an xmin or xmax that was
still running when we did the HOT prune should also be too new to get
frozen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains

От
Peter Geoghegan
Дата:
On Thu, Oct 19, 2017 at 7:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> The commit message for a5736bf7 doesn't say anything about a race; it
> just claims that it is fixing traversal of half-frozen update chains,
> without making any reference to how such a thing as a half-frozen
> update chain came to exist in the first place.  I think the commit
> that talks about that race condition is 20b65522.

Well, the race was that the wrong tuple is pruned, because of naivety
in the heap_prune_chain() update chain handling, which is what started
work that became commit a5736bf7. That's by far the worst consequence
of the traversal of half-frozen update chain business that a5736bf7's
commit message describes, since we know it leads to wrong answers to
index scans (and REINDEX fails with "failed to find parent tuple").
It's the only consequence that's been directly observed, but probably
not the only one that's possible. Most other places that suffer the
problem are in obscure-to-users paths like EvalPlanQual().

> /me studies the problem for a while.
>
> What's bothering me about this is: how is cutoff_xid managing to be a
> new enough transaction ID for this to happen in the first place?  The
> cutoff XID should certainly be older than anything any current
> snapshot can see, because it's computed using GetOldestXmin() -- which
> means that XIDs older than the cutoff can't still be running (except
> maybe if old_snapshot_threshold is set to a non-default value).

The problem that we directly observed during pruning, which caused
"failed to find parent tuple" even after the first freeze-the-dead
commit (commit 20b65522), was that the mixture of frozen and unfrozen
tuples in a hot chain caused heap_prune_chain() to make an incorrect
conclusion about the continuity of a HOT chain. We'd prune the wrong
tuples -- it failed to find the visible tuple at the end of the HOT
chain, and respect that that meant it could not prune down to a stub
ItemId (blow away what it *incorrectly* thought was an entire HOT
chain).

Now that the problem is fixed by a5736bf7, this test case will prune
and get an LP_REDIRECT ItemId (not an LP_DEAD one), as we're no longer
confused about the continuity of HOT chains within heap_prune_chain().

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains

От
Peter Geoghegan
Дата:
On Thu, Oct 19, 2017 at 9:03 AM, Peter Geoghegan <pg@bowt.ie> wrote:
>> /me studies the problem for a while.
>>
>> What's bothering me about this is: how is cutoff_xid managing to be a
>> new enough transaction ID for this to happen in the first place?  The
>> cutoff XID should certainly be older than anything any current
>> snapshot can see, because it's computed using GetOldestXmin() -- which
>> means that XIDs older than the cutoff can't still be running (except
>> maybe if old_snapshot_threshold is set to a non-default value).

> Now that the problem is fixed by a5736bf7, this test case will prune
> and get an LP_REDIRECT ItemId (not an LP_DEAD one), as we're no longer
> confused about the continuity of HOT chains within heap_prune_chain().

In case I was unclear: the key point here is that it wasn't wrong to
prune the tuples because they might still be visible to somebody's
snapshot. It was wrong to do so in a specific way that could still
happen prior to a5736bf7, leaving only a stub LP_DEAD line pointer,
because index scans needed to do HOT chain traversal in order to get
to the visible tuple that wasn't pruned (the one that was still
frozen). You might say that the HOT chain "appeared to split in two"
from heap_prune_chain()'s perspective, a problem that has nothing to
do with an XID cutoff.

That's why after 20b65522 there was no problem with sequential scans.
There were remaining problems with index scans, though, purely because
of this HOT pruning business -- that was fixed in a5736bf7. (And, as
I've said, problems with "traversal of half-frozen update chains" are
presumed to have existed for routines like EvalPlanQual() -- even more
subtle problems.)

One thing I'm not clear on is if we could or should have limited this
new HeapTupleUpdateXmaxMatchesXmin() stuff to HOT chains, and so not
altered behavior for cross-page cases (for routines like
EvalPlanQualFetch() -- not routines like heap_prune_chain(), that only
ever deal with a single page at a time).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers