Thread: Inaccuracy in VACUUM's tuple count estimates


Inaccuracy in VACUUM's tuple count estimates

From: Tom Lane
Date:
I've been looking at the complaint Tim Wilson posted in pgsql-performance
about badly inaccurate reltuples updates coming from VACUUM.  There seem
to be a number of problems leading to that.  The key point is that when
VACUUM has scanned only part of the relation, it assumes that the
live-tuple density in that part of the relation is num_tuples (I'm
speaking of the counter accumulated in lazy_scan_heap) divided by
scanned_pages, and then it tries to extrapolate that information to the
rest of the relation.  Now, the validity of that extrapolation is a bit
questionable given that VACUUM is considering a highly nonrandom subset of
the table's pages, but the real problem is the values are wrong even for
the pages we did look at.  To wit:

* scanned_pages is not reliably the number of pages we scanned, because
somebody thought it would be cute to bump it even for pages we decided
didn't need to be scanned because they contain no freezable tuples.
So we have an increment in scanned_pages, but no corresponding increment
in the tuple count, leading to a density underestimate.  This seems to
only happen in vacuum-for-wraparound cases, but it's still wrong.  We need
to separate the logic about whether we skipped any pages from the
statistical counters.

* num_tuples has very little to do with the number of live tuples, because
it actually counts all nonremovable tuples, including RECENTLY_DEAD,
INSERT_IN_PROGRESS, and DELETE_IN_PROGRESS tuples.  In the case Tim is
complaining about, the VACUUM happens concurrently with a long transaction
that is bulk-updating most tuples in the relation, some of them several
times, so that VACUUM sees multiple images of every tuple (one
INSERT_IN_PROGRESS, the rest DELETE_IN_PROGRESS), and thus arrives at a
horrid overestimate of the number of live tuples.

I figured it'd be easy enough to get a better estimate by adding another
counter to count just LIVE and INSERT_IN_PROGRESS tuples (thus effectively
assuming that in-progress inserts and deletes will both commit).  I did
that, and found that it helped Tim's test case not at all :-(.  A bit of
sleuthing revealed that HeapTupleSatisfiesVacuum actually returns
INSERT_IN_PROGRESS for any tuple whose xmin isn't committed, regardless of
whether the transaction has since marked it for deletion:
            /*
             * It'd be possible to discern between INSERT/DELETE in progress
             * here by looking at xmax - but that doesn't seem beneficial for
             * the majority of callers and even detrimental for some. We'd
             * rather have callers look at/wait for xmin than xmax. It's
             * always correct to return INSERT_IN_PROGRESS because that's
             * what's happening from the view of other backends.
             */
            return HEAPTUPLE_INSERT_IN_PROGRESS;

It did not use to blow this question off: back around 8.3 you got
DELETE_IN_PROGRESS if the tuple had a delete pending.  I think we need
less laziness + fuzzy thinking here.  Maybe we should have a separate
HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code?  Is it *really*
the case that callers other than VACUUM itself are okay with failing
to make this distinction?  I'm dubious: there are very few if any
callers that treat the INSERT and DELETE cases exactly alike.
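[Editor's note: a minimal standalone sketch of the counting change described above (mock enum and a hypothetical helper, not the actual lazy_scan_heap code): keep the existing nonremovable count, plus a separate live count that treats only LIVE and INSERT_IN_PROGRESS tuples as live.]

```c
#include <assert.h>

/* Mock of HeapTupleSatisfiesVacuum's result codes (names follow the real enum). */
typedef enum {
    HEAPTUPLE_DEAD,
    HEAPTUPLE_LIVE,
    HEAPTUPLE_RECENTLY_DEAD,
    HEAPTUPLE_INSERT_IN_PROGRESS,
    HEAPTUPLE_DELETE_IN_PROGRESS
} HTSV_Result;

/*
 * Hypothetical counting helper: num_tuples counts every nonremovable
 * tuple (the current behaviour), while live_tuples counts only LIVE and
 * INSERT_IN_PROGRESS, effectively assuming in-progress inserts and
 * deletes will both commit.
 */
void count_tuples(const HTSV_Result *res, int n,
                  int *num_tuples, int *live_tuples)
{
    *num_tuples = 0;
    *live_tuples = 0;
    for (int i = 0; i < n; i++)
    {
        if (res[i] == HEAPTUPLE_DEAD)
            continue;           /* removable: counted by neither */
        (*num_tuples)++;        /* nonremovable */
        if (res[i] == HEAPTUPLE_LIVE ||
            res[i] == HEAPTUPLE_INSERT_IN_PROGRESS)
            (*live_tuples)++;
    }
}
```

In Tim's scenario (one INSERT_IN_PROGRESS image plus several DELETE_IN_PROGRESS images per row), num_tuples counts every image while live_tuples counts just one per row.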
        regards, tom lane



Re: Inaccuracy in VACUUM's tuple count estimates

From: Andres Freund
Date:
On 2014-06-06 15:44:25 -0400, Tom Lane wrote:
> I figured it'd be easy enough to get a better estimate by adding another
> counter to count just LIVE and INSERT_IN_PROGRESS tuples (thus effectively
> assuming that in-progress inserts and deletes will both commit).  I did
> that, and found that it helped Tim's test case not at all :-(.  A bit of
> sleuthing revealed that HeapTupleSatisfiesVacuum actually returns
> INSERT_IN_PROGRESS for any tuple whose xmin isn't committed, regardless of
> whether the transaction has since marked it for deletion:
> 
>             /*
>              * It'd be possible to discern between INSERT/DELETE in progress
>              * here by looking at xmax - but that doesn't seem beneficial for
>              * the majority of callers and even detrimental for some. We'd
>              * rather have callers look at/wait for xmin than xmax. It's
>              * always correct to return INSERT_IN_PROGRESS because that's
>              * what's happening from the view of other backends.
>              */
>             return HEAPTUPLE_INSERT_IN_PROGRESS;

That's only the case since a couple of days ago. I really wasn't sure
whether to go that way or discern the two cases. That changed in the wake
of:
http://www.postgresql.org/message-id/20140530143150.GA11051@localhost

I tried to solicit feedback (e.g. by CCing you :)) but I mostly
failed. Alvaro agreed, on IM, that it's better this way.

> It did not use to blow this question off: back around 8.3 you got
> DELETE_IN_PROGRESS if the tuple had a delete pending.  I think we need
> less laziness + fuzzy thinking here.

My argument for not discerning wasn't that it's hard to do, but that it
might confuse callers more the other way round. E.g. doing a
XactLockTableWait(xmax) might not be sufficient for the tuple being
alive.


> Maybe we should have a separate
> HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code?

Maybe.

> Is it *really*
> the case that callers other than VACUUM itself are okay with failing
> to make this distinction?  I'm dubious: there are very few if any
> callers that treat the INSERT and DELETE cases exactly alike.

I looked through all of them and saw none that'd be problematic. And
some, like predicate.c, where the new behaviour seems to be better. Most
of the ones that care about INSERT/DELETE_IN_PROGRESS wait on xmin/xmax
respectively.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Inaccuracy in VACUUM's tuple count estimates

From: Amit Kapila
Date:
On Sat, Jun 7, 2014 at 1:28 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>
> On 2014-06-06 15:44:25 -0400, Tom Lane wrote:
> > I figured it'd be easy enough to get a better estimate by adding another
> > counter to count just LIVE and INSERT_IN_PROGRESS tuples (thus effectively
> > assuming that in-progress inserts and deletes will both commit).  I did
> > that, and found that it helped Tim's test case not at all :-(.  A bit of
> > sleuthing revealed that HeapTupleSatisfiesVacuum actually returns
> > INSERT_IN_PROGRESS for any tuple whose xmin isn't committed, regardless of
> > whether the transaction has since marked it for deletion:
> >
> >             /*
> >              * It'd be possible to discern between INSERT/DELETE in progress
> >              * here by looking at xmax - but that doesn't seem beneficial for
> >              * the majority of callers and even detrimental for some. We'd
> >              * rather have callers look at/wait for xmin than xmax. It's
> >              * always correct to return INSERT_IN_PROGRESS because that's
> >              * what's happening from the view of other backends.
> >              */
> >             return HEAPTUPLE_INSERT_IN_PROGRESS;
>
> That's only the case of a couple of days ago. I really wasn't sure
> wheter to go that way or discern the two cases. That changed in the wake
> of:
> http://www.postgresql.org/message-id/20140530143150.GA11051@localhost

Won't this change impact the calculation of the number of live
rows for analyze?  (acquire_sample_rows() considers
HEAPTUPLE_DELETE_IN_PROGRESS tuples as live rows for tuples
updated by transactions other than the current transaction.)

Even if we think that the estimates are okay, the below comment
in acquire_sample_rows() doesn't seem to suggest it.

/*
 * We count delete-in-progress rows as still live, using
 * the same reasoning given above; but we don't bother to
 * include them in the sample.
 *
 * ..
 */


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Inaccuracy in VACUUM's tuple count estimates

From: Robert Haas
Date:
On Fri, Jun 6, 2014 at 3:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It did not use to blow this question off: back around 8.3 you got
> DELETE_IN_PROGRESS if the tuple had a delete pending.  I think we need
> less laziness + fuzzy thinking here.  Maybe we should have a separate
> HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code?  Is it *really*
> the case that callers other than VACUUM itself are okay with failing
> to make this distinction?

I think that would be a good idea for conceptual clarity if nothing
else.  If callers are OK with it, then they can treat both of those
codes alike; but then at least there's clear evidence as to the
author's intent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Inaccuracy in VACUUM's tuple count estimates

From: Andres Freund
Date:
On 2014-06-09 10:14:32 -0400, Robert Haas wrote:
> On Fri, Jun 6, 2014 at 3:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > It did not use to blow this question off: back around 8.3 you got
> > DELETE_IN_PROGRESS if the tuple had a delete pending.  I think we need
> > less laziness + fuzzy thinking here.  Maybe we should have a separate
> > HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code?  Is it *really*
> > the case that callers other than VACUUM itself are okay with failing
> > to make this distinction?
> 
> I think that would be a good idea for conceptual clarity if nothing
> else.  If callers are OK with it, then they can treat both of those
> codes alike; but then at least there's clear evidence as to the
> author's intent.

I am happy to introduce the code for that. But it'd have to be >= 9.4 or > 9.4?

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Inaccuracy in VACUUM's tuple count estimates

From: Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-06-09 10:14:32 -0400, Robert Haas wrote:
>> I think that would be a good idea for conceptual clarity if nothing
>> else.  If callers are OK with it, then they can treat both of those
>> codes alike; but then at least there's clear evidence as to the
>> author's intent.

> I am happy to introduce the code for that. But it'd have to be >=9.4 or
> 9.4?

We need a solution that can be back-patched, unless you're prepared to
revert what you already did to HTSV in the back branches.

Having said that, it's not clear to me that we couldn't change HTSV's
API in the back branches.  What third-party code would be likely to
be depending on it?
        regards, tom lane



Re: Inaccuracy in VACUUM's tuple count estimates

From: Andres Freund
Date:
On 2014-06-09 10:30:43 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-06-09 10:14:32 -0400, Robert Haas wrote:
> >> I think that would be a good idea for conceptual clarity if nothing
> >> else.  If callers are OK with it, then they can treat both of those
> >> codes alike; but then at least there's clear evidence as to the
> >> author's intent.
> 
> > I am happy to introduce the code for that. But it'd have to be >=9.4 or
> > 9.4?
> 
> We need a solution that can be back-patched, unless you're prepared to
> revert what you already did to HTSV in the back branches.

Well, I think reverting surely wouldn't be the right cure. It fixes a
somewhat nasty bug. I'd certainly be prepared to add the two lines
necessary to make it return DELETE_IN_PROGRESS after trying to
understand Kevin's email about predicate.c and going through the other
callers another time.
I did ask about what people think is the more appropriate return
value. And the only person that had voiced an opinion was Alvaro
agreeing that it's more correct to return INSERT_IN_PROGRESS... Don't
make this about me insisting on anything.

> Having said that, it's not clear to me that we couldn't change HTSV's
> API in the back branches.  What third-party code would be likely to
> be depending on it?

I'm not sure. I could imagine tools like pg_repack depending on it (it
doesn't). I started to search for external users of the function and
have only found a bunch of forks of postgres so far. Several of which
I've never heard of before.
I think it's more reasonable to return DELETE_IN_PROGRESS in existing
branches and then go to the new return value in 9.5 or maybe 9.4.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Inaccuracy in VACUUM's tuple count estimates

From: Kevin Grittner
Date:
Andres Freund <andres@2ndquadrant.com> wrote:

> Well, I think reverting surely wouldn't be the right cure. It
> fixes a somewhat nasty bug. I'd certainly be prepared to add the
> two lines necessary to make it return DELETE_IN_PROGRESS after
> trying to understand Kevin's email about predicate.c and going
> through the other callers another time.

I'm not actually sure yet whether the current state of affairs
causes a problem for the serializable transaction isolation level
implementation.  The most important thing to me is that whatever is
implemented is accurately documented in code comments so I can make
any necessary adjustments to code in predicate.c -- or possibly
determine that I *do* need some change to HTSV.  Right now the HTSV
embedded code comments suggest that the enum names and comments
don't accurately describe the conditions under which they are
returned, but I can't find anything else which does, short of
reverse-engineering that from some fairly complex code.

Perhaps it would be good if you could provide a concise description
of the conditions under which each value could currently be returned on
this (or the related) thread before we talk about what changes
might be needed?  Maybe this is clear to others involved in the
discussion, but I am not confident that I fully understand what
gets returned under what conditions.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Inaccuracy in VACUUM's tuple count estimates

From: Andres Freund
Date:
On 2014-06-09 08:00:52 -0700, Kevin Grittner wrote:
> I'm not actually sure yet whether the current state of affairs
> causes a problem for the serializable transaction isolation level
> implementation.

I'd replied in the other thread before noticing you'd replied
here... From what I understand right now it's not affected at all.

I tried to make things a bit clearer there - but I am not sure I've
succeeded. I'm certainly willing to explain things further if you can tell
me which area is unclear.

> Right now the HTSV
> embedded code comments suggest that the enum names and comments
> don't accurately describe the conditions under which they are
> returned, but I can't find anything else which does, short of
> reverse-engineering that from some fairly complex code.

Not really. You could even argue the current code more correctly
represents the (old) comments:

    HEAPTUPLE_INSERT_IN_PROGRESS,    /* inserting xact is still in progress */
    HEAPTUPLE_DELETE_IN_PROGRESS    /* deleting xact is still in progress */

the current code will return INSERT_IN_PROGRESS even if the tuple has
*also* been deleted in another xact...
I think the problem here is that there's simply no way to really
represent that case accurately with the current API.

> Perhaps it would be good if you could provide a concise description
> of the conditions under which value could currently be returned on
> this (or the related) thread before we talk about what changes
> might be needed? Maybe this is clear to others involved in the
> discussion, but I am not confident that I fully understand what
> gets returned under what conditions.
    HEAPTUPLE_DEAD,                /* tuple is dead and deletable */
1) xmin has committed, xmax has committed and wasn't just a locker. Xmax
precedes OldestXmin.

    HEAPTUPLE_LIVE,                /* tuple is live (committed, no deleter) */
1) xmin has committed, xmax unset
2) xmin has committed, xmax is locked only. Status of xmax is irrelevant
3) xmin has committed, xmax has aborted.

    HEAPTUPLE_RECENTLY_DEAD,    /* tuple is dead, but not deletable yet */
1) xmin has committed, xmax has committed and wasn't only a locker. But
xmax doesn't precede OldestXmin.

    HEAPTUPLE_INSERT_IN_PROGRESS,        /* inserting xact is still in progress */
new:
1) xmin is in progress, xmin is the current backend, xmax is invalid
2) xmin is in progress, xmin is the current backend, xmax only a locker
3) xmin is in progress, xmin is the current backend, xmax aborted
4) xmin is in progress, xmin is *not* current backend, xmax is irrelevant
old:
1) xmin is in progress, xmax is invalid
2) xmin is in progress, xmax is only a locker

    HEAPTUPLE_DELETE_IN_PROGRESS    /* deleting xact is still in progress */
new:
1) xmin has committed, xmax is in progress, xmax is not just a locker
2) xmin is in progress, xmin is the current backend, xmax is not just a
   locker and in progress.
old:
1) xmin has committed, xmax is in progress, xmax is not just a locker
2) xmin is in progress, xmax is set and not just a locker

Note that the 2) case here never checked xmax's status.
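[Editor's note: the enumeration above can be condensed into a standalone sketch of the *new* behaviour. The flag parameters, XmaxState enum, and classify_new helper are all mock names introduced here for illustration, not the real HeapTupleSatisfiesVacuum; the sketch assumes xmin-aborted has already been handled as DEAD.]

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    HEAPTUPLE_DEAD,
    HEAPTUPLE_LIVE,
    HEAPTUPLE_RECENTLY_DEAD,
    HEAPTUPLE_INSERT_IN_PROGRESS,
    HEAPTUPLE_DELETE_IN_PROGRESS
} HTSV_Result;

typedef enum {
    XMAX_INVALID,       /* xmax unset */
    XMAX_LOCK_ONLY,     /* xmax is only a locker */
    XMAX_ABORTED,       /* deleting xact aborted */
    XMAX_IN_PROGRESS,   /* a real deleter, still running */
    XMAX_COMMITTED      /* a real deleter, committed */
} XmaxState;

/* Mock of the new HTSV behaviour, following the case list above. */
HTSV_Result classify_new(bool xmin_in_progress, bool xmin_is_current,
                         XmaxState xmax, bool xmax_precedes_oldest)
{
    if (xmin_in_progress)
    {
        /* Only for our own backend do we look at a pending delete. */
        if (xmin_is_current && xmax == XMAX_IN_PROGRESS)
            return HEAPTUPLE_DELETE_IN_PROGRESS;
        /* Other backend's insert: xmax is irrelevant (new case 4). */
        return HEAPTUPLE_INSERT_IN_PROGRESS;
    }
    /* xmin has committed */
    switch (xmax)
    {
        case XMAX_INVALID:
        case XMAX_LOCK_ONLY:
        case XMAX_ABORTED:
            return HEAPTUPLE_LIVE;
        case XMAX_IN_PROGRESS:
            return HEAPTUPLE_DELETE_IN_PROGRESS;
        case XMAX_COMMITTED:
            return xmax_precedes_oldest ? HEAPTUPLE_DEAD
                                        : HEAPTUPLE_RECENTLY_DEAD;
    }
    return HEAPTUPLE_LIVE;      /* not reached */
}
```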

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Inaccuracy in VACUUM's tuple count estimates

From: Kevin Grittner
Date:
Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-06-09 08:00:52 -0700, Kevin Grittner wrote:

> I tried to make things a bit clearer there - but I am not sure I've
> succeed. I'm certainly willing to explain things further if you can tell
> me which are is unclear.

Thanks!  IMO, something like this should be included in the code
comments.

>     HEAPTUPLE_INSERT_IN_PROGRESS,    /* inserting xact is still in progress */
>     HEAPTUPLE_DELETE_IN_PROGRESS    /* deleting xact is still in progress */
> the current code will return INSERT_IN_PROGRESS even if the tuple has
> *also* been deleted in another xact...
> I think the problem here is that there's simply no way to really
> represent that case accurately with the current API.

For purposes of predicate.c, if the "also deleted" activity might
be rolled back without rolling back the insert, INSERT_IN_PROGRESS
is the only correct value.  If they will either both commit or
neither will commit, predicate.c would be more efficient if
HEAPTUPLE_RECENTLY_DEAD was returned, but I think
HEAPTUPLE_INSERT_IN_PROGRESS would be OK from a correctness PoV.

>> Perhaps it would be good if you could provide a concise description
>> of the conditions under which value could currently be returned on
>> this (or the related) thread before we talk about what changes
>> might be needed? Maybe this is clear to others involved in the
>> discussion, but I am not confident that I fully understand what
>> gets returned under what conditions.
>
>     HEAPTUPLE_DEAD,                /* tuple is dead and deletable */
> 1) xmin has committed, xmax has committed and wasn't just a locker. Xmax
> precedes OldestXmin.

Perfect.

>     HEAPTUPLE_LIVE,                /* tuple is live (committed, no deleter) */
> 1) xmin has committed, xmax unset
> 2) xmin has committed, xmax is locked only. Status of xmax is irrelevant
> 3) xmin has committed, xmax has aborted.

Perfect.

>     HEAPTUPLE_RECENTLY_DEAD,    /* tuple is dead, but not deletable yet */
> 1) xmin has committed, xmax has committed and wasn't only a locker. But
> xmax doesn't precede OldestXmin.

For my purposes, it would be better if this also included:
 2) xmin is in progress, xmax matches (or includes) xmin

... but that would be only a performance tweak.

>     HEAPTUPLE_INSERT_IN_PROGRESS,        /* inserting xact is still in progress
> */
> new:
> 1) xmin is in progress, xmin is the current backend, xmax is invalid
> 2) xmin is in progress, xmin is the current backend, xmax only a locker
> 3) xmin is in progress, xmin is the current backend, xmax aborted
> 4) xmin is in progress, xmin is *not* current backend, xmax is irrelevant
> old:
> 1) xmin is in progress, xmax is invalid
> 2) xmin is in progress, xmax is only a locker

I think this is OK from a correctness PoV.  There may be an
opportunity to optimize.  I will look more closely at whether it
seems likely to matter much, and what sort of change in the return
conditions or in predicate.c might be needed.

>     HEAPTUPLE_DELETE_IN_PROGRESS    /* deleting xact is still in progress */
> new:
> 1) xmin has committed, xmax is in progress, xmax is not just a locker
> 2) xmin is in progress, xmin is the current backend, xmax is not just a
>   locker and in progress.

I'm not clear on how 2) could happen unless xmax is the current
backend or a subtransaction thereof.  Could you clarify?

> old:
> 1) xmin has committed, xmax is in progress, xmax is not just a locker
> 2) xmin is in progress, xmax is set and not not just a locker
>
> Note that the 2) case here never checked xmax's status.

Again, I'm not sure how 2) could happen unless they involve the
same top-level transaction.  What am I missing?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Inaccuracy in VACUUM's tuple count estimates

From: Andres Freund
Date:
On 2014-06-09 09:45:12 -0700, Kevin Grittner wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
> >     HEAPTUPLE_INSERT_IN_PROGRESS,    /* inserting xact is still in progress */
> >     HEAPTUPLE_DELETE_IN_PROGRESS    /* deleting xact is still in progress */
> > the current code will return INSERT_IN_PROGRESS even if the tuple has
> > *also* been deleted in another xact...
> > I think the problem here is that there's simply no way to really
> > represent that case accurately with the current API.
> 
> For purposes of predicate.c, if the "also deleted" activity might
> be rolled back without rolling back the insert, INSERT_IN_PROGRESS
> is the only correct value.  If they will either both commit or
> neither will commit, predicate.c would be more efficient if
> HEAPTUPLE_RECENTLY_DEAD was returned, but I
> HEAPTUPLE_INSERT_IN_PROGRESS would be OK from a correctness PoV.

That's basically the argument for the new behaviour.

But I am not sure, given predicate.c's coding, how
HEAPTUPLE_DELETE_IN_PROGRESS could cause problems. Could you elaborate,
since that's the contentious point with Tom? Since 'both in progress'
can only happen if xmin and xmax are the same toplevel xid and you
resolve subxids to toplevel xids I think it should currently be safe
either way?

> >     HEAPTUPLE_RECENTLY_DEAD,    /* tuple is dead, but not deletable yet */
> > 1) xmin has committed, xmax has committed and wasn't only a locker. But
> > xmax doesn't precede OldestXmin.
> 
> For my purposes, it would be better if this also included:
>  2) xmin is in progress, xmax matches (or includes) xmin
> 
> ... but that would be only a performance tweak.

I don't see that happening as there's several callers for which it is
important to know whether the xacts are still alive or not.

> >     HEAPTUPLE_DELETE_IN_PROGRESS    /* deleting xact is still in progress */
> > new:
> > 1) xmin has committed, xmax is in progress, xmax is not just a locker
> > 2) xmin is in progress, xmin is the current backend, xmax is not just a
> >   locker and in progress.
> 
> I'm not clear on how 2) could happen unless xmax is the current
> backend or a subtransaction thereof.  Could you clarify?
> 
> > old:
> > 1) xmin has committed, xmax is in progress, xmax is not just a locker
> > 2) xmin is in progress, xmax is set and not not just a locker
> >
> > Note that the 2) case here never checked xmax's status.
> 
> Again, I'm not sure how 2) could happen unless they involve the
> same top-level transaction.  What am I missing?

Right, both can only happen if the tuple is created & deleted in the
same backend. Is that in contradiction to something you see?

Andres



Re: Inaccuracy in VACUUM's tuple count estimates

From: Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-06-09 09:45:12 -0700, Kevin Grittner wrote:
>> For purposes of predicate.c, if the "also deleted" activity might
>> be rolled back without rolling back the insert, INSERT_IN_PROGRESS
>> is the only correct value.

> That's basically the argument for the new behaviour.

> But I am not sure, given predicate.c's coding, how
> HEAPTUPLE_DELETE_IN_PROGRESS could cause problems. Could you elaborate,
> since that's the contentious point with Tom? Since 'both in progress'
> can only happen if xmin and xmax are the same toplevel xid and you
> resolve subxids to toplevel xids I think it should currently be safe
> either way?

Assuming that Kevin's describing his needs correctly, we could resolve
this by inventing HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS.  VACUUM could
assume that that's a probably-dead tuple, while SSI could do something
different.  I'm not sure if it's worth special-casing xmin == xmax,
where the tuple is guaranteed to never be of interest to any other
transaction?

The reason this stuff is not too carefully spec'd is that when HTSV
was written, there was no expectation that there was any correctness
issue around which of these cases was returned.  I wonder whether SSI
should be using HTSV at all.
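[Editor's note: a sketch of the dispatch this proposal would enable. The HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS value and both helper functions are hypothetical, introduced only to illustrate the idea: VACUUM folds the new code into its probably-dead bucket, while SSI keeps treating it as an in-progress insert.]

```c
#include <assert.h>
#include <stdbool.h>

/* Existing codes plus the value Tom suggests (NOT actual PostgreSQL). */
typedef enum {
    HEAPTUPLE_DEAD,
    HEAPTUPLE_LIVE,
    HEAPTUPLE_RECENTLY_DEAD,
    HEAPTUPLE_INSERT_IN_PROGRESS,
    HEAPTUPLE_DELETE_IN_PROGRESS,
    HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS   /* proposed */
} HTSV_Result;

/* VACUUM's view: a tuple whose insert and delete are both pending is
 * probably dead, so it is not counted as live. */
bool vacuum_counts_as_live(HTSV_Result r)
{
    return r == HEAPTUPLE_LIVE || r == HEAPTUPLE_INSERT_IN_PROGRESS;
}

/* SSI's view: the delete might roll back without rolling back the
 * insert, so the tuple must still be handled as an in-progress insert. */
bool ssi_treats_as_insert_in_progress(HTSV_Result r)
{
    return r == HEAPTUPLE_INSERT_IN_PROGRESS ||
           r == HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS;
}
```

The point of the separate code is exactly that the two callers can diverge here without either one guessing at xmax on its own.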
        regards, tom lane



Re: Inaccuracy in VACUUM's tuple count estimates

From: Kevin Grittner
Date:
Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-06-09 09:45:12 -0700, Kevin Grittner wrote:

> I am not sure, given predicate.c's coding, how
> HEAPTUPLE_DELETE_IN_PROGRESS could cause problems. Could you elaborate,
> since that's the contentious point with Tom? Since 'both in
> progress'
> can only happen if xmin and xmax are the same toplevel xid and you
> resolve subxids to toplevel xids I think it should currently be safe
> either way?

The only way that it could be a problem is if the DELETE is in a
subtransaction which might get rolled back without rolling back the
INSERT.  If we ignore the conflict because we assume the INSERT
will be negated by the DELETE, and that doesn't happen, we would
get false negatives which would compromise correctness.  If we
assume that the DELETE might not happen when the DELETE is not in a
separate subtransaction we might get a false positive, which would
only be a performance hit.  If we know either is possible and have
a way to check in predicate.c, it's fine to check it there.

>>>     HEAPTUPLE_RECENTLY_DEAD,    /* tuple is dead, but not deletable yet */
>>> 1) xmin has committed, xmax has committed and wasn't only a locker. But
>>> xmax doesn't precede OldestXmin.
>>
>>  For my purposes, it would be better if this also included:
>>   2) xmin is in progress, xmax matches (or includes) xmin
>>
>>  ... but that would be only a performance tweak.
>
> I don't see that happening as there's several callers for which it is
> important to know whether the xacts are still alive or not.

OK

>>>     HEAPTUPLE_DELETE_IN_PROGRESS    /* deleting xact is still in progress */
>>> new:
>>> 1) xmin has committed, xmax is in progress, xmax is not just a locker
>>> 2) xmin is in progress, xmin is the current backend, xmax is not just a
>>>   locker and in progress.
>>
>>  I'm not clear on how 2) could happen unless xmax is the current
>>  backend or a subtransaction thereof.  Could you clarify?
>>
>>> old:
>>> 1) xmin has committed, xmax is in progress, xmax is not just a locker
>>> 2) xmin is in progress, xmax is set and not not just a locker
>>>
>>> Note that the 2) case here never checked xmax's status.
>>
>>  Again, I'm not sure how 2) could happen unless they involve the
>>  same top-level transaction.  What am I missing?
>
> Right, both can only happen if the tuple is created & deleted in the
> same backend. Is that in contradiction to something you see?

Well, you're making a big point that the status of xmax was not
checked in the old code.  If xmax is the same as xmin and xmin is
in progress, the additional check seems redundant -- unless I'm
missing something.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Inaccuracy in VACUUM's tuple count estimates

From: Kevin Grittner
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> The reason this stuff is not too carefully spec'd is that when
> HTSV was written, there was no expectation that there was any
> correctness issue around which of these cases was returned.  I
> wonder whether SSI should be using HTSV at all.

That's certainly a reasonable question.  When I was writing the SSI
code, I was blissfully unaware of HTSV and had coded up a way to
check this which seemed to work for all tests we had.  Jeff Davis,
reviewing the code, was concerned that such separate code was more
likely to miss something or to break as visibility handling
changed.  He argued that HTSV was basically checking for the same
things I was, and a redundant version which did the check
differently was a bad idea.  Here is where it was discussed during
development:

http://www.postgresql.org/message-id/flat/1296499247.11513.777.camel@jdavis#1296499247.11513.777.camel@jdavis

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Inaccuracy in VACUUM's tuple count estimates

From: Alvaro Herrera
Date:
Kevin Grittner wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-06-09 09:45:12 -0700, Kevin Grittner wrote:

> >>> old:
> >>> 1) xmin has committed, xmax is in progress, xmax is not just a locker
> >>> 2) xmin is in progress, xmax is set and not not just a locker
> >>>
> >>> Note that the 2) case here never checked xmax's status.
> >>
> >>  Again, I'm not sure how 2) could happen unless they involve the
> >>  same top-level transaction.  What am I missing?
> >
> > Right, both can only happen if the tuple is created & deleted in the
> > same backend. Is that in contradiction to something you see?
> 
> Well, you're making a big point that the status of xmax was not
> checked in the old code.  If xmax is the same as xmin and xmin is
> in progress, the additional check seems redundant -- unless I'm
> missing something.

IIRC the case that prompted the fix was a CREATE INDEX in the same
transaction that created a tuple which was later deleted by an aborted
subtransaction.  If the creating transaction is not this backend, that's
fine.  But if the creating transaction IsCurrentTransactionId() then we
need to be careful about aborted subxacts: if a tuple was deleted by an
aborted subxact then it's still visible to this transaction.  Xmax must
be checked in this case.  Note that the difference is pretty specific to
CREATE INDEX.  Vacuuming doesn't have to worry about such cases, mainly
because you can't run vacuum in a transaction.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Inaccuracy in VACUUM's tuple count estimates

From: Kevin Grittner
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Assuming that Kevin's describing his needs correctly, we could resolve
> this by inventing HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS.  VACUUM could
> assume that that's a probably-dead tuple, while SSI could do something
> different.

That could work.

On the other hand, the old code, where such a transaction showed as
HEAPTUPLE_DELETE_IN_PROGRESS, might still work for predicate.c as
long as it's clear that this return code sometimes means "insert
and delete are both in progress and the insert might commit without
committing the delete".  Those conditions could be identified
within predicate.c; although it seems like that would involve
duplicating some work which was already in HTSV, with the usual
costs and risks of duplicate code.

> I'm not sure if it's worth special-casing xmin == xmax,
> where the tuple is guaranteed to never be of interest to any other
> transaction?

That could be checked in predicate.c.  I see no reason to create a
separate return code for it if it's not of interest to other
callers.  It could even be left as a later optimization, since a
false positive serialization failure doesn't compromise
correctness, but it seems easy enough to cover that we might as
well do so now.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Inaccuracy in VACUUM's tuple count estimates

From
Andres Freund
Date
On 2014-06-09 11:24:22 -0700, Kevin Grittner wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-06-09 09:45:12 -0700, Kevin Grittner wrote:
> 
> > I am not sure, given predicate.c's coding, how
> > HEAPTUPLE_DELETE_IN_PROGRESS could cause problems. Could you elaborate,
> > since that's the contentious point with Tom? Since 'both in progress'
> > can only happen if xmin and xmax are the same toplevel xid and you
> > resolve subxids to toplevel xids I think it should currently be safe
> > either way?
> 
> The only way that it could be a problem is if the DELETE is in a
> subtransaction which might get rolled back without rolling back the
> INSERT.

The way I understand the code, in that case the subxid in xmax would have
been resolved to the toplevel xid:

    /*
     * Find top level xid.  Bail out if xid is too early to be a conflict, or
     * if it's our own xid.
     */
    if (TransactionIdEquals(xid, GetTopTransactionIdIfAny()))
        return;
    xid = SubTransGetTopmostTransaction(xid);
    if (TransactionIdPrecedes(xid, TransactionXmin))
        return;
    if (TransactionIdEquals(xid, GetTopTransactionIdIfAny()))
        return;

That should essentially make that case harmless, right? So the
optimization (and pessimization in other cases) of only tracking
toplevel xids seems to save the day here?
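The resolution step can be sketched as a toy model (illustrative Python with invented names, not PostgreSQL source): because every subtransaction xid is mapped to its toplevel xid before being tracked, an xmin and xmax set by different subxacts of the same transaction collapse to one tracked xid, making the aborted-subxact case harmless for predicate.c.

```python
# Toy sketch of SubTransGetTopmostTransaction-style resolution.
def resolve_to_toplevel(xid, parent_of):
    """Walk the subtransaction parent chain up to the toplevel xid."""
    while xid in parent_of:
        xid = parent_of[xid]
    return xid

# Subxids 102 and 103 both belong to toplevel transaction 101.
parent_of = {102: 101, 103: 101}
# Whether xmin came from subxact 102 or xmax from subxact 103,
# predicate.c tracks the same toplevel xid either way.
assert resolve_to_toplevel(102, parent_of) == 101
assert resolve_to_toplevel(103, parent_of) == 101
assert resolve_to_toplevel(101, parent_of) == 101
```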

>  If we ignore the conflict because we assume the INSERT
> will be negated by the DELETE, and that doesn't happen, we would
> get false negatives which would compromise correctness.  If we
> assume that the DELETE might not happen when the DELETE is not in a
> separate subtransaction we might get a false positive, which would
> only be a performance hit.  If we know either is possible and have
> a way to check in predicate.c, it's fine to check it there.

Given the above I don't think this can currently happen. Am I
understanding it correctly? If so, it certainly deserves a comment...

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Inaccuracy in VACUUM's tuple count estimates

From
Kevin Grittner
Date
Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-06-09 11:24:22 -0700, Kevin Grittner wrote:

>> The only way that it could be a problem is if the DELETE is in a
>> subtransaction which might get rolled back without rolling back the
>> INSERT.
>
> The way I understand the code, in that case the subxid in xmax would have
> been resolved to the toplevel xid:
>
>     /*
>     * Find top level xid.  Bail out if xid is too early to be a conflict, or
>     * if it's our own xid.
>     */
>     if (TransactionIdEquals(xid, GetTopTransactionIdIfAny()))
>         return;
>     xid = SubTransGetTopmostTransaction(xid);
>     if (TransactionIdPrecedes(xid, TransactionXmin))
>         return;
>     if (TransactionIdEquals(xid, GetTopTransactionIdIfAny()))
>         return;
>
> That should essentially make that case harmless, right?

Hmm.  Since the switch statement above doesn't limit the
HEAPTUPLE_DELETE_IN_PROGRESS or HEAPTUPLE_INSERT_IN_PROGRESS cases
by visibility, we are safe.  I had been thinking that we had early
returns based on visibility, like we do for HEAPTUPLE_LIVE and
HEAPTUPLE_RECENTLY_DEAD.  Since we don't do that, there is no
problem with the code either before or after your recent change.
Apologies that I didn't notice that sooner.

It might be possible that with more guarantees of what those return
codes mean (possibly by adding the new one mentioned by Tom) we
could add that early return in one or both cases, which would
better optimize some corner cases involving subtransactions.  But
now doesn't seem like a good time to worry about that.  Such a
micro-optimization would not be a sane candidate for back-patching,
or for introducing this late in a release cycle.

So, nothing to see here, folks.  Move along.  predicate.c is
neutral in this matter.

Good spot, BTW!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Inaccuracy in VACUUM's tuple count estimates

From
Andres Freund
Date
Hi Tom,

On 2014-06-06 15:44:25 -0400, Tom Lane wrote:
> I figured it'd be easy enough to get a better estimate by adding another
> counter to count just LIVE and INSERT_IN_PROGRESS tuples (thus effectively
> assuming that in-progress inserts and deletes will both commit).

Did you plan to backpatch that? My inclination would be no...

>  I did
> that, and found that it helped Tim's test case not at all :-(.  A bit of
> sleuthing revealed that HeapTupleSatisfiesVacuum actually returns
> INSERT_IN_PROGRESS for any tuple whose xmin isn't committed, regardless of
> whether the transaction has since marked it for deletion:
> 
>             /*
>              * It'd be possible to discern between INSERT/DELETE in progress
>              * here by looking at xmax - but that doesn't seem beneficial for
>              * the majority of callers and even detrimental for some. We'd
>              * rather have callers look at/wait for xmin than xmax. It's
>              * always correct to return INSERT_IN_PROGRESS because that's
>              * what's happening from the view of other backends.
>              */
>             return HEAPTUPLE_INSERT_IN_PROGRESS;
> 
> It did not use to blow this question off: back around 8.3 you got
> DELETE_IN_PROGRESS if the tuple had a delete pending.  I think we need
> less laziness + fuzzy thinking here.  Maybe we should have a separate
> HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code?  Is it *really*
> the case that callers other than VACUUM itself are okay with failing
> to make this distinction?  I'm dubious: there are very few if any
> callers that treat the INSERT and DELETE cases exactly alike.

My current position on this is that we should leave the code as is in
<9.4, and keep returning HEAPTUPLE_INSERT_IN_PROGRESS in 9.4/master.
Would you be ok with that? The second best option imo would be to
discern between the cases and return HEAPTUPLE_INSERT_IN_PROGRESS or
HEAPTUPLE_DELETE_IN_PROGRESS respectively.
Which way would you like to go?
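The distinction under discussion can be sketched as a toy classifier (illustrative Python only, not PostgreSQL code, simplified names): the current HTSV reports INSERT_IN_PROGRESS whenever xmin is still in progress, even if the same transaction has already set xmax, whereas the older (circa 8.3) behavior reported a pending delete separately, which is what a live-tuple counter in VACUUM would need.

```python
# Toy model of the two behaviors; only xmin-in-progress tuples are modeled.
def htsv_current(xmin_in_progress, xmax_set):
    """Current code: xmax is ignored while xmin is in progress."""
    if xmin_in_progress:
        return "INSERT_IN_PROGRESS"
    return "LIVE"

def htsv_old(xmin_in_progress, xmax_set):
    """8.3-era behavior: a pending delete is reported as such."""
    if xmin_in_progress:
        return "DELETE_IN_PROGRESS" if xmax_set else "INSERT_IN_PROGRESS"
    return "LIVE"

# A tuple inserted and then marked for deletion by the same open transaction:
assert htsv_current(True, True) == "INSERT_IN_PROGRESS"   # looks like a live insert
assert htsv_old(True, True) == "DELETE_IN_PROGRESS"       # visible as a pending delete
```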

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Inaccuracy in VACUUM's tuple count estimates

From
tim_wilson
Date
Given that this seems to have slipped off the hackers' radar (or into the
too-hard basket), I have constructed a horrible workaround.

I will stop using autovacuum for this relation, use our own system
to monitor the relation, and reset pg_class.reltuples on this relation
to the correct value after vacuum is done.

I note that vacuum.c has comments in vac_update_relstats saying that changes
to pg_class are done without a transaction.  Are there dangers in my doing an
update pg_class set reltuples=60000 where relkind='r' and
relname='my_hot_table' ?

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Inaccuracy-in-VACUUM-s-tuple-count-estimates-tp5806367p5809273.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: Inaccuracy in VACUUM's tuple count estimates

From
Bruce Momjian
Date
On Thu, Jun 12, 2014 at 01:40:59PM +0200, Andres Freund wrote:
> Hi Tom,
> 
> On 2014-06-06 15:44:25 -0400, Tom Lane wrote:
> > I figured it'd be easy enough to get a better estimate by adding another
> > counter to count just LIVE and INSERT_IN_PROGRESS tuples (thus effectively
> > assuming that in-progress inserts and deletes will both commit).
> 
> Did you plan to backpatch that? My inclination would be no...
> 
> >  I did
> > that, and found that it helped Tim's test case not at all :-(.  A bit of
> > sleuthing revealed that HeapTupleSatisfiesVacuum actually returns
> > INSERT_IN_PROGRESS for any tuple whose xmin isn't committed, regardless of
> > whether the transaction has since marked it for deletion:
> > 
> >             /*
> >              * It'd be possible to discern between INSERT/DELETE in progress
> >              * here by looking at xmax - but that doesn't seem beneficial for
> >              * the majority of callers and even detrimental for some. We'd
> >              * rather have callers look at/wait for xmin than xmax. It's
> >              * always correct to return INSERT_IN_PROGRESS because that's
> >              * what's happening from the view of other backends.
> >              */
> >             return HEAPTUPLE_INSERT_IN_PROGRESS;
> > 
> > It did not use to blow this question off: back around 8.3 you got
> > DELETE_IN_PROGRESS if the tuple had a delete pending.  I think we need
> > less laziness + fuzzy thinking here.  Maybe we should have a separate
> > HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code?  Is it *really*
> > the case that callers other than VACUUM itself are okay with failing
> > to make this distinction?  I'm dubious: there are very few if any
> > callers that treat the INSERT and DELETE cases exactly alike.
> 
> My current position on this is that we should leave the code as is in
> <9.4, and keep returning HEAPTUPLE_INSERT_IN_PROGRESS in 9.4/master.
> Would you be ok with that? The second best option imo would be to
> discern between the cases and return HEAPTUPLE_INSERT_IN_PROGRESS or
> HEAPTUPLE_DELETE_IN_PROGRESS respectively.
> Which way would you like to go?

Did we ever come to a conclusion on this?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: Inaccuracy in VACUUM's tuple count estimates

From
tim_wilson
Date
I was slightly optimistic that this comment in the release notes might mean
that my bug with bloat on hot tables had been fixed in 9.4:

    Make VACUUM properly report dead but not-yet-removable rows to the
    statistics collector (Hari Babu)

    Previously these were reported as live rows.

Unfortunately that is not the case, and we still have the problem in 9.4.



--
View this message in context:
http://postgresql.nabble.com/Inaccuracy-in-VACUUM-s-tuple-count-estimates-tp5806367p5834687.html



Re: Inaccuracy in VACUUM's tuple count estimates

From
Robert Haas
Date
On Mon, Jan 19, 2015 at 10:53 PM, tim_wilson <tim.wilson@telogis.com> wrote:
> I was slightly optimistic that this comment in the release notes might mean
> that my bug with bloat on hot tables had been fixed in 9.4:
>
>     Make VACUUM properly report dead but not-yet-removable rows to the
>     statistics collector (Hari Babu)
>
>     Previously these were reported as live rows.
>
> Unfortunately that is not the case, and we still have the problem in 9.4.

As far as I can tell from reviewing the thread, what we need to do
here is basically invent HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS.
There was a lot of concern about doing that in the back-branches, but
I'm skeptical that the concern is justified.
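A toy counter (illustrative Python with invented names, not PostgreSQL code) shows why the new result code would help VACUUM's estimate: treating INSERT_AND_DELETE_IN_PROGRESS as probably dead keeps a concurrent bulk update, which leaves several in-progress images per row, from inflating the live-tuple count.

```python
# Toy sketch: count probably-live tuples, assuming in-progress inserts commit
# but insert-and-delete-in-progress tuples are probably dead.
def count_probably_live(statuses):
    live_like = {"LIVE", "INSERT_IN_PROGRESS"}
    return sum(1 for s in statuses if s in live_like)

# One logical row mid-bulk-update: the newest image is INSERT_IN_PROGRESS,
# the two older images were already re-updated by the same open transaction.
statuses = ["INSERT_IN_PROGRESS",
            "INSERT_AND_DELETE_IN_PROGRESS",
            "INSERT_AND_DELETE_IN_PROGRESS"]
assert count_probably_live(statuses) == 1   # not 3, avoiding the overestimate
```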

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company