Обсуждение: BUG #14057: vacuum setting reltuples=0 for tables with >0 tuples
BUG #14057: vacuum setting reltuples=0 for tables with >0 tuples
От
andrew@tao11.riddles.org.uk
Дата:
The following bug has been logged on the website: Bug reference: 14057 Logged by: Andrew Gierth Email address: andrew@tao11.riddles.org.uk PostgreSQL version: 9.4.5 Operating system: any Description: This is my analysis of an issue reported via IRC by nicolas.baccelli@gmail.com. The original issue was bad query plans caused by strangely bad estimates, which were traced to reltuples=0 (with relpages>0) values in pg_class. The affected relations are very small (one page only, order of 10 to 100 rows). Monitoring over time showed that these were being reset to 0 by autovacuum (even though the tables involved are static). This was traced to vacuum-for-wraparound, which is relevant since it means that the vacuum is being performed with scan_all true. (The tables are targets of FKs, thus many key-share locks which may require mxid cleanup.) The problem then seems to be this: If cleanup lock isn't acquired for the page when we try and lock it conditionally, and scan_all is true, then we scan the page to see if it needs freezing before blocking on the cleanup lock. If it does not, we skip it, but scanned_pages is still incremented in this code path, even though we did not update any of the tuple counts. (We are assuming that the cleanup lock is frequently missed in this case because the vulnerable tables are frequently used in joins.) Accordingly, we get a new reltuples estimate of 0 (since scanned_pages is not 0 the tuple counts are trusted and assumed to reflect the whole rel, since it's only one page). It looks like fixing this requires breaking scanned_pages out into at least two separate counters, since we currently use it to track both whether we can update stats, and whether we can update relfrozenxid.
Hi Andrew, On 2016-03-31 10:37:39 +0000, andrew@tao11.riddles.org.uk wrote: > It looks like fixing this requires breaking scanned_pages out into at least > two separate counters, since we currently use it to track both whether we > can update stats, and whether we can update relfrozenxid. Are you planning to submit a fix? - Andres
>>>>> "Andres" == Andres Freund <andres@anarazel.de> writes: >> It looks like fixing this requires breaking scanned_pages out into >> at least two separate counters, since we currently use it to track >> both whether we can update stats, and whether we can update >> relfrozenxid. Andres> Are you planning to submit a fix? Somewhat belatedly, yes. Attached are patches against master and all the back branches back to 9.2. I've reproduced the bug on all of them, and confirmed that this fixes it on all of them. Is it worth also including the isolation tester script in the changes? I can go ahead and commit these if they look ok. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Вложения
On 2017-03-16 06:55:54 +0000, Andrew Gierth wrote: > >>>>> "Andres" == Andres Freund <andres@anarazel.de> writes: > > >> It looks like fixing this requires breaking scanned_pages out into > >> at least two separate counters, since we currently use it to track > >> both whether we can update stats, and whether we can update > >> relfrozenxid. > > Andres> Are you planning to submit a fix? > > Somewhat belatedly, yes. > > Attached are patches against master and all the back branches back to > 9.2. Cool. > I've reproduced the bug on all of them, and confirmed that this > fixes it on all of them. Is it worth also including the isolation > tester script in the changes? Hm, I haven't seen the isolationtester test (it's not in this thread, right?) - how fragile and how slow is it? > diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c > index 8aefa7aaa9..a33541a115 100644 > --- a/src/backend/commands/vacuumlazy.c > +++ b/src/backend/commands/vacuumlazy.c > @@ -100,7 +100,8 @@ typedef struct LVRelStats > BlockNumber old_rel_pages; /* previous value of pg_class.relpages */ > BlockNumber rel_pages; /* total number of pages */ > BlockNumber scanned_pages; /* number of pages we examined */ > - double scanned_tuples; /* counts only tuples on scanned pages */ > + BlockNumber tupcount_pages; /* pages whose tuples we counted */ > + double scanned_tuples; /* counts only tuples on tupcount_pages */ > double old_rel_tuples; /* previous value of pg_class.reltuples */ > double new_rel_tuples; /* new estimated total # of tuples */ > BlockNumber pages_removed; > @@ -258,6 +259,10 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt, > * density") with nonzero relpages and reltuples=0 (which means "zero > * tuple density") unless there's some actual evidence for the latter. > * > + * It's important that we use tupcount_pages and not scanned_pages for the > + * check described above; scanned_pages counts pages where we could not get > + * cleanup lock, and which were processed only for frozenxid purposes. > + * > * We do update relallvisible even in the corner case, since if the table > * is all-visible we'd definitely like to know that. But clamp the value > * to be not more than what we're setting relpages to. > @@ -267,7 +272,7 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt, > */ > new_rel_pages = vacrelstats->rel_pages; > new_rel_tuples = vacrelstats->new_rel_tuples; > - if (vacrelstats->scanned_pages == 0 && new_rel_pages > 0) > + if (vacrelstats->tupcount_pages == 0 && new_rel_pages > 0) > { > new_rel_pages = vacrelstats->old_rel_pages; > new_rel_tuples = vacrelstats->old_rel_tuples; > @@ -423,6 +428,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats, > nblocks = RelationGetNumberOfBlocks(onerel); > vacrelstats->rel_pages = nblocks; > vacrelstats->scanned_pages = 0; > + vacrelstats->tupcount_pages = 0; > vacrelstats->nonempty_pages = 0; > vacrelstats->latestRemovedXid = InvalidTransactionId; > > @@ -613,6 +619,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats, > } > > vacrelstats->scanned_pages++; > + vacrelstats->tupcount_pages++; > > page = BufferGetPage(buf); > > @@ -999,7 +1006,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats, > /* now we can compute the new value for pg_class.reltuples */ > vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false, > nblocks, > - vacrelstats->scanned_pages, > + vacrelstats->tupcount_pages, > num_tuples); > > /* > @@ -1264,7 +1271,7 @@ lazy_cleanup_index(Relation indrel, > > ivinfo.index = indrel; > ivinfo.analyze_only = false; > - ivinfo.estimated_count = (vacrelstats->scanned_pages < vacrelstats->rel_pages); > + ivinfo.estimated_count = (vacrelstats->tupcount_pages < vacrelstats->rel_pages); > ivinfo.message_level = elevel; > ivinfo.num_heap_tuples = vacrelstats->new_rel_tuples; > ivinfo.strategy = vac_strategy; Without having tested it, this looks sane. Greetings, Andres Freund -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
>>>>> "Andres" == Andres Freund <andres@anarazel.de> writes: >> I've reproduced the bug on all of them, and confirmed that this >> fixes it on all of them. Is it worth also including the isolation >> tester script in the changes? Andres> Hm, I haven't seen the isolationtester test (it's not in this Andres> thread, right?) - how fragile and how slow is it? Oh, sorry, forgot to include that. There are two versions of the test, because the error is slightly harder to reproduce in older branches; this one works in 9.6 and master: setup { create table smalltbl as select i as id, 'foo '||i as val from generate_series(1,20) i; } setup { vacuum analyze smalltbl; } teardown { drop table smalltbl; } session "worker" step "open" { BEGIN; DECLARE c1 CURSOR FOR select * from smalltbl; } step "fetch1" { FETCH NEXT FROM c1; } step "close" { COMMIT; } step "stats" { select relpages, reltuples from pg_class where oid='smalltbl'::regclass; } session "vacuumer" step "vac" { VACUUM smalltbl; } step "modify" { insert into smalltbl select max(id)+1, 'foo '||(max(id) + 1) from smalltbl; delete from smalltbl where id in (select min(id) from smalltbl); } permutation "modify" "vac" "stats" permutation "modify" "open" "fetch1" "vac" "close" "stats" permutation "modify" "vac" "stats" The first and last permutations return relpages=1 reltuples=20 as expected, but the middle one returns relpages=1 reltuples=0 when the bug is present, due to the worker thread's cursor holding a pin on the page. 9.5 and before need a slightly more complex setup that juggles the values of vacuum_freeze_table_age and relfrozenxid in order to get the right code path in vacuum. They don't seem to be fragile at all - there are no timing issues and the results always seem to be consistent. There's no locking and runtime is basically just how long to create/drop the table and do 3 rounds of updates/vacuums on it. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
On 2017-03-16 21:03:44 +0000, Andrew Gierth wrote: > >>>>> "Andres" == Andres Freund <andres@anarazel.de> writes: > > >> I've reproduced the bug on all of them, and confirmed that this > >> fixes it on all of them. Is it worth also including the isolation > >> tester script in the changes? > > Andres> Hm, I haven't seen the isolationtester test (it's not in this > Andres> thread, right?) - how fragile and how slow is it? > > Oh, sorry, forgot to include that. There are two versions of the test, > because the error is slightly harder to reproduce in older branches; > this one works in 9.6 and master: > > setup { > create table smalltbl > as select i as id, > 'foo '||i as val > from generate_series(1,20) i; > } Hm, should we prevent autovacuum/analyze from running on the table? > setup { > vacuum analyze smalltbl; > } > teardown { > drop table smalltbl; > } > > session "worker" > step "open" { BEGIN; DECLARE c1 CURSOR FOR select * from smalltbl; } > step "fetch1" { FETCH NEXT FROM c1; } > step "close" { COMMIT; } > step "stats" { select relpages, reltuples from pg_class where oid='smalltbl'::regclass; } > > session "vacuumer" > step "vac" { VACUUM smalltbl; } > step "modify" { > insert into smalltbl > select max(id)+1, 'foo '||(max(id) + 1) from smalltbl; > delete from smalltbl > where id in (select min(id) from smalltbl); > } > > permutation "modify" "vac" "stats" > permutation "modify" "open" "fetch1" "vac" "close" "stats" > permutation "modify" "vac" "stats" > > The first and last permutations return relpages=1 reltuples=20 as > expected, but the middle one returns relpages=1 reltuples=0 when the bug > is present, due to the worker thread's cursor holding a pin on the page. > > 9.5 and before need a slightly more complex setup that juggles the > values of vacuum_freeze_table_age and relfrozenxid in order to get the > right code path in vacuum. > > They don't seem to be fragile at all - there are no timing issues and > the results always seem to be consistent. There's no locking and runtime > is basically just how long to create/drop the table and do 3 rounds of > updates/vacuums on it. Seems like a good thing to include in the tree. I'd be ok with just including the simpler version in the relevant branches. - Andres -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
>>>>> "Andres" == Andres Freund <andres@anarazel.de> writes: Andres> Hm, should we prevent autovacuum/analyze from running on the table? Probably, yes; good point. (Also, I just realized it would speed things up a bit to change the largely useless second column to a non-toastable type or remove it so we can avoid having a toast table.) Andres> Seems like a good thing to include in the tree. I'd be ok with Andres> just including the simpler version in the relevant branches. Ok. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "Andres" == Andres Freund <andres@anarazel.de> writes: > Andres> Seems like a good thing to include in the tree. I'd be ok with > Andres> just including the simpler version in the relevant branches. > Ok. I dunno ... there doesn't seem to be any meaningful portability risk here, and it's not clear to me what class of future bug this test might hope to catch. Do we really need to spend test cycles forevermore on this? regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
On 2017-03-16 17:28:24 -0400, Tom Lane wrote: > Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > > "Andres" == Andres Freund <andres@anarazel.de> writes: > > Andres> Seems like a good thing to include in the tree. I'd be ok with > > Andres> just including the simpler version in the relevant branches. > > > Ok. > > I dunno ... there doesn't seem to be any meaningful portability risk > here, and it's not clear to me what class of future bug this test might > hope to catch. Do we really need to spend test cycles forevermore on > this? We had previous bugs around this, so it doesn't seem like a bad idea to test it. Also it should be so short in comparison to the rest of the isolationtests that it won't matter wrt total runtime? - Andres -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Oops, sorry about the overly-long line. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs