Обсуждение: Why not represent "never vacuumed" accurately wrt pg_class.relpages?
Hi, estimate_rel_size() explains: /* * HACK: if the relation has never yet been vacuumed, use a * minimum size estimate of 10 pages. The idea here is to avoid * assuming a newly-created table is really small, even if it * currently is, because that may not be true once some data gets * loaded into it. Once a vacuum or analyze cycle has been done * on it, it's more reasonable to believe the size is somewhat * stable. * * (Note that this is only an issue if the plan gets cached and * used again after the table has been filled. What we're trying * to avoid is using a nestloop-type plan on a table that has * grown substantially since the plan was made. Normally, * autovacuum/autoanalyze will occur once enough inserts have * happened and cause cached-plan invalidation; but that doesn't * happen instantaneously, and it won't happen at all for cases * such as temporary tables.) * * We approximate "never vacuumed" by "has relpages = 0", which * means this will also fire on genuinely empty relations. Not * great, but fortunately that's a seldom-seen case in the real * world, and it shouldn't degrade the quality of the plan too * much anyway to err in this direction. * * There are two exceptions wherein we don't apply this heuristic. * One is if the table has inheritance children. Totally empty * parent tables are quite common, so we should be willing to * believe that they are empty. Also, we don't apply the 10-page * minimum to indexes. */ I don't quite get why we don't instead just represent "never vacuumed" by storing a more meaningful value in relpages? We could go for InvalidBlockNumber, or even NULL (although the latter would be a bit annoying due to not being mappable to a struct anymore). I've seen numerous cases where relpages = 0 -> never vacuumed has caused worse plans, and it just doesn't seem necessary? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > I don't quite get why we don't instead just represent "never vacuumed" > by storing a more meaningful value in relpages? Mostly, not wanting to break clients that look at these fields. If catalog compatibility weren't a concern, I'd seriously consider replacing both of them with a float "average tuples per page" ratio. > We could go for > InvalidBlockNumber, or even NULL (although the latter would be a bit > annoying due to not being mappable to a struct anymore). NULL seems right out on every ground. I don't much care for InvalidBlockNumber either. > I've seen numerous cases where relpages = 0 -> never vacuumed has caused > worse plans, and it just doesn't seem necessary? Worse plans than what? And why do you blame it on this representation? We don't believe that relpages is the actual size of the table. regards, tom lane
Hi, On 2018-12-11 09:47:38 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > I don't quite get why we don't instead just represent "never vacuumed" > > by storing a more meaningful value in relpages? > > Mostly, not wanting to break clients that look at these fields. > If catalog compatibility weren't a concern, I'd seriously consider > replacing both of them with a float "average tuples per page" ratio. Yea, that'd be better. I'm not sure I believe this is a grave concern, but if we really are concerned about not breaking clients, we could just add a separate bool for the fact the table has been vacuumed. > > I've seen numerous cases where relpages = 0 -> never vacuumed has caused > > worse plans, and it just doesn't seem necessary? > > Worse plans than what? The plans if the stats were believed. > And why do you blame it on this representation? We don't believe that > relpages is the actual size of the table. No, but we assume that there's 10 pages. Even if both relpages and the actual relation stats say there's not. And we assume there's as many tuples on the page as can fit on it, using get_rel_data_width(). So if you have a small table with a handful of entries at most, you suddenly get estimates of a few hundred to ~a thousand rows. I'd one client once insert a row into a lock table to make sense, just to make sure it never gets vacuumed while there were no locks. CREATE TABLE locks(lockid int); VACUUM locks; EXPLAIN SELECT * FROM locks; ┌─────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├─────────────────────────────────────────────────────────┤ │ Seq Scan on locks (cost=0.00..35.50 rows=2550 width=4) │ └─────────────────────────────────────────────────────────┘ (1 row) INSERT INTO locks VALUES (1); VACUUM locks; EXPLAIN SELECT * FROM locks; ┌─────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├─────────────────────────────────────────────────────┤ │ Seq Scan on locks (cost=0.00..1.01 rows=1 width=4) │ └─────────────────────────────────────────────────────┘ (1 row) DELETE FROM locks; VACUUM locks ; EXPLAIN SELECT * FROM locks; ┌─────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├─────────────────────────────────────────────────────────┤ │ Seq Scan on locks (cost=0.00..35.50 rows=2550 width=4) │ └─────────────────────────────────────────────────────────┘ (1 row) Estimates on a table constantly varying by three orders of magnitude - not hard to imagine large plan changes due to that. That's not what I was complaining about, but isn't it fairly broken that the "actually empty relation" case here doesn't get hit if relpages == 0? We clamp curpages to 10 before returning for the the empty size: if (curpages < 10 && rel->rd_rel->relpages == 0 && !rel->rd_rel->relhassubclass && rel->rd_rel->relkind != RELKIND_INDEX) curpages = 10; /* report estimated # pages */ *pages = curpages; /* quick exit if rel is clearly empty */ if (curpages == 0) { *tuples = 0; *allvisfrac = 0; break; } I guess you could argue that the relation would potentially not be be empty anymore by the time the plan is executed, but if that were part of the logic it a) wouldn't just be relevant if relpages = 0, and b) should be documented. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2018-12-11 09:47:38 -0500, Tom Lane wrote: >> And why do you blame it on this representation? We don't believe that >> relpages is the actual size of the table. > No, but we assume that there's 10 pages. Even if both relpages and the > actual relation stats say there's not. And we assume there's as many > tuples on the page as can fit on it, using get_rel_data_width(). So if > you have a small table with a handful of entries at most, you suddenly > get estimates of a few hundred to ~a thousand rows. That's intentional, and not particularly constrained by the representation used in pg_class. The downsides of incorrectly assuming a table is tiny are a lot worse than those of assuming the opposite. > I guess you could argue that the relation would potentially not be be > empty anymore by the time the plan is executed, but if that were part of > the logic it a) wouldn't just be relevant if relpages = 0, and b) should > be documented. Yeah, that's the case where you can get hurt the worst. As for "not documented", the first para in the comment is trying to say that. regards, tom lane
Hi, On 2018-12-11 13:43:47 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2018-12-11 09:47:38 -0500, Tom Lane wrote: > >> And why do you blame it on this representation? We don't believe that > >> relpages is the actual size of the table. > > > No, but we assume that there's 10 pages. Even if both relpages and the > > actual relation stats say there's not. And we assume there's as many > > tuples on the page as can fit on it, using get_rel_data_width(). So if > > you have a small table with a handful of entries at most, you suddenly > > get estimates of a few hundred to ~a thousand rows. > > That's intentional, and not particularly constrained by the representation > used in pg_class. The downsides of incorrectly assuming a table is tiny > are a lot worse than those of assuming the opposite. How's being unable to distinguish "never vacuumed" from "table is knowingly empty right now" not constrained by the representation? I mean: * We approximate "never vacuumed" by "has relpages = 0", which * means this will also fire on genuinely empty relations. Not * great, but fortunately that's a seldom-seen case in the real * world, and it shouldn't degrade the quality of the plan too * much anyway to err in this direction. * * There are two exceptions wherein we don't apply this heuristic. * One is if the table has inheritance children. Totally empty * parent tables are quite common, so we should be willing to * believe that they are empty. Also, we don't apply the 10-page * minimum to indexes. My case of small tables of ephemeral data aside, with hash & range partitioning it's becoming more common to have individual tables be empty too, by virtue of nothing falling into the range/being hashed of the partitions. Without having a separate "no page, but really" value, how can we fix this? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > How's being unable to distinguish "never vacuumed" from "table is > knowingly empty right now" not constrained by the representation? Well, OK, it is constrained, but even if it weren't I'd be afraid to assume a table is totally empty, even if it was so the last time ANALYZE looked at it. The plans you get from that are spectacularly fragile. regards, tom lane
Hi, On 2018-12-11 13:59:55 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > How's being unable to distinguish "never vacuumed" from "table is > > knowingly empty right now" not constrained by the representation? > > Well, OK, it is constrained, but even if it weren't I'd be afraid to > assume a table is totally empty, even if it was so the last time > ANALYZE looked at it. The plans you get from that are spectacularly > fragile. I think some clamping would be acceptable, but three orders of magnitude off seems excessive. By clamping both the tuple density and the number of pages, we're essentially multiplying the error. But if this is your concern: Why are we believing that a previously vacuumed table is actually empty if it was previously one block long (relpages == 1), but not if it previously zero blocks long (relpages == 0). In the former case we're entering the /* quick exit if rel is clearly empty */ if (curpages == 0) { *tuples = 0; *allvisfrac = 0; break; } bit, but not in the latter case. Your fragility argument seems to hold about as much sway there. To me the whole logic around this largely seems to be cargo culting things forward. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > To me the whole logic around this largely seems to be cargo culting > things forward. Well, feel free to propose some other behavior --- but I'm more worried about how badly it will fail when wrong than whether the plans are better when right. regards, tom lane
On Tue, Dec 11, 2018 at 11:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andres Freund <andres@anarazel.de> writes: > > I don't quite get why we don't instead just represent "never vacuumed" > > by storing a more meaningful value in relpages? > > Mostly, not wanting to break clients that look at these fields. > If catalog compatibility weren't a concern, I'd seriously consider > replacing both of them with a float "average tuples per page" ratio. I think we should do exactly that thing. And I also agree that assuming 10 pages when pg_class says 0 and 1 page when pg_class says 1 is not particularly bright. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Greetings, * Robert Haas (robertmhaas@gmail.com) wrote: > On Tue, Dec 11, 2018 at 11:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Andres Freund <andres@anarazel.de> writes: > > > I don't quite get why we don't instead just represent "never vacuumed" > > > by storing a more meaningful value in relpages? > > > > Mostly, not wanting to break clients that look at these fields. > > If catalog compatibility weren't a concern, I'd seriously consider > > replacing both of them with a float "average tuples per page" ratio. > > I think we should do exactly that thing. On first blush, I tend to agree, but what I really wanted to remark here is that I don't think we should be hand-tying ourselves as to what we can do because we don't want to upset people who are reading the catalogs. We make changes to the catalogs pretty regularly and while we get complaints here and there, by and large, users are accustomed to them and appreciate that we don't have a lot of baggage from trying to support old interfaces and that we're able to make as much progress year over year as we are. Further, we support major releases for 5 years for a reason- users have quite a bit of time to adjust to the changes. > And I also agree that assuming 10 pages when pg_class says 0 and 1 > page when pg_class says 1 is not particularly bright. Agreed. Thanks! Stephen