Обсуждение: tableam: abstracting relation sizing code
Hi, It looks to me as though any table AM that uses the relation forks supported by PostgreSQL in a more or less normal manner is likely to require an implementation of the relation_size callback that is identical to the one for heap, and an implementation of the estimate_rel_size method that is extremely similar to the one for heap. The latter is especially troubling as the amount of code duplication is non-trivial, and it's full of special hacks. Here is a patch that tries to improve the situation. I don't know whether there is some better approach; this seemed like the obvious thing to do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
On Thu, Jun 06, 2019 at 04:40:53PM -0400, Robert Haas wrote: > It looks to me as though any table AM that uses the relation forks > supported by PostgreSQL in a more or less normal manner is likely to > require an implementation of the relation_size callback that is > identical to the one for heap, and an implementation of the > estimate_rel_size method that is extremely similar to the one for > heap. The latter is especially troubling as the amount of code > duplication is non-trivial, and it's full of special hacks. > > Here is a patch that tries to improve the situation. I don't know > whether there is some better approach; this seemed like the obvious > thing to do. Looks like a neat split. "allvisfrac", "pages" and "tuples" had better be documented about which result they represent. + * usable_bytes_per_page should contain the approximate number of bytes per + * page usable for tuple data, excluding the page header and any anticipated + * special space. [...] +table_block_estimate_rel_size(Relation rel, int32 *attr_widths, + BlockNumber *pages, double *tuples, + double *allvisfrac, + Size overhead_bytes_per_tuple, + Size usable_bytes_per_page) Could you explain what's the use cases you have in mind for usable_bytes_per_page? All AMs using smgr need to have a page header, which implies that the usable number of bytes is (BLCKSZ - page header) like heap for tuple data. In the AMs you got to work with, do you store some extra data in the page which is not used for tuple storage? I think that makes sense, just wondering about the use case. -- Michael
Вложения
> On 6 Jun 2019, at 22:40, Robert Haas <robertmhaas@gmail.com> wrote: > It looks to me as though any table AM that uses the relation forks > supported by PostgreSQL in a more or less normal manner is likely to > require an implementation of the relation_size callback that is > identical to the one for heap, and an implementation of the > estimate_rel_size method that is extremely similar to the one for > heap. The latter is especially troubling as the amount of code > duplication is non-trivial, and it's full of special hacks. Makes sense. Regarding one of the hacks: + * HACK: if the relation has never yet been vacuumed, use a minimum size + * estimate of 10 pages. The idea here is to avoid assuming a + * newly-created table is really small, even if it currently is, because + * that may not be true once some data gets loaded into it. When this is a generic function for AM’s, would it make sense to make the minimum estimate a passed in value rather than hardcoded at 10? I don’t have a case in mind, but I also don’t think that assumptions made for heap necessarily makes sense for all AM’s. Just thinking out loud. > Here is a patch that tries to improve the situation. I don't know > whether there is some better approach; this seemed like the obvious > thing to do. A small nitpick on the patch: + * estimate_rel_size callback, because it has a few additional paramters. s/paramters/parameters/ cheers ./daniel
On Thu, Jun 6, 2019 at 10:08 PM Michael Paquier <michael@paquier.xyz> wrote: > Looks like a neat split. Thanks. > "allvisfrac", "pages" and "tuples" had better be documented about > which result they represent. A lot of the table AM stuff (and the related slot stuff) lacks function header comments; I don't like that and think it should be improved. However, that's not the job of this patch. I think it's completely correct for this patch to document, as it does, that the arguments have the same meaning as for the estimate_rel_size method, and leave it at that. There is certainly negative value in duplicating the definitions in multiple places, where they might get out of sync. The header comment for table_relation_estimate_size() refers the reader to the comments for estimate_rel_size(), which says: * estimate_rel_size - estimate # pages and # tuples in a table or index * * We also estimate the fraction of the pages that are marked all-visible in * the visibility map, for use in estimation of index-only scans. * * If attr_widths isn't NULL, it points to the zero-index entry of the * relation's attr_widths[] cache; we fill this in if we have need to compute * the attribute widths for estimation purposes. ...which AFAICT constitutes as much documentation of these parameters as we have got. I think this is all a bit byzantine and could probably be made clearer, but (1) fortunately it's not all that hard to guess what these are supposed to mean and (2) I don't feel obliged to do semi-related comment cleanup every time I patch tableam. > Could you explain what's the use cases you have in mind for > usable_bytes_per_page? All AMs using smgr need to have a page header, > which implies that the usable number of bytes is (BLCKSZ - page > header) like heap for tuple data. In the AMs you got to work with, do > you store some extra data in the page which is not used for tuple > storage? I think that makes sense, just wondering about the use > case. Exactly. BLCKSZ - page header is probably the maximum unless you roll your own page format, but you could easily have less if you use the special space. zheap is storing transaction slots there; you could store an epoch to avoid freezing, and there's probably quite a few other reasonable possibilities. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jun 7, 2019 at 4:11 AM Daniel Gustafsson <daniel@yesql.se> wrote: > Makes sense. Regarding one of the hacks: > > + * HACK: if the relation has never yet been vacuumed, use a minimum size > + * estimate of 10 pages. The idea here is to avoid assuming a > + * newly-created table is really small, even if it currently is, because > + * that may not be true once some data gets loaded into it. > > When this is a generic function for AM’s, would it make sense to make the > minimum estimate a passed in value rather than hardcoded at 10? I don’t have a > case in mind, but I also don’t think that assumptions made for heap necessarily > makes sense for all AM’s. Just thinking out loud. I think that's probably going in the wrong direction. It's arguable, of course. However, it seems to me that there's nothing heap-specific about the number 10. It's not computed based on the format of a heap page or a heap tuple. It's just somebody's guess (likely Tom's) about how to plan with empty relations. If somebody finds that another number works better for their AM, it's probably also better for heap, at least on that person's workload. It seems more likely to me that this needs to be a GUC or reloption than that it needs to be an AM-specific property. It also seems to me that if the parameters of the hack randomly vary from one AM to another, it's likely to create confusing plan differences that have nothing to do with the actual merits of what the AMs are doing. But all that being said, I'm not blocking anybody from fooling around with this; nobody's obliged to use the helper function. It's just there for people who want the same AM-independent logic that the heap uses. > > Here is a patch that tries to improve the situation. I don't know > > whether there is some better approach; this seemed like the obvious > > thing to do. > > A small nitpick on the patch: > > + * estimate_rel_size callback, because it has a few additional paramters. > > s/paramters/parameters/ Good catch, and now I notice that the callback is not called estimate_rel_size but relation_estimate_size. Updated patch attached; thanks for the review. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
On Fri, Jun 7, 2019 at 8:43 AM Robert Haas <robertmhaas@gmail.com> wrote: > Good catch, and now I notice that the callback is not called > estimate_rel_size but relation_estimate_size. Updated patch attached; > thanks for the review. Let's try that one more time, and this time perhaps I'll make it compile. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
Hi, On 2019-06-07 08:32:37 -0400, Robert Haas wrote: > On Thu, Jun 6, 2019 at 10:08 PM Michael Paquier <michael@paquier.xyz> wrote: > > "allvisfrac", "pages" and "tuples" had better be documented about > > which result they represent. > > A lot of the table AM stuff (and the related slot stuff) lacks > function header comments; I don't like that and think it should be > improved. However, that's not the job of this patch. I think it's > completely correct for this patch to document, as it does, that the > arguments have the same meaning as for the estimate_rel_size method, > and leave it at that. There is certainly negative value in duplicating > the definitions in multiple places, where they might get out of sync. > The header comment for table_relation_estimate_size() refers the > reader to the comments for estimate_rel_size(), which says: Note that these function ended up that way by precisely this logic... ;) Greetings, Andres Freund
On 2019-Jun-07, Robert Haas wrote: > On Fri, Jun 7, 2019 at 8:43 AM Robert Haas <robertmhaas@gmail.com> wrote: > > Good catch, and now I notice that the callback is not called > > estimate_rel_size but relation_estimate_size. Updated patch attached; > > thanks for the review. > > Let's try that one more time, and this time perhaps I'll make it compile. It looks good to me, passes tests. This version seems to introduce a warning in my build: /pgsql/source/master/src/backend/access/table/tableam.c: In function 'table_block_relation_estimate_size': /pgsql/source/master/src/backend/access/table/tableam.c:633:12: warning: implicit declaration of function 'rint' [-Wimplicit-function-declaration] *tuples = rint(density * (double) curpages); ^~~~ /pgsql/source/master/src/backend/access/table/tableam.c:633:12: warning: incompatible implicit declaration of built-in function'rint' /pgsql/source/master/src/backend/access/table/tableam.c:633:12: note: include '<math.h>' or provide a declaration of 'rint' -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On 7 Jun 2019, at 14:43, Robert Haas <robertmhaas@gmail.com> wrote: > I think that's probably going in the wrong direction. It's arguable, > of course. However, it seems to me that there's nothing heap-specific > about the number 10. It's not computed based on the format of a heap > page or a heap tuple. It's just somebody's guess (likely Tom's) about > how to plan with empty relations. If somebody finds that another > number works better for their AM, it's probably also better for heap, > at least on that person's workload. Fair enough, that makes sense. > Good catch, and now I notice that the callback is not called > estimate_rel_size but relation_estimate_size. Updated patch attached; > thanks for the review. The commit message still refers to it as estimate_rel_size though. The comment on table_block_relation_estimate_size does too but that one might be intentional. The v3 patch applies cleanly and passes tests (I did not see the warning that Alvaro mentioned, so yay for testing on multiple compilers). During re-review, the below part stood out as a bit odd however: + if (curpages < 10 && + relpages == 0 && + !rel->rd_rel->relhassubclass) + curpages = 10; + + /* report estimated # pages */ + *pages = curpages; + /* quick exit if rel is clearly empty */ + if (curpages == 0) + { + *tuples = 0; + *allvisfrac = 0; + return; + } While I know this codepath isn’t introduced by this patch (it was introduced in 696d78469f3), I hadn’t seen it before so sorry for thread-jacking slightly. Maybe I’m a bit thick but if the rel is totally empty and without children, then curpages as well as relpages would be both zero. But if so, how can we enter the second "quick exit” block since curpages by then will be increased to 10 in the block just before for the empty case? It seems to me that the blocks should be the other way around to really have a fast path, but I might be missing something. cheers ./daniel
On Fri, Jun 7, 2019 at 6:42 PM Daniel Gustafsson <daniel@yesql.se> wrote: > > Good catch, and now I notice that the callback is not called > > estimate_rel_size but relation_estimate_size. Updated patch attached; > > thanks for the review. > > The commit message still refers to it as estimate_rel_size though. The comment on > table_block_relation_estimate_size does too but that one might be intentional. Oops. New version attached, hopefully fixing those and the compiler warning Alvaro noted. > During re-review, the below part stood out as a bit odd however: > > + if (curpages < 10 && > + relpages == 0 && > + !rel->rd_rel->relhassubclass) > + curpages = 10; > + > + /* report estimated # pages */ > + *pages = curpages; > + /* quick exit if rel is clearly empty */ > + if (curpages == 0) > + { > + *tuples = 0; > + *allvisfrac = 0; > + return; > + } > > While I know this codepath isn’t introduced by this patch (it was introduced in > 696d78469f3), I hadn’t seen it before so sorry for thread-jacking slightly. > > Maybe I’m a bit thick but if the rel is totally empty and without children, > then curpages as well as relpages would be both zero. But if so, how can we > enter the second "quick exit” block since curpages by then will be increased to > 10 in the block just before for the empty case? It seems to me that the blocks > should be the other way around to really have a fast path, but I might be > missing something. Well, as you say, I'm just moving the code. However, note that curpages is the size of the relation RIGHT NOW whereas relpages is the size the last time the relation was analyzed. So I guess the case you're wondering about would happen if the relation were analyzed and then truncated. It seems there are lots of things that could be done here in the hopes of improving things, like keeping track in pg_class of whether analyze has ever happened rather than using relpages == 0 as a bad approximation, but I'd rather not drift further OT, so if you're in the mood to talk about that stuff, I would appreciate it if you could start a new thread. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
On 2019-Jun-10, Robert Haas wrote: > On Fri, Jun 7, 2019 at 6:42 PM Daniel Gustafsson <daniel@yesql.se> wrote: > > The commit message still refers to it as estimate_rel_size though. The comment on > > table_block_relation_estimate_size does too but that one might be intentional. > > Oops. New version attached, hopefully fixing those and the compiler > warning Alvaro noted. It does fix the warning, thanks. > > Maybe I’m a bit thick but if the rel is totally empty and without children, > > then curpages as well as relpages would be both zero. But if so, how can we > > enter the second "quick exit” block since curpages by then will be increased to > > 10 in the block just before for the empty case? It seems to me that the blocks > > should be the other way around to really have a fast path, but I might be > > missing something. > > Well, as you say, I'm just moving the code. I agree that you're just moving the code, but this seems to have been recently broken in 696d78469f37 -- it was correct before that (the heuristic for never vacuumed rels was in optimizer/plancat.c). So in reality the problem that Daniel pointed out is an open item for pg12. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jun 10, 2019 at 3:46 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > I agree that you're just moving the code, but this seems to have been > recently broken in 696d78469f37 -- it was correct before that (the > heuristic for never vacuumed rels was in optimizer/plancat.c). So in > reality the problem that Daniel pointed out is an open item for pg12. I took a look at this but I don't see that Andres did anything in that commit other than move code. In the new code, heapam_estimate_rel_size() does this: + if (curpages < 10 && + relpages == 0 && + !rel->rd_rel->relhassubclass) + curpages = 10; + + /* report estimated # pages */ + *pages = curpages; + /* quick exit if rel is clearly empty */ + if (curpages == 0) + { + *tuples = 0; + *allvisfrac = 0; + return; + } And here's what the code in estimate_rel_size looked like before the commit you mention: if (curpages < 10 && rel->rd_rel->relpages == 0 && !rel->rd_rel->relhassubclass && rel->rd_rel->relkind != RELKIND_INDEX) curpages = 10; /* report estimated # pages */ *pages = curpages; /* quick exit if rel is clearly empty */ if (curpages == 0) { *tuples = 0; *allvisfrac = 0; break; } It's all the same, except that now that the test is in heap-specific code it no longer needs to test for RELKIND_INDEX. I may be missing something here, but I don't know what it is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On 11 Jun 2019, at 15:17, Robert Haas <robertmhaas@gmail.com> wrote: > I may be missing something here, but I don't know what it is. After looking at it closer yesterday I think my original question came from a misunderstanding of the codepath, so I too don’t think there is an issue here (unless I’m also missing something). cheers ./daniel
On 2019-Jun-11, Robert Haas wrote: > I may be missing something here, but I don't know what it is. Huh, you're right, I misread the diff. Thanks for double-checking. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On 10 Jun 2019, at 21:35, Robert Haas <robertmhaas@gmail.com> wrote: > > On Fri, Jun 7, 2019 at 6:42 PM Daniel Gustafsson <daniel@yesql.se> wrote: >>> Good catch, and now I notice that the callback is not called >>> estimate_rel_size but relation_estimate_size. Updated patch attached; >>> thanks for the review. >> >> The commit message still refers to it as estimate_rel_size though. The comment on >> table_block_relation_estimate_size does too but that one might be intentional. > > Oops. New version attached, hopefully fixing those and the compiler > warning Alvaro noted. +1 on this version of the patch, no warning, passes tests and looks good. cheers ./daniel
Hi, On 2019-06-10 15:35:18 -0400, Robert Haas wrote: > On Fri, Jun 7, 2019 at 6:42 PM Daniel Gustafsson <daniel@yesql.se> wrote: > > > Good catch, and now I notice that the callback is not called > > > estimate_rel_size but relation_estimate_size. Updated patch attached; > > > thanks for the review. > > > > The commit message still refers to it as estimate_rel_size though. The comment on > > table_block_relation_estimate_size does too but that one might be intentional. > > Oops. New version attached, hopefully fixing those and the compiler > warning Alvaro noted. Just to understand: What version are you targeting? It seems pretty clearly v13 material to me? Greetings, Andres Freund
On Tue, Jun 11, 2019 at 7:17 PM Andres Freund <andres@anarazel.de> wrote: > Just to understand: What version are you targeting? It seems pretty > clearly v13 material to me? My current plan is to commit this to v13 as soon as the tree opens. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jun 12, 2019 at 9:14 AM Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Jun 11, 2019 at 7:17 PM Andres Freund <andres@anarazel.de> wrote: > > Just to understand: What version are you targeting? It seems pretty > > clearly v13 material to me? > > My current plan is to commit this to v13 as soon as the tree opens. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company