Обсуждение: Re: [HACKERS] Enhanced containment selectivity function
Cleaned-up patch attached and applied. Catalog version updated in separate patch. --------------------------------------------------------------------------- Matteo Beccati wrote: > Hi, > > >>Moving it in contrib/ltree would be more difficult to me because it > >>depends on other functions declared in selfuncs.c > >>(get_restriction_variable, etc). > > > > I'd be willing to consider exporting those functions from selfuncs.c. > > In the meanwhile here is the latest patch which uses both mcv and > histogram values. > > > BTW, when restoring my test database I've found out that there were many > errors on ALTER INDEX "something" OWNER TO ... : > > ERROR: "something" is not a table, view, or sequence > > This using 8.1devel pg_restore and a 8.0.3 compressed dump. I could be > wrong, but I didn't get those errors a few days ago (some cvs updates ago). > > > Best regards > -- > Matteo Beccati > http://phpadsnew.com/ > http://phppgads.com/ -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: contrib/ltree/ltree.sql.in =================================================================== RCS file: /cvsroot/pgsql/contrib/ltree/ltree.sql.in,v retrieving revision 1.10 diff -c -c -r1.10 ltree.sql.in *** contrib/ltree/ltree.sql.in 27 Feb 2006 16:09:48 -0000 1.10 --- contrib/ltree/ltree.sql.in 26 Apr 2006 18:25:16 -0000 *************** *** 230,236 **** RIGHTARG = ltree, PROCEDURE = ltree_isparent, COMMUTATOR = '<@', ! RESTRICT = contsel, JOIN = contjoinsel ); --- 230,236 ---- RIGHTARG = ltree, PROCEDURE = ltree_isparent, COMMUTATOR = '<@', ! RESTRICT = parentsel, JOIN = contjoinsel ); *************** *** 248,254 **** RIGHTARG = ltree, PROCEDURE = ltree_risparent, COMMUTATOR = '@>', ! RESTRICT = contsel, JOIN = contjoinsel ); --- 248,254 ---- RIGHTARG = ltree, PROCEDURE = ltree_risparent, COMMUTATOR = '@>', ! RESTRICT = parentsel, JOIN = contjoinsel ); Index: src/backend/utils/adt/geo_selfuncs.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/adt/geo_selfuncs.c,v retrieving revision 1.27 diff -c -c -r1.27 geo_selfuncs.c *** src/backend/utils/adt/geo_selfuncs.c 5 Mar 2006 15:58:42 -0000 1.27 --- src/backend/utils/adt/geo_selfuncs.c 26 Apr 2006 18:25:21 -0000 *************** *** 20,26 **** #include "utils/geo_decls.h" - /* * Selectivity functions for geometric operators. These are bogus -- unless * we know the actual key distribution in the index, we can't make a good --- 20,25 ---- *************** *** 93,95 **** --- 92,95 ---- { PG_RETURN_FLOAT8(0.001); } + Index: src/backend/utils/adt/selfuncs.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v retrieving revision 1.199 diff -c -c -r1.199 selfuncs.c *** src/backend/utils/adt/selfuncs.c 20 Apr 2006 17:50:18 -0000 1.199 --- src/backend/utils/adt/selfuncs.c 26 Apr 2006 18:25:28 -0000 *************** *** 4852,4854 **** --- 4852,5033 ---- PG_RETURN_VOID(); } + + + #define DEFAULT_PARENT_SEL 0.001 + + /* + * parentsel - Selectivity of parent relationship for ltree data types. + */ + Datum + parentsel(PG_FUNCTION_ARGS) + { + PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0); + Oid operator = PG_GETARG_OID(1); + List *args = (List *) PG_GETARG_POINTER(2); + int varRelid = PG_GETARG_INT32(3); + VariableStatData vardata; + Node *other; + bool varonleft; + Datum *values; + int nvalues; + float4 *numbers; + int nnumbers; + double selec = 0.0; + + /* + * If expression is not variable <@ something or something <@ variable, + * then punt and return a default estimate. + */ + if (!get_restriction_variable(root, args, varRelid, + &vardata, &other, &varonleft)) + PG_RETURN_FLOAT8(DEFAULT_PARENT_SEL); + + /* + * If the something is a NULL constant, assume operator is strict and + * return zero, ie, operator will never return TRUE. + */ + if (IsA(other, Const) && + ((Const *) other)->constisnull) + { + ReleaseVariableStats(vardata); + PG_RETURN_FLOAT8(0.0); + } + + if (HeapTupleIsValid(vardata.statsTuple)) + { + Form_pg_statistic stats; + double mcvsum = 0.0; + double mcvsel = 0.0; + double hissel = 0.0; + + stats = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple); + + if (IsA(other, Const)) + { + /* Variable is being compared to a known non-null constant */ + Datum constval = ((Const *) other)->constvalue; + bool match = false; + int i; + + /* + * Is the constant "<@" to any of the column's most common values? + */ + if (get_attstatsslot(vardata.statsTuple, + vardata.atttype, vardata.atttypmod, + STATISTIC_KIND_MCV, InvalidOid, + &values, &nvalues, + &numbers, &nnumbers)) + { + FmgrInfo contproc; + + fmgr_info(get_opcode(operator), &contproc); + + for (i = 0; i < nvalues; i++) + { + /* be careful to apply operator right way 'round */ + if (varonleft) + match = DatumGetBool(FunctionCall2(&contproc, + values[i], + constval)); + else + match = DatumGetBool(FunctionCall2(&contproc, + constval, + values[i])); + + /* calculate total selectivity of all most-common-values */ + mcvsum += numbers[i]; + + /* calculate selectivity of matching most-common-values */ + if (match) + mcvsel += numbers[i]; + } + } + else + { + /* no most-common-values info available */ + values = NULL; + numbers = NULL; + i = nvalues = nnumbers = 0; + } + + free_attstatsslot(vardata.atttype, values, nvalues, NULL, 0); + + /* + * Is the constant "<@" to any of the column's histogram values? + */ + if (get_attstatsslot(vardata.statsTuple, + vardata.atttype, vardata.atttypmod, + STATISTIC_KIND_HISTOGRAM, InvalidOid, + &values, &nvalues, + NULL, NULL)) + { + FmgrInfo contproc; + + fmgr_info(get_opcode(operator), &contproc); + + for (i = 0; i < nvalues; i++) + { + /* be careful to apply operator right way 'round */ + if (varonleft) + match = DatumGetBool(FunctionCall2(&contproc, + values[i], + constval)); + else + match = DatumGetBool(FunctionCall2(&contproc, + constval, + values[i])); + /* count matching histogram values */ + if (match) + hissel++; + } + + if (hissel > 0.0) + { + /* + * some matching values found inside histogram, divide + * matching entries number by total histogram entries to + * get the histogram related selectivity + */ + hissel /= nvalues; + } + } + else + { + /* no histogram info available */ + values = NULL; + i = nvalues = 0; + } + + free_attstatsslot(vardata.atttype, values, nvalues, + NULL, 0); + + + /* + * calculate selectivity based on MCV and histogram result + * histogram selectivity needs to be scaled down if there are any + * most-common-values + */ + selec = mcvsel + hissel * (1.0 - mcvsum); + + /* + * don't return 0.0 selectivity unless all table values are inside + * mcv + */ + if (selec == 0.0 && mcvsum != 1.0) + selec = DEFAULT_PARENT_SEL; + } + else + selec = DEFAULT_PARENT_SEL; + } + else + selec = DEFAULT_PARENT_SEL; + + ReleaseVariableStats(vardata); + + /* result should be in range, but make sure... */ + CLAMP_PROBABILITY(selec); + + PG_RETURN_FLOAT8((float8) selec); + } + Index: src/include/catalog/pg_proc.h =================================================================== RCS file: /cvsroot/pgsql/src/include/catalog/pg_proc.h,v retrieving revision 1.406 diff -c -c -r1.406 pg_proc.h *** src/include/catalog/pg_proc.h 25 Apr 2006 00:25:20 -0000 1.406 --- src/include/catalog/pg_proc.h 26 Apr 2006 18:25:35 -0000 *************** *** 3812,3817 **** --- 3812,3819 ---- DESCR("GiST support"); DATA(insert OID = 2592 ( gist_circle_compress PGNSP PGUID 12 f f t f i 1 2281 "2281" _null_ _null_ _null_ gist_circle_compress- _null_ )); DESCR("GiST support"); + DATA(insert OID = 2599 ( parentsel PGNSP PGUID 12 f f t f s 4 701 "2281 26 2281 23" _null_ _null_ _null_parentsel - _null_ )); + DESCR("enhanced restriction selectivity for ltree isparent comparison operators"); /* Index: src/include/utils/selfuncs.h =================================================================== RCS file: /cvsroot/pgsql/src/include/utils/selfuncs.h,v retrieving revision 1.28 diff -c -c -r1.28 selfuncs.h *** src/include/utils/selfuncs.h 5 Mar 2006 15:59:07 -0000 1.28 --- src/include/utils/selfuncs.h 26 Apr 2006 18:25:35 -0000 *************** *** 134,137 **** --- 134,139 ---- extern Datum hashcostestimate(PG_FUNCTION_ARGS); extern Datum gistcostestimate(PG_FUNCTION_ARGS); + extern Datum parentsel(PG_FUNCTION_ARGS); + #endif /* SELFUNCS_H */
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Cleaned-up patch attached and applied. Please revert this patch. This has not been updated to satisfy the previous agreement about how it should work. It is completely inappropriate to be dropping code that's specific to one contrib module into the core selfuncs.c file. What we had agreed to do was look at exporting some of the currently-static functions in selfuncs.c so that contrib modules could make use of them from outside the core --- but this patch doesn't do that. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Cleaned-up patch attached and applied. > > Please revert this patch. This has not been updated to satisfy the > previous agreement about how it should work. It is completely > inappropriate to be dropping code that's specific to one contrib module > into the core selfuncs.c file. What we had agreed to do was look at > exporting some of the currently-static functions in selfuncs.c so that > contrib modules could make use of them from outside the core --- but > this patch doesn't do that. OK, reverted, but I saw it using contsel() so I figured we were allowing it, but I see contsel() is used by our "box", so ltree was just using something that was already there. Let me see if I can break out the new selectivity function into /contrib. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <pgman@candle.pha.pa.us> writes: > OK, reverted, but I saw it using contsel() so I figured we were allowing > it, but I see contsel() is used by our "box", so ltree was just using > something that was already there. Let me see if I can break out the new > selectivity function into /contrib. What really needs to happen next is to think about which bits of selfuncs.c should be exposed --- what's generally useful, and do we think that it has an API clean/stable enough to expose? The reason I kept all that stuff static so far was because it got whacked around every release or two, and I didn't want to be constrained by worries about breaking outside modules. I'd still prefer to minimize the number of routines exposed, so some thought is needed. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > OK, reverted, but I saw it using contsel() so I figured we were allowing > > it, but I see contsel() is used by our "box", so ltree was just using > > something that was already there. Let me see if I can break out the new > > selectivity function into /contrib. > > What really needs to happen next is to think about which bits of > selfuncs.c should be exposed --- what's generally useful, and do we > think that it has an API clean/stable enough to expose? The reason I > kept all that stuff static so far was because it got whacked around > every release or two, and I didn't want to be constrained by worries > about breaking outside modules. I'd still prefer to minimize the number > of routines exposed, so some thought is needed. Well, the ltree routine just needs struct VariableStatData and macro ReleaseVariableStats(), and we need to change one function from static to global. That and some #includes, and it works. Do we want to be more formal about it? -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> What really needs to happen next is to think about which bits of >> selfuncs.c should be exposed --- what's generally useful, and do we >> think that it has an API clean/stable enough to expose? > Well, the ltree routine just needs struct VariableStatData and > macro ReleaseVariableStats(), and we need to change one function from > static to global. That and some #includes, and it works. Do we want to > be more formal about it? If we're going to expose VariableStatData then I think the minimum reasonable set of supporting functions would be examine_variable(), get_restriction_variable(), get_join_variable(). Matteo's patch only uses the second one but that's because he's only implementing one type of estimator (no joinsel function for instance). I'd be inclined to create a new header file, too, instead of cluttering builtins.h with this stuff. utils/selfuncs.h probably. I'm willing to work on this, but it doesn't look like you reverted the prior patch yet? regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >> What really needs to happen next is to think about which bits of > >> selfuncs.c should be exposed --- what's generally useful, and do we > >> think that it has an API clean/stable enough to expose? > > > Well, the ltree routine just needs struct VariableStatData and > > macro ReleaseVariableStats(), and we need to change one function from > > static to global. That and some #includes, and it works. Do we want to > > be more formal about it? > > If we're going to expose VariableStatData then I think the minimum > reasonable set of supporting functions would be examine_variable(), > get_restriction_variable(), get_join_variable(). Matteo's patch only > uses the second one but that's because he's only implementing one type > of estimator (no joinsel function for instance). > > I'd be inclined to create a new header file, too, instead of cluttering > builtins.h with this stuff. utils/selfuncs.h probably. > > I'm willing to work on this, but it doesn't look like you reverted the > prior patch yet? Uh, I just moved the selectivity function over to /contrib/ltree, and moved what I needed, so it now works. You can continue with the plan above, or I can. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> I'm willing to work on this, but it doesn't look like you reverted the >> prior patch yet? > Uh, I just moved the selectivity function over to /contrib/ltree, and > moved what I needed, so it now works. You can continue with the plan > above, or I can. I'll see what I can do with it. I've got an evening of watching my laptop reinstall anyway (just replaced a failing hard drive :-() regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >> I'm willing to work on this, but it doesn't look like you reverted the > >> prior patch yet? > > > Uh, I just moved the selectivity function over to /contrib/ltree, and > > moved what I needed, so it now works. You can continue with the plan > > above, or I can. > > I'll see what I can do with it. I've got an evening of watching my > laptop reinstall anyway (just replaced a failing hard drive :-() Interesting. I took one of my laptops into service today and will get it back tomorrow with a new empty harddrive. One big thing I have found to help is to document all laptop changes in a backed-up server file, so when I get a new laptop, I can just go through the list and make the changes in 1/2 hour. I also keep installed program tarballs on the server so those can be installed with the same versions as the other laptops. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Hi, Bruce Momjian ha scritto: > Uh, I just moved the selectivity function over to /contrib/ltree, and > moved what I needed, so it now works. You can continue with the plan > above, or I can. I've just been able to install a 8.2-devel to test the ltree selectivity improvements I suggested. It was a big surprise having no improvements at all in the query I used for all my previous tests, until I found out that the test against histogram values was removed by Tom. I strongly think it should be reintroduced: ltree columns are likely to have a unique constraint and the current ltreeparentsel function will behave just like contsel in these cases. Here is the commit: > Log Message: > ----------- > Fix ltreeparentsel so it actually works ... > > Modified Files: > -------------- > pgsql/contrib/ltree: > ltree_op.c (r1.10 -> r1.11) > (http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/ltree/ltree_op.c.diff?r1=1.10&r2=1.11) > Best regards -- Matteo Beccati http://phpadsnew.com http://phppgads.com
Matteo Beccati <php@beccati.com> writes: > It was a big surprise having no improvements at all in the query I used > for all my previous tests, until I found out that the test against > histogram values was removed by Tom. I strongly think it should be > reintroduced: ltree columns are likely to have a unique constraint and > the current ltreeparentsel function will behave just like contsel in > these cases. The histogram values seem completely meaningless in this context --- for containment purposes, they are just ten or so randomly chosen values. I don't believe that the estimator works better with them. Certainly, whether the column is unique or not is totally irrelevant to whether they are representative. What would seem saner to me is to add a datatype-specific analyze function that collects some statistics that are actually relevant to containment, and then make use of those in the estimator. regards, tom lane
Hi, > The histogram values seem completely meaningless in this context --- > for containment purposes, they are just ten or so randomly chosen > values. I don't believe that the estimator works better with them. > Certainly, whether the column is unique or not is totally irrelevant > to whether they are representative. Right, but if the column has a high number of stats, I think that the samples found in the histogram could put the estimator on the right way: i.e. in my case 80% of the values have '1041' as their root leaf and most of the values in the histogram reflect this. You're right saying that the column uniqueness isn't relevant to the histogram, but if the column is unique, there won't be any mcv, and the patch becomes useless. > What would seem saner to me is to add a datatype-specific analyze > function that collects some statistics that are actually relevant > to containment, and then make use of those in the estimator. Perhaps you're right, but unfortunately it's not a thing I can do myself, because of lack of knowledge about both pg and ltree internals :( Best regards -- Matteo Beccati http://phpadsnew.com http://phppgads.com