Re: [HACKERS] multivariate statistics (v19)
From | Alvaro Herrera |
---|---|
Subject | Re: [HACKERS] multivariate statistics (v19) |
Date | |
Msg-id | 20170206221157.54lzliw3wjhskb6w@alvherre.pgsql |
In response to | Re: [HACKERS] multivariate statistics (v19) (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
List | pgsql-hackers |
Looking at 0003, I notice that gram.y is changed to add a WITH ( .. )
clause. If it's not specified, an error is raised. If you create stats
with (ndistinct) then you can't alter it later to add "dependencies" or
whatever; unless I misunderstand, you have to drop the statistics and
create another one. Probably in a forthcoming patch we should have
ALTER support to add a stats type.

Also, why isn't the default to build everything, rather than nothing?

BTW, almost everything in the backend could be inside "utils/", so let's
not do that -- let's just create src/backend/statistics/ for all your
code.

Here are a few notes while reading README.dependencies -- some typos, two
questions.

diff --git a/src/backend/utils/mvstats/README.dependencies b/src/backend/utils/mvstats/README.dependencies
index 908f094..7f3ed3d 100644
--- a/src/backend/utils/mvstats/README.dependencies
+++ b/src/backend/utils/mvstats/README.dependencies
@@ -36,7 +36,7 @@ design choice to model the dataset in denormalized way, either because of
 performance or to make querying easier.
 
 
-soft dependencies
+Soft dependencies
 -----------------
 
 Real-world data sets often contain data errors, either because of data entry
@@ -48,7 +48,7 @@ rendering the approach mostly useless even for slightly noisy data sets, or
 result in sudden changes in behavior depending on minor differences between
 samples provided to ANALYZE.
 
-For this reason the statistics implementes "soft" functional dependencies,
+For this reason the statistics implements "soft" functional dependencies,
 associating each functional dependency with a degree of validity (a number
 number between 0 and 1). This degree is then used to combine selectivities
 in a smooth manner.
@@ -75,6 +75,7 @@ The algorithm also requires a minimum size of the group to consider it
 consistent (currently 3 rows in the sample). Small groups make it less likely
 to break the consistency.
 
+## What is it that we store in the catalog?
 
 Clause reduction (planner/optimizer)
 ------------------------------------
@@ -95,12 +96,12 @@ example for (a,b,c) we first use (a,b=>c) to break the computation into
 and then apply (a=>b) the same way on P(a=?,b=?).
 
 
-Consistecy of clauses
+Consistency of clauses
 ---------------------
 
 Functional dependencies only express general dependencies between columns,
 without referencing particular values. This assumes that the equality clauses
-are in fact consistent with the functinal dependency, i.e. that given a
+are in fact consistent with the functional dependency, i.e. that given a
 dependency (a=>b), the value in (b=?) clause is the value determined by (a=?).
 If that's not the case, the clauses are "inconsistent" with the functional
 dependency and the result will be over-estimation.
@@ -111,6 +112,7 @@ set will be empty, but we'll estimate the selectivity using the ZIP condition.
 In this case the default estimation based on AVIA principle happens to work
 better, but mostly by chance.
 
+## what is AVIA principle?
 
 This issue is the price for the simplicity of functional dependencies. If the
 application frequently constructs queries with clauses inconsistent with

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
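
For readers following the README excerpt, here is a minimal sketch of how a "soft" functional dependency (a=>b) might be given a degree of validity between 0 and 1 from a sample. It is not code from the patch. The function name `dependency_degree`, the sample data, and the exact definition (the fraction of sampled rows that fall into groups which are both large enough and consistent) are assumptions made for illustration; only the minimum group size of 3 rows comes from the README text quoted in the diff.

```python
from collections import defaultdict

# The README excerpt says a group needs "currently 3 rows in the sample".
MIN_GROUP_SIZE = 3


def dependency_degree(rows, min_group_size=MIN_GROUP_SIZE):
    """Estimate the degree of validity of a => b from sampled (a, b) pairs.

    Assumption (not taken from the patch): the degree is the fraction of
    sampled rows belonging to groups (by value of a) that have at least
    min_group_size rows and map to exactly one value of b.
    """
    groups = defaultdict(list)
    for a, b in rows:
        groups[a].append(b)

    supporting = 0
    for b_values in groups.values():
        consistent = len(set(b_values)) == 1
        if consistent and len(b_values) >= min_group_size:
            supporting += len(b_values)

    return supporting / len(rows) if rows else 0.0


# Hypothetical sample: zip code => city, with one data-entry error.
sample = [
    ("10001", "New York"), ("10001", "New York"), ("10001", "New York"),
    ("94105", "San Francisco"), ("94105", "San Francisco"),
    ("94105", "San Francisco"), ("94105", "San Fransisco"),
    ("60601", "Chicago"),  # consistent, but the group is too small to count
]
degree = dependency_degree(sample)
```

Under this definition a single misspelled value disqualifies its whole group, so the degree drops without collapsing to zero, which is the kind of smooth behavior the README describes for noisy data.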