Re: RFC: planner statistics in 7.2
| От | Peter Eisentraut | 
|---|---|
| Тема | Re: RFC: planner statistics in 7.2 | 
| Дата | |
| Msg-id | Pine.LNX.4.30.0104201649080.758-100000@peter.localdomain обсуждение исходный текст | 
| Ответ на | RFC: planner statistics in 7.2 (Tom Lane <tgl@sss.pgh.pa.us>) | 
| Ответы | Re: RFC: planner statistics in 7.2 | 
| Список | pgsql-hackers | 
Tom Lane writes: > 2. The M boundary values of an equi-depth histogram, ie, the values that > divide the data distribution into equal-population sets. For example, if > M is 3 this would consist of the min, median, and max values. Why not model that data in a normal distribution (or some other such model)? This should give you fairly good estimates for <, > and between type queries. That way the user wouldn't have to tweak M. (I couldn't even imagine a way to explain to the user how to pick good M's.) > Issue: for datatypes that have no '<' operator, we of course cannot > compute a histogram --- but if there is an '=' then the most-common-value > and number-of-distinct-values stats still make sense. I think the statistics calculation should be data type (or operator, or opclass) specific, like the selectivity estimation functions. For geometric data types you need completely different kinds of statistics, and this would allow users to plug in different methods. The pg_statistics table could just have an array of floats where each data type can store statistics as it chooses and the selectivity estimation routines can interpret the values in a different way per data type. That way you can also make some common sense optimization without hacks, e.g., for boolean columns you possibly only need to calculate 1 value (number of trues). Of course, how to calculate up to N different sets of statistics for N different columns that require up to N different numbers of passes over the table is left as a challenge. ;-) This brings up a question I have: Are statistics calculated for every column? Should they be? Is there a possibility to speed up ANALYZE by controlling this? -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
В списке pgsql-hackers по дате отправления: