Re: Cross-table statistics idea
От | Simon Riggs |
---|---|
Тема | Re: Cross-table statistics idea |
Дата | |
Msg-id | 1159356643.2767.162.camel@holly обсуждение исходный текст |
Ответ на | Cross-table statistics idea ("Jim C. Nasby" <jim@nasby.net>) |
Ответы |
Re: Cross-table statistics idea
|
Список | pgsql-hackers |
On Tue, 2006-09-26 at 21:27 -0500, Jim C. Nasby wrote: > Since I don't recall any ideas ever having been thrown out on how to do > this... > > ISTM that we could gain additional insight on how many rows would likely > result from a join One thing we can do is to use cross-column relationships to improve the estimation of Ndistinct. If we have a table Order_line (PK orderId, lineNum) If we look at lineNum and see it has on average 10 values we can then use this information to compute that Ndistinct should be -0.1, i.e. the number of values is proportional to the number of rows with a factor of 10. Right now if there are more than 10 lineNums per orderId on average then we never decide that orderId is a scalable statistic. I propose adding a final step to ANALYZE that applies a cross-column rule after all columns have been analysed. If all except one column of a PK have very low Ndistinct we can use that to calculate a minimum number of Ndistinct for the column with a high number of values. If that minimum number is less than the Ndistinct estimate in isolation, then we overlay the new value. This is desirable because the estimation of Ndistinct is very sensitive to the number of matching rows in the sample, so Ndistinct estimates are usually very poor for large Ndistinct. The estimates for low Ndistinct are much better, so we can use them with a lower standard error to correct the in-isolation estimate of other columns. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
В списке pgsql-hackers по дате отправления: