Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
From | Andrew Dunstan |
---|---|
Subject | Re: [HACKERS] Bad n_distinct estimation; hacks suggested? |
Date | |
Msg-id | 426F9703.4010108@dunslane.net |
In response to | Re: [HACKERS] Bad n_distinct estimation; hacks suggested? (Mischa Sandberg <mischa.sandberg@telus.net>) |
Responses | Re: Distinct-Sampling (Gibbons paper) for Postgres |
List | pgsql-performance |
Mischa Sandberg wrote:

> Perhaps I can save you some time (yes, I have a degree in Math). If I
> understand correctly, you're trying to extrapolate from the correlation
> between a tiny sample and a larger sample. Introducing the tiny sample
> into any decision can only produce a less accurate result than just
> taking the larger sample on its own; GIGO. Whether they are consistent
> with one another has no relationship to whether the larger sample
> correlates with the whole population. You can think of the tiny sample
> like "anecdotal" evidence for wonderdrugs.

Ok, good point.

I'm with Tom, though, in being very wary of solutions that require even one-off whole-table scans. Maybe we need an additional per-table statistics setting which could specify the sample size, either as an absolute number or as a percentage of the table. It certainly seems that where D/N ~ 0.3, the estimates on very large tables at least are way, way out.

Or maybe we need to support more than one estimation method. Or both ;-)

cheers

andrew
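[A rough simulation of the problem being discussed, assuming a skewed table with D/N ~ 0.3. This is not Postgres's ANALYZE code; it applies the Haas-Stokes "Duj1" formula, which the Postgres n_distinct estimator is based on, to a small random sample. The table shape (100 hot values plus many singletons) is an invented example.]

```python
import random

random.seed(42)

# Hypothetical skewed table: N = 100,000 rows, D/N ~ 0.3.
# 100 "hot" values cover 70% of the rows; 30,000 values appear once each.
N = 100_000
hot = [v for v in range(100) for _ in range(700)]   # 70,000 rows
singletons = list(range(100, 100 + 30_000))         # 30,000 rows
table = hot + singletons
true_d = len(set(table))                            # 30,100 distinct values

# Random sample of 3,000 rows (roughly what a default statistics target
# yields), then the Haas-Stokes Duj1 estimate:
#   D_hat = n*d / (n - f1 + f1*n/N)
# where d = distinct values in the sample, f1 = values seen exactly once.
n = 3_000
sample = random.sample(table, n)
counts = {}
for v in sample:
    counts[v] = counts.get(v, 0) + 1
d = len(counts)
f1 = sum(1 for c in counts.values() if c == 1)
d_hat = (n * d) / (n - f1 + f1 * n / N)

print(f"true D = {true_d}, sample estimate = {d_hat:.0f}")
```

On a run like this the estimate lands in the low thousands against a true D of 30,100 -- off by an order of magnitude, which matches the complaint that sample-based estimates on large skewed tables are way out, and larger per-table sample sizes only partially close the gap.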