Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
От | Tom Lane |
---|---|
Тема | Re: [PERFORM] Bad n_distinct estimation; hacks suggested? |
Дата | |
Msg-id | 25382.1114318139@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: [PERFORM] Bad n_distinct estimation; hacks suggested? (Josh Berkus <josh@agliodbs.com>) |
Ответы |
Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Re: [PERFORM] Bad n_distinct estimation; hacks suggested? Re: [PERFORM] Bad n_distinct estimation; hacks suggested? |
Список | pgsql-hackers |
Josh Berkus <josh@agliodbs.com> writes: > Overall, our formula is inherently conservative of n_distinct. That is, I > believe that it is actually computing the *smallest* number of distinct > values which would reasonably produce the given sample, rather than the > *median* one. This is contrary to the notes in analyze.c, which seem to > think that we're *overestimating* n_distinct. Well, the notes are there because the early tests I ran on that formula did show it overestimating n_distinct more often than not. Greg is correct that this is inherently a hard problem :-( I have nothing against adopting a different formula, if you can find something with a comparable amount of math behind it ... but I fear it'd only shift the failure cases around. regards, tom lane
В списке pgsql-hackers по дате отправления: