Re: [PERFORM] Bad n_distinct estimation; hacks suggested?

От: Simon Riggs
Тема: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Дата: ,
Msg-id: 1114587910.21529.394.camel@localhost.localdomain
(см: обсуждение, исходный текст)
Список: pgsql-hackers

On Tue, 2005-04-26 at 15:00 -0700, Gurmeet Manku wrote:

>  2. In a single scan, it is possible to estimate n_distinct by using
>     a very simple algorithm:
>
>  "Distinct sampling for highly-accurate answers to distinct value
>   queries and event reports" by Gibbons, VLDB 2001.
>
>  http://www.aladdin.cs.cmu.edu/papers/pdfs/y2001/dist_sampl.pdf

That looks like the one...

...though it looks like some more complex changes to the current
algorithm to use it, and we want the other stats as well...

>  3. In fact, Gibbon's basic idea has been extended to "sliding windows"
>     (this extension is useful in streaming systems like Aurora / Stream):
>
>  "Distributed streams algorithms for sliding windows"
>  by Gibbons and Tirthapura, SPAA 2002.
>
>  http://home.eng.iastate.edu/~snt/research/tocs.pdf
>

...and this offers the possibility of calculating statistics at load
time, as part of the COPY command

Best Regards, Simon Riggs



В списке pgsql-hackers по дате сообщения:

От: Rod Taylor
Дата:
Сообщение: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
От: Brent Verner
Дата:
Сообщение: Re: [proposal] protocol extension to support loadable stream filters