Re: [PERFORM] Bad n_distinct estimation; hacks suggested?

From: Simon Riggs
Subject: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Date: ,
Msg-id: 1114587910.21529.394.camel@localhost.localdomain
List: pgsql-hackers

On Tue, 2005-04-26 at 15:00 -0700, Gurmeet Manku wrote:

>  2. In a single scan, it is possible to estimate n_distinct by using
>     a very simple algorithm:
>  "Distinct sampling for highly-accurate answers to distinct value
>   queries and event reports" by Gibbons, VLDB 2001.

That looks like the one...

...though it looks like using it would require some more complex
changes to the current algorithm, and we want the other stats as well...
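
For readers unfamiliar with the approach, here is a minimal sketch of
single-scan distinct sampling in the spirit of the Gibbons paper. It is a
simplification for illustration only: it is not the paper's full algorithm
(which also retains rows per sampled value) and not PostgreSQL's ANALYZE
code. The function names and the max_sample parameter are hypothetical.
The idea: hash every value, keep only values whose hash ends in at least
`level` zero bits, and raise `level` whenever the sample outgrows its bound.
Each surviving value then stands for roughly 2^level distinct values.

```python
import hashlib


def _hash64(value):
    """Deterministic 64-bit hash of a value (assumes repr() identifies it)."""
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def _trailing_zeros(x, width=64):
    """Number of trailing zero bits in x; `width` if x is zero."""
    if x == 0:
        return width
    return (x & -x).bit_length() - 1


def estimate_n_distinct(values, max_sample=1024):
    """Estimate the number of distinct values in one pass over `values`.

    Memory is bounded by `max_sample` retained values (hypothetical knob).
    Values must be hashable so they can be used as dict keys.
    """
    level = 0
    sample = {}  # retained value -> trailing-zero count of its hash
    for v in values:
        if v in sample:
            continue  # duplicate of a value already in the sample
        zeros = _trailing_zeros(_hash64(v))
        if zeros >= level:
            sample[v] = zeros
            # Sample too big: halve the sampling rate and evict values
            # that no longer qualify at the new level.
            while len(sample) > max_sample:
                level += 1
                sample = {k: z for k, z in sample.items() if z >= level}
    # Each retained value represents about 2**level distinct values.
    return len(sample) * (2 ** level)
```

When the number of distinct values never exceeds max_sample, the level stays
at zero and the result is exact; otherwise the estimate is approximate, with
error shrinking as max_sample grows.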

>  3. In fact, Gibbons' basic idea has been extended to "sliding windows"
>     (this extension is useful in streaming systems like Aurora / Stream):
>  "Distributed streams algorithms for sliding windows"
>  by Gibbons and Tirthapura, SPAA 2002.

...and this offers the possibility of calculating statistics at load
time, as part of the COPY command.

Best Regards, Simon Riggs
