Re: [PERFORM] Bad n_distinct estimation; hacks suggested?

От: Mischa Sandberg
Тема: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Дата: ,
Msg-id: 1115155990.4277ee16aba34@webmail.telus.net
(см: обсуждение, исходный текст)
Ответ на: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Markus Schaber)
Ответы: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus)
Список: pgsql-hackers

Скрыть дерево обсуждения

Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
 Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
 Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  ("Andrew Dunstan", )
 Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
  Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan, )
   Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
   Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
    Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
  Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Marko Ristola, )
  Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
   Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
   Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan, )
 Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
  Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
   Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
    Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
     Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
     Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan, )
      Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg, )
       Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan, )
        Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
         Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg, )
         Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Markus Schaber, )
          Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg, )
           Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
            Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
            Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg, )
            Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (John A Meinel, )
        Re: [PERFORM] Distinct-Sampling (Gibbons paper) for Postgres  (Josh Berkus, )
        Re: Distinct-Sampling (Gibbons paper) for Postgres  (, )
    Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
     Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
      Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Gurmeet Manku, )
      Citation for "Bad n_distinct estimation; hacks suggested?"  (Gurmeet Manku, )

Quoting Markus Schaber <>:

> Hi, Josh,
>
> Josh Berkus wrote:
>
> > Yes, actually.   We need 3 different estimation methods:
> > 1 for tables where we can sample a large % of pages (say, >= 0.1)
> > 1 for tables where we sample a small % of pages but are "easily
> estimated"
> > 1 for tables which are not easily estimated by we can't afford to
> sample a
> > large % of pages.
> >
> > If we're doing sampling-based estimation, I really don't want
> people to lose
> > sight of the fact that page-based random sampling is much less
> expensive than
> > row-based random sampling.   We should really be focusing on
> methods which
> > are page-based.

Okay, although given the track record of page-based sampling for
n-distinct, it's a bit like looking for your keys under the streetlight,
rather than in the alley where you dropped them :-)

How about applying the distinct-sampling filter on a small extra data
stream to the stats collector?

--
Engineers think equations approximate reality.
Physicists think reality approximates the equations.
Mathematicians never make the connection.



В списке pgsql-hackers по дате сообщения:

От: James William Pye
Дата:
Сообщение: Re: inclusions WAS: Increased company involvement
От: Andrew - Supernews
Дата:
Сообщение: Re: A proper fix for the conversion-function problem