Re: [HACKERS] Bad n_distinct estimation; hacks suggested?

От: Andrew Dunstan
Тема: Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
Дата: ,
Msg-id: 426F9703.4010108@dunslane.net
(см: обсуждение, исходный текст)
Ответ на: Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg)
Ответы: Re: Distinct-Sampling (Gibbons paper) for Postgres  ()
Список: pgsql-performance

Скрыть дерево обсуждения

Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
 Re: Bad n_distinct estimation; hacks suggested?  ("Dave Held", )
 Re: Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
  Re: Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
   Re: Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
    Re: Bad n_distinct estimation; hacks suggested?  (Marko Ristola, )
     Re: Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
      Re: Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
       Re: Bad n_distinct estimation; hacks suggested?  (Greg Stark, )
        Re: Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
         Re: Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
         Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  ("Andrew Dunstan", )
         Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
          Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Marko Ristola, )
          Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
           Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
           Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan, )
          Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan, )
         Re: Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
          Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
           Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
            Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
             Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
             Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan, )
              Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg, )
               Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan, )
                Re: Distinct-Sampling (Gibbons paper) for Postgres  (, )
                 Re: Distinct-Sampling (Gibbons paper) for Postgres  (Josh Berkus, )
            Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
             Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
              Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Gurmeet Manku, )
               Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
              Citation for "Bad n_distinct estimation; hacks suggested?"  (Gurmeet Manku, )
               Foreign key constraints compile faster in 7.4  (Ashish Arte, )
                Re: Foreign key constraints compile faster in 7.4  (Tom Lane, )
 Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
 Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
  Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
 Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
  Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg, )
  Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Markus Schaber, )
   Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg, )
    Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
     Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (John A Meinel, )
      Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
     Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg, )


Mischa Sandberg wrote:

>
>Perhaps I can save you some time (yes, I have a degree in Math). If I
>understand correctly, you're trying extrapolate from the correlation
>between a tiny sample and a larger sample. Introducing the tiny sample
>into any decision can only produce a less accurate result than just
>taking the larger sample on its own; GIGO. Whether they are consistent
>with one another has no relationship to whether the larger sample
>correlates with the whole population. You can think of the tiny sample
>like "anecdotal" evidence for wonderdrugs.
>
>
>

Ok, good point.

I'm with Tom though in being very wary of solutions that require even
one-off whole table scans. Maybe we need an additional per-table
statistics setting which could specify the sample size, either as an
absolute number or as a percentage of the table. It certainly seems that
where D/N ~ 0.3, the estimates on very large tables at least are way way
out.

Or maybe we need to support more than one estimation method.

Or both ;-)

cheers

andrew




В списке pgsql-performance по дате сообщения:

От: Josh Berkus
Дата:
Сообщение: Re: Distinct-Sampling (Gibbons paper) for Postgres
От: PFC
Дата:
Сообщение: Re: BLOB's bypassing the OS Filesystem for better Image