Re: [HACKERS] Bad n_distinct estimation; hacks suggested?

От: Mischa Sandberg
Тема: Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
Дата: ,
Msg-id: 1114701696.4270ff80d577c@webmail.telus.net
(см: обсуждение, исходный текст)
Ответ на: Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus)
Список: pgsql-performance

Скрыть дерево обсуждения

Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
 Re: Bad n_distinct estimation; hacks suggested?  ("Dave Held", )
 Re: Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
  Re: Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
   Re: Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
    Re: Bad n_distinct estimation; hacks suggested?  (Marko Ristola, )
     Re: Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
      Re: Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
       Re: Bad n_distinct estimation; hacks suggested?  (Greg Stark, )
        Re: Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
         Re: Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
         Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  ("Andrew Dunstan", )
         Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
          Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Marko Ristola, )
          Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
           Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
           Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan, )
          Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan, )
         Re: Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
          Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
           Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
            Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
             Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
             Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan, )
              Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg, )
               Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan, )
                Re: Distinct-Sampling (Gibbons paper) for Postgres  (, )
                 Re: Distinct-Sampling (Gibbons paper) for Postgres  (Josh Berkus, )
            Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
             Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
              Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Gurmeet Manku, )
               Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
              Citation for "Bad n_distinct estimation; hacks suggested?"  (Gurmeet Manku, )
               Foreign key constraints compile faster in 7.4  (Ashish Arte, )
                Re: Foreign key constraints compile faster in 7.4  (Tom Lane, )
 Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
 Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
  Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
 Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
  Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg, )
  Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Markus Schaber, )
   Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg, )
    Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
     Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (John A Meinel, )
      Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
     Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg, )

Quoting Josh Berkus <>:

> > >Perhaps I can save you some time (yes, I have a degree in Math). If I
> > >understand correctly, you're trying extrapolate from the correlation
> > >between a tiny sample and a larger sample. Introducing the tiny sample
> > >into any decision can only produce a less accurate result than just
> > >taking the larger sample on its own; GIGO. Whether they are consistent
> > >with one another has no relationship to whether the larger sample
> > >correlates with the whole population. You can think of the tiny sample
> > >like "anecdotal" evidence for wonderdrugs.
>
> Actually, it's more to characterize how large of a sample we need.  For
> example, if we sample 0.005 of disk pages, and get an estimate, and then
> sample another 0.005 of disk pages and get an estimate which is not even
> close to the first estimate, then we have an idea that this is a table
which
> defies analysis based on small samples.   Wheras if the two estimates
are <
> 1.0 stdev apart, we can have good confidence that the table is easily
> estimated.  Note that this doesn't require progressively larger
samples; any
> two samples would work.

We're sort of wandering away from the area where words are a good way
to describe the problem. Lacking a common scratchpad to work with,
could I suggest you talk to someone you consider has a background in
stats, and have them draw for you why this doesn't work?

About all you can get out of it is, if the two samples are
disjunct by a stddev, yes, you've demonstrated that the union
of the two populations has a larger stddev than either of them;
but your two stddevs are less info than the stddev of the whole.
Breaking your sample into two (or three, or four, ...) arbitrary pieces
and looking at their stddevs just doesn't tell you any more than what
you start with.

--
"Dreams come true, not free." -- S.Sondheim, ITW



В списке pgsql-performance по дате сообщения:

От: Marko Ristola
Дата:
Сообщение: Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
От: Enrico Weigelt
Дата:
Сообщение: index on different types