Re: [HACKERS] Bad n_distinct estimation; hacks suggested?

От: Mischa Sandberg
Тема: Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
Дата: ,
Msg-id: 1115175136.427838e0ae263@webmail.telus.net
(см: обсуждение, исходный текст)
Ответ на: Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus)
Список: pgsql-performance

Скрыть дерево обсуждения

Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
 Re: Bad n_distinct estimation; hacks suggested?  ("Dave Held", )
 Re: Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
  Re: Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
   Re: Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
    Re: Bad n_distinct estimation; hacks suggested?  (Marko Ristola, )
     Re: Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
      Re: Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
       Re: Bad n_distinct estimation; hacks suggested?  (Greg Stark, )
        Re: Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
         Re: Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
         Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  ("Andrew Dunstan", )
         Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
          Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Marko Ristola, )
          Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
           Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
           Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan, )
          Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan, )
         Re: Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
          Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
           Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
            Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
             Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
             Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan, )
              Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg, )
               Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan, )
                Re: Distinct-Sampling (Gibbons paper) for Postgres  (, )
                 Re: Distinct-Sampling (Gibbons paper) for Postgres  (Josh Berkus, )
            Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
             Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
              Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Gurmeet Manku, )
               Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Simon Riggs, )
              Citation for "Bad n_distinct estimation; hacks suggested?"  (Gurmeet Manku, )
               Foreign key constraints compile faster in 7.4  (Ashish Arte, )
                Re: Foreign key constraints compile faster in 7.4  (Tom Lane, )
 Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
 Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
  Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Tom Lane, )
 Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
  Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg, )
  Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Markus Schaber, )
   Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg, )
    Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
     Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (John A Meinel, )
      Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Josh Berkus, )
     Re: [HACKERS] Bad n_distinct estimation; hacks suggested?  (Mischa Sandberg, )

Quoting Josh Berkus <>:

> Mischa,
>
> > Okay, although given the track record of page-based sampling for
> > n-distinct, it's a bit like looking for your keys under the
> streetlight,
> > rather than in the alley where you dropped them :-)
>
> Bad analogy, but funny.

Bad analogy? Page-sampling effort versus row-sampling effort, c'est
moot. It's not good enough for stats to produce good behaviour on the
average. Straight random sampling, page or row, is going to cause
enough untrustworthy engine behaviour,for any %ages small enough to
allow sampling from scratch at any time.

I'm curious what the problem is with relying on a start-up plus
incremental method, when the method in the distinct-sampling paper
doesn't degenerate: you can start when the table is still empty.
Constructing an index requires an initial full scan plus incremental
update; what's the diff?

> Unless, of course, we use indexes for sampling, which seems like a
> *really
> good* idea to me ....

"distinct-sampling" applies for indexes, too. I started tracking the
discussion of this a bit late.  Smart method for this is in VLDB'92:
Gennady Antoshenkov, "Random Sampling from Pseudo-ranked B+-trees". I
don't think this is online anywhere, except if you have a DBLP
membership. Does nybod else know better?
Antoshenkov was the brains behind some of the really cool stuff in DEC
Rdb (what eventually became Oracle). Compressed bitmap indices,
parallel competing query plans, and smart handling of keys with
hyperbolic distributions.
--
Engineers think equations approximate reality.
Physicists think reality approximates the equations.
Mathematicians never make the connection.



В списке pgsql-performance по дате сообщения:

От: Mark Kirkwood
Дата:
Сообщение: Re: Kernel Resources and max_connections
От: Mike Rylander
Дата:
Сообщение: Re: