Re: Simple join optimized badly?

От: Mark Lewis
Тема: Re: Simple join optimized badly?
Дата: ,
Msg-id: 1160579260.8082.121.camel@archimedes
(см: обсуждение, исходный текст)
Ответ на: Re: Simple join optimized badly?  (Tom Lane)
Список: pgsql-performance

Скрыть дерево обсуждения

Simple join optimized badly?  ("Craig A. James", )
 Re: Simple join optimized badly?  (Tom Lane, )
  Re: Simple join optimized badly?  ("Denis Lussier", )
   Re: Simple join optimized badly?  (Jim Nasby, )
   Re: Simple join optimized badly?  (Josh Berkus, )
    Re: Simple join optimized badly?  ("Craig A. James", )
     Re: Simple join optimized badly?  (Mark Kirkwood, )
      Re: Simple join optimized badly?  ("Craig A. James", )
       Re: Simple join optimized badly?  (Mark Kirkwood, )
        Re: Simple join optimized badly?  (Tom Lane, )
         Re: Simple join optimized badly?  (Josh Berkus, )
          Re: Simple join optimized badly?  (Tom Lane, )
    Re: Simple join optimized badly?  (Tom Lane, )
    Re: Simple join optimized badly?  (Scott Marlowe, )
 Re: Simple join optimized badly?  (Bruce Momjian, )
  Re: Simple join optimized badly?  ("Craig A. James", )
 Re: Simple join optimized badly?  (Chris Browne, )
 Re: Simple join optimized badly?  (Chris Browne, )
  Re: Simple join optimized badly?  ("Jim C. Nasby", )
   Re: Simple join optimized badly?  (Tom Lane, )
   Re: Simple join optimized badly?  (Mark Kirkwood, )
    Re: Simple join optimized badly?  (Mark Kirkwood, )
 Re: Simple join optimized badly?  (Tobias Brox, )
  Re: Simple join optimized badly?  ("Jim C. Nasby", )
   Re: Simple join optimized badly?  ("Joshua D. Drake", )
    Re: Simple join optimized badly?  ("Jim C. Nasby", )
     Re: Simple join optimized badly?  ("Joshua D. Drake", )
   Re: Simple join optimized badly?  (Tom Lane, )
    Re: Simple join optimized badly?  ("Jim C. Nasby", )
     Re: Simple join optimized badly?  (Tom Lane, )
      Re: Simple join optimized badly?  ("Jim C. Nasby", )
       Re: Simple join optimized badly?  (Josh Berkus, )
        Re: Simple join optimized badly?  ("Jim C. Nasby", )
  Re: Simple join optimized badly?  (Bruno Wolff III, )
 Re: Simple join optimized badly?  (Brian Herlihy, )
  Re: Simple join optimized badly?  ("Craig A. James", )
   Re: Simple join optimized badly?  ("Joshua D. Drake", )
    Re: Simple join optimized badly?  ("Jim C. Nasby", )
     Re: Simple join optimized badly?  ("Steinar H. Gunderson", )
      Re: Simple join optimized badly?  ("Joshua D. Drake", )
     Re: Simple join optimized badly?  ("Joshua D. Drake", )
      Re: Simple join optimized badly?  (Brian Herlihy, )
       Re: Simple join optimized badly?  (Tom Lane, )
        Re: Simple join optimized badly?  (Brian Herlihy, )
        Re: Simple join optimized badly?  ("Bucky Jordan", )
         Re: Simple join optimized badly?  (Heikki Linnakangas, )
          Re: Simple join optimized badly?  (Bruce Momjian, )
         Collect stats during seqscan (was: Simple join optimized badly?)  ("Jim C. Nasby", )
        Re: Simple join optimized badly?  (Mark Lewis, )

Tom,

I'm interested in the problem of cross-column statistics from a
theoretical perspective.  It would be interesting to sit down and try to
reason out a useful solution, or at very least to understand the problem
better so I can anticipate when it might come and eat me.

From my understanding, the main problem is that if PG knows the
selectivity of n conditions C1,C2,...,Cn then it doesn't know whether
the combined selectivity will be C1*C2*...*Cn (conditions are
independent) or max(C1,C2,...,Cn) (conditions are strictly dependent),
or somewhere in the middle. Therefore, row estimates could be orders of
magnitude off.

I suppose a common example would be a table with a serial primary key
column and a timestamp value which is always inserted as
CURRENT_TIMESTAMP, so the two columns are strongly correlated.  If the
planner guesses that 1% of the rows of the table will match pk>1000000,
and 1% of the rows of the table will match timestamp > X, then it would
be nice for it to know that if you specify both "pk>1000000 AND
timestamp>X" that the combined selectivity is still only 1% and not 1% *
1% = 0.01%.

As long as I'm sitting down and reasoning about the problem anyway, are
there any other types of cases you're aware of where some form of cross-
column statistics would be useful?  In the unlikely event that I
actually come up with a brilliant and simple solution, I'd at least like
to make sure that I'm solving the right problem :)

Thanks,
Mark Lewis



On Tue, 2006-10-10 at 22:38 -0400, Tom Lane wrote:
> Brian Herlihy <> writes:
> > What would it take for hints to be added to postgres?
>
> A *whole lot* more thought and effort than has been expended on the
> subject to date.
>
> Personally I have no use for the idea of "force the planner to do
> exactly X given a query of exactly Y".  You don't have exactly Y
> today, tomorrow, and the day after (if you do, you don't need a
> hint mechanism at all, you need a mysql-style query cache).
> IMHO most of the planner mistakes we see that could be fixed via
> hinting are really statistical estimation errors, and so the right
> level to be fixing them at is hints about how to estimate the number
> of rows produced for given conditions.  Mind you that's still a plenty
> hard problem, but you could at least hope that a hint of that form
> would be useful for more than one query.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match


В списке pgsql-performance по дате сообщения:

От: "Jim C. Nasby"
Дата:
Сообщение: Re: Scrub one large table against another
От: Markus Schaber
Дата:
Сообщение: Re: Scrub one large table against another (vmstat output)