Re: Simple join optimized badly?

От: Tom Lane
Тема: Re: Simple join optimized badly?
Дата: ,
Msg-id: 18722.1160236300@sss.pgh.pa.us
(см: обсуждение, исходный текст)
Ответ на: Simple join optimized badly?  ("Craig A. James")
Ответы: Re: Simple join optimized badly?  ("Denis Lussier")
Список: pgsql-performance

Скрыть дерево обсуждения

Simple join optimized badly?  ("Craig A. James", )
 Re: Simple join optimized badly?  (Tom Lane, )
  Re: Simple join optimized badly?  ("Denis Lussier", )
   Re: Simple join optimized badly?  (Jim Nasby, )
   Re: Simple join optimized badly?  (Josh Berkus, )
    Re: Simple join optimized badly?  ("Craig A. James", )
     Re: Simple join optimized badly?  (Mark Kirkwood, )
      Re: Simple join optimized badly?  ("Craig A. James", )
       Re: Simple join optimized badly?  (Mark Kirkwood, )
        Re: Simple join optimized badly?  (Tom Lane, )
         Re: Simple join optimized badly?  (Josh Berkus, )
          Re: Simple join optimized badly?  (Tom Lane, )
    Re: Simple join optimized badly?  (Tom Lane, )
    Re: Simple join optimized badly?  (Scott Marlowe, )
 Re: Simple join optimized badly?  (Bruce Momjian, )
  Re: Simple join optimized badly?  ("Craig A. James", )
 Re: Simple join optimized badly?  (Chris Browne, )
 Re: Simple join optimized badly?  (Chris Browne, )
  Re: Simple join optimized badly?  ("Jim C. Nasby", )
   Re: Simple join optimized badly?  (Tom Lane, )
   Re: Simple join optimized badly?  (Mark Kirkwood, )
    Re: Simple join optimized badly?  (Mark Kirkwood, )
 Re: Simple join optimized badly?  (Tobias Brox, )
  Re: Simple join optimized badly?  ("Jim C. Nasby", )
   Re: Simple join optimized badly?  ("Joshua D. Drake", )
    Re: Simple join optimized badly?  ("Jim C. Nasby", )
     Re: Simple join optimized badly?  ("Joshua D. Drake", )
   Re: Simple join optimized badly?  (Tom Lane, )
    Re: Simple join optimized badly?  ("Jim C. Nasby", )
     Re: Simple join optimized badly?  (Tom Lane, )
      Re: Simple join optimized badly?  ("Jim C. Nasby", )
       Re: Simple join optimized badly?  (Josh Berkus, )
        Re: Simple join optimized badly?  ("Jim C. Nasby", )
  Re: Simple join optimized badly?  (Bruno Wolff III, )
 Re: Simple join optimized badly?  (Brian Herlihy, )
  Re: Simple join optimized badly?  ("Craig A. James", )
   Re: Simple join optimized badly?  ("Joshua D. Drake", )
    Re: Simple join optimized badly?  ("Jim C. Nasby", )
     Re: Simple join optimized badly?  ("Steinar H. Gunderson", )
      Re: Simple join optimized badly?  ("Joshua D. Drake", )
     Re: Simple join optimized badly?  ("Joshua D. Drake", )
      Re: Simple join optimized badly?  (Brian Herlihy, )
       Re: Simple join optimized badly?  (Tom Lane, )
        Re: Simple join optimized badly?  (Brian Herlihy, )
        Re: Simple join optimized badly?  ("Bucky Jordan", )
         Re: Simple join optimized badly?  (Heikki Linnakangas, )
          Re: Simple join optimized badly?  (Bruce Momjian, )
         Collect stats during seqscan (was: Simple join optimized badly?)  ("Jim C. Nasby", )
        Re: Simple join optimized badly?  (Mark Lewis, )

"Craig A. James" <> writes:
> There are two plans below.  The first is before an ANALYZE HITLIST_ROWS, and it's horrible -- it looks to me like
it'ssorting the 16 million rows of the SEARCH table.  Then I run ANALYZE HITLIST_ROWS, and the plan is pretty decent. 

It would be interesting to look at the before-ANALYZE cost estimate for
the hash join, which you could get by setting enable_mergejoin off (you
might have to turn off enable_nestloop too).  I recall though that
there's a fudge factor in costsize.c that penalizes hashing on a column
that no statistics are available for.  The reason for this is the
possibility that the column has only a small number of distinct values,
which would make a hash join very inefficient (in the worst case all
the values might end up in the same hash bucket, making it no better
than a nestloop).  Once you've done ANALYZE it plugs in a real estimate
instead, and evidently the cost estimate drops enough to make hashjoin
the winner.

You might be able to persuade it to use a hashjoin anyway by increasing
work_mem enough, but on the whole my advice is to do the ANALYZE after
you load up the temp table.  The planner really can't be expected to be
very intelligent when it has no stats.

            regards, tom lane


В списке pgsql-performance по дате сообщения:

От: Tom Lane
Дата:
Сообщение: Re: Simple join optimized badly?
От: "Denis Lussier"
Дата:
Сообщение: Re: Simple join optimized badly?