От: Tom Lane
Тема: Re: Index oddity
Дата: ,
Msg-id: 19759.1086843718@sss.pgh.pa.us
(см: обсуждение, исходный текст)
Ответ на: Re: Index oddity  (ken)
Ответы: Re: Index oddity (still)  (ken)
Список: pgsql-performance

Скрыть дерево обсуждения

Index oddity  (ken, )
 Re: Index oddity  (Rod Taylor, )
  Re: Index oddity  (ken, )
   Re: Index oddity  (Rod Taylor, )
    Re: Index oddity  (ken, )
     Re: Index oddity  (Rod Taylor, )
      Re: Index oddity  (ken, )
       Re: Index oddity  (Christopher Kings-Lynne, )
        Re: Index oddity  (Rod Taylor, )
         Re: Index oddity  (Mark Kirkwood, )
          Re: Index oddity  ("Joshua D. Drake", )
     Re: Index oddity  (Tom Lane, )
      Re: Index oddity (still)  (ken, )
       Re: Index oddity (still)  (Tom Lane, )

ken <> writes:
> ... and here is the plan with statistics set to 1000 ...

>  Seq Scan on nrgfeature f  (cost=0.00..31675.57 rows=18608 width=218)
> (actual time=63.544..1002.701 rows=225 loops=1)
>    Filter: ((upperrightx > 321264.236977215::double precision) AND
> (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
> 123286.261898636::double precision) AND (lowerlefty <
> 124985.927450476::double precision) AND (diagonalsize > 49.999::double
> precision))

> ... so yeah, its obviously finding way, way less rows than it thinks it
> will.

Yup.  I think your problem here is that the conditions on the different
columns are highly correlated, but the planner doesn't know anything
about that, because it has no cross-column statistics.

You could check that the individual conditions are accurately estimated
by looking at the estimated and actual row counts in

explain analyze
SELECT * FROM nrgfeature f WHERE upperRightX > 321264.23697721504;

explain analyze
SELECT * FROM nrgfeature f WHERE lowerLeftX < 324046.79981208267;

etc --- but I'll bet lunch that they are right on the money with the
higher statistics targets, and probably not too far off even at the
default.  The trouble is that the planner is simply multiplying these
probabilities together to get its estimate for the combined query,
and when the columns are not independent that leads to a very bad
estimate.

In particular it sounds like you have a *whole lot* of rows that have
diagonalSize just under 50, and so changing the diagonalSize condition
to include those makes for a big change in the predicted number of rows,
even though for the specific values of upperRightX and friends in your
test query there isn't any change in the actual number of matching rows.

I don't have any advice to magically solve this problem.  I would
suggest experimenting with alternative data representations -- for
example, maybe it would help to store "leftX" and "width" in place
of "leftX" and "rightX", etc.  What you want to do is try to decorrelate
the column values.  leftX and rightX are likely to have a strong
correlation, but maybe leftX and width don't.

            regards, tom lane


В списке pgsql-performance по дате сообщения:

От: Tom Lane
Дата:
Сообщение: Re: Index oddity
От: Frank van Vugt
Дата:
Сообщение: *very* inefficient choice made by the planner (regarding IN(...))