Re: rtree indexes aren't being used with 7.0

From: Tom Lane
Subject: Re: rtree indexes aren't being used with 7.0
Date:
Msg-id: 28269.958416786@sss.pgh.pa.us
In reply to: Re: rtree indexes aren't being used with 7.0  (Jeff Hoffmann <jeff@propertykey.com>)
List: pgsql-general

Jeff Hoffmann <jeff@propertykey.com> writes:
> i'm still trying to understand how cost estimates & selectivity work.
> i'm looking at a page in the programmers guide & i'm trying to figure
> out how it works for a normal (btree) index vs. how it would (or should)
> work for an rtree.

"Use the source, Luke ..."  A lot of this low-level stuff is not
adequately documented except in the source code.

There are three basic components to the cost estimate: an
operator-specific selectivity routine, which tries to estimate the
fraction of tuples that will be returned; a generic cost estimator
for indexscans; and a cost estimator for the specific index type.
The generic estimator tries to figure the costs for accesses to the
table proper, given the selectivity fraction; while the index-specific
estimator worries about the cost of accesses to the index.  The generic
cost routine is in backend/optimizer/path/costsize.c, and the other
stuff is in backend/utils/adt/selfuncs.c and
backend/utils/adt/geo_selfuncs.c.

Right now the "index-specific estimator" is no such thing; all the index
types share a single default routine.  If you know more about rtree
access patterns than that code does, feel free to suggest improvements.
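
Just to illustrate the division of labor (this is only a sketch, not the
actual code in costsize.c, and all the names and constants are made up):

    /*
     * Hypothetical sketch: the index-specific part charges for index
     * page fetches, the generic part charges for the heap fetches
     * implied by the same selectivity fraction.
     */
    typedef struct
    {
        double  index_pages;    /* pages in the index */
        double  index_tuples;   /* entries in the index */
        double  heap_pages;     /* pages in the table proper */
        double  heap_tuples;    /* rows in the table */
    } RelStats;

    static double
    sketch_indexscan_cost(const RelStats *st, double selectivity)
    {
        /* index-specific estimator: assume we visit a proportional
         * fraction of the index */
        double  index_cost = selectivity * st->index_pages
                           + selectivity * st->index_tuples * 0.001;

        /* generic estimator: heap accesses driven by the selectivity */
        double  heap_cost = selectivity * st->heap_pages
                          + selectivity * st->heap_tuples * 0.01;

        return index_cost + heap_cost;
    }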

But the real problem is of course the selectivity routine; for the
geometric operators we've never had anything but stub routines that
return a constant.  You might want to look at the scalar-comparison
selectivity routines (selfuncs.c) to see how they work.  Basically
the idea for those is that VACUUM ANALYZE stores the min and max values
of each column, and given an operator like "var < constant" we can do
a linear interpolation between the min and max to estimate the
selectivity.  For example, if min=0, max=100, and the constant is 40
then we'd estimate 0.40 selectivity.  (This works great as long as the
data is reasonably uniformly distributed between the min and max, but
falls down badly if it's not.  One of the items on our very long TODO
list is to change VACUUM ANALYZE to store more extensive statistics so
that we can be a little smarter...)
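
As a concrete illustration of that interpolation (again just a sketch,
not what selfuncs.c actually looks like; the function name is invented):

    /*
     * Estimate selectivity of "var < constant" by linear interpolation
     * between the column's min and max as gathered by VACUUM ANALYZE.
     */
    static double
    sketch_lt_selectivity(double min, double max, double constant)
    {
        double  sel;

        if (max <= min)
            return 0.5;         /* degenerate range, punt to a default */

        sel = (constant - min) / (max - min);

        if (sel < 0.0)          /* clamp to sane bounds */
            sel = 0.0;
        else if (sel > 1.0)
            sel = 1.0;

        return sel;
    }

With min=0, max=100, constant=40 that returns 0.40, matching the example
above.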

A corresponding kind of idea would be to have VACUUM store the bounding
box, and perhaps other stats like the average box area, and then compare
that to the specific box being probed with to estimate selectivity.
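
Something along these lines, perhaps (purely hypothetical; the existing
geo_selfuncs.c stubs do nothing of the sort):

    typedef struct
    {
        double  xlo, ylo, xhi, yhi;
    } Box2D;

    /*
     * Estimate selectivity of an "overlaps" probe from a column-wide
     * bounding box and average box area gathered by VACUUM, assuming
     * the indexed boxes are spread roughly uniformly over that box.
     */
    static double
    sketch_overlap_selectivity(const Box2D *bound, double avg_box_area,
                               const Box2D *query)
    {
        double  bound_area = (bound->xhi - bound->xlo)
                           * (bound->yhi - bound->ylo);
        double  ixlo = query->xlo > bound->xlo ? query->xlo : bound->xlo;
        double  iylo = query->ylo > bound->ylo ? query->ylo : bound->ylo;
        double  ixhi = query->xhi < bound->xhi ? query->xhi : bound->xhi;
        double  iyhi = query->yhi < bound->yhi ? query->yhi : bound->yhi;
        double  sel;

        if (bound_area <= 0.0 || ixhi <= ixlo || iyhi <= iylo)
            return 0.0;         /* query box misses the data entirely */

        /* fraction of the data's bounding box covered by the query box,
         * fattened by the average box area to catch boxes that merely
         * touch the query region */
        sel = ((ixhi - ixlo) * (iyhi - iylo) + avg_box_area) / bound_area;
        return sel > 1.0 ? 1.0 : sel;
    }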

An open question: how to tell VACUUM that for box columns we want
bounding box and not min/max?  I'm inclined to think that we should
generalize the VACUUM ANALYZE mechanism so that it runs N aggregates
over each column and stores the results, where the specific aggregates
to use are looked up in a table based on the column datatype.
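
Roughly like this, say (hypothetical, of course; no such table exists
today):

    /*
     * Per-datatype statistics recipe: VACUUM ANALYZE would look up the
     * column's type here and run the listed aggregates over the column,
     * storing whatever they return.  All names are invented.
     */
    typedef struct
    {
        const char *typname;    /* column datatype */
        const char *aggs[4];    /* aggregates to run, NULL-terminated */
    } StatRecipe;

    static const StatRecipe stat_recipes[] =
    {
        {"int4",   {"min", "max", NULL}},
        {"float8", {"min", "max", NULL}},
        /* box columns want a bounding box and average area instead */
        {"box",    {"bounding_box", "avg_area", NULL}},
    };

Columns whose type isn't listed would just fall back to whatever the
default behavior is (or to no statistics at all).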

            regards, tom lane
