Re: rtree indexes aren't being used with 7.0
From | Tom Lane
---|---
Subject | Re: rtree indexes aren't being used with 7.0
Date |
Msg-id | 28269.958416786@sss.pgh.pa.us
In response to | Re: rtree indexes aren't being used with 7.0 (Jeff Hoffmann <jeff@propertykey.com>)
List | pgsql-general
Jeff Hoffmann <jeff@propertykey.com> writes:
> i'm still trying to understand how cost estimates & selectivity work.
> i'm looking at a page in the programmers guide & i'm trying to figure
> out how it works for a normal (btree) index vs. how it would (or should)
> work for an rtree.

"Use the source, Luke ..." A lot of this low-level stuff is not adequately documented except in the source code.

There are three basic components to the cost estimate: an operator-specific selectivity routine, which tries to estimate the fraction of tuples that will be returned; a generic cost estimator for indexscans; and a cost estimator for the specific index type. The generic estimator tries to figure the costs for accesses to the table proper, given the selectivity fraction, while the index-specific estimator worries about the cost of accesses to the index. The generic cost routine is in backend/optimizer/path/costsize.c, and the other stuff is in backend/utils/adt/selfuncs.c and backend/utils/adt/geo_selfuncs.c.

Right now the "index-specific estimator" is no such thing; all the index types share a single default routine. If you know more about rtree access patterns than that code does, feel free to suggest improvements. But the real problem is of course the selectivity routine; for the geometric operators we've never had anything but stub routines that return a constant.

You might want to look at the scalar-comparison selectivity routines (selfuncs.c) to see how they work. Basically the idea for those is that VACUUM ANALYZE stores the min and max values of each column, and given an operator like "var < constant" we can do a linear interpolation between the min and max to estimate the selectivity. For example, if min=0, max=100, and the constant is 40, then we'd estimate 0.40 selectivity. (This works great as long as the data is reasonably uniformly distributed between the min and max, but falls down badly if it's not.
One of the items on our very long TODO list is to change VACUUM ANALYZE to store more extensive statistics so that we can be a little smarter...)

A corresponding kind of idea would be to have VACUUM store the bounding box, and perhaps other stats like the average box area, and then compare that to the specific box being probed with to estimate selectivity.

An open question: how to tell VACUUM that for box columns we want a bounding box and not min/max? I'm inclined to think that we should generalize the VACUUM ANALYZE mechanism so that it runs N aggregates over each column and stores the results, where the specific aggregates to use are looked up in a table based on the column datatype.

			regards, tom lane