Re: multi-column index

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: multi-column index
Дата
Msg-id 25832.1111083332@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: multi-column index  (Manfred Koizar <mkoi-pg@aon.at>)
Ответы Re: multi-column index
Список pgsql-performance
Manfred Koizar <mkoi-pg@aon.at> writes:
> On Wed, 16 Mar 2005 22:19:13 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> calculate the correlation explicitly for each index

> May be it's time to revisit an old proposal that has failed to catch
> anybody's attention during the 7.4 beta period:
> http://archives.postgresql.org/pgsql-hackers/2003-08/msg00937.php

> I'm not sure I'd store index correlation in a separate table today.
> You've invented something better for functional index statistics, AFAIR.

Well, the original motivation for calculating correlations on columns
was that historically, you didn't need to re-ANALYZE after creating an
index: the stats on the base table were already in place.  So the idea
was to have the correlations already available whether or not the index
existed.  This works fine for plain indexes on single columns ;-).  We
didn't realize (or at least I didn't) how poorly the per-column stats
apply to multi-column indexes.

I am coming around to the view that we really do need to calculate
index-specific correlation numbers, and that probably does need a
special table ... or maybe better, add a column to pg_index.  The column
in pg_statistic is useless and should be removed, because there isn't
any need for per-column correlation.

Now, as to the actual mechanics of getting the numbers: the above link
seems to imply reading the whole index in index order.  Which is a
hugely expensive proposition for a big index, especially one that's
grown rather than been built recently --- the physical and logical
orderings of the index will be different.  (Hm, maybe we need a stat
about the extent of disorder within the index itself?)  We need a way
to get the number from a small sample of pages.

The idea I was toying with was to recalculate the index keys for the
sample rows that ANALYZE already acquires, and then compare/sort
those.  This is moderately expensive CPU-wise though, and it's also not
clear what "compare/sort" means for non-btree indexes.

If we could get a correlation estimate by probing only a small fraction
of the index pages, that would work, but in a disordered index I'm not
sure how you figure out what you're looking at.

            regards, tom lane

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: cpu_tuple_cost
Следующее
От: Michael Fuhr
Дата:
Сообщение: Re: cpu_tuple_cost