Re: Correlation in cost_index()

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Correlation in cost_index()
Дата
Msg-id 1418.1060381541@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Correlation in cost_index()  (Sean Chittenden <sean@chittenden.org>)
Ответы Re: Correlation in cost_index()  (Manfred Koizar <mkoi-pg@aon.at>)
Список pgsql-hackers
Sean Chittenden <sean@chittenden.org> writes:
> indexCorrelation is 1.0 for the 1st key in a multi-column index.

... only if it's perfectly correlated.

> As things stand, however, if a multi-column key is
> used, the indexCorrelation is penalized by the size of the number of
> keys found in the multi-column index.  As things stand the qual
> user_id = 42, on a CLUSTER'ed multi-column index (user_id,utc_date)
> has an indexCorrelation of 0.5, when in fact the correlation is 1.0.

Right, in the perfectly-correlated case this calculation is clearly
wrong.  However, what of cases where the first column shows good
correlation with the physical ordering, but the second does not?

The nasty part of this is that the correlation stat that ANALYZE
computed for the second column is of no value to us.  Two examples:
X    Y                X    Y
A    A                A    BA    B                A    CA    C                A    AB    A                B    AB    B
             B    CB    C                B    BC    A                C    CC    B                C    AC    C
    C    B
 

In both cases ANALYZE will calculate correlation 1.0 for column X,
and something near zero for column Y.  We would like to come out with
index correlation 1.0 for the left-hand case and something much less
(but, perhaps, not zero) for the right-hand case.  I don't really see
a way to do this without actually examining the multi-column ordering
relationship during ANALYZE.

> I tossed a different index on my test table to see how well things
> fare with a low correlation, and this was a bit disturbing:

Seems like most of the error in that estimate has to do with the poor
rowcount estimation.  There's very little percentage in trying to
analyze the effect of index correlation in examples where we don't have
the first-order stats correct ...
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Sean Chittenden
Дата:
Сообщение: Re: Correlation in cost_index()
Следующее
От: Manfred Koizar
Дата:
Сообщение: Re: Correlation in cost_index()