Re: proposal : cross-column stats

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: proposal : cross-column stats
Дата
Msg-id 4D057ADD.3040305@fuzzy.cz
обсуждение исходный текст
Ответ на Re: proposal : cross-column stats  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: proposal : cross-column stats  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Dne 13.12.2010 01:05, Robert Haas napsal(a):
> This is a good idea, but I guess the question is what you do next.  If
> you know that the "applicability" is 100%, you can disregard the
> restriction clause on the implied column.  And if it has no
> implicatory power, then you just do what we do now.  But what if it
> has some intermediate degree of implicability?

Well, I think you've missed the e-mail from Florian Pflug - he actually
pointed out that the 'implicativeness' Heikki mentioned is called
conditional probability. And conditional probability can be used to
express the "AND" probability we are looking for (selectiveness).

For two columns, this is actually pretty straighforward - as Florian
wrote, the equation is
  P(A and B) = P(A|B) * P(B) = P(B|A) * P(A)

where P(B) may be estimated from the current histogram, and P(A|B) may
be estimated from the contingency (see the previous mails). And "P(A and
B)" is actually the value we're looking for.

Anyway there really is no "intermediate" degree of aplicability, it just
gives you the right estimate.

And AFAIR this is easily extensible to more than two columns, as
 P(A and B and C) = P(A and (B and C)) = P(A|(B and C)) * P(B and C)

so it's basically a recursion.

Well, I hope my statements are really correct - it's been a few years
since I gained my degree in statistics ;-)

regards
Tomas


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Rob Wultsch
Дата:
Сообщение: Re: ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED
Следующее
От: Robert Haas
Дата:
Сообщение: Re: proposal : cross-column stats