Re: Cross-column statistics revisited

From: Martijn van Oosterhout
Subject: Re: Cross-column statistics revisited
Date:
Msg-id: 20081017064140.GB1443@svana.org
In reply to: Re: Cross-column statistics revisited  (Greg Stark <greg.stark@enterprisedb.com>)
Responses: Re: Cross-column statistics revisited  (Gregory Stark <stark@enterprisedb.com>)
List: pgsql-hackers
On Fri, Oct 17, 2008 at 12:20:58AM +0200, Greg Stark wrote:
> Correlation is the wrong tool. In fact zip codes and city have nearly
> zero correlation.  Zip codes near 00000 are no more likely to be in
> cities starting with A than Z.

I think we need to define our terms better. In terms of linear
correlation you are correct. However, you can define invertible mappings
from zip codes and cities onto the integers which will then have an
almost perfect correlation.
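
To illustrate the point, here is a minimal Python sketch on made-up data.
The cities, the zip blocks assigned to them, and the helper pearson() are
all assumptions for illustration, not real postal data or anything in
PostgreSQL. It numbers the cities in two different invertible ways and
compares the resulting linear correlation with the zip code.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson (linear) correlation coefficient of two sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Hypothetical data: each city owns one contiguous block of 100 zip codes.
# The assignment is invented so that alphabetical order says nothing
# about where a city's zip codes fall.
city_blocks = {
    "Dover": 10000,
    "Austin": 20000,
    "Fargo": 30000,
    "Boston": 40000,
    "Eugene": 50000,
    "Camden": 60000,
}
rows = [(base + i, city) for city, base in city_blocks.items() for i in range(100)]
zips = [z for z, _ in rows]
cities = [c for _, c in rows]

# Mapping 1: number the cities alphabetically (the "A versus Z" view).
alpha_rank = {c: i for i, c in enumerate(sorted(set(cities)))}
naive = pearson(zips, [alpha_rank[c] for c in cities])

# Mapping 2: number the cities by their smallest zip code. This is still
# a one-to-one (invertible) mapping from city names onto integers.
min_zip = {}
for z, c in rows:
    min_zip[c] = min(z, min_zip.get(c, z))
zip_rank = {c: i for i, c in enumerate(sorted(min_zip, key=min_zip.get))}
remapped = pearson(zips, [zip_rank[c] for c in cities])

print(f"alphabetical city numbering: r = {naive:.3f}")
print(f"zip-ordered city numbering:  r = {remapped:.3f}")
```

On this toy data the alphabetical numbering gives a coefficient near zero
and the zip-ordered numbering gives one very close to 1, although the
underlying rows are identical. Real zip-to-city data is messier, so treat
this only as a demonstration of the argument.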

According to a paper I found, this is related to the "principle of
maximum entropy". The fact that you can't determine such functions
easily in practice doesn't change the fact that zip codes and city
names are highly correlated.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
