Re: proposal : cross-column stats

Поиск

Список

Период

Сортировка

От	Tomas Vondra
Тема	Re: proposal : cross-column stats
Дата	12 декабря 2010 г. 15:23:18
Msg-id	4D04F6F8.1040403@fuzzy.cz обсуждение исходный текст
Ответ на	Re: proposal : cross-column stats (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Список	pgsql-hackers

Дерево обсуждения

Dne 12.12.2010 15:43, Heikki Linnakangas napsal(a):
>> The classic failure case has always been: postcodes and city names.
>> Strongly correlated, but in a way that the computer can't easily see.
> 
> Yeah, and that's actually analogous to the example I used in my
> presentation.
> 
> The way I think of that problem is that once you know the postcode,
> knowing the city name doesn't add any information. The postcode implies
> the city name. So the selectivity for "postcode = ? AND city = ?" should
> be the selectivity of "postcode = ?" alone. The measurement we need is
> "implicativeness": How strongly does column A imply a certain value for
> column B. Perhaps that could be measured by counting the number of
> distinct values of column B for each value of column A, or something
> like that. I don't know what the statisticians call that property, or if
> there's some existing theory on how to measure that from a sample.

Yes, those issues are a righteous punishment for breaking BCNF rules ;-)

I'm not sure it's solvable using the contingency tables, as it requires
knowledge about dependencies between individual values (working with
cells is not enough, although it might improve the estimates).

Well, maybe we could collect these stats (number of cities for a given
ZIP code and number of ZIP codes for a given city). Collecting a good
stats about this is a bit tricky, but possible. What about collecting
this for the MCVs from both columns?

Tomas

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Tom Lane
Дата: 12 декабря 2010 г., 15:16:56
Сообщение: Re: [COMMITTERS] pgsql: Use symbolic names not octal constants for file permission flags

Следующее

От: Andrew Dunstan
Дата: 12 декабря 2010 г., 15:24:12
Сообщение: Re: function attributes

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: proposal : cross-column stats

Предыдущее

Следующее