Re: cross column correlation revisted

From: Yeb Havinga
Subject: Re: cross column correlation revisted
Msg-id: 4C3D9DAF.8040807@gmail.com
In reply to: Re: cross column correlation revisted (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses: Re: cross column correlation revisted (Joshua Tolley <eggyknap@gmail.com>)
List: pgsql-hackers

Heikki Linnakangas wrote:
> However, the problem is how to represent and store the 
> cross-correlation. For fields with low cardinality, like "gender" and 
> boolean "breast-cancer-or-not" you can count the prevalence of all the 
> different combinations, but that doesn't scale. Another often cited 
> example is zip code + street address. There's clearly a strong 
> correlation between them, but how do you represent that?
>
> For scalar values we currently store a histogram. I suppose we could 
> create a 2D histogram for two columns, but that doesn't actually help 
> with the zip code + street address problem.
In my head the neuron for 'principal component analysis' lit up while
reading this. Back in college we used it to prepare input data before
feeding it into a neural network. Maybe ideas from PCA could be helpful?
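
To make the idea a bit more concrete, here is a minimal sketch of what PCA
would measure for two numeric columns. It is only an illustration, not
anything PostgreSQL does: the function names are made up, and it assumes
the columns are numeric and that only linear correlation matters, so it
would not by itself handle the zip code + street address case. For two
columns the principal variances are the eigenvalues of the 2x2 covariance
matrix, which have a closed form.

```python
import math


def covariance_2d(xs, ys):
    """Sample variances and covariance of two equally long numeric columns."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy


def principal_variances(xs, ys):
    """Eigenvalues of the covariance matrix [[sxx, sxy], [sxy, syy]], largest first."""
    sxx, sxy, syy = covariance_2d(xs, ys)
    mean = (sxx + syy) / 2
    half_gap = math.hypot((sxx - syy) / 2, sxy)
    return mean + half_gap, mean - half_gap


def first_component_share(xs, ys):
    """Fraction of total variance explained by the first principal component.

    Close to 1.0: the columns are almost linearly dependent.
    Close to 0.5: no linear correlation (for columns of equal variance).
    """
    big, small = principal_variances(xs, ys)
    total = big + small
    return big / total if total > 0 else 1.0


if __name__ == "__main__":
    # Hypothetical sample data: the second column is exactly twice the first.
    print(first_component_share([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

A share near 1.0 would tell a planner that one of the two columns carries
almost no independent information, so multiplying their individual
selectivities would underestimate the combined one.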

regards,
Yeb Havinga



