Re: Thinking About Correlated Columns (again)

From: Gavin Flower
Subject: Re: Thinking About Correlated Columns (again)
Msg-id: 5193EE89.1090406@archidevsys.co.nz
In reply to: Re: Thinking About Correlated Columns (again)  (Heikki Linnakangas)
List: pgsql-performance


On 16/05/13 03:52, Heikki Linnakangas wrote:
> On 15.05.2013 18:31, Shaun Thomas wrote:
>> I've seen conversations on this since at least 2005. There were even
>> proposed patches every once in a while, but never any consensus. Anyone
>> care to comment?
>
> Well, as you said, there has never been any consensus.
>
> There are basically two pieces to the puzzle:
>
> 1. What metric do you use to represent correlation between columns?
>
> 2. How do you collect that statistic?
>
> Based on the prior discussions, collecting the stats seems to be tricky. It's not clear for which combinations of columns it should be collected (all possible combinations? That explodes quickly...), or how it can be collected without scanning the whole table.
>
> I think it would be pretty straightforward to use such a statistic, once we have it. So perhaps we should get started by allowing the DBA to set a correlation metric manually, and use that in the planner.
>
> - Heikki


How about PostgreSQL comparing the actual number of rows returned with the number the planner predicted and, if the discrepancy exceeds a specified threshold, starting to maintain correlation statistics? There is obviously more to it, such as whether the query is relevant enough to consider, and the size of the tables (there is no point in attempting to optimise tables with only 10 rows, for example).
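
To make the idea a little more concrete, here is a rough sketch in Python of what such a check might look like. It is purely illustrative and not how PostgreSQL works: the `PlanNodeSample` record, the `should_track` function, the 10x error threshold and the 10,000-row minimum table size are all assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass
class PlanNodeSample:
    """One observed plan node. Every field name here is hypothetical."""
    columns: tuple          # columns used together in the predicate
    estimated_rows: float   # row count the planner predicted
    actual_rows: float      # row count actually returned
    table_rows: int         # approximate size of the underlying table


def misestimate_ratio(estimated: float, actual: float) -> float:
    """Symmetric error factor: 1.0 is a perfect estimate, 10.0 is off by 10x."""
    est = max(estimated, 1.0)  # clamp so zero-row cases do not divide by zero
    act = max(actual, 1.0)
    return max(est / act, act / est)


def should_track(sample: PlanNodeSample,
                 ratio_threshold: float = 10.0,   # assumed threshold
                 min_table_rows: int = 10_000) -> bool:  # assumed cutoff
    """Decide whether this column combination is worth collecting stats for."""
    if sample.table_rows < min_table_rows:
        return False  # tiny tables are not worth optimising
    if len(sample.columns) < 2:
        return False  # correlation needs at least two columns
    error = misestimate_ratio(sample.estimated_rows, sample.actual_rows)
    return error >= ratio_threshold


samples = [
    PlanNodeSample(("city", "zip_code"), 12, 4800, 2_000_000),
    PlanNodeSample(("city", "zip_code"), 5, 3, 10),
    PlanNodeSample(("status",), 100, 90_000, 500_000),
]
candidates = [s.columns for s in samples if should_track(s)]
print(candidates)
```

Only the first sample qualifies here: the second table is too small and the third involves a single column. Whether a query is "relevant" (how often it runs, say) would need a further filter that this sketch leaves out.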


Cheers,
Gavin
