Re: proposal : cross-column stats

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: proposal : cross-column stats
Дата
Msg-id 4D06A210.8070009@fuzzy.cz
обсуждение исходный текст
Ответ на Re: proposal : cross-column stats  (Josh Berkus <josh@agliodbs.com>)
Ответы Re: proposal : cross-column stats  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Dne 13.12.2010 22:50, Josh Berkus napsal(a):
> Tomas,
> 
>>   (a) find out what statistics do we need to collect and how to use it
>>   (b) implement a really stupid inefficient solution
>>   (c) optimize in iterations, i.e. making it faster, consuming less
>>       space etc.
> 
> I'll suggest again how to decide *which* columns to cross: whichever
> columns are combined in composite indexes.  In version 2, allow the DBA
> to specify combinations.
> 
> In the unlikely event that correlation could be reduced to a single
> float number, it would be conceivable for each column to have an array
> of correlation stats for every other column where correlation was
> non-random; on most tables (i.e. ones with less than 100 columns) we're
> not talking about that much storage space.
> 
> The main cost would be the time spent collecting that info ...

I think this is a bit early to discuss this, given the fact that we
don't have a working solution yet. But OK, let's discuss these options
anyway

1) collecting the automatically for composite indexes
  I don't think this is wise idea. The first versions definitely won't  be very efficient, and collecting the data for
eachcomposite  index means everyone will be hit by this inefficiency, even if he  actually does not need that (e.g. the
columnsare independent so the  current estimates are quite accurate or he's not using those columns  very often in the
sameWHERE clause).
 
  Another reason against this is that many DBAs don't actually use  composed indexes - they simply create indexes on
eachcolumn and let  the bitmap index scan to work it out. And this would not work for  this case.
 
  And actually it's not very complicated to allow the DBA to do this,  this can be a quite simple PL/pgSQL procedure.

2) collecting correlation for each pair of columns
  Again, you're effectively forcing everyone to pay the price even  though he may not need the feature. Maybe we'll get
thereone day,  but it's not a good idea to do that from the beginning.
 
  And the correlation itself has a very limited use in real life, as  it's not possible to compute it for character
columnsand is not  very useful in case of some numerical columns (e.g. ZIP codes).
 

regards
Tomas


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: hstores in pl/python
Следующее
От: Pavel Stehule
Дата:
Сообщение: Re: proposal : cross-column stats