Re: proposal : cross-column stats

From: Heikki Linnakangas
Subject: Re: proposal : cross-column stats
Date:
Msg-id: 4D0BD8FF.2070300@enterprisedb.com
In response to: Re: proposal : cross-column stats  (Tomas Vondra <tv@fuzzy.cz>)
Responses: Re: proposal : cross-column stats  (Tomas Vondra <tv@fuzzy.cz>)
List: pgsql-hackers
On 17.12.2010 23:13, Tomas Vondra wrote:
> On 17.12.2010 19:58, Robert Haas wrote:
>> I haven't read the paper yet (sorry) but just off the top of my head,
>> one possible problem here is that our n_distinct estimates aren't
>> always very accurate, especially for large tables.  As we've discussed
>> before, making them accurate requires sampling a significant
>> percentage of the table, whereas all of our other statistics can be
>> computed reasonably accurately by sampling a fixed amount of an
>> arbitrarily large table.  So it's possible that relying more heavily
>> on n_distinct could turn out worse overall even if the algorithm is
>> better.  Not sure if that's an issue here, just throwing it out
>> there...
>
> Yes, you're right - the paper really is based on (estimates of) number
> of distinct values for each of the columns as well as for the group of
> columns.
>
> AFAIK it will work with reasonably precise estimates, but the point is
> you need an estimate of distinct values of the whole group of columns.
> So when you want to get an estimate for queries on columns (a,b), you
> need the number of distinct value combinations of these two columns.
>
> And I think we're not collecting this right now, so this solution
> requires scanning the table (or some part of it).

Any idea how sensitive it is to the accuracy of that estimate on 
distinct value combinations? If we get that off by a factor of ten or a 
hundred, what kind of an effect does it have on the final cost estimates?
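
To make the question concrete, here is a minimal sketch. It assumes the simplest possible model, which is not taken from the paper: every distinct (a,b) combination is equally frequent, so an equality query on both columns is estimated to return total_rows / ndistinct(a,b) rows. The function names and the numbers in the usage note are hypothetical.

```python
def estimated_rows(total_rows: float, ndistinct_ab: float) -> float:
    """Row estimate for (a = x AND b = y), ASSUMING all distinct
    (a, b) combinations are equally frequent (not the paper's estimator)."""
    return total_rows / ndistinct_ab


def row_estimate_ratio(total_rows: float, true_ndistinct: float,
                       error_factor: float) -> float:
    """Ratio of the row estimate computed from a mis-estimated
    ndistinct(a, b) to the one computed from the true value.

    error_factor > 1 means ndistinct was over-estimated,
    error_factor < 1 means it was under-estimated.
    """
    good = estimated_rows(total_rows, true_ndistinct)
    bad = estimated_rows(total_rows, true_ndistinct * error_factor)
    return bad / good
```

Under that assumption the error passes straight through: for a hypothetical table of 1,000,000 rows with 1,000 true combinations, row_estimate_ratio(1_000_000, 1_000, 10) shows the row estimate shrinking by the same factor of ten that ndistinct was over-estimated by. Whether the estimator in the paper is equally sensitive is exactly what I'm asking.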

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

