Re: proposal : cross-column stats

From Tomas Vondra
Subject Re: proposal : cross-column stats
Date
Msg-id 4D0BD277.3060105@fuzzy.cz
In response to Re: proposal : cross-column stats  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: proposal : cross-column stats  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: proposal : cross-column stats  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
On 17.12.2010 19:58, Robert Haas wrote:
> I haven't read the paper yet (sorry) but just off the top of my head,
> one possible problem here is that our n_distinct estimates aren't
> always very accurate, especially for large tables.  As we've discussed
> before, making them accurate requires sampling a significant
> percentage of the table, whereas all of our other statistics can be
> computed reasonably accurately by sampling a fixed amount of an
> arbitrarily large table.  So it's possible that relying more heavily
> on n_distinct could turn out worse overall even if the algorithm is
> better.  Not sure if that's an issue here, just throwing it out
> there...

Yes, you're right - the paper really is based on (estimates of) the
number of distinct values for each of the columns as well as for the
group of columns.

AFAIK it will work with reasonably precise estimates, but the point is
you need an estimate of the number of distinct values for the whole
group of columns. So when you want to get an estimate for queries on
columns (a,b), you need the number of distinct value combinations of
these two columns.
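
To make the point about (a,b) concrete, here is a minimal sketch in
Python. It is not the estimator from the paper; the table contents, the
ndistinct() helper and the uniformity assumption are all made up for
illustration. It only shows why the per-column distinct counts alone
are not enough when the columns are correlated.

```python
def ndistinct(rows, cols):
    """Count distinct value combinations over the given column indexes."""
    return len({tuple(row[c] for c in cols) for row in rows})

# Hypothetical table with two perfectly correlated columns: b = 2 * a.
rows = [(i % 10, (i % 10) * 2) for i in range(1000)]

nd_a = ndistinct(rows, [0])      # distinct values of a
nd_b = ndistinct(rows, [1])      # distinct values of b
nd_ab = ndistinct(rows, [0, 1])  # distinct (a, b) combinations

# Selectivity of "a = x AND b = y" for a combination that occurs in the
# table, assuming values are uniformly distributed (an assumption of
# this sketch).
sel_independent = (1 / nd_a) * (1 / nd_b)  # treats the columns as independent
sel_group = 1 / nd_ab                      # uses the group distinct count

# True fraction of rows matching one existing combination, for comparison.
sel_actual = sum(1 for r in rows if r == (3, 6)) / len(rows)

print(nd_a, nd_b, nd_ab, sel_independent, sel_group, sel_actual)
```

With these made-up rows the independence assumption is off by a factor
of ten, while the group distinct count matches the actual fraction.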

And I think we're not collecting this right now, so this solution
requires scanning the table (or some part of it).

I know this is a weak point of the whole solution, but the truth is
every cross-column stats solution will have to do something like this. I
don't think we'll find a solution with zero performance impact, one that
does not need to scan a sufficient part of the table.
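
A rough sketch of why a small fixed-size sample is not enough for
distinct counts (Python, with a made-up table; the row counts and the
sample size are assumptions of this example, and the count is taken
naively from the sample without any extrapolation):

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical table: 100,000 rows holding 20,000 distinct (a, b) pairs,
# each pair repeated 5 times.
table = [(i % 200, (i // 200) % 100) for i in range(100_000)]

full_count = len(set(table))          # requires reading every row
sample = random.sample(table, 1_000)  # fixed-size sample (1% of the table)
sample_count = len(set(sample))       # can never exceed the sample size

print(full_count, sample_count)
```

The naive count from the sample is bounded by the sample size, so it
cannot come close to the 20,000 combinations in the full table. Any
estimator has to extrapolate from the sample, and that extrapolation is
where the imprecision comes from.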

That's why I want to make this optional, so that users enable it only
when it is really needed.

Anyway, one possible solution might be to allow the user to set these
values manually (as in the case when the ndistinct estimates are not
precise).

regards
Tomas

