Re: proposal : cross-column stats

Поиск
Список
Период
Сортировка
От Simon Riggs
Тема Re: proposal : cross-column stats
Дата
Msg-id 1292790103.1193.9461.camel@ebony
обсуждение исходный текст
Ответ на Re: proposal : cross-column stats  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: proposal : cross-column stats  (Tomas Vondra <tv@fuzzy.cz>)
Список pgsql-hackers
On Mon, 2010-12-13 at 10:38 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Sun, Dec 12, 2010 at 9:16 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> >> The proposed solution is based on contingency tables, built for selected
> >> groups of columns (not for each possible group). And the contingency
> >> table gives you the ability to estimate the probabilities needed to
> >> compute the selectivity. Or am I missing something?
> 
> > Well, I'm not real familiar with contingency tables, but it seems like
> > you could end up needing to store a huge amount of data to get any
> > benefit out of it, in some cases.
> 
> The reason that this wasn't done years ago is precisely that nobody's
> figured out how to do it with a tolerable amount of stats data and a
> tolerable amount of processing time (both at ANALYZE time and during
> query planning).  It's not hard to see what we'd ideally like to do;
> it's getting from there to something useful in production that's hard.

I think we have to face up to the fact that attempting to derive
meaningful cross-column stats will require larger sample sizes.

If we collect everything we're going to have ~10^9 stats slots with
default stats_target 100 and a 100 column table.

We should disconnect sample size from histogram size, and we need to
make the initial column pairings vastly fewer than all combinations.
Manual specification seems like it will be required for the cases where
we decide not to include it automatically, so it seems we'll need manual
specification anyway. In that case, we should do manual specification
first.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Magnus Hagander
Дата:
Сообщение: Re: pg_ctl and port number detection
Следующее
От: 3dmashup
Дата:
Сообщение: Re: Amazon now supporting GPU focused EC2 instances