Re: pg_upgrade and statistics

Поиск

Список

Период

Сортировка

От	Ants Aasma
Тема	Re: pg_upgrade and statistics
Дата	16 марта 2012 г. 07:36:25
Msg-id	CA+CSw_tEYivSXMLqr7j0S298Mgg1FciLVfnSd=DgTEHt-UnDNQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: pg_upgrade and statistics (Alvaro Herrera <alvherre@commandprompt.com>)
Список	pgsql-hackers

Дерево обсуждения

On Thu, Mar 15, 2012 at 8:48 PM, Alvaro Herrera <alvherre@commandprompt.com>
> What Peter proposed seems to me pretty reasonable, in the sense that it
> should be possible to come up with a function that creates some text
> representation of whatever is in pg_statistic, and another function to
> load that data into the new catalog(s).  There's no need to keep
> pg_statistic binary-compatible, or even continue to have only
> pg_statistic (IIRC Zoltan/Hans-Jurgen patch for cross-column stats adds
> a new catalog, pg_statistic2 or similar).  If the upgrade target release
> has room for more/improved stats, that's fine -- they'll be unused after
> loading the stats from the dump.  And the old stats can be
> reacommodated, if necessary.

I have been reading up on selectivity estimation research for the last
few days. I must say that I also think that having a text
representation as an intermediate won't create a huge maintenance
burden. The basic concepts that are there are pretty solid.
Conceptually MCV's and histograms continue to be essential even with
the more complex approaches. Trying to maintain binary compatibility
is probably a bad idea, as Tom noted with the array selectivity patch
- encoding of the information could be better. But given a textual
format it won't be too hard to just massage the data to the new
format. Making it possible to dump and load stats has the additional
bonus of enabling more experimentation with custom stats collectors.
One could easily prototype the stats collection with R, scipy, etc. Of
course the proof will be in the pudding.

Re, the patch, current posted WIP cross-col patch doesn't create a new
catalog,. It repurposes the stat slots mechanism to store multiple
dimensions. But I'll most likely rewrite it to use a separate catalog
because the storage requirements are rather different. I'll post a
proposal in the appropriate thread when I have decently clear idea how
this should work. One thing that seems clear is that multi-dimensional
histograms will want this mechanism even more, optimal histogram
construction is NP-hard in the multi-dimensional case and so people
will want to try different algorithms, or make different tradeoffs on
effort spent on constructing the histogram. Or even build one by hand.

Cheers,
Ants Aasma

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Daniel Farina
Дата: 16 марта 2012 г., 06:02:12
Сообщение: Re: pg_terminate_backend for same-role

Следующее

От: Yeb Havinga
Дата: 16 марта 2012 г., 07:44:48
Сообщение: Re: [v9.2] Add GUC sepgsql.client_label

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: pg_upgrade and statistics

Предыдущее

Следующее