Re: proposal : cross-column stats

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: proposal : cross-column stats
Дата
Msg-id 4D06727E.7030009@fuzzy.cz
обсуждение исходный текст
Ответ на Re: proposal : cross-column stats  (Joshua Tolley <eggyknap@gmail.com>)
Ответы Re: proposal : cross-column stats  (Tomas Vondra <tv@fuzzy.cz>)
Список pgsql-hackers
Dne 13.12.2010 18:59, Joshua Tolley napsal(a):
> On Sun, Dec 12, 2010 at 07:10:44PM -0800, Nathan Boley wrote:
>> Another quick note: I think that storing the full contingency table is
>> wasteful since the marginals are already stored in the single column
>> statistics. Look at copulas [2] ( FWIW I think that Josh Tolley was
>> looking at this a couple years back ).
> 
> Josh Tolley still looks at it occasionally, though time hasn't permitted any
> sort of significant work for quite some time. The multicolstat branch on my
> git.postgresql.org repository will create an empirical copula each
> multi-column index, and stick it in pg_statistic. It doesn't yet do anything
> useful with that information, nor am I convinced it's remotely bug-free. In a
> brief PGCon discussion with Tom a while back, it was suggested a good place
> for the planner to use these stats would be clausesel.c, which is responsible
> for handling code such as "...WHERE foo > 4 AND foo > 5".

Well, that's good news ;-)

I've done a bit of research today, and I've found some quite interesting
papers on this topic (probably, I did not have time to read them, in
most cases I've read just the title and abstract).

[1] Selectivity estimators for multidimensional range queries over real   attributes
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.122.914
   This seems like a very good starting point. AFAIK it precisely   describes what data need to be collected, how to do
themath etc.
 

[2] Selectivity Estimation Without the Attribute Value Independence   Assumption
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.105.8126
   This obviously deals with the independence problem. Haven't   investigated it further, but it seems worth to read.

[3] On Analyzing the Cost of Queries with Multi-Attribute Restrictions   and Sort Operations (A Cost Function for
UniformlyPartitioned   UB-Trees)   http://mistral.in.tum.de/results/publications/MB00_ideas.pdf
 
   Describes something called UB-Tree, and shows how it may be used to   do estimates. Might be interesting as an
alternativeto the   traditional histograms.
 
   There are more details about UB-Trees at
   http://mistral.in.tum.de/results/publications/

[4] http://www.dbnet.ece.ntua.gr/~nikos/edith/qopt_bibl/
   A rather nice collection of papers related to estimation (including   some of the papers listed above).

Hm, I planned to finally read the "Understanding MySQL Internals" over
the Xmas ... that obviously won't happen.

regards
Tomas


В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Fetter
Дата:
Сообщение: Re: CommitFest wrap-up
Следующее
От: Robert Haas
Дата:
Сообщение: Re: CommitFest wrap-up