Re: Does "correlation" mislead the optimizer on large
From | Ron Mayer |
---|---|
Subject | Re: Does "correlation" mislead the optimizer on large |
Date | |
Msg-id | Pine.LNX.4.44.0301241417140.4023-100000@localhost.localdomain |
In response to | Re: Does "correlation" mislead the optimizer on large (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-performance |
On Fri, 24 Jan 2003, Tom Lane wrote:
>
> Ron Mayer <ron@intervideo.com> writes:
> > A proposal.... (yes, I'm volunteering if people point me in the right
> > direction)...
>
> I do not think ANALYZE is the problem here; at least, it's premature to
> worry about that end of things until you've defined (a) what's to be
> stored in pg_statistic, and (b) what computation the planner needs to
> make to derive a cost estimate given the stats.

Cool. Thanks for a good starting point. If I wanted to brainstorm
further, should I do so here, or should I encourage interested people
to take it offline with me (ron@intervideo.com) and I can post a
summary of the conversation?

    Ron

For those who do want to brainstorm with me, my starting point is this:

With my particular table, I think the main issue is still that I have
a lot of data that looks like:

    values:    aaaaaaaaaaabbbbbbbbccccccccddddddddddaaaabbbbbbbccccccccddddd...
    disk page: |page 1|page 2|page 3|page 4|page 5|page 6|page 7|page 8|page 9|

The problem I'm trying to address is that the current planner guesses
that most of the pages will need to be read; however, the local
clustering means that in fact only a small subset need to be accessed.

My first guess is that modifying the definition of "correlation" to
account for page sizes would be a good approach.

I.e., instead of the correlation across the whole table, for each row
perform an auto-correlation
(http://astronomy.swin.edu.au/~pbourke/analysis/correlate/)
and keep only the values with a "delay" of less than 1 page-size.

If you want to share thoughts offline (ron@intervideo.com), I'll gladly
post a summary of responses here to save the bandwidth of the group.
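
To make the local-clustering point concrete, here is a toy sketch in Python. It is not PostgreSQL's "correlation" statistic and not an implementation of the proposal above; the function names, the page capacity of 8 rows, and the sample data are all assumptions made for illustration. It compares how many pages hold a given value under two physical layouts, and computes a crude "same value within one page-size delay" fraction as a stand-in for a page-limited auto-correlation.

```python
def pages_touched(values, rows_per_page, target):
    """Count the distinct pages that hold at least one row equal to target."""
    return len({i // rows_per_page for i, v in enumerate(values) if v == target})


def local_match_fraction(values, rows_per_page):
    """Fraction of row pairs less than one page apart that share a value.

    A crude stand-in for an auto-correlation restricted to delays
    1 .. rows_per_page - 1 (hypothetical measure, for illustration only).
    """
    matches = 0
    pairs = 0
    for lag in range(1, rows_per_page):
        for i in range(len(values) - lag):
            pairs += 1
            matches += values[i] == values[i + lag]
    return matches / pairs if pairs else 0.0


ROWS_PER_PAGE = 8  # assumed page capacity, chosen only for illustration

# Locally clustered: runs of equal values, with the whole pattern repeating.
clustered = list(("a" * 8 + "b" * 8 + "c" * 8 + "d" * 8) * 2)
# Same value counts, but spread evenly so every page holds every value.
interleaved = list("abcd" * 16)

for name, rows in (("clustered", clustered), ("interleaved", interleaved)):
    print(
        name,
        "pages holding 'a':", pages_touched(rows, ROWS_PER_PAGE, "a"),
        "local match fraction:", round(local_match_fraction(rows, ROWS_PER_PAGE), 3),
    )
```

Both layouts contain 16 rows of 'a' out of 64, yet the clustered layout keeps them on 2 of the 8 pages while the interleaved layout spreads them over all 8. The page-limited measure separates the two cases, which is the kind of signal a whole-table correlation would miss when the pattern repeats.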