This is yet another issue entirely. This is about estimating how much
io will be random io if we do an index order scan. Correlation is a
passable tool for this but we might be able to do better.
But it has nothing to do with the cross-column stats problem.
greg
On 17 Oct 2008, at 01:29 AM, Ron Mayer <rm_pg@cheapcomplexdevices.com>
wrote:
> Josh Berkus wrote:
>>> Yes, or to phrase that another way: What kinds of queries are being
>>> poorly optimized now and why?
>> Well, we have two different correlation problems. One is the
>> problem of dependant correlation, such as the 1.0 correlation of
>> ZIP and CITY fields as a common problem. This could in fact be
>> fixed, I believe, via a linear math calculation based on the
>> sampled level of correlation, assuming we have enough samples. And
>> it's really only an issue if the correlation is
>>> 0.5.
>
> I'd note that this can be an issue even without 2 columns involved.
>
> I've seen a number of tables where the data is loaded in batches
> so similar-values from a batch tend to be packed into relatively few
> pages.
>
> Thinks a database for a retailer that nightly aggregates data from
> each of many stores. Each incoming batch inserts the store's data
> into tightly packed disk pages where most all rows on the page are for
> that store. But those pages are interspersed with pages from other
> stores.
>
> I think I like the ideas Greg Stark had a couple years ago:
> http://archives.postgresql.org/pgsql-hackers/2006-09/msg01040.php
> "...sort the sampled values by value
> and count up the average number of distinct blocks per value.... Or
> perhaps we need a second histogram where the quantities are of
> distinct pages rather than total records.... We might also need a
> separate "average number of n-block spans per value"
> since those seem to me to lead more directly to values like "blocks
> that need to be read".
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers