> Zeugswetter Andreas IZ5 <Andreas.Zeugswetter@telecom.at> writes:
> > Other db's usually use the value count(*) / nunique for the light
> > weight statistics. This makes the assumptoin that the distinct index
> > values are evenly distributed. That is on average a correct
> > assumption, whereas our assumption on average overestimates the number
> > of rows returned. I am not sure we have a nunique info though.
>
> We don't, and AFAICS it would be an expensive statistic to compute.
>
> I have thought about this a little more overnight, and I have come up
> with what I think is a better idea. Suppose that VACUUM ANALYZE stores
> in pg_statistic not only the disbursion, but also the most frequently
> occurring value of each column. It already computes (or I should say
> estimates) the most frequently occurring value (MFOV) in order to arrive
> at the disbursion, so storing the value costs nothing except a little
> more space in pg_statistic. Now, the logic that eqsel() should use is
>
> if constant-being-compared-against == MFOV then
> return disbursion;
> else
> return MIN(disbursion, 1.0 - disbursion);
Yes, I like this.
> BTW, this argument proves rigorously that the selectivity of a search
> for any value other than the MFOV is not more than 0.5, so there is some
> basis for my intuition that eqsel should not return a value above 0.5.
> So, in the cases where eqsel does not know the exact value being
> searched for, I'd still be inclined to cap its result at 0.5.
I don't follow this. If the most frequent value occurs 95% of the time,
wouldn't the selectivity be 0.95? And why use 1-disbursion. You may
find the existing code does a better job than the MIN() computation.
-- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610)
853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill,
Pennsylvania19026