Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect)

Поиск
Список
Период
Сортировка
От Bruce Momjian
Тема Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect)
Дата
Msg-id 199907281606.MAA11774@candle.pha.pa.us
обсуждение исходный текст
Ответ на Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect)  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect)  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
> Zeugswetter Andreas IZ5 <Andreas.Zeugswetter@telecom.at> writes:
> > Other db's usually use the value count(*) / nunique for the light
> > weight statistics.  This makes the assumptoin that the distinct index
> > values are evenly distributed.  That is on average a correct
> > assumption, whereas our assumption on average overestimates the number
> > of rows returned.  I am not sure we have a nunique info though.
> 
> We don't, and AFAICS it would be an expensive statistic to compute.
> 
> I have thought about this a little more overnight, and I have come up
> with what I think is a better idea.  Suppose that VACUUM ANALYZE stores
> in pg_statistic not only the disbursion, but also the most frequently
> occurring value of each column.  It already computes (or I should say
> estimates) the most frequently occurring value (MFOV) in order to arrive
> at the disbursion, so storing the value costs nothing except a little
> more space in pg_statistic.  Now, the logic that eqsel() should use is
> 
>     if constant-being-compared-against == MFOV then
>         return disbursion;
>     else
>         return MIN(disbursion, 1.0 - disbursion);

Yes, I like this.

> BTW, this argument proves rigorously that the selectivity of a search
> for any value other than the MFOV is not more than 0.5, so there is some
> basis for my intuition that eqsel should not return a value above 0.5.
> So, in the cases where eqsel does not know the exact value being
> searched for, I'd still be inclined to cap its result at 0.5.

I don't follow this. If the most frequent value occurs 95% of the time,
wouldn't the selectivity be 0.95?  And why use 1-disbursion.  You may
find the existing code does  a better job than the MIN() computation.


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "D'Arcy" "J.M." Cain
Дата:
Сообщение: Re: [HACKERS] Checking if a system is ELF
Следующее
От: Oleg Bartunov
Дата:
Сообщение: SYSLOGD facility