Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect)

Поиск
Список
Период
Сортировка
От Bruce Momjian
Тема Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect)
Дата
Msg-id 199907290021.UAA24604@candle.pha.pa.us
обсуждение исходный текст
Ответ на Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect)  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect)  (Philip Warner <pjw@rhyme.com.au>)
Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect)  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> >> BTW, this argument proves rigorously that the selectivity of a search
> >> for any value other than the MFOV is not more than 0.5, so there is some
> >> basis for my intuition that eqsel should not return a value above 0.5.
> >> So, in the cases where eqsel does not know the exact value being
> >> searched for, I'd still be inclined to cap its result at 0.5.
> 
> > I don't follow this. If the most frequent value occurs 95% of the time,
> > wouldn't the selectivity be 0.95?
> 
> If you are searching for the most frequent value, then the selectivity
> estimate should indeed be 0.95.  If you are searching for anything else,
> the selectivity estimate ought to be 0.05 or less.  If you don't know
> what value you will be searching for, which number should you use?

You are going to love this:
0.95 * 0.95 + 0.05 * 0.05

This is because with a 95% of one value, you would think the ask for
that value 95% of the time, and another value 5% of the time.  The last
0.05 is not really accurate.  It assumes there are only two unique
values in the table, which may be wrong, but it is close enough.

> 
> The unsupported assumption here is that if the table contains 95%
> occurrence of a particular value, then the odds are also 95% (or at
> least high) that that's the value you are searching for in any given
> query that has an "= something" WHERE qual.

Yes.

> That assumption is pretty reasonable in some cases (such as your
> example earlier of "WHERE state = 'PA'" in a Pennsylvania-local
> database), but it falls down badly in others, such as where the
> most common value is NULL or an empty string or some other indication
> that there's no useful data.  In that sort of situation it's actually
> pretty unlikely that the user will be searching for field =
> most-common-value ... but the system probably has no way to know that.

Well, if null is most common, it is very probable they would be looking
for col IS NULL.

> I wonder whether it would help to add even more data to pg_statistic.
> For example, suppose we store the fraction of the columns that are NULL,
> plus the most frequently occurring *non null* value, plus the fraction
> of the columns that are that value.  This would allow us to be very
> smart about columns in which "no data" is represented by NULL (as a good
> DB designer would do):

That would be nice.

> 
> selectivity of "IS NULL": NULLfraction
> 
> selectivity of "IS NOT NULL": 1 - NULLfraction
> 
> selectivity of "= X" for a known non-null constant X:
>     if X == MFOV: MFOVfraction
>     else: MIN(MFOVfraction, 1-MFOVfraction-NULLfraction)
> 
> selectivity of "= X" when X is not known a priori, but presumably is not
> null:
>     MIN(MFOVfraction, 1-NULLfraction)
> 
> Both of the MIN()s are upper bounds, so multiplying them by a
> fudge-factor < 1 would be reasonable.

Yes, I am with you here.

> These rules would guarantee small selectivity values when either
> MFOVfraction or 1-NULLfraction is small.  It still wouldn't cost
> much, since I believe VACUUM ANALYZE is counting nulls already...

Yes, it is.  Sounds nice.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] pg_dump not dumping all tables
Следующее
От: The Hermit Hacker
Дата:
Сообщение: Re: [HACKERS] pg_dump not dumping all tables