Re: COUNT(*) and index-only scans

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: COUNT(*) and index-only scans
Дата
Msg-id CA+TgmobFbgQgwzM64xFmmcWZUCEwXi_vJEuSqjbo7S3yFHzigQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: COUNT(*) and index-only scans  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: COUNT(*) and index-only scans
Список pgsql-hackers
On Wed, Oct 12, 2011 at 10:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Oct 12, 2011 at 9:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> What bothers me considerably more is the issue about how specific
>>> queries might see an all-visible fraction that's very substantially
>>> different from the table's overall ratio,
>
>> - Suppose VACUUM processes the table and makes it all-visible.  Then,
>> somebody comes along and updates one tuple on every page, making them
>> all not-all-visible, but not trigger VACUUM because we're nowhere
>> close the 20% threshold.  Now COUNT(*) will think it should use an
>> index-scan, but really... not so much.  In fact, even if it's only
>> that a tuple has been updated on 25% of the pages, we're probably in
>> trouble.
>
> Yeah, but that would be a pretty unlucky pattern, and in any case the
> fix for it is going to be to make autovacuum more aggressive.

Hmm, maybe.

>> - Suppose the table has a million rows and we're going to read 100 of
>> them, or 0.01%.  Now it might appear that a covering index has a
>> negligible advantage over a non-covering index, but in fact I think we
>> still want to err on the side of trying to use the covering index.
>
> Given that fact pattern we still will, I think.  We'll still prefer an
> indexscan over a seqscan, for sure.  In any case, if you believe the
> assumption that those 100 rows are more likely to be recently-dirtied
> than the average row, I'm not sure why you think we should be trying to
> force an assumption that index-only will succeed here.

I'm not concerned about an index scan vs. a sequential scan here.  I'm
concerned about it being impossible for the DBA to get an index-only
scan when s/he wants it very badly.  The current (stupid) formula
handles this case just about perfectly - it will prefer a smaller
index over a larger one, except when a covering index is available, in
which case it will prefer the smallest covering index.  That sounds
exactly right to me.  We get that behavior because the 10% of heap
fetches that we're assuming we'll get to skip is larger than the
penalty for using a bigger index.  If we take out 10% and replace it
by all_visible_percentage * fraction_of_tuples_fetched, then that 10%
is going to drop to some infinitesmally small value on single row
fetches from giant tables.  But that's exactly one of the cases for
which people want index-only scans in the first place.  It's no better
to be overly pessimistic here than it is to be overly optimistic.  If
the table is 90% all-visible, the probability of our finding an
all-visible row is probably not 90%.  But it's probably not 0.01% or
0.0001% either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [BUGS] *.sql contrib files contain unresolvable MODULE_PATHNAME
Следующее
От: Tom Lane
Дата:
Сообщение: Re: COUNT(*) and index-only scans