Re: COUNT(*) and index-only scans

From: Robert Haas
Subject: Re: COUNT(*) and index-only scans
Date:
Msg-id: CA+TgmoaWaPg8jp443yV6eG0H9EY-Zj1-KDyJWvFJ=430DxzOdQ@mail.gmail.com
In reply to: Re: COUNT(*) and index-only scans  (Bruce Momjian <bruce@momjian.us>)
Responses: Re: COUNT(*) and index-only scans
Re: COUNT(*) and index-only scans
List: pgsql-hackers
On Tue, Oct 11, 2011 at 12:43 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Greg Stark wrote:
>> On Mon, Oct 10, 2011 at 9:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > My intention was to allow it to consider any covering index.  You're
>> > thinking about the cost estimate, which is really entirely different.
>> >
>>
>> Is there any reason to consider more than one? I would have expected
>> the narrowest one to be the best choice. There's something to be said
>> for using the same index consistently but we already have that problem
>> and make no attempt to do that. And partial indexes might be better
>> but then we would already be considering them if their constraints are
>> satisfied.
>
> Actually, I think the smallest non-partial one on disk might be the best
> --- that is very easy to find out.

I doubt there is any need to write special-purpose code to decide
which index ought to be used for a full table scan.  We can just throw
all of the otherwise-useless indexes into the costing machinery with
empty pathkeys, and let them duke it out.  All but the best one will
be instantly discarded, and the best one will either beat or lose to a
sequential scan.  All of this will happen before we start trying to
build join paths, so there's no combinatorial explosion in planning
time - it'll just be a straightforward cost comparison between plans
with identical pathkeys, and should be quite fast.
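
To make that concrete (just a sketch; the table and index names below are
made up for illustration), with two indexes that both "cover" a bare
COUNT(*), the planner only has to cost an index-only scan on each plus a
seqscan and keep the cheapest, which will usually be the narrower index:

-- Hypothetical example: two covering indexes of different widths.
CREATE TABLE t (id int, payload text);
CREATE INDEX t_id_idx ON t (id);                    -- narrow
CREATE INDEX t_id_payload_idx ON t (id, payload);   -- wide
INSERT INTO t SELECT g, repeat('x', 100) FROM generate_series(1, 100000) g;
VACUUM ANALYZE t;   -- set visibility-map bits, gather stats

-- All candidate paths have empty pathkeys, so this is a plain cost
-- comparison; the narrow index's index-only scan typically wins once
-- the table is mostly all-visible.
EXPLAIN SELECT count(*) FROM t;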

The real issue is that the costing estimates need to be accurate, and
that's where the rubber hits the road.  Otherwise, even if we pick the
right way to scan the table, we may do silly things up the line when
we go to start constructing the join order.  I think we need to beef
up ANALYZE to gather statistics on the fraction of the pages that are
marked all-visible, or maybe VACUUM should gather that information.
The trouble is that if we VACUUM and then ANALYZE, we'll often get
back a value very close to 100%, but then the real value may diminish
quite a bit before the next auto-analyze fires.  I think if we can
figure out what to do about that problem we'll be well on our way...
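
As a rough illustration of the kind of drift I mean (an assumption-laden
sketch, not a proposal: it presumes a server that records the all-visible
page count in pg_class.relallvisible and ships the pg_visibility extension,
and it reuses the hypothetical table from the sketch above), the recorded
fraction can be compared against the live visibility map like this:

-- Sketch only: relallvisible and pg_visibility_map_summary() are assumed
-- to be available; the point is that the recorded fraction goes stale
-- between VACUUM/ANALYZE runs while the visibility map keeps changing.
CREATE EXTENSION IF NOT EXISTS pg_visibility;

SELECT c.relname,
       c.relpages,
       c.relallvisible,                        -- last value recorded by VACUUM/ANALYZE
       round(c.relallvisible::numeric / nullif(c.relpages, 0), 3)
           AS recorded_fraction,
       s.all_visible AS pages_all_visible_now  -- current visibility-map state
FROM pg_class AS c,
     pg_visibility_map_summary(c.oid::regclass) AS s
WHERE c.relname = 't';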

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

