Re: Slow query - index not used

From Tom Lane
Subject Re: Slow query - index not used
Date
Msg-id 1425.1074703227@sss.pgh.pa.us
In response to Re: Slow query - index not used  (Michael Brusser <michael@synchronicity.com>)
List pgsql-hackers
Michael Brusser <michael@synchronicity.com> writes:
> So 'rows' values are incorrect.

You sound like you are expecting them to be exact.  They're just
estimates.  They're all plenty close enough for planning purposes,
except maybe the one for 'KnowledgeBase' is a little further off
than I would have expected.  That could just be a chance sampling
thing --- if you rerun ANALYZE and then check again, how much does that
estimate move around in a few tries?
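
For example, something along these lines -- the WHERE clause here is just
a stand-in for whatever query produced the KnowledgeBase estimate:

    -- re-gather statistics, then look at the row estimate again
    ANALYZE knowledgebase;
    EXPLAIN SELECT * FROM knowledgebase WHERE some_column = 'some_value';

Run that a few times and watch how much the rows= figure in the plan moves.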

> I mean 'rows' has a value of about 5000 records out of a total of 75000
> records in the table. This ratio does not seem high enough to assume
> that an index scan won't be beneficial.

You're mistaken.  You didn't give any indication of how wide the rows
are in this table, but supposing for purposes of argument that they're
roughly 100 bytes apiece, there would be room for 80 rows on each 8K
page of the table.  A scan that needs to visit 1/15th of the table is
statistically certain to read nearly every page of the table, many of
them multiple times.  (This assumes the needed rows are randomly
scattered, which wouldn't be true if the table is "clustered" on the
index, but you didn't say anything about having done that.)  So an
indexscan doesn't save any I/O, and may actually cost extra I/O if some
of the pages fall out of buffer cache between visits.  Add to that the penalty for
random instead of sequential reads, and the cost to read the index
itself, and you come out way behind a seqscan.
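
If you want to sanity-check the rows-per-page arithmetic, pg_class holds
the planner's notion of the table's size (the table name here is just the
one from your example, folded to lower case):

    -- planner's view of the table: pages on disk and estimated row count
    SELECT relpages, reltuples
      FROM pg_class
     WHERE relname = 'knowledgebase';

reltuples divided by relpages gives the average rows per page.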

This all assumes that I/O is far more expensive than CPU cycles.
If you have a database that's small enough to fit in RAM then the cost
of checking the WHERE clauses at each tuple could become the dominant
factor.  If that's your situation you should look at altering the
planner's cost parameters --- in particular lower random_page_cost and
increase the various CPU cost settings.
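
For instance (these particular numbers are only illustrative -- you'd want
to tune them against your own workload):

    -- make random page fetches look cheaper relative to sequential ones
    SET random_page_cost = 2;
    -- and make per-tuple CPU work look relatively more expensive
    SET cpu_tuple_cost = 0.05;
    SET cpu_operator_cost = 0.01;

You can SET these per-session to experiment, and put the values you settle
on in postgresql.conf.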

With the default cost settings, the planner will not pick an indexscan
if it thinks that more than a couple percent of the table needs to be
visited.  (I think the breakeven point is dependent on table size, but
don't have the details in my head right now.)  You can experiment with
EXPLAIN ANALYZE and the various enable_xxx settings to see if it's right
or not in your environment.
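
A quick way to make that comparison (the query is just a placeholder):

    -- time the plan the planner picks on its own
    EXPLAIN ANALYZE SELECT * FROM knowledgebase WHERE some_column = 'some_value';

    -- then forbid seqscans and see whether the indexscan is really faster
    SET enable_seqscan = off;
    EXPLAIN ANALYZE SELECT * FROM knowledgebase WHERE some_column = 'some_value';
    RESET enable_seqscan;

If the forced indexscan wins on actual run time, that's a sign the cost
settings need adjusting for your setup.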

See the pgsql-performance archives for more discussion.
        regards, tom lane

