Re: Performance hit on large row counts

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Performance hit on large row counts
Дата
Msg-id 12356.1135632990@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Performance hit on large row counts  (David Scott <davids@apptechsys.com>)
Ответы Re: Performance hit on large row counts
Список pgsql-performance
David Scott <davids@apptechsys.com> writes:
> We are trying to ascertain if we are up against the limits of what
> postgres can accomplish without having the tables clustered.  ...

> We are aware that there is a minimum time that is required to resolve
> the index values against the table to ascertain that they are live rows,
> and we believe we are circumventing that time to some extent by taking
> advantage of the rows being in physical order with the cluster.  So does
> this lead us to the conclusion that the differences in the query times
> is how long is takes us to check on disk whether or not these rows are live?

Both of your initial examples are bitmap scans, which should be pretty
insensitive to index correlation effects --- certainly the planner
assumes so.  What I'd want to know about is why the planner is picking
different indexes for the queries.  The CLUSTER may be affecting things
in some other way, like by squeezing out dead tuples causing a
reduction in the total table and index sizes.

The join examples use plain indexscans, which *would* be affected by
correlation ... but again, why are you getting a different scan plan
for "stuff" than in the non-join case?

It's not helping you that the rowcount estimates are so far off.
I think the different plans might be explained by the noise in the
rowcount estimates.

You should try increasing the statistics targets on the columns you use
in the WHERE conditions.

I'm not at all sure I believe your premise that querying for a different
key value excludes cache effects, btw.  On modern hardware it's likely
that CLUSTER would leave the *whole* of these tables sitting in kernel
disk cache.

            regards, tom lane

В списке pgsql-performance по дате отправления:

Предыдущее
От: David Scott
Дата:
Сообщение: Performance hit on large row counts
Следующее
От: Ivan Voras
Дата:
Сообщение: Bitmap indexes etc.