Re: Index not being used in sorting of simple table

Поиск
Список
Период
Сортировка
От Paul Smith
Тема Re: Index not being used in sorting of simple table
Дата
Msg-id VPOP32.5.0.20070504171050.732.408d.1.f0e0d0d3@lmail.pscs.co.uk
обсуждение исходный текст
Ответ на Re: Index not being used in sorting of simple table  (Heikki Linnakangas <heikki@enterprisedb.com>)
Список pgsql-performance
At 16:26 04/05/2007, you wrote:
>Paul Smith wrote:
>>Why doesn't it use the other index? If use 'set enable_seqscan=0'
>>then it does.
>
>Just a guess, but is the table clustered on column a? Maybe not
>explicitly, but was it loaded from data that was sorted by a?

I wouldn't have thought so - a is pretty 'random' as far as order of
insertion goes. On the other hand 'b' (the one whose index doesn't
get used) is probably pretty correlated - 'b' is the date when the
entry was added to the table, so they would be added in order of 'b'
(they also get deleted after a while, and I'm not sure how PGSQL
re-uses deleted rows that have been vacuumed)

>Analyzer calculates the correlation between physical order and each
>column. The planner will favor index scans instead of sorting when
>the correlation is strong, and it thinks the data doesn't fit in
>memory. Otherwise an explicitly sort will result in less I/O and be
>therefore more favorable.

Ah, I see.

>You can check the correlation stats with:
>SELECT tablename, attname, correlation FROM pg_stats where tablename='x';

There I get
  x   | a  |     0.977819
  x   | b  |     0.78292

This is a bit odd, because I'd have thought they'd be more correlated
on 'b' than 'a'..

>>I tried using EXPLAIN ANALYZE to see how long it actually took:
>>- seq scan - 75 secs
>>- index scan - 13 secs
>>- seq scan - 77 secs
>
>>(I tried the seq scan version after the index scan as well to see
>>if disk caching was a factor, but it doesn't look like it)
>
>That won't flush the heap pages from cache...

No, I know, but it would mean that if the pages were being loaded
into disk cache by the first scan which would make the second scan
quicker, it would probably make the third one quicker as well.

>How much memory do you have and how large is the table?

The table is about 300MB. I have 2GB RAM on my PC (but most of it is
in use - the disk cache size is currently 600MB).

>I suspect that the planner thinks it doesn't fit in memory, and
>therefore favors the seqscan+sort plan which would require less random I/O,
>but in reality it's in cache and the index scan is faster because it
>doesn't need to sort. Have you set your effective_cache_size properly?

I haven't set that at all - it's the default..

If I set this to 51200 (I think that means 400MB) then it does use
the index scan method, so thanks for this bit of info.


Paul                            VPOP3 - Internet Email Server/Gateway
support@pscs.co.uk                      http://www.pscs.co.uk/



В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Index not being used in sorting of simple table
Следующее
От: Sebastian Hennebrueder
Дата:
Сообщение: Re: Feature Request --- was: PostgreSQL Performance Tuning