Обсуждение: Sequential scan being used despite indexes

Поиск
Список
Период
Сортировка

Sequential scan being used despite indexes

От
James Russell
Дата:
Hi there,

I'm running a simple query with 2 inner joins (say A, B and C). Each of the join columns has indexes. If I run queries that join just A and B, or just B and C, postgres uses indexes. But if I run "A join B join C" then the "B join C" part starts using a sequential scan and I can't figure out why.

Here's the query, which basically retrieves all meta-data for all messages in a given forum. The relationship is pretty simple. Forums contain threads, which contain messages, which each have associated meta-data:

SELECT message.message_id, message_meta_data.value
FROM thread
    JOIN message USING (thread_id)
    JOIN message_meta_data ON (message.message_id=message_meta_data.message_id)
WHERE thread.forum_id=123;

Explaining:
Hash Join  (cost=337.93..1267.54 rows=180 width=35)
Hash Cond: ("outer".message_id = "inner".message_id)
->  Seq Scan on message_meta_data  (cost=0.00..739.19 rows=37719 width=30)
->  Hash  (cost=337.79..337.79 rows=57 width=13)
    ->  Nested Loop  (cost=0.00..337.79 rows=57 width=13)
          ->  Index Scan using thread_forum_id_idx on thread (cost=0.00..41.61 rows=13 width=4)
                 Index Cond: (forum_id = 6)
          ->  Index Scan using message_thread_id_idx on message (cost=0.00..22.72 rows=5 width=17)
                 Index Cond: ("outer".thread_id = message.thread_id)

As you can see, the message and message_meta_data tables use a Seq Scan. The only way I can think of forcing it to use the Index Scan in all cases would be to use two separate nested queries: The outer query would retrieve the list of messages in the forum, and the inner query would retrieve the list of metadata for an individual message. Obviously I want to avoid having to do that if possible.

Any ideas?

Many thanks if you can help.

James

Re: Sequential scan being used despite indexes

От
"Joshua D. Drake"
Дата:
>
> Explaining:
> Hash Join  (cost=337.93..1267.54 rows=180 width=35)
> Hash Cond: ("outer".message_id = "inner".message_id)
> ->  Seq Scan on message_meta_data  (cost=0.00..739.19 rows=37719 width=30)
> ->  Hash  (cost=337.79..337.79 rows=57 width=13)
>     ->  Nested Loop  (cost=0.00..337.79 rows=57 width=13)
>           ->  Index Scan using thread_forum_id_idx on thread
> (cost=0.00..41.61 rows=13 width=4)
>                  Index Cond: (forum_id = 6)
>           ->  Index Scan using message_thread_id_idx on message
> (cost=0.00..22.72 rows=5 width=17)
>                  Index Cond: ("outer".thread_id = message.thread_id)
>
> As you can see, the message and message_meta_data tables use a Seq
> Scan. The only way I can think of forcing it to use the Index Scan in
> all cases would be to use two separate nested queries: The outer query
> would retrieve the list of messages in the forum, and the inner query
> would retrieve the list of metadata for an individual message.
> Obviously I want to avoid having to do that if possible.
>
> Any ideas?
What does explain analyze say?

Joshua D. Drake


>
> Many thanks if you can help.
>
> James


--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: PLphp, PLperl - http://www.commandprompt.com/


Re: Sequential scan being used despite indexes

От
Michael Fuhr
Дата:
On Tue, Jan 31, 2006 at 07:29:51PM -0800, Joshua D. Drake wrote:
> > Any ideas?
>
> What does explain analyze say?

Also, have the tables been vacuumed and analyzed?

--
Michael Fuhr

Re: Sequential scan being used despite indexes

От
James Russell
Дата:
[Sorry, my last reply didn't go to the list]

Reading about this issue further in the FAQ, it seems that I should ensure that Postgres has adequate and accurate information about the tables in question by regularly running VACUUM ANALYZE, something I don't do currently.

I disabled SeqScan as per the FAQ, and it indeed was a lot slower so Postgres was making the right choice in this case.

Many thanks,

James

Re: Sequential scan being used despite indexes

От
Christopher Kings-Lynne
Дата:
> Reading about this issue further in the FAQ, it seems that I should
> ensure that Postgres has adequate and accurate information about the
> tables in question by regularly running VACUUM ANALYZE, something I
> don't do currently.

Well then you'll get rubbish performance always in PostgreSQL...

I strongly suggest you run autovacuum if you don't really understand
PostgreSQL vacuuming/analyzing.

Chris


Re: Sequential scan being used despite indexes

От
Michael Fuhr
Дата:
On Wed, Feb 01, 2006 at 01:33:08PM +0900, James Russell wrote:
> Reading about this issue further in the FAQ, it seems that I should ensure
> that Postgres has adequate and accurate information about the tables in
> question by regularly running VACUUM ANALYZE, something I don't do
> currently.

Many people use a cron job (or the equivalent) to run VACUUM ANALYZE
at regular intervals; some also use the pg_autovacuum daemon, which
is a contrib module in 8.0 and earlier and part of the backend as of
8.1.

How often to vacuum/analyze depends on usage.  Once per day is
commonly cited, but busy tables might need it more often than that.
Just recently somebody had a table that could have used vacuuming
every five minutes or less (all records were updated every 30
seconds); pg_autovacuum can be useful in such cases.

> I disabled SeqScan as per the FAQ, and it indeed was a lot slower so
> Postgres was making the right choice in this case.

The planner might be making the right choice given the statistics
it has, but it's possible that better statistics would lead to a
different plan, perhaps one where an index scan would be faster.

What happens if you run VACUUM ANALYZE on all the tables, then run
the query again with EXPLAIN ANALYZE?

--
Michael Fuhr