Planner doing seqscan before indexed join

От: Dan Harris
Тема: Planner doing seqscan before indexed join
Дата: ,
(см: обсуждение, исходный текст)
Ответы: Re: Planner doing seqscan before indexed join  ("Dave Dutcher")
Re: Planner doing seqscan before indexed join  ("Marc Mamin")
Список: pgsql-performance

8.0.3 - Linux 2.6.18..
Freshly vacuumed and analyzed

This database has been humming along fine for a while now, but I've got one of those sticky queries that is taking
much too long to finish.

After some digging, I've found that the planner is choosing to apply a necessary seq scan to the table.  Unfortunately,
it's scanning the whole table, when it seems that it could have joined it to a smaller table first and reduce the
amount of rows it would have to scan dramatically ( 70 million to about 5,000 ).

The table "eventactivity" has about 70million rows in it, index on "incidentid"
The table "keyword_incidents" is a temporary table and has incidentid as its primary key.  It contains
5125 rows. eventmain and eventgeo both have 2.1 million. My hope is that I can convince the planner to do the
  join to keyword_incidents *first* and then do the seq scan for the LIKE condition.  Instead, it seems that it's
whole 70 million rows first and then doing the join, which takes a lot longer than I'd like to wait for it.  Or, maybe
misreading the explain output?

Thanks again

Here's the query:

explain analyze







   eventmain.incidentid = keyword_incidents.incidentid and

   eventgeo.incidentid = keyword_incidents.incidentid and

   eventactivity.incidentid = keyword_incidents.incidentid

   and (  recordtext like '%JOSE CHAVEZ%'   )
order by eventmain.entrydate limit 10000;

  Limit  (cost=2388918.07..2388918.08 rows=1 width=455) (actual time=81771.186..81771.292 rows=26 loops=1)
    ->  Sort  (cost=2388918.07..2388918.08 rows=1 width=455) (actual time=81771.180..81771.215 rows=26 loops=1)
          Sort Key: eventmain.entrydate
          ->  Nested Loop  (cost=0.00..2388918.06 rows=1 width=455) (actual time=357.389..81770.982 rows=26 loops=1)
                ->  Nested Loop  (cost=0.00..2388913.27 rows=1 width=230) (actual time=357.292..81767.385 rows=26
                      ->  Nested Loop  (cost=0.00..2388909.33 rows=1 width=122) (actual time=357.226..81764.501 rows=26
                            ->  Seq Scan on eventactivity  (cost=0.00..2388874.46 rows=7 width=84) (actual
rows=27 loops=1)
                                  Filter: ((recordtext)::text ~~ '%JOSE CHAVEZ%'::text)
                            ->  Index Scan using keyword_incidentid_pkey on keyword_incidents  (cost=0.00..4.97 rows=1
(actual time=0.034..0.036 rows=1 loops=27)
                                  Index Cond: (("outer".incidentid)::text = (keyword_incidents.incidentid)::text)
                      ->  Index Scan using eventgeo_incidentid_idx on eventgeo  (cost=0.00..3.93 rows=1 width=108)
time=0.076..0.081 rows=1 loops=26)
                            Index Cond: (("outer".incidentid)::text = (eventgeo.incidentid)::text)
                ->  Index Scan using eventmain_incidentid_idx on eventmain  (cost=0.00..4.78 rows=1 width=225) (actual
time=0.069..0.075 rows=1 loops=26)
                      Index Cond: (("outer".incidentid)::text = (eventmain.incidentid)::text)
  Total runtime: 81771.529 ms
(15 rows)

В списке pgsql-performance по дате сообщения:

От: "Matt Smiley"
Сообщение: Re: Sunfire X4500 recommendations
От: Tom Lane
Сообщение: Re: Shared buffers, db transactions commited, and write IO on Solaris