How to get the optimizer to use an index with multiple fields

Поиск
Список
Период
Сортировка
От Jim C. Nasby
Тема How to get the optimizer to use an index with multiple fields
Дата
Msg-id 20030428132143.J66185@flake.decibel.org
обсуждение исходный текст
Ответы Re: How to get the optimizer to use an index with multiple fields  ("Jim C. Nasby" <jim@nasby.net>)
Re: How to get the optimizer to use an index with multiple  ("scott.marlowe" <scott.marlowe@ihs.com>)
Список pgsql-performance
select id into temp NewRetires from stats_participant where retire_to>=1
AND retire_date = (SELECT last_date FROM Project_statsrun WHERE
project_id = :ProjectID);

results in a table with 5 values...

explain analyze delete from email_rank where project_id=25 and id in
(select id from NewRetires);

 Index Scan using email_rank__day_rank on email_rank
(cost=0.00..9003741627715.16 rows=45019 width=6) (actual time=408.12..9688.37 rows=3 loops=1)
   Index Cond: (project_id = 25)
   Filter: (subplan)
   SubPlan
     ->  Seq Scan on newretires  (cost=100000000.00..100000020.00 rows=1000 width=4) (actual time=0.01..0.05 rows=5
loops=91834)
 Total runtime: 9689.86 msec

But, there's already an index that would fit the bill here perfectly:

              Table "public.email_rank"
        Column         |  Type   |     Modifiers
-----------------------+---------+--------------------
 project_id            | integer | not null
 id                    | integer | not null
 first_date            | date    | not null
 last_date             | date    | not null
 day_rank              | integer | not null default 0
 day_rank_previous     | integer | not null default 0
 overall_rank          | integer | not null default 0
 overall_rank_previous | integer | not null default 0
 work_today            | bigint  | not null default 0
 work_total            | bigint  | not null default 0
Indexes: email_rank_pkey primary key btree (project_id, id),
         email_rank__day_rank btree (project_id, day_rank),
         email_rank__overall_rank btree (project_id, overall_rank)

Why isn't it using email_rank_pkey instead of using day_rank then a
filter? The original query on sybase (see below) is essentially instant,
because it's using the index of (project_id, id), so it doesn't have to
read the whole table.

stats=> select project_id,count(*) from email_rank group by project_id;
 project_id | count
------------+--------
          5 | 327856
          8 |  28304
         24 |  34622
         25 |  91834
        205 | 331464

Also, changing the WHERE IN to a WHERE EXISTS in the delete is
substantially faster in this case (3.5 seconds as opposed to 9); it
would be nice if the optimizer could rewrite the query on-the-fly. I
started looking into this in the first place because the original query
was taking 6-10 seconds, which seemed too long...

Original query:
DELETE FROM Email_Rank
    WHERE project_id = :ProjectID
        AND id IN  (SELECT id
                        FROM STATS_Participant sp
                        WHERE retire_to >= 1
                            AND retire_date = (SELECT last_date FROM Project_statsrun WHERE project_id = :ProjectID)
                    )
;

I tried changing this to an EXISTS and it takes over a minute. So in
this case, the range of runtimes is ~4 seconds (building the temp table
takes ~0.25 seconds) to over a minute.
--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


В списке pgsql-performance по дате отправления:

Предыдущее
От: Andrew Sullivan
Дата:
Сообщение: Re: Diferent execution plan for similar query
Следующее
От: "Jim C. Nasby"
Дата:
Сообщение: Re: How to get the optimizer to use an index with multiple fields