DISTINCT vs EXISTS performance

Поиск
Список
Период
Сортировка
От Joseph Shraibman
Тема DISTINCT vs EXISTS performance
Дата
Msg-id 3F0B7A87.3000000@selectacast.net
обсуждение исходный текст
Список pgsql-general
I have a query where I want to select the usertable records that have a matching entry in
an event table.  There are two ways to do this.

1) SELECT COUNT(DISTINCT u.uid) FROM usertable u, eventlog e WHERE u.uid = e.uid AND
e.type = XX;
2) SELECT COUNT(u.uid) FROM usertable u WHERE EXISTS(SELECT 1 FROM eventlog e WHERE u.uid
= e.uid AND e.type = XX);

In a real life query 1 took 46284.58 msec and 45856.66 msec for first and second runs and
query 2 took 38736.77 msec and 32833.08 msec.

Here are the explain analyse outputs:

                                                                               QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=180116.76..180116.76 rows=1 width=20) (actual time=46267.93..46267.93
rows=1 loops=1)
    ->  Nested Loop  (cost=195.13..180116.75 rows=1 width=20) (actual
time=46094.32..46265.81 rows=8 loops=1)
          ->  Nested Loop  (cost=195.13..180112.96 rows=1 width=16) (actual
time=46064.86..46189.25 rows=8 loops=1)
                ->  Hash Join  (cost=195.13..179856.40 rows=43 width=12) (actual
time=46054.33..46153.15 rows=8 loops=1)
                      Hash Cond: ("outer".typeid = "inner".id)
                      ->  Seq Scan on eventlog  (cost=0.00..174675.16 rows=664742 width=8)
(actual time=357.04..45349.36 rows=580655 loops=1)
                            Filter: ("type" = 4)
                      ->  Hash  (cost=194.97..194.97 rows=65 width=4) (actual
time=21.83..21.83 rows=0 loops=1)
                            ->  Index Scan using clickthru_jid_and_id_key on clickthru
(cost=0.00..194.97 rows=65 width=4) (actual time=21.47..21.71 rows=63 loops=1)
                                  Index Cond: (jobid = 7899)
                ->  Index Scan using usertable_pkey on usertable u  (cost=0.00..5.91
rows=1 width=4) (actual time=4.50..4.50 rows=1 loops=8)
                      Index Cond: (("outer".uid = u.userkey) AND (u.podkey = 259))
          ->  Index Scan using d_pkey on d  (cost=0.00..3.78 rows=1 width=4) (actual
time=9.56..9.56 rows=1 loops=8)
                Index Cond: ("outer".uid = d.ukey)
  Total runtime: 46284.58 msec
45856.66 msec
(15 rows)



                                                                                    QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=4325054.14..4325054.14 rows=1 width=8) (actual time=38736.62..38736.62
rows=1 loops=1)
    ->  Nested Loop  (cost=0.00..4325052.01 rows=852 width=8) (actual
time=12451.09..38736.58 rows=6 loops=1)
          ->  Index Scan using usertable_podkey_key on usertable u  (cost=0.00..4321822.44
rows=852 width=4) (actual time=12450.95..38735.85 rows=6 loops=1)
                Index Cond: (pkey = 259)
                Filter: (subplan)
                SubPlan
                  ->  Nested Loop  (cost=0.00..2532.25 rows=1 width=8) (actual
time=49.59..49.59 rows=0 loops=752)
                        ->  Index Scan using eventlog_uid_and_jid_and_type_key on eventlog
  (cost=0.00..2343.37 rows=62 width=4) (actual time=29.64..48.91 rows=4 loops=752)
                              Index Cond: (uid = $0)
                              Filter: ("type" = 4)
                        ->  Index Scan using clickthru_pkey on clickthru  (cost=0.00..3.02
rows=1 width=4) (actual time=0.18..0.18 rows=0 loops=2725)
                              Index Cond: ("outer".typeid = clickthru.id)
                              Filter: (jobid = 7899)
          ->  Index Scan using directory_pkey on d  (cost=0.00..3.78 rows=1 width=4)
(actual time=0.11..0.11 rows=1 loops=6)
                Index Cond: (d.ukey = "outer".userkey)
  Total runtime: 38736.77 msec
32833.08 msec
(16 rows)



... so the questions are:

Why are the plans so different?
Why does the planner think query 2 will take so long when it ends up being faster than
query 1?
Is there anything I can do to speed up the query?

version is: 7.3.3

BTW does this belong on the performance list or is that list about tuning the database
config paramaters only?


В списке pgsql-general по дате отправления:

Предыдущее
От: Timothy Brier
Дата:
Сообщение: Restoring a postgres database
Следующее
От: Stephan Szabo
Дата:
Сообщение: Re: Statistics on a table