Slow inner join, but left join is fast

Поиск
Список
Период
Сортировка
От Jeremy Haile
Тема Slow inner join, but left join is fast
Дата
Msg-id 1168445842.5566.1168570133@webmail.messagingengine.com
обсуждение исходный текст
Ответы Re: Slow inner join, but left join is fast
Список pgsql-performance
I have a query made by joining two subqueries where the outer query
performing the join takes significantly longer to run than the two
subqueries.

The first subquery runs in 600ms.  The seconds subquery runs in 700ms.
But the outer query takes 240 seconds to run!  Both of the two
subqueries only return 8728 rows.

Changing the inner join to a left join makes the outer query run in
about 1000ms (which is great), but I don't understand why the inner join
is so slow!

I'm using PostgreSQL 8.2.1.  Any ideas?

QUERY PLAN (Inner Join) - takes 240 seconds
-------------------
Nested Loop  (cost=17.46..17.56 rows=1 width=120)
  Join Filter: ((a.merchant_dim_id = b.merchant_dim_id) AND
  (a.dcms_dim_id = b.dcms_dim_id))
  ->  HashAggregate  (cost=8.71..8.74 rows=1 width=16)
        ->  Index Scan using transaction_facts_transaction_date_idx on
        transaction_facts  (cost=0.00..8.69 rows=1 width=16)
              Index Cond: ((transaction_date >= '2007-01-09
              00:00:00'::timestamp without time zone) AND
              (transaction_date < '2007-01-09 09:30:00'::timestamp
              without time zone))
  ->  HashAggregate  (cost=8.75..8.78 rows=1 width=16)
        ->  HashAggregate  (cost=8.71..8.72 rows=1 width=55)
              ->  Index Scan using
              transaction_facts_transaction_date_idx on
              transaction_facts  (cost=0.00..8.69 rows=1 width=55)
                    Index Cond: ((transaction_date >= '2007-01-09
                    00:00:00'::timestamp without time zone) AND
                    (transaction_date < '2007-01-09 09:30:00'::timestamp
                    without time zone))


QUERY PLAN (Left Join) - takes one second
-------------------
Merge Left Join  (cost=304037.63..304064.11 rows=2509 width=120)
  Merge Cond: ((a.dcms_dim_id = b.dcms_dim_id) AND (a.merchant_dim_id =
  b.merchant_dim_id))
  ->  Sort  (cost=152019.45..152025.72 rows=2509 width=64)
        Sort Key: a.dcms_dim_id, a.merchant_dim_id
        ->  HashAggregate  (cost=151771.15..151852.69 rows=2509
        width=16)
              ->  Bitmap Heap Scan on transaction_facts
              (cost=5015.12..150419.90 rows=77214 width=16)
                    Recheck Cond: ((transaction_date >= '2007-01-09
                    00:00:00'::timestamp without time zone) AND
                    (transaction_date < '2007-01-09 09:30:00'::timestamp
                    without time zone))
                    ->  Bitmap Index Scan on
                    transaction_facts_transaction_date_idx
                    (cost=0.00..4995.81 rows=77214 width=0)
                          Index Cond: ((transaction_date >= '2007-01-09
                          00:00:00'::timestamp without time zone) AND
                          (transaction_date < '2007-01-09
                          09:30:00'::timestamp without time zone))
  ->  Sort  (cost=152018.18..152020.54 rows=943 width=64)
        Sort Key: b.dcms_dim_id, b.merchant_dim_id
        ->  Subquery Scan b  (cost=151931.51..151971.59 rows=943
        width=64)
              ->  HashAggregate  (cost=151931.51..151962.16 rows=943
              width=16)
                    ->  HashAggregate  (cost=151578.11..151672.35
                    rows=9424 width=55)
                          ->  Bitmap Heap Scan on transaction_facts
                          (cost=5015.12..150419.90 rows=77214 width=55)
                                Recheck Cond: ((transaction_date >=
                                '2007-01-09 00:00:00'::timestamp without
                                time zone) AND (transaction_date <
                                '2007-01-09 09:30:00'::timestamp without
                                time zone))
                                ->  Bitmap Index Scan on
                                transaction_facts_transaction_date_idx
                                (cost=0.00..4995.81 rows=77214 width=0)
                                      Index Cond: ((transaction_date >=
                                      '2007-01-09 00:00:00'::timestamp
                                      without time zone) AND
                                      (transaction_date < '2007-01-09
                                      09:30:00'::timestamp without time
                                      zone))


QUERY
-------------------
select a.merchant_dim_id, a.dcms_dim_id,
       a.num_success, a.num_failed, a.total_transactions,
       a.success_rate,
       b.distinct_num_success, b.distinct_num_failed,
       b.distinct_total_transactions, b.distinct_success_rate
from (

-- SUBQUERY 1
select merchant_dim_id,
       dcms_dim_id,
       sum(success) as num_success,
       sum(failed) as num_failed,
       count(*) as total_transactions,
       (sum(success) * 1.0 / count(*)) as success_rate
from transaction_facts
where transaction_date >= '2007-1-9'
and transaction_date < '2007-1-9 9:30'
group by merchant_dim_id, dcms_dim_id

) as a inner join (

-- SUBQUERY 2
select merchant_dim_id,
       dcms_dim_id,
       sum(success) as distinct_num_success,
       sum(failed) as distinct_num_failed,
       count(*) as distinct_total_transactions,
       (sum(success) * 1.0 / count(*)) as distinct_success_rate
from (

  select merchant_dim_id,
         dcms_dim_id,
         serial_number,
         success,
         failed
  from transaction_facts
  where transaction_date >= '2007-1-9'
  and transaction_date < '2007-1-9 9:30'
  group by merchant_dim_id, dcms_dim_id, serial_number, success, failed

  ) as distinct_summary
group by merchant_dim_id, dcms_dim_id

) as b using(merchant_dim_id, dcms_dim_id)

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Horribly slow query/ sequential scan
Следующее
От: tsuraan
Дата:
Сообщение: Re: group by will not use an index?