same query different execution plan (hash join vs. semi-hash join)

Поиск
Список
Период
Сортировка
От Huang, Suya
Тема same query different execution plan (hash join vs. semi-hash join)
Дата
Msg-id D83E55F5F4D99B4A9B4C4E259E6227CD01498C88@AUX1EXC01.apac.experian.local
обсуждение исходный текст
Ответы Re: same query different execution plan (hash join vs. semi-hash join)  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance

Hi buddies,

 

I’ve got a query as below, it runs several times with different execution plan and totally different execution time. The one using hash-join is slow and the one using semi-hash join is very fast. However, I have no control over the optimizer behavior of PostgreSQL database. Or, do I have?

 

The database version is 9.3.4

 

SELECT dem_type,

       dem_value,

       Count(*)

FROM   demo_weekly a

WHERE  date = '2013-11-30'

AND    userid IN ( select userid from test1)

       AND dem_type IN ( 'Gender', 'Age', 'Hobbies' )

GROUP  BY dem_type,

          dem_value ;

 

 

                                                                                    QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HashAggregate  (cost=322386.94..322786.94 rows=40000 width=29) (actual time=3142.849..3142.927 rows=19 loops=1)

   ->  Hash Semi Join  (cost=14460.06..314403.08 rows=1064514 width=29) (actual time=803.671..2786.979 rows=1199961 loops=1)

         Hash Cond: ((a.userid)::text = (test1.userid)::text)

         ->  Append  (cost=0.00..277721.30 rows=2129027 width=78) (actual time=536.829..1691.270 rows=2102611 loops=1)

               ->  Seq Scan on demo_weekly a  (cost=0.00..0.00 rows=1 width=808) (actual time=0.002..0.002 rows=0 loops=1)

                     Filter: ((date = '2013-11-30'::date) AND ((dem_type)::text = ANY ('{Gender,Age,"Hobbies"}'::text[])))

               ->  Bitmap Heap Scan on demo_weekly_20131130 a_1  (cost=50045.63..277721.30 rows=2129026 width=78) (actual time=536.826..1552.203 rows=2102611 loops=1)

                     Recheck Cond: ((dem_type)::text = ANY ('{Gender,Age,"Hobbies"}'::text[]))

                     Filter: (date = '2013-11-30'::date)

                     ->  Bitmap Index Scan on demo_weekly_20131130_dt_idx  (cost=0.00..49513.37 rows=2129026 width=0) (actual time=467.453..467.453 rows=2102611 loops=1)

                           Index Cond: ((dem_type)::text = ANY ('{Gender,Age,"Hobbies"}'::text[]))

         ->  Hash  (cost=8938.36..8938.36 rows=441736 width=50) (actual time=266.501..266.501 rows=441736 loops=1)

               Buckets: 65536  Batches: 1  Memory Usage: 35541kB

               ->  Seq Scan on test1  (cost=0.00..8938.36 rows=441736 width=50) (actual time=0.023..87.869 rows=441736 loops=1)

Total runtime: 3149.004 ms

(15 rows)

 

                                                                                   QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HashAggregate  (cost=318351.90..318751.90 rows=40000 width=29) (actual time=23668.646..23668.723 rows=19 loops=1)

   ->  Hash Join  (cost=5316.68..310497.81 rows=1047212 width=29) (actual time=1059.182..23218.864 rows=1199961 loops=1)

         Hash Cond: ((a.userid)::text = (test1.userid)::text)

         ->  Append  (cost=0.00..276382.82 rows=2094423 width=78) (actual time=528.116..2002.462 rows=2102611 loops=1)

               ->  Seq Scan on demo_weekly a  (cost=0.00..0.00 rows=1 width=808) (actual time=0.001..0.001 rows=0 loops=1)

                     Filter: ((date = '2013-11-30'::date) AND ((dem_type)::text = ANY ('{Gender,Age,"Hobbies"}'::text[])))

               ->  Bitmap Heap Scan on demo_weekly_20131130 a_1  (cost=49269.46..276382.82 rows=2094422 width=78) (actual time=528.114..1825.265 rows=2102611 loops=1)

                     Recheck Cond: ((dem_type)::text = ANY ('{Gender,Age,"Hobbies"}'::text[]))

                     Filter: (date = '2013-11-30'::date)

                     ->  Bitmap Index Scan on demo_weekly_20131130_dt_idx  (cost=0.00..48745.85 rows=2094422 width=0) (actual time=458.694..458.694 rows=2102611 loops=1)

                           Index Cond: ((dem_type)::text = ANY ('{Gender,Age,"Hobbies"}'::text[]))

         ->  Hash  (cost=5314.18..5314.18 rows=200 width=516) (actual time=530.930..530.930 rows=441736 loops=1)

               Buckets: 1024  Batches: 1  Memory Usage: 35541kB

               ->  HashAggregate  (cost=5312.18..5314.18 rows=200 width=516) (actual time=298.301..411.734 rows=441736 loops=1)

                     ->  Seq Scan on test1  (cost=0.00..5153.94 rows=63294 width=516) (actual time=0.068..91.378 rows=441736 loops=1)

Total runtime: 23679.096 ms

(16 rows)

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: Stats collector constant I/O
Следующее
От: Tom Lane
Дата:
Сообщение: Re: same query different execution plan (hash join vs. semi-hash join)