how to change the provoke table in hash join

Поиск
Список
Период
Сортировка
От Huang, Suya
Тема how to change the provoke table in hash join
Дата
Msg-id D83E55F5F4D99B4A9B4C4E259E6227CD01EBAAD0@AUX1EXC02.apac.experian.local
обсуждение исходный текст
Ответы Re: how to change the provoke table in hash join
Список pgsql-performance

Hi,

 

Below are two query plan for same SQL with and without an index. I noticed the Hash join order has changed since index has been created and this is not what I want. As it’s hashing the big table and to provoke records in  a small table. in Oracle, it’s simple to add hint to point the table you’d like to be used as the provoke table. However, in Postgres, I don’t know how to change the behavior.

 

--plan 1, 10 seconds were spent on sequential scan on term_weekly table.

 

dev=# explain analyze select distinct  cs_id from lookup_weekly  n inner join term_weekly s on s.b_id=n.b_id and s.date=n.date where term in ('cat'::text);

 

 

                                                                           QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------

HashAggregate  (cost=2100211.06..2100211.11 rows=5 width=4) (actual time=27095.470..27095.487 rows=138 loops=1)

   ->  Hash Join  (cost=954343.95..2100211.04 rows=5 width=4) (actual time=24088.912..27095.206 rows=160 loops=1)

         Hash Cond: (((n.b_id)::text = (s.b_id)::text) AND (n.date = s.date))

         ->  Append  (cost=0.00..862153.59 rows=37828460 width=52) (actual time=0.016..10923.091 rows=37828459 loops=1)

               ->  Seq Scan on lookup_weekly n  (cost=0.00..0.00 rows=1 width=524) (actual time=0.001..0.001 rows=0 loops=1)

               ->  Seq Scan on lookup_weekly_20131130 n_1  (cost=0.00..117764.18 rows=5158718 width=52) (actual time=0.015..1229.217 rows=5158718 loops=1)

               ->  Seq Scan on lookup_weekly_20131207 n_2  (cost=0.00..117764.18 rows=5158718 width=52) (actual time=5.225..1177.539 rows=5158718 loops=1)

               ->  Seq Scan on lookup_weekly_20131214 n_3  (cost=0.00..117764.18 rows=5158718 width=52) (actual time=5.756..1274.135 rows=5158718 loops=1)

               ->  Seq Scan on lookup_weekly_20131221 n_4  (cost=0.00..117764.18 rows=5158718 width=52) (actual time=4.269..1131.570 rows=5158718 loops=1)

               ->  Seq Scan on lookup_weekly_20131228 n_5  (cost=0.00..117764.18 rows=5158718 width=52) (actual time=9.383..1110.435 rows=5158718 loops=1)

               ->  Seq Scan on lookup_weekly_20140426 n_6  (cost=0.00..91715.42 rows=4042442 width=52) (actual time=8.137..947.724 rows=4042442 loops=1)

               ->  Seq Scan on lookup_weekly_20140503 n_7  (cost=0.00..93516.49 rows=4118149 width=52) (actual time=7.717..791.339 rows=4118149 loops=1)

               ->  Seq Scan on lookup_weekly_20140329 n_8  (cost=0.00..88100.78 rows=3874278 width=52) (actual time=0.004..637.297 rows=3874278 loops=1)

         ->  Hash  (cost=954343.47..954343.47 rows=32 width=61) (actual time=10604.327..10604.327 rows=553 loops=1)

               Buckets: 1024  Batches: 1  Memory Usage: 43kB

               ->  Append  (cost=0.00..954343.47 rows=32 width=61) (actual time=10.009..10602.075 rows=553 loops=1)

                     ->  Seq Scan on term_weekly s  (cost=0.00..0.00 rows=1 width=520) (actual time=0.000..0.000 rows=0 loops=1)

                           Filter: (term = 'cat'::text)

                     ->  Seq Scan on term_weekly_20140503 s_1  (cost=0.00..262030.12 rows=8 width=46) (actual time=10.007..3738.945 rows=166 loops=1)

                           Filter: (term = 'cat'::text)

                           Rows Removed by Filter: 8516324

                     ->  Seq Scan on term_weekly_20140510 s_2  (cost=0.00..246131.35 rows=8 width=46) (actual time=52.059..2316.793 rows=152 loops=1)

                           Filter: (term = 'cat'::text)

                           Rows Removed by Filter: 8010196

                     ->  Seq Scan on term_weekly_20140517 s_3  (cost=0.00..233644.94 rows=8 width=46) (actual time=26.661..2504.273 rows=135 loops=1)

                           Filter: (term = 'cat'::text)

                           Rows Removed by Filter: 7632420

                     ->  Seq Scan on term_weekly_20140524 s_4  (cost=0.00..212537.06 rows=7 width=46) (actual time=49.773..2041.578 rows=100 loops=1)

                           Filter: (term = 'cat'::text)

                           Rows Removed by Filter: 6950865

Total runtime: 27095.639 ms

(31 rows)

 

--plan 2, only 1 second spent on index scan of term_weekly table, however, as it selects the big table to do the hashing, it takes 22 seconds for the hash to complete. The advantage get from index has been totally lost because of this join order.

 

                                                                                        QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HashAggregate  (cost=1429795.17..1429795.22 rows=5 width=4) (actual time=22991.289..22991.307 rows=138 loops=1)

   ->  Hash Join  (cost=1429580.49..1429795.15 rows=5 width=4) (actual time=22963.340..22991.214 rows=160 loops=1)

         Hash Cond: (((s.b_id)::text = (n.b_id)::text) AND (s.date = n.date))

         ->  Append  (cost=0.00..142.77 rows=32 width=61) (actual time=0.052..1.125 rows=553 loops=1)

               ->  Seq Scan on term_weekly s  (cost=0.00..0.00 rows=1 width=520) (actual time=0.001..0.001 rows=0 loops=1)

                     Filter: (term = 'cat'::text)

               ->  Index Scan using idx_term_weekly_20140503_3 on term_weekly_20140503 s_1  (cost=0.56..36.70 rows=8 width=46) (actual time=0.051..0.353 rows=166 loops=1)

                     Index Cond: (term = 'cat'::text)

               ->  Index Scan using idx_term_weekly_20140510_3 on term_weekly_20140510 s_2  (cost=0.56..36.70 rows=8 width=46) (actual time=0.043..0.293 rows=152 loops=1)

                     Index Cond: (term = 'cat'::text)

               ->  Index Scan using idx_term_weekly_20140517_3 on term_weekly_20140517 s_3  (cost=0.56..36.70 rows=8 width=46) (actual time=0.029..0.244 rows=135 loops=1)

                     Index Cond: (term = 'cat'::text)

               ->  Index Scan using idx_term_weekly_20140524_3 on term_weekly_20140524 s_4  (cost=0.56..32.68 rows=7 width=46) (actual time=0.024..0.192 rows=100 loops=1)

                     Index Cond: (term = 'cat'::text)

         ->  Hash  (cost=862153.59..862153.59 rows=37828460 width=52) (actual time=22939.457..22939.457 rows=37828459 loops=1)

               Buckets: 4194304  Batches: 1  Memory Usage: 3144960kB

               ->  Append  (cost=0.00..862153.59 rows=37828460 width=52) (actual time=0.010..9100.690 rows=37828459 loops=1)

                     ->  Seq Scan on lookup_weekly n  (cost=0.00..0.00 rows=1 width=524) (actual time=0.001..0.001 rows=0 loops=1)

                     ->  Seq Scan on lookup_weekly_20131130 n_1  (cost=0.00..117764.18 rows=5158718 width=52) (actual time=0.008..1099.194 rows=5158718 loops=1)

                     ->  Seq Scan on lookup_weekly_20131207 n_2  (cost=0.00..117764.18 rows=5158718 width=52) (actual time=0.004..861.678 rows=5158718 loops=1)

                     ->  Seq Scan on lookup_weekly_20131214 n_3  (cost=0.00..117764.18 rows=5158718 width=52) (actual time=0.004..860.374 rows=5158718 loops=1)

                     ->  Seq Scan on lookup_weekly_20131221 n_4  (cost=0.00..117764.18 rows=5158718 width=52) (actual time=0.003..852.169 rows=5158718 loops=1)

                     ->  Seq Scan on lookup_weekly_20131228 n_5  (cost=0.00..117764.18 rows=5158718 width=52) (actual time=0.005..835.201 rows=5158718 loops=1)

                     ->  Seq Scan on lookup_weekly_20140426 n_6  (cost=0.00..91715.42 rows=4042442 width=52) (actual time=0.005..663.261 rows=4042442 loops=1)

                     ->  Seq Scan on lookup_weekly_20140503 n_7  (cost=0.00..93516.49 rows=4118149 width=52) (actual time=0.006..678.281 rows=4118149 loops=1)

                     ->  Seq Scan on lookup_weekly_20140329 n_8  (cost=0.00..88100.78 rows=3874278 width=52) (actual time=0.003..635.296 rows=3874278 loops=1)

Total runtime: 22995.361 ms

(27 rows)

 

Thanks,

Suya

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Huang, Suya"
Дата:
Сообщение: Re: query performance with hstore vs. non-hstore
Следующее
От: Matheus de Oliveira
Дата:
Сообщение: Re: how to change the provoke table in hash join