Re: Hash Join performance

Поиск
Список
Период
Сортировка
От Vamsidhar Thummala
Тема Re: Hash Join performance
Дата
Msg-id e0e3da5e0903131511q664ec424lfbf5b6c36da84598@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Hash Join performance  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Hash Join performance  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
Thanks for such quick response.

On Fri, Mar 13, 2009 at 5:34 PM, Tom Lane wrote:
> 2) Why is the Hash Join (top most) so slow?

Doesn't look that bad to me.  The net time charged to the HashJoin node
is 186107.210 - 53597.555 - 112439.592 = 20070.063 msec.  In addition it
would be reasonable to count the hashtable build time, which evidently
is 112439.592 - 111855.510 = 584.082 msec.  So the hashtable build took
about 10 msec/row, in addition to the data fetching; and then the actual
join spent about 3 microsec per outer row, again exclusive of obtaining
those rows.  The table build seems a bit slow, maybe, but I don't see a
problem with the join speed.

I am wondering why are we subtracting the entire Seq Scan time of Lineitem from the total time to calculate the HashJoin time. 
Does the Hash probing start as soon as the first record of Lineitem is available, i.e., after 112439.592ms?

Here is another plan I have for the same TPC-H 18 query with different configuration parameters (shared_buffers set to 400MB, just for experimental purposes) and HashJoin seems to take longer time (at least 155.58s based on above calculation):

GroupAggregate  (cost=905532.09..912092.04 rows=119707 width=57) (actual time=392705.160..392705.853 rows=57 loops=1)
-> Sort (cost=905532.09..906082.74 rows=119707 width=57) (actual time=392705.116..392705.220 rows=399 loops=1)
Sort Key: orders.o_totalprice, orders.o_orderdate, customer.c_name, customer.c_custkey, orders.o_orderkey
-> Hash Join (cost=507550.05..877523.36 rows=119707 width=57) (actual time=72616.327..392703.675 rows=399 loops=1)
Hash Cond: (public.lineitem.l_orderkey = orders.o_orderkey)
-> Seq Scan on lineitem (cost=0.00..261655.05 rows=6000947 width=14) (actual time=0.027..178712.709 rows=6001215 loops=1)
-> Hash (cost=506580.84..506580.84 rows=29921 width=51) (actual time=58421.050..58421.050 rows=57 loops=1)
-> Hash Join (cost=416568.25..506580.84 rows=29921 width=51) (actual time=25208.925..58419.502 rows=57 loops=1)
Hash Cond: (orders.o_custkey = customer.c_custkey)
-> Merge IN Join (cost=405349.14..493081.88 rows=29921 width=29) (actual time=37.244..57646.024 rows=57 loops=1)
Merge Cond: (orders.o_orderkey = "IN_subquery".l_orderkey)
-> Index Scan using orders_pkey on orders (cost=0.00..79501.17 rows=1499952 width=25) (actual time=0.100..5379.828 rows=1496151 loops=1)
-> Materialize (cost=405349.14..406004.72 rows=29921 width=4) (actual time=34.825..51619.816 rows=57 loops=1)
-> GroupAggregate (cost=0.00..404639.71 rows=29921 width=14) (actual time=34.818..51619.488 rows=57 loops=1)
Filter: (sum(l_quantity) > 300::numeric)
-> Index Scan using fkey_lineitem_1 on lineitem (cost=0.00..348617.14 rows=6000947 width=14) (actual time=0.079..44140.117 rows=6001215 loops=1)
-> Hash (cost=6803.60..6803.60 rows=149978 width=26) (actual time=640.980..640.980 rows=150000 loops=1)
-> Seq Scan on customer (cost=0.00..6803.60 rows=149978 width=26) (actual time=0.021..510.993 rows=150000 loops=1)

I re-ran the query multiple times to verify the accuracy of results.

Regards,
~Vamsi

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Hash Join performance
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Hash Join performance