7.4 vs 7.3 ( hash join issue )

Поиск
Список
Период
Сортировка
От Gaetano Mendola
Тема 7.4 vs 7.3 ( hash join issue )
Дата
Msg-id 4150B096.1050904@bigfoot.com
обсуждение исходный текст
Ответы Re: 7.4 vs 7.3 ( hash join issue )
Список pgsql-performance
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,
I'm having performance degradation with a view upgrading from
7.3 to 7.4, the view is a not so complex, one of his field
is the result from a function.
If I remove the function ( or I use a void function ) the 7.4
out perform the 7.3:

On 7.4 I get:

xxxxx=# explain analyze select * from v_ivan_2;
~                                                                                     QUERY PLAN
-
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
~ Hash Left Join  (cost=7028.36..16780.89 rows=65613 width=288) (actual time=2059.923..9340.043 rows=79815 loops=1)
~   Hash Cond: ("outer".id_baa_loc = "inner".id_baa_loc)
~   ->  Hash Left Join  (cost=6350.62..15134.25 rows=65613 width=258) (actual time=1816.013..7245.085 rows=65609
loops=1)
~         Hash Cond: ("outer".id_localita = "inner".id_localita)
~         ->  Hash Left Join  (cost=6252.93..14786.74 rows=65613 width=247) (actual time=1777.072..6533.316 rows=65609
loops=1)
~               Hash Cond: ("outer".id_frazione = "inner".id_frazione)
~               ->  Hash Left Join  (cost=6226.61..14362.74 rows=65613 width=235) (actual time=1768.273..5837.104
rows=65609loops=1) 
~                     Hash Cond: ("outer".id_baa = "inner".id_baa)
~                     ->  Hash Left Join  (cost=5092.24..12342.65 rows=65594 width=197) (actual time=1354.059..4562.398
rows=65592loops=1) 
~                           Hash Cond: ("outer".id_pratica = "inner".id_pratica)
~                           ->  Hash Left Join  (cost=3597.52..10010.84 rows=65594 width=173) (actual
time=785.775..3278.372rows=65592 loops=1) 
~                                 Hash Cond: ("outer".id_pratica = "inner".id_pratica)
~                                 ->  Hash Join  (cost=1044.77..6605.97 rows=65594 width=149) (actual
time=274.316..2070.788rows=65592 loops=1) 
~                                       Hash Cond: ("outer".id_stato_pratica = "inner".id_stato_pratica)
~                                       ->  Hash Join  (cost=1043.72..5850.59 rows=65593 width=141) (actual
time=273.478..1421.274rows=65592 loops=1) 
~                                             Hash Cond: ("outer".id_pratica = "inner".id_pratica)
~                                             ->  Seq Scan on t_pratica p  (cost=0.00..3854.27 rows=65927 width=137)
(actualtime=7.275..533.281 rows=65927 loops=1) 
~                                             ->  Hash  (cost=1010.92..1010.92 rows=65592 width=8) (actual
time=265.615..265.615rows=0 loops=1) 
~                                                   ->  Seq Scan on t_baa_pratica bp  (cost=0.00..1010.92 rows=65592
width=8)(actual time=0.209..164.761 rows=65592 loops=1) 
~                                       ->  Hash  (cost=1.05..1.05 rows=5 width=22) (actual time=0.254..0.254 rows=0
loops=1)
~                                             ->  Seq Scan on lookup_stato_pratica s  (cost=0.00..1.05 rows=5 width=22)
(actualtime=0.190..0.210 rows=5 loops=1) 
~                                 ->  Hash  (cost=2519.82..2519.82 rows=65865 width=28) (actual time=511.104..511.104
rows=0loops=1) 
~                                       ->  Seq Scan on t_persona  (cost=0.00..2519.82 rows=65865 width=28) (actual
time=0.068..381.586rows=65864 loops=1) 
~                                             Filter: (is_rich = true)
~                           ->  Hash  (cost=1462.53..1462.53 rows=64356 width=28) (actual time=567.919..567.919 rows=0
loops=1)
~                                 ->  Index Scan using idx_t_persona_is_inte on t_persona  (cost=0.00..1462.53
rows=64356width=28) (actual time=12.953..432.697 rows=64356 loops=1) 
~                                       Index Cond: (is_inte = true)
~                     ->  Hash  (cost=1113.65..1113.65 rows=41444 width=46) (actual time=413.782..413.782 rows=0
loops=1)
~                           ->  Hash Join  (cost=4.33..1113.65 rows=41444 width=46) (actual time=2.687..333.746
rows=41444loops=1) 
~                                 Hash Cond: ("outer".id_comune = "inner".id_comune)
~                                 ->  Seq Scan on t_baa_loc bl  (cost=0.00..653.44 rows=41444 width=20) (actual
time=0.422..94.803rows=41444 loops=1) 
~                                 ->  Hash  (cost=4.22..4.22 rows=222 width=34) (actual time=1.735..1.735 rows=0
loops=1)
~                                       ->  Seq Scan on t_comune co  (cost=0.00..4.22 rows=222 width=34) (actual
time=0.521..1.277rows=222 loops=1) 
~               ->  Hash  (cost=25.59..25.59 rows=1459 width=20) (actual time=8.343..8.343 rows=0 loops=1)
~                     ->  Seq Scan on t_frazione f  (cost=0.00..25.59 rows=1459 width=20) (actual time=0.554..5.603
rows=1459loops=1) 
~         ->  Hash  (cost=94.94..94.94 rows=5494 width=19) (actual time=38.504..38.504 rows=0 loops=1)
~               ->  Seq Scan on t_localita l  (cost=0.00..94.94 rows=5494 width=19) (actual time=8.499..28.216
rows=5494loops=1) 
~   ->  Hash  (cost=660.61..660.61 rows=34261 width=38) (actual time=198.663..198.663 rows=0 loops=1)
~         ->  Seq Scan on t_affaccio af  (cost=0.00..660.61 rows=34261 width=38) (actual time=5.875..133.336 rows=34261
loops=1)
~ Total runtime: 9445.263 ms
(40 rows)


On 7.3 I get:


xxxxx=# explain analyze select * from v_ivan_2;
~                                                                                       QUERY PLAN
-
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
~ Hash Join  (cost=5597.02..15593.91 rows=65610 width=354) (actual time=2169.37..13102.64 rows=79815 loops=1)
~   Hash Cond: ("outer".id_baa_loc = "inner".id_baa_loc)
~   ->  Hash Join  (cost=4919.28..13953.00 rows=65610 width=316) (actual time=1966.38..10568.69 rows=65609 loops=1)
~         Hash Cond: ("outer".id_localita = "inner".id_localita)
~         ->  Hash Join  (cost=4821.59..13596.30 rows=65610 width=297) (actual time=1934.29..9151.45 rows=65609
loops=1)
~               Hash Cond: ("outer".id_frazione = "inner".id_frazione)
~               ->  Hash Join  (cost=4795.27..13157.36 rows=65610 width=277) (actual time=1925.29..7795.71 rows=65609
loops=1)
~                     Hash Cond: ("outer".id_baa = "inner".id_baa)
~                     ->  Hash Join  (cost=3640.17..11149.38 rows=65592 width=223) (actual time=1375.66..5870.74
rows=65592loops=1) 
~                           Hash Cond: ("outer".id_pratica = "inner".id_pratica)
~                           ->  Hash Join  (cost=3597.53..10237.66 rows=65592 width=195) (actual time=835.95..4332.46
rows=65592loops=1) 
~                                 Hash Cond: ("outer".id_pratica = "inner".id_pratica)
~                                 ->  Hash Join  (cost=1044.78..6800.07 rows=65592 width=167) (actual
time=307.55..2903.04rows=65592 loops=1) 
~                                       Hash Cond: ("outer".id_pratica = "inner".id_pratica)
~                                       ->  Merge Join  (cost=1.06..4770.96 rows=65927 width=159) (actual
time=1.41..1898.12rows=65927 loops=1) 
~                                             Merge Cond: ("outer".id_stato_pratica = "inner".id_stato_pratica)
~                                             ->  Index Scan using idx_t_pratica on t_pratica p  (cost=0.00..4044.70
rows=65927width=137) (actual time=0.58..894.95 rows=65927 loops=1) 
~                                             ->  Sort  (cost=1.06..1.06 rows=5 width=22) (actual time=0.78..58.49
rows=63528loops=1) 
~                                                   Sort Key: s.id_stato_pratica
~                                                   ->  Seq Scan on lookup_stato_pratica s  (cost=0.00..1.05 rows=5
width=22)(actual time=0.11..0.13 rows=5 loops=1) 
~                                       ->  Hash  (cost=1010.92..1010.92 rows=65592 width=8) (actual
time=305.40..305.40rows=0 loops=1) 
~                                             ->  Seq Scan on t_baa_pratica bp  (cost=0.00..1010.92 rows=65592 width=8)
(actualtime=0.23..192.88 rows=65592 loops=1) 
~                                 ->  Hash  (cost=2519.82..2519.82 rows=65864 width=28) (actual time=527.88..527.88
rows=0loops=1) 
~                                       ->  Seq Scan on t_persona  (cost=0.00..2519.82 rows=65864 width=28) (actual
time=0.07..394.51rows=65864 loops=1) 
~                                             Filter: (is_rich = true)
~                           ->  Hash  (cost=10.46..10.46 rows=64356 width=28) (actual time=539.27..539.27 rows=0
loops=1)
~                                 ->  Index Scan using idx_t_persona_is_inte on t_persona  (cost=0.00..10.46 rows=64356
width=28)(actual time=0.61..403.48 rows=64356 loops=1) 
~                                       Index Cond: (is_inte = true)
~                     ->  Hash  (cost=1134.38..1134.38 rows=41444 width=54) (actual time=549.25..549.25 rows=0 loops=1)
~                           ->  Hash Join  (cost=4.33..1134.38 rows=41444 width=54) (actual time=2.19..470.20
rows=41444loops=1) 
~                                 Hash Cond: ("outer".id_comune = "inner".id_comune)
~                                 ->  Seq Scan on t_baa_loc bl  (cost=0.00..653.44 rows=41444 width=20) (actual
time=0.15..179.24rows=41444 loops=1) 
~                                 ->  Hash  (cost=4.22..4.22 rows=222 width=34) (actual time=1.55..1.55 rows=0 loops=1)
~                                       ->  Seq Scan on t_comune co  (cost=0.00..4.22 rows=222 width=34) (actual
time=0.22..1.08rows=222 loops=1) 
~               ->  Hash  (cost=25.59..25.59 rows=1459 width=20) (actual time=8.37..8.37 rows=0 loops=1)
~                     ->  Seq Scan on t_frazione f  (cost=0.00..25.59 rows=1459 width=20) (actual time=0.22..5.46
rows=1459loops=1) 
~         ->  Hash  (cost=94.94..94.94 rows=5494 width=19) (actual time=31.46..31.46 rows=0 loops=1)
~               ->  Seq Scan on t_localita l  (cost=0.00..94.94 rows=5494 width=19) (actual time=0.22..20.41 rows=5494
loops=1)
~   ->  Hash  (cost=660.61..660.61 rows=34261 width=38) (actual time=199.96..199.96 rows=0 loops=1)
~         ->  Seq Scan on t_affaccio af  (cost=0.00..660.61 rows=34261 width=38) (actual time=0.21..130.67 rows=34261
loops=1)
~ Total runtime: 13190.70 msec
(41 rows)



As you can see the 7.3 do a index scan on the table t_pratica and the 7.4 perform a sequential scan,
the plans however are very close to each other.

So I identify the performance issue on the function call, indeed:


7.4:

xxxxx=# explain analyze select sp_foo(id_pratica) from t_pratica;
~                                                    QUERY PLAN
- ------------------------------------------------------------------------------------------------------------------
~ Seq Scan on t_pratica  (cost=0.00..3887.23 rows=65927 width=4) (actual time=4.013..45240.015 rows=65927 loops=1)
~ Total runtime: 45499.123 ms
(2 rows)


7.3:

xxxxx=# explain analyze select sp_foo(id_pratica) from t_pratica;
~                                                   QUERY PLAN
- ----------------------------------------------------------------------------------------------------------------
~ Seq Scan on t_pratica  (cost=0.00..3854.27 rows=65927 width=4) (actual time=0.58..18446.99 rows=65927 loops=1)
~ Total runtime: 18534.41 msec
(2 rows)



This is the sp_foo:

CREATE FUNCTION sp_foo (integer) RETURNS text
~    AS '
DECLARE
~  a_id_pratica  ALIAS FOR $1;

~  my_parere     TEXT;
BEGIN
~  a_id_pratica  := $1;

~  SELECT INTO my_parere le.nome
~  FROM t_evento           e,
~       lookup_tipo_evento le
~  WHERE e.id_tipo_evento = le.id_tipo_evento AND
~        e.id_pratica = a_id_pratica          AND
~        e.id_tipo_evento in (5,6,7,8 )
~  ORDER by e.id_evento desc
~  LIMIT 1;

~  RETURN my_parere;
END;
'  LANGUAGE plpgsql;


Preparing a statement this is the plan used by 7.4:

xxxxx=# explain analyze execute foo_body( 5 );
~                                                                                            QUERY PLAN
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
~ Limit  (cost=10.30..10.30 rows=1 width=24) (actual time=0.538..0.538 rows=0 loops=1)
~   ->  Sort  (cost=10.30..10.30 rows=1 width=24) (actual time=0.534..0.534 rows=0 loops=1)
~         Sort Key: e.id_evento
~         ->  Hash Join  (cost=9.11..10.30 rows=1 width=24) (actual time=0.512..0.512 rows=0 loops=1)
~               Hash Cond: ("outer".id_tipo_evento = "inner".id_tipo_evento)
~               ->  Seq Scan on lookup_tipo_evento le  (cost=0.00..1.16 rows=16 width=32) (actual time=0.010..0.041
rows=16loops=1) 
~               ->  Hash  (cost=9.11..9.11 rows=1 width=16) (actual time=0.144..0.144 rows=0 loops=1)
~                     ->  Index Scan using t_evento_id_pratica_key on t_evento e  (cost=0.00..9.11 rows=1 width=16)
(actualtime=0.140..0.140 rows=0 loops=1) 
~                           Index Cond: (id_pratica = $1)
~                           Filter: (((id_tipo_evento)::text = '5'::text) OR ((id_tipo_evento)::text = '6'::text) OR
((id_tipo_evento)::text= '7'::text) OR ((id_tipo_evento)::text = '8'::text)) 
~ Total runtime: 0.824 ms
(11 rows)



The table t_pratica have 65927 rows so 0.824 ms * 65927 is almost the total time execution for
each t_pratica row ~ 45000 ms


Unfortunately I can not see the plan used by the 7.3 engine due the lack of explain execute,
however I did an explain analyze on the select:

xxxxx=# explain analyze SELECT le.nome
xxxxx-# FROM t_evento e,lookup_tipo_evento le
xxxxx-# WHERE e.id_tipo_evento = le.id_tipo_evento
xxxxx-# AND e.id_pratica = 5
xxxxx-#         AND e.id_tipo_evento in (5,6,7,8 )
xxxxx-# ORDER by e.id_evento desc
xxxxx-# LIMIT 1;
~                                                                                            QUERY PLAN
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
~ Limit  (cost=10.27..10.27 rows=1 width=48) (actual time=0.19..0.19 rows=0 loops=1)
~   ->  Sort  (cost=10.27..10.27 rows=1 width=48) (actual time=0.18..0.18 rows=0 loops=1)
~         Sort Key: e.id_evento
~         ->  Merge Join  (cost=10.24..10.27 rows=1 width=48) (actual time=0.09..0.09 rows=0 loops=1)
~               Merge Cond: ("outer".id_tipo_evento = "inner".id_tipo_evento)
~               ->  Sort  (cost=9.02..9.02 rows=1 width=16) (actual time=0.09..0.09 rows=0 loops=1)
~                     Sort Key: e.id_tipo_evento
~                     ->  Index Scan using t_evento_id_pratica_key on t_evento e  (cost=0.00..9.02 rows=1 width=16)
(actualtime=0.06..0.06 rows=0 loops=1) 
~                           Index Cond: (id_pratica = 5)
~                           Filter: (((id_tipo_evento)::text = '5'::text) OR ((id_tipo_evento)::text = '6'::text) OR
((id_tipo_evento)::text= '7'::text) OR ((id_tipo_evento)::text = '8'::text)) 
~               ->  Sort  (cost=1.22..1.23 rows=16 width=32) (never executed)
~                     Sort Key: le.id_tipo_evento
~                     ->  Seq Scan on lookup_tipo_evento le  (cost=0.00..1.16 rows=16 width=32) (never executed)
~ Total runtime: 0.31 msec
(14 rows)


Disabling the hashjoin on the 7.4 I got best performance that 7.3:

xxxxx=# set enable_hashjoin = off;
SET
xxxxx=# explain analyze select sp_get_ultimo_parere(id_pratica) from t_pratica;
~                                                    QUERY PLAN
- -------------------------------------------------------------------------------------------------------------------
~ Seq Scan on t_pratica  (cost=0.00..3887.23 rows=65927 width=4) (actual time=12.384..12396.136 rows=65927 loops=1)
~ Total runtime: 12485.548 ms
(2 rows)


Now my question is why the 7.4 choose the hash join ?  :-(
I can provide further details if you ask

BTW with the hash_join = off  the 7.4 choose the same 7.3  plan for this function body.

Of course both engines are running on the same machine with the same settings.


Regards
Gaetano Mendola
























-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBULCU7UpzwH2SGd4RAt2ZAKC9FjAKiljRqgaZSZa+p/7N65Cl7ACePWBV
TaR2VH1kDSBS7b+kNK4deFo=
=X+th
-----END PGP SIGNATURE-----


В списке pgsql-performance по дате отправления:

Предыдущее
От: Patrick Hatcher
Дата:
Сообщение: Re: vacuum full & max_fsm_pages question
Следующее
От: Dennis Bjorklund
Дата:
Сообщение: Re: 7.4 vs 7.3 ( hash join issue )