Обсуждение: bad performances using hashjoin

Поиск
Список
Период
Сортировка

bad performances using hashjoin

От
Gaetano Mendola
Дата:
Hi all,
I'm stuck in a select that use the hash join where should not:
6 seconds vs 0.3 ms !!

If you need other info in order to improve the planner,
let me know.

Regards
Gaetano Mendola




empdb=# explain analyze SELECT id_sat_request
empdb-#    FROM sat_request sr,
empdb-#         v_sc_packages vs
empdb-#    WHERE    ----- JOIN ----
empdb-#          sr.id_package = vs.id_package AND
empdb-#             ---------------
empdb-#          id_user = 29416 AND
empdb-#          id_url  =  329268  AND
empdb-#          vs.estimated_start > now() AND
empdb-#          id_sat_request_status = sp_lookup_id('sat_request_status', 'Scheduled');
                                                                            QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=272.95..276.61 rows=1 width=4) (actual time=6323.107..6323.107 rows=0 loops=1)
   Hash Cond: ("outer".id_package = "inner".id_package)
   ->  Subquery Scan vs  (cost=269.91..272.10 rows=292 width=4) (actual time=6316.534..6317.398 rows=407 loops=1)
         ->  Sort  (cost=269.91..270.64 rows=292 width=263) (actual time=6316.526..6316.789 rows=407 loops=1)
               Sort Key: vs.estimated_start
               ->  Hash Join  (cost=227.58..257.95 rows=292 width=263) (actual time=6302.480..6313.982 rows=407
loops=1)
                     Hash Cond: ("outer".id_package = "inner".id_package)
                     ->  Subquery Scan vpk  (cost=141.82..150.04 rows=1097 width=218) (actual time=6106.020..6113.038
rows=1104loops=1) 
                           ->  Sort  (cost=141.82..144.56 rows=1097 width=162) (actual time=6106.006..6106.745
rows=1104loops=1) 
                                 Sort Key: p.id_publisher, p.name
                                 ->  Hash Left Join  (cost=15.54..86.42 rows=1097 width=162) (actual
time=2.978..6087.608rows=1104 loops=1) 
                                       Hash Cond: ("outer".id_package = "inner".id_package)
                                       ->  Seq Scan on packages p  (cost=0.00..53.48 rows=1097 width=146) (actual
time=0.011..7.978rows=1104 loops=1) 
                                       ->  Hash  (cost=13.69..13.69 rows=738 width=20) (actual time=2.061..2.061 rows=0
loops=1)
                                             ->  Seq Scan on package_security ps  (cost=0.00..13.69 rows=738 width=20)
(actualtime=0.027..1.289 rows=747 loops=1) 
                     ->  Hash  (cost=85.63..85.63 rows=54 width=49) (actual time=196.022..196.022 rows=0 loops=1)
                           ->  Merge Join  (cost=82.83..85.63 rows=54 width=49) (actual time=192.898..195.565 rows=407
loops=1)
                                 Merge Cond: ("outer".id_program = "inner".id_program)
                                 ->  Subquery Scan vs  (cost=72.27..73.97 rows=226 width=16) (actual time=6.867..7.872
rows=407loops=1) 
                                       ->  Sort  (cost=72.27..72.84 rows=226 width=20) (actual time=6.851..7.114
rows=407loops=1) 
                                             Sort Key: sequences.id_program, sequences.internal_position
                                             ->  Seq Scan on sequences  (cost=0.00..63.44 rows=226 width=20) (actual
time=0.295..3.370rows=407 loops=1) 
                                                   Filter: ((estimated_start IS NOT NULL) AND
(date_trunc('seconds'::text,estimated_start) > now())) 
                                 ->  Sort  (cost=10.56..10.68 rows=47 width=37) (actual time=186.013..186.296 rows=439
loops=1)
                                       Sort Key: vpr.id_program
                                       ->  Subquery Scan vpr  (cost=8.90..9.25 rows=47 width=37) (actual
time=185.812..185.928rows=48 loops=1) 
                                             ->  Sort  (cost=8.90..9.02 rows=47 width=61) (actual time=185.806..185.839
rows=48loops=1) 
                                                   Sort Key: programs.id_publisher, programs.id_program
                                                   ->  Seq Scan on programs  (cost=0.00..7.60 rows=47 width=61) (actual
time=9.592..185.634rows=48 loops=1) 
                                                         Filter: (id_program <> 0)
   ->  Hash  (cost=3.04..3.04 rows=1 width=8) (actual time=4.862..4.862 rows=0 loops=1)
         ->  Index Scan using idx_id_url_sat_request on sat_request sr  (cost=0.00..3.04 rows=1 width=8) (actual
time=4.851..4.851rows=0 loops=1) 
               Index Cond: (id_url = 329268)
               Filter: ((id_user = 29416) AND (id_sat_request_status = 1))
 Total runtime: 6324.435 ms
(35 rows)

empdb=# set enable_hashjoin = false;
SET
empdb=# explain analyze SELECT id_sat_request
empdb-#    FROM sat_request sr,
empdb-#         v_sc_packages vs
empdb-#    WHERE    ----- JOIN ----
empdb-#          sr.id_package = vs.id_package AND
empdb-#             ---------------
empdb-#          id_user = 29416 AND
empdb-#          id_url  =  329268  AND
empdb-#          vs.estimated_start > now() AND
empdb-#          id_sat_request_status = sp_lookup_id('sat_request_status', 'Scheduled');
                                                                                   QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=393.41..400.83 rows=1 width=4) (actual time=0.080..0.080 rows=0 loops=1)
   Join Filter: ("outer".id_package = "inner".id_package)
   ->  Index Scan using idx_id_url_sat_request on sat_request sr  (cost=0.00..3.04 rows=1 width=8) (actual
time=0.078..0.078rows=0 loops=1) 
         Index Cond: (id_url = 329268)
         Filter: ((id_user = 29416) AND (id_sat_request_status = 1))
   ->  Subquery Scan vs  (cost=393.41..395.60 rows=292 width=4) (never executed)
         ->  Sort  (cost=393.41..394.14 rows=292 width=263) (never executed)
               Sort Key: vs.estimated_start
               ->  Merge Join  (cost=372.76..381.46 rows=292 width=263) (never executed)
                     Merge Cond: ("outer".id_package = "inner".id_package)
                     ->  Sort  (cost=87.19..87.32 rows=54 width=49) (never executed)
                           Sort Key: vs.id_package
                           ->  Merge Join  (cost=82.83..85.63 rows=54 width=49) (never executed)
                                 Merge Cond: ("outer".id_program = "inner".id_program)
                                 ->  Subquery Scan vs  (cost=72.27..73.97 rows=226 width=16) (never executed)
                                       ->  Sort  (cost=72.27..72.84 rows=226 width=20) (never executed)
                                             Sort Key: sequences.id_program, sequences.internal_position
                                             ->  Seq Scan on sequences  (cost=0.00..63.44 rows=226 width=20) (never
executed)
                                                   Filter: ((estimated_start IS NOT NULL) AND
(date_trunc('seconds'::text,estimated_start) > now())) 
                                 ->  Sort  (cost=10.56..10.68 rows=47 width=37) (never executed)
                                       Sort Key: vpr.id_program
                                       ->  Subquery Scan vpr  (cost=8.90..9.25 rows=47 width=37) (never executed)
                                             ->  Sort  (cost=8.90..9.02 rows=47 width=61) (never executed)
                                                   Sort Key: programs.id_publisher, programs.id_program
                                                   ->  Seq Scan on programs  (cost=0.00..7.60 rows=47 width=61) (never
executed)
                                                         Filter: (id_program <> 0)
                     ->  Sort  (cost=285.57..288.31 rows=1097 width=218) (never executed)
                           Sort Key: vpk.id_package
                           ->  Subquery Scan vpk  (cost=221.95..230.17 rows=1097 width=218) (never executed)
                                 ->  Sort  (cost=221.95..224.69 rows=1097 width=162) (never executed)
                                       Sort Key: p.id_publisher, p.name
                                       ->  Merge Right Join  (cost=108.88..166.55 rows=1097 width=162) (never executed)
                                             Merge Cond: ("outer".id_package = "inner".id_package)
                                             ->  Index Scan using package_security_id_package_key on package_security
ps (cost=0.00..38.50 rows=738 width=20) (never executed) 
                                             ->  Sort  (cost=108.88..111.62 rows=1097 width=146) (never executed)
                                                   Sort Key: p.id_package
                                                   ->  Seq Scan on packages p  (cost=0.00..53.48 rows=1097 width=146)
(neverexecuted) 
 Total runtime: 0.302 ms
(38 rows)

Re: bad performances using hashjoin

От
Tom Lane
Дата:
Gaetano Mendola <mendola@bigfoot.com> writes:
> If you need other info in order to improve the planner,

... like, say, the PG version you are using, or the definitions of the
views involved?  It's difficult to say much of anything about this.

However: the reason the second plan wins is because there are zero rows
fetched from sat_request, and so the bulk of the plan is never executed
at all.  I doubt the second plan would win if there were any matching
sat_request rows.  If this is the case you actually need to optimize,
probably the thing to do is to get rid of the ORDER BY clauses you
evidently have in your views, so that there's some chance of building
a fast-start plan.

BTW, I believe in 8.0 the first plan would be about as fast as the
second, because we added some code to hash join to fall out without
scanning the left input if the right input is empty.

            regards, tom lane

Re: bad performances using hashjoin

От
Gaetano Mendola
Дата:
Tom Lane wrote:
> Gaetano Mendola <mendola@bigfoot.com> writes:
>
>>If you need other info in order to improve the planner,
>
>
> ... like, say, the PG version you are using, or the definitions of the
> views involved?  It's difficult to say much of anything about this.

That is true, sorry I forgot it :-(
The engine is a 7.4.5 and these are the views definitions:

sat_request is just a table

CREATE OR REPLACE VIEW v_sc_packages AS
  SELECT *
  FROM
      v_programs           vpr,
      v_packages           vpk,
      v_sequences          vs
  WHERE
     ------------ JOIN -------------
      vpr.id_program = vs.id_program AND
      vpk.id_package = vs.id_package AND
     -------------------------------
      vs.estimated_start IS NOT NULL
  ORDER BY vs.estimated_start;


CREATE OR REPLACE VIEW v_programs AS
  SELECT *
  FROM programs
  WHERE id_program<>0
  ORDER BY id_publisher, id_program
;



CREATE OR REPLACE VIEW v_packages AS
  SELECT *
  FROM packages p LEFT OUTER JOIN package_security ps USING (id_package)
  ORDER BY p.id_publisher, p.name
;

CREATE OR REPLACE VIEW v_sequences AS
  SELECT id_package        AS id_package,
         id_program        AS id_program,
         internal_position AS internal_position,
         estimated_start   AS estimated_start
  FROM sequences
  ORDER BY id_program, internal_position
;


> However: the reason the second plan wins is because there are zero rows
> fetched from sat_request, and so the bulk of the plan is never executed
> at all.  I doubt the second plan would win if there were any matching
> sat_request rows.  If this is the case you actually need to optimize,
> probably the thing to do is to get rid of the ORDER BY clauses you
> evidently have in your views, so that there's some chance of building
> a fast-start plan.

Removed all order by from that views, this is the comparison between the two
(orderdered and not ordered):

empdb=# explain analyze SELECT id_sat_request
empdb-#    FROM sat_request sr,
empdb-#         v_sc_packages vs
empdb-#    WHERE    ----- JOIN ----
empdb-#          sr.id_package = vs.id_package AND
empdb-#             ---------------
empdb-#          id_user = 29416 AND
empdb-#          id_url  =  424364 AND
empdb-#          vs.estimated_start > now() AND
empdb-#          id_sat_request_status = sp_lookup_id('sat_request_status', 'Scheduled');
                                                                            QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=280.98..284.74 rows=1 width=4) (actual time=895.344..895.344 rows=0 loops=1)
   Hash Cond: ("outer".id_package = "inner".id_package)
   ->  Subquery Scan vs  (cost=277.94..280.19 rows=301 width=4) (actual time=893.191..894.396 rows=569 loops=1)
         ->  Sort  (cost=277.94..278.69 rows=301 width=263) (actual time=893.184..893.546 rows=569 loops=1)
               Sort Key: vs.estimated_start
               ->  Hash Join  (cost=232.61..265.54 rows=301 width=263) (actual time=868.980..889.643 rows=569 loops=1)
                     Hash Cond: ("outer".id_package = "inner".id_package)
                     ->  Subquery Scan vpk  (cost=150.29..159.26 rows=1196 width=218) (actual time=822.281..834.063
rows=1203loops=1) 
                           ->  Sort  (cost=150.29..153.28 rows=1196 width=159) (actual time=822.266..823.190 rows=1203
loops=1)
                                 Sort Key: p.id_publisher, p.name
                                 ->  Hash Left Join  (cost=16.14..89.16 rows=1196 width=159) (actual
time=3.504..809.262rows=1203 loops=1) 
                                       Hash Cond: ("outer".id_package = "inner".id_package)
                                       ->  Seq Scan on packages p  (cost=0.00..53.98 rows=1196 width=143) (actual
time=0.018..13.869rows=1203 loops=1) 
                                       ->  Hash  (cost=14.09..14.09 rows=818 width=20) (actual time=2.395..2.395 rows=0
loops=1)
                                             ->  Seq Scan on package_security ps  (cost=0.00..14.09 rows=818 width=20)
(actualtime=0.020..1.520 rows=845 loops=1) 
                     ->  Hash  (cost=82.19..82.19 rows=51 width=49) (actual time=46.402..46.402 rows=0 loops=1)
                           ->  Merge Join  (cost=79.54..82.19 rows=51 width=49) (actual time=39.435..45.376 rows=569
loops=1)
                                 Merge Cond: ("outer".id_program = "inner".id_program)
                                 ->  Subquery Scan vs  (cost=70.98..72.59 rows=214 width=16) (actual
time=16.834..19.102rows=569 loops=1) 
                                       ->  Sort  (cost=70.98..71.52 rows=214 width=20) (actual time=16.824..17.338
rows=569loops=1) 
                                             Sort Key: sequences.id_program, sequences.internal_position
                                             ->  Seq Scan on sequences  (cost=0.00..62.70 rows=214 width=20) (actual
time=0.638..11.969rows=569 loops=1) 
                                                   Filter: ((estimated_start IS NOT NULL) AND
(date_trunc('seconds'::text,estimated_start) > now())) 
                                 ->  Sort  (cost=8.56..8.68 rows=47 width=37) (actual time=22.580..23.123 rows=605
loops=1)
                                       Sort Key: vpr.id_program
                                       ->  Subquery Scan vpr  (cost=6.90..7.25 rows=47 width=37) (actual
time=22.294..22.464rows=48 loops=1) 
                                             ->  Sort  (cost=6.90..7.02 rows=47 width=61) (actual time=22.287..22.332
rows=48loops=1) 
                                                   Sort Key: programs.id_publisher, programs.id_program
                                                   ->  Seq Scan on programs  (cost=0.00..5.60 rows=47 width=61) (actual
time=4.356..22.068rows=48 loops=1) 
                                                         Filter: (id_program <> 0)
   ->  Hash  (cost=3.04..3.04 rows=1 width=8) (actual time=0.033..0.033 rows=0 loops=1)
         ->  Index Scan using idx_id_url_sat_request on sat_request sr  (cost=0.00..3.04 rows=1 width=8) (actual
time=0.031..0.031rows=0 loops=1) 
               Index Cond: (id_url = 424364)
               Filter: ((id_user = 29416) AND (id_sat_request_status = 1))
 Total runtime: 897.044 ms
(35 rows)



empdb=# explain analyze SELECT id_sat_request
empdb-#    FROM sat_request sr,
empdb-#         v_sc_packages_new vs
empdb-#    WHERE    ----- JOIN ----
empdb-#          sr.id_package = vs.id_package AND
empdb-#             ---------------
empdb-#          id_user = 29416 AND
empdb-#          id_url  =  424364 AND
empdb-#          vs.estimated_start > now() AND
empdb-#          id_sat_request_status = sp_lookup_id('sat_request_status', 'Scheduled');
                                                                          QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=19.18..96.87 rows=1 width=4) (actual time=15.576..15.576 rows=0 loops=1)
   ->  Nested Loop  (cost=19.18..93.04 rows=1 width=8) (actual time=15.569..15.569 rows=0 loops=1)
         ->  Hash Join  (cost=19.18..89.21 rows=1 width=12) (actual time=15.566..15.566 rows=0 loops=1)
               Hash Cond: ("outer".id_package = "inner".id_package)
               ->  Hash Left Join  (cost=16.14..80.19 rows=1196 width=4) (actual time=7.291..13.620 rows=1203 loops=1)
                     Hash Cond: ("outer".id_package = "inner".id_package)
                     ->  Seq Scan on packages p  (cost=0.00..53.98 rows=1196 width=4) (actual time=0.028..2.694
rows=1203loops=1) 
                     ->  Hash  (cost=14.09..14.09 rows=818 width=4) (actual time=6.707..6.707 rows=0 loops=1)
                           ->  Seq Scan on package_security ps  (cost=0.00..14.09 rows=818 width=4) (actual
time=0.026..4.620rows=845 loops=1) 
               ->  Hash  (cost=3.04..3.04 rows=1 width=8) (actual time=0.061..0.061 rows=0 loops=1)
                     ->  Index Scan using idx_id_url_sat_request on sat_request sr  (cost=0.00..3.04 rows=1 width=8)
(actualtime=0.056..0.056 rows=0 loops=1) 
                           Index Cond: (id_url = 424364)
                           Filter: ((id_user = 29416) AND (id_sat_request_status = 1))
         ->  Index Scan using idx_sequences_id_package on sequences  (cost=0.00..3.82 rows=1 width=8) (never executed)
               Index Cond: ("outer".id_package = sequences.id_package)
               Filter: ((estimated_start IS NOT NULL) AND (date_trunc('seconds'::text, estimated_start) > now()))
   ->  Index Scan using programs_pkey on programs  (cost=0.00..3.83 rows=1 width=4) (never executed)
         Index Cond: (programs.id_program = "outer".id_program)
         Filter: (id_program <> 0)
 Total runtime: 16.279 ms
(20 rows)




The second one of course is faster, this is the second select with hashjoin disabled:

empdb=# set enable_hashjoin = false;
SET
empdb=# explain analyze SELECT id_sat_request
empdb-#    FROM sat_request sr,
empdb-#         v_sc_packages_new vs
empdb-#    WHERE    ----- JOIN ----
empdb-#          sr.id_package = vs.id_package AND
empdb-#             ---------------
empdb-#          id_user = 29416 AND
empdb-#          id_url  =  424364 AND
empdb-#          vs.estimated_start > now() AND
empdb-#          id_sat_request_status = sp_lookup_id('sat_request_status', 'Scheduled');
                                                                             QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=10.62..175.83 rows=1 width=4) (actual time=0.280..0.280 rows=0 loops=1)
   Merge Cond: ("outer".id_package = "inner".id_package)
   ->  Merge Left Join  (cost=0.00..162.21 rows=1196 width=4) (actual time=0.188..0.188 rows=1 loops=1)
         Merge Cond: ("outer".id_package = "inner".id_package)
         ->  Index Scan using packages_pkey on packages p  (cost=0.00..115.51 rows=1196 width=4) (actual
time=0.085..0.085rows=1 loops=1) 
         ->  Index Scan using package_security_id_package_key on package_security ps  (cost=0.00..39.06 rows=818
width=4)(actual time=0.080..0.080 rows=1 loops=1) 
   ->  Sort  (cost=10.62..10.62 rows=1 width=12) (actual time=0.087..0.087 rows=0 loops=1)
         Sort Key: sr.id_package
         ->  Nested Loop  (cost=0.00..10.61 rows=1 width=12) (actual time=0.069..0.069 rows=0 loops=1)
               ->  Nested Loop  (cost=0.00..6.77 rows=1 width=16) (actual time=0.067..0.067 rows=0 loops=1)
                     ->  Index Scan using idx_id_url_sat_request on sat_request sr  (cost=0.00..3.04 rows=1 width=8)
(actualtime=0.065..0.065 rows=0 loops=1) 
                           Index Cond: (id_url = 424364)
                           Filter: ((id_user = 29416) AND (id_sat_request_status = 1))
                     ->  Index Scan using idx_sequences_id_package on sequences  (cost=0.00..3.72 rows=1 width=8)
(neverexecuted) 
                           Index Cond: ("outer".id_package = sequences.id_package)
                           Filter: ((estimated_start IS NOT NULL) AND (date_trunc('seconds'::text, estimated_start) >
now()))
               ->  Index Scan using programs_pkey on programs  (cost=0.00..3.83 rows=1 width=4) (never executed)
                     Index Cond: (programs.id_program = "outer".id_program)
                     Filter: (id_program <> 0)
 Total runtime: 0.604 ms
(20 rows)

I see the problem is still here:
Hash Left Join  (cost=16.14..80.19 rows=1196 width=4) (actual time=7.291..13.620 rows=1203 loops=1)

BTW, at this time the executions time seen are lower because right now is not a peak hour

> BTW, I believe in 8.0 the first plan would be about as fast as the
> second, because we added some code to hash join to fall out without
> scanning the left input if the right input is empty.

I'll take it a try if you are really interested in the results.

Regards
Gaetano Mendola











Re: bad performances using hashjoin

От
Klint Gore
Дата:
On Sun, 20 Feb 2005 13:46:10 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> sat_request rows.  If this is the case you actually need to optimize,
> probably the thing to do is to get rid of the ORDER BY clauses you
> evidently have in your views, so that there's some chance of building
> a fast-start plan.

Is having an order by in a view legal?  In sybase ASA, mssql it throws a
syntax error if there's an order by.

If so, does it do 2 sorts when you sort by something else?

i.e. if view is
   create view v1 as select * from table order by 1;
and the statment
   select * from v1 order by 2;
is run does it sort by 1 then resort by 2?

klint.

+---------------------------------------+-----------------+
: Klint Gore                            : "Non rhyming    :
: EMail   : kg@kgb.une.edu.au           :  slang - the    :
: Snail   : A.B.R.I.                    :  possibilities  :
: Mail      University of New England   :  are useless"   :
:           Armidale NSW 2351 Australia :     L.J.J.      :
: Fax     : +61 2 6772 5376             :                 :
+---------------------------------------+-----------------+

Re: bad performances using hashjoin

От
Tom Lane
Дата:
Klint Gore <kg@kgb.une.edu.au> writes:
> Is having an order by in a view legal?

Not according to the SQL spec, but PG has allowed it for several releases.
(The same goes for ORDER BY in a sub-select, which is actually pretty
much the same thing ...)

> If so, does it do 2 sorts when you sort by something else?

Yup.  It's something you'd only want for the topmost view in a stack,
IMHO.  A sort at a lower level is likely to be wasted effort.

            regards, tom lane

Re: bad performances using hashjoin

От
David Brown
Дата:
Tom Lane wrote:

>However: the reason the second plan wins is because there are zero rows
>fetched from sat_request, and so the bulk of the plan is never executed
>at all.  I doubt the second plan would win if there were any matching
>sat_request rows.
>
That's what I thought at first, but if you look more closely, that's
having very little impact on either the cost or actual time:

->  Index Scan using idx_id_url_sat_request on sat_request sr  (cost=0.00..3.04 rows=1 width=8) (actual
time=0.031..0.031rows=0 loops=1) 


The real problem appears to be here:

->  Hash Left Join  (cost=16.14..89.16 rows=1196 width=159) (actual time=3.504..809.262 rows=1203 loops=1)


As Gaetano points out in his follow-up post, the problem still exists
after he removed the sorts:

->  Hash Left Join  (cost=16.14..80.19 rows=1196 width=4) (actual time=7.291..13.620 rows=1203 loops=1)


The planner is not breaking up the outer join in his v_packages view:
  SELECT *
  FROM packages p LEFT OUTER JOIN package_security ps USING (id_package)

It's not being selective at all with packages, despite id_package being
the link to sat_request.

If this is too complex for the planner, could he re-arrange his outer
join so that's it's processed later? If he puts it in his actual query,
for instance, will the planner flatten it out anyway?

Re: bad performances using hashjoin

От
Tom Lane
Дата:
David Brown <time@bigpond.net.au> writes:
> The planner is not breaking up the outer join in his v_packages view:

The planner doesn't make any attempt to rearrange join order of outer
joins.  There are some cases where such a rearrangement is OK, but there
are other cases where it isn't, and we don't currently have the logic
needed to tell which is which.

In the particular case at hand here, 8.0's hack to suppress evaluating
the outer side of a hash join after finding the inner side is empty
would eliminate the complaint.

In the original message, it did seem that the packages-to-
package_security join is taking a lot longer than one would expect:

    ->  Hash Left Join  (cost=15.54..86.42 rows=1097 width=162) (actual time=2.978..6087.608 rows=1104 loops=1)
          Hash Cond: ("outer".id_package = "inner".id_package)
          ->  Seq Scan on packages p  (cost=0.00..53.48 rows=1097 width=146) (actual time=0.011..7.978 rows=1104
loops=1)
          ->  Hash  (cost=13.69..13.69 rows=738 width=20) (actual time=2.061..2.061 rows=0 loops=1)
                ->  Seq Scan on package_security ps  (cost=0.00..13.69 rows=738 width=20) (actual time=0.027..1.289
rows=747loops=1) 

but this behavior isn't reproduced in the later message, so I wonder if
it wasn't an artifact of something else taking a chunk of time.

            regards, tom lane

Re: bad performances using hashjoin

От
Gaetano Mendola
Дата:
Tom Lane wrote:

> but this behavior isn't reproduced in the later message, so I wonder if
> it wasn't an artifact of something else taking a chunk of time.

I think is due the fact that first queries were performed in peakhours.


Regards
Gaetano Mendola



Re: bad performances using hashjoin

От
David Brown
Дата:
Gaetano Mendola wrote:

>I think is due the fact that first queries were performed in peakhours.
>
>
A memory intensive operation takes 7.5 times longer during heavy loads.
Doesn't this suggest that swapping of working memory is occurring?


Re: bad performances using hashjoin

От
Mischa
Дата:
Quoting Tom Lane <tgl@sss.pgh.pa.us>:

> Klint Gore <kg@kgb.une.edu.au> writes:
> > Is having an order by in a view legal?
>
> Not according to the SQL spec, but PG has allowed it for several releases.
> (The same goes for ORDER BY in a sub-select, which is actually pretty
> much the same thing ...)

Umm... if you implement LIMIT in a subselect, it becomes highly meaningful (nad
useful.

Is this a case of one nonstandard feature being the thin edge of the wedge
for another?
--
"Dreams come true, not free."