Обсуждение: Cost Issue - How do I force a Hash Join
Hi,
    I have query where I do two inline queries (which involves grouping) and then join them with an outer join.
The individual queries run in 50-300 ms. However the optimizer is choosing a nested loop to join them rather than a Hash join
causing the complete query to take 500+ seconds. It expects that it will get 1 row out from each of the sources, but here is gets
several thousand rows.
Is there any way I can get a hash join used on the outer join, while preserving the nested loops.
explain analyze
select o1.objaddr, o1.fieldname, o1.objsig,
o1.totmem, o1.cnt,
o2.totmem, o2.cnt
from
( select min(ao.objaddr) as objaddr, count(*) as cnt,
sum(ao.totmem) as totmem, ao.objsig, ar.fieldname, ao.objtype
from jam_heapobj ao, jam_heaprel ar
where ar.heap_id = 1 and ar.parentaddr = 0 and ar.fieldname = 'K'
and ao.heap_id = ar.heap_id and ao.objaddr = ar.childaddr
group by ao.objsig, ar.fieldname, ao.objtype) o1
left outer join
(select min(bo.objaddr) as objaddr, count(*) as cnt,
sum(bo.totmem) as totmem, bo.objsig, br.fieldname, bo.objtype
from jam_heapobj bo, jam_heaprel br
where br.heap_id = 0 and br.parentaddr = 0 and br.fieldname = 'K'
and bo.heap_id = br.heap_id and bo.objaddr = br.childaddr
group by bo.objsig, br.fieldname, bo.objtype) o2
on ( o2.objsig = o1.objsig and o2.objtype = o1.objtype
and o2.fieldname = o1.fieldname)
order by o1.totmem - coalesce(o2.totmem,0) desc;
select o1.objaddr, o1.fieldname, o1.objsig,
o1.totmem, o1.cnt,
o2.totmem, o2.cnt
from
( select min(ao.objaddr) as objaddr, count(*) as cnt,
sum(ao.totmem) as totmem, ao.objsig, ar.fieldname, ao.objtype
from jam_heapobj ao, jam_heaprel ar
where ar.heap_id = 1 and ar.parentaddr = 0 and ar.fieldname = 'K'
and ao.heap_id = ar.heap_id and ao.objaddr = ar.childaddr
group by ao.objsig, ar.fieldname, ao.objtype) o1
left outer join
(select min(bo.objaddr) as objaddr, count(*) as cnt,
sum(bo.totmem) as totmem, bo.objsig, br.fieldname, bo.objtype
from jam_heapobj bo, jam_heaprel br
where br.heap_id = 0 and br.parentaddr = 0 and br.fieldname = 'K'
and bo.heap_id = br.heap_id and bo.objaddr = br.childaddr
group by bo.objsig, br.fieldname, bo.objtype) o2
on ( o2.objsig = o1.objsig and o2.objtype = o1.objtype
and o2.fieldname = o1.fieldname)
order by o1.totmem - coalesce(o2.totmem,0) desc;
 Sort (cost=16305.41..16305.42 rows=1 width=562) (actual time=565997.769..566016.255 rows=6115 loops=1)
Sort Key: (o1.totmem - COALESCE(o2.totmem, 0::bigint))
->Nested Loop Left Join (cost=16305.22..16305.40 rows=1 width=562) (actual time=612.631..565896.047 rows=6115 loops=1)
Join Filter: ((("inner".objsig)::text = ("outer".objsig)::text) AND (("inner".objtype)::text = ("outer".objtype)::text) AND (("inner".fieldname)::text = ("outer".fieldname)::text))
->Subquery Scan o1 (cost=12318.12..12318.15 rows=1 width=514) (actual time=309.659..413.311 rows=6115 loops=1)
->HashAggregate (cost=12318.12..12318.14 rows=1 width=54) (actual time=309.649..367.206 rows=6115 loops=1)
->Nested Loop (cost=0.00..12317.90 rows=10 width=54) (actual time=0.243..264.116 rows=6338 loops=1)
->Index Scan using jam_heaprel_n1 on jam_heaprel ar (cost=0.00..12275.00 rows=7 width=19) (actual time=0.176..35.780 rows=6338 loops=1)
Index Cond: ((heap_id = 1) AND (parentaddr = 0))
Filter: ((fieldname)::text = 'K'::text)
->Index Scan using jam_heapobj_u1 on jam_heapobj ao (cost=0.00..6.10 rows=2 width=51) (actual time=0.019..0.022 rows=1 loops=6338)
Index Cond: ((ao.heap_id = 1) AND (ao.objaddr = "outer".childaddr))
->Subquery Scan o2 (cost=3987.10..3987.13 rows=1 width=514) (actual time=0.062..75.171 rows=6038 loops=6115)
->HashAggregate (cost=3987.10..3987.12 rows=1 width=54) (actual time=0.056..36.469 rows=6038 loops=6115)
->Nested Loop (cost=0.00..3987.05 rows=2 width=54) (actual time=0.145..257.876 rows=6259 loops=1)
->Index Scan using jam_heaprel_n1 on jam_heaprel br (cost=0.00..3974.01 rows=3 width=19) (actual time=0.074..35.124 rows=6259 loops=1)
Index Cond: ((heap_id = 0) AND (parentaddr = 0))
Filter: ((fieldname)::text = 'K'::text)
->Index Scan using jam_heapobj_u1 on jam_heapobj bo (cost=0.00..4.33 rows=1 width=51) (actual time=0.018..0.022 rows=1 loops=6259)
Index Cond: ((bo.heap_id = 0) AND (bo.objaddr = "outer".childaddr))
Total runtime: 566044.187 ms
(21 rows)
Sort Key: (o1.totmem - COALESCE(o2.totmem, 0::bigint))
->Nested Loop Left Join (cost=16305.22..16305.40 rows=1 width=562) (actual time=612.631..565896.047 rows=6115 loops=1)
Join Filter: ((("inner".objsig)::text = ("outer".objsig)::text) AND (("inner".objtype)::text = ("outer".objtype)::text) AND (("inner".fieldname)::text = ("outer".fieldname)::text))
->Subquery Scan o1 (cost=12318.12..12318.15 rows=1 width=514) (actual time=309.659..413.311 rows=6115 loops=1)
->HashAggregate (cost=12318.12..12318.14 rows=1 width=54) (actual time=309.649..367.206 rows=6115 loops=1)
->Nested Loop (cost=0.00..12317.90 rows=10 width=54) (actual time=0.243..264.116 rows=6338 loops=1)
->Index Scan using jam_heaprel_n1 on jam_heaprel ar (cost=0.00..12275.00 rows=7 width=19) (actual time=0.176..35.780 rows=6338 loops=1)
Index Cond: ((heap_id = 1) AND (parentaddr = 0))
Filter: ((fieldname)::text = 'K'::text)
->Index Scan using jam_heapobj_u1 on jam_heapobj ao (cost=0.00..6.10 rows=2 width=51) (actual time=0.019..0.022 rows=1 loops=6338)
Index Cond: ((ao.heap_id = 1) AND (ao.objaddr = "outer".childaddr))
->Subquery Scan o2 (cost=3987.10..3987.13 rows=1 width=514) (actual time=0.062..75.171 rows=6038 loops=6115)
->HashAggregate (cost=3987.10..3987.12 rows=1 width=54) (actual time=0.056..36.469 rows=6038 loops=6115)
->Nested Loop (cost=0.00..3987.05 rows=2 width=54) (actual time=0.145..257.876 rows=6259 loops=1)
->Index Scan using jam_heaprel_n1 on jam_heaprel br (cost=0.00..3974.01 rows=3 width=19) (actual time=0.074..35.124 rows=6259 loops=1)
Index Cond: ((heap_id = 0) AND (parentaddr = 0))
Filter: ((fieldname)::text = 'K'::text)
->Index Scan using jam_heapobj_u1 on jam_heapobj bo (cost=0.00..4.33 rows=1 width=51) (actual time=0.018..0.022 rows=1 loops=6259)
Index Cond: ((bo.heap_id = 0) AND (bo.objaddr = "outer".childaddr))
Total runtime: 566044.187 ms
(21 rows)
Regards,
Virag
			
		"Virag Saksena" <virag@auptyma.com> writes:
> The individual queries run in 50-300 ms. However the optimizer is =
> choosing a nested loop to join them rather than a Hash join
> causing the complete query to take 500+ seconds. It expects that it will =
> get 1 row out from each of the sources, but here is gets
> several thousand rows.
The best approach is to see if you can't fix that estimation error.
Are the stats up to date on these tables?  If so, maybe raising the
statistics targets would help.
            regards, tom lane
			
		"Virag Saksena" <virag@auptyma.com> writes:
>> The individual queries run in 50-300 ms. However the optimizer is
>> choosing a nested loop to join them rather than a Hash join...
I have what appears to be the identical problem.
This is a straightforward query that should be fairly quick, but takes about 30 minutes.  It's a query across three
tables,call them A, B, and C.  The tables are joined on indexed columns. 
Here's a quick summary:
   Table A -----> Table B -----> Table C
     A_ID           B_ID           C_ID
                    A_ID           NAME
                    C_ID
Tables A and B have 6 million rows each.  Table C is small: 67 names, no repeats.  All columns involved in the join are
indexed.
Summary:
   1. Query B only:    2.7 seconds, 302175 rows returned
   2. Join B and C:    4.3 seconds, exact same answer
   3. Join A and B:    7.2 minutes, exact same answer
   4. Join A, B, C:    32.7 minutes, exact same answer
Looking at these:
   Query #1 is doing the real work: finding the rows of interest.
   Queries #1 and #2 ought to be virtually identical, since Table C has
   just one row with C_ID = 9, but the time almost doubles.
   Query #3 should take a bit longer than Query #1 because it has to join
   300K rows, but the indexes should make this take just a few seconds,
   certainly well under a minute.
   Query #4 should be identical to Query #3, again because there's only
   one row in Table C.  32 minutes is pretty horrible for such a
   straightforward query.
It looks to me like the problem is the use of nested loops when a hash join should be used, but I'm no expert at query
planning.
This is psql 8.0.3.  Table definitions are at the end.  (Table and column names are altered to protect the guilty,
otherwisethese are straight from Postgres.)  I ran "vacuum full analyze" after the last data were added.  Hardware is a
Dell,2-CPU Xeon, 4 GB memory, database is on a single SATA 7200RPM disk. 
Thanks,
Craig
----------------------------
QUERY #1:
---------
explain analyze select B.A_ID from B where B.B_ID = 9;
 Index Scan using i_B_B_ID on B  (cost=0.00..154401.36 rows=131236 width=4) (actual time=0.158..1387.251 rows=302175
loops=1)
   Index Cond: (B_ID = 9)
 Total runtime: 2344.053 ms
QUERY #2:
---------
explain analyze select B.A_ID from B join C on (B.C_ID = C.C_ID) where C.name = 'Joe';
 Nested Loop  (cost=0.00..258501.92 rows=177741 width=4) (actual time=0.349..3392.532 rows=302175 loops=1)
   ->  Seq Scan on C  (cost=0.00..12.90 rows=1 width=4) (actual time=0.232..0.336 rows=1 loops=1)
         Filter: ((name)::text = 'Joe'::text)
   ->  Index Scan using i_B_C_ID on B  (cost=0.00..254387.31 rows=328137 width=8) (actual time=0.102..1290.002
rows=302175loops=1) 
         Index Cond: (B.C_ID = "outer".C_ID)
 Total runtime: 4373.916 ms
QUERY #3:
---------
explain analyze
   select A.A_ID from A
     join B on (A.A_ID = B.A_ID)
    where B.B_ID = 9;
 Nested Loop  (cost=0.00..711336.41 rows=131236 width=4) (actual time=37.118..429419.347 rows=302175 loops=1)
   ->  Index Scan using i_B_B_ID on B  (cost=0.00..154401.36 rows=131236 width=4) (actual time=27.344..8858.489
rows=302175loops=1) 
         Index Cond: (B_ID = 9)
   ->  Index Scan using pk_A_test on A  (cost=0.00..4.23 rows=1 width=4) (actual time=1.372..1.376 rows=1 loops=302175)
         Index Cond: (A.A_ID = "outer".A_ID)
 Total runtime: 430467.686 ms
QUERY #4:
---------
explain analyze
   select A.A_ID from A
     join B on (A.A_ID = B.A_ID)
     join C on (B.B_ID = C.B_ID)
     where C.name = 'Joe';
 Nested Loop  (cost=0.00..1012793.38 rows=177741 width=4) (actual time=70.184..1960112.247 rows=302175 loops=1)
   ->  Nested Loop  (cost=0.00..258501.92 rows=177741 width=4) (actual time=52.114..17753.638 rows=302175 loops=1)
         ->  Seq Scan on C  (cost=0.00..12.90 rows=1 width=4) (actual time=0.109..0.176 rows=1 loops=1)
               Filter: ((name)::text = 'Joe'::text)
         ->  Index Scan using i_B_B_ID on B  (cost=0.00..254387.31 rows=328137 width=8) (actual time=51.985..15566.896
rows=302175loops=1) 
               Index Cond: (B.B_ID = "outer".B_ID)
   ->  Index Scan using pk_A_test on A  (cost=0.00..4.23 rows=1 width=4) (actual time=6.407..6.412 rows=1 loops=302175)
         Index Cond: (A.A_ID = "outer".A_ID)
 Total runtime: 1961200.079 ms
TABLE DEFINITIONS:
------------------
xxx => \d a
                Table "xxx.a"
      Column       |          Type          | Modifiers
-------------------+------------------------+-----------
 a_id              | integer                | not null
 ... more columns
Indexes:
    "pk_a_id" PRIMARY KEY, btree (a_id)
    ... more indexes on other columns
xxx => \d b
                    Table "xxx.b"
          Column          |          Type          | Modifiers
--------------------------+------------------------+-----------
 b_id                     | integer                | not null
 a_id                     | integer                | not null
 c_id                     | integer                | not null
 ... more columns
Indexes:
    "b_pkey" PRIMARY KEY, btree (b_id)
    "i_b_a_id" btree (a_id)
    "i_b_c_id" btree (c_id)
xxx=> \d c
          Table "xxx.c"
    Column     |          Type          | Modifiers
---------------+------------------------+-----------
 c_id          | integer                | not null
 name          | character varying(200) |
 ... more columns
Indexes:
    "c_pkey" PRIMARY KEY, btree (c_id)
			
		Tables are analyzed, though I would love to find a way to increase it's
accuracy of statistics
Tried raising the statistics target upto 100, but it did not help. Should I
bump it even more
However I found that if I add depth to the group by clauses, it somehow
tells the optimizer that it would get more than 1 row
and it goes to a Hash Join ....
For this query, only rows with one value of depth are accessed, so we are
okay ... but I would like to see if there is some other
way I can get a better approximation for the costs
 Sort (cost=25214.36..25214.39 rows=10 width=958) (actual
time=9798.860..9815.670 rows=6115 loops=1)
  Sort Key: (o1.totmem - COALESCE(o2.totmem, 0::bigint))
  ->Hash Left Join (cost=25213.83..25214.19 rows=10 width=958) (actual
time=8526.248..9755.721 rows=6115 loops=1)
    Hash Cond: ((("outer".objsig)::text = ("inner".objsig)::text) AND
(("outer".objtype)::text = ("inner".objtype)::text) AND
(("outer".fieldname)::text = ("inner".fieldname)::text))
    ->Subquery Scan o1 (cost=18993.48..18993.66 rows=10 width=990) (actual
time=6059.880..6145.223 rows=6115 loops=1)
      ->HashAggregate (cost=18993.48..18993.56 rows=10 width=46) (actual
time=6059.871..6094.897 rows=6115 loops=1)
        ->Nested Loop (cost=0.00..18993.22 rows=15 width=46) (actual
time=45.510..5980.807 rows=6338 loops=1)
          ->Index Scan using jam_heaprel_n1 on jam_heaprel ar
(cost=0.00..18932.01 rows=10 width=19) (actual time=45.374..205.520
rows=6338 loops=1)
            Index Cond: ((heap_id = 1) AND (parentaddr = 0))
            Filter: ((fieldname)::text = 'K'::text)
          ->Index Scan using jam_heapobj_u1 on jam_heapobj ao
(cost=0.00..6.10 rows=2 width=43) (actual time=0.885..0.890 rows=1
loops=6338)
            Index Cond: ((ao.heap_id = 1) AND (ao.objaddr =
"outer".childaddr))
    ->Hash (cost=6220.34..6220.34 rows=2 width=982) (actual
time=2466.178..2466.178 rows=0 loops=1)
      ->Subquery Scan o2 (cost=6220.30..6220.34 rows=2 width=982) (actual
time=2225.242..2433.744 rows=6038 loops=1)
        ->HashAggregate (cost=6220.30..6220.32 rows=2 width=46) (actual
time=2225.233..2366.890 rows=6038 loops=1)
          ->Nested Loop (cost=0.00..6220.27 rows=2 width=46) (actual
time=0.449..2149.257 rows=6259 loops=1)
            ->Index Scan using jam_heaprel_n1 on jam_heaprel br
(cost=0.00..6202.89 rows=4 width=19) (actual time=0.296..51.310 rows=6259
loops=1)
              Index Cond: ((heap_id = 0) AND (parentaddr = 0))
              Filter: ((fieldname)::text = 'K'::text)
            ->Index Scan using jam_heapobj_u1 on jam_heapobj bo
(cost=0.00..4.33 rows=1 width=43) (actual time=0.294..0.300 rows=1
loops=6259)
              Index Cond: ((bo.heap_id = 0) AND (bo.objaddr =
"outer".childaddr))
 Total runtime: 9950.192 ms
Regards,
Virag
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Virag Saksena" <virag@auptyma.com>
Cc: <pgsql-performance@postgresql.org>
Sent: Monday, February 20, 2006 9:35 PM
Subject: Re: [PERFORM] Cost Issue - How do I force a Hash Join
> "Virag Saksena" <virag@auptyma.com> writes:
> > The individual queries run in 50-300 ms. However the optimizer is =
> > choosing a nested loop to join them rather than a Hash join
> > causing the complete query to take 500+ seconds. It expects that it will
=
> > get 1 row out from each of the sources, but here is gets
> > several thousand rows.
>
> The best approach is to see if you can't fix that estimation error.
> Are the stats up to date on these tables?  If so, maybe raising the
> statistics targets would help.
>
> regards, tom lane
>