Обсуждение: Weird explain output
I have a long SQL with many join, the explain analyse output like -> Hash Left Join (cost=197749705.15..5320438982425.18 rows=5245599776015 width=7542) (actual time=674.634..2645.293 rows=4568 loops=1) Hash Cond: (p.id = ps.partner_id) -> Merge Join (cost=197747693.55..78881918575.29 rows=5245599776015 width=7486) (actual time=636.814..653.137 rows=4568 loops=1) Merge Cond: (a3.team_id = t.id) -> Sort (cost=3509574.57..3554578.91 rows=18001735 width=112) (actual time=518.318..519.221 rows=4569 loops=1) Sort Key: a3.team_id Sort Method: quicksort Memory: 549kB ..................... sniped -> Materialize (cost=194238118.98..194529512.87 rows=58278779 width=7390) (actual time=118.488..122.408 rows=4569 loops=1) -> Sort (cost=194238118.98..194383815.92 rows=58278779 width=7390) (actual time=118.486..119.723 rows=3687 loops=1) Sort Key: t.id Sort Method: quicksort Memory: 6951kB ............................. sniped -> Hash (cost=1917.17..1917.17 rows=7555 width=68) (actual time=37.578..37.578 rows=7555 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 958kB ............................ sniped We can see the 'Hash Left Join node' spend 2645.293 ms, so I change the sql and remove the left join node, and expect the SQL will end with 653.137 ms. But the explain analyse output like -> Merge Join (cost=197548498.05..314933709300.46 rows=5245599776015 width=7478) (actual time=775.495..2744.759 rows=4568 loops=1) Merge Cond: (a3.team_id = t.id) -> Sort (cost=3509574.57..3554578.91 rows=18001735 width=112) (actual time=540.636..541.392 rows=4569 loops=1) Sort Key: a3.team_id Sort Method: quicksort Memory: 549kB .......................... sniped -> Materialize (cost=194038923.48..194330317.37 rows=58278779 width=7382) (actual time=234.632..238.065 rows=4569 loops=1) -> Sort (cost=194038923.48..194184620.42 rows=58278779 width=7382) (actual time=234.628..235.744 rows=3687 loops=1) Sort Key: t.id Sort Method: quicksort Memory: 6899kB ......................... sniped We can see the 'Merge Join' spend 2744.759 ms. The sniped output under the 'Merge Join' node in the two explain output are similar, Why does this happen?
wd <wd@wdicc.com> writes: > I have a long SQL with many join, the explain analyse output like > -> Hash Left Join (cost=197749705.15..5320438982425.18 > rows=5245599776015 width=7542) (actual time=674.634..2645.293 rows=4568 > loops=1) > Hash Cond: (p.id = ps.partner_id) > -> Merge Join (cost=197747693.55..78881918575.29 > rows=5245599776015 width=7486) (actual time=636.814..653.137 rows=4568 > loops=1) > Merge Cond: (a3.team_id = t.id) > -> Sort (cost=3509574.57..3554578.91 rows=18001735 > width=112) (actual time=518.318..519.221 rows=4569 loops=1) > Sort Key: a3.team_id > Sort Method: quicksort Memory: 549kB > ..................... sniped > -> Materialize (cost=194238118.98..194529512.87 > rows=58278779 width=7390) (actual time=118.488..122.408 rows=4569 loops=1) > -> Sort (cost=194238118.98..194383815.92 > rows=58278779 width=7390) (actual time=118.486..119.723 rows=3687 loops=1) > Sort Key: t.id > Sort Method: quicksort Memory: 6951kB > ............................. sniped > -> Hash (cost=1917.17..1917.17 rows=7555 width=68) (actual > time=37.578..37.578 rows=7555 loops=1) > Buckets: 1024 Batches: 1 Memory Usage: 958kB > ............................ sniped > We can see the 'Hash Left Join node' spend 2645.293 ms, so I change the sql > and remove the left join node, and expect the SQL will end with 653.137 ms. > But the explain analyse output like > -> Merge Join (cost=197548498.05..314933709300.46 rows=5245599776015 > width=7478) (actual time=775.495..2744.759 rows=4568 loops=1) > Merge Cond: (a3.team_id = t.id) > -> Sort (cost=3509574.57..3554578.91 rows=18001735 width=112) > (actual time=540.636..541.392 rows=4569 loops=1) > Sort Key: a3.team_id > Sort Method: quicksort Memory: 549kB > .......................... sniped > -> Materialize (cost=194038923.48..194330317.37 rows=58278779 > width=7382) (actual time=234.632..238.065 rows=4569 loops=1) > -> Sort (cost=194038923.48..194184620.42 rows=58278779 > width=7382) (actual time=234.628..235.744 rows=3687 loops=1) > Sort Key: t.id > Sort Method: quicksort Memory: 6899kB > ......................... sniped > We can see the 'Merge Join' spend 2744.759 ms. The sniped output under the > 'Merge Join' node in the two explain output are similar, Why does this > happen? You're worrying about the wrong thing entirely. The right thing to be worrying about is why are some of those row estimates off by four orders of magnitude, and what you can do to fix that. The planner will never deliver sane plans when its estimates are so far off from reality. From my perspective, you snipped all the interesting parts of this output, because the misestimation is evidently happening somewhere down in there. regards, tom lane
Thanks for your reply. On Mon, Feb 4, 2013 at 3:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > You're worrying about the wrong thing entirely. The right thing to be > worrying about is why are some of those row estimates off by four orders > of magnitude, and what you can do to fix that. The planner will never > deliver sane plans when its estimates are so far off from reality. > > You mean the rows from explain is bigger then the rows real returned? It caused by a sub query with a filter like ( a and b ) or ( c and d ), I've tried to create a temp table for it, the explained rows and the real rows will similarly. Here is the full output, http://explain.depesz.com/s/M7oo > From my perspective, you snipped all the interesting parts of this > output, because the misestimation is evidently happening somewhere down > in there. > > regards, tom lane >