Thread: Problems with hash join over nested loop


Problems with hash join over nested loop

From: Jim Nasby
Date:
I've been working on trying to normalize a table that's got a bunch of text fields. Normalizing the first 4 has been a non-issue. But when I try and normalize 2 additional fields a bunch of query plans go belly-up.

I've seen this on our old 8.4 databases as well as 9.1. I haven't been able to test on anything newer yet.

A representative query is below. Note that this is a stripped down version of something larger, so it seems a bit silly the way it's written, but I've got others that are exhibiting the same bad behavior:

EXPLAIN ANALYZE
select * from (
select distinct e.login AS Login, d.date1, d.date2,
  (SELECT COUNT(DISTINCT n.customer_id) FROM notes n
      inner join loans l on n.customer_id = l.customer_id
      inner join loan_tasks_committed as ltc on l.id = ltc.loan_id
       AND n.note_time BETWEEN date1 AND date2 + interval '1 day'
       AND n.activity_cd in ('help_draw')
       AND ltc.created_on between n.note_time - interval '10 minutes' and n.note_time + interval '1 day'
       AND getcust(ltc.created_by)=e.login
       where n.employee_id = e.id
     ) AS "Draw"
  FROM
    (select current_date-1 as date1, current_date-1 as date2) as d, employees e
    inner join employees_roles e_r on e.id=e_r.employee_id
  WHERE
    e_r.role_id IN (3,23) --Inbound Customer Support Operator or Customer Service Issue Rep
  GROUP BY
    Login,
    e.id,
    date1,
    date2
  ORDER BY Login) a
  WHERE "Draw" > 0
;

loan_tasks_committed is the table that's been normalized. For testing I'm swapping between tables with two normalization views that boil down to:

_by:
  SELECT lt_by.id, lt_by.lock_version, c.id AS created_by_id, c.by AS created_by, u.id AS updated_by_id, u.by AS updated_by,
    lt_by.created_on, lt_by.updated_on, lt_by.loan_id, lt_by.entered_on, lt_by.acct_date, lt_by.task_amount,
    lt_by.loan_task_cd, lt_by.parent_id, lt_by.type_cd, e.id AS entered_by_id, e.by AS entered_by, cl.id AS collected_by_id,
    cl.by AS collected_by, lt_by.currency_cd, lt_by.committed, lt_by.loan_task_code_id
    FROM lt_by
    LEFT JOIN by_text c ON lt_by.created_by_id = c.id
    LEFT JOIN by_text u ON lt_by.updated_by_id = u.id
    LEFT JOIN by_text e ON lt_by.entered_by_id = e.id
    LEFT JOIN by_text cl ON lt_by.collected_by_id = cl.id;

_cd:
  SELECT lt_cd.id, lt_cd.lock_version, c.id AS created_by_id, c.by AS created_by, u.id AS updated_by_id, u.by AS updated_by,
    lt_cd.created_on, lt_cd.updated_on, lt_cd.loan_id, lt_cd.entered_on, lt_cd.acct_date, lt_cd.task_amount,
    ltc.id AS loan_task_code_id, ltc.loan_task_code, ltc.loan_task_code AS loan_task_cd, lt_cd.parent_id, tc.id AS loan_task_type_id,
    tc.loan_task_type, tc.loan_task_type AS type_cd, e.id AS entered_by_id, e.by AS entered_by, cl.id AS collected_by_id,
    cl.by AS collected_by, lt_cd.currency_cd, lt_cd.committed
    FROM lt_cd
    LEFT JOIN by_text c ON lt_cd.created_by_id = c.id
    LEFT JOIN by_text u ON lt_cd.updated_by_id = u.id
    LEFT JOIN by_text e ON lt_cd.entered_by_id = e.id
    LEFT JOIN by_text cl ON lt_cd.collected_by_id = cl.id
    LEFT JOIN lt_code ltc ON lt_cd.loan_task_code_id = ltc.id
    LEFT JOIN lt_type tc ON lt_cd.loan_task_type_id = tc.id;


As you can see, they're identical except for normalizing 2 additional fields.

The additional normalization results in a moderate amount of heap space savings:

SELECT table_name, rows, heap_size, index_size, toast_size, total_size FROM tools.space WHERE table_schema='jnasby';
     table_name    |    rows     | heap_size  | index_size | toast_size | total_size
------------------+-------------+------------+------------+------------+------------
  by_text          |      721492 | 41 MB      | 49 MB      | 40 kB      | 90 MB
  lt_code          |          42 | 8192 bytes | 32 kB      | 32 kB      | 72 kB
  lt_type          |           3 | 8192 bytes | 32 kB      | 32 kB      | 72 kB
  lt_cd            | 9.82601e+07 | 10 GB      | 15 GB      | 2832 kB    | 25 GB
  lt_by            | 9.82615e+07 | 12 GB      | 21 GB      | 3360 kB    | 33 GB

I've got full explain analyze below, but the relevant bits are:

_by:
  SubPlan 2
    ->  Aggregate  (cost=489.06..489.07 rows=1 width=4) (actual time=0.083..0.083 rows=1 loops=692)
          ->  Nested Loop  (cost=0.00..489.05 rows=1 width=4) (actual time=0.080..0.080 rows=0 loops=692)
                ->  Nested Loop  (cost=0.00..485.80 rows=10 width=8) (actual time=0.079..0.079 rows=0 loops=692)
                      Join Filter: ((jnasby.lt_by.created_on >= (n.note_time - '00:10:00'::interval)) AND (jnasby.lt_by.created_on <= (n.note_time + '1 day'::interval)))
                      ->  Nested Loop  (cost=0.00..383.44 rows=14 width=16) (actual time=0.054..0.059 rows=0 loops=692)
                            ->  Index Scan using notes_u1 on notes n  (cost=0.00..344.23 rows=1 width=12) (actual time=0.053..0.057 rows=0 loops=692)
                                  Index Cond: ((employee_id = e.id) AND (note_time >= ((('now'::text)::date - 1))) AND (note_time <= (((('now'::text)::date - 1)) + '1 day'::interval)))
                                  Filter: ((activity_cd)::text = 'help_draw'::text)
                            ->  Index Scan using loans_m12 on loans l  (cost=0.00..38.94 rows=22 width=8) (actual time=0.034..0.042 rows=2 loops=18)
                                  Index Cond: (customer_id = n.customer_id)
                      ->  Index Scan using lt_by__loan_id on lt_by  (cost=0.00..6.50 rows=36 width=28) (actual time=0.043..0.477 rows=60 loops=28)
                            Index Cond: (loan_id = l.id)
                ->  Index Scan using by_text_pkey on by_text c  (cost=0.00..0.31 rows=1 width=4) (never executed)
                      Index Cond: (id = jnasby.lt_by.created_by_id)
                      Filter: ("substring"((by)::text, '^[CAE]:(.*?) I:'::text) = (e.login)::text)

_cd:
  SubPlan 2
    ->  Aggregate  (cost=3089331.15..3089331.16 rows=1 width=4) (actual time=372.589..372.589 rows=1 loops=692)
          ->  Hash Join  (cost=16560.08..3089331.14 rows=1 width=4) (actual time=372.586..372.586 rows=0 loops=692)
                Hash Cond: (jnasby.lt_cd.loan_id = l.id)
                Join Filter: ((jnasby.lt_cd.created_on >= (n.note_time - '00:10:00'::interval)) AND (jnasby.lt_cd.created_on <= (n.note_time + '1 day'::interval)))
                ->  Hash Join  (cost=16176.47..3087105.37 rows=491238 width=12) (actual time=14610.882..32223.008 rows=2642 loops=8)
                      Hash Cond: (jnasby.lt_cd.created_by_id = c.id)
                      ->  Seq Scan on lt_cd  (cost=0.00..2329065.44 rows=98260144 width=32) (actual time=0.018..15123.405 rows=98261588 loops=8)
                      ->  Hash  (cost=16131.38..16131.38 rows=3607 width=4) (actual time=4519.878..4519.878 rows=20 loops=8)
                            Buckets: 1024  Batches: 1  Memory Usage: 1kB
                            ->  Seq Scan on by_text c  (cost=0.00..16131.38 rows=3607 width=4) (actual time=4405.172..4519.861 rows=20 loops=8)
                                  Filter: ("substring"((by)::text, '^[CAE]:(.*?) I:'::text) = (e.login)::text)
                ->  Hash  (cost=383.44..383.44 rows=14 width=16) (actual time=0.055..0.055 rows=0 loops=692)
                      Buckets: 1024  Batches: 1  Memory Usage: 0kB
                      ->  Nested Loop  (cost=0.00..383.44 rows=14 width=16) (actual time=0.051..0.055 rows=0 loops=692)
                            ->  Index Scan using notes_u1 on notes n  (cost=0.00..344.23 rows=1 width=12) (actual time=0.049..0.052 rows=0 loops=692)
                                  Index Cond: ((employee_id = e.id) AND (note_time >= ((('now'::text)::date - 1))) AND (note_time <= (((('now'::text)::date - 1)) + '1 day'::interval)))
                                  Filter: ((activity_cd)::text = 'help_draw'::text)
                            ->  Index Scan using loans_m12 on loans l  (cost=0.00..38.94 rows=22 width=8) (actual time=0.033..0.040 rows=2 loops=18)
                                  Index Cond: (customer_id = n.customer_id)


I've tried disabling execution paths to get it to loop join, but no matter what it insists on seqscanning lt_cd. I've verified that simpler queries will use an index scan of loan_id (ie: WHERE loan_id < 999999).
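
(For reference, the path-disabling I mean is the usual session-level planner toggles, along these lines; this is a sketch of the approach, not the literal script I ran:)

```sql
SET enable_hashjoin = off;  -- discourage the hash join
SET enable_seqscan  = off;  -- discourage the seqscan of lt_cd
-- re-run the EXPLAIN ANALYZE here, then put things back:
RESET enable_hashjoin;
RESET enable_seqscan;
```

Note these only penalize the cost of those node types; the planner can still pick them if no other path exists, which is what happened here.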

Again, I've got multiple queries exhibiting this problem... it is NOT specific to this particular query.


_by: http://explain.depesz.com/s/LYOY

                                                                                                  QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Subquery Scan on a  (cost=843581.41..843600.81 rows=862 width=24) (actual time=64.735..64.735 rows=0 loops=1)
    ->  Unique  (cost=843581.41..843592.19 rows=862 width=20) (actual time=64.735..64.735 rows=0 loops=1)
          ->  Sort  (cost=843581.41..843583.57 rows=862 width=20) (actual time=64.733..64.733 rows=0 loops=1)
                Sort Key: e.login, ((('now'::text)::date - 1)), ((('now'::text)::date - 1)), ((SubPlan 1))
                Sort Method: quicksort  Memory: 25kB
                ->  HashAggregate  (cost=380.35..843539.38 rows=862 width=20) (actual time=64.726..64.726 rows=0 loops=1)
                      Filter: ((SubPlan 2) > 0)
                      ->  Hash Join  (cost=305.29..371.73 rows=862 width=20) (actual time=4.765..5.866 rows=895 loops=1)
                            Hash Cond: (e_r.employee_id = e.id)
                            ->  Nested Loop  (cost=15.18..67.61 rows=862 width=12) (actual time=0.228..0.976 rows=895 loops=1)
                                  ->  Result  (cost=0.00..0.03 rows=1 width=0) (actual time=0.009..0.009 rows=1 loops=1)
                                  ->  Bitmap Heap Scan on employees_roles e_r  (cost=15.18..58.96 rows=862 width=4) (actual time=0.214..0.782 rows=895 loops=1)
                                        Recheck Cond: (role_id = ANY ('{3,23}'::integer[]))
                                        ->  Bitmap Index Scan on employees_roles_m1  (cost=0.00..14.97 rows=862 width=0) (actual time=0.195..0.195 rows=895 loops=1)
                                              Index Cond: (role_id = ANY ('{3,23}'::integer[]))
                            ->  Hash  (cost=247.27..247.27 rows=3427 width=12) (actual time=4.521..4.521 rows=3427 loops=1)
                                  Buckets: 1024  Batches: 1  Memory Usage: 154kB
                                  ->  Seq Scan on employees e  (cost=0.00..247.27 rows=3427 width=12) (actual time=0.021..3.595 rows=3427 loops=1)
                      SubPlan 1
                        ->  Aggregate  (cost=489.06..489.07 rows=1 width=4) (never executed)
                              ->  Nested Loop  (cost=0.00..489.05 rows=1 width=4) (never executed)
                                    ->  Nested Loop  (cost=0.00..485.80 rows=10 width=8) (never executed)
                                          Join Filter: ((jnasby.lt_by.created_on >= (n.note_time - '00:10:00'::interval)) AND (jnasby.lt_by.created_on <= (n.note_time + '1 day'::interval)))
                                          ->  Nested Loop  (cost=0.00..383.44 rows=14 width=16) (never executed)
                                                ->  Index Scan using notes_u1 on notes n  (cost=0.00..344.23 rows=1 width=12) (never executed)
                                                      Index Cond: ((employee_id = e.id) AND (note_time >= ((('now'::text)::date - 1))) AND (note_time <= (((('now'::text)::date - 1)) + '1 day'::interval)))
                                                      Filter: ((activity_cd)::text = 'help_draw'::text)
                                                ->  Index Scan using loans_m12 on loans l  (cost=0.00..38.94 rows=22 width=8) (never executed)
                                                      Index Cond: (customer_id = n.customer_id)
                                          ->  Index Scan using lt_by__loan_id on lt_by  (cost=0.00..6.50 rows=36 width=28) (never executed)
                                                Index Cond: (loan_id = l.id)
                                    ->  Index Scan using by_text_pkey on by_text c  (cost=0.00..0.31 rows=1 width=4) (never executed)
                                          Index Cond: (id = jnasby.lt_by.created_by_id)
                                          Filter: ("substring"((by)::text, '^[CAE]:(.*?) I:'::text) = (e.login)::text)
                      SubPlan 2
                        ->  Aggregate  (cost=489.06..489.07 rows=1 width=4) (actual time=0.083..0.083 rows=1 loops=692)
                              ->  Nested Loop  (cost=0.00..489.05 rows=1 width=4) (actual time=0.080..0.080 rows=0 loops=692)
                                    ->  Nested Loop  (cost=0.00..485.80 rows=10 width=8) (actual time=0.079..0.079 rows=0 loops=692)
                                          Join Filter: ((jnasby.lt_by.created_on >= (n.note_time - '00:10:00'::interval)) AND (jnasby.lt_by.created_on <= (n.note_time + '1 day'::interval)))
                                          ->  Nested Loop  (cost=0.00..383.44 rows=14 width=16) (actual time=0.054..0.059 rows=0 loops=692)
                                                ->  Index Scan using notes_u1 on notes n  (cost=0.00..344.23 rows=1 width=12) (actual time=0.053..0.057 rows=0 loops=692)
                                                      Index Cond: ((employee_id = e.id) AND (note_time >= ((('now'::text)::date - 1))) AND (note_time <= (((('now'::text)::date - 1)) + '1 day'::interval)))
                                                      Filter: ((activity_cd)::text = 'help_draw'::text)
                                                ->  Index Scan using loans_m12 on loans l  (cost=0.00..38.94 rows=22 width=8) (actual time=0.034..0.042 rows=2 loops=18)
                                                      Index Cond: (customer_id = n.customer_id)
                                          ->  Index Scan using lt_by__loan_id on lt_by  (cost=0.00..6.50 rows=36 width=28) (actual time=0.043..0.477 rows=60 loops=28)
                                                Index Cond: (loan_id = l.id)
                                    ->  Index Scan using by_text_pkey on by_text c  (cost=0.00..0.31 rows=1 width=4) (never executed)
                                          Index Cond: (id = jnasby.lt_by.created_by_id)
                                          Filter: ("substring"((by)::text, '^[CAE]:(.*?) I:'::text) = (e.login)::text)
  Total runtime: 65.070 ms
(51 rows)

_cd: http://explain.depesz.com/s/fkc
                                                                                                  QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Subquery Scan on a  (cost=5326007345.03..5326007364.42 rows=862 width=24) (actual time=257838.254..257838.254 rows=0 loops=1)
    ->  Unique  (cost=5326007345.03..5326007355.80 rows=862 width=20) (actual time=257838.254..257838.254 rows=0 loops=1)
          ->  Sort  (cost=5326007345.03..5326007347.18 rows=862 width=20) (actual time=257838.254..257838.254 rows=0 loops=1)
                Sort Key: e.login, ((('now'::text)::date - 1)), ((('now'::text)::date - 1)), ((SubPlan 1))
                Sort Method: quicksort  Memory: 25kB
                ->  HashAggregate  (cost=380.35..5326007303.00 rows=862 width=20) (actual time=257838.246..257838.246 rows=0 loops=1)
                      Filter: ((SubPlan 2) > 0)
                      ->  Hash Join  (cost=305.29..371.73 rows=862 width=20) (actual time=3.916..4.789 rows=895 loops=1)
                            Hash Cond: (e_r.employee_id = e.id)
                            ->  Nested Loop  (cost=15.18..67.61 rows=862 width=12) (actual time=0.208..0.788 rows=895 loops=1)
                                  ->  Result  (cost=0.00..0.03 rows=1 width=0) (actual time=0.006..0.006 rows=1 loops=1)
                                  ->  Bitmap Heap Scan on employees_roles e_r  (cost=15.18..58.96 rows=862 width=4) (actual time=0.191..0.626 rows=895 loops=1)
                                        Recheck Cond: (role_id = ANY ('{3,23}'::integer[]))
                                        ->  Bitmap Index Scan on employees_roles_m1  (cost=0.00..14.97 rows=862 width=0) (actual time=0.173..0.173 rows=895 loops=1)
                                              Index Cond: (role_id = ANY ('{3,23}'::integer[]))
                            ->  Hash  (cost=247.27..247.27 rows=3427 width=12) (actual time=3.696..3.696 rows=3427 loops=1)
                                  Buckets: 1024  Batches: 1  Memory Usage: 154kB
                                  ->  Seq Scan on employees e  (cost=0.00..247.27 rows=3427 width=12) (actual time=0.012..2.967 rows=3427 loops=1)
                      SubPlan 1
                        ->  Aggregate  (cost=3089331.15..3089331.16 rows=1 width=4) (never executed)
                              ->  Hash Join  (cost=16560.08..3089331.14 rows=1 width=4) (never executed)
                                    Hash Cond: (jnasby.lt_cd.loan_id = l.id)
                                    Join Filter: ((jnasby.lt_cd.created_on >= (n.note_time - '00:10:00'::interval)) AND (jnasby.lt_cd.created_on <= (n.note_time + '1 day'::interval)))
                                    ->  Hash Join  (cost=16176.47..3087105.37 rows=491238 width=12) (never executed)
                                          Hash Cond: (jnasby.lt_cd.created_by_id = c.id)
                                          ->  Seq Scan on lt_cd  (cost=0.00..2329065.44 rows=98260144 width=32) (never executed)
                                          ->  Hash  (cost=16131.38..16131.38 rows=3607 width=4) (never executed)
                                                ->  Seq Scan on by_text c  (cost=0.00..16131.38 rows=3607 width=4) (never executed)
                                                      Filter: ("substring"((by)::text, '^[CAE]:(.*?) I:'::text) = (e.login)::text)
                                    ->  Hash  (cost=383.44..383.44 rows=14 width=16) (never executed)
                                          ->  Nested Loop  (cost=0.00..383.44 rows=14 width=16) (never executed)
                                                ->  Index Scan using notes_u1 on notes n  (cost=0.00..344.23 rows=1 width=12) (never executed)
                                                      Index Cond: ((employee_id = e.id) AND (note_time >= ((('now'::text)::date - 1))) AND (note_time <= (((('now'::text)::date - 1)) + '1 day'::interval)))
                                                      Filter: ((activity_cd)::text = 'help_draw'::text)
                                                ->  Index Scan using loans_m12 on loans l  (cost=0.00..38.94 rows=22 width=8) (never executed)
                                                      Index Cond: (customer_id = n.customer_id)
                      SubPlan 2
                        ->  Aggregate  (cost=3089331.15..3089331.16 rows=1 width=4) (actual time=372.589..372.589 rows=1 loops=692)
                              ->  Hash Join  (cost=16560.08..3089331.14 rows=1 width=4) (actual time=372.586..372.586 rows=0 loops=692)
                                    Hash Cond: (jnasby.lt_cd.loan_id = l.id)
                                    Join Filter: ((jnasby.lt_cd.created_on >= (n.note_time - '00:10:00'::interval)) AND (jnasby.lt_cd.created_on <= (n.note_time + '1 day'::interval)))
                                    ->  Hash Join  (cost=16176.47..3087105.37 rows=491238 width=12) (actual time=14610.882..32223.008 rows=2642 loops=8)
                                          Hash Cond: (jnasby.lt_cd.created_by_id = c.id)
                                          ->  Seq Scan on lt_cd  (cost=0.00..2329065.44 rows=98260144 width=32) (actual time=0.018..15123.405 rows=98261588 loops=8)
                                          ->  Hash  (cost=16131.38..16131.38 rows=3607 width=4) (actual time=4519.878..4519.878 rows=20 loops=8)
                                                Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                                ->  Seq Scan on by_text c  (cost=0.00..16131.38 rows=3607 width=4) (actual time=4405.172..4519.861 rows=20 loops=8)
                                                      Filter: ("substring"((by)::text, '^[CAE]:(.*?) I:'::text) = (e.login)::text)
                                    ->  Hash  (cost=383.44..383.44 rows=14 width=16) (actual time=0.055..0.055 rows=0 loops=692)
                                          Buckets: 1024  Batches: 1  Memory Usage: 0kB
                                          ->  Nested Loop  (cost=0.00..383.44 rows=14 width=16) (actual time=0.051..0.055 rows=0 loops=692)
                                                ->  Index Scan using notes_u1 on notes n  (cost=0.00..344.23 rows=1 width=12) (actual time=0.049..0.052 rows=0 loops=692)
                                                      Index Cond: ((employee_id = e.id) AND (note_time >= ((('now'::text)::date - 1))) AND (note_time <= (((('now'::text)::date - 1)) + '1 day'::interval)))
                                                      Filter: ((activity_cd)::text = 'help_draw'::text)
                                                ->  Index Scan using loans_m12 on loans l  (cost=0.00..38.94 rows=22 width=8) (actual time=0.033..0.040 rows=2 loops=18)
                                                      Index Cond: (customer_id = n.customer_id)
  Total runtime: 257838.571 ms
(57 rows)
--
Jim Nasby, Lead Data Architect   (512) 569-9461


Re: Problems with hash join over nested loop

From: Tom Lane
Date:
Jim Nasby <jnasby@enova.com> writes:
> I've been working on trying to normalize a table that's got a bunch of text fields. Normalizing the first 4 has been a non-issue. But when I try and normalize 2 additional fields a bunch of query plans go belly-up.

Try increasing join_collapse_limit/from_collapse_limit.  I'm a bit
confused by your description but I think maybe you've got more than 8
relations in the subqueries.
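
Concretely, that means something like the following, either per-session or in postgresql.conf (both settings default to 8; 16 here is just an example value):

```sql
SET from_collapse_limit = 16;  -- default is 8
SET join_collapse_limit = 16;  -- default is 8
```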

            regards, tom lane


Re: Problems with hash join over nested loop

From: Jim Nasby
Date:
On 10/28/13 6:13 PM, Tom Lane wrote:
> Jim Nasby <jnasby@enova.com> writes:
>> I've been working on trying to normalize a table that's got a bunch of text fields. Normalizing the first 4 has been a non-issue. But when I try and normalize 2 additional fields a bunch of query plans go belly-up.
>
> Try increasing join_collapse_limit/from_collapse_limit.  I'm a bit
> confused by your description but I think maybe you've got more than 8
> relations in the subqueries.

Hell, never thought about that. Bumping it up did the trick. Thanks!
--
Jim Nasby, Lead Data Architect   (512) 569-9461


Re: Problems with hash join over nested loop

From: Merlin Moncure
Date:
On Mon, Oct 28, 2013 at 6:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jim Nasby <jnasby@enova.com> writes:
>> I've been working on trying to normalize a table that's got a bunch of text fields. Normalizing the first 4 has been a non-issue. But when I try and normalize 2 additional fields a bunch of query plans go belly-up.
>
> Try increasing join_collapse_limit/from_collapse_limit.  I'm a bit
> confused by your description but I think maybe you've got more than 8
> relations in the subqueries.

Hm -- wondering out loud if there would be any value in terms of
decorating explain output when that limit was hit and if it's
practical to do so...

merlin


Re: Problems with hash join over nested loop

From: Jim Nasby
Date:
On 10/29/13 9:10 AM, Merlin Moncure wrote:
> On Mon, Oct 28, 2013 at 6:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Jim Nasby <jnasby@enova.com> writes:
>>> I've been working on trying to normalize a table that's got a bunch of text fields. Normalizing the first 4 has been a non-issue. But when I try and normalize 2 additional fields a bunch of query plans go belly-up.
>>
>> Try increasing join_collapse_limit/from_collapse_limit.  I'm a bit
>> confused by your description but I think maybe you've got more than 8
>> relations in the subqueries.
>
> Hm -- wondering out loud if there would be any value in terms of
> decorating explain output when that limit was hit and if it's
> practical to do so...

I think the community would *love* any method of noting potential performance problems. Hitting the GEQO limit fits in there as well. We could eventually warn about other things as well, like going just over work_mem or seqscanning a big table for a small number of rows.

I'm also wondering if it's time to raise those limits. I constructed a somewhat contrived test query in our schema to test this. This is a legitimate join path for our schema... I can't see why someone would use the *full* path, but smaller sections are definitely in use. It's basically all joins, with one simple filter on top of that.

I'd rather not share the actual query or plan, but:

grep -i scan temp.txt |wc -l
28

All tests done via EXPLAIN ... in psql with \timing turned on. I ignored obvious outliers... margin of error is ~5% from what I saw:

Default config:                  21ms
geqo = off:                      19ms
geqo off, from_collapse = 99:    19ms
from_collapse_limit = 99:        21ms
join_collapse_limit = 99:        171ms
both = 99:                       176ms
geqo off, join_collapse = 99:    1.2s
both + geqo = off:               1.2s
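
For clarity, each line above is just the corresponding session-level settings applied before timing the EXPLAIN; e.g. the last case is roughly (a sketch, not the literal script):

```sql
\timing on
SET geqo = off;
SET from_collapse_limit = 99;
SET join_collapse_limit = 99;
EXPLAIN SELECT ... ;  -- the 28-relation join; \timing reports the planning time
```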

Obviously there's cases where 1.2 seconds of planning time will kill you... but if you're that time sensitive and using 28 tables I think it's reasonable to expect people to do some hand tuning! :)

Conversely, where you are likely to get to that sheer number of tables is when you're doing something that's going to take a non-trivial amount of time to execute. In this particular case, if I limit the query to a single row (via blah_id = 2, not via LIMIT), it takes ~2ms to execute when cached with full optimization (interestingly, planning time was about 926ms at that point).

Now that looks horrible... 926ms to plan a query that takes 2ms to return. But I'm not even going to bother with the 20ms plan, because it's going to take minutes if not HOURS to run (it's just full scanning everything it can find).
--
Jim Nasby, Lead Data Architect   (512) 569-9461


Re: Problems with hash join over nested loop

From: Tom Lane
Date:
Jim Nasby <jnasby@enova.com> writes:
> I'm also wondering if it's time to raise those limits.

Yeah, possibly.  The current default values were set on machines much
smaller/slower than most current hardware.

I think also that the collapse limits were invented mainly to keep people
out of GEQO's clutches, but we've made some significant fixes in GEQO
since then.  Maybe the real answer is to make the default collapse limits
much higher, and lower geqo_threshold to whatever we think the threshold
of pain is for applying the regular planner.
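
Illustratively, that might mean shipped defaults along these lines in postgresql.conf (the specific numbers here are placeholders for discussion, not a proposal):

```
# hypothetical new defaults (current defaults: 8 / 8 / 12)
from_collapse_limit = 20
join_collapse_limit = 20
geqo_threshold = 10
```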

            regards, tom lane


Re: Problems with hash join over nested loop

From: Jim Nasby
Date:
On 10/29/13 11:45 AM, Tom Lane wrote:
> Jim Nasby <jnasby@enova.com> writes:
>> I'm also wondering if it's time to raise those limits.
>
> Yeah, possibly.  The current default values were set on machines much
> smaller/slower than most current hardware.
>
> I think also that the collapse limits were invented mainly to keep people
> out of GEQO's clutches, but we've made some significant fixes in GEQO
> since then.  Maybe the real answer is to make the default collapse limits
> much higher, and lower geqo_threshold to whatever we think the threshold
> of pain is for applying the regular planner.

In my test case geqo does seem to do a good job. I'll see if I can get some data on how the number of relations affects planning time... I don't get much of a warm fuzzy about lowering geqo...
--
Jim Nasby, Lead Data Architect   (512) 569-9461


Re: Problems with hash join over nested loop

From: Tom Lane
Date:
Jim Nasby <jnasby@enova.com> writes:
> On 10/29/13 11:45 AM, Tom Lane wrote:
>> Jim Nasby <jnasby@enova.com> writes:
>>> I'm also wondering if it's time to raise those limits.

>> Yeah, possibly.  The current default values were set on machines much
>> smaller/slower than most current hardware.
>>
>> I think also that the collapse limits were invented mainly to keep people
>> out of GEQO's clutches, but we've made some significant fixes in GEQO
>> since then.  Maybe the real answer is to make the default collapse limits
>> much higher, and lower geqo_threshold to whatever we think the threshold
>> of pain is for applying the regular planner.

> In my test case geqo does seem to do a good job. I'll see if I can get some data on how the number of relations affects planning time... I don't get much of a warm fuzzy about lowering geqo...

Yeah, it's probably not that simple.  A trawl through the archives
reminded me that we've discussed this quite a bit in the past already.
The collapse limits are important for the regular planner not only to
limit runtime but also to limit planner memory consumption; moreover,
GEQO doesn't behave all that well either with very large join problems.
These facts killed a proposal back in 2009 to remove the collapse limits
altogether.  There was also some discussion in 2011, see thread here:
http://www.postgresql.org/message-id/BANLkTin4ncKMg+bEixi1WB1RJPrZ5dVRgQ@mail.gmail.com
but the general feeling seemed to be that we needed more planner
infrastructure work first.  In particular it seems like the best way
forward might require limiting subproblem size using something more
sophisticated than just "number of relations".

            regards, tom lane


Re: Problems with hash join over nested loop

From: Jim Nasby
Date:
On 10/29/13 1:20 PM, Tom Lane wrote:
> Jim Nasby <jnasby@enova.com> writes:
>> On 10/29/13 11:45 AM, Tom Lane wrote:
>>> Jim Nasby <jnasby@enova.com> writes:
>>>> I'm also wondering if it's time to raise those limits.
>
>>> Yeah, possibly.  The current default values were set on machines much
>>> smaller/slower than most current hardware.
>>>
>>> I think also that the collapse limits were invented mainly to keep people
>>> out of GEQO's clutches, but we've made some significant fixes in GEQO
>>> since then.  Maybe the real answer is to make the default collapse limits
>>> much higher, and lower geqo_threshold to whatever we think the threshold
>>> of pain is for applying the regular planner.
>
>> In my test case geqo does seem to do a good job. I'll see if I can get some data on how the number of relations affects planning time... I don't get much of a warm fuzzy about lowering geqo...
>
> Yeah, it's probably not that simple.  A trawl through the archives
> reminded me that we've discussed this quite a bit in the past already.
> The collapse limits are important for the regular planner not only to
> limit runtime but also to limit planner memory consumption; moreover,
> GEQO doesn't behave all that well either with very large join problems.
> These facts killed a proposal back in 2009 to remove the collapse limits
> altogether.  There was also some discussion in 2011, see thread here:
> http://www.postgresql.org/message-id/BANLkTin4ncKMg+bEixi1WB1RJPrZ5dVRgQ@mail.gmail.com
> but the general feeling seemed to be that we needed more planner
> infrastructure work first.  In particular it seems like the best way
> forward might require limiting subproblem size using something more
> sophisticated than just "number of relations".

Yeah, I saw one mention of 1GB... that's a bit disconcerting.

Is there a way to measure memory consumption during planning, short of something like strace? (I've got no dev tools available on our servers.)
--
Jim Nasby, Lead Data Architect   (512) 569-9461


Re: Problems with hash join over nested loop

From: Tom Lane
Date:
Jim Nasby <jnasby@enova.com> writes:
> Is there a way to measure memory consumption during planning, short of something like strace? (I've got no dev tools available on our servers.)

Nothing built-in, I'm pretty sure.  You could probably add some
instrumentation, but that would require running modified executables ...

            regards, tom lane


Re: Problems with hash join over nested loop

From: Jim Nasby
Date:
On 10/29/13 3:36 PM, Tom Lane wrote:
> Jim Nasby <jnasby@enova.com> writes:
>> Is there a way to measure memory consumption during planning, short of something like strace? (I've got no dev tools available on our servers.)
>
> Nothing built-in, I'm pretty sure.  You could probably add some
> instrumentation, but that would require running modified executables ...

FYI, client_min_messages = debug5 and log_planner_stats = on is useful, though I wish it included ru_maxrss (see
http://www.gnu.org/software/libc/manual/html_node/Resource-Usage.html).
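
In other words, the recipe is (note log_planner_stats is a superuser-only setting):

```sql
SET client_min_messages = debug5;  -- so the stats reach the psql session
SET log_planner_stats = on;        -- superuser only
EXPLAIN SELECT ... ;               -- getrusage-style planner stats are emitted alongside the output
```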
--
Jim Nasby, Lead Data Architect   (512) 569-9461


Re: Problems with hash join over nested loop

From: Jim Nasby
Date:
On 10/30/13 5:27 PM, Jim Nasby wrote:
> On 10/29/13 3:36 PM, Tom Lane wrote:
>> Jim Nasby <jnasby@enova.com> writes:
>>> Is there a way to measure memory consumption during planning, short of something like strace? (I've got no dev tools available on our servers.)
>>
>> Nothing built-in, I'm pretty sure.  You could probably add some
>> instrumentation, but that would require running modified executables ...
>
> FYI, client_min_messages = debug5 and log_planner_stats = on is useful, though I wish it included ru_maxrss (see
http://www.gnu.org/software/libc/manual/html_node/Resource-Usage.html).

Oh, and in my 28 table case ru_minflt was 428 4k memory pages (1.7MB). Not a great measurement, but better than nothing. I didn't detect anything noticeable on vmstat either, so I don't think the consumption is huge (an email in the older thread mentioned 1GB... I'm not seeing that).
--
Jim Nasby, Lead Data Architect   (512) 569-9461


Re: Problems with hash join over nested loop

From: Tom Lane
Date:
Jim Nasby <jnasby@enova.com> writes:
> Oh, and in my 28 table case ru_minflt was 428 4k memory pages (1.7MB). Not a great measurement, but better than nothing. I didn't detect anything noticeable on vmstat either, so I don't think the consumption is huge (an email in the older thread mentioned 1GB... I'm not seeing that).

Note that what matters here is not so much the number of base relations as
the number of possible join paths, which varies quite a lot depending on
the given and deduced join conditions.  Worst-case is where join
conditions are available between any two base relations, which is not
a real common case.

            regards, tom lane