Обсуждение: Nested Loop vs Hash Join based on predicate?

Поиск
Список
Период
Сортировка

Nested Loop vs Hash Join based on predicate?

От
"Doiron, Daniel"
Дата:
I have the following queries:

EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING)
select    […]
                from      f_calc_service   a11,
                                d_patient_type                a12
                where   a11.d_patient_pop_id in (336)
                         and a11.d_patient_type_id = a12.id
                         and a12.short_name = 'I'
                group by              a11.d_rate_schedule_id,
                                a11.d_payer_id,
                                a11.d_patient_pop_id,
                                a11.d_patient_type_id

And 

EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING)
select     […]
                from      f_calc_service   a11,
                                d_patient_type                a12
                where   a11.d_patient_pop_id in (336)
                         and a11.d_patient_type_id = a12.id
                         and a12.short_name = 'O'
                group by              a11.d_rate_schedule_id,
                                a11.d_payer_id,
                                a11.d_patient_pop_id,
                                a11.d_patient_type_id
;  

Making this one change from short_name = ‘I’ to short_name = ‘O’ changes the query execution from 200k ms to 280ms. The first one chooses a Nested Loop, the second chooses a hash join. How do I get them both to choose the same? There are no values for d_patient_pop_id in (336) and short_name = ‘I’. 

Thanks!

Dan


Re: Nested Loop vs Hash Join based on predicate?

От
Pavel Stehule
Дата:


2016-03-16 21:23 GMT+01:00 Doiron, Daniel <DoironD@advisory.com>:
I have the following queries:

EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING)
select    […]
                from      f_calc_service   a11,
                                d_patient_type                a12
                where   a11.d_patient_pop_id in (336)
                         and a11.d_patient_type_id = a12.id
                         and a12.short_name = 'I'
                group by              a11.d_rate_schedule_id,
                                a11.d_payer_id,
                                a11.d_patient_pop_id,
                                a11.d_patient_type_id

And 

EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING)
select     […]
                from      f_calc_service   a11,
                                d_patient_type                a12
                where   a11.d_patient_pop_id in (336)
                         and a11.d_patient_type_id = a12.id
                         and a12.short_name = 'O'
                group by              a11.d_rate_schedule_id,
                                a11.d_payer_id,
                                a11.d_patient_pop_id,
                                a11.d_patient_type_id
;  

Making this one change from short_name = ‘I’ to short_name = ‘O’ changes the query execution from 200k ms to 280ms. The first one chooses a Nested Loop, the second chooses a hash join. How do I get them both to choose the same? There are no values for d_patient_pop_id in (336) and short_name = ‘I’. 

we don't see plans, so it is blind shot,

Probably the estimation for 'I' value is pretty underestimated - so planner choose nested loop. The reasons can be different - possible correlation inside data for example.

You can try:

0) ensure so your statistic are current - run statement ANALYZE

a) increase statistic by statement ALTER TABLE xx ALTER COLUMN yyy SET STATISTICS some number

b) penalize nested loop - statement SET enable_nestloop TO off;

Regards

Pavel
 

Thanks!

Dan