Обсуждение: Very bad plan when using VIEW and IN (SELECT...*)

Поиск
Список
Период
Сортировка

Very bad plan when using VIEW and IN (SELECT...*)

От
"Carlo Stonebanks"
Дата:
Ref these two queries against a view:

-- QUERY 1, executes < 0.5 secs
SELECT *
FROM mdx_core.vw_provider AS p
WHERE provider_id IN (13083101)

-- QUERY 2, executes > 13.5 secs
SELECT *
FROM mdx_core.vw_provider AS p
WHERE provider_id IN (SELECT 13083101)

I am using the simple IN (SELECT n) to simplify the problem. I noticed the
oddity of the behaviour when I used a proper SELECT myId FROM myTable.

Plans are below. The view has an internal UNION.
Any explanation as to why this happens?

The actualt view is listed at the very bottom, if relevant.

Carlo



QUERY 1 PLAN
"Unique  (cost=25.48..25.69 rows=2 width=417) (actual time=0.180..0.190
rows=2 loops=1)"
"  ->  Sort  (cost=25.48..25.48 rows=2 width=417) (actual time=0.179..0.180
rows=2 loops=1)"
"        Sort Key: "*SELECT* 1".provider_id, (NULL::integer), "*SELECT*
1".master_id, "*SELECT* 1".client_ids, "*SELECT* 1".upin, "*SELECT*
1".medical_education_number, "*SELECT* 1".abmsuid, "*SELECT* 1".npi,
"*SELECT* 1".npi_status_code, "*SELECT* 1".cc_id, "*SELECT* 1".aoa_id,
"*SELECT* 1".last_name, "*SELECT* 1".first_name, "*SELECT* 1".middle_name,
"*SELECT* 1".suffix, "*SELECT* 1".display_name, "*SELECT* 1".display_title,
"*SELECT* 1".nickname, "*SELECT* 1".familiar_name, "*SELECT* 1".pubmed_name,
"*SELECT* 1".master_name, "*SELECT* 1".display_name_orig, (NULL::text),
"*SELECT* 1".gender, "*SELECT* 1".birth_year, "*SELECT* 1".birth_month,
"*SELECT* 1".birth_day, "*SELECT* 1".clinical_interest, "*SELECT*
1".research_interest, "*SELECT* 1".summary, "*SELECT* 1".comments, "*SELECT*
1".degree_types, "*SELECT* 1".provider_type_ids, "*SELECT*
1".provider_status_code, "*SELECT* 1".provider_status_year, "*SELECT*
1".created, "*SELECT* 1".unique_flag, "*SELECT* 1".is_locked, "*SELECT*
1".provider_standing_code, "*SELECT* 1".impt_source_date, "*SELECT*
1".input_resource_id, "*SELECT* 1".input_source_ids"
"        Sort Method:  quicksort  Memory: 27kB"
"        ->  Append  (cost=0.00..25.47 rows=2 width=417) (actual
time=0.078..0.143 rows=2 loops=1)"
"              ->  Subquery Scan "*SELECT* 1"  (cost=0.00..8.59 rows=1
width=408) (actual time=0.078..0.079 rows=1 loops=1)"
"                    ->  Index Scan using provider_provider_id_idx on
provider p  (cost=0.00..8.58 rows=1 width=408) (actual time=0.076..0.077
rows=1 loops=1)"
"                          Index Cond: (provider_id = 13083101)"
"              ->  Subquery Scan "*SELECT* 2"  (cost=0.00..16.87 rows=1
width=417) (actual time=0.061..0.062 rows=1 loops=1)"
"                    ->  Nested Loop  (cost=0.00..16.86 rows=1 width=417)
(actual time=0.055..0.056 rows=1 loops=1)"
"                          ->  Index Scan using
provider_name_pid_rec_stat_idx on provider_alias pa  (cost=0.00..8.27 rows=1
width=32) (actual time=0.047..0.047 rows=1 loops=1)"
"                                Index Cond: (provider_id = 13083101)"
"                          ->  Index Scan using provider_provider_id_idx on
provider p  (cost=0.00..8.58 rows=1 width=389) (actual time=0.005..0.006
rows=1 loops=1)"
"                                Index Cond: (p.provider_id = 13083101)"
"Total runtime: 0.371 ms"

QUERY 2 PLAN
"Merge IN Join  (cost=2421241.80..3142039.99 rows=30011 width=2032) (actual
time=13778.400..13778.411 rows=2 loops=1)"
"  Merge Cond: ("*SELECT* 1".provider_id = (13083101))"
"  ->  Unique  (cost=2421241.77..3066486.33 rows=6002275 width=417) (actual
time=13778.119..13778.372 rows=110 loops=1)"
"        ->  Sort  (cost=2421241.77..2436247.46 rows=6002275 width=417)
(actual time=13778.118..13778.163 rows=110 loops=1)"
"              Sort Key: "*SELECT* 1".provider_id, (NULL::integer),
"*SELECT* 1".master_id, "*SELECT* 1".client_ids, "*SELECT* 1".upin,
"*SELECT* 1".medical_education_number, "*SELECT* 1".abmsuid, "*SELECT*
1".npi, "*SELECT* 1".npi_status_code, "*SELECT* 1".cc_id, "*SELECT*
1".aoa_id, "*SELECT* 1".last_name, "*SELECT* 1".first_name, "*SELECT*
1".middle_name, "*SELECT* 1".suffix, "*SELECT* 1".display_name, "*SELECT*
1".display_title, "*SELECT* 1".nickname, "*SELECT* 1".familiar_name,
"*SELECT* 1".pubmed_name, "*SELECT* 1".master_name, "*SELECT*
1".display_name_orig, (NULL::text), "*SELECT* 1".gender, "*SELECT*
1".birth_year, "*SELECT* 1".birth_month, "*SELECT* 1".birth_day, "*SELECT*
1".clinical_interest, "*SELECT* 1".research_interest, "*SELECT* 1".summary,
"*SELECT* 1".comments, "*SELECT* 1".degree_types, "*SELECT*
1".provider_type_ids, "*SELECT* 1".provider_status_code, "*SELECT*
1".provider_status_year, "*SELECT* 1".created, "*SELECT* 1".unique_flag,
"*SELECT* 1".is_locked, "*SELECT* 1".provider_standing_code, "*SELECT*
1".impt_source_date, "*SELECT* 1".input_resource_id, "*SELECT*
1".input_source_ids"
"              Sort Method:  external merge  Disk: 423352kB"
"              ->  Append  (cost=0.00..596598.30 rows=6002275 width=417)
(actual time=0.039..7879.715 rows=1312637 loops=1)"
"                    ->  Subquery Scan "*SELECT* 1"  (cost=0.00..543238.96
rows=5994998 width=408) (actual time=0.039..7473.664 rows=1305360 loops=1)"
"                          ->  Seq Scan on provider p  (cost=0.00..483288.98
rows=5994998 width=408) (actual time=0.037..6215.112 rows=1305360 loops=1)"
"                    ->  Subquery Scan "*SELECT* 2"  (cost=0.00..53359.34
rows=7277 width=417) (actual time=0.049..186.643 rows=7277 loops=1)"
"                          ->  Nested Loop  (cost=0.00..53286.57 rows=7277
width=417) (actual time=0.043..176.134 rows=7277 loops=1)"
"                                ->  Seq Scan on provider_alias pa
(cost=0.00..157.77 rows=7277 width=32) (actual time=0.018..3.134 rows=7277
loops=1)"
"                                ->  Index Scan using
provider_provider_id_idx on provider p  (cost=0.00..7.29 rows=1 width=389)
(actual time=0.021..0.021 rows=1 loops=7277)"
"                                      Index Cond: (p.provider_id =
pa.provider_id)"
"  ->  Sort  (cost=0.03..0.04 rows=1 width=4) (actual time=0.014..0.014
rows=1 loops=1)"
"        Sort Key: (13083101)"
"        Sort Method:  quicksort  Memory: 25kB"
"        ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.001..0.001 rows=1 loops=1)"
"Total runtime: 13959.905 ms"


REATE OR REPLACE VIEW mdx_core.vw_provider AS
SELECT
   p.provider_id,
   NULL AS provider_alias_id,
   p.master_id,
   p.client_ids,
   p.upin,
   p.medical_education_number,
   p.abmsuid,
   p.npi,
   p.npi_status_code,
   p.cc_id,
   p.aoa_id,
   p.last_name,
   p.first_name,
   p.middle_name,
   p.suffix,
   p.display_name,
   p.display_title,
   p.nickname,
   p.familiar_name,
   p.pubmed_name,
   p.master_name,
   p.display_name_orig,
   NULL::text AS is_primary,
   p.gender,
   p.birth_year,
   p.birth_month,
   p.birth_day,
   p.clinical_interest,
   p.research_interest,
   p.summary,
   p.comments,
   p.degree_types,
   p.provider_type_ids,
   p.provider_status_code,
   p.provider_status_year,
   p.created,
   p.unique_flag,
   p.is_locked,
   p.provider_standing_code,
   p.impt_source_date,
   p.input_resource_id,
   p.input_source_ids
FROM mdx_core.provider AS p

UNION SELECT
   p.provider_id,
   pa.provider_alias_id,
   p.master_id,
   p.client_ids,
   p.upin,
   p.medical_education_number,
   p.abmsuid,
   p.npi,
   p.npi_status_code,
   p.cc_id,
   p.aoa_id,
   pa.last_name,
   pa.first_name,
   pa.middle_name,
   pa.suffix,
   p.display_name,
   p.display_title,
   p.nickname,
   p.familiar_name,
   p.pubmed_name,
   p.master_name,
   p.display_name_orig,
   pa.is_primary,
   p.gender,
   p.birth_year,
   p.birth_month,
   p.birth_day,
   p.clinical_interest,
   p.research_interest,
   p.summary,
   p.comments,
   p.degree_types,
   p.provider_type_ids,
   p.provider_status_code,
   p.provider_status_year,
   p.created,
   p.unique_flag,
   p.is_locked,
   p.provider_standing_code,
   p.impt_source_date,
   p.input_resource_id,
   p.input_source_ids
FROM mdx_core.provider_alias AS pa
JOIN mdx_core.provider AS p USING (provider_id);


Re: Very bad plan when using VIEW and IN (SELECT...*)

От
"Carlo Stonebanks"
Дата:
Oops, my bad - this should be in PG PERFORM! Posting it there...

Re: Very bad plan when using VIEW and IN (SELECT...*)

От
Tom Lane
Дата:
"Carlo Stonebanks" <stonec.register@sympatico.ca> writes:
> Ref these two queries against a view:
> -- QUERY 1, executes < 0.5 secs
> SELECT *
> FROM mdx_core.vw_provider AS p
> WHERE provider_id IN (13083101)

> -- QUERY 2, executes > 13.5 secs
> SELECT *
> FROM mdx_core.vw_provider AS p
> WHERE provider_id IN (SELECT 13083101)

> I am using the simple IN (SELECT n) to simplify the problem. I noticed the
> oddity of the behaviour when I used a proper SELECT myId FROM myTable.

foo IN (SELECT ...), in general, is a join.  There's no particular
reason to expect it to give the same result as foo IN (constant).

Having said that, modern versions of the planner seem to deal reasonably
well with this situation as long as they're being asked to push the
condition through a UNION ALL.  Do you really need a UNION in that view?
That's going to cost you plenty in a lot of cases not only this one.

            regards, tom lane

Re: Very bad plan when using VIEW and IN (SELECT...*)

От
"Carlo Stonebanks"
Дата:
Taken from your advice, I gave this a try:

SELECT *
FROM mdx_core.vw_provider AS p
WHERE provider_id = ANY array(
   SELECT provider_id
   FROM mdx_core.provider_alias
   LIMIT 10
)

Well, it RIPS through the query in 15ms compared to 13.0 secs for the In
(SELECT...)


>> Having said that, modern versions of the planner seem to deal reasonably
well with this situation as long as they're being asked to push the
condition through a UNION ALL.  Do you really need a UNION in that view?
<<

This is PG v 8.3 - do you mean "modern" compared to that?

And if you mean UNION vs. UNION ALL, I think in this case UNION ALL would
do.

Carlo

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: August 12, 2010 6:48 PM
To: Carlo Stonebanks
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Very bad plan when using VIEW and IN (SELECT...*)

"Carlo Stonebanks" <stonec.register@sympatico.ca> writes:
> Ref these two queries against a view:
> -- QUERY 1, executes < 0.5 secs
> SELECT *
> FROM mdx_core.vw_provider AS p
> WHERE provider_id IN (13083101)

> -- QUERY 2, executes > 13.5 secs
> SELECT *
> FROM mdx_core.vw_provider AS p
> WHERE provider_id IN (SELECT 13083101)

> I am using the simple IN (SELECT n) to simplify the problem. I noticed the

> oddity of the behaviour when I used a proper SELECT myId FROM myTable.

foo IN (SELECT ...), in general, is a join.  There's no particular
reason to expect it to give the same result as foo IN (constant).

Having said that, modern versions of the planner seem to deal reasonably
well with this situation as long as they're being asked to push the
condition through a UNION ALL.  Do you really need a UNION in that view?
That's going to cost you plenty in a lot of cases not only this one.

            regards, tom lane


Re: Very bad plan when using VIEW and IN (SELECT...*)

От
Tom Lane
Дата:
"Carlo Stonebanks" <stonec.register@sympatico.ca> writes:
>>> Having said that, modern versions of the planner seem to deal reasonably
>>> well with this situation as long as they're being asked to push the
>>> condition through a UNION ALL.  Do you really need a UNION in that view?

> This is PG v 8.3 - do you mean "modern" compared to that?

When I was testing it yesterday it seemed that versions back to 8.2
generated acceptable plans, but the results might depend on details you
didn't show us.  Try it and see.

            regards, tom lane