Query works when kludged, but would prefer "best practice" solution

Поиск
Список
Период
Сортировка
От Carlo Stonebanks
Тема Query works when kludged, but would prefer "best practice" solution
Дата
Msg-id fcmnm1$24mj$1@news.hub.org
обсуждение исходный текст
Ответы Re: Query works when kludged, but would prefer "best practice" solution  ("Merlin Moncure" <mmoncure@gmail.com>)
Список pgsql-performance
Hi all,

Please see the section marked as "PROBLEM" in "ORIGINAL QUERY" plan below.
You can see it's pretty slow. Oddly enough, an index for facility_address_id
is available but not being used, but I suspect it's questionable whether it
would be an improvement.

I knew that the filter was best applied to the results of the join - my
attempts to restructure the query with subqueries, etc didn't fool the
planner - it always figured out a plan that had this problem SEQ SCAN +
FILTER in it.

Finally, I "hid" the condition from the planner with a coalesce function -
see "SOLUTION" in the "KLUDGED QUERY" plan below.

Sure enough, a new plan appeared with a remarkable performance improvement!

The purpose of this query is to find facilities within a geographical area
when the complete address data is missing (hence the facility_address_id is
NULL).

PG is 8.4.2 on RH linux server with 1GB ram, HDD is RAID 1.

I don't like kludging like this - so any and all help or advice is
appreciated!

Carlo

ORIGINAL QUERY
select
   pp.provider_id,
   pp.provider_practice_id,
   nearby.distance
from mdx_core.provider_practice as pp
join mdx_core.facility as f
on f.facility_id = pp.facility_id
join (select * from mdx_core.zips_in_mile_range('08820', 10)) as nearby
on f.default_country_code = 'US'
   and f.default_postal_code = nearby.zip
where facility_address_id is null

Hash Join  (cost=30258.99..107702.53 rows=9438 width=16) (actual
time=169.516..3064.188 rows=872 loops=1)
  Hash Cond: (pp.facility_id = f.facility_id)
PROBLEM:
------------
  ->  Seq Scan on provider_practice pp  (cost=0.00..74632.55 rows=724429
width=12) (actual time=0.039..1999.457 rows=728396 loops=1)
        Filter: (facility_address_id IS NULL)
------------
  ->  Hash  (cost=29954.15..29954.15 rows=24387 width=12) (actual
time=156.668..156.668 rows=907 loops=1)
        ->  Nested Loop  (cost=0.00..29954.15 rows=24387 width=12) (actual
time=149.891..155.343 rows=907 loops=1)
              ->  Function Scan on zips_in_mile_range  (cost=0.00..12.50
rows=1000 width=40) (actual time=149.850..149.920 rows=66 loops=1)
              ->  Index Scan using facility_country_postal_code_idx on
facility f  (cost=0.00..29.64 rows=24 width=15) (actual time=0.015..0.048
rows=14 loops=66)
                    Index Cond: ((f.default_country_code = 'US'::bpchar) AND
((f.default_postal_code)::text = zips_in_mile_range.zip))
Total runtime: 3065.338 ms


KLUDGED QUERY

select
   pp.provider_id,
   pp.provider_practice_id,
   nearby.distance
from mdx_core.provider_practice as pp
join mdx_core.facility as f
on f.facility_id = pp.facility_id
join (select * from mdx_core.zips_in_mile_range('08820', 10)) as nearby
on f.default_country_code = 'US'
   and f.default_postal_code = nearby.zip
   and coalesce(pp.facility_address_id, -1) = -1

Nested Loop  (cost=0.00..112618.87 rows=180 width=16) (actual
time=149.680..167.261 rows=872 loops=1)
  ->  Nested Loop  (cost=0.00..29954.15 rows=24387 width=12) (actual
time=149.659..155.018 rows=907 loops=1)
        ->  Function Scan on zips_in_mile_range  (cost=0.00..12.50 rows=1000
width=40) (actual time=149.620..149.698 rows=66 loops=1)
        ->  Index Scan using facility_country_postal_code_idx on facility f
(cost=0.00..29.64 rows=24 width=15) (actual time=0.015..0.045 rows=14
loops=66)
              Index Cond: ((f.default_country_code = 'US'::bpchar) AND
((f.default_postal_code)::text = zips_in_mile_range.zip))
SOLUTION
-------------
  ->  Index Scan using provider_practice_facility_idx on provider_practice
pp  (cost=0.00..3.38 rows=1 width=12) (actual time=0.007..0.009 rows=1
loops=907)
        Index Cond: (f.facility_id = pp.facility_id)
        Filter: (COALESCE(facility_address_id, -1) = -1)
-------------
Total runtime: 168.275 ms


В списке pgsql-performance по дате отправления:

Предыдущее
От: Markus Schiltknecht
Дата:
Сообщение: Re: DRBD and Postgres: how to improve the perfomance?
Следующее
От: "Merlin Moncure"
Дата:
Сообщение: Re: Query works when kludged, but would prefer "best practice" solution