JOIN performance
| От | Dean Gibson (DB Administrator) |
|---|---|
| Тема | JOIN performance |
| Дата | |
| Msg-id | 5.1.0.14.2.20040920121606.00a89c68@imaps.mailpen.net обсуждение исходный текст |
| Ответ на | Re: COUNT(*) to find records which have a certain number of (Greg Stark <gsstark@mit.edu>) |
| Ответы |
Re: JOIN performance
|
| Список | pgsql-sql |
I have a view that when used, is slow:
CREATE VIEW "Data".genlic_a4avail AS SELECT genlic_a4.*, last_action_date,
end_date, canceled FROM genlic_a4 LEFT JOIN lic_hd USING(
sys_id) WHERE status != 'A';
Here is the EXPLAIN output:
Merge Join (cost=155360.47..159965.70 rows=13063 width=75) Merge Cond: ("outer".sys_id = "inner".sys_id) -> Sort
(cost=3912.51..3916.48 rows=1589 width=62) Sort Key: "_GenLicGroupA4".sys_id -> Nested Loop
(cost=0.00..3828.04rows=1589 width=62) Join Filter: ("outer".callsign ~ ("inner".pattern)::text)
-> Seq Scan on "_GenLicGroupA4" (cost=0.00..1667.40
rows=1589 width=21) Filter: ((status <> 'R'::bpchar) AND (status <>
'A'::bpchar) AND (geo_region = 12)) -> Seq Scan on "_GeoRestrict" (cost=0.00..1.16 rows=16
width=41) -> Sort (cost=262032.96..264249.96 rows=886799 width=72) Sort Key: lic_hd.sys_id ->
SubqueryScan lic_hd (cost=0.00..24529.99 rows=886799 width=72) -> Seq Scan on "_LicHD"
(cost=0.00..24529.99rows=886799
width=72)
If I change the view to this:
CREATE VIEW "Data".genlic_a4avail AS SELECT genlic_a4.*, (SELECT
last_action_dateFROM lic_hd WHERE sys_id =
genlic_a4.sys_id LIMIT 1) AS last_action_date, (SELECT end_date FROM lic_hd WHERE
sys_id=
genlic_a4.sys_id LIMIT 1) AS end_date, (SELECT canceled FROM lic_hd WHERE sys_id
=
genlic_a4.sys_id LIMIT 1) AS canceled FROM genlic_a4 WHERE status !=
'A';
Then the performance is MUCH better:
Subquery Scan genlic_a4avail (cost=0.00..3828.04 rows=1589 width=62) -> Nested Loop (cost=0.00..3828.04 rows=1589
width=62) Join Filter: ("outer".callsign ~ ("inner".pattern)::text) -> Seq Scan on "_GenLicGroupA4"
(cost=0.00..1667.40rows=1589
width=21) Filter: ((status <> 'R'::bpchar) AND (status <>
'A'::bpchar) AND (geo_region = 12)) -> Seq Scan on "_GeoRestrict" (cost=0.00..1.16 rows=16 width=41)
SubPlan -> Limit (cost=0.00..3.01 rows=1 width=4) -> Index Scan using "_LicHD_pkey" on
"_LicHD" (cost=0.00..3.01 rows=1 width=4) Index Cond: (unique_system_identifier = $0)
-> Limit (cost=0.00..3.01 rows=1 width=8) -> Index Scan using "_LicHD_pkey" on
"_LicHD" (cost=0.00..3.01 rows=1 width=8) Index Cond: (unique_system_identifier = $0)
-> Limit (cost=0.00..3.01 rows=1 width=8) -> Index Scan using "_LicHD_pkey" on
"_LicHD" (cost=0.00..3.01 rows=1 width=8) Index Cond: (unique_system_identifier = $0)
Note that genlic_a4 is small (4519), and lic_hd is large (886799), and
lic_hd has sys_id as its PRIMARY KEY.
Is there a better way to write the LEFT JOIN so as to achieve the
performance of the second VIEW without the clumsiness of the three (SELECT
... LIMIT 1) ???
В списке pgsql-sql по дате отправления: