Re: Efficiency of EXISTS?

Поиск

Список

Период

Сортировка

От	Kenneth Tilton
Тема	Re: Efficiency of EXISTS?
Дата	23 июля 2012 г. 22:13:14
Msg-id	CAECCA8bduPiZq0tLgFEA8Q-CctTeOJ8QvWSLUv3p7GG8quWfag@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Efficiency of EXISTS? (Merlin Moncure <mmoncure@gmail.com>)
Список	pgsql-performance

Дерево обсуждения

On Mon, Jul 23, 2012 at 2:52 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Mon, Jul 23, 2012 at 4:12 PM, Kenneth Tilton <ktilton@mcna.net> wrote:
> My mental model of the EXISTS clause must be off. This snippet appears at
> the end of a series of WITH clauses I suspect are irrelevant:
>
>> with etc etc ... , cids as
>> (select distinct c.id from ddr2 c
>> join claim_entries ce on ce.claim_id = c.id
>> where (c.assigned_ddr = 879
>> or exists (select 1 from ddr_cdt dc
>> where
>> dc.sys_user_id = 879
>> and dc.document_type = c.document_type
>> -- makes it faster: and (dc.cdt_code is null or dc.cdt_code = ce.cpt_code)
>> )))
>>
>> select count(*) from cids
>
>
> If I uncomment the bit where it says "make it faster" I get decent response
> and the graphical analyze display shows the expected user+doctype+cdtcode
> index is being used (and nice thin lines suggesting efficient lookup).
>
> As it is, the analyze display shows the expected user+doctype index* being
> used but the lines are fat, and performance is an exponential disaster.
>
> * I created the (to me ) redundant user+doctype index trying to get Postgres
> to Do the Right Thing(tm), but I can see that was not the issue.
>
> I presume the reason performance drops off a cliff is because there can be
> 9000 cdt_codes for one user+doctype, but I was hoping EXISTS would just look
> to see if there was at least one row matching user+doctype and return its
> decision. I have tried select *, select 1, and limit 1 on the nested select
> to no avail.
>
> Am I just doing something wrong? I am a relative noob. Is there some other
> hint I can give the planner?

hard to say without having the explain analyze output. also it's not
clear why you need to use WITH, at least for the terminating query.
I'd just do:

select count(*) from
(
inner_query
)

OK. Here is the full query:

with ddr as (

select c.id

,case

when c.portal_user_id is null then u.provider_facility_id

else pu.provider_facility_id

end provider_facility_id

from claims c

left join sys_users u on u.id = c.created_by

left join portal_users pu on pu.id = c.portal_user_id

WHERE c.deleted = 0

AND c.status >= 0

AND (c.created_by is not null or c.portal_user_id is not null)

AND true not in ( select ineligible_code_id in (46,65)

from claim_carcs cc

where c.id = cc.claim_id

and cc.deleted = 0 )

AND (false OR c.document_type = 0)

AND (false OR c.group_plan_id = 44)

limit 1500

)

,ddr2 as (

select c.id

, c.document_type

, c.radiographs

, c.nea_number

, c.assigned_ddr

, d.provider_facility_id as submitting_facility_id

, count(ca.id) as claim_attachments_count

, cast(exists (select 1 from triples where s = c.id and sda='claims' and p = 'ddr-review-passed-on-by') as boolean) as passedon

from ddr d

inner join

claims c on d.id = c.id

join claim_attachments ca on c.id = ca.claim_id

group by

c.id

, submitting_facility_id

having ((nullif(trim(c.nea_number, ' '),'') is not null)

or case transmission_method

when 'P' then count(distinct ca.id) > 1

else count(distinct ca.id) > 0

end

or c.radiographs > 0))

, cids as

(select distinct c.id from ddr2 c

join claim_entries ce on ce.claim_id = c.id

where (c.assigned_ddr = 879

or exists (select 1 from ddr_cdt dc

where

dc.sys_user_id = 879

and dc.document_type = c.document_type

--and (dc.cdt_code is null or dc.cdt_code = ce.cpt_code)

)))

select count(*) from cids

And the explain output:

"Aggregate (cost=56060.60..56060.61 rows=1 width=0)"

" CTE ddr"

" -> Limit (cost=306.29..16203.83 rows=1500 width=16)"

" -> Nested Loop Left Join (cost=306.29..7442626.75 rows=702214 width=16)"

" -> Hash Left Join (cost=306.29..7244556.97 rows=702214 width=12)"

" Hash Cond: (c.created_by = u.id)"

" -> Index Scan using claims_lca1 on claims c (cost=0.00..7230212.96 rows=702214 width=12)"

" Index Cond: ((deleted = 0) AND (status >= 0) AND (group_plan_id = 44) AND (document_type = 0))"

" Filter: (((created_by IS NOT NULL) OR (portal_user_id IS NOT NULL)) AND (NOT (SubPlan 1)))"

" SubPlan 1"

" -> Index Scan using claim_carcs_claim_id on claim_carcs cc (cost=0.00..9.23 rows=1 width=4)"

" Index Cond: (c.id = claim_id)"

" Filter: (deleted = 0)"

" -> Hash (cost=224.46..224.46 rows=6546 width=8)"

" -> Seq Scan on sys_users u (cost=0.00..224.46 rows=6546 width=8)"

" -> Index Scan using portal_users_pkey on portal_users pu (cost=0.00..0.27 rows=1 width=8)"

" Index Cond: (id = c.portal_user_id)"

" CTE ddr2"

" -> GroupAggregate (cost=25714.40..28093.98 rows=286 width=27)"

" Filter: ((NULLIF(btrim((c.nea_number)::text, ' '::text), ''::text) IS NOT NULL) OR CASE c.transmission_method WHEN 'P'::bpchar THEN (count(DISTINCT ca.id) > 1) ELSE (count(DISTINCT ca.id) > 0) END OR (c.radiographs > 0))"

" -> Sort (cost=25714.40..25715.11 rows=286 width=27)"

" Sort Key: c.id, d.provider_facility_id"

" -> Nested Loop (cost=0.00..25702.73 rows=286 width=27)"

" -> Nested Loop (cost=0.00..12752.74 rows=1500 width=27)"

" -> CTE Scan on ddr d (cost=0.00..30.00 rows=1500 width=8)"

" -> Index Scan using claims_pkey on claims c (cost=0.00..8.47 rows=1 width=19)"

" Index Cond: (id = d.id)"

" -> Index Scan using claim_attachments_claim on claim_attachments ca (cost=0.00..8.61 rows=2 width=8)"

" Index Cond: (claim_id = c.id)"

" SubPlan 3"

" -> Index Scan using triples_s_idx on triples (cost=0.00..8.28 rows=1 width=0)"

" Index Cond: (s = c.id)"

" Filter: ((sda = 'claims'::text) AND (p = 'ddr-review-passed-on-by'::text))"

" SubPlan 4"

" -> Bitmap Heap Scan on triples (cost=102.70..1010.15 rows=823 width=8)"

" Recheck Cond: (p = 'ddr-review-passed-on-by'::text)"

" Filter: (sda = 'claims'::text)"

" -> Bitmap Index Scan on triples_p_idx (cost=0.00..102.49 rows=3497 width=0)"

" Index Cond: (p = 'ddr-review-passed-on-by'::text)"

" CTE cids"

" -> HashAggregate (cost=11759.51..11760.52 rows=101 width=4)"

" -> Nested Loop (cost=0.00..11722.94 rows=14627 width=4)"

" -> CTE Scan on ddr2 c (cost=0.00..112.75 rows=144 width=4)"

" Filter: ((assigned_ddr = 879) OR (alternatives: SubPlan 6 or hashed SubPlan 7))"

" SubPlan 6"

" -> Seq Scan on ddr_cdt dc (cost=0.00..134293.58 rows=361282 width=0)"

" Filter: ((sys_user_id = 879) AND (document_type = c.document_type))"

" SubPlan 7"

" -> Bitmap Heap Scan on ddr_cdt dc (cost=20292.74..73868.80 rows=1083845 width=4)"

" Recheck Cond: (sys_user_id = 879)"

" -> Bitmap Index Scan on "ddr-cdt-idx-user-doc" (cost=0.00..20021.78 rows=1083845 width=0)"

" Index Cond: (sys_user_id = 879)"

" -> Index Scan using claim_entries_claim_id on claim_entries ce (cost=0.00..79.35 rows=102 width=4)"

" Index Cond: (claim_id = c.id)"

" -> CTE Scan on cids (cost=0.00..2.02 rows=101 width=0)"

More interesting: I tried reducing the complex query to a simpler query and what I saw was that my mental model of EXISTS is fine. :) It was efficient in the way I expected, and faster than the version that did the last test (the cdt_code test). Now I just have to find out why it is slower in vivo.

Thx, ken

--
Kenneth Tilton

Director of Software Development

MCNA Dental Plans
200 West Cypress Creek Road
Suite 500

Fort Lauderdale, FL 33309

954-730-7131 X181 (Office)
954-628-3347 (Fax)
1-800-494-6262 X181 (Toll Free)

ktilton@mcna.net (Email)

www.mcna.net (Website)CONFIDENTIALITY NOTICE: This electronic mail may contain information that is privileged, confidential, and/or otherwise protected from disclosure to anyone other than its intended recipient(s). Any dissemination or use of this electronic mail or its contents by persons other than the intended recipient(s) is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail so that we may correct our internal records. Please then delete the original message. Thank you.

В списке pgsql-performance по дате отправления:

Предыдущее

От: Merlin Moncure
Дата: 23 июля 2012 г., 21:52:50
Сообщение: Re: Efficiency of EXISTS?

Следующее

От: Craig Ringer
Дата: 24 июля 2012 г., 00:30:57
Сообщение: Re: Odd blocking (or massively latent) issue - even with EXPLAIN

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Efficiency of EXISTS?

Предыдущее

Следующее