Re: Efficiency of EXISTS?

Поиск
Список
Период
Сортировка
От Kenneth Tilton
Тема Re: Efficiency of EXISTS?
Дата
Msg-id CAECCA8bduPiZq0tLgFEA8Q-CctTeOJ8QvWSLUv3p7GG8quWfag@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Efficiency of EXISTS?  (Merlin Moncure <mmoncure@gmail.com>)
Список pgsql-performance


On Mon, Jul 23, 2012 at 2:52 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Mon, Jul 23, 2012 at 4:12 PM, Kenneth Tilton <ktilton@mcna.net> wrote:
> My mental model of the EXISTS clause must be off. This snippet appears at
> the end of a series of WITH clauses I suspect are irrelevant:
>
>> with etc etc ... , cids as
>>   (select distinct c.id from ddr2 c
>> join claim_entries ce on ce.claim_id = c.id
>> where (c.assigned_ddr = 879
>> or exists (select 1 from ddr_cdt dc
>> where
>> dc.sys_user_id = 879
>> and dc.document_type = c.document_type
>> -- makes it faster: and (dc.cdt_code is null or dc.cdt_code = ce.cpt_code)
>> )))
>>
>> select count(*) from cids
>
>
> If I uncomment the bit where it says "make it faster" I get decent response
> and the graphical analyze display shows the expected user+doctype+cdtcode
> index is being used (and nice thin lines suggesting efficient lookup).
>
> As it is, the analyze display shows the expected user+doctype index* being
> used but the lines are fat, and performance is an exponential disaster.
>
> * I created the (to me ) redundant user+doctype index trying to get Postgres
> to Do the Right Thing(tm), but I can see that was not the issue.
>
> I presume the reason performance drops off a cliff is because there can be
> 9000 cdt_codes for one user+doctype, but I was hoping EXISTS would just look
> to see if there was at least one row matching user+doctype and return its
> decision. I have tried select *, select 1, and limit 1 on the nested select
> to no avail.
>
> Am I just doing something wrong? I am a relative noob. Is there some other
> hint I can give the planner?

hard to say without having the explain analyze output.  also it's not
clear why you need to use WITH, at least for the terminating query.
I'd just do:

select count(*) from
(
  inner_query
)

OK. Here is the full query:

with ddr as (
select c.id
 ,case
   when c.portal_user_id is null then u.provider_facility_id
   else pu.provider_facility_id
  end provider_facility_id
 from claims c
 left join sys_users u on u.id = c.created_by
 left join portal_users pu on pu.id = c.portal_user_id
 WHERE c.deleted = 0
 AND c.status >= 0
 AND (c.created_by is not null or c.portal_user_id is not null)
 AND true not in ( select ineligible_code_id in (46,65)
from claim_carcs cc
where c.id = cc.claim_id
and cc.deleted = 0 )
 AND (false OR c.document_type = 0)
 AND (false OR c.group_plan_id = 44)
 
 limit 1500
)

,ddr2 as (
select c.id
 , c.document_type
 , c.radiographs
 , c.nea_number
 , c.assigned_ddr
 , d.provider_facility_id as submitting_facility_id
 , count(ca.id) as claim_attachments_count
 , cast(exists (select 1 from triples where s = c.id and sda='claims' and p = 'ddr-review-passed-on-by') as boolean) as passedon
 from ddr d
inner join
claims c on d.id = c.id
join claim_attachments ca on c.id = ca.claim_id
group by
 , submitting_facility_id
having ((nullif(trim(c.nea_number, ' '),'') is not null)
or case transmission_method
when 'P' then count(distinct ca.id) > 1
else count(distinct ca.id) > 0
  end
or c.radiographs > 0))

, cids as
  (select distinct c.id from ddr2 c
join claim_entries ce on ce.claim_id = c.id
where (c.assigned_ddr = 879
or exists (select 1 from ddr_cdt dc 
where
dc.sys_user_id = 879
and dc.document_type = c.document_type
--and (dc.cdt_code is null or dc.cdt_code = ce.cpt_code)
)))
select count(*) from cids

And the explain output:

"Aggregate  (cost=56060.60..56060.61 rows=1 width=0)"
"  CTE ddr"
"    ->  Limit  (cost=306.29..16203.83 rows=1500 width=16)"
"          ->  Nested Loop Left Join  (cost=306.29..7442626.75 rows=702214 width=16)"
"                ->  Hash Left Join  (cost=306.29..7244556.97 rows=702214 width=12)"
"                      Hash Cond: (c.created_by = u.id)"
"                      ->  Index Scan using claims_lca1 on claims c  (cost=0.00..7230212.96 rows=702214 width=12)"
"                            Index Cond: ((deleted = 0) AND (status >= 0) AND (group_plan_id = 44) AND (document_type = 0))"
"                            Filter: (((created_by IS NOT NULL) OR (portal_user_id IS NOT NULL)) AND (NOT (SubPlan 1)))"
"                            SubPlan 1"
"                              ->  Index Scan using claim_carcs_claim_id on claim_carcs cc  (cost=0.00..9.23 rows=1 width=4)"
"                                    Index Cond: (c.id = claim_id)"
"                                    Filter: (deleted = 0)"
"                      ->  Hash  (cost=224.46..224.46 rows=6546 width=8)"
"                            ->  Seq Scan on sys_users u  (cost=0.00..224.46 rows=6546 width=8)"
"                ->  Index Scan using portal_users_pkey on portal_users pu  (cost=0.00..0.27 rows=1 width=8)"
"                      Index Cond: (id = c.portal_user_id)"
"  CTE ddr2"
"    ->  GroupAggregate  (cost=25714.40..28093.98 rows=286 width=27)"
"          Filter: ((NULLIF(btrim((c.nea_number)::text, ' '::text), ''::text) IS NOT NULL) OR CASE c.transmission_method WHEN 'P'::bpchar THEN (count(DISTINCT ca.id) > 1) ELSE (count(DISTINCT ca.id) > 0) END OR (c.radiographs > 0))"
"          ->  Sort  (cost=25714.40..25715.11 rows=286 width=27)"
"                Sort Key: c.id, d.provider_facility_id"
"                ->  Nested Loop  (cost=0.00..25702.73 rows=286 width=27)"
"                      ->  Nested Loop  (cost=0.00..12752.74 rows=1500 width=27)"
"                            ->  CTE Scan on ddr d  (cost=0.00..30.00 rows=1500 width=8)"
"                            ->  Index Scan using claims_pkey on claims c  (cost=0.00..8.47 rows=1 width=19)"
"                                  Index Cond: (id = d.id)"
"                      ->  Index Scan using claim_attachments_claim on claim_attachments ca  (cost=0.00..8.61 rows=2 width=8)"
"                            Index Cond: (claim_id = c.id)"
"          SubPlan 3"
"            ->  Index Scan using triples_s_idx on triples  (cost=0.00..8.28 rows=1 width=0)"
"                  Index Cond: (s = c.id)"
"                  Filter: ((sda = 'claims'::text) AND (p = 'ddr-review-passed-on-by'::text))"
"          SubPlan 4"
"            ->  Bitmap Heap Scan on triples  (cost=102.70..1010.15 rows=823 width=8)"
"                  Recheck Cond: (p = 'ddr-review-passed-on-by'::text)"
"                  Filter: (sda = 'claims'::text)"
"                  ->  Bitmap Index Scan on triples_p_idx  (cost=0.00..102.49 rows=3497 width=0)"
"                        Index Cond: (p = 'ddr-review-passed-on-by'::text)"
"  CTE cids"
"    ->  HashAggregate  (cost=11759.51..11760.52 rows=101 width=4)"
"          ->  Nested Loop  (cost=0.00..11722.94 rows=14627 width=4)"
"                ->  CTE Scan on ddr2 c  (cost=0.00..112.75 rows=144 width=4)"
"                      Filter: ((assigned_ddr = 879) OR (alternatives: SubPlan 6 or hashed SubPlan 7))"
"                      SubPlan 6"
"                        ->  Seq Scan on ddr_cdt dc  (cost=0.00..134293.58 rows=361282 width=0)"
"                              Filter: ((sys_user_id = 879) AND (document_type = c.document_type))"
"                      SubPlan 7"
"                        ->  Bitmap Heap Scan on ddr_cdt dc  (cost=20292.74..73868.80 rows=1083845 width=4)"
"                              Recheck Cond: (sys_user_id = 879)"
"                              ->  Bitmap Index Scan on "ddr-cdt-idx-user-doc"  (cost=0.00..20021.78 rows=1083845 width=0)"
"                                    Index Cond: (sys_user_id = 879)"
"                ->  Index Scan using claim_entries_claim_id on claim_entries ce  (cost=0.00..79.35 rows=102 width=4)"
"                      Index Cond: (claim_id = c.id)"
"  ->  CTE Scan on cids  (cost=0.00..2.02 rows=101 width=0)"

More interesting: I tried reducing the complex query to a simpler query and what I saw was that my mental model of EXISTS is fine. :) It was efficient in the way I expected, and faster than the version that did the last test (the cdt_code test). Now I just have to find out why it is slower in vivo.

Thx, ken



--
Kenneth Tilton

Director of Software Development

MCNA Dental Plans
200 West Cypress Creek Road
Suite 500
Fort Lauderdale, FL 33309
ktilton@mcna.net (Email)

www.mcna.net (Website)CONFIDENTIALITY NOTICE: This electronic mail may contain information that is privileged, confidential, and/or otherwise protected from disclosure to anyone other than its intended recipient(s). Any dissemination or use of this electronic mail or its contents by persons other than the intended recipient(s) is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail so that we may correct our internal records. Please then delete the original message. Thank you.


В списке pgsql-performance по дате отправления:

Предыдущее
От: Merlin Moncure
Дата:
Сообщение: Re: Efficiency of EXISTS?
Следующее
От: Craig Ringer
Дата:
Сообщение: Re: Odd blocking (or massively latent) issue - even with EXPLAIN