Re: Inlining of functions (doing LIKE on an array)

Поиск
Список
Период
Сортировка
От ldh@laurent-hasson.com
Тема Re: Inlining of functions (doing LIKE on an array)
Дата
Msg-id 35BE9BD1DD4DD444B0EDE3721E2AEA860115D550@P3PWEX4MB001.ex4.secureserver.net
обсуждение исходный текст
Ответ на Re: Inlining of functions (doing LIKE on an array)  (Marc Mamin <M.Mamin@intershop.de>)
Ответы Re: Inlining of functions (doing LIKE on an array)
Список pgsql-performance
I tried "exists", but won't work in the Function, i.e.,

CREATE OR REPLACE FUNCTION ArrayLike(text[], text) RETURNS bigint
  AS  'exists (select * from unnest($1) a where a like $2)'
LANGUAGE SQL STRICT IMMUTABLE LEAKPROOF

It's as expected though. As for the GIN indices, I tried and it didn't make a difference, which I guess is expected as
wellbecause of the Like operator. I don't expect regular indices to work on regular columns for Like operations,
especially'%xxx' ones, so I didn't expect GIN indices to work either for Array columns with Like. Am I wrong? 

Finally, I think the issue is actually not what I originally thought (i.e., index usage, as per above). But the
inliningstill is the culprit. Here is the plan for  

select count(*) from claims
where (select count(*)  from unnest("SECONDARY_ICD9_DGNS_CD") x_  where x_ like '427%' ) > 0

"Aggregate  (cost=2633016.66..2633016.67 rows=1 width=0) (actual time=3761.888..3761.889 rows=1 loops=1)"
"  ->  Seq Scan on claims  (cost=0.00..2631359.33 rows=662931 width=0) (actual time=0.097..3757.314 rows=85632
loops=1)"
"        Filter: ((SubPlan 1) > 0)"
"        Rows Removed by Filter: 1851321"
"        SubPlan 1"
"          ->  Aggregate  (cost=1.25..1.26 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1936953)"
"                ->  Function Scan on unnest a  (cost=0.00..1.25 rows=1 width=0) (actual time=0.001..0.001 rows=0
loops=1936953)"
"                      Filter: (a ~~ '427%'::text)"
"                      Rows Removed by Filter: 2"
"Planning time: 0.461 ms"
"Execution time: 3762.272 ms"

And when using the function:

"Aggregate  (cost=614390.75..614390.76 rows=1 width=0) (actual time=8169.416..8169.417 rows=1 loops=1)"
"  ->  Seq Scan on claims  (cost=0.00..612733.43 rows=662931 width=0) (actual time=0.163..8162.679 rows=85632 loops=1)"
"        Filter: (tilda."like"("SECONDARY_ICD9_DGNS_CD", '427%'::text) > 0)"
"        Rows Removed by Filter: 1851321"
"Planning time: 0.166 ms"
"Execution time: 8169.676 ms"

There is something fundamental here it seems, but I am not so good at reading plans to understand the differences here.




Thank you,
Laurent Hasson

-----Original Message-----
From: Marc Mamin [mailto:M.Mamin@intershop.de]
Sent: Friday, November 11, 2016 07:44
To: ldh@laurent-hasson.com; pgsql-performance@postgresql.org
Subject: RE: Inlining of functions (doing LIKE on an array)




> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of
> ldh@laurent-hasson.com
> Sent: Freitag, 11. November 2016 07:54
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] Inlining of functions (doing LIKE on an array)
>
> Hello,
>
> I am trying to implement an efficient "like" over a text[]. I see a lot of people have tried before me and I learnt a
lotthrough the forums. The results of my search is that a query like the following is optimal: 
>
> select count(*)
>   from claims
> where (select count(*)
>           from unnest("ICD9_DGNS_CD") x_
>          where x_ like '427%'
>        ) > 0
>

Hi,
are you using GIN indexes?

http://stackoverflow.com/questions/4058731/can-postgresql-index-array-columns

moreover your query can still be optimized:
=>
select count(*)
  from claims
where exists (select *
          from unnest("ICD9_DGNS_CD") x_
         where x_ like '427%'
       )

regards,

Marc Mamin

> So I figured I'd create a Function to encapsulate the concept:
>
> CREATE OR REPLACE FUNCTION ArrayLike(text[], text) RETURNS bigint AS
> 'select count(*) from unnest($1) a where a like $2'
> LANGUAGE SQL STRICT IMMUTABLE LEAKPROOF
>
> This works functionally, but performs like crap: full table scan, and cannot make use of any index it seems.
Basically,it feels like PG can't inline that function. 
>
> I have been trying all evening to find a way to rewrite it to trick the compiler/planner into inlining. I tried the
operatorapproach for example, but performance is again not good. 
>
> create function rlike(text,text)
> returns bool as 'select $2 like $1' language sql strict immutable;
> create operator  ``` (procedure = rlike, leftarg = text,
>                       rightarg = text, commutator = ```); CREATE OR
> REPLACE FUNCTION MyLike(text[], text) RETURNS boolean AS 'select $2
> ``` ANY($1)'
> LANGUAGE SQL STRICT IMMUTABLE LEAKPROOF
>
> And by not good, I mean that on my table of 2M+ rows, the "native" query takes 3s, while the function version takes
9sand the operator version takes (via the function, or through the operator directly), takes 15s. 
>
> Any ideas or pointers?
>
>
> Thank you,
> Laurent Hasson


В списке pgsql-performance по дате отправления:

Предыдущее
От: Devrim Gündüz
Дата:
Сообщение: Re: Any advice tuning this query ?
Следующее
От: Andreas Karlsson
Дата:
Сообщение: Re: Any advice tuning this query ?