Re: Inlining of functions (doing LIKE on an array)

Поиск
Список
Период
Сортировка
От ldh@laurent-hasson.com
Тема Re: Inlining of functions (doing LIKE on an array)
Дата
Msg-id 35BE9BD1DD4DD444B0EDE3721E2AEA860115DECB@P3PWEX4MB001.ex4.secureserver.net
обсуждение исходный текст
Ответ на Re: Inlining of functions (doing LIKE on an array)  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Inlining of functions (doing LIKE on an array)
Список pgsql-performance
Thanks for the pointer on the "select exists" syntax Tom. Much appreciated. I couldn't figure it out! And as for
normalizing,yes, thought about it, but the one-to-many relationship would make other scenarios we have more complex and
slower.So I am juggling with trade-offs. 

So, here are my findings. I did 10 runs for each of the 4 options I have arrived at. The runs were pretty consistent,
withina few 10th's of a second off each other, so little variability. Not 100% scientific, but good enough for my test.
Ipicked here the last run I had with the plans for illustration. 

Take-aways:
-----------------------
   - The "select exists" (#3) approach is roughly 40% faster than "select count(*) > 0" (#1).
   - The SQL Function version (#3) Vs the plpgSQL function version (#2) of the same query performs better (~30%)
   - The inlined version (#4) is twice as fast (roughly) as the SQL version (#3).

I wish there were a way to force inlining, or some other mechanism as the performance difference is large here. I'll be
usingthe inlining approach when possible, but the SQL Function approach is simpler and will likely be more suitable for
somedevelopers. 

Details:
-----------------
1- select count(*) > 0 as SQL
===================================
CREATE OR REPLACE FUNCTION MyLike2(text[], text) RETURNS boolean
  AS  'select count(*) > 0 from unnest($1) a where a like $2'
LANGUAGE SQL STRICT IMMUTABLE

EXPLAIN ANALYZE
select count(*)
from cms.claims
where MyLike2("code", '427%')
--"Aggregate  (cost=609418.77..609418.78 rows=1 width=0) (actual time=8464.372..8464.372 rows=1 loops=1)"
--"  ->  Seq Scan on claims  (cost=0.00..607761.44 rows=662931 width=0) (actual time=0.077..8457.963 rows=85632
loops=1)"
--"        Filter: MyLike2("code", '427%'::text)"
--"        Rows Removed by Filter: 1851321"
--"Planning time: 0.131 ms"
--"Execution time: 8464.407 ms"

2- select exists as plpgSQL
===================================
CREATE OR REPLACE FUNCTION MyLike3(text[], text) RETURNS boolean
  AS  'begin return exists (select * from unnest($1) a where a like $2); end'
LANGUAGE plpgSQL STRICT IMMUTABLE

EXPLAIN ANALYZE
select count(*)
from cms.claims
where MyLike3("code", '427%')
--"Aggregate  (cost=609418.77..609418.78 rows=1 width=0) (actual time=7708.945..7708.945 rows=1 loops=1)"
--"  ->  Seq Scan on claims  (cost=0.00..607761.44 rows=662931 width=0) (actual time=0.040..7700.528 rows=85632
loops=1)"
--"        Filter: MyLike3("code", '427%'::text)"
--"        Rows Removed by Filter: 1851321"
--"Planning time: 0.076 ms"
--"Execution time: 7708.975 ms"

3- select exists as SQL
===================================
CREATE OR REPLACE FUNCTION MyLike(text[], text) RETURNS boolean
  AS  'select exists (select * from unnest($1) a where a like $2)'
LANGUAGE SQL STRICT IMMUTABLE

EXPLAIN ANALYZE
select count(*)
from cms.claims
where MyLike("code", '427%')
--"Aggregate  (cost=609418.77..609418.78 rows=1 width=0) (actual time=5524.690..5524.690 rows=1 loops=1)"
--"  ->  Seq Scan on claims  (cost=0.00..607761.44 rows=662931 width=0) (actual time=0.064..5515.886 rows=85632
loops=1)"
--"        Filter: tilda."like"("code", '427%'::text)"
--"        Rows Removed by Filter: 1851321"
--"Planning time: 0.097 ms"
--"Execution time: 5524.718 ms"

4- select exists inlined
===================================
EXPLAIN ANALYZE
select count(*)
from cms.claims
where exists (select * from unnest("SECONDARY_ICD9_DGNS_CD") a where a like '427%')
--"Aggregate  (cost=2604013.42..2604013.43 rows=1 width=0) (actual time=2842.259..2842.259 rows=1 loops=1)"
--"  ->  Seq Scan on claims  (cost=0.00..2601527.42 rows=994397 width=0) (actual time=0.017..2837.122 rows=85632
loops=1)"
--"        Filter: (SubPlan 1)"
--"        Rows Removed by Filter: 1851321"
--"        SubPlan 1"
--"          ->  Function Scan on unnest a  (cost=0.00..1.25 rows=1 width=0) (actual time=0.001..0.001 rows=0
loops=1936953)"
--"                Filter: (a ~~ '427%'::text)"
--"                Rows Removed by Filter: 2"
--"Planning time: 0.155 ms"
--"Execution time: 2842.311 ms"


Thank you,
Laurent Hasson

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, November 11, 2016 11:46
To: ldh@laurent-hasson.com
Cc: Marc Mamin <M.Mamin@intershop.de>; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Inlining of functions (doing LIKE on an array)

"ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes:
> I tried "exists", but won't work in the Function, i.e., CREATE OR
> REPLACE FUNCTION ArrayLike(text[], text) RETURNS bigint
>   AS  'exists (select * from unnest($1) a where a like $2)'
> LANGUAGE SQL STRICT IMMUTABLE LEAKPROOF

Syntax and semantics problems.  This would work:

regression=# CREATE OR REPLACE FUNCTION ArrayLike(text[], text) RETURNS bool regression-# as 'select exists (select *
fromunnest($1) a where a like $2)' 
regression-# LANGUAGE SQL STRICT IMMUTABLE; CREATE FUNCTION regression=# create table tt (f1 text[]); CREATE TABLE
regression=#explain select * from tt where ArrayLike(f1, 'foo'); 
                      QUERY PLAN
-------------------------------------------------------
 Seq Scan on tt  (cost=0.00..363.60 rows=453 width=32)
   Filter: arraylike(f1, 'foo'::text)
(2 rows)

But we don't inline SQL functions containing sub-selects, so you're still stuck with the rather high overhead of a SQL
function. A plpgsql function might be a bit faster: 

CREATE OR REPLACE FUNCTION ArrayLike(text[], text) RETURNS bool as 'begin return exists (select * from unnest($1) a
wherea like $2); end' 
LANGUAGE plpgSQL STRICT IMMUTABLE;

BTW, I'd be pretty suspicious of marking this function leakproof, because the underlying LIKE operator isn't leakproof
accordingto pg_proc. 


> It's as expected though. As for the GIN indices, I tried and it didn't make a difference, which I guess is expected
aswell because of the Like operator. I don't expect regular indices to work on regular columns for Like operations,
especially'%xxx' ones, so I didn't expect GIN indices to work either for Array columns with Like. Am I wrong? 

Plain GIN index, probably not.  A pg_trgm index could help with LIKE searches, but I don't think we have a variant of
thatfor array columns. 

Have you considered renormalizing the data so that you don't have arrays?

            regards, tom lane


В списке pgsql-performance по дате отправления:

Предыдущее
От: Jeff Janes
Дата:
Сообщение: Re: Inlining of functions (doing LIKE on an array)
Следующее
От: Jeff Janes
Дата:
Сообщение: Re: Any advice tuning this query ?