Обсуждение: Performance regarding LIKE searches
Hi,
I am querying a Postgresql 8.3 database table that has approximately 22 million records. The (explain analyze) query
islisted below:
gdr_gbrowse_live=> explain analyze SELECT f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand FROM feature as f, name
asn WHERE (n.id=f.id AND lower(n.name) LIKE 'Scaffold:scaffold_163:1000..1199%' AND n.display_name>0);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.01..5899.93 rows=734 width=884) (actual time=0.033..0.033 rows=0 loops=1)
-> Index Scan using name_name_lower_pattern_ops_idx on name n (cost=0.01..9.53 rows=734 width=4) (actual
time=0.032..0.032rows=0 loops=1)
Index Cond: ((lower((name)::text) ~>=~ 'Scaffold:scaffold'::text) AND (lower((name)::text) ~<~
'Scaffold:scaffole'::text))
Filter: ((display_name > 0) AND (lower((name)::text) ~~ 'Scaffold:scaffold_163:1000..1199%'::text))
-> Index Scan using feature_pkey on feature f (cost=0.00..8.01 rows=1 width=884) (never executed)
Index Cond: (f.id = n.id)
Total runtime: 0.119 ms
(7 rows)
I can see I am hitting an index using an index that I created using the varchar_pattern_ops setting. This is very fast
andperforms like I would expect. However, when my application, GBrowse, access the database, I see in my slow query
logthis:
2010-03-29 09:34:38.083
PDT,"gdr_gbrowse_live","gdr_gbrowse_live",11649,"10.0.0.235:59043",4bb0399d.2d81,8,"SELECT",2010-03-2822:24:45
PDT,4/118607,0,LOG,00000,"duration:21467.467 ms execute dbdpg_p25965_9: SELECT
f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand
FROM feature as f, name as n
WHERE (n.id=f.id AND lower(n.name) LIKE $1)
","parameters: $1 = 'Scaffold:scaffold\_163:1000..1199%'",,,,,,,
GBrowse is a perl based application. Looking at the duration for this query is around 21 seconds. That is a bit long.
Does anyone have any ideas why the query duration is so different?
Randall Svancara
Systems Administrator/DBA/Developer
Main Bioinformatics Laboratory
randalls@bioinfo.wsu.edu writes:
> I can see I am hitting an index using an index that I created using the varchar_pattern_ops setting. This is very
fastand performs like I would expect. However, when my application, GBrowse, access the database, I see in my slow
querylog this:
> 2010-03-29 09:34:38.083
PDT,"gdr_gbrowse_live","gdr_gbrowse_live",11649,"10.0.0.235:59043",4bb0399d.2d81,8,"SELECT",2010-03-2822:24:45
PDT,4/118607,0,LOG,00000,"duration:21467.467 ms execute dbdpg_p25965_9: SELECT
f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand
> FROM feature as f, name as n
> WHERE (n.id=f.id AND lower(n.name) LIKE $1)
> ","parameters: $1 = 'Scaffold:scaffold\_163:1000..1199%'",,,,,,,
> GBrowse is a perl based application. Looking at the duration for this query is around 21 seconds. That is a bit
long. Does anyone have any ideas why the query duration is so different?
You're not going to get an index optimization when the LIKE pattern
isn't a constant (and left-anchored, but this is).
It is possible to get the planner to treat a query parameter as a
constant (implying a re-plan on each execution instead of having a
cached plan). I believe what you have to do at the moment is use
unnamed rather than named prepared statements. The practicality of
this would depend a lot on your client-side software stack, which
you didn't mention.
regards, tom lane
Tom,
We are using perl 5.10 with postgresql DBD. Can you point me in the right direction in terms of unamed and named
preparedstatements?
Thanks,
Randall Svancara
Systems Administrator/DBA/Developer
Main Bioinformatics Laboratory
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: randalls@bioinfo.wsu.edu
Cc: pgsql-performance@postgresql.org
Sent: Monday, March 29, 2010 10:00:03 AM
Subject: Re: [PERFORM] Performance regarding LIKE searches
randalls@bioinfo.wsu.edu writes:
> I can see I am hitting an index using an index that I created using the varchar_pattern_ops setting. This is very
fastand performs like I would expect. However, when my application, GBrowse, access the database, I see in my slow
querylog this:
> 2010-03-29 09:34:38.083
PDT,"gdr_gbrowse_live","gdr_gbrowse_live",11649,"10.0.0.235:59043",4bb0399d.2d81,8,"SELECT",2010-03-2822:24:45
PDT,4/118607,0,LOG,00000,"duration:21467.467 ms execute dbdpg_p25965_9: SELECT
f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand
> FROM feature as f, name as n
> WHERE (n.id=f.id AND lower(n.name) LIKE $1)
> ","parameters: $1 = 'Scaffold:scaffold\_163:1000..1199%'",,,,,,,
> GBrowse is a perl based application. Looking at the duration for this query is around 21 seconds. That is a bit
long. Does anyone have any ideas why the query duration is so different?
You're not going to get an index optimization when the LIKE pattern
isn't a constant (and left-anchored, but this is).
It is possible to get the planner to treat a query parameter as a
constant (implying a re-plan on each execution instead of having a
cached plan). I believe what you have to do at the moment is use
unnamed rather than named prepared statements. The practicality of
this would depend a lot on your client-side software stack, which
you didn't mention.
regards, tom lane
On 3/29/2010 12:23 PM, randalls@bioinfo.wsu.edu wrote:
> Tom,
>
> We are using perl 5.10 with postgresql DBD. Can you point me in the right direction in terms of unamed and named
preparedstatements?
>
> Thanks,
>
> Randall Svancara
> Systems Administrator/DBA/Developer
> Main Bioinformatics Laboratory
>
>
>
> ----- Original Message -----
> From: "Tom Lane"<tgl@sss.pgh.pa.us>
> To: randalls@bioinfo.wsu.edu
> Cc: pgsql-performance@postgresql.org
> Sent: Monday, March 29, 2010 10:00:03 AM
> Subject: Re: [PERFORM] Performance regarding LIKE searches
>
> randalls@bioinfo.wsu.edu writes:
>> I can see I am hitting an index using an index that I created using the varchar_pattern_ops setting. This is very
fastand performs like I would expect. However, when my application, GBrowse, access the database, I see in my slow
querylog this:
>
>> 2010-03-29 09:34:38.083
PDT,"gdr_gbrowse_live","gdr_gbrowse_live",11649,"10.0.0.235:59043",4bb0399d.2d81,8,"SELECT",2010-03-2822:24:45
PDT,4/118607,0,LOG,00000,"duration:21467.467 ms execute dbdpg_p25965_9: SELECT
f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand
>> FROM feature as f, name as n
>> WHERE (n.id=f.id AND lower(n.name) LIKE $1)
>
>> ","parameters: $1 = 'Scaffold:scaffold\_163:1000..1199%'",,,,,,,
>
>> GBrowse is a perl based application. Looking at the duration for this query is around 21 seconds. That is a bit
long. Does anyone have any ideas why the query duration is so different?
>
> You're not going to get an index optimization when the LIKE pattern
> isn't a constant (and left-anchored, but this is).
>
> It is possible to get the planner to treat a query parameter as a
> constant (implying a re-plan on each execution instead of having a
> cached plan). I believe what you have to do at the moment is use
> unnamed rather than named prepared statements. The practicality of
> this would depend a lot on your client-side software stack, which
> you didn't mention.
>
> regards, tom lane
>
I'm just going to guess, but DBD::Pg can do "real prepare" or "fake
prepare".
It does "real" by default. Try setting:
$dbh->{pg_server_prepare} = 0;
before you prepare/run that statement and see if it makes a difference.
http://search.cpan.org/dist/DBD-Pg/Pg.pm#prepare
-Andy
On Mon, 29 Mar 2010, randalls@bioinfo.wsu.edu wrote: > WHERE ... lower(n.name) LIKE 'Scaffold:scaffold_163:1000..1199%' ... I'm sure you noticed that this is never going to return any rows? Matthew -- Me... a skeptic? I trust you have proof?