Обсуждение: Can LIKE use indexes or not?
Reading the archives and the FAQ, it seems to be implied that LIKE can use index (and ILIKE can't; so to do case-insensitive search you need to create a functional index on LOWER(field) and say: LOWER(field) LIKE 'foo%'). However, EXPLAIN always says seq scan for the test data I'm using. I've done 'set enable_seqscan to off' and it still says seq scan. I was curious as to how the index will help this query: db1=> set enable_seqscan to off; SET Time: 5.732 ms db1=> explain select * from t where f like 'xx%'; QUERY PLAN ------------------------------------------------------------------- Seq Scan on t (cost=100000000.00..100002698.90 rows=89 width=14) Filter: (f ~~ 'xx%'::text) (2 rows) db1=> explain select * from t where lower(f) like 'xx%'; QUERY PLAN -------------------------------------------------------------------- Seq Scan on t (cost=100000000.00..100002893.68 rows=390 width=14) Filter: (lower(f) ~~ 'xx%'::text) (2 rows) The table is: db1=> \d t Table "public.t" Column | Type | Modifiers --------+------+----------- f | text | Indexes: "i1" unique, btree (lower(f)) "i2" unique, btree (f) It contains +- 250k rows of totally random 10-char-long strings (containing upper- & lowercase letters and numbers). Here's how the LIKE performs: db1=> select * from t where f like 'xx%'; f ------------ xxEqfLZMkH xxBRRnLetJ ... xxFPYJEiYf (98 rows) Time: 452.613 ms Would using an index potentially help the performance of this query, and if yes, how do I force Postgres to use the index? db1=> select * from t where lower(f) like 'mmm%'; f ------------ MmmyEVmfSY MMmzolhHtq ... mMMWEQzlKm (16 rows) Time: 634.470 ms -- dave
David Garamond said: > Would using an index potentially help the performance of this query, and > if yes, how do I force Postgres to use the index? > > db1=> select * from t where lower(f) like 'mmm%'; I suspect the fact that you're specifying the lower function on the column data, ie lower(f), implies that the function has to be applied to every row in the table in order to calculate the value prior to testing the like condition. I don't know enough about what you can and cannot do index-wise in PG, in terms of creating an index based on a computed (upper/lower) value of a column. But you could consider adding an extra column to the table and a trigger so that the trigger places an UPPER or LOWER version of the column "f" into the new column. Like searches would then be select * from t where new_upper_f like upper('MMM%'); Provided that there is an index on the new column, new_upper_f, you should avoid the full table scan. (I think, I haven't tested this out)... John Sidney-Woollett
try this: CREATE [ UNIQUE ] INDEX my_index ON t ( lower(f)); John Sidney-Woollett wrote: >David Garamond said: > > >>Would using an index potentially help the performance of this query, and >>if yes, how do I force Postgres to use the index? >> >>db1=> select * from t where lower(f) like 'mmm%'; >> >> > >I suspect the fact that you're specifying the lower function on the column >data, ie lower(f), implies that the function has to be applied to every >row in the table in order to calculate the value prior to testing the like >condition. > >I don't know enough about what you can and cannot do index-wise in PG, in >terms of creating an index based on a computed (upper/lower) value of a >column. > >But you could consider adding an extra column to the table and a trigger >so that the trigger places an UPPER or LOWER version of the column "f" >into the new column. > >Like searches would then be > >select * from t where new_upper_f like upper('MMM%'); > >Provided that there is an index on the new column, new_upper_f, you should >avoid the full table scan. (I think, I haven't tested this out)... > >John Sidney-Woollett > > >---------------------------(end of broadcast)--------------------------- >TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > > >
John Sidney-Woollett said: > select * from t where new_upper_f like upper('MMM%'); I think I meant select * from t where new_upper_f like 'MMM%'; or select * from t where new_upper_f like upper('mmm%'); John
If you use an exact = does it use the index? e.g. explain select ... where lower(f)='xxxxxxxx' If so it could be your locale setting. On some versions of Postgresql like is disabled on non-C locales. On some versions of Postgresql on some platforms the default is a non-C locale. With version 7.4 you can workaround that: http://www.postgresql.org/docs/current/static/indexes-opclass.html Hope that helps, At 03:30 PM 2/5/2004 +0700, David Garamond wrote: >Reading the archives and the FAQ, it seems to be implied that LIKE can use >index (and ILIKE can't; so to do case-insensitive search you need to >create a functional index on LOWER(field) and say: LOWER(field) LIKE 'foo%'). > >However, EXPLAIN always says seq scan for the test data I'm using. I've >done 'set enable_seqscan to off' and it still says seq scan. I was curious >as to how the index will help this query: > >db1=> set enable_seqscan to off; >SET >Time: 5.732 ms >db1=> explain select * from t where f like 'xx%'; > QUERY PLAN >------------------------------------------------------------------- > Seq Scan on t (cost=100000000.00..100002698.90 rows=89 width=14) > Filter: (f ~~ 'xx%'::text) >(2 rows) > >db1=> explain select * from t where lower(f) like 'xx%'; > QUERY PLAN >-------------------------------------------------------------------- > Seq Scan on t (cost=100000000.00..100002893.68 rows=390 width=14) > Filter: (lower(f) ~~ 'xx%'::text) >(2 rows)
Lincoln Yeoh wrote: > If you use an exact = does it use the index? > e.g. explain select ... where lower(f)='xxxxxxxx' Yes it does. > If so it could be your locale setting. On some versions of Postgresql > like is disabled on non-C locales. I'm using 7.4.1. These are the lines in postgresql.conf (it's basically pristine from the one created by initdb). # These settings are initialized by initdb -- they may be changed lc_messages = 'en_US.iso885915' #locale for system error message strings lc_monetary = 'en_US.iso885915' #locale for monetary formatting lc_numeric = 'en_US.iso885915' #locale for number formatting lc_time = 'en_US.iso885915' #locale for time formatting > On some versions of Postgresql on > some platforms the default is a non-C locale. With version 7.4 you can > workaround that: > http://www.postgresql.org/docs/current/static/indexes-opclass.html Yes, that was the cause of the problem. I've now recreated the index using the varchar_pattern_ops: db1=> create unique index i1 on t(i varchar_pattern_ops); db1=> create unique index i2 on t(lower(i) varchar_pattern_ops); and now EXPLAIN tells me the query uses Index scan: db1=> explain select * from t where f like 'xx%'; QUERY PLAN -------------------------------------------------------------------------------------- Index Scan using i1 on t (cost=0.00..6.01 rows=322 width=14) Index Cond: ((f ~>=~ 'xx'::character varying) AND (f ~<~ 'xy'::character varying)) Filter: (f ~~ 'xx%'::text) (3 rows) db1=> explain select * from t where lower(f) like 'xx%'; QUERY PLAN ---------------------------------------------------------------------------------------------------- Index Scan using i2 on t (cost=0.00..4049.64 rows=1421 width=14) Index Cond: ((lower(f) ~>=~ 'xx'::character varying) AND (lower(f) ~<~ 'xy'::character varying)) Filter: (lower(f) ~~ 'xx%'::text) (3 rows) > Hope that helps, Yes it does, thanks. Apparently using the index does improve the speed: db1=> select * from t where f like 'xx%'; f ------------ xxAGRrXrXr xxAwScNpWh ... xxyuFyyDtn (98 rows) Time: 9.679 ms db1=> select * from t where lower(f) like 'xx%'; f ------------ xxaAvoarIZ XXadJWnXcK ... xXzynzWllI (413 rows) Time: 8.626 ms -- dave
On Thursday 05 February 2004 10:25, David Garamond wrote: Glad to see your problem is solved. Your locale/charset settings look a bit odd though: > # These settings are initialized by initdb -- they may be changed > lc_messages = 'en_US.iso885915' #locale for system error message strings > lc_monetary = 'en_US.iso885915' #locale for monetary formatting > lc_numeric = 'en_US.iso885915' #locale for number formatting > lc_time = 'en_US.iso885915' #locale for time formatting US English with 8859-15 (Latin 9?) charset? Can I as what OS this is? It's just that it seems like an odd combination to me - or am I displaying my ignorance here? -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > On Thursday 05 February 2004 10:25, David Garamond wrote: > > Glad to see your problem is solved. Your locale/charset settings look a bit > odd though: > >># These settings are initialized by initdb -- they may be changed >>lc_messages = 'en_US.iso885915' #locale for system error message strings >>lc_monetary = 'en_US.iso885915' #locale for monetary formatting >>lc_numeric = 'en_US.iso885915' #locale for number formatting >>lc_time = 'en_US.iso885915' #locale for time formatting > > US English with 8859-15 (Latin 9?) charset? Can I as what OS this is? It's > just that it seems like an odd combination to me - or am I displaying my > ignorance here? It's Redhat 7.3 running under VMWare Workstation 4.0.* The host OS is Windows 2000 + SP4. All software are pretty much left to their defaults. * Of course, we use "real" Linux for production; this is just my home machine. -- dave
Richard Huxton wrote: > US English with 8859-15 (Latin 9?) charset? Can I as what OS this is? > It's just that it seems like an odd combination to me - or am I > displaying my ignorance here? FYI, 8859-15 is just 8859-1 with the Euro sign in place of the generic currency sign and some other minor changes. As you might imagine, in Europe it is (becoming?) the default, and since most people in the rest of the world probably won't ever notice the difference, you can expect it to become the preferred choice of operating systems there as well.
On Thursday 05 February 2004 20:00, Peter Eisentraut wrote: > Richard Huxton wrote: > > US English with 8859-15 (Latin 9?) charset? Can I as what OS this is? > > It's just that it seems like an odd combination to me - or am I > > displaying my ignorance here? > > FYI, 8859-15 is just 8859-1 with the Euro sign in place of the generic > currency sign and some other minor changes. As you might imagine, in > Europe it is (becoming?) the default, and since most people in the rest > of the world probably won't ever notice the difference, you can expect > it to become the preferred choice of operating systems there as well. Yeah - Harald Fuchs said the same thing (but it seems to have drifted off list). I was getting mixed up with 8859-14 (Latin 8) which is the Celtic one - blame it on being a Welshman. -- Richard Huxton Archonet Ltd
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Why don't you use select * from t where new_upper_f ilike 'mmm%'; The ilike operator ignores case and you would get rid of the function call. On Thursday 05 February 2004 01:20 am, John Sidney-Woollett wrote: > John Sidney-Woollett said: > > select * from t where new_upper_f like upper('MMM%'); > > I think I meant > > select * from t where new_upper_f like 'MMM%'; > > or > > select * from t where new_upper_f like upper('mmm%'); > > John > > > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend - -- UC - -- Open Source Solutions 4U, LLC 2570 Fleetwood Drive Phone: +1 650 872 2425 San Bruno, CA 94066 Cell: +1 650 302 2405 United States Fax: +1 650 872 2417 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.3 (GNU/Linux) iD8DBQFAI+6wjqGXBvRToM4RAgY8AKCmTr8JQpbDd+FMEZCG1ITVIYH+CQCeJ45C IHZxeLxj60UL+TlhXfH6pdM= =JOOY -----END PGP SIGNATURE-----