Обсуждение: Can LIKE under utf8 use INDEXes?
Hi. I'm confused about the behavior of LIKE under utf8 locale.
Accoding to the docs ( http://www.postgresql.org/docs/8.2/interactive/locale.html - excerpted below), it seems that LIKE ignores locale and hence can't use indexes. Yet, EXPLAIN clearly shows it using indexes.
The docs suggest a workaround, to allow LIKE to use indexes - but I couldn't figure it out. Although I'm stuck with locale utf8, all my data is 7-bit ascii. I'm doing a tremendous amount of WHERE x LIKE 'abc%' - what's the best way to set up a good index?
(I can change the settings for this database - but the cluster must remain utf8).
Thanks!
(Here is the doc excerpt, from http://www.postgresql.org/docs/8.2/interactive/locale.html :
"The drawback of using locales other than C or POSIX in PostgreSQL is its performance impact. It slows character handling and prevents ordinary indexes from being used by LIKE. For this reason use locales only if you actually need them. As a workaround to allow PostgreSQL to use indexes with LIKE clauses under a non-C locale, several custom operator classes exist. These allow the creation of an index that performs a strict character-by-character comparison, ignoring locale comparison rules. Refer to Section 11.8 for more information.")
On Wed, Jul 22, 2009 at 5:57 PM, Robert James<srobertjames@gmail.com> wrote: > Hi. I'm confused about the behavior of LIKE under utf8 locale. > Accoding to the docs ( > http://www.postgresql.org/docs/8.2/interactive/locale.html - excerpted > below), it seems that LIKE ignores locale and hence can't use indexes. Yet, > EXPLAIN clearly shows it using indexes. Are you sure you're using 8.2? -- greg http://mit.edu/~gsstark/resume.pdf
On Wed, 2009-07-22 at 12:57 -0400, Robert James wrote: > The docs suggest a workaround, to allow LIKE to use indexes - but I > couldn't figure it out. Although I'm stuck with locale utf8, all my > data is 7-bit ascii. I'm doing a tremendous amount of WHERE x LIKE > 'abc%' - what's the best way to set up a good index? Create the index using text_pattern_ops, and I think it will do what you want. CREATE INDEX foo_t_idx ON foo (t text_pattern_ops); Regards, Jeff Davis
Robert James <srobertjames@gmail.com> writes: > Hi. I'm confused about the behavior of LIKE under utf8 locale. UTF8 is not a locale, it's an encoding. If you're using C locale then LIKE can use indexes, regardless of the encoding. If you're using some other locale then you need a pattern_ops index. regards, tom lane
Thank you, Tom. I guess I'm a bit confused about things here. How can I find the locale of my database? (I wasn't able to find this in the docs).
If I do have the locale set to 'C', do I loose anything by using utf8 for all text fields?
On Wed, Jul 22, 2009 at 4:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert James <srobertjames@gmail.com> writes:UTF8 is not a locale, it's an encoding. If you're using C locale then
> Hi. I'm confused about the behavior of LIKE under utf8 locale.
LIKE can use indexes, regardless of the encoding. If you're using
some other locale then you need a pattern_ops index.
regards, tom lane
Robert James wrote: > Thank you, Tom. I guess I'm a bit confused about things here. How can > I find the locale of my database? (I wasn't able to find this in the docs). > If I do have the locale set to 'C', do I loose anything by using utf8 > for all text fields? use psql: postgres=# \l+ List of databases Name | Owner | Encoding | Collation | Ctype | ----------------+-----------+----------+-------------+-------------+ postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | output shortend ;-) Cheers Andy
Thanks - I don't show any locale:
rbt_development=> \l
List of databases
Name | Owner | Encoding
----------------------------+---------------------+----------
rbt_development | rbt | UTF8
...
List of databases
Name | Owner | Encoding
----------------------------+---------------------+----------
rbt_development | rbt | UTF8
...
On Wed, Jul 22, 2009 at 6:45 PM, Andreas Wenk <a.wenk@netzmeister-st-pauli.de> wrote:
Robert James wrote:use psql:Thank you, Tom. I guess I'm a bit confused about things here. How can I find the locale of my database? (I wasn't able to find this in the docs).
If I do have the locale set to 'C', do I loose anything by using utf8 for all text fields?
postgres=# \l+
List of databases
Name | Owner | Encoding | Collation | Ctype |
----------------+-----------+----------+-------------+-------------+
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
output shortend ;-)
Cheers
Andy
Robert James wrote: > Thanks - I don't show any locale: > rbt_development=> \l > List of databases > Name | Owner | Encoding > ----------------------------+---------------------+---------- > rbt_development | rbt | UTF8 > ... ahm - you are running pg 8.2. There I think the + option is not available (\l+). So if you use a debian based system and installed it via the package manager apt or aptitude you could give this a try: /usr/lib/postgresql/8.2/bin/pg_controldata \ /var/lib/postgresql/8.2/main/ |grep LC This should output something like: LC_COLLATE: de_DE.UTF-8 LC_CTYPE: de_DE.UTF-8 I hope this helps a little ... Cheers Andy P.S.: top posting is ugly ;-) > > On Wed, Jul 22, 2009 at 6:45 PM, Andreas Wenk > <a.wenk@netzmeister-st-pauli.de <mailto:a.wenk@netzmeister-st-pauli.de>> > wrote: > > Robert James wrote: > > Thank you, Tom. I guess I'm a bit confused about things here. > How can I find the locale of my database? (I wasn't able to > find this in the docs). > If I do have the locale set to 'C', do I loose anything by using > utf8 for all text fields? > > > use psql: > > postgres=# \l+ > List of databases > Name | Owner | Encoding | Collation | Ctype | > ----------------+-----------+----------+-------------+-------------+ > postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | > > output shortend ;-) > > Cheers > > Andy > > > > >
Andreas Wenk wrote: > Robert James wrote: >> Thanks - I don't show any locale: >> rbt_development=> \l >> List of databases >> Name | Owner | Encoding >> ----------------------------+---------------------+---------- >> rbt_development | rbt | UTF8 >> ... > > ahm - you are running pg 8.2. There I think the + option is not > available (\l+). So if you use a debian based system and installed it > via the package manager apt or aptitude you could give this a try: > > /usr/lib/postgresql/8.2/bin/pg_controldata \ > /var/lib/postgresql/8.2/main/ |grep LC > > This should output something like: > > LC_COLLATE: de_DE.UTF-8 > LC_CTYPE: de_DE.UTF-8 > > I hope this helps a little ... > > Cheers > > Andy > > P.S.: top posting is ugly ;-) more correct: \l+ is also available in 8.2 but the output in 8.4 is extended ... Cheers Andy
Andreas Wenk <a.wenk@netzmeister-st-pauli.de> writes: > Robert James wrote: >> Thanks - I don't show any locale: > ahm - you are running pg 8.2. There I think the + option is not > available (\l+). So if you use a debian based system and installed it > via the package manager apt or aptitude you could give this a try: > /usr/lib/postgresql/8.2/bin/pg_controldata \ > /var/lib/postgresql/8.2/main/ |grep LC Easier way is "SHOW LC_COLLATE" and "SHOW LC_CTYPE" ... regards, tom lane