Обсуждение: Can LIKE under utf8 use INDEXes?

Поиск
Список
Период
Сортировка

Can LIKE under utf8 use INDEXes?

От
Robert James
Дата:
Hi.  I'm confused about the behavior of LIKE under utf8 locale.
Accoding to the docs ( http://www.postgresql.org/docs/8.2/interactive/locale.html - excerpted below), it seems that LIKE ignores locale and hence can't use indexes.  Yet, EXPLAIN clearly shows it using indexes.
The docs suggest a workaround, to allow LIKE to use indexes - but I couldn't figure it out.  Although I'm stuck with locale utf8, all my data is 7-bit ascii.  I'm doing a tremendous amount of WHERE x LIKE 'abc%' - what's the best way to set up a good index?
(I can change the settings for this database - but the cluster must remain utf8).
Thanks!
"The drawback of using locales other than C or POSIX in PostgreSQL is its performance impact. It slows character handling and prevents ordinary indexes from being used by LIKE. For this reason use locales only if you actually need them. As a workaround to allow PostgreSQL to use indexes with LIKE clauses under a non-C locale, several custom operator classes exist. These allow the creation of an index that performs a strict character-by-character comparison, ignoring locale comparison rules. Refer to Section 11.8 for more information.")

Re: Can LIKE under utf8 use INDEXes?

От
Greg Stark
Дата:
On Wed, Jul 22, 2009 at 5:57 PM, Robert James<srobertjames@gmail.com> wrote:
> Hi.  I'm confused about the behavior of LIKE under utf8 locale.
> Accoding to the docs (
> http://www.postgresql.org/docs/8.2/interactive/locale.html - excerpted
> below), it seems that LIKE ignores locale and hence can't use indexes.  Yet,
> EXPLAIN clearly shows it using indexes.

Are you sure you're using 8.2?

--
greg
http://mit.edu/~gsstark/resume.pdf

Re: Can LIKE under utf8 use INDEXes?

От
Jeff Davis
Дата:
On Wed, 2009-07-22 at 12:57 -0400, Robert James wrote:
> The docs suggest a workaround, to allow LIKE to use indexes - but I
> couldn't figure it out.  Although I'm stuck with locale utf8, all my
> data is 7-bit ascii.  I'm doing a tremendous amount of WHERE x LIKE
> 'abc%' - what's the best way to set up a good index?

Create the index using text_pattern_ops, and I think it will do what you
want.

CREATE INDEX foo_t_idx ON foo (t text_pattern_ops);

Regards,
    Jeff Davis


Re: Can LIKE under utf8 use INDEXes?

От
Tom Lane
Дата:
Robert James <srobertjames@gmail.com> writes:
> Hi.  I'm confused about the behavior of LIKE under utf8 locale.

UTF8 is not a locale, it's an encoding.  If you're using C locale then
LIKE can use indexes, regardless of the encoding.  If you're using
some other locale then you need a pattern_ops index.

            regards, tom lane

Re: Can LIKE under utf8 use INDEXes?

От
Robert James
Дата:
Thank you, Tom.  I guess I'm a bit confused about things here.  How can I find the locale of my database? (I wasn't able to find this in the docs).
If I do have the locale set to 'C', do I loose anything by using utf8 for all text fields?

On Wed, Jul 22, 2009 at 4:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert James <srobertjames@gmail.com> writes:
> Hi.  I'm confused about the behavior of LIKE under utf8 locale.

UTF8 is not a locale, it's an encoding.  If you're using C locale then
LIKE can use indexes, regardless of the encoding.  If you're using
some other locale then you need a pattern_ops index.

                       regards, tom lane

Re: Can LIKE under utf8 use INDEXes?

От
Andreas Wenk
Дата:
Robert James wrote:
> Thank you, Tom.  I guess I'm a bit confused about things here.  How can
> I find the locale of my database? (I wasn't able to find this in the docs).
> If I do have the locale set to 'C', do I loose anything by using utf8
> for all text fields?

use psql:

postgres=# \l+
                         List of databases
       Name      |   Owner   | Encoding |  Collation  |    Ctype    |
----------------+-----------+----------+-------------+-------------+
  postgres       | postgres  | UTF8     | en_US.UTF-8 | en_US.UTF-8 |


output shortend ;-)

Cheers

Andy





Re: Can LIKE under utf8 use INDEXes?

От
Robert James
Дата:
Thanks - I don't show any locale:
rbt_development=> \l
  List of databases
  Name | Owner | Encoding
----------------------------+---------------------+----------
 rbt_development | rbt | UTF8
 ...


On Wed, Jul 22, 2009 at 6:45 PM, Andreas Wenk <a.wenk@netzmeister-st-pauli.de> wrote:
Robert James wrote:
Thank you, Tom.  I guess I'm a bit confused about things here.  How can I find the locale of my database? (I wasn't able to find this in the docs).
If I do have the locale set to 'C', do I loose anything by using utf8 for all text fields?

use psql:

postgres=# \l+
                       List of databases
     Name      |   Owner   | Encoding |  Collation  |    Ctype    |
----------------+-----------+----------+-------------+-------------+
 postgres       | postgres  | UTF8     | en_US.UTF-8 | en_US.UTF-8 |

output shortend ;-)

Cheers

Andy





Re: Can LIKE under utf8 use INDEXes?

От
Andreas Wenk
Дата:
Robert James wrote:
> Thanks - I don't show any locale:
> rbt_development=> \l
>   List of databases
>   Name | Owner | Encoding
> ----------------------------+---------------------+----------
>  rbt_development | rbt | UTF8
>  ...

ahm - you are running pg 8.2. There I think the + option is not
available (\l+). So if you use a debian based system and installed it
via the package manager apt or aptitude you could give this a try:

/usr/lib/postgresql/8.2/bin/pg_controldata \
/var/lib/postgresql/8.2/main/ |grep LC

This should output something like:

LC_COLLATE:    de_DE.UTF-8
LC_CTYPE:      de_DE.UTF-8

I hope this helps a little ...

Cheers

Andy

P.S.: top posting is ugly ;-)

>
> On Wed, Jul 22, 2009 at 6:45 PM, Andreas Wenk
> <a.wenk@netzmeister-st-pauli.de <mailto:a.wenk@netzmeister-st-pauli.de>>
> wrote:
>
>     Robert James wrote:
>
>         Thank you, Tom.  I guess I'm a bit confused about things here.
>          How can I find the locale of my database? (I wasn't able to
>         find this in the docs).
>         If I do have the locale set to 'C', do I loose anything by using
>         utf8 for all text fields?
>
>
>     use psql:
>
>     postgres=# \l+
>                            List of databases
>          Name      |   Owner   | Encoding |  Collation  |    Ctype    |
>     ----------------+-----------+----------+-------------+-------------+
>      postgres       | postgres  | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
>
>     output shortend ;-)
>
>     Cheers
>
>     Andy
>
>
>
>
>


Re: Can LIKE under utf8 use INDEXes?

От
Andreas Wenk
Дата:
Andreas Wenk wrote:
> Robert James wrote:
>> Thanks - I don't show any locale:
>> rbt_development=> \l
>>   List of databases
>>   Name | Owner | Encoding
>> ----------------------------+---------------------+----------
>>  rbt_development | rbt | UTF8
>>  ...
>
> ahm - you are running pg 8.2. There I think the + option is not
> available (\l+). So if you use a debian based system and installed it
> via the package manager apt or aptitude you could give this a try:
>
> /usr/lib/postgresql/8.2/bin/pg_controldata \
> /var/lib/postgresql/8.2/main/ |grep LC
>
> This should output something like:
>
> LC_COLLATE:    de_DE.UTF-8
> LC_CTYPE:      de_DE.UTF-8
>
> I hope this helps a little ...
>
> Cheers
>
> Andy
>
> P.S.: top posting is ugly ;-)

more correct: \l+ is also available in 8.2 but the output in 8.4 is
extended ...

Cheers

Andy


Re: Can LIKE under utf8 use INDEXes?

От
Tom Lane
Дата:
Andreas Wenk <a.wenk@netzmeister-st-pauli.de> writes:
> Robert James wrote:
>> Thanks - I don't show any locale:

> ahm - you are running pg 8.2. There I think the + option is not
> available (\l+). So if you use a debian based system and installed it
> via the package manager apt or aptitude you could give this a try:

> /usr/lib/postgresql/8.2/bin/pg_controldata \
> /var/lib/postgresql/8.2/main/ |grep LC

Easier way is "SHOW LC_COLLATE" and "SHOW LC_CTYPE" ...

            regards, tom lane