Re: DB encoding, locale and indexes

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: DB encoding, locale and indexes
Дата
Msg-id 5316.1423148171@sss.pgh.pa.us
обсуждение исходный текст
Ответ на DB encoding, locale and indexes  (Sterfield <sterfield@gmail.com>)
Ответы Re: DB encoding, locale and indexes
Список pgsql-general
Sterfield <sterfield@gmail.com> writes:
> I'm a sysadmin working for an application that stores all its data in a PG
> database.
> Currently, the cluster has its encoding set to UTF-8, and the locale (both
> LC_COLLATE and LC_CTYPE) is set to 'en_US.UTF-8'.

> I discovered recently that the indexes created on varchar fields are not
> working for LIKE operator, as they are created without the correct class
> operator (as stated in
> http://www.postgresql.org/docs/9.2/static/indexes-types.html).

Right, because en_US.UTF-8 uses dictionary sort order rather than plain
byte-by-byte sort.

> The most straight-forward solution would be to create a second index on the
> same field but with the class operator, in order to have indexes both for
> =, >, < operators AND LIKE / regexp operators. Few additional indexes, some
> diskspace eaten, problem solved.

Yup.

> However, some people are saying that nothing has to change on the index,
> and that the only thing we have to do is to change the LC_COLLATE of each
> databases to 'C', in order for the indexes to work without the class
> operator.

Yes, that is another possible solution, and it's documented.  Keep in mind
though that you can *not* just reach into pg_database and tweak those
fields; if you did, all your indexes would be corrupt, because they'd no
longer match the sort order the system is expecting.  The only safe way to
get there would be to dump and reload into a new database set up this way.
(If you wanted to live dangerously, I guess you could manually tweak the
pg_database fields and then REINDEX every affected index ... but this
still involves substantial downtime, and I would not recommend doing it
without practicing on a test installation.)

You also have to ask whether any of your applications are expecting ORDER
BY some-text-field to produce dictionary order rather than ASCII order.

>    - If I have unicode character stored in my database (for example 'é'),
>    and the LC_COLLATE set to 'C', how the index will behave if I do a query
>    with LIKE '<something>é%' ?

It's still the same character, but it will sort in a possibly unexpected
way.

            regards, tom lane


В списке pgsql-general по дате отправления:

Предыдущее
От: Igor Neyman
Дата:
Сообщение: Re: Error: could not read symbolic link "pg_tblspc/940585". No such file or directory
Следующее
От: Sterfield
Дата:
Сообщение: Re: DB encoding, locale and indexes