Re: multibyte-character aware support for function "downcase_truncate_identifier()"

From Robert Haas
Subject Re: multibyte-character aware support for function "downcase_truncate_identifier()"
Date
Msg-id AANLkTik-6ThpaSrnbXvkjXfJfmtTCm3RadYqTuf-q_sp@mail.gmail.com
In response to Re: multibyte-character aware support for function "downcase_truncate_identifier()" (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
On Sun, Nov 21, 2010 at 6:22 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>
>
> On 11/21/2010 06:09 PM, Robert Haas wrote:
>
>> I think that's fair.  It actually doesn't seem like it should be that
>> hard if we knew that the server encoding were UTF8 - it's just a big
>> translation table somewhere, no?
>
> No, it's far more complex. See for example
> <http://unicode.org/reports/tr21/tr21-3.html>, which says:
>
> There are a number of complications to case mappings that occur once the
> repertoire of characters is expanded beyond ASCII.
>
> Because of the inclusion of certain composite characters for compatibility,
> such as 01F1 "DZ" capital dz, there is a third case, called titlecase, which
> is used where the first letter of a word is to be capitalized (e.g.
> Titlecase, vs. UPPERCASE, or lowercase).
>
> For example, the title case of the example character is 01F2 "Dz" capital d
> with small z.
>
> Case mappings may produce strings of different length than the original.
>
> For example, the German character 00DF "ß" small letter sharp s expands when
> uppercased to the sequence of two characters "SS". This also occurs where
> there is no precomposed character corresponding to a case mapping, such as
> with 0149 "ʼn" latin small letter n preceded by apostrophe.
>
> Characters may also have different case mappings, depending on the context.
>
> For example, 03A3 "Σ" capital sigma lowercases to 03C3 "σ" small sigma if it
> is followed by another letter, but lowercases to 03C2 "ς" small final sigma
> if it is not.
>
> Characters may have case mappings that depend on the locale.
>
> For example, in Turkish the letter 0049 "I" capital letter i lowercases to
> 0131 "ı" small dotless i.
>
> Case mappings are not, in general, reversible.
>
> For example, once the string "McGowan" has been uppercased, lowercased or
> titlecased, the original cannot be recovered by applying another uppercase,
> lowercase, or titlecase operation.
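
For readers who want to see the quoted complications concretely, here is a minimal sketch. It is an illustration only, not anything from the PostgreSQL source: it assumes Python 3 and uses its built-in str methods, which apply the locale-independent Unicode default case mappings. The variable names are made up for this example.

```python
# Illustration only: Python 3 built-in str methods, which follow the
# locale-independent Unicode default case mappings. Names are hypothetical.

# 1. Titlecase is a third case: U+01F1 (DZ) titlecases to U+01F2 (Dz).
dz_title = "\u01F1".title()

# 2. Case mapping can change the length of a string.
sharp_s_upper = "\u00DF".upper()        # sharp s becomes two letters
n_apostrophe_upper = "\u0149".upper()   # no precomposed uppercase form exists

# 3. Case mapping can depend on context: a capital sigma at the end of a
#    word lowercases to final sigma (U+03C2), not to U+03C3.
word_lower = "\u039F\u0394\u039F\u03A3".lower()

# 4. Case mapping can depend on locale. The default mapping knows nothing
#    about Turkish, so "I" lowercases to dotted "i", not to U+0131.
i_lower = "I".lower()

# 5. Case mapping is not reversible: the original capitalization is lost.
round_trip = "McGowan".upper().lower()

print(repr(dz_title), repr(sharp_s_upper), len(n_apostrophe_upper))
print(repr(word_lower), repr(i_lower), repr(round_trip))
```

Points 2 and 3 are the ones a single one-to-one translation table cannot express: the output can be longer than the input, and the right answer for a character depends on its neighbours.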

Yikes.  So what do people do about this?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

