Re: PostgreSQL implicitly double-quoting identifier name with umlaut

From Tom Lane
Subject Re: PostgreSQL implicitly double-quoting identifier name with umlaut
Msg-id 156004.1726269616@sss.pgh.pa.us
In response to PostgreSQL implicitly double-quoting identifier name with umlaut  (Michael Downey <mdowney@esri.com>)
Responses RE: PostgreSQL implicitly double-quoting identifier name with umlaut
List pgsql-sql
Michael Downey <mdowney@esri.com> writes:
> One of our internal users, using our tools, added a column called Örtschaft.
> We anticipated it would be folded to lowercase. So we inserted our metadata
> for the column in our metadata with the name örtschaft. With the system query
> for metadata, we ended up seeing query mismatches involving this column as we
> found the actual column name is Örtschaft in the database.

When working in UTF8 (or any multibyte encoding), PG's identifier
case-folding changes only ASCII letters.  I can't find anything in
our SGML docs about this, at least not where I'd expect it to be
documented.  The code is pretty clear about what it's doing though:

    /*
     * SQL99 specifies Unicode-aware case normalization, which we don't yet
     * have the infrastructure for.  Instead we use tolower() to provide a
     * locale-aware translation.  However, there are some locales where this
     * is not right either (eg, Turkish may do strange things with 'i' and
     * 'I').  Our current compromise is to use tolower() for characters with
     * the high bit set, as long as they aren't part of a multi-byte
     * character, and use an ASCII-only downcasing for 7-bit characters.
     */

These days the claim that no infrastructure is available is obsolete.
But I'm mighty hesitant to touch this behavior, because it'd almost
surely break people's apps.  We could do better on the documentation
front though.
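
For illustration only, here is a minimal standalone sketch of the compromise
that comment describes.  It is not the server's actual code: the function name
fold_identifier_sketch and the enc_is_single_byte parameter are hypothetical
stand-ins (the real server determines the encoding width itself), and the
high-bit branch depends on whatever C locale is in effect.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, loosely modeled on the quoted comment.
 *
 * ident              NUL-terminated identifier bytes in the database encoding
 * enc_is_single_byte assumed flag: true if every character is one byte
 *
 * Returns a malloc'd folded copy (caller frees), or NULL if malloc fails.
 */
char *
fold_identifier_sketch(const char *ident, bool enc_is_single_byte)
{
    size_t      len;
    char       *result;

    assert(ident != NULL);
    len = strlen(ident);
    result = malloc(len + 1);
    if (result == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++)
    {
        unsigned char ch = (unsigned char) ident[i];

        if (ch >= 'A' && ch <= 'Z')
        {
            /* 7-bit letters: plain ASCII downcasing, independent of locale */
            ch += 'a' - 'A';
        }
        else if (enc_is_single_byte && (ch & 0x80) && isupper(ch))
        {
            /* high-bit bytes: tolower(), but only in single-byte encodings */
            ch = (unsigned char) tolower(ch);
        }
        /* otherwise (e.g. any byte of a multibyte character): left as-is */

        result[i] = (char) ch;
    }
    result[len] = '\0';
    return result;
}
```

Under those assumptions, passing the UTF-8 bytes of ÖRTSCHAFT with
enc_is_single_byte set to false folds only the ASCII letters; the two bytes
encoding Ö pass through untouched, which is consistent with the column
staying Örtschaft in the report above.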

            regards, tom lane


