Re: Re: [COMMITTERS] pgsql: Don't downcase non-ascii identifier chars in multi-byte encoding

From	Tom Lane
Subject	Re: Re: [COMMITTERS] pgsql: Don't downcase non-ascii identifier chars in multi-byte encoding
Date	9 June 2013, 18:40:08
Msg-id	19394.1370792358@sss.pgh.pa.us
In response to	Re: [COMMITTERS] pgsql: Don't downcase non-ascii identifier chars in multi-byte encoding (Andrew Dunstan <andrew@dunslane.net>)
Responses	Re: Re: [COMMITTERS] pgsql: Don't downcase non-ascii identifier chars in multi-byte encoding
List	pgsql-hackers

Andrew Dunstan <andrew@dunslane.net> writes:
> On 06/09/2013 12:38 AM, Noah Misch wrote:
>> PostgreSQL has lived with this wrong behavior since ... the beginning?  It's a
>> problem, certainly, but a bandage fix brings its own trouble.

I don't see this as particularly bandage-y.  It's a subset of the
spec-required folding behavior, sure, but at least now it's a proper
subset of that behavior.  It preserves all cases in which the previous
coding did the right thing, while removing some cases in which it
didn't.

> If you have a better fix I am all ears. I can recall at least one 
> discussion of this area (concerning Turkish I quite a few years ago) 
> where we failed to come up with anything.

Yeah, Turkish handling of i/I messes up any attempt to use the locale's
case-folding rules straightforwardly.  However, I think we've already
fixed that with the rule that ASCII characters are folded manually.
The resistance to moving this code to use towlower() for non-ASCII
mainly comes from worries about speed, I think; although there was also
something about downcasing conversions that change the string's byte
length being problematic for some callers.
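
For illustration, the folding rule described above (ASCII folded manually, non-ASCII bytes left alone in a multi-byte encoding) might be sketched as follows. This is a minimal sketch, not the actual PostgreSQL source: the function name fold_identifier and the enc_is_single_byte parameter are hypothetical and chosen only for this example.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the real PostgreSQL function): fold an
 * identifier to lower case in place.
 *
 * ASCII letters are folded by hand, so the result does not depend on the
 * locale (for example, 'I' always becomes 'i', even in a Turkish locale).
 * High-bit bytes are passed to tolower() only when the encoding is
 * single-byte; in a multi-byte encoding they are left untouched.
 */
static void
fold_identifier(char *s, size_t len, bool enc_is_single_byte)
{
    for (size_t i = 0; i < len; i++)
    {
        unsigned char ch = (unsigned char) s[i];

        if (ch >= 'A' && ch <= 'Z')
            ch = (unsigned char) (ch + ('a' - 'A'));
        else if (enc_is_single_byte && ch >= 0x80 && isupper(ch))
            ch = (unsigned char) tolower(ch);

        s[i] = (char) ch;
    }
}
```

With enc_is_single_byte set to false, a UTF-8 identifier such as "T\xC3\x84" has only its ASCII 'T' folded; the two bytes that encode the non-ASCII character pass through unchanged, so the string's byte length and encoding validity are preserved.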

> I have a fairly hard time believing in your "relies on this and somehow 
> works" scenario.

The key point for me is that if tolower() actually does anything in the
previous state of the code, it's more than likely going to produce
invalidly encoded data.  The consequences of that can't be good.
You can argue that there might be people out there for whom the
transformation accidentally produced a validly-encoded string, but how
likely is that really?  It seems much more likely that the only reason
we've not had more complaints is that on most popular platforms, the
code accidentally fails to fire on any UTF8 characters (or any common
ones, anyway).  On those platforms, there will be no change of behavior.
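
The risk of invalidly encoded output can be made concrete with a small sketch. It assumes a platform whose tolower() applies Latin-1 rules to high-bit bytes; that behavior is simulated here by a hypothetical latin1_tolower() so the example is deterministic. The UTF-8 check is structural only (lead byte plus continuation bytes) and is likewise written just for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tolower() that uses Latin-1 rules. */
static unsigned char
latin1_tolower(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        return (unsigned char) (ch + ('a' - 'A'));
    if (ch >= 0xC0 && ch <= 0xDE && ch != 0xD7)   /* 0xD7 is the multiplication sign */
        return (unsigned char) (ch + 0x20);
    return ch;
}

/* Downcase byte by byte, ignoring multi-byte character boundaries. */
static void
bytewise_downcase(unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        s[i] = latin1_tolower(s[i]);
}

/*
 * Simplified structural UTF-8 check: each lead byte must be followed by
 * the right number of continuation bytes.  Overlong forms and surrogates
 * are not checked.
 */
static bool
utf8_is_well_formed(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        size_t need;

        if (s[i] < 0x80)
            need = 0;
        else if ((s[i] & 0xE0) == 0xC0)
            need = 1;
        else if ((s[i] & 0xF0) == 0xE0)
            need = 2;
        else if ((s[i] & 0xF8) == 0xF0)
            need = 3;
        else
            return false;       /* stray continuation or invalid lead byte */

        i++;
        for (size_t k = 0; k < need; k++, i++)
        {
            if (i >= len || (s[i] & 0xC0) != 0x80)
                return false;   /* truncated or malformed sequence */
        }
    }
    return true;
}
```

Under these assumptions, the UTF-8 bytes C3 84 (one two-byte character) become E3 84 after bytewise_downcase(): 0xE3 announces a three-byte sequence, but only one continuation byte follows, so utf8_is_well_formed() rejects the result.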

        regards, tom lane
