Re: Windows buildfarm animals are still not happy with abbreviated keys patch

Поиск

Список

Период

Сортировка

От	Robert Haas
Тема	Re: Windows buildfarm animals are still not happy with abbreviated keys patch
Дата	23 января 2015 г. 17:34:21
Msg-id	CA+TgmoZmQk3XDfPEiTOPQwiJLvECmMxOL9tPeSJ2WJZ7O-ujxw@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Windows buildfarm animals are still not happy with abbreviated keys patch (Peter Geoghegan <pg@heroku.com>)
Ответы	Re: Windows buildfarm animals are still not happy with abbreviated keys patch
Список	pgsql-hackers

Дерево обсуждения

On Thu, Jan 22, 2015 at 5:51 PM, Peter Geoghegan <pg@heroku.com> wrote:
> That having been said, it's clearer to continue to handle each case (C
> locale vs other locales) separately within the new
> bttext_abbrev_convert() function, just to be consistent, but also to
> avoid NUL-terminating the text strings to pass to strxfrm(), which as
> you point out is an avoidable cost. So, I'm not asking you to restore
> the terser uniform use of strxfrm() we had before, but, for what it's
> worth, that was not the real issue. The real issue was that strxfrm()
> spuriously used the wrong locale, as you said. Provided strxfrm() had
> the correct locale (the C locale), then AFAICT it ought to have been
> fine, regardless of whether or not it then behave exactly equivalently
> to memcpy().

I don't agree.  On a system where HAVE_LOCALE_T is not defined, there
is *no way* to get strxfrm() to behave like memcpy(), because we're
not passing a locale to it.  Clearly, strxfrm() is going to do a
locale-aware transformation in that case whether the user wrote
collate "C" or not.  The comments in regc_pg_locale.c explain:
* 1. In C/POSIX collations, we use hard-wired code.  We can't depend on* the <ctype.h> functions since those will obey
LC_CTYPE. Note that these* collations don't give a fig about multibyte characters.** 2. In the "default" collation
(whichis supposed to obey LC_CTYPE):** 2a. When working in UTF8 encoding, we use the <wctype.h> functions if*
available. This assumes that every platform uses Unicode codepoints* directly as the wchar_t representation of Unicode.
On some platforms* wchar_t is only 16 bits wide, so we have to punt for codepoints > 0xFFFF.** 2b. In all other
encodings,or on machines that lack <wctype.h>, we use* the <ctype.h> functions for pg_wchar values up to 255, and punt
forvalues* above that.  This is only 100% correct in single-byte encodings such as* LATINn.  However, non-Unicode
multibyteencodings are mostly Far Eastern* character sets for which the properties being tested here aren't very*
relevantfor higher code values anyway.  The difficulty with using the* <wctype.h> functions with non-Unicode multibyte
encodingsis that we can* have no certainty that the platform's wchar_t representation matches* what we do in pg_wchar
conversions.**3. Other collations are only supported on platforms that HAVE_LOCALE_T.* Here, we use the
locale_t-extendedforms of the <wctype.h> and <ctype.h>* functions, under exactly the same cases as #2.

In other words, even on systems that don't HAVE_LOCALE_T, we still
have to support the default collation and the C collation, and they
have to behave differently.  There's no way to make that work using
only strxfrm(), because nothing gets passed to that function to tell
it which of those two things it is supposed to do.

Now even if the above were not an issue, for example because we
dropped support for systems where !HAVE_LOCALE_T, I think it would be
poor form to depend on strxfrm_l() to behave like memcpy() where we
could just as easily call memcpy() and be *sure* that it was going to
do what we wanted.  Much of writing good code is figuring out what
could go wrong and then figuring out how to prevent it, and relying on
strxfrm_l() would be an unnecessary risk regardless.  Given the
existence of !HAVE_LOCALE_T systems, it's just plain broken.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Windows buildfarm animals are still not happy with abbreviated keys patch