Re: Order changes in PG16 since ICU introduction
From | Jeff Davis |
---|---|
Subject | Re: Order changes in PG16 since ICU introduction |
Date | |
Msg-id | 37520ec1ae9591f83132f82dbd625f3fc2d69c16.camel@j-davis.com |
In reply to | Re: Order changes in PG16 since ICU introduction (Jeff Davis <pgsql@j-davis.com>) |
Responses |
Re: Order changes in PG16 since ICU introduction
Re: Order changes in PG16 since ICU introduction |
List | pgsql-hackers |
On Fri, 2023-04-28 at 14:35 -0700, Jeff Davis wrote:
> On Thu, 2023-04-27 at 14:23 +0200, Daniel Verite wrote:
> > This should be pg_strcasecmp(...) == 0
>
> Good catch, thank you! Fixed in updated patches.

Rebased patches.

=== 0001: do not convert C to en-US-u-va-posix

I plan to commit this soon. If someone specifies "C", they are probably expecting memcmp()-like behavior, or some kind of error/warning that it can't be provided.

Removing this transformation means that if you specify iculocale=C, you'll get an error or warning (depending on icu_validation_level), because C is not a recognized icu locale. Depending on how some of the other issues in this thread are sorted out, we may want to relax the validation.

=== 0002: fix @euro, etc. in ICU >= 64

I'd like to commit this soon too, but I'll wait for someone to take a look. It makes it more reliable to map libc names to icu locale names regardless of the ICU version. It doesn't solve the problem for locales like "de__PHONEBOOK", but those don't seem to be a libc format (I think just an old ICU format), so I don't see a big reason to carry it forward. It might be another reason to turn down the validation level to WARNING, though.

=== 0003: support C memcmp() behavior with ICU provider

The current patch 0003 has a problem, because in previous postgres versions (going all the way back), we allowed "C" as a valid ICU locale, that would actually be passed to ICU as a locale name. But ICU didn't recognize it, and it would end up opening the root locale. So we can't simply redefine "C" to mean "memcmp", because that would potentially break indexes.

I see the following potential solutions:

1. Represent the memcmp behavior with iculocale=NULL, or some other catalog hack, so that we can distinguish between a locale "C" upgraded from a previous version (which should pass "C" to ICU and get the root locale), and a new collation defined with locale "C" (which should have memcmp behavior). The catalog representation for locale information is already complex, so I'm not excited about this option, but it will work.

2. When provider=icu and locale=C, magically transform that into provider=libc to get memcmp-like behavior for new collations but preserve the existing behavior for upgraded collations. Not especially clean, but if we issue a NOTICE perhaps that would avoid confusion.

3. Like #2, except create a new provider type "none" which may be slightly less confusing.

=== 0004: make LOCALE apply to ICU for CREATE DATABASE

To understand this patch it helps to understand the confusing situation with CREATE DATABASE in version 15:

The keywords LC_CTYPE and LC_COLLATE set the server environment LC_CTYPE/LC_COLLATE for that database and can be specified regardless of the provider.

LOCALE can be specified along with (or instead of) LC_CTYPE and LC_COLLATE, in which case whichever of LC_CTYPE or LC_COLLATE is unspecified defaults to the setting of LOCALE.

Iff the provider is libc, LC_CTYPE and LC_COLLATE also act as the database default collation's locale.

If the provider is icu, then none of LOCALE, LC_CTYPE, or LC_COLLATE affect the database default collation's locale at all; that's controlled by ICU_LOCALE (which may be omitted if the template's daticulocale is non-NULL).

The idea of patch 0004 is to address the last part, which is probably the most confusing aspect. But for that to work smoothly, we need something like 0003 so that LOCALE=C gives the same semantics regardless of the provider.

Regards,
	Jeff Davis
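
Editorial sketch: the version 15 CREATE DATABASE rules described under 0004 can be modelled roughly as below. This is not PostgreSQL source code. The struct and function names (CreateDbOpts, TemplateDb, Resolved, resolve_v15) are hypothetical, and the fallback to the template database's LC_COLLATE/LC_CTYPE when neither the keyword nor LOCALE is given is an assumption that the message does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the CREATE DATABASE options discussed above.
 * None of these names come from the PostgreSQL source; NULL means
 * "option not specified".
 */
typedef struct CreateDbOpts
{
    const char *provider;    /* "libc" or "icu" */
    const char *locale;      /* LOCALE */
    const char *lc_collate;  /* LC_COLLATE */
    const char *lc_ctype;    /* LC_CTYPE */
    const char *icu_locale;  /* ICU_LOCALE */
} CreateDbOpts;

typedef struct TemplateDb
{
    const char *lc_collate;  /* assumed fallback, not stated in the message */
    const char *lc_ctype;    /* assumed fallback, not stated in the message */
    const char *icu_locale;  /* the template's daticulocale, may be NULL */
} TemplateDb;

typedef struct Resolved
{
    const char *env_collate;               /* server environment LC_COLLATE */
    const char *env_ctype;                 /* server environment LC_CTYPE */
    const char *default_collation_locale;  /* NULL: no ICU locale available */
} Resolved;

/* Return the first argument that is not NULL (or NULL if all are). */
static const char *
first_set(const char *a, const char *b, const char *c)
{
    if (a != NULL)
        return a;
    if (b != NULL)
        return b;
    return c;
}

static Resolved
resolve_v15(const CreateDbOpts *o, const TemplateDb *t)
{
    Resolved r;

    assert(o != NULL && t != NULL && o->provider != NULL);

    /* LC_COLLATE/LC_CTYPE win; LOCALE fills in whichever is unspecified. */
    r.env_collate = first_set(o->lc_collate, o->locale, t->lc_collate);
    r.env_ctype = first_set(o->lc_ctype, o->locale, t->lc_ctype);

    if (strcmp(o->provider, "icu") == 0)
    {
        /* LOCALE, LC_COLLATE and LC_CTYPE are ignored for the collation. */
        r.default_collation_locale =
            (o->icu_locale != NULL) ? o->icu_locale : t->icu_locale;
    }
    else
    {
        /*
         * libc: the environment settings double as the collation locale.
         * Simplification: only LC_COLLATE is reported here.
         */
        r.default_collation_locale = r.env_collate;
    }
    return r;
}
```

In this model, provider icu with LOCALE "C" and a template whose daticulocale is "en-US" gives a server environment LC_COLLATE of "C" while the default collation still uses "en-US". That mismatch is the part patch 0004 aims to remove.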