Re: Built-in CTYPE provider

Поиск
Список
Период
Сортировка
От Peter Eisentraut
Тема Re: Built-in CTYPE provider
Дата
Msg-id 19b34a70-f5cf-4faf-88dc-917db44ed48d@eisentraut.org
обсуждение исходный текст
Ответ на Re: Built-in CTYPE provider  (Jeff Davis <pgsql@j-davis.com>)
Ответы Re: Built-in CTYPE provider  (Jeff Davis <pgsql@j-davis.com>)
Список pgsql-hackers
On 08.03.24 02:00, Jeff Davis wrote:
> And here's v22 (I didn't post v21).
> 
> I committed Unicode property tables and functions, and the simple case
> mapping. I separated out the full case mapping changes (based on
> SpecialCasing.txt) into patch 0006.

> 0002: Basic builtin collation provider that only supports "C".

Overall, this patch looks sound.

In the documentation, let's make the list of locale providers an actual 
list instead of a sequence of <sect3>s.

We had some discussion on initdb option --builtin-locale and whether it 
should be something more general.  I'm ok with leaving it like this for 
now and maybe consider as an "open item" for PG17.

In

     errmsg("parameter \"locale\" must be specified")

make "locale" a placeholder.  (See commit 36a14afc076).

It seems the builtin provider accepts both "C" and "POSIX" as locale
names, but the documentation says it must be "C".  Maybe we don't need
to accept "POSIX"?  (Seeing that there are no plans for "POSIX.UTF-8",
maybe we just ignore the "POSIX" spelling altogether?)

Speaking of which, the code in postinit.c is inconsistent in that 
respect with builtin_validate_locale().  Shouldn't postinit.c use
builtin_validate_locale(), to keep it consistent?

Or, there could be a general function that accepts a locale provider and 
a locale string and validates everything together?

In initdb.c, this message

printf(_("The database cluster will be initialized with no locale.\n"));

sounds a bit confusing.  I think it's ok to show "C" as a locale.  I'm
not sure we need to change the logic here.

Also in initdb.c, this message

pg_fatal("locale must be specified unless provider is libc");

should be flipped around, like

locale must be specified if provider is %s

In pg_dump.c, dumpDatabase(), there are some new warning messages that
are not specifically about the builtin provider.  Are those existing
deficiencies?  It's not clear to me.

What are the changes in the pg_upgrade test about?  Maybe explain the
scenario it is trying to test briefly?


> 0004: Inline some UTF-8 functions to improve performance

Makes sense that inlining can be effective here.  But why aren't you 
just inlining the existing function pg_utf_mblen()?  Now we have two 
functions that do the same thing.  And the comment at pg_utf_mblen() is 
removed completely, so it's not clear anymore why it exists.


> 0005: Add a unicode_strtitle() function and move the implementation for
> the builtin provider out of formatting.c.

In the recent discussion you had expression some uncertainty about the 
detailed semantics of this.  INITCAP() was copied from Oracle, so we 
could check there for reference, too.  Or we go with full Unicode 
semantics.  I'm not clear on all the differences and tradeoffs, if there 
are any.  In any case, it would be good if there were documentation or a 
comment that somehow wrote down the resolution of this.




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alexander Lakhin
Дата:
Сообщение: Re: DROP DATABASE is interruptible
Следующее
От: "Andrey M. Borodin"
Дата:
Сообщение: Re: Transaction timeout