Re: Unicode grapheme clusters

From: Bruce Momjian
Subject: Re: Unicode grapheme clusters
Msg-id: Y8nktYFVf21NmmU+@momjian.us
In response to: Re: Unicode grapheme clusters  (Greg Stark <stark@mit.edu>)
Responses: Re: Unicode grapheme clusters  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Thu, Jan 19, 2023 at 07:37:48PM -0500, Greg Stark wrote:
> This is how we've always documented it. Postgres treats code points as
> "characters" not graphemes.
> 
> You don't need to go to anything as esoteric as emojis to see this either.
> Accented characters like é have no canonical forms that are multiple code
> points and in some character sets some accented characters can only be
> represented that way.
> 
> But I don't think there's any reason to consider changing the existing functions.
> They have to be consistent with substr and the other string manipulation
> functions.
> 
> We could add new functions to work with graphemes but it might bring more pain
> keeping it up to date....

I am not sure what you are referring to above?  character_length?  I was
talking about display length, and psql uses that --- at some point, our
lack of support for graphemes will cause psql to not align columns.
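
To make the distinction concrete, here is a minimal sketch in Python (not psql's code, and not how PostgreSQL computes anything) contrasting a code point count with a rough display width. The helper names `code_point_length` and `approx_display_width` are hypothetical, and the width rule (combining marks count as zero, East Asian wide/fullwidth as two) is a simplifying assumption that does not handle emoji ZWJ sequences or full grapheme cluster segmentation.

```python
import unicodedata


def code_point_length(s: str) -> int:
    # Python's len() counts code points, which mirrors the
    # "code points as characters" behavior described in the quoted text.
    return len(s)


def approx_display_width(s: str) -> int:
    # Rough approximation only: combining marks occupy no column,
    # East Asian wide/fullwidth characters occupy two, everything else one.
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


composed = unicodedata.normalize("NFC", "e\u0301")   # single code point U+00E9
decomposed = unicodedata.normalize("NFD", "\u00e9")  # 'e' followed by U+0301

print(code_point_length(composed), code_point_length(decomposed))
print(approx_display_width(composed), approx_display_width(decomposed))
```

Both strings render as the same accented letter in one column, yet their code point counts differ, which is the kind of mismatch that can throw off column alignment when width is derived from code points alone.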

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.


