Re: Unicode grapheme clusters

From Bruce Momjian
Subject Re: Unicode grapheme clusters
Date
Msg-id Y8wrKdVl/HpKDYrP@momjian.us
In reply to Re: Unicode grapheme clusters  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Unicode grapheme clusters  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sat, Jan 21, 2023 at 12:37:30PM -0500, Bruce Momjian wrote:
> Well, as one of the URLs I quoted said:
> 
>     This is by design. wcwidth() is utterly broken. Any terminal or
>     terminal application that uses it is also utterly broken. Forget
>     about emoji wcwidth() doesn't even work with combining characters,
>     zero width joiners, flags, and a whole bunch of other things.
> 
> So, either we have to find a function in the library that will do the
> looping over the string for us, or we need to identify the special
> Unicode characters that create grapheme clusters and handle them in our
> code.

I just checked whether wcswidth() would honor grapheme clusters, even
though wcwidth() does not, but it seems wcswidth() treats characters
just like wcwidth() does.  Its result is simply the sum of the
per-character widths (2+2+0+1+0+2 = 7):

    $ LANG=en_US.UTF-8 grapheme_test
    wcswidth len=7
    
    bytes_consumed=4, wcwidth len=2
    bytes_consumed=4, wcwidth len=2
    bytes_consumed=3, wcwidth len=0
    bytes_consumed=3, wcwidth len=1
    bytes_consumed=3, wcwidth len=0
    bytes_consumed=4, wcwidth len=2

C test program attached.  This is on Debian 11.
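
For the second option quoted above (identifying the cluster-forming
characters and handling them in our own code), here is a rough sketch of
what that could look like.  It is not the attached test program and not
a proposed patch.  The names cluster_display_width(), base_width() and
is_extender() are made up for illustration, base_width() is a crude
stand-in for a real width table, and the joining rules are a small
subset of what UAX #29 specifies (for example, anything following a
zero-width joiner is absorbed, not only pictographs).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZWJ 0x200D              /* zero-width joiner */

/* Decode one UTF-8 sequence; returns bytes consumed, 0 at end or on bad input. */
static size_t decode_utf8(const unsigned char *s, uint32_t *cp)
{
    size_t n;

    if (s[0] == 0)
        return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0)      { *cp = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; n = 4; }
    else
        return 0;
    for (size_t i = 1; i < n; i++)
    {
        if ((s[i] & 0xC0) != 0x80)  /* also stops at the terminating NUL */
            return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return n;
}

/* Code points that extend the preceding cluster (deliberately incomplete). */
static int is_extender(uint32_t cp)
{
    return (cp >= 0x0300 && cp <= 0x036F)       /* combining diacritics */
        || (cp >= 0xFE00 && cp <= 0xFE0F)       /* variation selectors */
        || (cp >= 0x1F3FB && cp <= 0x1F3FF);    /* emoji skin-tone modifiers */
}

/* Regional indicators; two in a row form one flag. */
static int is_regional_indicator(uint32_t cp)
{
    return cp >= 0x1F1E6 && cp <= 0x1F1FF;
}

/* Hypothetical stand-in for a real per-code-point width table. */
static int base_width(uint32_t cp)
{
    if (cp == ZWJ || is_extender(cp))
        return 0;
    return (cp >= 0x1F000 && cp <= 0x1FAFF) ? 2 : 1;
}

/* Display width of a UTF-8 string, counting each cluster once. */
int cluster_display_width(const char *str)
{
    const unsigned char *s = (const unsigned char *) str;
    int         width = 0;
    int         in_cluster = 0; /* a cluster has been started */
    int         joined = 0;     /* previous code point was a ZWJ */
    int         ri_open = 0;    /* cluster is one unpaired regional indicator */
    uint32_t    cp;
    size_t      n;

    assert(str != NULL);
    while ((n = decode_utf8(s, &cp)) > 0)
    {
        s += n;
        if (in_cluster)
        {
            if (cp == ZWJ) { joined = 1; ri_open = 0; continue; }
            if (is_extender(cp)) { joined = 0; ri_open = 0; continue; }
            if (joined) { joined = 0; continue; }   /* absorbed after ZWJ */
            if (ri_open && is_regional_indicator(cp))
            {
                ri_open = 0;    /* second half of a flag */
                continue;
            }
        }
        /* cp starts a new cluster; only its own width is counted */
        width += base_width(cp);
        in_cluster = 1;
        ri_open = is_regional_indicator(cp);
    }
    return width;
}
```

With a string of the same shape as the output above (emoji, skin-tone
modifier, ZWJ, U+2640, variation selector, then a second emoji; that the
test used such a string is my guess), this returns 4 where the
per-character sum is 7.  Whether 4 is what a given terminal actually
draws is a separate question.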

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.
