Re: Unicode grapheme clusters

From: Bruce Momjian
Subject: Re: Unicode grapheme clusters
Msg-id: Y9AvgA1+93WXp9gN@momjian.us
In reply to: Re: Unicode grapheme clusters  (Greg Stark <stark@mit.edu>)
List: pgsql-hackers

On Tue, Jan 24, 2023 at 11:40:01AM -0500, Greg Stark wrote:
> On Sat, 21 Jan 2023 at 13:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Probably our long-term answer is to avoid depending on wcwidth
> > and use wcswidth instead.  But it's hard to get excited about
> > doing the legwork for that until popular libc implementations
> > get it right.
> 
> Here's an interesting blog post about trying to do this in Rust:
> 
> https://tomdebruijn.com/posts/rust-string-length-width-calculations/
> 
> TL;DR... Even counting the number of graphemes isn't enough because
> terminals typically (but not always) display emoji graphemes using two
> columns.
> 
> At the end of the day Unicode kind of assumes a variable-width display
> where the rendering is handled by something that has access to the
> actual font metrics. So anything trying to line things up in columns
> in a way that works with any rendering system down the line using any
> font is going to be making a best guess.

Yes, good article, though I am still surprised this is not discussed
more often.  Anyway, for psql, we assume a fixed-width output device, so
we can simply rely on that assumption when computing display widths.
You are right that Unicode just doesn't seem to consider fixed-width
output cases and doesn't provide much guidance.
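
As a rough illustration of what a fixed-width computation looks like, here
is a minimal Python sketch.  It is not psql's code; approx_display_width is
a hypothetical helper, and the rule it applies (zero columns for combining
marks and format characters, two for East Asian wide or fullwidth
characters, one otherwise) is a simplifying assumption.  Because it sums
per code point, it shares the limitation discussed upthread for
wcwidth-style approaches.

```python
import unicodedata


def approx_display_width(s: str) -> int:
    """Estimate the terminal column width of s, one code point at a time.

    Hypothetical helper for illustration only.  It knows nothing about
    grapheme clusters, so a multi-code-point emoji sequence is counted as
    the sum of its parts.
    """
    width = 0
    for ch in s:
        # Nonspacing marks, enclosing marks and format characters
        # (the latter include ZERO WIDTH JOINER) take no column.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # East Asian Wide and Fullwidth characters take two columns.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


if __name__ == "__main__":
    for sample in ("abc", "e\u0301", "\u65e5\u672c"):
        print(repr(sample), approx_display_width(sample))
```

An emoji joined by ZERO WIDTH JOINER characters would be summed per
component here, which is the case where a per-code-point width and what a
terminal draws can disagree.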

Beyond psql, should we update our docs to say that character_length()
for Unicode returns the number of Unicode code points, and not
necessarily the number of displayed characters if grapheme clusters are
present?
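
To make the distinction concrete, here is a small Python sketch, not
PostgreSQL code.  Python's len() counts Unicode code points, the same unit
described above for character_length().  approx_cluster_count is a
hypothetical helper that only approximates user-perceived characters by
folding combining marks and ZERO WIDTH JOINER sequences into the preceding
character; it is not a full implementation of the Unicode text
segmentation rules.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def approx_cluster_count(s: str) -> int:
    """Roughly count user-perceived characters in s.

    Hypothetical helper for illustration: a heuristic, not the full
    Unicode grapheme cluster algorithm.
    """
    count = 0
    joined = False  # True while the next base character follows a ZWJ
    for ch in s:
        if ch == ZWJ:
            joined = True
            continue
        # Nonspacing and enclosing marks attach to the previous character.
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue
        if joined:
            # This character is glued to the previous one by the ZWJ.
            joined = False
            continue
        count += 1
    return count


if __name__ == "__main__":
    accented = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
    family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"
    for sample in ("abc", accented, family):
        print(len(sample), approx_cluster_count(sample))
```

For the accented "e", len() reports 2 code points while the heuristic
reports 1; for the four-person family emoji, len() reports 7 code points
(four emoji plus three joiners) while the heuristic reports 1.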

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.


