Re: client side syntax error localisation for psql (v1)

From Tatsuo Ishii
Subject Re: client side syntax error localisation for psql (v1)
Date
Msg-id 20040312.224542.78705812.t-ishii@sra.co.jp
In reply to Re: client side syntax error localisation for psql (v1)  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: client side syntax error localisation for psql (v1)  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
> > PQmblen returns the storage size, which is not necessarily the same as
> > the character width represented in a terminal. For example, for a kanji
> > character in UTF-8 PQmblen returns 3, but it occupies 2 x ASCII
> > character space, not x 3. Isn't that a problem for you?
> 
> If I read you correctly, you mean that 1 character may take 3 bytes
> of storage in the string, but it is not guaranteed to be 1 character
> from the terminal perspective... Argh, that's definitely an issue:-(
> I assumed that one character whatever the encoding would be 1 character
> on the display.

That's not correct...

One thing I have to note is that some Asian characters, such as
Japanese and Chinese ones, require twice the space on a terminal for
each character compared with plain ASCII characters. This is hard to
explain to those who are not familiar with kanji... Could you take a
look at the included screenshot?  As you can see, there are four ASCII
characters in the first line. On the second line there are *two* kanji
characters, and they occupy the same space as the four ASCII
characters above. Moreover, the storage size for the first line is 4,
but the storage size for the second line varies depending on the
encoding. If the encoding is EUC_JP or SJIS, it takes 4 bytes; however,
it takes 6 bytes if the encoding is UTF-8. Got it?
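
The byte counts above can be checked with a minimal Python sketch. It is
only an illustration, not part of libpq: the helper name `storage_sizes`
and the sample string "日本" (two kanji) are assumptions chosen for the
example, and Python's codec names `euc_jp` and `shift_jis` stand in for
PostgreSQL's EUC_JP and SJIS.

```python
def storage_sizes(text):
    """Return the number of bytes `text` needs in each encoding.

    Hypothetical helper for illustration; the encoding names are
    Python codec names, not PostgreSQL encoding identifiers.
    """
    return {enc: len(text.encode(enc)) for enc in ("euc_jp", "shift_jis", "utf-8")}


if __name__ == "__main__":
    # Four ASCII characters: one byte each in all three encodings.
    print(storage_sizes("abcd"))
    # Two kanji: two bytes each in EUC_JP and SJIS, three bytes each in UTF-8.
    print(storage_sizes("日本"))
```

Both strings fill four terminal columns, yet only the ASCII one has a
storage size that matches its display width in every encoding.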

> If it is not the case, I think I can put/compute this information in the
> translation structures that are used by PQmblen, and implement a
> PQmbtermlen function...
> 
> Maybe you could point me to some source of information about display
> lengths of characters depending on the encoding?

I could write a "PQmbtermlen" function for every encoding supported by
PostgreSQL except UTF-8. That kind of information for UTF-8 might be
quite complex. I believe there are some mapping tables or functions for
it somewhere on the Internet, but I don't remember where.
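
As a rough idea of what a display-width lookup could look like for
Unicode text, here is a sketch in Python based on the standard library's
`unicodedata.east_asian_width`. It is not the proposed "PQmbtermlen" and
not a libpq API: `term_width` and `caret_column` are hypothetical names,
and the rule used (wide and fullwidth characters take two columns,
combining marks take none, everything else takes one) is a
simplification that ignores control characters and ambiguous-width
characters.

```python
import unicodedata


def term_width(ch):
    """Approximate number of terminal columns taken by one character.

    Assumption: East Asian Wide ("W") and Fullwidth ("F") characters take
    two columns, combining marks take zero, everything else takes one.
    """
    if unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def caret_column(query, char_index):
    """Terminal column at which the character at `char_index` starts.

    `char_index` counts characters, not bytes, so the result does not
    depend on how the query is encoded.
    """
    return sum(term_width(ch) for ch in query[:char_index])


if __name__ == "__main__":
    query = "日本 x"
    # Print the query and a caret under the character at index 3.
    print(query)
    print(" " * caret_column(query, 3) + "^")
```

Working on characters rather than bytes keeps the column computation
separate from the storage size that PQmblen reports.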

> > I think you can do it safely using PQmblen.
> 
> Ok, what you describe is basically what I've done with the qidx
> computation as suggested by Tom Lane and then later I check that the
> encoded length is one to find my special characters.

Oh, ok.

> Thanks for your reply,

You are welcome!
--
Tatsuo Ishii

