Re: client side syntax error localisation for psql (v1)

From: Tatsuo Ishii
Subject: Re: client side syntax error localisation for psql (v1)
Date:
Msg-id: 20040312.195530.74755421.t-ishii@sra.co.jp
In reply to: Re: client side syntax error localisation for psql (v1)  (Fabien COELHO <coelho@cri.ensmp.fr>)
Replies: Re: client side syntax error localisation for psql (v1)  (Tatsuo Ishii <t-ishii@sra.co.jp>)
Re: client side syntax error localisation for psql (v1)  (Fabien COELHO <coelho@cri.ensmp.fr>)
List: pgsql-hackers
> Dear Tatsuo,
> 
> Thanks for your reply; as I noticed from the source code, your name
> often appears where multi-byte issues are dealt with ;-)
> 
> On Fri, 12 Mar 2004, Tatsuo Ishii wrote:
> > As far as I understand your code, it will break on many multibyte
> > encodings.
> >
> > 1) a character is not always represented on a terminal proportionally
> >    to its storage size. For example, a kanji character in UTF-8
> >    encoding has a storage size of 3 bytes, while on a terminal it
> >    occupies only twice the space of an ASCII character. The same can
> >    perhaps be said of LATIN 2, 3, etc. in UTF-8.
> 
> I thought I dealt with that in the code by calling PQmblen for every char.
> Am I wrong?

PQmblen returns the storage size, which is not necessarily the same as
the width of the character as displayed on a terminal. For example, for
a kanji character in UTF-8 PQmblen returns 3, but the character occupies
the space of 2 ASCII characters, not 3. Isn't that a problem for you?

> > 2) It assumes all encodings are "ASCII compatible". Apparently some
> >    client-side-only encodings do not satisfy this requirement. Examples
> >    include SJIS and Big5.
> 
> What I mean by "ASCII compatible" is that spaces, newlines, carriage
> returns, tabs and NUL (the C string terminator) are one-byte characters.
> This assumption seemed pretty safe to me.
> 
> If this is not the case, I cannot understand how any error message could
> work in psql. If one calls printf(" "), would that not print a space
> character? Or is the terminal doing some "on the fly" translation?? What
> if a file is read with such an encoding??? Or is there a special
> compilation option to generate special strings? But in that case the
> executable would not be compatible with any other terminal????
> 
> Well, I am just underlining my lack of knowledge here :-(
> 
> If not, how can I detect the special characters that I need to change?
> Maybe I could convert the string to a pg_wchar[] array, if such a
> function is available to psql?

I think you can do it safely using PQmblen.

1) start from the beginning of the target string

2) apply PQmblen to the character at the current position

3) if it returns 1, the character is a single byte and you can do the special character detection

4) otherwise it is not an ASCII character, and you can skip as many bytes as PQmblen returns

5) go back to 2) with the next character while any characters remain
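The steps above can be sketched in C as follows. To keep the example self-contained it does not link against libpq: `sjis_mblen` is a hypothetical stand-in for `PQmblen(s, encoding)` with a Shift_JIS client encoding, and its lead-byte ranges are a simplification. `find_ascii_char` is likewise an illustrative name, not a psql function.

```c
#include <stddef.h>

/* Hypothetical stand-in for PQmblen(s, encoding) with a Shift_JIS client
 * encoding (simplified): 0x81-0x9F and 0xE0-0xFC are lead bytes of a
 * two-byte character; anything else (ASCII, half-width katakana
 * 0xA1-0xDF) is a single byte. */
static int sjis_mblen(const unsigned char *s)
{
    if ((*s >= 0x81 && *s <= 0x9F) || (*s >= 0xE0 && *s <= 0xFC))
        return 2;
    return 1;
}

/* Return the byte offset of the first occurrence of the ASCII character c
 * in str, or -1 if there is none. Only one-byte characters are compared
 * (step 3); multibyte characters are skipped whole (step 4), so a trail
 * byte that happens to equal c is never matched. */
static long find_ascii_char(const char *str, char c)
{
    const unsigned char *s = (const unsigned char *) str;
    size_t pos = 0;                     /* step 1: start at the beginning */

    while (s[pos] != '\0')              /* step 5: loop while bytes remain */
    {
        int len = sjis_mblen(s + pos);  /* step 2 */
        int i;

        if (len == 1 && s[pos] == (unsigned char) c)
            return (long) pos;

        /* do not step past the terminator of a truncated character */
        for (i = 1; i < len; i++)
            if (s[pos + i] == '\0')
                return -1;

        pos += len;
    }
    return -1;
}
```

In Shift_JIS the kanji 表 is encoded as 0x95 0x5C, and 0x5C is also the ASCII backslash. For the bytes `"\x95\x5C" "abc\\"`, a plain `strchr(s, '\\')` would report offset 1, inside the kanji, while the walk above skips the two-byte character and reports offset 5. Trail bytes that overlap the ASCII range are what make encodings such as SJIS and Big5 unsafe for byte-by-byte scanning.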

> Also, as a quick and dirty temporary fix, I could skip statement
> extraction for those encodings that do not meet my expectations. So I
> would need to know which encodings are at risk with the current scheme.
> 
> -- 
> Fabien Coelho - coelho@cri.ensmp.fr

