Re: UTF8 with BOM support in psql

From: Chuck McDevitt
Subject: Re: UTF8 with BOM support in psql
Msg-id 2106D8DC89010842BABA5CD03FEA4061012E8BE3B9@EXVMBX018-10.exch018.msoutlookonline.net
In reply to: Re: UTF8 with BOM support in psql  (Peter Eisentraut <peter_e@gmx.net>)
Responses: Re: UTF8 with BOM support in psql
List: pgsql-hackers
>
> I don't know what the best solution is here.  The BOM encoded as UTF-8
> is valid data in other encodings.  Of course, there is your point that
> such data cannot be at the start of an SQL command.
>

Is the UTF-8 BOM (EF BB BF) actually valid data in any other multi-byte encoding (other than its intended use in
UTF-8)?
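
One rough way to explore that question is to try decoding the three bytes under a handful of codecs. The sketch below uses Python's built-in codecs as a stand-in for the server's own encoding checks, which is an assumption: the two need not agree. The helper name `probe` and the list of encodings are illustrative only.

```python
# Probe whether the bytes EF BB BF decode cleanly under a few codecs.
# Assumption: Python's codecs approximate, but are not, PostgreSQL's
# encoding validation.
UTF8_BOM = b"\xef\xbb\xbf"

# Illustrative selection of encodings, not an exhaustive list.
CANDIDATES = ["latin-1", "euc_jp", "shift_jis", "gbk", "big5", "euc_kr"]


def probe(data: bytes, encodings=CANDIDATES) -> dict:
    """Map each encoding name to the decoded text, or None if decoding fails."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


if __name__ == "__main__":
    for name, text in probe(UTF8_BOM).items():
        print(f"{name:10} {'rejected' if text is None else repr(text)}")
```

A rejection of the bare three bytes does not settle the question by itself: in a multi-byte encoding the last byte may be a lead byte that only becomes valid once more data follows, so it is worth probing with a few trailing bytes appended as well.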

I realize that for a single-byte encoding, such as Latin-1, it would be legal as data, since any byte other than 00 is
legal, although it is never legal outside a quoted string in an SQL command or psql command.

Certainly, no psql command input file can start with those bytes, or you would get an error (unless psql is changed so
that the BOM is ignored).
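
Ignoring the BOM only at the very start of the input could look roughly like the following. This is a minimal sketch in Python rather than psql's actual C code, and `strip_utf8_bom` is a hypothetical helper name, not an existing psql function.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; leave everything else untouched."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Only the first three bytes are examined, so an EF BB BF sequence later in the file (for example inside a quoted string) is passed through unchanged.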

As to zero-width non-breaking space: U+FEFF is supposed to be treated as such if it appears in the middle of a string, but
at the start it is just the BOM and isn't considered part of the data, if I'm reading the spec right. Perhaps the
lexers should allow for it as white space (along with related Unicode characters, such as U+2060 WORD JOINER).
It's not really important, since allowing the BOM sequence in the middle of a file is "deprecated" according to the
Unicode standard.

And what if you see a real byte-order BOM, that is, one for UTF-16 or UTF-32: FF FE, FE FF, FF FE 00 00, or 00 00 FE FF?
Give an error saying UTF-16 and UTF-32 are not supported?

Or is there a plan to read and convert the UTF-16 or UTF-32 to UTF-8, so psql and PostgreSQL understand it?
(BTW, that would actually be nice on Windows, where UTF-16 is common).
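
For illustration, the conversion itself is small when a codec library is available. The sketch below uses Python's standard "utf-16" codec, which reads the BOM to choose the byte order and drops it from the result; `utf16_to_utf8` is a hypothetical name, and nothing here reflects how psql would actually implement this.

```python
def utf16_to_utf8(data: bytes) -> bytes:
    """Transcode BOM-prefixed UTF-16 input to UTF-8 without a BOM."""
    # The "utf-16" codec consumes a leading FF FE or FE FF and uses it
    # to select little- or big-endian decoding.
    text = data.decode("utf-16")
    return text.encode("utf-8")
```

Input without a BOM is decoded in the platform's native byte order by this codec, so a real implementation would probably want to insist on the BOM being present.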

If we accept UTF-8 BOM, we should at least detect the other BOM sequences and give an error or warning.
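
A sketch of that detection, under the assumption that the UTF-8 BOM is skipped and any UTF-16 or UTF-32 BOM is rejected with an error. The names `detect_bom` and `check_script_start` are made up for this example, and the error text is only a placeholder.

```python
import codecs

# Longest signatures first: the UTF-32 LE BOM (FF FE 00 00) begins with
# the UTF-16 LE BOM (FF FE), so testing UTF-16 first would misreport it.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def detect_bom(data: bytes):
    """Return (encoding name, BOM length) for a leading BOM, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None


def check_script_start(data: bytes) -> bytes:
    """Skip a UTF-8 BOM; raise ValueError on a UTF-16 or UTF-32 BOM."""
    found = detect_bom(data)
    if found is None:
        return data
    name, length = found
    if name == "UTF-8":
        return data[length:]
    raise ValueError(f"input starts with a {name} byte order mark; {name} is not supported")
```

One ambiguity remains: FF FE 00 00 is also what a UTF-16 LE file looks like if its first character after the BOM is U+0000. That is unlikely for an SQL script, so the sketch treats it as UTF-32 LE.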

Overall, from my point of view as a user, having psql deal with the BOM (at least the UTF-8 one) would be friendlier than
the current behavior, as some editors (Notepad, for example) automatically add the BOM to the beginning of Unicode files, and
it's not obvious without dumping the file in hex.




