Re: Support UTF-8 files with BOM in COPY FROM

From Tatsuo Ishii
Subject Re: Support UTF-8 files with BOM in COPY FROM
Date
Msg-id 20110926.233350.224883171232526681.t-ishii@sraoss.co.jp
In reply to Support UTF-8 files with BOM in COPY FROM  (Itagaki Takahiro <itagaki.takahiro@gmail.com>)
List pgsql-hackers
> I'd like to support UTF-8 text or csv files that has BOM (byte order mark)
> in COPY FROM command. BOM will be automatically detected and ignored
> if the file encoding is UTF-8. WIP patch attached.

From RFC 3629 (http://tools.ietf.org/html/rfc3629#section-6):

   o  A protocol SHOULD forbid use of U+FEFF as a signature for those
      textual protocol elements that the protocol mandates to be always
      UTF-8, the signature function being totally useless in those cases.

COPY explicitly specifies the encoding (to be UTF-8 in this case).  So
I think we should not regard U+FEFF as "BOM" in COPY, rather we should
regard U+FEFF as "ZERO WIDTH NO-BREAK SPACE".
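
To make the two readings concrete, here is a minimal Python sketch. It is
not PostgreSQL code: the sample CSV line and the helper first_field are
made up for illustration, and Python's "utf-8-sig" codec merely stands in
for "treat a leading U+FEFF as a signature and skip it".

```python
import codecs

# Hypothetical first line of a CSV file whose editor prepended EF BB BF.
raw = codecs.BOM_UTF8 + b"1,foo\n"

# Plain UTF-8 decoding keeps U+FEFF as an ordinary character
# (ZERO WIDTH NO-BREAK SPACE) at the start of the data.
as_character = raw.decode("utf-8")

# The "utf-8-sig" codec treats a leading U+FEFF as a signature and drops it.
as_signature = raw.decode("utf-8-sig")


def first_field(line: str) -> str:
    """Return the first comma-separated field of a line (illustrative helper)."""
    return line.rstrip("\n").split(",")[0]


print(repr(first_field(as_character)))
print(repr(first_field(as_signature)))
```

Under the first reading the character stays part of the first field, so a
loader expecting an integer there would reject the value; under the second
reading the field is just "1".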
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

