Re: UTF8 with BOM support in psql

From: Itagaki Takahiro
Subject: Re: UTF8 with BOM support in psql
Msg-id 20091117141958.150B.52131E4D@oss.ntt.co.jp
In reply to: Re: UTF8 with BOM support in psql (Tom Lane <tgl@sss.pgh.pa.us>)
Replies: Re: UTF8 with BOM support in psql
List: pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> > If encoding setting is reverted,
> >> "Eat BOM at beginning of file and <<set client encoding to UTF-8>>"
> > will be much safer.
>
> This isn't going to happen, so please stop wasting our time arguing
> about it.

Ok, sorry. But I still cannot accept this restriction.
>> - Only when client encoding is UTF-8 --> please fix that

The attached patch is a new proposal for the feature.
When we find a BOM at the beginning of a file, we set "expected_encoding" to UTF8.
Before every query is executed, if pset.encoding is not UTF8, we check that the
query string contains no non-ASCII characters and throw an error if any are
found. Encoding declarations are typically written only in ASCII characters,
so we can postpone the encoding check until non-ASCII characters appear.
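A minimal C sketch of the two steps just described, written for illustration only; it is not the code from the attached patch, which isn't reproduced here. Only the name `expected_encoding` comes from the message; the `Encoding` enum, `eat_bom`, and `query_encoding_ok` are hypothetical stand-ins (real psql uses PostgreSQL's own encoding identifiers and `pset.encoding`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for PostgreSQL encoding ids. */
typedef enum { ENC_SQL_ASCII, ENC_UTF8, ENC_OTHER } Encoding;

/* Default is SQL_ASCII: pass every byte through without checking. */
static Encoding expected_encoding = ENC_SQL_ASCII;

/* If buf starts with the UTF-8 BOM (EF BB BF), record that the file is
 * expected to be UTF-8 and return a pointer just past the BOM;
 * otherwise return buf unchanged. */
static const char *eat_bom(const char *buf)
{
    static const char bom[] = "\xEF\xBB\xBF";

    if (strncmp(buf, bom, 3) == 0)
    {
        expected_encoding = ENC_UTF8;
        return buf + 3;
    }
    return buf;
}

/* Called before each query is sent. Returns false (an error, in psql
 * terms) when a BOM marked the file as UTF-8, the client encoding is
 * something else, and the query contains a non-ASCII byte. */
static bool query_encoding_ok(const char *query, Encoding client_encoding)
{
    assert(query != NULL);

    /* No BOM seen, or client already matches: nothing to check. */
    if (expected_encoding == ENC_SQL_ASCII ||
        client_encoding == expected_encoding)
        return true;

    for (const unsigned char *p = (const unsigned char *) query; *p; p++)
        if (*p >= 0x80)
            return false;   /* non-ASCII byte under a mismatched encoding */
    return true;
}
```

Checking only for bytes >= 0x80 is what lets pure-ASCII statements, such as an encoding declaration, run before the client encoding has been settled.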

Since the default value of expected_encoding is SQL_ASCII, which passes
through all characters, the patch does nothing to scripts without a BOM.
(Nothing other than a BOM sets expected_encoding.)
If the client encoding is UTF8, the BOM is skipped and the script body is unaffected.
BOMs are also skipped when the client encoding is not UTF8, but an error can be
thrown if the script has no explicit encoding declaration before its first
non-ASCII character.
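To make the cases above concrete, here is a hypothetical C simulation of running a script statement by statement; it is a sketch, not psql's actual control flow. `first_rejected` and the enum are invented names, and treating the exact text `SET client_encoding TO 'UTF8';` as the encoding declaration is a simplification (in real psql the client encoding is reported back by the server).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { ENC_SQL_ASCII, ENC_UTF8, ENC_LATIN1 } Encoding;

static bool has_non_ascii(const char *s)
{
    for (const unsigned char *p = (const unsigned char *) s; *p; p++)
        if (*p >= 0x80)
            return true;
    return false;
}

/* Returns the index of the first statement that would be rejected,
 * or -1 if the whole script runs. */
static int first_rejected(const char *const *stmts, size_t n,
                          bool file_has_bom, Encoding client)
{
    assert(stmts != NULL || n == 0);

    /* Only a BOM switches the expectation away from SQL_ASCII. */
    Encoding expected = file_has_bom ? ENC_UTF8 : ENC_SQL_ASCII;

    for (size_t i = 0; i < n; i++)
    {
        if (expected != ENC_SQL_ASCII && client != expected &&
            has_non_ascii(stmts[i]))
            return (int) i;

        /* Hypothetical simplification: this ASCII-only statement is
         * the script's explicit encoding declaration. */
        if (strcmp(stmts[i], "SET client_encoding TO 'UTF8';") == 0)
            client = ENC_UTF8;
    }
    return -1;
}
```

With a BOM and a LATIN1 client, a script that declares UTF8 before its first non-ASCII statement runs to completion, while the same non-ASCII statement without the declaration is rejected; without a BOM nothing is ever checked.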

AFAICS, the patch addresses most of the problems raised in the discussion,
in an incremental way. Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


