Re: Bug in UTF8-Validation Code?

From: Andrew Dunstan
Subject: Re: Bug in UTF8-Validation Code?
Msg-id: 45FC0D39.8090802@dunslane.net
In reply to: Re: Bug in UTF8-Validation Code?  (Jeff Davis <pgsql@j-davis.com>)
Responses: Re: Bug in UTF8-Validation Code?  (Tom Lane <tgl@sss.pgh.pa.us>)
           Re: Bug in UTF8-Validation Code?  (Martijn van Oosterhout <kleptog@svana.org>)
List: pgsql-hackers

Jeff Davis wrote:
> On Wed, 2007-03-14 at 01:29 -0600, Michael Fuhr wrote:
>   
>> On Tue, Mar 13, 2007 at 04:42:35PM +0100, Mario Weilguni wrote:
>>     
>>> On Tuesday, 13 March 2007 at 16:38, Joshua D. Drake wrote:
>>>       
>>>> Is this any different than the issues of moving 8.0.x to 8.1 UTF8? Where
>>>> we had to use iconv?
>>>>         
>>> What issues? I've upgraded several 8.0 databases to 8.1 without having to use
>>> iconv. Did I miss something?
>>>       
>> http://www.postgresql.org/docs/8.1/interactive/release-8-1.html
>>
>> "Some users are having problems loading UTF-8 data into 8.1.X.  This
>> is because previous versions allowed invalid UTF-8 byte sequences
>> to be entered into the database, and this release properly accepts
>> only valid UTF-8 sequences. One way to correct a dumpfile is to run
>> the command iconv -c -f UTF-8 -t UTF-8 -o cleanfile.sql dumpfile.sql."
>>
>>     
>
> If the above quote were actually true, then Mario wouldn't be having a
> problem. Instead, it's half-true: Invalid byte sequences are rejected in
> some situations and accepted in others. If postgresql consistently
> rejected or consistently accepted invalid byte sequences, that would not
> cause problems with COPY (meaning problems with pg_dump, slony, etc.).
>
>
>   

How can we fix this? Frankly, the statement in the docs warning about 
making sure that escaped sequences are valid in the server encoding is a 
cop-out. We don't accept invalid data elsewhere, and this should be no 
different IMNSHO. I don't see why this should be any different from, 
say, date or numeric data. For years people have sneered at MySQL 
because it accepted dates like Feb 31st, and rightly so. But this seems 
to me to be like our own version of the same problem.

Last year Jeff suggested adding something like:
   pg_verifymbstr(string,strlen(string),0);

to each relevant input routine. Would that be an acceptable solution? If 
not, what would be?
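
To make the idea concrete, here is a rough sketch of the kind of check such
a call would perform on incoming data. It is not the PostgreSQL
implementation of pg_verifymbstr: utf8_is_valid is a hypothetical name used
only for illustration, and it handles UTF-8 alone, whereas the real check
would have to work against whatever the server encoding is.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not PostgreSQL source code.
 * Returns 1 if the len bytes at s are well-formed UTF-8, 0 otherwise.
 * Rejects stray continuation bytes, truncated sequences, overlong
 * encodings, UTF-16 surrogates, and code points above U+10FFFF.
 */
int
utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        unsigned char c = s[i];
        size_t        need;   /* continuation bytes expected */
        unsigned long cp;     /* decoded code point */
        unsigned long min;    /* smallest code point legal at this length */
        size_t        k;

        if (c < 0x80)
        {
            i++;              /* plain ASCII */
            continue;
        }
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else
            return 0;         /* stray continuation or invalid lead byte */

        if (len - i - 1 < need)
            return 0;         /* sequence runs past the end of the input */

        for (k = 1; k <= need; k++)
        {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;     /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min)
            return 0;         /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;         /* surrogate half */
        if (cp > 0x10FFFF)
            return 0;         /* beyond the Unicode range */

        i += need + 1;
    }
    return 1;
}
```

The point is only that an input routine would run a check of this sort on
the final, de-escaped string and raise an error if it fails, the same way
we already reject a bad date or a malformed numeric.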

cheers

andrew

