Multibyte still broken

From Michael Robinson
Subject Multibyte still broken
Msg-id 200005101408.WAA07324@netrinsics.com
Replies Re: Multibyte still broken
List pgsql-hackers
These are excerpts from a message from Tatsuo Ishii dated January 26, on
the subject of fragile code in the multibyte routines:

---- begin ----
Defensive programming saves the system but does not save the user. Once
corrupted data is stored in the system, it's totally useless for the
user anyway.  What about validating data *before* inserting it into a
table?
---- end ----
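
For illustration, validating data before insertion could look roughly like
the sketch below. It is not code from the PostgreSQL tree: the function name
euc2_is_valid is hypothetical, and it assumes a simplified two-byte EUC
layout in which both the lead and trail byte lie in the range 0xA1 to 0xFE
(the single-shift bytes 0x8E and 0x8F are rejected for simplicity).

```c
#include <stddef.h>

/*
 * Hypothetical validator, not part of PostgreSQL.
 * Returns 1 if the buffer s[0..len-1] is well-formed under the assumed
 * two-byte EUC layout, 0 otherwise.
 */
int
euc2_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        /* Plain 7-bit ASCII byte: always acceptable. */
        if (s[i] < 0x80)
        {
            i++;
            continue;
        }

        /* Lead byte must be in the assumed range 0xA1..0xFE. */
        if (s[i] < 0xA1 || s[i] > 0xFE)
            return 0;

        /* A lead byte at the very end of the buffer is truncated. */
        if (i + 1 >= len)
            return 0;

        /* Trail byte must be in the same assumed range. */
        if (s[i + 1] < 0xA1 || s[i + 1] > 0xFE)
            return 0;

        i += 2;
    }
    return 1;
}
```

A loader could call such a check on each incoming string and reject the row
with an error rather than storing a sequence it cannot parse later.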

---- begin ----
> >Here it is. With this patch, copy out should be happy even with the
> >wrong data. I'm not sure if it could be displayed correctly, though.
> 
> Thank you very much.  However, I think even this is too optimistic:
> 
> >!     if (*s & 0x80)
> 
> Shouldn't it be something like:
> 
>     if ((*s & 0x80) && (*(s+1) & 0x80))
> 
> Even though "\242\242\242\0" is an invalid EUC sequence, it still shouldn't be
> allowed to break the software.

Thanks for the suggestion. More robust code is always good.
---- end ----
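
To make the point of the suggested check concrete, here is a minimal sketch
of a character-length routine that cannot step past the terminating NUL,
even on an invalid sequence such as "\242\242\242\0". The names
safe_euc_mblen and safe_euc_strlen are hypothetical and are not the routines
in the PostgreSQL source; the sketch only assumes a two-byte EUC encoding
and a NUL-terminated input.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not from the PostgreSQL tree.
 * Byte length of the character starting at s.  A byte with the high bit
 * set is treated as the start of a two-byte character only if the next
 * byte also has the high bit set; otherwise it is consumed as a single
 * (invalid) byte.  Since s[0] is non-NUL whenever s[1] is read, the read
 * of s[1] stays inside the NUL-terminated string.
 */
size_t
safe_euc_mblen(const unsigned char *s)
{
    if ((s[0] & 0x80) && (s[1] & 0x80))
        return 2;
    return 1;
}

/* Count characters in a NUL-terminated string without overrunning it. */
size_t
safe_euc_strlen(const unsigned char *s)
{
    size_t n = 0;

    while (*s != '\0')
    {
        s += safe_euc_mblen(s);
        n++;
    }
    return n;
}
```

With the unguarded test (*s & 0x80) alone, the third \242 would be taken as
a lead byte and the loop would skip over the NUL; with the guarded test the
string is counted as one two-byte character plus one stray byte.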

More robust code may always be good, but "good" apparently doesn't always go
into the tree.  Imagine my surprise, while upgrading a production server
from 6.5.3 to 7.0, when the data dumped from the old database failed to load
into the new database (well, crashed the backend, to be specific).

Apparently the "validate your own damn data" sentiment of the first excerpt
above has prevailed, because, on inspection, the MB code is just as fragile
as it was five months ago.

I was forced to perform emergency repairs to my database dump file to fool a 
non-multibyte 7.0 into accepting it.  Since EUC_CN is compatible with 
Latin-1, and since the benefits of multibyte are small compared to the 
risks, I intend to stick with unibyte Postgres henceforth.

I would, though, recommend a warning in the "INSTALL" file along the lines of:
 "WARNING: Use of improperly-encoded text with multi-byte support enabled
 WILL lead to data corruption and/or loss.  Do not enable multi-byte support
 unless you intend to fully validate your own damn data."
 
-Michael Robinson


