SQL_ASCII vs. 7-bit ASCII encodings

From: Oliver Jowett
Subject: SQL_ASCII vs. 7-bit ASCII encodings
Msg-id: 4282C29C.4020000@opencloud.com
Replies: Re: SQL_ASCII vs. 7-bit ASCII encodings  (Christopher Kings-Lynne <chriskl@familyhealth.com.au>)
Re: SQL_ASCII vs. 7-bit ASCII encodings  (Peter Eisentraut <peter_e@gmx.net>)
List: pgsql-hackers
The SQL_ASCII-breaks-JDBC issue just came up yet again on the JDBC list,
and I'm wondering if we can do something better on the server side to
help solve it.

The problem is that people have SQL_ASCII databases with non-7-bit data
in them under some encoding known only to a (non-JDBC) application.
Changing client_encoding has no effect on a SQL_ASCII database; it's
always passthrough. So when a JDBC client is later written, and the JDBC
driver sets client_encoding=UNICODE, we get data corruption and/or
complaints from the driver that the server is sending it invalid Unicode
(because it's really LATIN1 or whatever the original inserter happened
to use).
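
To make the two failure modes concrete, here is a small Python model (an illustration only, not server or driver code). It assumes the server side can be reduced to "return the stored bytes unchanged" and the client side to "decode strictly as UTF-8"; both function names are hypothetical. LATIN1 bytes that are not valid UTF-8 produce a decoding error, while LATIN1 bytes that happen to form valid UTF-8 come back as the wrong text with no error at all.

```python
from typing import Optional


def sql_ascii_passthrough(raw: bytes) -> bytes:
    """Hypothetical model of SQL_ASCII: bytes come back exactly as stored,
    whatever client_encoding is set to (no conversion, no validation)."""
    return raw


def strict_utf8_client(raw: bytes) -> Optional[str]:
    """Hypothetical model of a client that trusts client_encoding=UNICODE and
    decodes strictly as UTF-8. Returns None where a real driver would complain."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# A legacy (non-JDBC) application stored these strings encoded as LATIN1.
# Case 1: b'caf\xe9' is not valid UTF-8, so the client reports an error.
print(strict_utf8_client(sql_ascii_passthrough("café".encode("latin-1"))))  # → None
# Case 2: b'\xc3\xa9' happens to be valid UTF-8, so the value is silently wrong.
print(strict_utf8_client(sql_ascii_passthrough("Ã©".encode("latin-1"))))  # → é
```

The second case is arguably the worse one: nothing fails, and the stored text is simply misread.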

At this point the user has real problems as there is existing data in
their database in one or more encodings, but the encoding info
associated with that data has been lost. Converting such a database to a
single database-wide encoding is painful at best.

I suppose that we can't change the semantics of SQL_ASCII without
backwards compatibility problems. I wonder if introducing a new encoding
that only allows 7-bit ASCII, and making that the default, is the way to
go.

This new encoding would be treated like any other normal encoding, i.e.
setting client_encoding does transcoding (I expect that'd be a 1:1
mapping in most or all cases) and rejects unmappable characters as soon
as they're encountered.
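
A rough sketch of the conversion rules described above, again in Python for illustration. Assumptions: the mapping from PostgreSQL encoding names to Python codecs (`PG_TO_PYTHON_CODEC`) covers only two client encodings, and `UnmappableCharacterError`, `client_to_server` and `server_to_client` are hypothetical names. For those two encodings the server-to-client direction really is a 1:1 mapping, because both are supersets of 7-bit ASCII; the client-to-server direction rejects the first character that has no 7-bit equivalent.

```python
# Hypothetical mapping from PostgreSQL encoding names to Python codec names.
PG_TO_PYTHON_CODEC = {"UNICODE": "utf-8", "UTF8": "utf-8", "LATIN1": "latin-1"}


class UnmappableCharacterError(ValueError):
    """Hypothetical error: client data has no 7-bit ASCII representation."""


def client_to_server(data: bytes, client_encoding: str) -> bytes:
    """Convert incoming client bytes to the proposed 7-bit ASCII server encoding."""
    codec = PG_TO_PYTHON_CODEC[client_encoding]
    text = data.decode(codec)  # bytes invalid in the client encoding also fail here
    for pos, ch in enumerate(text):
        if ord(ch) > 0x7F:
            # Reject as soon as an unmappable character is encountered.
            raise UnmappableCharacterError(
                f"character {ch!r} at position {pos} cannot be stored in 7-bit ASCII"
            )
    return text.encode("ascii")


def server_to_client(data: bytes, client_encoding: str) -> bytes:
    """Convert stored 7-bit ASCII to the client encoding (1:1 for the codecs above)."""
    return data.decode("ascii").encode(PG_TO_PYTHON_CODEC[client_encoding])


print(client_to_server(b"hello", "UNICODE"))  # → b'hello'
try:
    client_to_server("café".encode("latin-1"), "LATIN1")
except UnmappableCharacterError as err:
    print(err)  # → character 'é' at position 3 cannot be stored in 7-bit ASCII
```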

Then the problem is visible as soon as problematic strings are given to
the server, rather than when a client that depends on having proper
encoding information (such as JDBC) happens to be used. If the DB is
only using simple 7-bit ASCII, then there's no change in behaviour. If
the DB does need to store additional characters, the user is forced to
choose an appropriate encoding before any encoding info is lost.
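
The difference in *when* the problem shows up can be modeled with a toy store (hypothetical, not PostgreSQL code). `"ASCII7"` is an invented name standing in for the proposed encoding, and the UTF-8 reader stands in for a client such as JDBC. Under SQL_ASCII, the bad value is accepted and only fails on a later read. Under the proposed encoding, it is rejected on insert, while the writer still knows which encoding it used.

```python
from typing import List


class ToyDatabase:
    """Toy model of where an encoding problem surfaces; not PostgreSQL code."""

    def __init__(self, server_encoding: str) -> None:
        # "SQL_ASCII", or "ASCII7" as a hypothetical name for the proposed encoding.
        self.server_encoding = server_encoding
        self.rows: List[bytes] = []

    def insert(self, raw: bytes) -> None:
        if self.server_encoding == "ASCII7" and any(b > 0x7F for b in raw):
            # Proposed behaviour: fail at write time, before encoding info is lost.
            raise ValueError("non-7-bit data rejected on insert")
        # SQL_ASCII: stored verbatim, with no record of the writer's encoding.
        self.rows.append(raw)

    def select_as_unicode(self) -> List[str]:
        # A client relying on proper encoding information decodes strictly,
        # so bad data under SQL_ASCII only fails here, long after the insert.
        return [raw.decode("utf-8") for raw in self.rows]


legacy = ToyDatabase("SQL_ASCII")
legacy.insert("naïve".encode("latin-1"))  # accepted silently
strict = ToyDatabase("ASCII7")
strict.insert(b"plain")  # 7-bit data: no change in behaviour
print(strict.select_as_unicode())  # → ['plain']
```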

Any thoughts on this?

-O


