We need to decide on how to handle encoding information embedded in xml
data that is passed through the client/server encoding conversion.
Here is an example:
Client encoding is A, server encoding is B. Client sends an xml datum
that looks like this:
INSERT INTO table VALUES (xmlparse(document '<?xml version="1.0"
encoding="C"?><content>...</content>'));
Assuming that A, B, and C are all distinct, this could fail at a number
of places.
I suggest that we make the system ignore all encoding declarations in
xml data. That is, in the above example, the string would actually
have to be encoded in client encoding A on the client, would be
converted to B on the server and stored as such. As far as I can tell,
this is easily implemented and allowed by the XML standard.
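To make the inbound path concrete, here is a minimal sketch in Python.
It is only an illustration, not backend code: the function name
receive_xml is hypothetical, and ISO-8859-1, UTF-8, and EUC-JP are
stand-ins I picked for the client, server, and declared encodings.
The point is that conversion is driven by the session encodings alone
and the declaration is never consulted.

```python
def receive_xml(raw: bytes, client_encoding: str, server_encoding: str) -> bytes:
    """Transcode an incoming xml datum using only the session encodings.

    The encoding="..." pseudo-attribute inside the datum is deliberately
    ignored; it is carried along as ordinary text.
    """
    text = raw.decode(client_encoding)
    return text.encode(server_encoding)


# Hypothetical stand-ins: client = ISO-8859-1, server = UTF-8,
# declared (and ignored) = EUC-JP.
doc = '<?xml version="1.0" encoding="EUC-JP"?><content>\u00e9</content>'
raw_from_client = doc.encode("iso-8859-1")
stored = receive_xml(raw_from_client, "iso-8859-1", "utf-8")
```

Note that the stored bytes still carry the original declaration text,
even though it no longer describes how they are encoded.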
The same would be done on the way back. The datum would arrive in
encoding A on the client. It might be implementation-dependent whether
the datum actually contains an XML declaration specifying an encoding
and whether that encoding might read A, B, or C -- I haven't figured
that out yet -- but the client will always be required to consider it
to be A.
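The return path could be sketched the same way. Again this is only an
illustration under assumptions: send_xml and client_read are
hypothetical names, and ISO-8859-1 and UTF-8 stand in for the client
and server encodings. The client decodes by its session encoding and
pays no attention to whatever the declaration happens to say.

```python
def send_xml(stored: bytes, server_encoding: str, client_encoding: str) -> bytes:
    """Convert a stored xml datum to the client encoding for output.

    Raises UnicodeEncodeError if the client encoding cannot represent
    a character in the datum.
    """
    return stored.decode(server_encoding).encode(client_encoding)


def client_read(wire: bytes, client_encoding: str) -> str:
    """Interpret the received bytes by the session's client encoding only."""
    return wire.decode(client_encoding)


# Hypothetical stand-ins: server = UTF-8, client = ISO-8859-1. The
# declaration names a third encoding and is not used for decoding.
doc = '<?xml version="1.0" encoding="EUC-JP"?><content>\u00e9</content>'
stored = doc.encode("utf-8")
wire = send_xml(stored, "utf-8", "iso-8859-1")
text = client_read(wire, "iso-8859-1")
```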
What should be done about the binary send/receive functionality?
Looking at the send/receive functions for the text type, they
communicate all data in the server encoding, so it seems reasonable to
do this here as well.
Comments?
--
Peter Eisentraut
http://developer.postgresql.org/~petere/