Encoding problems in PostgreSQL with XML data

From Peter Eisentraut
Subject Encoding problems in PostgreSQL with XML data
Date
Msg-id 200401091946.01930.peter_e@gmx.net
Replies Re: Encoding problems in PostgreSQL with XML data  (Rod Taylor <pg@rbt.ca>)
List pgsql-hackers
This is not directly related to current development, but it is something 
that might need a low-level solution.  I've been thinking for some time 
about how to enhance the current "XML support" (e.g., contrib/xml).

The central problem I have is this:  How do we deal with the fact that 
an XML datum carries its own encoding information?

Here's a scenario:  It is desirable to have validity checking on XML 
input, be it a special XML data type or some functions that take XML 
data.  Say we define a data type that stores XML documents and rejects 
documents that are not well-formed.  I want to insert something in 
psql:

CREATE TABLE test (
    description text,
    content xml
);

INSERT INTO test VALUES ('test document',
    '<?xml version="1.0"?><doc><para>blah</para>...</doc>');

Now an XML parser will assume this document to be in UTF-8, because the 
declaration names no encoding, and let us say that at the client it 
really is.  What if client_encoding=UNICODE but 
server_encoding=LATIN1?  Do we expect some layer to rewrite the <?xml?> 
declaration to contain the correct encoding information?  Or can the 
xml type bypass encoding conversion?  What about reading it back out of 
the database with yet another client encoding?
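
To make the mismatch concrete, here is a rough sketch in Python rather 
than anything PostgreSQL does today.  It assumes the document contains a 
non-ASCII character ("café" is my own example, not part of the test 
document above) and imitates the client-to-server conversion with plain 
codec calls.  The helper name parses() is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: no encoding in the declaration, one non-ASCII letter.
doc = '<?xml version="1.0"?><doc><para>café</para></doc>'

# What the client sends when its encoding is UTF-8.
client_bytes = doc.encode("utf-8")

# Imitation of the server converting the value to LATIN1 (ISO-8859-1)
# while leaving the XML declaration untouched.
server_bytes = client_bytes.decode("utf-8").encode("latin-1")


def parses(data: bytes) -> bool:
    """Return True if an XML parser accepts the raw bytes as well-formed."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


print(parses(client_bytes))  # bytes match the UTF-8 default the parser assumes
print(parses(server_bytes))  # bytes are LATIN1, but the parser still assumes UTF-8
```

The stored bytes are no longer what the declaration implies, so a check 
for well-formedness that runs on the server-side representation would 
reject a document that was fine when it left the client.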

Rewriting the <?xml?> declaration seems like a workable solution, but it 
would break the transparency of the client/server encoding conversion.  
Also, some people might dislike that their documents are being changed 
as they are stored.
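
As a sketch of what such a rewriting layer might look like (again in 
Python, and only as an illustration): the function 
set_declared_encoding() and its regular expression are my own 
assumptions, they handle only the simple declaration forms, and the 
sample document with "café" is invented.

```python
import re
import xml.etree.ElementTree as ET

# Matches an XML declaration: version, optional encoding, and anything
# else (such as standalone) that precedes the closing "?>".
_DECL = re.compile(
    r'^<\?xml\s+version=(["\'])([^"\']*)\1'
    r'(?:\s+encoding=(["\'])[^"\']*\3)?'
    r'([^?]*)\?>'
)


def set_declared_encoding(doc: str, encoding: str) -> str:
    """Hypothetical helper: make the declaration name the given encoding."""
    m = _DECL.match(doc)
    if m is None:
        # No declaration at all: add one in front of the document.
        return f'<?xml version="1.0" encoding="{encoding}"?>' + doc
    version, rest = m.group(2), m.group(4)
    return (f'<?xml version="{version}" encoding="{encoding}"{rest}?>'
            + doc[m.end():])


original = '<?xml version="1.0"?><doc><para>café</para></doc>'

# Rewrite for a LATIN1 server, then store the bytes in that encoding.
rewritten = set_declared_encoding(original, "ISO-8859-1")
stored = rewritten.encode("latin-1")

# The stored bytes now agree with their own declaration and parse cleanly.
root = ET.fromstring(stored)
print(root.find("para").text)
print(rewritten != original)  # the document text itself has been changed
```

The parse succeeds, but only because the text of the document was 
altered on the way in, which is exactly the loss of transparency 
described above.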

Any ideas?


