Re: BUG #4890: Allow insert character has no equivalent in "LATIN2"

From Craig Ringer
Subject Re: BUG #4890: Allow insert character has no equivalent in "LATIN2"
Date
Msg-id 1247507930.17862.111.camel@ayaki
In reply to BUG #4890: Allow insert character has no equivalent in "LATIN2" ("saint" <saint@akpa.pl>)
Responses Re: BUG #4890: Allow insert character has no equivalent in "LATIN2" (Tom Lane <tgl@sss.pgh.pa.us>)
Re: BUG #4890: Allow insert character has no equivalent in "LATIN2" (Robert Świętochowski <robert.swietochowski@akpa.pl>)
List pgsql-bugs
(Please reply to the list, not just to me)

I'm not sure about this so far. Regarding the specific issue you mention,
conversion between cp1250 and latin-2 (ISO-8859-2), the Unicode tables
at:

  http://unicode.org/Public/MAPPINGS/ISO8859/8859-2.TXT

appear to agree - there's no PER MILLE in ISO-8859-2.
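
A quick way to cross-check the mapping without a server is sketched below.
It assumes Python's built-in "cp1250" and "iso8859_2" codecs as stand-ins
for Pg's own conversion tables (they are not the same code), and the
helper name can_encode is made up for illustration:

```python
# Sketch only: Python's codec tables stand in for Pg's conversion tables.
PER_MILLE = "\u2030"  # U+2030 PER MILLE SIGN


def can_encode(ch: str, codec: str) -> bool:
    """Return True if the character has a representation in the codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Windows-1250 has a slot for per-mille; ISO-8859-2 does not.
win1250_bytes = PER_MILLE.encode("cp1250")
in_latin2 = can_encode(PER_MILLE, "iso8859_2")

print("cp1250 bytes:", win1250_bytes.hex())
print("encodable in iso8859_2:", in_latin2)
```

If the two codecs match the Unicode tables, the first line shows the single
win1250 byte and the second reports that latin-2 has no equivalent.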

With a UTF-8 database, Pg correctly doesn't accept PER MILLE as a valid
ISO-8859-2 char:

-- Connecting with unicode (utf-8) client
CREATE TABLE test (x text);
INSERT INTO test(x) VALUES ('‰');

SET client_encoding='iso-8859-2';
SELECT * from test;
ERROR:  character 0xe280b0 of encoding "UTF8" has no equivalent in
"LATIN2"

If the encoding is set to WIN1250 Pg outputs the appropriate byte. So
it's doing the right thing in each individual case where a UTF-8 DB is
concerned.

Your problem, though, is that if you connect to a LATIN2 database with a
WIN1250 client and INSERT a string containing the per-mille glyph, Pg
accepts it and it should not. If it does, indeed, accept it, then I
agree that's a bug.

I haven't tested with a LATIN2 database as I'd have to re-initdb and the
machine I'm working on has semi-useful databases on it. What you're
saying makes sense, though, presuming your client really is sending
win1250 per-mille (byte 0x89).


I'd still like to know how you're setting your client encoding. You
can't just run "SET client_encoding='win1250'" - you must tell the
client program, or the terminal it runs in, to use the appropriate
encoding as well. Otherwise when you paste the per-mille character
you'll see the right glyph, but the CLIENT will interpret that as the
character in the encoding you specified.

So, if you're using a utf-8 terminal, that means that the terminal will
send 0xe2 0x80 0xb0 for per-mille, which when interpreted as win1250
becomes â€° , so that's what the server thinks you sent it.
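
The misinterpretation can be reproduced outside the database. This is only
a sketch, again assuming Python's "cp1250" codec behaves like the win1250
decoding the server would apply; it does not involve Pg at all:

```python
# Sketch only: simulate a UTF-8 terminal whose bytes are read as win1250.
PER_MILLE = "\u2030"  # U+2030 PER MILLE SIGN

# What a UTF-8 terminal actually puts on the wire for the glyph.
utf8_bytes = PER_MILLE.encode("utf-8")

# What those same bytes mean if they are declared to be win1250.
misread = utf8_bytes.decode("cp1250")

print(utf8_bytes.hex(" "), "->", misread)
```

Three bytes go in and three unrelated characters come out, which is why
the glyph on screen and the stored string can differ.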

In that case, though, you'd find that the euro symbol, which isn't
defined in latin-2, will cause an error:

ERROR:  character 0xe282ac of encoding "UTF8" has no equivalent in
"LATIN2"
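
To see which character of the misread string would trip that error, here
is a rough check. It assumes Python's "iso8859_2" codec as an
approximation of what LATIN2 can represent, and the helper name
latin2_rejects is hypothetical:

```python
# Sketch only: Python's iso8859_2 codec approximates the LATIN2 repertoire.

def latin2_rejects(text: str) -> list:
    """Return the characters of text that have no ISO-8859-2 encoding."""
    rejected = []
    for ch in text:
        try:
            ch.encode("iso8859_2")
        except UnicodeEncodeError:
            rejected.append(ch)
    return rejected


# UTF-8 bytes of per-mille, wrongly decoded as win1250.
misread = "\u2030".encode("utf-8").decode("cp1250")
rejected = latin2_rejects(misread)

# UTF-8 form of the euro sign, for comparison with the error message.
euro_utf8 = "\u20ac".encode("utf-8")

print("rejected:", rejected, "euro as utf-8:", euro_utf8.hex())
```

Only the euro sign is rejected, and its UTF-8 bytes match the 0xe282ac in
the error text.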




--
Craig Ringer
