R: Chars problem restoring to ps 8.4 (utf8) a dumped db from ps 8.1 (latin9)

From: Bianchi Quota Leonardo
Subject: R: Chars problem restoring to ps 8.4 (utf8) a dumped db from ps 8.1 (latin9)
Message-ID: D2711A0677B74649872C6AB7603979E980DA9840@FVG-EMB-01V-TS.UCFVG.hosted
List: pgsql-general
Hi, I will certainly upgrade to 9.4.4! I have already downloaded the RPMs for the update to PostgreSQL 9.4.4, but I
would rather not update before getting through this matter, unless the update is a prerequisite for the solution.
 

In reply to Tom's last post: I checked, and Bugzilla 3.2 (an old installation of Bugzilla) was set to "Use UTF-8
(Unicode) encoding for all text in Bugzilla".
 

Today I ran a test to provide more details, and I hope it helps to answer this question (which, if I understood
correctly, is the point):
 
Does Bugzilla write data as UTF8 regardless of the database charset definition?
(In the test I used Bugzilla 5.0 (the latest stable release) instead of Bugzilla 3.2 (which is my running
application), because for now I don't want to run tests in the production environment.)
 
I think it would also be very helpful to know whether this behavior confirms Tom's thoughts in general.

---------------------TEST--------------------------------
On the new db, created via psql in this way:
    CREATE DATABASE bugsl9test WITH OWNER bugs ENCODING 'LATIN9' TEMPLATE template0 LC_COLLATE 'C' LC_CTYPE 'C';
 
I added two bugs: one with Bugzilla set to "utf8":"0" and the other with "utf8":"1" (1 means use UTF-8).
In both cases I typed the character "è" in the "Summary" field of the web form. In both cases the value stored in the
short_desc column of the "bugs" table for that bug's row, viewed via pgAdmin III on Windows 7, is "Ú".
 
However, in the first case (utf8:"0") Bugzilla (I use Chrome) shows an "è" for both bugs, and in the second case
(utf8:"1") it shows the "Ú" CORRECTLY as "è".
 
-----------------------------------------------------------

Actually, the whole note about setting utf8 to "1" or to "0" is: "Use UTF-8 (Unicode) encoding for all text in Bugzilla.
New installations should set this to true to avoid character encoding problems.
 
Existing databases should set this to true only after the data has been converted from existing legacy character
encodings to UTF-8, using the contrib/recode.pl script."
 

Recode.pl (https://github.com/bugzilla/bugzilla/blob/master/contrib/recode.pl) is a utility that converts a database
from one encoding (or multiple encodings) to UTF-8. In a previous test I ran recode.pl to convert the data dumped
as latin9 (of course editing the "client_encoding" from latin9 to utf8), and no "strange chars" were shown after
restoring into the new utf8 database.
 

Thank you very much for your attention and patience!

Bye,
Leonardo


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Thursday, 13 August 2015 16:39
To: Martín Marqués
Cc: Adrian Klaver; Bianchi Quota Leonardo; 'pgsql-general@postgresql.org'
Subject: Re: [GENERAL] Chars problem restoring to ps 8.4 (utf8) a dumped db from ps 8.1 (latin9)

Martín Marqués <martin.marques@gmail.com> writes:
> On 12/08/15 at 11:12, Tom Lane wrote:
>> It does not seem likely to me that this would work at all.  You're
>> taking a dump file that is full of LATIN9 data and simply asserting
>> that it's UTF8 data.  That doesn't make it so.  If it seemed to work,
>> maybe that's because your editor changed the encoding?  Not to be
>> relied on, for sure.

> Well, IIRC a LATIN9 encoding char which is interpreted as UTF8 will
> get inserted with no error on a UTF8 server (although the final data
> will be bogus).

I'd believe the other way around: if you tell the database that you're using LATIN9, but what you send is really UTF8,
it will not reject it because the individual bytes are perfectly valid LATIN9 characters and there are no cross-byte
checks to make in LATIN9.  But it seems highly unlikely that LATIN9-encoded data would get past the UTF8 validity
checker with any consistency.
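
The asymmetry described above can be illustrated outside the database. The following is only a sketch in Python, not
PostgreSQL's own validation code: Python's codecs stand in for the server's checks, "iso8859_15" is Python's codec
name for LATIN9, and passes_check is a hypothetical helper name.

```python
LATIN9 = "iso8859_15"  # Python's codec name for LATIN9 (assumption: stands in for the server's check)


def passes_check(raw: bytes, encoding: str) -> bool:
    """Return True if the bytes are a valid sequence in the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


utf8_bytes = "è".encode("utf-8")    # two bytes: c3 a8
latin9_bytes = "è".encode(LATIN9)   # one byte: e8

# UTF-8 bytes labeled as LATIN9: accepted, since each byte is a character on its own.
print(passes_check(utf8_bytes, LATIN9))     # True

# LATIN9 bytes labeled as UTF-8: rejected, e8 announces a multi-byte sequence that never follows.
print(passes_check(latin9_bytes, "utf-8"))  # False

# In this codec every possible byte value is accepted when taken alone.
print(all(passes_check(bytes([b]), LATIN9) for b in range(256)))  # True
```

A single-byte encoding has no relationship between neighbouring bytes to verify, which is why mislabeled UTF-8 goes in
silently while the reverse mistake is normally caught.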
 

It's possible that the problem is one of mislabeling, ie the database was claimed to use LATIN9 but what was actually
sent was always UTF8.
 
If that was *always* the case then the OP's fix of changing the label in the dump file was actually the right thing to
do. But we haven't been given enough information to be sure of that --- and if that's what was happening, then some
client software fixes would be in order anyway, because the client code was using the wrong client_encoding.
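
To make the mislabeling scenario concrete, here is a small pure-Python model of it. It is a sketch under stated
assumptions, not a reproduction of pg_dump or the server: it assumes the dump contains the stored bytes unchanged, and
restore_into_utf8_db is a hypothetical stand-in for a UTF8 server interpreting those bytes under the declared
client_encoding.

```python
LATIN9 = "iso8859_15"  # Python's codec name for LATIN9


def restore_into_utf8_db(dump_bytes: bytes, client_encoding: str) -> str:
    """Model a UTF8 server reading dump bytes under the declared client_encoding."""
    return dump_bytes.decode(client_encoding)


original = "è"

# Mislabeled source: the database is declared LATIN9, but the client always sent UTF-8.
# Assumption: those bytes were stored as-is and written to the dump as-is.
dump_bytes = original.encode("utf-8")

# Dump restored with its original label: each UTF-8 byte is converted as a separate character.
kept_label = restore_into_utf8_db(dump_bytes, LATIN9)

# Dump restored after editing the label to UTF8: the bytes are taken for what they really are.
relabeled = restore_into_utf8_db(dump_bytes, "utf-8")

print(len(kept_label), kept_label == original)  # 2 False
print(len(relabeled), relabeled == original)    # 1 True
```

In this model, relabeling is only correct because every stored byte sequence was UTF-8 to begin with. If some rows held
genuine LATIN9 bytes, the relabeled restore would fail the UTF-8 check for those rows.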
 

                        regards, tom lane