Discussion: BUG #5661: The character encoding in logfile is confusing.
The following bug has been logged online:

Bug reference:      5661
Logged by:          Mikio
Email address:      tkbysh2000@yahoo.co.jp
PostgreSQL version: 9.0 RC1
Operating system:   Windows XP SP3 Japanese
Description:        The character encoding in logfile is confusing.
Details:

I'm using postgresql 9.0 rc1 on Japanese Windows XP.
I found character encoding is confusing in log files in pg_log directory.
Default character encoding of all of databases are UTF-8, and almost message
strings in log files are described by UTF-8 correctly.
But few lines are described by EUC_JP.
So 2 character encoding strings are existing in 1 log file and I can't read
the messages parts of logs.
Incidentally, client_encoding in postgresql.conf is commented out.

Thank you.
On 09/16/2010 07:12 PM, Mikio wrote:
>
> The following bug has been logged online:
>
> Bug reference: 5661
> Logged by: Mikio
> Email address: tkbysh2000@yahoo.co.jp
> PostgreSQL version: 9.0 RC1
> Operating system: Windows XP SP3 Japanese
> Description: The character encoding in logfile is confusing.
> Details:
>
> I'm using postgresql 9.0 rc1 on Japanese Windows XP.
> I found character encoding is confusing in log files in pg_log directory.
> Default character encoding of all of databases are UTF-8, and almost message
> strings in log files are described by UTF-8 correctly.
> But few lines are described by EUC_JP.
> So 2 character encoding strings are existing in 1 log file and I can't read
> the messages parts of logs.
> Incidentally, client_encoding in postgresql.conf is commented out.

Thank you for your report. This certainly sounds like a potential bug - but to do anything about it, we will need to see the contents of the actual log file in question and the contents of postgresql.conf.

Only partial log file contents should be necessary, showing the EUC_JP encoded parts of the logs and say ten lines either side. If the EUC_JP contents were generated by client code (say, RAISE NOTICE statements in PL/PgSQL) then you will also need to supply the client code.

Please bundle all the files up in a zip file to protect them from possible text encoding conversion during transfer, and post them to a file hosting site. If you don't want them to be public, just collect the logs up and wait for people to ask you to send them to them by private email. Please send a copy to me, as I've dealt with encoding issues in software (though not PostgreSQL) quite a bit.

--
Craig Ringer
Hi Craig,

Thank you very much for your quick response. I'm happy to help improve rc1.

This is my first report to the PostgreSQL team, so I'm not sure which file hosting site to use. I'm attaching the log file and postgresql.conf to this email. If this is not convenient for the team, please tell me the URL of an appropriate upload site and I'll upload the files there. I don't mind if they are public.

BTW, I found a third character encoding in the file: Shift_JIS. The attached file includes lines in all 3 character encodings.

For your reference:
Shift_JIS: Default encoding of Japanese Windows. I found this problem on a postgres server that is running as a Windows service.
EUC_JP: A very common encoding on Japanese Unix. I guess that the developer who worked on this was on some Unix or Linux.
UTF-8: A common encoding in Japan, especially in relation to Java. I specified it as the default encoding for all of my databases.

I didn't edit the log file, to avoid a text editor changing some data when saving it. So the attached log file covers the service from start to end. But the log file is very small: the total size is 7kb.

The client code is not attached, because the messages with the bad character encoding are the start-up and shut-down messages. So you can find this problem easily: they are at the top and the end of the log file.

Please let me know if you need additional information.

Regards.
--
<tkbysh2000@yahoo.co.jp>

On Fri, 17 Sep 2010 10:53:45 +0800 Craig Ringer <craig@postnewspapers.com.au> wrote:

> On 09/16/2010 07:12 PM, Mikio wrote:
> >
> > The following bug has been logged online:
> >
> > Bug reference: 5661
> > Logged by: Mikio
> > Email address: tkbysh2000@yahoo.co.jp
> > PostgreSQL version: 9.0 RC1
> > Operating system: Windows XP SP3 Japanese
> > Description: The character encoding in logfile is confusing.
> > Details:
> >
> > I'm using postgresql 9.0 rc1 on Japanese Windows XP.
> > I found character encoding is confusing in log files in pg_log directory.
> > Default character encoding of all of databases are UTF-8, and almost message
> > strings in log files are described by UTF-8 correctly.
> > But few lines are described by EUC_JP.
> > So 2 character encoding strings are existing in 1 log file and I can't read
> > the messages parts of logs.
> > Incidentally, client_encoding in postgresql.conf is commented out.
>
> Thank you for your report. This certainly sounds like a potential bug -
> but to do anything about it, we will need to see the contents of the
> actual log file in question and the contents of postgresql.conf.
>
> Only partial log file contents should be necessary, showing the EUC_JP
> encoded parts of the logs and say ten lines either side. If the EUC_JP
> contents were generated by client code (say, RAISE NOTICE statements in
> PL/PgSQL) then you will also need to supply the client code.
>
> Please bundle all the files up in a zip file to protect them from
> possible text encoding conversion during transfer, and post them to a
> file hosting site. If you don't want them to be public, just collect the
> logs up and wait for people to ask you to send them to them by private
> email. Please send a copy to me, as I've dealt with encoding issues in
> software (though not PostgreSQL) quite a bit.
>
> --
> Craig Ringer
On 09/17/2010 01:10 PM, tkbysh2000@yahoo.co.jp wrote:

> BTW, I found third character encoding in the file, Shift_JIS. Attached
> file is including all of 3 character encoded lines.
> For your reference:
> Shift_JIS: Default encoding of Japanese Windows. I found this problem
> on posgre server which is working as Windows service.
> EUC_JP: Very major encoding of Japanese Unix. I guess that the
> developper which worked for this, on some Unix or Linux.
> UTF-8: Major encoding especially ralating java in Japan. And I
> specified as default encoding for my all of databases.

Thanks for that.

> I didn't edit the log file to avoid change some data by text editor when
> save it. So attached log file is including from start to end a service.
> But the log file is very small. Total size is 7kb.

Good plan. Thanks.

> And client code is not attached. Cause the messages of bad character
> encoding are relevant to starting up and shutting down messages.
> So you can find easily this problem. They are in top and end of log
> file.

Yes, the mismatched encodings in the data are clear and obvious.

Given that the messages are coming purely from postgresql, not client code, I'm now wondering if what we're dealing with is mismatched encodings in the translation files, where some messages were translated with a different encoding to other messages.

One of the correctly encoded messages is "Unexpected EOF received on client connection"

One of the incorrectly encoded (shift-JIS) messages is: "Fast Shutdown request received". Another is "Aborting any active transactions".

I can find the correctly encoded messages in share/locale/ja/LC_MESSAGES/postgres-9.0.mo

The incorrectly encoded messages appear in the same file, but are encoded in utf-8 in that file despite being output to the logs in shift-JIS.
For example, with the badly encoded data from the logs extracted into the file 'x':

$ python
>>> x = open("x").read()
>>> x
'\x8d\x82\x91\xac\x83V\x83\x83\x83b\x83g\x83_\x83E\x83\x93\x97v\x8b\x81\x82\xf0\x8e\xf3\x82\xaf\x8e\xe6\x82\xe8\x82\xdc\x82\xb5\x82\xbd\r\n'
>>> print x.decode("shift-jis")
高速シャットダウン要求を受け取りました

$ grep '高速シャットダウン要求を受け取りました' *
Binary file postgres-9.0.mo matches
$

So - either something in the pipeline is "helpfully" converting your error messages, or your locale files aren't the same as mine. I doubt the latter; it seems almost impossible that just a few messages would be converted to shift-JIS by accident in the Windows release only.

So the question now is where the messages are converted from UTF-8 to shift-JIS and why that conversion is being applied inconsistently.

I'll try to have a look and see what I can find.

--
Craig Ringer
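
For anyone repeating the check above on Python 3, a rough equivalent might look like the sketch below. It decodes the Shift_JIS bytes quoted in the session, re-encodes the text as UTF-8, and looks for those bytes in the raw contents of a message catalog. The function names are made up for illustration; only the byte string and the catalog path come from this thread.

```python
# Python 3 sketch of the interactive check above.
# RAW_LOG_LINE is the mis-encoded log line quoted in the thread; the
# function names are hypothetical helpers, not part of PostgreSQL.
RAW_LOG_LINE = (
    b"\x8d\x82\x91\xac\x83V\x83\x83\x83b\x83g\x83_\x83E\x83\x93"
    b"\x97v\x8b\x81\x82\xf0\x8e\xf3\x82\xaf\x8e\xe6\x82\xe8\x82\xdc"
    b"\x82\xb5\x82\xbd\r\n"
)


def shift_jis_line_to_utf8(raw: bytes) -> bytes:
    """Decode a Shift_JIS log line and return the same text as UTF-8 bytes."""
    text = raw.decode("shift_jis").rstrip("\r\n")
    return text.encode("utf-8")


def catalog_contains(catalog: bytes, raw_log_line: bytes) -> bool:
    """Report whether the UTF-8 form of the log line occurs in the catalog bytes."""
    return shift_jis_line_to_utf8(raw_log_line) in catalog
```

With the catalog read in binary mode, for example from share/locale/ja/LC_MESSAGES/postgres-9.0.mo, `catalog_contains(catalog_bytes, RAW_LOG_LINE)` plays the role of the grep: a True result would suggest the catalog stores the message as UTF-8 and that the conversion to Shift_JIS happens somewhere later.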
Craig Ringer <craig@postnewspapers.com.au> writes:

> Yes, the mismatched encodings in the data are clear and obvious.
> Given that the messages are coming purely from postgresql, not client
> code, I'm now wondering if what we're dealing with is mismatched
> encodings in the translation files, where some messages were translated
> with a different encoding to other messages.

The examples you give don't seem to support that idea. I don't read Japanese, but at least these cases look like they are all UTF8 as expected in the .po files.

> One of the correctly encoded messages is "Unexpected EOF received on
> client connection"
> One of the incorrectly encoded (shift-JIS) messages is: "Fast Shutdown
> request received". Another is "Aborting any active transactions".
> ... question now is where the messages are converted from UTF-8 to shift-JIS
> and why that conversion is being applied inconsistently.

Given those three examples, I wonder whether all the mis-encoded messages are emitted by the postmaster, rather than backends. Anyway it seems that you ought to look for some pattern in which messages are correctly vs incorrectly encoded.

regards, tom lane
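
One way to start looking for such a pattern is to guess the encoding of each log line and list the lines that are neither ASCII nor UTF-8. The sketch below is only a heuristic, not a PostgreSQL tool: the function names are hypothetical, and it tries EUC_JP before Shift_JIS on the assumption that EUC_JP bytes can sometimes also decode, wrongly, as Shift_JIS.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Candidate encodings in the order they are tried. The order is a heuristic
# assumption: stricter or more likely encodings come first.
CANDIDATES = ("ascii", "utf-8", "euc_jp", "shift_jis")


def guess_encoding(raw: bytes) -> str:
    """Return the first candidate encoding that decodes the line without error."""
    for name in CANDIDATES:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return "unknown"


def tally_encodings(lines: Iterable[bytes]) -> Counter:
    """Count how many lines fall under each guessed encoding."""
    return Counter(guess_encoding(line) for line in lines)


def suspicious_lines(lines: Iterable[bytes]) -> Iterator[Tuple[int, str, bytes]]:
    """Yield (line number, guess, raw bytes) for lines that are not ASCII or UTF-8."""
    for number, raw in enumerate(lines, start=1):
        guess = guess_encoding(raw)
        if guess not in ("ascii", "utf-8"):
            yield number, guess, raw
```

Passing a log file opened in binary mode to `suspicious_lines` gives the line numbers of the oddly encoded messages, which can then be compared against which process emitted them (postmaster or backend).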