Thread: BUG #5661: The character encoding in logfile is confusing.

BUG #5661: The character encoding in logfile is confusing.

From
"Mikio"
Date:
The following bug has been logged online:

Bug reference:      5661
Logged by:          Mikio
Email address:      tkbysh2000@yahoo.co.jp
PostgreSQL version: 9.0 RC1
Operating system:   Windows XP SP3 Japanese
Description:        The character encoding in logfile is confusing.
Details:

I'm using PostgreSQL 9.0 RC1 on Japanese Windows XP.
I found that the character encoding is confusing in the log files in the pg_log directory.
The default character encoding of all databases is UTF-8, and almost all message
strings in the log files are correctly written in UTF-8.
But a few lines are written in EUC_JP.
So strings in two character encodings exist in one log file, and I can't read
the message parts of the logs.
Incidentally, client_encoding in postgresql.conf is commented out.

Thank you.

Re: BUG #5661: The character encoding in logfile is confusing.

From
Craig Ringer
Date:
On 09/16/2010 07:12 PM, Mikio wrote:
>
> The following bug has been logged online:
>
> Bug reference:      5661
> Logged by:          Mikio
> Email address:      tkbysh2000@yahoo.co.jp
> PostgreSQL version: 9.0 RC1
> Operating system:   Windows XP SP3 Japanese
> Description:        The character encoding in logfile is confusing.
> Details:
>
> I'm using PostgreSQL 9.0 RC1 on Japanese Windows XP.
> I found that the character encoding is confusing in the log files in the pg_log directory.
> The default character encoding of all databases is UTF-8, and almost all message
> strings in the log files are correctly written in UTF-8.
> But a few lines are written in EUC_JP.
> So strings in two character encodings exist in one log file, and I can't read
> the message parts of the logs.
> Incidentally, client_encoding in postgresql.conf is commented out.

Thank you for your report. This certainly sounds like a potential bug -
but to do anything about it, we will need to see the contents of the
actual log file in question and the contents of postgresql.conf.

Only partial log file contents should be necessary, showing the EUC_JP
encoded parts of the logs and say ten lines either side. If the EUC_JP
contents were generated by client code (say, RAISE NOTICE statements in
PL/pgSQL) then you will also need to supply the client code.

Please bundle all the files up in a zip file to protect them from
possible text encoding conversion during transfer, and post them to a
file hosting site. If you don't want them to be public, just collect the
logs up and wait for people to ask you to send them by private
email. Please send a copy to me, as I've dealt with encoding issues in
software (though not PostgreSQL) quite a bit.
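
A minimal sketch of the bundling step, assuming Python 3 is available on the reporting machine. The function name `bundle_files` and any file names passed to it are hypothetical, not paths from this thread. The files are read and stored as raw bytes, so no text encoding conversion is applied to them.

```python
import zipfile
from pathlib import Path


def bundle_files(paths, archive_path):
    """Store each file byte-for-byte in a zip archive and return its path.

    Only the base name of each file is kept inside the archive.
    """
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in map(Path, paths):
            # zf.write copies the file's bytes unchanged; nothing is decoded.
            zf.write(p, arcname=p.name)
    return archive_path
```

For example, `bundle_files(["postgresql.log", "postgresql.conf"], "bug5661.zip")` would produce one archive holding both files unchanged (both names are placeholders).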

--
Craig Ringer

Re: BUG #5661: The character encoding in logfile is confusing.

From
tkbysh2000@yahoo.co.jp
Date:
Hi Craig,

Thank you very much for your quick response.
I'm happy to help improve RC1.

This is my first report to the PostgreSQL team, so I'm not sure which
file hosting site to use.
I'm attaching the log file and postgresql.conf to this email.
If this is not convenient for the team, can you tell
me the URL of an appropriate upload site? I'll upload the files there.
I don't mind if they are public.

BTW, I found a third character encoding in the file, Shift_JIS. The attached
file includes lines in all 3 character encodings.
For your reference:
 Shift_JIS: Default encoding of Japanese Windows. I found this problem
 on a PostgreSQL server which is running as a Windows service.
 EUC_JP: A very common encoding on Japanese Unix. I guess that the
 developer who worked on this was using some Unix or Linux.
 UTF-8: A common encoding in Japan, especially in relation to Java. And I
 specified it as the default encoding for all of my databases.

I didn't edit the log file, to avoid a text editor changing some data when
saving it. So the attached log file covers one service run from start to end.
But the log file is very small. Total size is 7 kB.
And client code is not attached, because the messages with bad character
encoding are the startup and shutdown messages.
So you can find this problem easily. They are at the top and end of the log
file.

Please let me know if you need additional information.

Regards.

--
 <tkbysh2000@yahoo.co.jp>


On Fri, 17 Sep 2010 10:53:45 +0800
Craig Ringer <craig@postnewspapers.com.au> wrote:

> On 09/16/2010 07:12 PM, Mikio wrote:
> >
> > The following bug has been logged online:
> >
> > Bug reference:      5661
> > Logged by:          Mikio
> > Email address:      tkbysh2000@yahoo.co.jp
> > PostgreSQL version: 9.0 RC1
> > Operating system:   Windows XP SP3 Japanese
> > Description:        The character encoding in logfile is confusing.
> > Details:
> >
> > I'm using PostgreSQL 9.0 RC1 on Japanese Windows XP.
> > I found that the character encoding is confusing in the log files in the pg_log directory.
> > The default character encoding of all databases is UTF-8, and almost all message
> > strings in the log files are correctly written in UTF-8.
> > But a few lines are written in EUC_JP.
> > So strings in two character encodings exist in one log file, and I can't read
> > the message parts of the logs.
> > Incidentally, client_encoding in postgresql.conf is commented out.
>
> Thank you for your report. This certainly sounds like a potential bug -
> but to do anything about it, we will need to see the contents of the
> actual log file in question and the contents of postgresql.conf.
>
> Only partial log file contents should be necessary, showing the EUC_JP
> encoded parts of the logs and say ten lines either side. If the EUC_JP
> contents were generated by client code (say, RAISE NOTICE statements in
> PL/PgSQL) then you will also need to supply the client code.
>
> Please bundle all the files up in a zip file to protect them from
> possible text encoding conversion during transfer, and post them to a
> file hosting site. If you don't want them to be public, just collect the
> logs up and wait for people to ask you to send them by private
> email. Please send a copy to me, as I've dealt with encoding issues in
> software (though not PostgreSQL) quite a bit.
>
> --
> Craig Ringer
>


Attachments

Re: BUG #5661: The character encoding in logfile is confusing.

From
Craig Ringer
Date:
On 09/17/2010 01:10 PM, tkbysh2000@yahoo.co.jp wrote:

> BTW, I found a third character encoding in the file, Shift_JIS. The attached
> file includes lines in all 3 character encodings.
> For your reference:
>   Shift_JIS: Default encoding of Japanese Windows. I found this problem
>   on a PostgreSQL server which is running as a Windows service.
>   EUC_JP: A very common encoding on Japanese Unix. I guess that the
>   developer who worked on this was using some Unix or Linux.
>   UTF-8: A common encoding in Japan, especially in relation to Java. And I
>   specified it as the default encoding for all of my databases.

Thanks for that.

> I didn't edit the log file, to avoid a text editor changing some data when
> saving it. So the attached log file covers one service run from start to end.
> But the log file is very small. Total size is 7 kB.

Good plan. Thanks.

> And client code is not attached, because the messages with bad character
> encoding are the startup and shutdown messages.
> So you can find this problem easily. They are at the top and end of the log
> file.

Yes, the mismatched encodings in the data are clear and obvious.

Given that the messages are coming purely from postgresql, not client
code, I'm now wondering if what we're dealing with is mismatched
encodings in the translation files, where some messages were translated
with a different encoding to other messages.

One of the correctly encoded messages is "Unexpected EOF received on
client connection"

One of the incorrectly encoded (shift-JIS) messages is: "Fast Shutdown
request received". Another is "Aborting any active transactions".

I can find the correctly encoded messages in
   share/locale/ja/LC_MESSAGES/postgres-9.0.mo

The incorrectly encoded messages appear in the same file, but are
encoded in utf-8 in that file despite being output to the logs in
shift-JIS. For example, with the badly encoded data from the logs
extracted into the file 'x':

$ python
 >>> x = open("x").read()
 >>> x

'\x8d\x82\x91\xac\x83V\x83\x83\x83b\x83g\x83_\x83E\x83\x93\x97v\x8b\x81\x82\xf0\x8e\xf3\x82\xaf\x8e\xe6\x82\xe8\x82\xdc\x82\xb5\x82\xbd\r\n'
 >>> print x.decode("shift-jis")
高速シャットダウン要求を受け取りました

$ grep '高速シャットダウン要求を受け取りました' *
Binary file postgres-9.0.mo matches
$
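
For readers on Python 3, the same check can be sketched roughly as follows. This is an illustrative rewrite, not part of the original message: the byte string is copied from the session above, and the helper name `appears_in_file` is hypothetical. It decodes the log bytes as Shift_JIS, re-encodes the text as UTF-8, and looks for those UTF-8 bytes in a file such as the message catalog.

```python
# Bytes of the badly encoded log line, copied from the session above.
raw = (
    b"\x8d\x82\x91\xac\x83V\x83\x83\x83b\x83g\x83_\x83E\x83\x93"
    b"\x97v\x8b\x81\x82\xf0\x8e\xf3\x82\xaf\x8e\xe6\x82\xe8\x82\xdc"
    b"\x82\xb5\x82\xbd\r\n"
)

# Decode as Shift_JIS and drop the trailing CR/LF.
text = raw.decode("shift_jis").rstrip("\r\n")

# The same message as it would be stored in a UTF-8 catalog.
utf8_bytes = text.encode("utf-8")


def appears_in_file(path, needle):
    """Return True if the byte string `needle` occurs anywhere in the file."""
    with open(path, "rb") as f:
        return needle in f.read()
```

Calling `appears_in_file("postgres-9.0.mo", utf8_bytes)` from the directory holding the catalog plays the role of the `grep` step.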


So - either something in the pipeline is "helpfully" converting your
error messages, or your locale files aren't the same as mine. I doubt
the latter; it seems almost impossible that just a few messages would be
converted to shift-JIS by accident in the Windows release only. So the
question now is where the messages are converted from UTF-8 to shift-JIS
and why that conversion is being applied inconsistently.

I'll try to have a look and see what I can find.

--
Craig Ringer

Re: BUG #5661: The character encoding in logfile is confusing.

From
Tom Lane
Date:
Craig Ringer <craig@postnewspapers.com.au> writes:
> Yes, the mismatched encodings in the data are clear and obvious.

> Given that the messages are coming purely from postgresql, not client
> code, I'm now wondering if what we're dealing with is mismatched
> encodings in the translation files, where some messages were translated
> with a different encoding to other messages.

The examples you give don't seem to support that idea.  I don't read
Japanese, but at least these cases look like they are all UTF8 as
expected in the .po files.

> One of the correctly encoded messages is "Unexpected EOF received on
> client connection"

> One of the incorrectly encoded (shift-JIS) messages is: "Fast Shutdown
> request received". Another is "Aborting any active transactions".

> ... question now is where the messages are converted from UTF-8 to shift-JIS
> and why that conversion is being applied inconsistently.

Given those three examples, I wonder whether all the mis-encoded
messages are emitted by the postmaster, rather than backends.
Anyway it seems that you ought to look for some pattern in which
messages are correctly vs incorrectly encoded.
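
One way to look for such a pattern is to label each log line with the encoding it appears to use. The following is a rough sketch under stated assumptions, not something posted in the thread: the function names are hypothetical, and the order in which the candidate encodings are tried is a heuristic, since some byte sequences are valid in more than one of them.

```python
from collections import Counter

# Candidate encodings, tried in this order (a heuristic assumption).
CANDIDATES = ("utf-8", "euc_jp", "shift_jis")


def guess_encoding(line: bytes) -> str:
    """Return the first candidate encoding that decodes the line cleanly."""
    if line.isascii():
        return "ascii"
    for enc in CANDIDATES:
        try:
            line.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return "unknown"


def classify_lines(lines):
    """Count how many lines fall under each guessed encoding."""
    return Counter(guess_encoding(line) for line in lines)


def report(path):
    """Print line number, guessed encoding and text for non-UTF-8 lines."""
    with open(path, "rb") as f:
        for number, line in enumerate(f, 1):
            enc = guess_encoding(line)
            if enc in ("ascii", "utf-8"):
                continue
            shown = repr(line) if enc == "unknown" else line.decode(enc).rstrip()
            print(number, enc, shown)
```

Running `report` on the attached log would list the lines that are not UTF-8, which could then be compared against which process emitted them.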

            regards, tom lane