Re: BUG #6742: pg_dump doesn't convert encoding of DB object names to OS encoding

From Alexander Law
Subject Re: BUG #6742: pg_dump doesn't convert encoding of DB object names to OS encoding
Date
Msg-id 500FDE6F.3060202@gmail.com
Replies Re: Re: BUG #6742: pg_dump doesn't convert encoding of DB object names to OS encoding
List pgsql-hackers
Hello,
I would like to fix this bug, but it looks like it will not be a one-line patch.
Looking at the pg_dump code I see that the object names come through the following chain:
1. pg_dump executes 'SELECT c.tableoid, c.oid, c.relname, ... ' and gets the object name in the encoding chosen for the db connection/dump.
2. It invokes write_msg or a similar function:
    write_msg(NULL, "finding the columns and types of table \"%s\"\n", tbinfo->dobj.name);
3. vwrite_msg localizes the message text, but not the argument(s):
    vfprintf(stderr, _(fmt), ap);
Here gettext (_) internally converts fmt to the OS encoding (if that differs from UTF-8, the encoding of the localized strings).
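
To make step 3 concrete, here is a minimal C sketch of that call pattern. It is not pg_dump code: fake_translate and format_msg are hypothetical stand-ins for gettext's _() and for write_msg/vwrite_msg, and the sketch formats into a buffer rather than stderr so the result can be inspected. It only illustrates that the format string is replaced by its translation while the bytes of the %s argument are copied through unchanged.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for gettext's _(): returns a "translated" format
 * string. In pg_dump the translation would come from the message catalog.
 */
const char *
fake_translate(const char *fmt)
{
    if (strcmp(fmt, "table \"%s\"\n") == 0)
        return "TABLE \"%s\"\n";    /* pretend this text is in the OS encoding */
    return fmt;
}

/*
 * Hypothetical analogue of write_msg/vwrite_msg that writes into a buffer.
 * Returns what vsnprintf returns.
 */
int
format_msg(char *buf, size_t buflen, const char *fmt, ...)
{
    va_list ap;
    int     n;

    va_start(ap, fmt);
    /* Only the format goes through translation; the arguments do not. */
    n = vsnprintf(buf, buflen, fake_translate(fmt), ap);
    va_end(ap);
    return n;
}
```

If format_msg is called with an object name holding UTF-8 bytes, the buffer ends up with the translated text followed by those same UTF-8 bytes, whatever encoding the translation itself is in. That mix is what shows up as gibberish on a console that expects a single encoding.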

And I can see only a few solutions to the problem:
1. Convert the object name in the backend, i.e. modify all the similar SELECTs as:
'SELECT c.tableoid, c.oid, c.relname, convert_to(c.relname, 'OS_ENCODING') AS locrelname, ...'
and then call write_msg(NULL, "finding the columns and types of table \"%s\"\n", tbinfo->dobj.local_name);
The downside of this approach is that it requires rewriting all the SELECTs for all the objects. It also doesn't help with any other text written out from the backend, such as localized backend errors.

2. Set up another connection to the backend with the OS encoding and get all the object names through it. It looks insane too. And we have the same problem with the localized backend errors coming in on the "main" connection.

3. Write a convert_to_os_encoding(text, encoding) function for the frontend utilities. Unfortunately the frontend can't use the internal PostgreSQL conversion functions, and modifying them to be usable through libpq looks infeasible.
So the only way to implement such a function is to use another encoding conversion framework (library).
And my question is: is it possible to include libiconv (add this dependency) in the frontend utilities code?

4. Force users to use the OS encoding as the database encoding, or to avoid non-ASCII characters in db object names and to disable NLS on Windows completely. It doesn't look like a solution at all.
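
For option 3, here is a minimal sketch of what such a frontend helper might look like on top of the POSIX iconv interface (iconv_open, iconv, iconv_close). It is an assumption for illustration, not an existing PostgreSQL function: unlike the two-argument form named above, it takes explicit source and target encoding names, the names accepted by iconv_open vary between platforms, and on Windows these functions would have to come from libiconv.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: convert a NUL-terminated string from from_enc to
 * to_enc. Returns a malloc'ed string that the caller must free, or NULL
 * if the conversion is unsupported, the input is invalid, or memory runs out.
 */
char *
convert_to_os_encoding(const char *text, const char *from_enc,
                       const char *to_enc)
{
    iconv_t cd;
    size_t  inleft;
    size_t  outleft;
    char   *in;
    char   *out;
    char   *result;

    cd = iconv_open(to_enc, from_enc);
    if (cd == (iconv_t) -1)
        return NULL;

    inleft = strlen(text);
    outleft = inleft * 4;           /* assumed generous upper bound */
    result = malloc(outleft + 1);   /* +1 for the terminating NUL */
    if (result == NULL)
    {
        iconv_close(cd);
        return NULL;
    }

    in = (char *) text;             /* iconv's prototype wants char ** */
    out = result;

    /* Convert the input, then flush any pending shift state. */
    if (iconv(cd, &in, &inleft, &out, &outleft) == (size_t) -1 ||
        iconv(cd, NULL, NULL, &out, &outleft) == (size_t) -1)
    {
        free(result);
        iconv_close(cd);
        return NULL;
    }

    *out = '\0';
    iconv_close(cd);
    return result;
}
```

A caller would convert the object name before passing it to write_msg, for example convert_to_os_encoding(name, "UTF-8", "CP1251"), and fall back to the original string when NULL is returned. How the frontend would learn the OS (console) encoding is left open here.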

BTW, it's not the only instance of the issue. For example, when I try to use vacuumdb, I get completely unreadable messages:
http://oi48.tinypic.com/1c8j9.jpg
(blue marks what is in Russian or English, all the other text is gibberish).

Best regards,
Alexander


18.07.2012 12:51, Alexander Law wrote:
Hello,

The dump file itself is correct. The issue is only with the non-ASCII object names in pg_dump messages.
The message text (which is non-ASCII too) is displayed consistently in the right encoding (i.e. the OS encoding, thanks to libintl/gettext), but the encoding of db object names depends on the dump encoding, so they become unreadable when a different encoding is used.
The same can be reproduced on Linux (where the console encoding is UTF-8) when dumping with Windows-1251 or Latin1 (for Western European languages).

Thanks,
Alexander


The following bug has been logged on the website:

Bug reference:      6742
Logged by:          Alexander LAW
Email address:      exclusion(at)gmail(dot)com
PostgreSQL version: 9.1.4
Operating system:   Windows
Description:

When I try to dump a database with UTF-8 encoding on Windows, I get unreadable
object names.
Please look at the screenshot (http://oi50.tinypic.com/2lw6ipf.jpg). In the
left window all the pg_dump messages are displayed correctly (except for the
password prompt (bug #6510)), but the non-ASCII object name is gibberish. In
the right window (where the dump is done with the Windows-1251 encoding, the
OS encoding for the Russian locale) everything is right.

Did you check the dump file using an editor that can handle UTF-8?
The Windows console is not known for properly handling that encoding.

Thomas




