2011/1/25 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
> On Sat, Jan 15, 2011 at 02:25, Hitoshi Harada <umi.tanuki@gmail.com> wrote:
>> The patch overrides client_encoding by the added ENCODING option, and
>> restores it as soon as copy is done.
>
> We cannot do that because error messages should be encoded in the original
> encoding even during COPY commands with encoding option. Error messages
> could contain non-ASCII characters if lc_messages is set.
Agreed.
>> I see some complaints ask to use
>> pg_do_encoding_conversion() instead of
>> pg_client_to_server/server_to_client(), but the former will surely add
>> slight overhead per reading line
>
> If we want to reduce the overhead, we should cache the conversion procedure
> in CopyState. How about adding something like "FmgrInfo file_to_server_covv"
> into it?
I looked into the code and found that we cannot pass FmgrInfo * to
any function declared in pg_wchar.h, since that header file is shared
with libpq, too.
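To make the caching idea concrete, here is a minimal standalone sketch. It is
not PostgreSQL code: conv_proc, lookup_conversion(), CopyStateSketch and
convert_line() are all hypothetical names, and a plain function pointer stands
in for FmgrInfo so that nothing backend-only leaks into a shared header. The
point is only that the lookup happens once per COPY, not once per line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a conversion procedure; NOT the real FmgrInfo. */
typedef void (*conv_proc)(const char *src, char *dst, size_t dstlen);

/* Counts how often the (pretend) expensive lookup runs. */
static int lookup_count = 0;

/* Dummy "conversion": copies the input, truncating to fit dst. */
static void conv_identity(const char *src, char *dst, size_t dstlen)
{
    if (dstlen == 0)
        return;
    strncpy(dst, src, dstlen - 1);
    dst[dstlen - 1] = '\0';
}

/* Hypothetical lookup; in the backend this would be the costly step. */
static conv_proc lookup_conversion(const char *from, const char *to)
{
    (void) from;
    (void) to;
    lookup_count++;
    return conv_identity;
}

/* Hypothetical per-COPY state that caches the resolved procedure. */
typedef struct CopyStateSketch
{
    const char *file_encoding;
    const char *server_encoding;
    conv_proc   file_to_server;   /* NULL until the first line is converted */
} CopyStateSketch;

/* Convert one input line, resolving the procedure only on first use. */
static void convert_line(CopyStateSketch *cs, const char *line,
                         char *out, size_t outlen)
{
    assert(cs != NULL);
    if (cs->file_to_server == NULL)
        cs->file_to_server = lookup_conversion(cs->file_encoding,
                                               cs->server_encoding);
    cs->file_to_server(line, out, outlen);
}
```

With this shape, the per-line cost is one NULL check plus the conversion
itself. Whether the real conversion call can be reached this way without
touching pg_wchar.h is exactly the open question.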
For the record, I also tried pg_do_encoding_conversion() instead of
pg_client_to_server/server_to_client(), and a simple benchmark shows
that it is too slow.
COPY FROM with 3000000 lines of 3 columns each (~22MB TSV), three runs per case:
*utf8 -> utf8 (no conversion)
13428.233ms
13322.832ms
15661.093ms
*euc_jp -> utf8 (client_encoding)
17527.470ms
16457.452ms
16522.337ms
*euc_jp -> utf8 (pg_do_encoding_conversion)
20550.983ms
21425.313ms
20774.323ms
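To summarize the runs above: the averages come out to roughly 14.1s (no
conversion), 16.8s (client_encoding) and 20.9s (pg_do_encoding_conversion),
so the pg_do_encoding_conversion() variant is about 24% slower than the
client_encoding variant. The small C snippet below just redoes that
arithmetic; the helper names mean_ms() and overhead_pct() are made up for
this mail.

```c
#include <assert.h>
#include <stddef.h>

/* Timings in milliseconds, copied from the three cases above. */
static const double RUNS_NO_CONV[]    = { 13428.233, 13322.832, 15661.093 };
static const double RUNS_CLIENT_ENC[] = { 17527.470, 16457.452, 16522.337 };
static const double RUNS_PG_DO_CONV[] = { 20550.983, 21425.313, 20774.323 };

/* Arithmetic mean of n timings. */
static double mean_ms(const double *runs, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += runs[i];
    return sum / (double) n;
}

/* How much slower other_ms is than base_ms, in percent. */
static double overhead_pct(double base_ms, double other_ms)
{
    return (other_ms / base_ms - 1.0) * 100.0;
}
```

For example, overhead_pct(mean_ms(RUNS_CLIENT_ENC, 3), mean_ms(RUNS_PG_DO_CONV, 3))
gives the figure quoted above. Note that three runs per case is a small
sample, and the third no-conversion run is a visible outlier.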
I'll look at the code further to see whether we have better alternatives.
Regards,
--
Hitoshi Harada