Re: pl/perl and utf-8 in sql_ascii databases

From Alvaro Herrera
Subject Re: pl/perl and utf-8 in sql_ascii databases
Date
Msg-id 1342201377-sup-3678@alvh.no-ip.org
In response to Re: [SPAM] [MessageLimit][lowlimit] Re: pl/perl and utf-8 in sql_ascii databases  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: pl/perl and utf-8 in sql_ascii databases  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
Excerpts from Kyotaro HORIGUCHI's message of Thu Jul 12 00:09:19 -0400 2012:
>
> Hmm... Sorry for the immature patch.

No need to apologize.

> > ... and this story hasn't ended yet, because one of the new tests is
> > failing.  See here:
> >
> > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=magpie&dt=2012-07-11%2010%3A00%3A04
> >
> > The interesting part of the diff is:
> ...
> >   SELECT encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea, 'escape')
> > ! ERROR:  character with byte sequence 0xe5 0xb7 0x9d in encoding "UTF8" has no equivalent in encoding "LATIN1"
> > ! CONTEXT:  PL/Perl function "perl_utf_inout"
> >
> >
> > I am not sure what we can do here other than remove this function and
> > query from the test.
>
> I've run the regression tests only in environments capable of handling
> the character U+5ddd (a Japanese character which means river)...
>
> The byte sequences which can be decoded and the result byte
> sequences of encoding from a unicode character vary among the
> encodings.

Right.  I only ran the test in C and UTF8, not Latin1, so I didn't see
it fail either.
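
A minimal Python sketch of why the outcome depends on the encoding, assuming
U+5DDD is the character involved, as the quoted message states. The helper
name try_encode and the choice of euc_jp as a third encoding are made up for
illustration, and Python's codecs only stand in for the server-side
conversion; they are not PostgreSQL's conversion routines.

```python
# U+5DDD is the character mentioned in the quoted message ("river").
RIVER = "\u5ddd"


def try_encode(text, encoding):
    """Return the encoded bytes as a hex string, or None when the
    target encoding has no representation for the text."""
    try:
        return text.encode(encoding).hex()
    except UnicodeEncodeError:
        return None


# Hypothetical set of target encodings, chosen only for illustration.
results = {enc: try_encode(RIVER, enc) for enc in ("utf-8", "latin-1", "euc_jp")}

for enc, hexbytes in results.items():
    print(enc, hexbytes if hexbytes is not None else "no equivalent")
```

The UTF-8 result is the byte sequence 0xe5 0xb7 0x9d from the error message
above, and Latin1 has no equivalent, which is the failure magpie reported.
An encoding that can represent the character yields yet another byte
sequence, so a single expected file cannot cover every database encoding.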

> The problem itself which is the aim of this thread could be
> covered without the additional test. That confirms if
> encoding/decoding is done as expected on calling the language
> handler.

Right.

> I suppose that testing the two cases and one additional
> case which runs pg_do_encoding_conversion(), say latin1,
> would be enough to confirm that encoding/decoding is properly
> done, since the concrete conversion scheme is not significant
> in this case.
>
> So I recommend that we add the test for latin1 and omit
> the test for encodings other than sql_ascii, utf8 and latin1.
> This might be achieved by creating empty plperl_lc.sql and
> plperl_lc.out files for those encodings.
>
> What do you think about that?

I think that's probably too much engineering for something that doesn't
really warrant it.  A real solution to this problem could be to create
yet another new test file containing just this function definition and
the query that calls it, and have one expected file for each encoding;
but that's too much work and too many files, I'm afraid.

I can see us supporting tests that require a small number of expected
files.  No Make tricks with file copying, though.  If we can't get
some easy way to test this without that, I submit we should just remove
the test.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

