Re: Careful PL/Perl Release Not Required

From: David E. Wheeler
Subject: Re: Careful PL/Perl Release Not Required
Msg-id: 0DA44369-C0F1-4C9D-A158-48688D37A6CC@kineticode.com
In reply to: Re: Careful PL/Perl Release Not Required (Alex Hunsaker <badalex@gmail.com>)
List: pgsql-hackers

On Feb 11, 2011, at 9:44 AM, Alex Hunsaker wrote:

> It is decoded... the input string "%C3%A9" actually is the _same_
> string utf-8, latin1 and SQL_ASCII decoded or not. Those are all ascii
> characters. Calling utf8::decode("%C3%A9") is essentially a noop.

No, it's not decoded. It doesn't matter here because they're all ASCII bytes. But if the utf8 flag isn't set, it's not decoded.
It's just byte soup as far as Perl is concerned. Unless I grossly misunderstand something, which is entirely possible.
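
To make the flag distinction concrete, here is a minimal sketch using only core Perl. The variable names are made up for illustration, and it uses the two raw bytes that "%C3%A9" stands for rather than the escaped ASCII form:

```perl
use strict;
use warnings;

# Two raw bytes: the UTF-8 encoding of U+00E9 (e with acute accent).
my $bytes = "\xC3\xA9";
my $len_before  = length $bytes;                    # counts bytes
my $flag_before = utf8::is_utf8($bytes) ? 1 : 0;    # flag is off

# Decode a copy in place: Perl now treats the value as one character.
my $chars = $bytes;
utf8::decode($chars);
my $len_after  = length $chars;
my $flag_after = utf8::is_utf8($chars) ? 1 : 0;

print "before: length=$len_before flag=$flag_before\n";   # length=2 flag=0
print "after:  length=$len_after flag=$flag_after\n";     # length=1 flag=1
```

Before the decode, Perl sees two unrelated bytes. Afterwards it sees a single character with the utf8 flag on.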

> Ok, I think i figured out why we seem to be talking past each other, we have:
> CREATE OR REPLACE FUNCTION url_decode(Vkw varchar) RETURNS varchar  AS $$
> use strict;
> use URI::Escape;
> utf8::decode($_[0]);
> return uri_unescape($_[0]); $$ LANGUAGE plperlu;
>
> That *looks* like it is decoding the input string, which it is, but
> actually that will double utf8 encode your string. It does not seem to
> in this case because we are dealing with all ascii input. The trick
> here is its also telling perl to decode/treat the *output* string as
> utf8.
>
> uri_unescape() returns the same string you passed in, which thanks to
> the utf8::decode() above has the utf8 flag set. Meaning we end up
> treating it as 1 character instead of two. Or basically that it has
> the same effect as calling utf8::decode() on the return value.
>
> The correct way to write that function pre 9.1 and post 9.1 would be
> (in a utf8 database):
> CREATE OR REPLACE FUNCTION url_decode(Vkw varchar) RETURNS varchar  AS $$
> use strict;
> use URI::Escape;
> my $str = uri_unescape($_[0]);
> utf8::decode($str);
> return $str;
> $$ LANGUAGE plperlu;
>
> The last utf8::decode being optional (as we said, it might not be
> utf8), but granting the sought behavior by the op.

No. If the argument to PL/Perl has the utf8 flag set, then that's what you always get. The utf8::decode() isn't
necessary because it's already decoded:

> perl -MURI::Escape -MEncode -E 'say utf8::is_utf8(uri_unescape(Encode::decode_utf8("“hi”")))'
1
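
The same check can be sketched as a script that needs only core modules. Note that percent_unescape() below is a hypothetical stand-in for uri_unescape(), written on the assumption that unescaping is a plain substitution of each %XX with chr(0xXX); it is not URI::Escape's actual code:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for URI::Escape's uri_unescape(): assumes a
# plain substitution of each %XX escape with the character chr(0xXX).
sub percent_unescape {
    my ($str) = @_;
    $str =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return $str;
}

# UTF-8 bytes for the curly-quoted string used in the one-liner above.
my $octets  = "\xE2\x80\x9Chi\xE2\x80\x9D";
my $decoded = Encode::decode_utf8($octets);   # character string, flag on

my $result = percent_unescape($decoded);
my $flag   = utf8::is_utf8($result) ? 1 : 0;
my $length = length $result;                  # four characters, not eight bytes

print "flag=$flag length=$length\n";          # flag=1 length=4
```

The input here contains no percent escapes, just as in the one-liner, so this only shows that a string that went in decoded comes back out decoded.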

Best,

David
