Re: How to convert HEX to ASCII?

Поиск
Список
Период
Сортировка
От Marti Raudsepp
Тема Re: How to convert HEX to ASCII?
Дата
Msg-id CABRT9RDai1xCvvJB28ycA=jF43vO1rANLjbq-qNBc0YrKSHBAw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: How to convert HEX to ASCII?  (Merlin Moncure <mmoncure@gmail.com>)
Ответы Re: How to convert HEX to ASCII?
Re: How to convert HEX to ASCII?
Re: How to convert HEX to ASCII?
Список pgsql-general
On Fri, Dec 2, 2011 at 16:16, Torsten Zuehlsdorff
<foo@meisterderspiele.de> wrote:
> But i clearly have a missunderstanding of other chars, like umlauts or utf-8
> chars. This, for example, should return a 'ö':
>
> # SELECT chr(x'C3B6'::int);
>  chr
> -----
>  쎶
> (1 row)

That gives you the Unicode codepoint C3B6, but %C3%B6 is UTF-8-encoded
and actually decodes to the codepoint 00F6.

There is a fundamental problem that a decoded URL may actually be a
binary string -- it might not have a textual representation at all.
But if text is what you want, RFC3986 strongly suggests using UTF-8
for encoding text strings in URLs, and that works almost always in the
real world.

So the *right* way is to first convert the URL to a binary "bytea"
type by fixing all the % escapes, then convert that to UTF-8 encoding
to handle multibyte characters.

What I came up with is far from elegant because PostgreSQL lacks
convenient functions for bytea manipulation (no bytea_agg, etc).
Stealing a little from Merlin, this is what it looks like:

CREATE OR REPLACE FUNCTION url_decode(input text) RETURNS text
LANGUAGE plpgsql IMMUTABLE STRICT AS $$
DECLARE
  bin bytea = '';
  byte text;
BEGIN
  FOR byte IN (select (regexp_matches(input, '(%..|.)', 'g'))[1]) LOOP
    IF length(byte) = 3 THEN
      bin = bin || decode(substring(byte, 2, 2), 'hex');
    ELSE
      bin = bin || byte::bytea;
    END IF;
  END LOOP;
  RETURN convert_from(bin, 'utf8');
END
$$;

db=# select url_decode('Hell%C3%B6%20World%21');
  url_decode
--------------
 Hellö World!

db=# select url_decode('%EC%8E%B6');
 url_decode
------------
 쎶

This will break for binary-encoded data in URLs, though.
db=# select url_decode('%fa%fa%fa');
ERROR:  invalid byte sequence for encoding "UTF8": 0xfa
CONTEXT:  PL/pgSQL function "url_decode" line 13 at RETURN

----

On Fri, Dec 2, 2011 at 17:46, Merlin Moncure <mmoncure@gmail.com> wrote:
> set client_encoding to latin1;

> postgres=# select unencode('Hell%C3%B6%20World%21');
>   unencode
> ---------------
>  Hellö World!
> (1 row)

Sorry, but AFAICT this makes a mess of encodings and only works by
pure luck. The server thinks it's sending the client LATIN1 text, but
it's actually UTF8-encoded and the last decoding step is done by your
terminal.

Regards,
Marti

В списке pgsql-general по дате отправления:

Предыдущее
От: "Gavin Mitchell"
Дата:
Сообщение: sql statements using access frontend
Следующее
От: Merlin Moncure
Дата:
Сообщение: Re: How to convert HEX to ASCII?