Обсуждение: TODO: Expose parser support for decoding unicode escape literals to user
Hi all I just noticed a Stack Overflow question (http://stackoverflow.com/q/20124393/398670) where someone's asking how to decode '\u0000` style escapes *stored in database text fields* into properly encoded text strings. The parser supports this for escape-strings, and you can write E'\u011B' to get 'ě' because of http://postgresql.1045698.n5.nabble.com/Unicode-escapes-in-literals-td1992313.html. I don't see this exposed in a way that users can call directly, though. 'decode(bytea, text)' has the 'escape' input, but it expects octal. It's possible to use PL/PgSQL's 'EXECUTE' to use the parser to do the work, but that's downright awful. Am I missing something obvious, or is this something that'd be a good new-developer TODO? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 05/15/2014 01:31 AM, Craig Ringer wrote: > Hi all > > I just noticed a Stack Overflow question > (http://stackoverflow.com/q/20124393/398670) where someone's asking how > to decode '\u0000` style escapes *stored in database text fields* into > properly encoded text strings. > > The parser supports this for escape-strings, and you can write E'\u011B' > to get 'ě' because of > http://postgresql.1045698.n5.nabble.com/Unicode-escapes-in-literals-td1992313.html. > > I don't see this exposed in a way that users can call directly, though. > 'decode(bytea, text)' has the 'escape' input, but it expects octal. > > It's possible to use PL/PgSQL's 'EXECUTE' to use the parser to do the > work, but that's downright awful. > > Am I missing something obvious, or is this something that'd be a good > new-developer TODO? > Not sure if this is what you want?: test=> SELECT quote_literal(E'test \u011B'); quote_literal --------------- 'test ě' -- Adrian Klaver adrian.klaver@aklaver.com
Re: TODO: Expose parser support for decoding unicode escape literals to user
От
David G Johnston
Дата:
Adrian Klaver-4 wrote > On 05/15/2014 01:31 AM, Craig Ringer wrote: >> Hi all >> >> I just noticed a Stack Overflow question >> (http://stackoverflow.com/q/20124393/398670) where someone's asking how >> to decode '\u0000` style escapes *stored in database text fields* into >> properly encoded text strings. >> >> The parser supports this for escape-strings, and you can write E'\u011B' >> to get 'ě' because of >> http://postgresql.1045698.n5.nabble.com/Unicode-escapes-in-literals-td1992313.html. >> >> I don't see this exposed in a way that users can call directly, though. >> 'decode(bytea, text)' has the 'escape' input, but it expects octal. >> >> It's possible to use PL/PgSQL's 'EXECUTE' to use the parser to do the >> work, but that's downright awful. >> >> Am I missing something obvious, or is this something that'd be a good >> new-developer TODO? >> > > Not sure if this is what you want?: > > test=> SELECT quote_literal(E'test \u011B'); > quote_literal > --------------- > 'test ě' Except the data is already in the database and there is no way to put an "E" in front of a column name and cause PostgreSQL to process the escapes embedded in the column's value in the same way it processes a literal. WITH src (txt) AS ( VALUES ('A \u011B C') ) SELECT txt FROM src; Hence the need for a function to perform the same process that the parser performs when dealing with literals. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/TODO-Expose-parser-support-for-decoding-unicode-escape-literals-to-user-tp5804012p5804042.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Re: Re: TODO: Expose parser support for decoding unicode escape literals to user
От
Adrian Klaver
Дата:
On 05/15/2014 07:13 AM, David G Johnston wrote: > Adrian Klaver-4 wrote >> On 05/15/2014 01:31 AM, Craig Ringer wrote: >>> Hi all >>> >>> I just noticed a Stack Overflow question >>> (http://stackoverflow.com/q/20124393/398670) where someone's asking how >>> to decode '\u0000` style escapes *stored in database text fields* into >>> properly encoded text strings. >>> >>> The parser supports this for escape-strings, and you can write E'\u011B' >>> to get 'ě' because of >>> http://postgresql.1045698.n5.nabble.com/Unicode-escapes-in-literals-td1992313.html. >>> >>> I don't see this exposed in a way that users can call directly, though. >>> 'decode(bytea, text)' has the 'escape' input, but it expects octal. >>> >>> It's possible to use PL/PgSQL's 'EXECUTE' to use the parser to do the >>> work, but that's downright awful. >>> >>> Am I missing something obvious, or is this something that'd be a good >>> new-developer TODO? >>> >> >> Not sure if this is what you want?: >> >> test=> SELECT quote_literal(E'test \u011B'); >> quote_literal >> --------------- >> 'test ě' > > Except the data is already in the database and there is no way to put an "E" > in front of a column name and cause PostgreSQL to process the escapes > embedded in the column's value in the same way it processes a literal. Yea, that is a problem. > > WITH src (txt) AS ( VALUES ('A \u011B C') ) > SELECT txt FROM src; > > Hence the need for a function to perform the same process that the parser > performs when dealing with literals. > > David J. > -- Adrian Klaver adrian.klaver@aklaver.com
On 05/15/2014 09:56 PM, Adrian Klaver wrote:
>
> test=> SELECT quote_literal(E'test \u011B');
> quote_literal
> ---------------
> 'test ě'
That's another case where the function isn't doing what you expect.
quote_literal has nothing to do with what's happening, it's
escape-string processing in the parser doing the work. Compare:
regress=> SELECT 'test \u011B';
?column?
-------------
test \u011B
(1 row)
regress=> SELECT E'test \u011B';
?column?
----------
test ě
(1 row)
now, the problem posed is if you had this:
regress=> CREATE TABLE test AS SELECT TEXT 'test \u011B' dummy;
SELECT 1
regress=> SELECT * FROM test;
dummy
-------------
test \u011B
(1 row)
how would you get 'test ě' ?
The parser can do it, but I don't think anyone would consider this an
acceptable solution to this problem (anybody reading this, UNDER NO
CIRCUMSTANCES USE THIS FUNCTION, EVER):
regress=> CREATE OR REPLACE FUNCTION ohmygod(text) RETURNS text AS $$
DECLARE
retval text;
BEGIN
-- If you use this in real code, I hate you
EXECUTE 'SELECT E'''||$1||''';' INTO retval;
RETURN retval;
END;
$$ LANGUAGE plpgsql;
CREATE FUNCTION
regress=> SELECT ohmygod(dummy) FROM test;
ohmygod
---------
test ě
(1 row)
It'd be nice to expose this capability to users without requiring that
kind of horror.
Hence: exposing parser support for decoding unicode escape literals to
the user.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 05/16/2014 04:12 AM, Craig Ringer wrote:
> On 05/15/2014 09:56 PM, Adrian Klaver wrote:
>>
>> test=> SELECT quote_literal(E'test \u011B');
>> quote_literal
>> ---------------
>> 'test ě'
>
> That's another case where the function isn't doing what you expect.
> quote_literal has nothing to do with what's happening, it's
> escape-string processing in the parser doing the work. Compare:
>
> regress=> SELECT 'test \u011B';
> ?column?
> -------------
> test \u011B
> (1 row)
>
> regress=> SELECT E'test \u011B';
> ?column?
> ----------
> test ě
> (1 row)
Davids comments and some playing around with strings showed me what was
going on and the error of my ways.
>
> now, the problem posed is if you had this:
>
> regress=> CREATE TABLE test AS SELECT TEXT 'test \u011B' dummy;
> SELECT 1
> regress=> SELECT * FROM test;
> dummy
> -------------
> test \u011B
> (1 row)
>
> how would you get 'test ě' ?
>
>
> The parser can do it, but I don't think anyone would consider this an
> acceptable solution to this problem (anybody reading this, UNDER NO
> CIRCUMSTANCES USE THIS FUNCTION, EVER):
>
>
> regress=> CREATE OR REPLACE FUNCTION ohmygod(text) RETURNS text AS $$
> DECLARE
> retval text;
> BEGIN
> -- If you use this in real code, I hate you
> EXECUTE 'SELECT E'''||$1||''';' INTO retval;
> RETURN retval;
> END;
> $$ LANGUAGE plpgsql;
> CREATE FUNCTION
Actually I started down this dark path also, before I was interrupted by
something else:)
>
> regress=> SELECT ohmygod(dummy) FROM test;
> ohmygod
> ---------
> test ě
> (1 row)
>
>
>
> It'd be nice to expose this capability to users without requiring that
> kind of horror.
>
> Hence: exposing parser support for decoding unicode escape literals to
> the user.
Yes, so as an example something like:
decode_u('test \u011B')
that would decode the escaped values automatically.
>
--
Adrian Klaver
adrian.klaver@aklaver.com