Discussion: Regex match not back-referencing in function
Hi,
Could someone explain the following behaviour?
SELECT regexp_replace(E'Hello & goodbye ',E'([&])','&#' ||
ascii(E'\\1') || E';\\1');
This returns:
regexp_replace
------------------------
Hello &#92;& goodbye
(1 row)
So it matched:
SELECT chr(92);
chr
-----
\
(1 row)
But notice that when I append the value it's supposed to have matched
to the end of the replacement value, it shows it should be '&'.
Just to confirm:
SELECT ascii('&');
ascii
-------
38
(1 row)
So I'd expect the output of the original statement to be:
regexp_replace
------------------------
Hello &#38;& goodbye
(1 row)
What am I missing?
--
Thom
Thom Brown <thom@linux.com> writes:
> What am I missing?
I might be more confused than you, but I think you're supposing that
the result of ascii(E'\\1') has something to do with the match that
the surrounding regexp_replace function will find, later on when it
gets executed. The actual arguments seen by regexp_replace are
regression=# select E'Hello & goodbye ',E'([&])','&#' ||
ascii(E'\\1') || E';\\1';
?column? | ?column? | ?column?
------------------+----------+----------
Hello & goodbye | ([&]) | &#92;\1
(1 row)
and given that, the result looks perfectly fine to me.
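To make Tom's point concrete, the replacement argument can be evaluated on its own and then passed to regexp_replace as a plain literal. This is a minimal sketch, assuming an ordinary PostgreSQL session. It shows that the function only ever sees the finished string `&#92;\1`, and that only the `\1` inside it is tied to the match:

```sql
-- The third argument is an ordinary expression, evaluated once, before
-- regexp_replace runs. E'\\1' is just the two characters '\' and '1'.
SELECT '&#' || ascii(E'\\1') || E';\\1' AS replacement;
-- replacement = '&#92;\1'

-- Passing that string literally gives the same result as the original query:
-- '&#92;' is copied as-is and \1 is replaced by the captured '&'.
SELECT regexp_replace(E'Hello & goodbye ', E'([&])', E'&#92;\\1');
-- 'Hello &#92;& goodbye '
```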
If there's a bug here, it's that ascii() ignores additional bytes in its
input instead of throwing an error for a string with more than one
character. But I believe we've discussed that in the past and decided
not to change it.
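If an error is preferred over silently using only the first character, a small wrapper can enforce single-character input. This is a sketch of a hypothetical helper: `strict_ascii` is not a PostgreSQL built-in, and the error text is illustrative.

```sql
-- Hypothetical helper, not part of PostgreSQL: like ascii(), but raises an
-- error instead of silently ignoring everything after the first character.
CREATE OR REPLACE FUNCTION strict_ascii(s text) RETURNS integer
LANGUAGE plpgsql IMMUTABLE STRICT AS $$
BEGIN
  IF length(s) <> 1 THEN
    RAISE EXCEPTION 'strict_ascii: expected one character, got % (%)',
                    length(s), s;
  END IF;
  RETURN ascii(s);
END;
$$;

SELECT ascii(E'\\1');      -- 92: the trailing '1' is silently ignored
SELECT strict_ascii('&');  -- 38
-- strict_ascii(E'\\1') would raise an exception, because E'\\1' has 2 characters
```

With this helper, the mistake in the original query would have surfaced as an error rather than as a plausible-looking 92.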
regards, tom lane
On Feb 12, 2012, at 13:26, Thom Brown <thom@linux.com> wrote:
> What am I missing?
The "ASCII" function call is evaluated independently of, and before, the regexp_replace function call, so E'\\1'
has no special meaning there. It only has special meaning inside the regexp_replace function.
Try just evaluating ascii(E'\\1') by itself and confirm you get "92".
David J.
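Because regexp_replace substitutes `\1` textually and cannot call a function once per match, getting the output Thom expected needs a different approach. One possible sketch, assuming the pattern matches single characters (as `[&]` does) and PostgreSQL 9.3 or later, walks the string one character at a time. It encodes the matching characters and reassembles the rest in order. This is an illustrative workaround, not how regexp_replace itself works:

```sql
-- Sketch: emulate "call ascii() on each match" by walking the string
-- one character at a time. Only suits single-character patterns like [&].
SELECT string_agg(
         CASE WHEN ch ~ '[&]'
              THEN '&#' || ascii(ch) || ';' || ch  -- encode, then keep the char
              ELSE ch
         END,
         '' ORDER BY pos) AS result
FROM (SELECT pos, substr(s, pos, 1) AS ch
      FROM (VALUES (E'Hello & goodbye ')) AS v(s),
           generate_series(1, length(s)) AS pos) AS chars;
-- result = 'Hello &#38;& goodbye '
```

Unlike regexp_replace without the 'g' flag, this encodes every matching character. The two agree here only because the input contains a single '&'.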
On 12 February 2012 18:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thom Brown <thom@linux.com> writes:
>> What am I missing?
>
> I might be more confused than you, but I think you're supposing that
> the result of ascii(E'\\1') has something to do with the match that
> the surrounding regexp_replace function will find, later on when it
> gets executed.

Okay, in that case I made the wrong assumptions about order of resolution.

Thanks

--
Thom