Thread: why non-greedy modifier for one atom changes greediness of other atoms?
why non-greedy modifier for one atom changes greediness of other atoms?
From
hubert depesz lubaczewski
Date:
Example:
# select x, substring( x from E'^((.*?)(\\.[0-9]+))') from ( values ('ab.123xxx.46hfd'),('a.b.c.d.123xx')) as q (x);
x | substring
-----------------+-----------
ab.123xxx.46hfd | ab.1
a.b.c.d.123xx | a.b.c.d.1
(2 rows)
I found in the docs that this is what happens, but I don't understand the
logic behind forcing a single greediness on the whole expression.
Also - how can one write a regexp that will match "ab.123" and
"a.b.c.d.123" respectively?
In PL/Perl it's of course trivial, but I can't seem to find a way to do it in substring() regexps.
Best regards,
depesz
--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
Re: why non-greedy modifier for one atom changes greediness of other atoms?
From
hubert depesz lubaczewski
Date:
On Mon, Jan 04, 2010 at 11:30:51AM +0100, hubert depesz lubaczewski wrote:
> Example:
> # select x, substring( x from E'^((.*?)(\\.[0-9]+))') from ( values ('ab.123xxx.46hfd'),('a.b.c.d.123xx')) as q (x);
> x | substring
> -----------------+-----------
> ab.123xxx.46hfd | ab.1
> a.b.c.d.123xx | a.b.c.d.1
> (2 rows)
>
>
> I found in the docs that this is what happens, but I don't understand the
> logic behind forcing a single greediness on the whole expression.
>
> Also - how can one write a regexp that will match "ab.123" and
> "a.b.c.d.123" respectively?
Sorry, that may have been unclear. For the string 'ab123bc.12xx' the
return value should be 'ab123bc.12', i.e. we have to search for the first "."
followed by digits and return everything from the beginning of the string to
the last of those digits.
Best regards,
depesz
hubert depesz lubaczewski wrote:
>> Example:
>> # select x, substring( x from E'^((.*?)(\\.[0-9]+))') from ( values ('ab.123xxx.46hfd'),('a.b.c.d.123xx')) as q (x);
>> x | substring
>> -----------------+-----------
>> ab.123xxx.46hfd | ab.1
>> a.b.c.d.123xx | a.b.c.d.1
>> (2 rows)
>>
>>
>> I found in the docs that this is what happens, but I don't understand the
>> logic behind forcing a single greediness on the whole expression.
Yes, that's odd.
>> Also - how can one write a regexp that will match "ab.123" and
>> "a.b.c.d.123" respectively?
>
>
> Sorry, that may have been unclear. For the string 'ab123bc.12xx' the
> return value should be 'ab123bc.12', i.e. we have to search for the first "."
> followed by digits and return everything from the beginning of the string to
> the last of those digits.
You could add a negative lookahead so that the match cannot end in the middle of a run of digits:
... substring(x from E'^(.*?\\.\\d+(?!\\d))') ...
Yours,
Laurenz Albe
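
For anyone wanting to try the lookahead suggestion above, here is a minimal sketch that applies it to the three sample strings mentioned in this thread. It assumes a PostgreSQL version whose regular expressions accept lookahead constraints, and it has not been run here. As far as I understand it, the expression as a whole still prefers the shortest match because of the leading .*?, while (?!\d) refuses any match that stops while more digits follow, so the shortest acceptable match ends at the end of the first digit run after a dot.

```sql
-- Sketch only: the lookahead pattern from this thread, applied to the
-- sample strings quoted earlier. The column alias "prefix" is illustrative.
SELECT x,
       substring(x FROM E'^(.*?\\.\\d+(?!\\d))') AS prefix
FROM (VALUES ('ab.123xxx.46hfd'),
             ('a.b.c.d.123xx'),
             ('ab123bc.12xx')) AS q (x);
-- Results asked for in the thread: ab.123, a.b.c.d.123, ab123bc.12
```

The E'' string syntax with doubled backslashes follows the original example; with standard_conforming_strings enabled, a plain string with single backslashes should be equivalent.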