Discussion: why non-greedy modifier for one atom changes greediness of other atoms?
why non-greedy modifier for one atom changes greediness of other atoms?
From
hubert depesz lubaczewski
Date:
Example:

# select x, substring( x from E'^((.*?)(\\.[0-9]+))') from ( values ('ab.123xxx.46hfd'),('a.b.c.d.123xx')) as q (x);
        x        | substring
-----------------+-----------
 ab.123xxx.46hfd | ab.1
 a.b.c.d.123xx   | a.b.c.d.1
(2 rows)

I found in docs, that this is what happens, but I don't understand the logic behind forcing unique greediness in whole expression.

Also - how can one write a regexp that will match "ab.123" and "a.b.c.d.123" respectively? in pl/perl it's of course trivial, but I can't seem to find a way to do it in substring() regexps.

Best regards,

depesz

--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
Re: why non-greedy modifier for one atom changes greediness of other atoms?
From
hubert depesz lubaczewski
Date:
On Mon, Jan 04, 2010 at 11:30:51AM +0100, hubert depesz lubaczewski wrote:
> Example:
> # select x, substring( x from E'^((.*?)(\\.[0-9]+))') from ( values ('ab.123xxx.46hfd'),('a.b.c.d.123xx')) as q (x);
>         x        | substring
> -----------------+-----------
>  ab.123xxx.46hfd | ab.1
>  a.b.c.d.123xx   | a.b.c.d.1
> (2 rows)
>
> I found in docs, that this is what happens, but I don't understand the
> logic behind forcing unique greediness in whole expression.
>
> Also - how can one write a regexp that will match "ab.123" and
> "a.b.c.d.123" respectively?

Sorry, that may have been unclear. For the string 'ab123bc.12xx' the return value should be 'ab123bc.12', i.e. we have to search for the first "." followed by digits, and return everything from the beginning of the string to the last of those digits.

Best regards,

depesz

--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
hubert depesz lubaczewski wrote:
>> Example:
>> # select x, substring( x from E'^((.*?)(\\.[0-9]+))') from ( values ('ab.123xxx.46hfd'),('a.b.c.d.123xx')) as q (x);
>>         x        | substring
>> -----------------+-----------
>>  ab.123xxx.46hfd | ab.1
>>  a.b.c.d.123xx   | a.b.c.d.1
>> (2 rows)
>>
>> I found in docs, that this is what happens, but I don't understand the
>> logic behind forcing unique greediness in whole expression.

Yes, that's odd.

>> Also - how can one write a regexp that will match "ab.123" and
>> "a.b.c.d.123" respectively?
>
> sorry - it could have be unclear - in case of string 'ab123bc.12xx'
> return value should be 'ab123bc.12' - i.e. we have to search to first .
> followed by digits and return it from beginning of string to the last of
> digits.

You could add a negative lookahead to exclude digits after the last match:

... substring(x from E'^(.*?\\.\\d+(?!\\d))') ...

Yours,
Laurenz Albe
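
For anyone who wants to try the suggestion above, here is a minimal sketch that runs the original pattern and the lookahead pattern side by side on the three sample strings from this thread. The column aliases original_pattern and with_lookahead are made up for illustration. The background, as far as the PostgreSQL documentation describes it, is that a branch takes its greediness from the first quantified atom that has a greediness attribute, so the leading .*? makes the whole expression prefer the shortest match and [0-9]+ stops after one digit. The lookahead (?!\d) rules out any match that ends in the middle of a run of digits.

```sql
-- Sample strings are the ones quoted in the thread.
-- Backslashes are doubled because the patterns are written as E'' strings.
select x,
       -- original pattern: reported above to stop after the first digit
       substring(x from E'^((.*?)(\\.[0-9]+))') as original_pattern,
       -- suggested pattern: the match may not be followed by another digit
       substring(x from E'^(.*?\\.\\d+(?!\\d))') as with_lookahead
from (values ('ab.123xxx.46hfd'),
             ('a.b.c.d.123xx'),
             ('ab123bc.12xx')) as q (x);
```

The results being aimed for in with_lookahead are the ones asked for earlier in the thread: ab.123, a.b.c.d.123 and ab123bc.12. The output has not been pasted here, so it is worth checking against your own server version.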