Thread: problem with non-greedy regex match

problem with non-greedy regex match

From: "Merlin Moncure"
Date:
I _may_ have found a problem that is affecting non-greedy regex matches.

select regexp_matches(
  $$x = foo y x = foo y $$,
  $$x\s+(.*?)y$$  ,'g');

As I read it, this should match '= foo ' twice.  Instead, it matches
'= foo y x = foo ' once.  The non-greedy form (.*?) should break out
at the first 'y'.

Interestingly, this works:
select regexp_matches(
  $$x = foo y x = foo y $$,
  $$x(.*?)y$$  ,'g');

It's the same regex minus the \s+ after 'x'.

merlin

Re: problem with non-greedy regex match

From: Tom Lane
Date:
"Merlin Moncure" <mmoncure@gmail.com> writes:
> I _may_ have found a problem that is affecting non-greedy regex matches.

No, you didn't read the fine print in section 9.7.3.5; particularly

    A branch -- that is, an RE that has no top-level | operator -- has the
    same greediness as the first quantified atom in it that has a greediness
    attribute.

    ...

    The above rules associate greediness attributes not only with individual
    quantified atoms, but with branches and entire REs that contain
    quantified atoms. What that means is that the matching is done in such a
    way that the branch, or whole RE, matches the longest or shortest
    possible substring as a whole. Once the length of the entire match is
    determined, the part of it that matches any particular subexpression is
    determined on the basis of the greediness attribute of that
    subexpression, with subexpressions starting earlier in the RE taking
    priority over ones starting later.
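
For illustration, the same documentation section demonstrates these rules with a pair of queries along the following lines. The only difference between them is the greediness of the first quantified atom (Y* versus Y*?), which decides whether the RE as a whole takes the longest or the shortest match.

```sql
-- Y* is greedy, so the whole RE is greedy: starting at the Y it matches the
-- longest possible string, Y123, and the parenthesized part is 123.
SELECT SUBSTRING('XY1234Z', 'Y*([0-9]{1,3})');
-- Result: 123

-- Y*? is non-greedy, so the whole RE is non-greedy: starting at the Y it
-- matches the shortest possible string, Y1, and the parenthesized part is 1,
-- even though [0-9]{1,3} is itself a greedy quantifier.
SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');
-- Result: 1
```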

In short, the \s+ causes the whole thing to become greedy.  Maybe \s+?
will do what you want.

            regards, tom lane
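
A sketch of the suggestion above applied to the original query. The expected rows in the comments are derived from the rules quoted from the documentation and from the behavior reported in this thread; they are not output captured from a particular server version.

```sql
-- Original: \s+ is the first quantified atom and it is greedy, so the RE as a
-- whole matches the longest possible substring (up to the last 'y').
select regexp_matches(
  $$x = foo y x = foo y $$,
  $$x\s+(.*?)y$$, 'g');
-- one row: {"= foo y x = foo "}

-- Suggested change: \s+? makes the first quantified atom non-greedy, so the RE
-- as a whole matches the shortest possible substring (up to the first 'y'),
-- and the 'g' flag then finds the second occurrence.
select regexp_matches(
  $$x = foo y x = foo y $$,
  $$x\s+?(.*?)y$$, 'g');
-- two rows: {"= foo "} and {"= foo "}
```

This is also why the version without \s+ behaves as expected: there the first quantified atom is (.*?), which is already non-greedy.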