Re: BUG #17761: Questionable regular expression behavior

From: Tom Lane
Subject: Re: BUG #17761: Questionable regular expression behavior
Msg-id: 3334493.1674835493@sss.pgh.pa.us
In response to: Re: BUG #17761: Questionable regular expression behavior  (hubert depesz lubaczewski <depesz@depesz.com>)
Responses: Re: BUG #17761: Questionable regular expression behavior
List: pgsql-bugs

hubert depesz lubaczewski <depesz@depesz.com> writes:
> On Fri, Jan 27, 2023 at 09:27:35AM +0000, PG Bug reporting form wrote:
>> Executing:
>> select regexp_matches('a 1x1250x2500',
>> '(a).*?([1-9]\d*)\s*x\s*([1-9]\d*)(?:\s*x\s*([1-9]\d*))?');
>> returns: {a,1,1,NULL}
>> while executing:
>> select regexp_matches('a 1x1250x2500',
>> '(a|b).*?([1-9]\d*)\s*x\s*([1-9]\d*)(?:\s*x\s*([1-9]\d*))?');
>> returns: {a,1,1250,2500}
>> 
>> Shouldn't both results be equal?

> The problem is, afair, that there is some state in pg's regexp engine
> that makes greedy/ungreedy decision once per regexp.

Yeah.  Without having traced through it, I'm fairly sure that in the
first case, we have "(a)" which has no greediness, then ".*?" which
is non-greedy, and then that determines the overall greediness as
non-greedy, so it goes for the shortest overall match not the longest.

In the second case, "(a|b)" is greedy because anything involving "|"
is greedy, so we immediately decide we'll be greedy overall.
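
To illustrate the rule at work here (a concatenated RE takes its overall greediness from the first quantified atom that has a greediness attribute), here is a small pair of queries adapted from the examples in the PostgreSQL manual's section on regular expression matching rules. The result comments are the ones the manual reports; treat this as a sketch rather than a trace of the reported case.

```sql
-- "Y*" is greedy, so the RE as a whole is greedy:
-- the longest overall match is "Y123", and the group captures "123".
SELECT substring('XY1234Z', 'Y*([0-9]{1,3})');   -- 123

-- "Y*?" is non-greedy, so the RE as a whole is non-greedy:
-- the shortest overall match is "Y1", and the group captures "1".
SELECT substring('XY1234Z', 'Y*?([0-9]{1,3})');  -- 1
```

Note that in the second query the `{1,3}` quantifier is itself greedy, yet it still matches as little as possible because the overall match length is decided first.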

The fine manual explains how you can force greediness or non-greediness
when the engine's default rules for that don't do what you want.

            regards, tom lane
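
For reference, the technique the manual describes for forcing greediness is to wrap the RE (or a subexpression) in a non-capturing group quantified with `{1,1}` (greedy) or `{1,1}?` (non-greedy). The first three queries below, with their results, follow the manual's own example. The last one applies the same idiom to the pattern from the bug report; that application is an assumption for illustration and has not been run here, so no result is shown.

```sql
-- Greedy overall: the first ".*" takes as much as it can.
SELECT regexp_match('abc01234xyz', '(.*)(\d+)(.*)');
-- {abc0123,4,xyz}

-- Non-greedy overall, because ".*?" comes first: shortest total match.
SELECT regexp_match('abc01234xyz', '(.*?)(\d+)(.*)');
-- {abc,0,""}

-- Force the RE as a whole to be greedy while keeping ".*?" non-greedy.
SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
-- {abc,01234,xyz}

-- The same idiom applied to the first pattern from the report
-- (illustrative; wrapping in (?:...){1,1} requests a greedy overall match).
SELECT regexp_matches('a 1x1250x2500',
  '(?:(a).*?([1-9]\d*)\s*x\s*([1-9]\d*)(?:\s*x\s*([1-9]\d*))?){1,1}');
```

Because the wrapper is a non-capturing group, the numbering of the captured subexpressions is unchanged.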


