Regex pattern with shorter back reference does NOT work as expected

From Jeevan Chalke
Subject Regex pattern with shorter back reference does NOT work as expected
Date
Msg-id CAM2+6=U8CdfM-qL55XHt+7hVzDRBnZwrHiVZRX2shGZ4OMuMSQ@mail.gmail.com
Responses Re: Regex pattern with shorter back reference does NOT work as expected  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi Tom,

The following example does not work as expected:

-- Should return TRUE but returns FALSE
SELECT 'Programmer' ~ '(\w).*?\1' as t;

-- Should return three rows (P, a and er) but returns just one row with
-- the value Programmer
SELECT REGEXP_SPLIT_TO_TABLE('Programmer','(\w).*?\1');

Initially I thought that back-references were not supported and that this
explained the results. But while trying a few back-reference cases I saw the
error "invalid back-reference number", which means back-references are
supported. So I tried a few more scenarios and observed that with the input
string 'rogrammer' we get correct results, i.e. when the very first character
is the one that is back-referenced. The match fails when the first character
is not part of the back-reference match.

This happens only with the shortest-match (non-greedy) pattern. The
longest-match (greedy) pattern '(\w).*\1' works well.

Clearly, the above example contains two matches: 'rogr' and 'mm'.
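
To make the expected matches concrete, here is a minimal C sketch of what
'(\w).*?\1' should find in 'Programmer'. It is a hand-written stand-in for
that one pattern, not the PostgreSQL regex engine, and the function names
find_shortest_backref and count_split_pieces are hypothetical helpers
introduced only for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hand-written stand-in for the pattern (\w).*?\1 (an assumption for
 * illustration, NOT the PostgreSQL regex engine). Searching from offset
 * `from`, finds the leftmost, shortest substring of s that starts with a
 * word character and ends at the next occurrence of that same character.
 * Returns 1 and sets *start and *len on success, 0 if nothing matches.
 */
static int
find_shortest_backref(const char *s, size_t from, size_t *start, size_t *len)
{
    size_t n = strlen(s);

    for (size_t i = from; i < n; i++)
    {
        unsigned char c = (unsigned char) s[i];

        if (!isalnum(c) && c != '_')
            continue;           /* (\w) cannot match at this position */

        for (size_t j = i + 1; j < n; j++)
        {
            if (s[j] == s[i])   /* first repeat gives the shortest match */
            {
                *start = i;
                *len = j - i + 1;
                return 1;
            }
        }
    }
    return 0;
}

/*
 * Counts the pieces produced by splitting s on every match, including
 * empty pieces (hypothetical helper mirroring a split-to-table call).
 */
static int
count_split_pieces(const char *s)
{
    size_t pos = 0, start, len;
    int pieces = 1;

    while (find_shortest_backref(s, pos, &start, &len))
    {
        pieces++;
        pos = start + len;      /* resume right after the match */
    }
    return pieces;
}
```

Under these assumptions, the first match in 'Programmer' starts at offset 1
with length 4 ('rogr'), the second starts at offset 6 with length 2 ('mm'),
and splitting on them leaves three pieces: P, a and er.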

So I started debugging to find the root cause. The code is too complex for me
to understand exactly what is happening, but while debugging I found this
chunk in the cfindloop() function in regexec.c, from which we return
REG_NOMATCH:

               {
                   /* no point in trying again */
                   *coldp = cold;
                   return REG_NOMATCH;
               }

The search started at 'P' and ended in the above block. It seemed strange that
it did not continue with the next character, i.e. from 'r'. So I replaced the
above chunk with a break statement so that the search continues from the next
character. This trick worked well.
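
The difference between giving up and moving on can be illustrated with a toy
search loop. This is only an analogy for the behaviour described above, not
the actual structure of cfindloop(); match_at, toy_search and the
give_up_early flag are hypothetical names, and the matcher simply checks
whether the character at a position occurs again later in the string.

```c
#include <stddef.h>
#include <string.h>

/*
 * Toy matcher (an assumption for illustration, NOT regexec.c): succeeds
 * at offset i when the character s[i] occurs again later in s.
 */
static int
match_at(const char *s, size_t i)
{
    return s[i] != '\0' && strchr(s + i + 1, s[i]) != NULL;
}

/*
 * Toy search loop over candidate start positions.
 *
 * give_up_early != 0: a failure at the first start position ends the
 * whole search, analogous to the "return REG_NOMATCH" block.
 * give_up_early == 0: the loop moves on to the next start position,
 * analogous to replacing that block with "break".
 *
 * Returns the offset where a match starts, or -1 for no match.
 */
static long
toy_search(const char *s, int give_up_early)
{
    size_t n = strlen(s);

    for (size_t i = 0; i < n; i++)
    {
        if (match_at(s, i))
            return (long) i;
        if (give_up_early)
            return -1;          /* stops after the first failed start */
    }
    return -1;
}
```

With this sketch, toy_search("Programmer", 1) reports no match because 'P'
never repeats, toy_search("Programmer", 0) finds the match starting at 'r',
and toy_search("rogrammer", 1) succeeds either way because the very first
character already matches.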

Since I have very little knowledge of this area of the code, I am unsure
whether this is indeed a correct fix, so I thought of mailing hackers.

I have attached a patch that makes the above change, along with a few tests
in regex.sql.

Your valuable insights, please...

Thanks
--
Jeevan B Chalke
