Discussion: REGEXP_MATCHES() strange behavior with '^' and '$' pattern
Hi,

While playing with regular expressions I found some strange behavior of the
regexp_matches() function.

Consider the following SQL query and its output:

postgres=# select regexp_matches('1' || chr(10) || '2' || chr(10) || '3' || chr(10) || '4', '^', 'mg');
 regexp_matches
----------------
 {""}
 {""}
 {""}
 {""}
 {""}
 {""}
 {""}
(7 rows)

It is supposed to return 4 rows, not 7. Similar behavior is found with the
pattern '$'.

It seems that these start and end anchor characters are not matching
correctly. Or rather, they are matching twice.

To get to the root cause, I put elog(INFO,..) into the
setup_regexp_matches() function, where we copy matches into the struct, and
found the following values:

postgres=# select regexp_matches('1' || chr(10) || '2' || chr(10) || '3' || chr(10) || '4', '^', 'mg');
INFO: start_search: 0 rm_so: 0 rm_eo: 0
INFO: updated start_search: 1
INFO: start_search: 1 rm_so: 2 rm_eo: 2
INFO: updated start_search: 2
INFO: start_search: 2 rm_so: 2 rm_eo: 2
INFO: updated start_search: 3
INFO: start_search: 3 rm_so: 4 rm_eo: 4
INFO: updated start_search: 4
INFO: start_search: 4 rm_so: 4 rm_eo: 4
INFO: updated start_search: 5
INFO: start_search: 5 rm_so: 6 rm_eo: 6
INFO: updated start_search: 6
INFO: start_search: 6 rm_so: 6 rm_eo: 6
INFO: updated start_search: 7

Clearly, after the second pass the updated start_search should be 3, because
the last match was at offset 2 and of zero length (rm_so = rm_eo).

I have modified that logic to look similar to that of the replace_text_regexp()
function, since regexp_replace() works well.

Attached is a patch with a test case. Please have a look and let me know if I
assumed something wrong.

Thanks

--
Jeevan B Chalke
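
The INFO trace in this report can be replayed outside the server. What follows is a minimal sketch, not PostgreSQL source: Python's re module stands in for the backend regex engine, re.MULTILINE approximates the newline-sensitive matching requested by the 'm' flag, and the advance rule inside the loop is inferred from the logged values rather than copied from setup_regexp_matches(). The function name trace_search is hypothetical.

```python
import re


def trace_search(text, pattern):
    """Replay the search loop using the advance rule inferred from the INFO log.

    Returns one (start_search, rm_so, rm_eo, updated_start_search) tuple
    per match that the loop reports.
    """
    # Assumption: re.MULTILINE is only an approximation of the server's
    # newline-sensitive handling of ^ and $.
    rx = re.compile(pattern, re.MULTILINE)
    start_search = 0
    trace = []
    while start_search <= len(text):
        m = rx.search(text, start_search)
        if m is None:
            break
        rm_so, rm_eo = m.span()
        before = start_search
        # Inferred rule: jump to the end of the match, but if that does not
        # move the search position, step forward by a single character.
        if start_search < rm_eo:
            start_search = rm_eo
        else:
            start_search += 1
        trace.append((before, rm_so, rm_eo, start_search))
    return trace


text = "1\n2\n3\n4"
for row in trace_search(text, "^"):
    print("start_search: %d rm_so: %d rm_eo: %d updated start_search: %d" % row)
```

In this replay the offsets 2, 4 and 6 are each reported twice: once when the search starts one character earlier, and once more when it restarts exactly at that offset, because jumping to rm_eo of a zero-length match does not move past the match itself.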
Oops forgot patch.
Attached now.
--
Jeevan B Chalke
Attachments
Jeevan Chalke <jeevan.chalke@enterprisedb.com> writes:
> Oops forgot patch.
> Attached now.

Hmm ... I think the logic change is good, but two demerits for not fixing
the adjacent comment.

regards, tom lane
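
As an illustration of the kind of logic change being discussed, the next search position can be chosen from the match itself, so that a zero-length match always pushes the search one character past its own offset. This is a sketch under stated assumptions, not the committed patch: Python's re module stands in for the backend regex code, re.MULTILINE approximates the 'm' flag, and the function name match_offsets is hypothetical.

```python
import re


def match_offsets(text, pattern):
    """Collect (rm_so, rm_eo) for every match without reporting any twice."""
    # Assumption: re.MULTILINE approximates newline-sensitive ^ and $.
    rx = re.compile(pattern, re.MULTILINE)
    start_search = 0
    found = []
    while start_search <= len(text):
        m = rx.search(text, start_search)
        if m is None:
            break
        rm_so, rm_eo = m.span()
        found.append((rm_so, rm_eo))
        # Resume at the end of the match; a zero-length match must step one
        # character further, or the same empty match would be found again.
        start_search = rm_eo if rm_so < rm_eo else rm_eo + 1
    return found


text = "1\n2\n3\n4"
print(match_offsets(text, "^"))  # one zero-length match per line start
print(match_offsets(text, "$"))  # one zero-length match per line end
```

With this rule the four-line input gives four matches for '^', which is the row count the original report expected.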
On Wed, Jul 31, 2013 at 7:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm ... I think the logic change is good, but two demerits for not fixing
> the adjacent comment.
>
> regards, tom lane

I had a look over the comments and somehow found them OK.
Anyway, I have updated the comments in this version of the patch.

Thanks
--
Jeevan B Chalke
Attachments
On Thu, Aug 1, 2013 at 12:25 PM, Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
> On Wed, Jul 31, 2013 at 7:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm ... I think the logic change is good, but two demerits for not fixing
>> the adjacent comment.
>
> I had a look over the comments and somehow found them OK.
> Anyway, I have updated the comments in this version of the patch.

It looks like you have committed the changes with updated comments and more test cases.

Thanks
--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company