The following bug has been logged on the website:
Bug reference: 16133
Logged by: Andrew Gierth
Email address: andrew@tao11.riddles.org.uk
PostgreSQL version: 12.1
Operating system: any
Description:
(This started out as an IRC discussion on #tcl that spilled over to
#postgresql:)
SELECT regexp_match('aaa', '(a*)*');
 regexp_match
--------------
 {aaa}
(1 row)

SELECT regexp_match('aaa', '(a*)+');
 regexp_match
--------------
 {""}
(1 row)
What seems to be happening here is that in the + case, the engine performs
one extra iteration, matching (a*) against the empty string at the end of
the input, whereas in the * case the last iteration of (a*) matches the
whole string. This seems to violate the rules for determining where
subexpression captures line up. (And certainly there is no justification
for the choice of + vs. * quantifier to make any difference here.)
There are a large number of similar cases, but this seems to be the common
factor in all of them so far.
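
One way to look at the overall match next to the capture from the last
iteration of (a*) is to wrap the quantified group in an extra outer group.
This is only a sketch: the outer parentheses and the column aliases are
additions that do not appear in the queries above, and no output is shown
because the extra group may itself affect what the engine reports.

```sql
-- Element 1 of each result is the outer group (the whole quantified
-- expression); element 2 is the last iteration of the inner (a*).
-- The outer group and the aliases are illustrative additions.
SELECT regexp_match('aaa', '((a*)*)') AS star_case,
       regexp_match('aaa', '((a*)+)') AS plus_case;
```

If the two quantifiers were treated consistently, both columns would be
expected to agree.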