Discussion: BUG #16814: Invalid memory access on regexp_match with .* and BRE

BUG #16814: Invalid memory access on regexp_match with .* and BRE

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16814
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 13.1
Operating system:   Ubuntu 20.04
Description:

When executing the following regexp call:
select regexp_match('abc', '.*', 'b');
valgrind detects an error:
==00:00:00:46.767 138746== Conditional jump or move depends on uninitialised value(s)
==00:00:00:46.767 138746==    at 0x4657A9: parseqatom (regcomp.c:990)
==00:00:00:46.767 138746==    by 0x465CBD: parsebranch (regcomp.c:753)
==00:00:00:46.767 138746==    by 0x465E84: parse (regcomp.c:683)
==00:00:00:46.767 138746==    by 0x467F24: pg_regcomp (regcomp.c:404)
==00:00:00:46.767 138746==    by 0x57D100: RE_compile_and_cache (regexp.c:185)
==00:00:00:46.767 138746==    by 0x57D3D9: setup_regexp_matches (regexp.c:1114)
==00:00:00:46.767 138746==    by 0x57DF86: regexp_match (regexp.c:985)
==00:00:00:46.767 138746==    by 0x36839A: ExecInterpExpr (execExprInterp.c:699)
==00:00:00:46.767 138746==    by 0x3657C9: ExecInterpExprStillValid (execExprInterp.c:1802)
==00:00:00:46.767 138746==    by 0x42A172: ExecEvalExprSwitchContext (executor.h:316)
==00:00:00:46.767 138746==    by 0x42A172: evaluate_expr (clauses.c:4809)
==00:00:00:46.767 138746==    by 0x42A34B: evaluate_function (clauses.c:4339)
==00:00:00:46.767 138746==    by 0x42C1ED: simplify_function (clauses.c:3969)

(This was discovered with the help of the new test module test_regex, using a
slightly modified version of its test case 30.4:
select * from test_regex('.*b', 'aab', 'b');
)


Re: BUG #16814: Invalid memory access on regexp_match with .* and BRE

From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> When executing the following regexp call:
> select regexp_match('abc', '.*', 'b');
> valgrind detects an error:

Hah, nice one.  It gives the wrong answer too, at least it does most of
the time for me:

# select regexp_match('abc', '.*', 'b');
 regexp_match 
--------------
 {""}
(1 row)

That's because it's acting like the pattern is '.*?' (prefer shortest
match) rather than '.*'.
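To make the greedy vs. shortest-match distinction concrete, here is a small illustration using Python's `re` module. This is an assumption-laden sketch: Python's backtracking engine is unrelated to Henry Spencer's regex library used by PostgreSQL and Tcl, and the helper `leftmost_match` is hypothetical. It only shows what the two quantifiers should produce on the input 'abc': a greedy '.*' should match the whole string (so `regexp_match` ought to return `{abc}`), while a lazy '.*?' matches the empty string, which is the `{""}` result reported above.

```python
import re


def leftmost_match(pattern: str, text: str) -> str:
    """Hypothetical helper: return the text matched by `pattern` at the start of `text`."""
    m = re.match(pattern, text)
    # A match object is truthy even when it matched the empty string.
    return m.group(0) if m else ""


# Greedy '.*' should consume as much as possible: the whole string.
greedy = leftmost_match(r".*", "abc")
# Non-greedy '.*?' prefers the shortest match: the empty string.
lazy = leftmost_match(r".*?", "abc")

print(repr(greedy))  # 'abc'
print(repr(lazy))    # ''
```

The buggy BRE case behaves like the second call even though the pattern is written like the first.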

This bug is well over the age of consent, btw.  Tcl's got it too,
so it surely is aboriginal in Henry Spencer's code.

Thanks for the report!

            regards, tom lane