Обсуждение: perl: unsafe empty pattern behavior
Moved from discussion on -committers: https://postgr.es/m/0ef325fa06e7a1605c4e119c4ecb637c67e5fb4e.camel@j-davis.com Summary: Do not use perl empty patterns like // or qr// or s//.../, the behavior is too surprising for perl non-experts. There are a few such uses in our tests; patch attached. Unfortunately, there is no obvious way to automatically detect them so I am just relying on grep. I'm sure there are others here who know more about perl than I do, so suggestions/corrections are welcome. Long version: Some may know this already, but we just discovered the dangers of using empty patterns in perl: "If the PATTERN evaluates to the empty string, the last successfully matched regular expression is used instead... If no match has previously succeeded, this will (silently) act instead as a genuine empty pattern (which will always match)." https://perldoc.perl.org/perlop#The-empty-pattern-// In other words, if you have code like: if ('xyz' =~ //) { print "'xyz' matches //\n"; } The match will succeed and print, because there's no previous pattern, so // is a "genuine" empty pattern, which is treated like /.*/ (I think?). Then, if you add some other code before it: if ('abc' =~ /abc/) { print "'abc' matches /abc/\n"; } if ('xyz' =~ //) { print "'xyz' matches //\n"; } The first match will succeed, but the second match will fail, because // is treated like /abc/. On reflection, that does seem very perl-like. But it can cause surprising action-at-a-distance if not used carefully, especially for those who aren't experts in perl. It's much safer to just not use the empty pattern. If you use qr// instead: https://perldoc.perl.org/perlop#qr/STRING/msixpodualn like: if ('abc' =~ qr/abc/) { print "'abc' matches /abc/\n"; } if ('xyz' =~ qr//) { print "'xyz' matches //\n"; } Then the second match may succeed or may fail, and it's not clear from the documentation what precise circumstances matter. It seems to fail on older versions of perl (like 5.16.3) and succeed on newer versions (5.38.2). However, it may also depend on when the qr// is [re]compiled, or regex flags, or locale, or may just be undefined. Regards, Jeff Davis
Вложения
On 2024-Mar-12, Jeff Davis wrote: > Do not use perl empty patterns like // or qr// or s//.../, the behavior > is too surprising for perl non-experts. Yeah, nasty. > There are a few such uses in > our tests; patch attached. Unfortunately, there is no obvious way to > automatically detect them so I am just relying on grep. I'm sure there > are others here who know more about perl than I do, so > suggestions/corrections are welcome. I suggest that pg_dump/t/002_pg_dump.pl could use a verification that the ->{regexp} thing is not empty. I also tried grepping (for things like qr{}, qr[], qr||, qr!!) and didn't find anything beyond what you have ... but I only looked for the "qr" literal, not other ways to get regexes. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'. After collecting 500 such letters, he mused, a university somewhere in Arizona would probably grant him a degree. (Don Knuth)
On Tue, 2024-03-12 at 18:53 +0100, Alvaro Herrera wrote: > I suggest that pg_dump/t/002_pg_dump.pl could use a verification that > the ->{regexp} thing is not empty. I'm not sure how exactly to test for an empty pattern. The problem is, we don't really want to test for an empty pattern, because /(?^:)/ is fine. The problem is //, which gets turned into an actual pattern (perhaps empty or perhaps not), and by the time it's in the %tests hash, I think it's too late to distinguish. Again, I'm not a perl expert, so suggestions welcome. > I also tried grepping (for things > like qr{}, qr[], qr||, qr!!) and didn't find anything beyond what you > have ... but I only looked for the "qr" literal, not other ways to > get > regexes. I think that's fine. qr// seems the most dangerous, because it seems to behave differently in different versions of perl. Grepping for regexes in perl code is an "interesting" exercise. Regards, Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > On Tue, 2024-03-12 at 18:53 +0100, Alvaro Herrera wrote: >> I also tried grepping (for things >> like qr{}, qr[], qr||, qr!!) and didn't find anything beyond what you >> have ... but I only looked for the "qr" literal, not other ways to >> get regexes. > I think that's fine. qr// seems the most dangerous, because it seems to > behave differently in different versions of perl. I wonder whether perlcritic has sufficiently deep understanding of Perl code that it could find these hazards. I already checked, and found that there's no built-in filter for this (at least not in the perlcritic version I have), but maybe we could write one? The rules seem to be plug-in modules, so you can make your own in principle. regards, tom lane
On 2024-03-12 Tu 18:59, Tom Lane wrote: > Jeff Davis <pgsql@j-davis.com> writes: >> On Tue, 2024-03-12 at 18:53 +0100, Alvaro Herrera wrote: >>> I also tried grepping (for things >>> like qr{}, qr[], qr||, qr!!) and didn't find anything beyond what you >>> have ... but I only looked for the "qr" literal, not other ways to >>> get regexes. >> I think that's fine. qr// seems the most dangerous, because it seems to >> behave differently in different versions of perl. > I wonder whether perlcritic has sufficiently deep understanding of > Perl code that it could find these hazards. I already checked, > and found that there's no built-in filter for this (at least not > in the perlcritic version I have), but maybe we could write one? > The rules seem to be plug-in modules, so you can make your own > in principle. Yeah, that was my thought too. I'd start with ProhibitComplexRegexes.pm as a template. If nobody else does it I'll have a go, but it might take a while. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes: > On 2024-03-12 Tu 18:59, Tom Lane wrote: >> I wonder whether perlcritic has sufficiently deep understanding of >> Perl code that it could find these hazards. > Yeah, that was my thought too. I'd start with ProhibitComplexRegexes.pm > as a template. Oooh. Taking a quick look at the source code: https://metacpan.org/dist/Perl-Critic/source/lib/Perl/Critic/Policy/RegularExpressions/ProhibitComplexRegexes.pm it seems like it'd be pretty trivial to convert that from "complain if regex contains more than N characters" to "complain if regex contains zero characters". regards, tom lane