Re: Another regexp performance improvement: skip useless paren-captures

From: Mark Dilger
Subject: Re: Another regexp performance improvement: skip useless paren-captures
Msg-id: 80944B12-6B9A-443F-B4F8-95B04F85E28A@enterprisedb.com
In reply to: Re: Another regexp performance improvement: skip useless paren-captures  (Tom Lane <tgl@sss.pgh.pa.us>)
Replies: Re: Another regexp performance improvement: skip useless paren-captures  (Mark Dilger <mark.dilger@enterprisedb.com>)
Re: Another regexp performance improvement: skip useless paren-captures  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

> On Aug 9, 2021, at 4:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> There is a potentially interesting definitional question:
> what exactly ought this regexp do?
>
>         ((.)){0}\2
>
> Because the capturing paren sets are zero-quantified, they will
> never be matched to any characters, so the backref can never
> have any defined referent.

Perl regular expressions are not POSIX, but if there is a principled reason POSIX should differ from Perl on this, we
should be clear about what that is:

    #!/usr/bin/perl

    use strict;
    use warnings;

    our $match;
    if ('foo' =~ m/((.)(??{ die; })){0}(..)/)
    {
        print "captured 1 $1\n" if defined $1;
        print "captured 2 $2\n" if defined $2;
        print "captured 3 $3\n" if defined $3;
        print "captured 4 $4\n" if defined $4;
        print "match = $match\n" if defined $match;
    }

This will print "captured 3 fo", proving that although the regular expression is parsed with the (..) bound to the
third capture group, the first two capture groups never run.  If you don't believe that, change the {0} to {1} and
observe that the script dies.
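For comparison only, here is a rough analogue of the experiment using Python's `re` module, which is a third backtracking engine (neither POSIX, nor Perl, nor what PostgreSQL uses). Python has no equivalent of Perl's `(??{ ... })` code assertion, so this sketch can only show the resulting captures; it cannot prove that the zero-quantified groups are never entered.

```python
import re

# Same shape as the Perl test: an outer and a nested capture group, both
# quantified {0}, followed by a third group that consumes two characters.
pattern = re.compile(r"((.)){0}(..)")

m = pattern.search("foo")
if m:
    # Groups that never took part in the match are reported as None,
    # so only group 3 is printed here.
    for i, text in enumerate(m.groups(), start=1):
        if text is not None:
            print(f"captured {i} {text}")
```

As in the Perl run, the third group is still numbered 3 even though the first two never capture anything.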

> So I think throwing an
> error is an appropriate response.  The existing code will
> throw such an error for
>
>         ((.)){0}\1
>
> so I guess Spencer did think about this to some extent -- he
> just forgot about the possibility of nested parens.


Ugg.  That means our code throws an error where Perl does not, pretty well negating my point above.  If we're already
throwing an error for this type of thing, I agree we should be consistent about it.  My personal preference would have
been to do the same thing as Perl, but it seems that ship has already sailed.
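To illustrate the non-erroring alternative, here is a small sketch, again using Python's `re` purely as an example engine (this is not a claim about Perl's or PostgreSQL's behavior). Python compiles a backreference to a group that `{0}` keeps from ever matching, and the backreference then simply fails to match. The helper `try_pattern` is hypothetical, written only for this illustration.

```python
import re


def try_pattern(regex: str, subject: str) -> str:
    """Hypothetical helper: report whether a pattern compiles and matches."""
    try:
        compiled = re.compile(regex)
    except re.error as exc:
        # A genuinely malformed pattern is rejected at compile time.
        return f"error: {exc}"
    # A backreference to a group that never captured fails to match,
    # so the search finds nothing rather than raising.
    return "match" if compiled.search(subject) else "no match"


# Backreferences to the outer (\1) and the nested (\2) zero-quantified group.
for regex in (r"((.)){0}\1", r"((.)){0}\2"):
    print(regex, "->", try_pattern(regex, "foo"))
```

The design question in the thread is exactly this choice: reject such patterns up front, as the Spencer-derived code already does for the `\1` case, or accept them and let the backreference fail silently at match time.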


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company