Обсуждение: BUG? Regular expression matching of optional character group at beginning of RE

Поиск
Список
Период
Сортировка

BUG? Regular expression matching of optional character group at beginning of RE

От
Viktor Rosenfeld
Дата:
Hi,

I've noticed that regular expressions which are anchored at the
beginning of the text but have an optional part at the beginning
(e.g. '^(ge)?kommen$') are not evaluated correctly if there is an
index on the column.

Consider the following table:

#+BEGIN_SRC sql
    CREATE TABLE annotation (
       id SERIAL PRIMARY KEY,
       name VARCHAR(20),
       value VARCHAR(20)
    );
#+END_SRC

And the following query:

#+BEGIN_SRC sql
SELECT count(*) FROM annotation WHERE name = 'lemma' AND value ~ '^(ge)?kommen$';
#+END_SRC

In my data set, this query should return 916 results; 911 rows match
'kommen' and 5 rows match 'gekommen'.

Here is the plan with an index on column name:

#+BEGIN_EXAMPLE
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Aggregate  (cost=20101.31..20101.32 rows=1 width=0)
   ->  Bitmap Heap Scan on annotation  (cost=282.55..20101.31 rows=2 width=0)
         Recheck Cond: ((name)::text = 'lemma'::text)
         Filter: ((value)::text ~ '^(ge)?kommen$'::text)
         ->  Bitmap Index Scan on idx_test_name  (cost=0.00..282.55 rows=15196 width=0)
               Index Cond: ((name)::text = 'lemma'::text)
#+END_EXAMPLE

This plan correctly retrieves 916 rows. However, if I create an index
on the column value, the plan is as follows:

#+BEGIN_EXAMPLE
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Aggregate  (cost=910.50..910.51 rows=1 width=0)
   ->  Bitmap Heap Scan on annotation  (cost=619.38..910.49 rows=2 width=0)
         Recheck Cond: ((name)::text = 'lemma'::text)
         Filter: ((value)::text ~ '^(ge)?kommen$'::text)
         ->  BitmapAnd  (cost=619.38..619.38 rows=76 width=0)
               ->  Bitmap Index Scan on idx_test_name  (cost=0.00..282.55 rows=15196 width=0)
                     Index Cond: ((name)::text = 'lemma'::text)
               ->  Bitmap Index Scan on idx_test_value  (cost=0.00..336.58 rows=15196 width=0)
                     Index Cond: (((value)::text ~>=~ 'ge'::text) AND ((value)::text ~<~ 'gf'::text))
#+END_EXAMPLE

This plan only retrieves the 5 rows matching 'gekommen'. Note the
usage of the index on the value column even though the first character
of the regular expression is not fixed.

The index creation command was:

#+BEGIN_SRC sql
CREATE INDEX idx_test_value ON annotation (value varchar_pattern_ops);
#+END_SRC

Note that I can force the correct evaluation of the regular expression
by prepending '(?e)'; however, this should not be necessary if I
understand the documentation correctly.

#+BEGIN_SRC sql
SELECT count(*) FROM annotation WHERE name = 'lemma' AND value ~ '(?e)^(ge)?kommen$';
#+END_SRC

I'm using PostgreSQL 9.1.4.

Cheers,
Viktor

Re: BUG? Regular expression matching of optional character group at beginning of RE

От
Tom Lane
Дата:
Viktor Rosenfeld <listuser36@googlemail.com> writes:
> I've noticed that regular expressions which are anchored at the
> beginning of the text but have an optional part at the beginning
> (e.g. '^(ge)?kommen$') are not evaluated correctly if there is an
> index on the column.

Ugh.  Yeah, that's a bug, and it looks a bit messy to fix ...

            regards, tom lane

Re: BUG? Regular expression matching of optional character group at beginning of RE

От
Tom Lane
Дата:
Viktor Rosenfeld <listuser36@googlemail.com> writes:
> I've noticed that regular expressions which are anchored at the
> beginning of the text but have an optional part at the beginning
> (e.g. '^(ge)?kommen$') are not evaluated correctly if there is an
> index on the column.

I have committed fixes for this.  Thanks for the report!

            regards, tom lane