BUG? Regular expression matching of optional character group at beginning of RE

Поиск

Список

Период

Сортировка

От	Viktor Rosenfeld
Тема	BUG? Regular expression matching of optional character group at beginning of RE
Дата	6 июля 2012 г. 18:50:16
Msg-id	20120706184951.GA91000@client195-161.wlan.hu-berlin.de обсуждение исходный текст
Ответы	Re: BUG? Regular expression matching of optional character group at beginning of RE Re: BUG? Regular expression matching of optional character group at beginning of RE
Список	pgsql-general

Дерево обсуждения

Hi,

I've noticed that regular expressions which are anchored at the
beginning of the text but have an optional part at the beginning
(e.g. '^(ge)?kommen$') are not evaluated correctly if there is an
index on the column.

Consider the following table:

#+BEGIN_SRC sql
    CREATE TABLE annotation (
       id SERIAL PRIMARY KEY,
       name VARCHAR(20),
       value VARCHAR(20)
    );
#+END_SRC

And the following query:

#+BEGIN_SRC sql
SELECT count(*) FROM annotation WHERE name = 'lemma' AND value ~ '^(ge)?kommen$';
#+END_SRC

In my data set, this query should return 916 results; 911 rows match
'kommen' and 5 rows match 'gekommen'.

Here is the plan with an index on column name:

#+BEGIN_EXAMPLE
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Aggregate  (cost=20101.31..20101.32 rows=1 width=0)
   ->  Bitmap Heap Scan on annotation  (cost=282.55..20101.31 rows=2 width=0)
         Recheck Cond: ((name)::text = 'lemma'::text)
         Filter: ((value)::text ~ '^(ge)?kommen$'::text)
         ->  Bitmap Index Scan on idx_test_name  (cost=0.00..282.55 rows=15196 width=0)
               Index Cond: ((name)::text = 'lemma'::text)
#+END_EXAMPLE

This plan correctly retrieves 916 rows. However, if I create an index
on the column value, the plan is as follows:

#+BEGIN_EXAMPLE
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Aggregate  (cost=910.50..910.51 rows=1 width=0)
   ->  Bitmap Heap Scan on annotation  (cost=619.38..910.49 rows=2 width=0)
         Recheck Cond: ((name)::text = 'lemma'::text)
         Filter: ((value)::text ~ '^(ge)?kommen$'::text)
         ->  BitmapAnd  (cost=619.38..619.38 rows=76 width=0)
               ->  Bitmap Index Scan on idx_test_name  (cost=0.00..282.55 rows=15196 width=0)
                     Index Cond: ((name)::text = 'lemma'::text)
               ->  Bitmap Index Scan on idx_test_value  (cost=0.00..336.58 rows=15196 width=0)
                     Index Cond: (((value)::text ~>=~ 'ge'::text) AND ((value)::text ~<~ 'gf'::text))
#+END_EXAMPLE

This plan only retrieves the 5 rows matching 'gekommen'. Note the
usage of the index on the value column even though the first character
of the regular expression is not fixed.

The index creation command was:

#+BEGIN_SRC sql
CREATE INDEX idx_test_value ON annotation (value varchar_pattern_ops);
#+END_SRC

Note that I can force the correct evaluation of the regular expression
by prepending '(?e)'; however, this should not be necessary if I
understand the documentation correctly.

#+BEGIN_SRC sql
SELECT count(*) FROM annotation WHERE name = 'lemma' AND value ~ '(?e)^(ge)?kommen$';
#+END_SRC

I'm using PostgreSQL 9.1.4.

Cheers,
Viktor

В списке pgsql-general по дате отправления:

Предыдущее

От: John R Pierce
Дата: 06 июля 2012 г., 18:12:04
Сообщение: Re: index and data tablespaces on two separate drives or one RAID 0?

Следующее

От: Viktor Rosenfeld
Дата: 06 июля 2012 г., 18:54:13
Сообщение: Query runtime strongly dependent on generated statistics (and fewer statistics are better?)

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

BUG? Regular expression matching of optional character group at beginning of RE

Предыдущее

Следующее