Patch for #2391: "Similar to" pattern matching does not operate as documented

From: Dhanaraj M
Subject: Patch for #2391: "Similar to" pattern matching does not operate as documented
Msg-id: 4445E5C6.50609@sun.com
Replies: Re: Patch for #2391: "Similar to" pattern matching does not operate as documented
List: pgsql-patches
Hi

I attach the patch for this bug. I have run the regression tests and
they pass. Please review it; I look forward to your reply.

As explained on the mailing list, the patch adds parentheses around the
pattern when the '|' operator is used without parentheses.

Thanks
Dhanaraj

===========================================================================

The following bug has been logged online:

Bug reference:      2391
Logged by:          Eric Noriega
Email address:      noriega ( at ) gwu ( dot ) edu
PostgreSQL version: 7.0.5
Operating system:   Linux Fedora core 4
Description:        "Similar to" pattern matching does not operate as
documented
Details:

As far as I can tell, this may be a bug in how the pattern matches.

db=# select 'tab' similar to '(a|b)';
 ?column?
----------
 f

db=# select 'tab' similar to 'a|b';
 ?column?
----------
 t

The doc says:  Like LIKE, the SIMILAR TO  operator succeeds only if its
pattern matches the entire string; this is unlike common regular expression
practice, wherein the pattern may match any part of the string.

If the second case is invalid as an expression (not clear in the
docs:Parentheses may be used to group items into a single logical item),
then the statement should fail, or return false, not return true.

=================================================================================

    * From: Tom Lane <tgl ( at ) sss ( dot ) pgh ( dot ) pa ( dot ) us>
    * To: "Eric Noriega" <noriega ( at ) gwu ( dot ) edu>
    * Subject: Re: BUG #2391: "Similar to" pattern matching does not
operate as documented
    * Date: Thu, 13 Apr 2006 12:55:41 -0400

"Eric Noriega" <noriega ( at ) gwu ( dot ) edu> writes:
 > db=# select 'tab' similar to 'a|b';
 >  ?column?
 > ----------
 >  t

Yeah, this is a bug ... the cause can be seen by looking at the
underlying similar_escape() function, which converts a SIMILAR TO
pattern into a POSIX regex pattern:

regression=# select similar_escape('(a|b)', null);
 similar_escape
----------------
 ^(a|b)$
(1 row)

regression=# select similar_escape('a|b', null);
 similar_escape
----------------
 ^a|b$
(1 row)

regression=#

I believe that in the second case, ^ and $ bind more tightly than |
per POSIX rules.  So we need to put parens around the pattern to
prevent that.

Thanks for the report!

                        regards, tom lane
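
The precedence point in the reply above can be checked outside PostgreSQL.
The sketch below uses the C library's POSIX <regex.h> rather than
PostgreSQL's own regex engine, on the assumption that both give alternation
the lowest precedence in extended regular expressions. The helper name
ere_matches is hypothetical and does not appear in the PostgreSQL source.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true if the POSIX extended regex
 * "pattern" matches anywhere in "subject". Anchoring is left to the
 * pattern itself, as in the output of similar_escape().
 */
static bool
ere_matches(const char *pattern, const char *subject)
{
    regex_t re;
    bool    matched;

    /* REG_NOSUB: only a yes/no answer is needed, no match offsets */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;           /* treat a compile error as "no match" */

    matched = (regexec(&re, subject, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

With this helper, ere_matches("^a|b$", "tab") is true because the pattern
parses as (^a)|(b$) and the second branch matches the trailing 'b', while
ere_matches("^(a|b)$", "tab") is false because the anchors now apply to the
whole alternation.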

=========================================================================


*** ./src/backend/utils/adt/regexp.c.orig    Sun Mar  5 21:28:43 2006
--- ./src/backend/utils/adt/regexp.c    Wed Apr 19 12:23:30 2006
***************
*** 522,527 ****
--- 522,529 ----
                  elen;
      bool        afterescape = false;
      int            nquotes = 0;
+     int     index;
+     bool     paranFlag = false;

      /* This function is not strict, so must test explicitly */
      if (PG_ARGISNULL(0))
***************
*** 549,559 ****
                    errhint("Escape string must be empty or one character.")));
      }

      /* We need room for ^, $, and up to 2 output bytes per input byte */
!     result = (text *) palloc(VARHDRSZ + 2 + 2 * plen);
      r = VARDATA(result);

      *r++ = '^';

      while (plen > 0)
      {
--- 551,583 ----
                    errhint("Escape string must be empty or one character.")));
      }

+     /* Add parenthesis */
+     for(index = 0; index < plen; index++)
+     {
+         char    pchar = *(p+index);
+
+         if (pchar == '"' || pchar == '%' || pchar == '_' || pchar == '\\' ||
+             pchar == '.' || pchar == '?' || pchar == '{' || pchar == '(' || pchar == ')')
+         {
+             paranFlag = false;
+             break;
+         }
+         else if (pchar == '|')
+             paranFlag = true;
+
+     }
+
      /* We need room for ^, $, and up to 2 output bytes per input byte */
!     if (paranFlag == true)
!         result = (text *) palloc(VARHDRSZ + 2 + 2 + 2 * plen);
!     else
!         result = (text *) palloc(VARHDRSZ + 2 + 2 * plen);
!
      r = VARDATA(result);

      *r++ = '^';
+     if (paranFlag == true)
+         *r++ = '(';

      while (plen > 0)
      {
***************
*** 593,598 ****
--- 617,625 ----
          p++, plen--;
      }

+     if (paranFlag == true)
+         *r++ = ')';
+
      *r++ = '$';

      VARATT_SIZEP(result) = r - ((char *) result);
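
The pre-scan that the patch adds before the palloc() call can be read in
isolation as a small predicate. The following is a standalone restatement
for illustration, not code from regexp.c; the name needs_outer_parens is
hypothetical, and the character list is copied from the patch above.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical restatement of the patch's pre-scan: wrap the pattern in
 * parentheses only if it contains '|' and none of the characters that
 * the patch treats as special.
 */
static bool
needs_outer_parens(const char *pattern, size_t len)
{
    bool    saw_bar = false;
    size_t  i;

    for (i = 0; i < len; i++)
    {
        switch (pattern[i])
        {
            case '"': case '%': case '_': case '\\': case '.':
            case '?': case '{': case '(': case ')':
                /* any of these disables wrapping, as in the patch */
                return false;
            case '|':
                saw_bar = true;
                break;
            default:
                break;
        }
    }
    return saw_bar;
}
```

Under this rule 'a|b' is wrapped and '(a|b)' is left alone. A pattern such
as 'a%|b' is also left unwrapped, because the '%' clears the flag even
though a top-level '|' is present.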
