Обсуждение: BUG #5478: ILIKE operator returns wrong result

Поиск
Список
Период
Сортировка

BUG #5478: ILIKE operator returns wrong result

От
"Markus"
Дата:
The following bug has been logged online:

Bug reference:      5478
Logged by:          Markus
Email address:      markus.herven@outpost24.com
PostgreSQL version: PostgreSQL 8.4.
Operating system:   Ubuntu 10.04
Description:        ILIKE operator returns wrong result
Details:

The following query

select 'ba' ilike '%__%';

return true as expected in 8.2 but false in 8.4.

Re: BUG #5478: ILIKE operator returns wrong result

От
Bruce Momjian
Дата:
Markus wrote:
>
> The following bug has been logged online:
>
> Bug reference:      5478
> Logged by:          Markus
> Email address:      markus.herven@outpost24.com
> PostgreSQL version: PostgreSQL 8.4.
> Operating system:   Ubuntu 10.04
> Description:        ILIKE operator returns wrong result
> Details:
>
> The following query
>
> select 'ba' ilike '%__%';
>
> return true as expected in 8.2 but false in 8.4.

I can confirm the odd behavior in current CVS:

    test=> select 'ba' ilike '%__%';
     ?column?
    ----------
     f
    (1 row)

    test=> select 'ba' like '__';
     ?column?
    ----------
     t
    (1 row)

    test=> select 'ba' like '__%';
     ?column?
    ----------
     t
    (1 row)

    test=> select 'ba' like '%_%';
     ?column?
    ----------
     t
    (1 row)

It seems to be the leading '%' it does not like.  Our docs clearly state
your syntax is recommended:

    LIKE pattern matching always covers the entire string. Therefore, to
    match a sequence anywhere within a string, the pattern must start and
    end with a percent sign.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

Re: BUG #5478: ILIKE operator returns wrong result

От
Tom Lane
Дата:
"Markus" <markus.herven@outpost24.com> writes:
> select 'ba' ilike '%__%';
> return true as expected in 8.2 but false in 8.4.

I have a feeling that this represents still another bug in the
special-case path for % followed by _ (cf bug #4821).  If so,
maybe we ought to just toss out that optimization?

            regards, tom lane

Re: BUG #5478: ILIKE operator returns wrong result

От
Bruce Momjian
Дата:
Tom Lane wrote:
> "Markus" <markus.herven@outpost24.com> writes:
> > select 'ba' ilike '%__%';
> > return true as expected in 8.2 but false in 8.4.
>
> I have a feeling that this represents still another bug in the
> special-case path for % followed by _ (cf bug #4821).  If so,
> maybe we ought to just toss out that optimization?

Yea, looks like it is this code in like_match.c:

    /* %_ is the same as _% - avoid matching _ repeatedly */

    do
    {
        NextChar(t, tlen);
        NextByte(p, plen);
    } while (tlen > 0 && plen > 0 && *p == '_');

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

Re: BUG #5478: ILIKE operator returns wrong result

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> I have a feeling that this represents still another bug in the
>> special-case path for % followed by _ (cf bug #4821).  If so,
>> maybe we ought to just toss out that optimization?

> Yea, looks like it is this code in like_match.c:

No, actually it's the bit right after that:

                /* Look for a place that matches the rest of the pattern */
                while (tlen > 0)
                {
                    int            matched = MatchText(t, tlen, p, plen);

                    if (matched != LIKE_FALSE)
                        return matched; /* TRUE or ABORT */

                    NextChar(t, tlen);
                }

If tlen == 0 when we reach this loop, we'll fall through and fail.
But that is wrong since we need to consider the possibility that
the remaining pattern can match a zero-length substring.  So the
loop needs to be changed to attempt a recursive MatchText for
tlen equal to zero as well as greater than zero.

            regards, tom lane

Re: BUG #5478: ILIKE operator returns wrong result

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> I have a feeling that this represents still another bug in the
> >> special-case path for % followed by _ (cf bug #4821).  If so,
> >> maybe we ought to just toss out that optimization?
>
> > Yea, looks like it is this code in like_match.c:
>
> No, actually it's the bit right after that:
>
>                 /* Look for a place that matches the rest of the pattern */
>                 while (tlen > 0)
>                 {
>                     int            matched = MatchText(t, tlen, p, plen);
>
>                     if (matched != LIKE_FALSE)
>                         return matched; /* TRUE or ABORT */
>
>                     NextChar(t, tlen);
>                 }
>
> If tlen == 0 when we reach this loop, we'll fall through and fail.
> But that is wrong since we need to consider the possibility that
> the remaining pattern can match a zero-length substring.  So the
> loop needs to be changed to attempt a recursive MatchText for
> tlen equal to zero as well as greater than zero.

I took a different approach.  I think the problem is that we check for
end of pattern without consuming '%' patterns.  I copied that consume
loop from code above that where we also test for end of pattern.

With the attached patch (which includes a regression test addition), it
works fine:

    test=> select 'ba' like '%__%';
     ?column?
    ----------
     t
    (1 row)

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
Index: src/backend/utils/adt/like_match.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/like_match.c,v
retrieving revision 1.27
diff -c -c -r1.27 like_match.c
*** src/backend/utils/adt/like_match.c    2 Jan 2010 16:57:54 -0000    1.27
--- src/backend/utils/adt/like_match.c    28 May 2010 15:36:09 -0000
***************
*** 139,144 ****
--- 139,146 ----
                   * n _'s matches any string of at least n characters, and we
                   * have now found there are at least n characters.
                   */
+                 while (plen > 0 && *p == '%')
+                     NextByte(p, plen);
                  if (plen <= 0)
                      return LIKE_TRUE;

Index: src/test/regress/expected/strings.out
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/expected/strings.out,v
retrieving revision 1.40
diff -c -c -r1.40 strings.out
*** src/test/regress/expected/strings.out    25 Jan 2010 20:55:32 -0000    1.40
--- src/test/regress/expected/strings.out    28 May 2010 15:36:12 -0000
***************
*** 943,948 ****
--- 943,954 ----
   t
  (1 row)

+ SELECT 'jack' LIKE '%____%' AS "true";
+  true
+ ------
+  t
+ (1 row)
+
  --
  -- test ILIKE (case-insensitive LIKE)
  -- Be sure to form every test as an ILIKE/NOT ILIKE pair.
Index: src/test/regress/sql/strings.sql
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/sql/strings.sql,v
retrieving revision 1.28
diff -c -c -r1.28 strings.sql
*** src/test/regress/sql/strings.sql    25 Jan 2010 20:55:32 -0000    1.28
--- src/test/regress/sql/strings.sql    28 May 2010 15:36:12 -0000
***************
*** 282,287 ****
--- 282,288 ----
  SELECT 'be_r' LIKE '__e__r' ESCAPE '_' AS "false";
  SELECT 'be_r' NOT LIKE '__e__r' ESCAPE '_' AS "true";

+ SELECT 'jack' LIKE '%____%' AS "true";

  --
  -- test ILIKE (case-insensitive LIKE)
***************
*** 310,316 ****
  SELECT 'foo' LIKE '__%' as t, 'foo' LIKE '___%' as t, 'foo' LIKE '____%' as f;
  SELECT 'foo' LIKE '%__' as t, 'foo' LIKE '%___' as t, 'foo' LIKE '%____' as f;

-
  --
  -- test implicit type conversion
  --
--- 311,316 ----

Re: BUG #5478: ILIKE operator returns wrong result

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> If tlen == 0 when we reach this loop, we'll fall through and fail.
>> But that is wrong since we need to consider the possibility that
>> the remaining pattern can match a zero-length substring.  So the
>> loop needs to be changed to attempt a recursive MatchText for
>> tlen equal to zero as well as greater than zero.

> I took a different approach.  I think the problem is that we check for
> end of pattern without consuming '%' patterns.  I copied that consume
> loop from code above that where we also test for end of pattern.

> With the attached patch (which includes a regression test addition), it
> works fine:

No, that patch is just plain wrong.  It eats %'s that would affect
the later recursive MatchText calls.

            regards, tom lane

Re: BUG #5478: ILIKE operator returns wrong result

От
Tom Lane
Дата:
I wrote:
> No, that patch is just plain wrong.  It eats %'s that would affect
> the later recursive MatchText calls.

Hmm ... actually, it's not wrong, but it's not good either.  What we
really ought to do here is not just eat _'s following %, but eat *any
mixture of* % and _, advancing over a text character per _.  The
subsequent search loop is reached only when we find a literal pattern
character to match.  This generalizes the original intention of not
using recursion to deal with simple advancing.

            regards, tom lane

Re: BUG #5478: ILIKE operator returns wrong result

От
Tom Lane
Дата:
BTW, while I'm looking at this, I notice that there was an oversight in
the change that made us throw an error for \ at the end of the LIKE
pattern.  We throw error in the first code chunk that deals with \
but we don't do so here:

                if (plen < 2)
                    return LIKE_FALSE;
                firstpat = CHAR(p[1]);

In some cases the problem is masked because we'll eventually apply the
normal \ processing, but I think there are other cases where we'll reach
a LIKE_ABORT condition and return false without ever throwing the error.
Seems like this should be fixed.  But should we back-patch that fix into
8.4?  We didn't backpatch the original change for fear of breaking
existing apps, and the same argument could probably be made this time.
Should I change it in 8.4, or only 9.0?

            regards, tom lane

Re: BUG #5478: ILIKE operator returns wrong result

От
Bruce Momjian
Дата:
Tom Lane wrote:
> BTW, while I'm looking at this, I notice that there was an oversight in
> the change that made us throw an error for \ at the end of the LIKE
> pattern.  We throw error in the first code chunk that deals with \
> but we don't do so here:
>
>                 if (plen < 2)
>                     return LIKE_FALSE;
>                 firstpat = CHAR(p[1]);
>
> In some cases the problem is masked because we'll eventually apply the
> normal \ processing, but I think there are other cases where we'll reach
> a LIKE_ABORT condition and return false without ever throwing the error.
> Seems like this should be fixed.  But should we back-patch that fix into
> 8.4?  We didn't backpatch the original change for fear of breaking
> existing apps, and the same argument could probably be made this time.
> Should I change it in 8.4, or only 9.0?

Tom has patch this and the fix will appear in the next minor release of
Postgres 8.3.X and 8.4.X.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com