Обсуждение: BUG #5478: ILIKE operator returns wrong result
The following bug has been logged online: Bug reference: 5478 Logged by: Markus Email address: markus.herven@outpost24.com PostgreSQL version: PostgreSQL 8.4. Operating system: Ubuntu 10.04 Description: ILIKE operator returns wrong result Details: The following query select 'ba' ilike '%__%'; return true as expected in 8.2 but false in 8.4.
Markus wrote: > > The following bug has been logged online: > > Bug reference: 5478 > Logged by: Markus > Email address: markus.herven@outpost24.com > PostgreSQL version: PostgreSQL 8.4. > Operating system: Ubuntu 10.04 > Description: ILIKE operator returns wrong result > Details: > > The following query > > select 'ba' ilike '%__%'; > > return true as expected in 8.2 but false in 8.4. I can confirm the odd behavior in current CVS: test=> select 'ba' ilike '%__%'; ?column? ---------- f (1 row) test=> select 'ba' like '__'; ?column? ---------- t (1 row) test=> select 'ba' like '__%'; ?column? ---------- t (1 row) test=> select 'ba' like '%_%'; ?column? ---------- t (1 row) It seems to be the leading '%' it does not like. Our docs clearly state your syntax is recommended: LIKE pattern matching always covers the entire string. Therefore, to match a sequence anywhere within a string, the pattern must start and end with a percent sign. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
"Markus" <markus.herven@outpost24.com> writes: > select 'ba' ilike '%__%'; > return true as expected in 8.2 but false in 8.4. I have a feeling that this represents still another bug in the special-case path for % followed by _ (cf bug #4821). If so, maybe we ought to just toss out that optimization? regards, tom lane
Tom Lane wrote: > "Markus" <markus.herven@outpost24.com> writes: > > select 'ba' ilike '%__%'; > > return true as expected in 8.2 but false in 8.4. > > I have a feeling that this represents still another bug in the > special-case path for % followed by _ (cf bug #4821). If so, > maybe we ought to just toss out that optimization? Yea, looks like it is this code in like_match.c: /* %_ is the same as _% - avoid matching _ repeatedly */ do { NextChar(t, tlen); NextByte(p, plen); } while (tlen > 0 && plen > 0 && *p == '_'); -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
Bruce Momjian <bruce@momjian.us> writes: > Tom Lane wrote: >> I have a feeling that this represents still another bug in the >> special-case path for % followed by _ (cf bug #4821). If so, >> maybe we ought to just toss out that optimization? > Yea, looks like it is this code in like_match.c: No, actually it's the bit right after that: /* Look for a place that matches the rest of the pattern */ while (tlen > 0) { int matched = MatchText(t, tlen, p, plen); if (matched != LIKE_FALSE) return matched; /* TRUE or ABORT */ NextChar(t, tlen); } If tlen == 0 when we reach this loop, we'll fall through and fail. But that is wrong since we need to consider the possibility that the remaining pattern can match a zero-length substring. So the loop needs to be changed to attempt a recursive MatchText for tlen equal to zero as well as greater than zero. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Tom Lane wrote: > >> I have a feeling that this represents still another bug in the > >> special-case path for % followed by _ (cf bug #4821). If so, > >> maybe we ought to just toss out that optimization? > > > Yea, looks like it is this code in like_match.c: > > No, actually it's the bit right after that: > > /* Look for a place that matches the rest of the pattern */ > while (tlen > 0) > { > int matched = MatchText(t, tlen, p, plen); > > if (matched != LIKE_FALSE) > return matched; /* TRUE or ABORT */ > > NextChar(t, tlen); > } > > If tlen == 0 when we reach this loop, we'll fall through and fail. > But that is wrong since we need to consider the possibility that > the remaining pattern can match a zero-length substring. So the > loop needs to be changed to attempt a recursive MatchText for > tlen equal to zero as well as greater than zero. I took a different approach. I think the problem is that we check for end of pattern without consuming '%' patterns. I copied that consume loop from code above that where we also test for end of pattern. With the attached patch (which includes a regression test addition), it works fine: test=> select 'ba' like '%__%'; ?column? ---------- t (1 row) -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com Index: src/backend/utils/adt/like_match.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/adt/like_match.c,v retrieving revision 1.27 diff -c -c -r1.27 like_match.c *** src/backend/utils/adt/like_match.c 2 Jan 2010 16:57:54 -0000 1.27 --- src/backend/utils/adt/like_match.c 28 May 2010 15:36:09 -0000 *************** *** 139,144 **** --- 139,146 ---- * n _'s matches any string of at least n characters, and we * have now found there are at least n characters. */ + while (plen > 0 && *p == '%') + NextByte(p, plen); if (plen <= 0) return LIKE_TRUE; Index: src/test/regress/expected/strings.out =================================================================== RCS file: /cvsroot/pgsql/src/test/regress/expected/strings.out,v retrieving revision 1.40 diff -c -c -r1.40 strings.out *** src/test/regress/expected/strings.out 25 Jan 2010 20:55:32 -0000 1.40 --- src/test/regress/expected/strings.out 28 May 2010 15:36:12 -0000 *************** *** 943,948 **** --- 943,954 ---- t (1 row) + SELECT 'jack' LIKE '%____%' AS "true"; + true + ------ + t + (1 row) + -- -- test ILIKE (case-insensitive LIKE) -- Be sure to form every test as an ILIKE/NOT ILIKE pair. Index: src/test/regress/sql/strings.sql =================================================================== RCS file: /cvsroot/pgsql/src/test/regress/sql/strings.sql,v retrieving revision 1.28 diff -c -c -r1.28 strings.sql *** src/test/regress/sql/strings.sql 25 Jan 2010 20:55:32 -0000 1.28 --- src/test/regress/sql/strings.sql 28 May 2010 15:36:12 -0000 *************** *** 282,287 **** --- 282,288 ---- SELECT 'be_r' LIKE '__e__r' ESCAPE '_' AS "false"; SELECT 'be_r' NOT LIKE '__e__r' ESCAPE '_' AS "true"; + SELECT 'jack' LIKE '%____%' AS "true"; -- -- test ILIKE (case-insensitive LIKE) *************** *** 310,316 **** SELECT 'foo' LIKE '__%' as t, 'foo' LIKE '___%' as t, 'foo' LIKE '____%' as f; SELECT 'foo' LIKE '%__' as t, 'foo' LIKE '%___' as t, 'foo' LIKE '%____' as f; - -- -- test implicit type conversion -- --- 311,316 ----
Bruce Momjian <bruce@momjian.us> writes: > Tom Lane wrote: >> If tlen == 0 when we reach this loop, we'll fall through and fail. >> But that is wrong since we need to consider the possibility that >> the remaining pattern can match a zero-length substring. So the >> loop needs to be changed to attempt a recursive MatchText for >> tlen equal to zero as well as greater than zero. > I took a different approach. I think the problem is that we check for > end of pattern without consuming '%' patterns. I copied that consume > loop from code above that where we also test for end of pattern. > With the attached patch (which includes a regression test addition), it > works fine: No, that patch is just plain wrong. It eats %'s that would affect the later recursive MatchText calls. regards, tom lane
I wrote: > No, that patch is just plain wrong. It eats %'s that would affect > the later recursive MatchText calls. Hmm ... actually, it's not wrong, but it's not good either. What we really ought to do here is not just eat _'s following %, but eat *any mixture of* % and _, advancing over a text character per _. The subsequent search loop is reached only when we find a literal pattern character to match. This generalizes the original intention of not using recursion to deal with simple advancing. regards, tom lane
BTW, while I'm looking at this, I notice that there was an oversight in the change that made us throw an error for \ at the end of the LIKE pattern. We throw error in the first code chunk that deals with \ but we don't do so here: if (plen < 2) return LIKE_FALSE; firstpat = CHAR(p[1]); In some cases the problem is masked because we'll eventually apply the normal \ processing, but I think there are other cases where we'll reach a LIKE_ABORT condition and return false without ever throwing the error. Seems like this should be fixed. But should we back-patch that fix into 8.4? We didn't backpatch the original change for fear of breaking existing apps, and the same argument could probably be made this time. Should I change it in 8.4, or only 9.0? regards, tom lane
Tom Lane wrote: > BTW, while I'm looking at this, I notice that there was an oversight in > the change that made us throw an error for \ at the end of the LIKE > pattern. We throw error in the first code chunk that deals with \ > but we don't do so here: > > if (plen < 2) > return LIKE_FALSE; > firstpat = CHAR(p[1]); > > In some cases the problem is masked because we'll eventually apply the > normal \ processing, but I think there are other cases where we'll reach > a LIKE_ABORT condition and return false without ever throwing the error. > Seems like this should be fixed. But should we back-patch that fix into > 8.4? We didn't backpatch the original change for fear of breaking > existing apps, and the same argument could probably be made this time. > Should I change it in 8.4, or only 9.0? Tom has patch this and the fix will appear in the next minor release of Postgres 8.3.X and 8.4.X. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com