Обсуждение: BUG #5478: ILIKE operator returns wrong result
The following bug has been logged online: Bug reference: 5478 Logged by: Markus Email address: markus.herven@outpost24.com PostgreSQL version: PostgreSQL 8.4. Operating system: Ubuntu 10.04 Description: ILIKE operator returns wrong result Details: The following query select 'ba' ilike '%__%'; return true as expected in 8.2 but false in 8.4.
Markus wrote:
>
> The following bug has been logged online:
>
> Bug reference: 5478
> Logged by: Markus
> Email address: markus.herven@outpost24.com
> PostgreSQL version: PostgreSQL 8.4.
> Operating system: Ubuntu 10.04
> Description: ILIKE operator returns wrong result
> Details:
>
> The following query
>
> select 'ba' ilike '%__%';
>
> return true as expected in 8.2 but false in 8.4.
I can confirm the odd behavior in current CVS:
test=> select 'ba' ilike '%__%';
?column?
----------
f
(1 row)
test=> select 'ba' like '__';
?column?
----------
t
(1 row)
test=> select 'ba' like '__%';
?column?
----------
t
(1 row)
test=> select 'ba' like '%_%';
?column?
----------
t
(1 row)
It seems to be the leading '%' it does not like. Our docs clearly state
your syntax is recommended:
LIKE pattern matching always covers the entire string. Therefore, to
match a sequence anywhere within a string, the pattern must start and
end with a percent sign.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
"Markus" <markus.herven@outpost24.com> writes:
> select 'ba' ilike '%__%';
> return true as expected in 8.2 but false in 8.4.
I have a feeling that this represents still another bug in the
special-case path for % followed by _ (cf bug #4821). If so,
maybe we ought to just toss out that optimization?
regards, tom lane
Tom Lane wrote:
> "Markus" <markus.herven@outpost24.com> writes:
> > select 'ba' ilike '%__%';
> > return true as expected in 8.2 but false in 8.4.
>
> I have a feeling that this represents still another bug in the
> special-case path for % followed by _ (cf bug #4821). If so,
> maybe we ought to just toss out that optimization?
Yea, looks like it is this code in like_match.c:
/* %_ is the same as _% - avoid matching _ repeatedly */
do
{
NextChar(t, tlen);
NextByte(p, plen);
} while (tlen > 0 && plen > 0 && *p == '_');
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> I have a feeling that this represents still another bug in the
>> special-case path for % followed by _ (cf bug #4821). If so,
>> maybe we ought to just toss out that optimization?
> Yea, looks like it is this code in like_match.c:
No, actually it's the bit right after that:
/* Look for a place that matches the rest of the pattern */
while (tlen > 0)
{
int matched = MatchText(t, tlen, p, plen);
if (matched != LIKE_FALSE)
return matched; /* TRUE or ABORT */
NextChar(t, tlen);
}
If tlen == 0 when we reach this loop, we'll fall through and fail.
But that is wrong since we need to consider the possibility that
the remaining pattern can match a zero-length substring. So the
loop needs to be changed to attempt a recursive MatchText for
tlen equal to zero as well as greater than zero.
regards, tom lane
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> I have a feeling that this represents still another bug in the
> >> special-case path for % followed by _ (cf bug #4821). If so,
> >> maybe we ought to just toss out that optimization?
>
> > Yea, looks like it is this code in like_match.c:
>
> No, actually it's the bit right after that:
>
> /* Look for a place that matches the rest of the pattern */
> while (tlen > 0)
> {
> int matched = MatchText(t, tlen, p, plen);
>
> if (matched != LIKE_FALSE)
> return matched; /* TRUE or ABORT */
>
> NextChar(t, tlen);
> }
>
> If tlen == 0 when we reach this loop, we'll fall through and fail.
> But that is wrong since we need to consider the possibility that
> the remaining pattern can match a zero-length substring. So the
> loop needs to be changed to attempt a recursive MatchText for
> tlen equal to zero as well as greater than zero.
I took a different approach. I think the problem is that we check for
end of pattern without consuming '%' patterns. I copied that consume
loop from code above that where we also test for end of pattern.
With the attached patch (which includes a regression test addition), it
works fine:
test=> select 'ba' like '%__%';
?column?
----------
t
(1 row)
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
Index: src/backend/utils/adt/like_match.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/like_match.c,v
retrieving revision 1.27
diff -c -c -r1.27 like_match.c
*** src/backend/utils/adt/like_match.c 2 Jan 2010 16:57:54 -0000 1.27
--- src/backend/utils/adt/like_match.c 28 May 2010 15:36:09 -0000
***************
*** 139,144 ****
--- 139,146 ----
* n _'s matches any string of at least n characters, and we
* have now found there are at least n characters.
*/
+ while (plen > 0 && *p == '%')
+ NextByte(p, plen);
if (plen <= 0)
return LIKE_TRUE;
Index: src/test/regress/expected/strings.out
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/expected/strings.out,v
retrieving revision 1.40
diff -c -c -r1.40 strings.out
*** src/test/regress/expected/strings.out 25 Jan 2010 20:55:32 -0000 1.40
--- src/test/regress/expected/strings.out 28 May 2010 15:36:12 -0000
***************
*** 943,948 ****
--- 943,954 ----
t
(1 row)
+ SELECT 'jack' LIKE '%____%' AS "true";
+ true
+ ------
+ t
+ (1 row)
+
--
-- test ILIKE (case-insensitive LIKE)
-- Be sure to form every test as an ILIKE/NOT ILIKE pair.
Index: src/test/regress/sql/strings.sql
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/sql/strings.sql,v
retrieving revision 1.28
diff -c -c -r1.28 strings.sql
*** src/test/regress/sql/strings.sql 25 Jan 2010 20:55:32 -0000 1.28
--- src/test/regress/sql/strings.sql 28 May 2010 15:36:12 -0000
***************
*** 282,287 ****
--- 282,288 ----
SELECT 'be_r' LIKE '__e__r' ESCAPE '_' AS "false";
SELECT 'be_r' NOT LIKE '__e__r' ESCAPE '_' AS "true";
+ SELECT 'jack' LIKE '%____%' AS "true";
--
-- test ILIKE (case-insensitive LIKE)
***************
*** 310,316 ****
SELECT 'foo' LIKE '__%' as t, 'foo' LIKE '___%' as t, 'foo' LIKE '____%' as f;
SELECT 'foo' LIKE '%__' as t, 'foo' LIKE '%___' as t, 'foo' LIKE '%____' as f;
-
--
-- test implicit type conversion
--
--- 311,316 ----
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> If tlen == 0 when we reach this loop, we'll fall through and fail.
>> But that is wrong since we need to consider the possibility that
>> the remaining pattern can match a zero-length substring. So the
>> loop needs to be changed to attempt a recursive MatchText for
>> tlen equal to zero as well as greater than zero.
> I took a different approach. I think the problem is that we check for
> end of pattern without consuming '%' patterns. I copied that consume
> loop from code above that where we also test for end of pattern.
> With the attached patch (which includes a regression test addition), it
> works fine:
No, that patch is just plain wrong. It eats %'s that would affect
the later recursive MatchText calls.
regards, tom lane
I wrote:
> No, that patch is just plain wrong. It eats %'s that would affect
> the later recursive MatchText calls.
Hmm ... actually, it's not wrong, but it's not good either. What we
really ought to do here is not just eat _'s following %, but eat *any
mixture of* % and _, advancing over a text character per _. The
subsequent search loop is reached only when we find a literal pattern
character to match. This generalizes the original intention of not
using recursion to deal with simple advancing.
regards, tom lane
BTW, while I'm looking at this, I notice that there was an oversight in
the change that made us throw an error for \ at the end of the LIKE
pattern. We throw error in the first code chunk that deals with \
but we don't do so here:
if (plen < 2)
return LIKE_FALSE;
firstpat = CHAR(p[1]);
In some cases the problem is masked because we'll eventually apply the
normal \ processing, but I think there are other cases where we'll reach
a LIKE_ABORT condition and return false without ever throwing the error.
Seems like this should be fixed. But should we back-patch that fix into
8.4? We didn't backpatch the original change for fear of breaking
existing apps, and the same argument could probably be made this time.
Should I change it in 8.4, or only 9.0?
regards, tom lane
Tom Lane wrote: > BTW, while I'm looking at this, I notice that there was an oversight in > the change that made us throw an error for \ at the end of the LIKE > pattern. We throw error in the first code chunk that deals with \ > but we don't do so here: > > if (plen < 2) > return LIKE_FALSE; > firstpat = CHAR(p[1]); > > In some cases the problem is masked because we'll eventually apply the > normal \ processing, but I think there are other cases where we'll reach > a LIKE_ABORT condition and return false without ever throwing the error. > Seems like this should be fixed. But should we back-patch that fix into > 8.4? We didn't backpatch the original change for fear of breaking > existing apps, and the same argument could probably be made this time. > Should I change it in 8.4, or only 9.0? Tom has patch this and the fix will appear in the next minor release of Postgres 8.3.X and 8.4.X. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com