Обсуждение: [BUGS] BUG #14512: Backslashes in LIKE

Поиск
Список
Период
Сортировка

[BUGS] BUG #14512: Backslashes in LIKE

От
vojta.rylko@gmail.com
Дата:
The following bug has been logged on the website:

Bug reference:      14512
Logged by:          Vojtěch Rylko
Email address:      vojta.rylko@gmail.com
PostgreSQL version: 9.5.5
Operating system:   Linux 3.19.0-32-generic x86_64
Description:

Hi, LIKE behaves differently depending on left side.

> select 1 where '\' like '\\\'; -- one and three backslashes
 ?column? 
----------
(0 rows)

> select 1 where '\\' like '\\\'; -- two and three backslashes
ERROR:  LIKE pattern must not end with escape character


The same behaviour occurs also with usage of table:

root=# create table t (a varchar);
CREATE TABLE
root=# insert into t values ('\'); -- one backslash
INSERT 0 1
root=# select * from t where t.a like '\\\'; -- three backslashes
 a 
---
(0 rows)

root=# insert into t values ('\\'); -- two backslashes
INSERT 0 1
root=# select * from t where t.a like '\\\'; -- three backslashes
ERROR:  LIKE pattern must not end with escape character


Cheers,
Vojta Rylko, vry.cz


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14512: Backslashes in LIKE

От
"David G. Johnston"
Дата:
On Tue, Jan 24, 2017 at 10:25 AM, <vojta.rylko@gmail.com> wrote:
The following bug has been logged on the website:

Bug reference:      14512
Logged by:          Vojtěch Rylko
Email address:      vojta.rylko@gmail.com
PostgreSQL version: 9.5.5
Operating system:   Linux 3.19.0-32-generic x86_64
Description:

Hi, LIKE behaves differently depending on left side.

 
​???​
 
> select 1 where '\' like '\\\'; -- one and three backslashes
 ?column?
----------
(0 rows)

> select 1 where '\\' like '\\\'; -- two and three backslashes
ERROR:  LIKE pattern must not end with escape character


​The right-hand side is the "pattern" - i.e., <\\\> - which ends with the escape...

​David J.

Re: [BUGS] BUG #14512: Backslashes in LIKE

От
"David G. Johnston"
Дата:
On Tue, Jan 24, 2017 at 10:32 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tue, Jan 24, 2017 at 10:25 AM, <vojta.rylko@gmail.com> wrote:
The following bug has been logged on the website:

Bug reference:      14512
Logged by:          Vojtěch Rylko
Email address:      vojta.rylko@gmail.com
PostgreSQL version: 9.5.5
Operating system:   Linux 3.19.0-32-generic x86_64
Description:

Hi, LIKE behaves differently depending on left side.

 
​???​
 
> select 1 where '\' like '\\\'; -- one and three backslashes
 ?column?
----------
(0 rows)

> select 1 where '\\' like '\\\'; -- two and three backslashes
ERROR:  LIKE pattern must not end with escape character


​The right-hand side is the "pattern" - i.e., <\\\> - which ends with the escape...

​Never mind - I was multi-tasking and mis-read what you wrote...

I'll give it more attention when I have a moment if no one else chimes in first.

David J.​

Re: [BUGS] BUG #14512: Backslashes in LIKE

От
"David G. Johnston"
Дата:
On Tue, Jan 24, 2017 at 10:40 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tue, Jan 24, 2017 at 10:32 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tue, Jan 24, 2017 at 10:25 AM, <vojta.rylko@gmail.com> wrote:
The following bug has been logged on the website:

Bug reference:      14512
Logged by:          Vojtěch Rylko
Email address:      vojta.rylko@gmail.com
PostgreSQL version: 9.5.5
Operating system:   Linux 3.19.0-32-generic x86_64
Description:

Hi, LIKE behaves differently depending on left side.

 
​???​
 
> select 1 where '\' like '\\\'; -- one and three backslashes
 ?column?
----------
(0 rows)

> select 1 where '\\' like '\\\'; -- two and three backslashes
ERROR:  LIKE pattern must not end with escape character


​The right-hand side is the "pattern" - i.e., <\\\> - which ends with the escape...

​Never mind - I was multi-tasking and mis-read what you wrote...

I'll give it more attention when I have a moment if no one else chimes in first.


​Not a hacker but I'd say that the '\' LIKE '\\\' expression is encountering an invalid optimization that determines that the LIKE cannot succeed (due to string length differences, probably) - it too should fail like the other '\\' LIKE '\\\' example.

So, it is a "failure to fail" type of bug.  Confirmed using a 9.3.12 instance.

David J.​

Re: [BUGS] BUG #14512: Backslashes in LIKE

От
Tom Lane
Дата:
vojta.rylko@gmail.com writes:
> Hi, LIKE behaves differently depending on left side.

>> select 1 where '\' like '\\\'; -- one and three backslashes
>  ?column? 
> ----------
> (0 rows)

>> select 1 where '\\' like '\\\'; -- two and three backslashes
> ERROR:  LIKE pattern must not end with escape character

I see no bug here.  The pattern is wrong, but it happens not to notice in
the first case because it never reaches the buggy part of the pattern.

            regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14512: Backslashes in LIKE

От
"David G. Johnston"
Дата:
On Tue, Jan 24, 2017 at 11:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
vojta.rylko@gmail.com writes:
> Hi, LIKE behaves differently depending on left side.

>> select 1 where '\' like '\\\'; -- one and three backslashes
>  ?column?
> ----------
> (0 rows)

>> select 1 where '\\' like '\\\'; -- two and three backslashes
> ERROR:  LIKE pattern must not end with escape character

I see no bug here.  The pattern is wrong, but it happens not to notice in
the first case because it never reaches the buggy part of the pattern.

Then consider a feature request that a malformed pattern be detected and fail independent of the data being checked. Such non-deterministic failure is at least a POLA violation and makes what should be a basically compile-time error into a run-time one.

I will agree that It is not a back-patchable bug (unless we decide to never fail and instead have a malformed pattern always return false - we'd at least be consistent - though probably not in a desirable way)  but would say it is a defect that should be addressed in v10.

David J.

Re: [BUGS] BUG #14512: Backslashes in LIKE

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> Then consider a feature request that a malformed pattern be detected and
> fail independent of the data being checked. Such non-deterministic failure
> is at least a POLA violation and makes what should be a basically
> compile-time error into a run-time one.

Meh.  We could do something like the attached, but I think it would be a
net performance drag in practically all cases, and I doubt it is worth it.

            regards, tom lane

diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 91fe109..e94e64a 100644
*** a/src/backend/utils/adt/like.c
--- b/src/backend/utils/adt/like.c
*************** SB_lower_char(unsigned char c, pg_locale
*** 146,155 ****
--- 146,181 ----

  #include "like_match.c"

+ /*
+  * Check that pattern is legal (which reduces to checking that there is no
+  * backslash at the end).  We do this separately so that we'll throw an error
+  * for any invalid pattern, even if the matching logic doesn't ever examine
+  * all of it.  In consequence, the matching logic need not defend itself
+  * against invalid patterns.
+  */
+ static inline void
+ check_like_pattern(char *p, int plen)
+ {
+     /* This can be stupid even in multibyte encodings. */
+     while (plen-- > 0)
+     {
+         if (*p++ == '\\')
+         {
+             if (unlikely(plen == 0))
+                 ereport(ERROR,
+                         (errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
+                  errmsg("LIKE pattern must not end with escape character")));
+             /* otherwise ignore next character, even if it's backslash */
+             p++, plen--;
+         }
+     }
+ }
+
  /* Generic for all cases not requiring inline case-folding */
  static inline int
  GenericMatchText(char *s, int slen, char *p, int plen)
  {
+     check_like_pattern(p, plen);
      if (pg_database_encoding_max_length() == 1)
          return SB_MatchText(s, slen, p, plen, 0, true);
      else if (GetDatabaseEncoding() == PG_UTF8)
*************** Generic_Text_IC_like(text *str, text *pa
*** 179,184 ****
--- 205,211 ----
                                                      PointerGetDatum(pat)));
          p = VARDATA(pat);
          plen = (VARSIZE(pat) - VARHDRSZ);
+         check_like_pattern(p, plen);
          str = DatumGetTextP(DirectFunctionCall1Coll(lower, collation,
                                                      PointerGetDatum(str)));
          s = VARDATA(str);
*************** Generic_Text_IC_like(text *str, text *pa
*** 217,222 ****
--- 244,250 ----

          p = VARDATA_ANY(pat);
          plen = VARSIZE_ANY_EXHDR(pat);
+         check_like_pattern(p, plen);
          s = VARDATA_ANY(str);
          slen = VARSIZE_ANY_EXHDR(str);
          return SB_IMatchText(s, slen, p, plen, locale, locale_is_c);
*************** bytealike(PG_FUNCTION_ARGS)
*** 326,331 ****
--- 354,360 ----
      slen = VARSIZE_ANY_EXHDR(str);
      p = VARDATA_ANY(pat);
      plen = VARSIZE_ANY_EXHDR(pat);
+     check_like_pattern(p, plen);

      result = (SB_MatchText(s, slen, p, plen, 0, true) == LIKE_TRUE);

*************** byteanlike(PG_FUNCTION_ARGS)
*** 347,352 ****
--- 376,382 ----
      slen = VARSIZE_ANY_EXHDR(str);
      p = VARDATA_ANY(pat);
      plen = VARSIZE_ANY_EXHDR(pat);
+     check_like_pattern(p, plen);

      result = (SB_MatchText(s, slen, p, plen, 0, true) != LIKE_TRUE);

diff --git a/src/backend/utils/adt/like_match.c b/src/backend/utils/adt/like_match.c
index 1c37229..a69b41b 100644
*** a/src/backend/utils/adt/like_match.c
--- b/src/backend/utils/adt/like_match.c
*************** MatchText(char *t, int tlen, char *p, in
*** 99,110 ****
          if (*p == '\\')
          {
              /* Next pattern byte must match literally, whatever it is */
              NextByte(p, plen);
-             /* ... and there had better be one, per SQL standard */
-             if (plen <= 0)
-                 ereport(ERROR,
-                         (errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
-                  errmsg("LIKE pattern must not end with escape character")));
              if (GETCHAR(*p) != GETCHAR(*t))
                  return LIKE_FALSE;
          }
--- 99,106 ----
          if (*p == '\\')
          {
              /* Next pattern byte must match literally, whatever it is */
+             /* (and check_like_pattern() checked that there is one) */
              NextByte(p, plen);
              if (GETCHAR(*p) != GETCHAR(*t))
                  return LIKE_FALSE;
          }
*************** MatchText(char *t, int tlen, char *p, in
*** 160,172 ****
               * end of the text.
               */
              if (*p == '\\')
!             {
!                 if (plen < 2)
!                     ereport(ERROR,
!                             (errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
!                              errmsg("LIKE pattern must not end with escape character")));
!                 firstpat = GETCHAR(p[1]);
!             }
              else
                  firstpat = GETCHAR(*p);

--- 156,162 ----
               * end of the text.
               */
              if (*p == '\\')
!                 firstpat = GETCHAR(p[1]);        /* we know this is safe */
              else
                  firstpat = GETCHAR(*p);


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14512: Backslashes in LIKE

От
Vojtěch Rylko
Дата:
2017-01-24 19:15 GMT+01:00 David G. Johnston <david.g.johnston@gmail.com>:
>
> Then consider a feature request that a malformed pattern be detected and fail independent of the data being checked.
Suchnon-deterministic failure is at least a POLA violation and makes what should be a basically compile-time error into
arun-time one.
 

This is not pure compile-time "error" as pattern in LIKE could be
dynamic expression, for example:

root=# create table t (a varchar);
CREATE TABLE

root=# insert into t values ('\\\');
INSERT 0 1

root=# select * from t t1 cross join t t2 where t1.a like t2.a;
ERROR:  LIKE pattern must not end with escape character


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14512: Backslashes in LIKE

От
Vojtěch Rylko
Дата:
2017-01-24 18:48 GMT+01:00 David G. Johnston <david.g.johnston@gmail.com>:
​Not a hacker but I'd say that the '\' LIKE '\\\' expression is encountering an invalid optimization that determines that the LIKE cannot succeed (due to string length differences, probably) - it too should fail like the other '\\' LIKE '\\\' example.

So, it is a "failure to fail" type of bug.  Confirmed using a 9.3.12 instance.

From user perspective I see this bug quite similar to behaviour of boolean expression evaluation, where it is stated in documentation:

if the result of an expression can be determined by evaluating only some parts of it, then other subexpressions might not be evaluated at all -- 4.2.14. Expression Evaluation Rules

So I expect this:

root=# select 1 where '\\' like '\\\';
ERROR:  LIKE pattern must not end with escape character
root=# select 1 where false and '\\' like '\\\';
 ?column? 
----------
(0 rows)

same as I expect 

root=# select 1 where 1/0 = 0 and false;
ERROR:  division by zero
root=# select 1 where false and 1/0 = 0;
 ?column? 
----------
(0 rows)

(Note that examples above are not deterministic because of unspecified order of subexpressions evaluation in where clause.)

But reported behaviour confuses me as it seems like leaked internals of LIKE implementation.

Re: [BUGS] BUG #14512: Backslashes in LIKE

От
"David G. Johnston"
Дата:
On Wed, Jan 25, 2017 at 2:16 AM, Vojtěch Rylko <vojta.rylko@gmail.com> wrote:
2017-01-24 19:15 GMT+01:00 David G. Johnston <david.g.johnston@gmail.com>:
>
> Then consider a feature request that a malformed pattern be detected and fail independent of the data being checked. Such non-deterministic failure is at least a POLA violation and makes what should be a basically compile-time error into a run-time one.

This is not pure compile-time "error" as pattern in LIKE could be
dynamic expression, for example:

But now we're no longer talking about an expression of the form "LIKE constant".  In SQL whenever you decide to write a query that involves tables and columns you run the risk of introducing run-time bugs if you make assumptions about the contents of those columns that fail to hold.  So if you decide to write queries of that form you should define table t like so:

create table t (a varchar check (a !~ '\$')) -- which could be improved upon depending on your needs - but it would at least catch every actual invalid expression at the cost of disallowing valid ones.

David J.

Re: [BUGS] BUG #14512: Backslashes in LIKE

От
"David G. Johnston"
Дата:
On Wed, Jan 25, 2017 at 2:28 AM, Vojtěch Rylko <vojta.rylko@gmail.com> wrote:
2017-01-24 18:48 GMT+01:00 David G. Johnston <david.g.johnston@gmail.com>:
​Not a hacker but I'd say that the '\' LIKE '\\\' expression is encountering an invalid optimization that determines that the LIKE cannot succeed (due to string length differences, probably) - it too should fail like the other '\\' LIKE '\\\' example.

So, it is a "failure to fail" type of bug.  Confirmed using a 9.3.12 instance.

From user perspective I see this bug quite similar to behaviour of boolean expression evaluation, where it is stated in documentation:

if the result of an expression can be determined by evaluating only some parts of it, then other subexpressions might not be evaluated at all -- 4.2.14. Expression Evaluation Rules

So I expect this:

root=# select 1 where '\\' like '\\\';
ERROR:  LIKE pattern must not end with escape character
root=# select 1 where false and '\\' like '\\\';
 ?column? 
----------
(0 rows)

same as I expect 

root=# select 1 where 1/0 = 0 and false;
ERROR:  division by zero
root=# select 1 where false and 1/0 = 0;
 ?column? 
----------
(0 rows)

(Note that examples above are not deterministic because of unspecified order of subexpressions evaluation in where clause.)

But reported behaviour confuses me as it seems like leaked internals of LIKE implementation.


​I think we all agree that it does.  The opinions we are looking for are whether, given that you've written a correctly formed LIKE pattern, do you want every single instance of testing against that pattern to be preceded by a test that checks whether the given pattern is valid?​  While not measured it is a run-time cost that should return true in nearly all cases expect for development bugs.

I use RegEx a lot - I'm already used to the cost being built-in and, frankly, when doing string comparison work, I suspect that the order of magnitude such a pre-check would add would be nominal.

Given that any supposedly successful match against the pattern would fail in the case of a silly typo of this form I'm leaning more to the fact that having a bad pattern escape detection would be very difficult.  Patterns that check for invalid data are more at risk...

select 1 where 'abc\' like 'abc\'; -- fails, supposed to use \\ on the end of the pattern

Given time to think about it more I'm now leaning toward keeping the present behavior.

David J.


Re: [BUGS] BUG #14512: Backslashes in LIKE

От
Alex Malek
Дата:

Given the current behavior the same query will work or raise on error based on context.
That is pretty confusing.

Consider:

 foo=> CREATE TABLE bar (a varchar);
CREATE TABLE
foo=> SELECT * FROM bar WHERE a LIKE 'e\';
 a 
---
(0 rows)

foo=> INSERT INTO bar VALUES ('e');
INSERT 0 1
foo=> SELECT * FROM bar WHERE a LIKE 'e\';
 a 
---
(0 rows)

foo=> INSERT INTO bar VALUES ('ee');
INSERT 0 1
foo=> SELECT * FROM bar WHERE a LIKE 'e\';
ERROR:  LIKE pattern must not end with escape character

Re: [BUGS] BUG #14512: Backslashes in LIKE

От
"David G. Johnston"
Дата:
On Fri, Mar 17, 2017 at 11:16 AM, Alex Malek <magicagent@gmail.com> wrote:

Given the current behavior the same query will work or raise on error based on context.
That is pretty confusing.


​Recently discussed here:

In short - preventing a "fails-to-fail" scenario here doesn't seem worthy of the effort and run-time cost doing so would entail.

David J.

Re: [BUGS] BUG #14512: Backslashes in LIKE

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Fri, Mar 17, 2017 at 11:16 AM, Alex Malek <magicagent@gmail.com> wrote:
>> Given the current behavior the same query will work or raise on error
>> based on context.

> Recently discussed here:
>
https://www.postgresql.org/message-id/flat/20170124172505.1431.56735%40wrigleys.postgresql.org#20170124172505.1431.56735@wrigleys.postgresql.org
> In short - preventing a "fails-to-fail" scenario here doesn't seem worthy
> of the effort and run-time cost doing so would entail.

BTW, looking again at the patch I suggested in
https://www.postgresql.org/message-id/10287.1485286334%40sss.pgh.pa.us
it strikes me that the check logic was unnecessarily stupid.  What
we need to check is that there's not an odd number of backslashes at
the end of the pattern.  Since we know that backend encodings are
ASCII-safe, the test logic could be changed to scan backwards,
something like

    p += plen;
    nbackslash = 0;
    while (plen-- > 0)
    {
        if (*(--p) == '\\')
            nbackslash++;
        else
        break;
    }
    if (nbackslash & 1)
        ereport(ERROR, ...);

For patterns of practical interest this would be of small and nearly
constant cost.  Maybe it is worth doing, especially since we've now
had two independent complaints about it.

            regards, tom lane


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs