Обсуждение: Precedence of NOT LIKE, NOT BETWEEN, etc

Поиск
Список
Период
Сортировка

Precedence of NOT LIKE, NOT BETWEEN, etc

От
Tom Lane
Дата:
While fooling around with testing operator precedence warnings,
I discovered that there's some existing precedence behavior that's
not at all what I expected.  Consider these examples (all done in
9.4; this is longstanding behavior):

regression=# SELECT 1 < '(3,4)'::point LIKE 42.0;
ERROR:  operator does not exist: point ~~ numeric
LINE 1: SELECT 1 < '(3,4)'::point LIKE 42.0;                                 ^

regression=# SELECT 1 < '(3,4)'::point NOT LIKE 42.0;
ERROR:  operator does not exist: integer < point
LINE 1: SELECT 1 < '(3,4)'::point NOT LIKE 42.0;                ^

regression=# SELECT 1 LIKE '(3,4)'::point < 42.0;
ERROR:  operator does not exist: integer ~~ point
LINE 1: SELECT 1 LIKE '(3,4)'::point < 42.0;                ^

regression=# SELECT 1 NOT LIKE '(3,4)'::point < 42.0;
ERROR:  operator does not exist: integer !~~ point
LINE 1: SELECT 1 NOT LIKE '(3,4)'::point < 42.0;                ^

These queries are silly of course, the point is just to expose which
operator the parser thinks binds more tightly.  And what we have above is:
LIKE binds more tightly than <, in either combination.  NOT LIKE binds
less tightly than < if < is on the left, but more tightly than < if <
is on the right.

This seems like a bug; it's certainly not what you'd call intuitive.

The cause, if I'm understanding the Bison manual correctly, is that
the NOT token has lower precedence than '<', so at the critical point
where it has scanned 1 < '(3,4)'::point and has to either reduce that
(thus binding < more tightly) or shift, it looks at the lookahead token
which is either LIKE or NOT in the first two examples above.  LIKE has
higher priority than '<' so in the first example it shifts; but NOT
has lower priority than '<' so in the second example it reduces.

The other productions starting with "a_expr NOT ..." also have the
same problem; that includes NOT ILIKE, NOT SIMILAR TO, NOT BETWEEN,
and NOT IN.

I'm not seeing any terribly pleasing ways to fix this.  Aside from
the option of doing nothing, it seems like these are the choices:

1. We could hack base_yylex() to reduce NOT LIKE to a single token
which could be given the same precedence as LIKE.  Ditto for the other
four cases.  This would result in nice behavior for these cases, but as
Robert has correctly pointed out, hacks in base_yylex() risk creating
other poorly-understood behaviors.

2. We could change the precedence levels so that LIKE/ILIKE/SIMILAR,
BETWEEN, and IN all have precedence just above NOT, which would pretty
much mask the problem since there would be no other tokens with in-between
precedence levels.  In the context of the operator precedence changes
I proposed earlier, this would mean inserting the IS tests and comparison
operators between IN_P and POSTFIXOP rather than where the
single-character comparison ops live now.  We would likely also have to
flatten LIKE/BETWEEN/IN into a single %nonassoc precedence level to avoid
having weird interactions among them.  This isn't terribly attractive
because it would risk larger behavioral changes than the previous
proposal.

Thoughts?
        regards, tom lane



Re: Precedence of NOT LIKE, NOT BETWEEN, etc

От
Tom Lane
Дата:
I wrote:
> I'm not seeing any terribly pleasing ways to fix this.  Aside from
> the option of doing nothing, it seems like these are the choices:

> 1. We could hack base_yylex() to reduce NOT LIKE to a single token
> which could be given the same precedence as LIKE.  Ditto for the other
> four cases.  This would result in nice behavior for these cases, but as
> Robert has correctly pointed out, hacks in base_yylex() risk creating
> other poorly-understood behaviors.

> 2. We could change the precedence levels so that LIKE/ILIKE/SIMILAR,
> BETWEEN, and IN all have precedence just above NOT, which would pretty
> much mask the problem since there would be no other tokens with in-between
> precedence levels.  In the context of the operator precedence changes
> I proposed earlier, this would mean inserting the IS tests and comparison
> operators between IN_P and POSTFIXOP rather than where the
> single-character comparison ops live now.  We would likely also have to
> flatten LIKE/BETWEEN/IN into a single %nonassoc precedence level to avoid
> having weird interactions among them.  This isn't terribly attractive
> because it would risk larger behavioral changes than the previous
> proposal.

I thought of another possibility:

3. Leave everything as-is but mark the NOT-operator productions as having
the precedence of NOT rather than of LIKE etc.  This would change the
behavior only for the NOT-LIKE-followed-by-< example, and would make the
two cases for NOT LIKE consistent though they'd remain inconsistent with
LIKE.  This behavior seems at least somewhat explainable/documentable
("NOT-foo operators have the precedence of NOT"), whereas what we have
seems about impossible to justify.
        regards, tom lane



Re: Precedence of NOT LIKE, NOT BETWEEN, etc

От
Tom Lane
Дата:
I wrote:
> I'm not seeing any terribly pleasing ways to fix this.  Aside from
> the option of doing nothing, it seems like these are the choices:

> 1. We could hack base_yylex() to reduce NOT LIKE to a single token
> which could be given the same precedence as LIKE.  Ditto for the other
> four cases.  This would result in nice behavior for these cases, but as
> Robert has correctly pointed out, hacks in base_yylex() risk creating
> other poorly-understood behaviors.

Indeed, that would create some misbehaviors.  An example is that because
LIKE is a type_func_name_keyword, it's legal to create a function named
like() and then do something like this:

regression=# select not like(42);

but that would fail if base_yylex() reduced the two tokens to one.

However, I've thought of a variant idea that seems pretty promising.
Instead of having base_yylex() reduce the two tokens to one, how
about making it replace the NOT with some new token, say NOT_HP
for "high priority NOT", but keep the lookahead token as a separate
token?  Then we would just need to add productions to the grammar
for any case where LIKE etc might validly follow NOT without being
part of the special syntaxes.  There's exactly one such case, namely
the "NOT a_expr" production, so this is a pretty cheap change.

I've experimented with the attached crude prototype patch and it seems
to do what's wanted here, while still accepting the "not like(42)"
construction.  There is a problem with the patch as it stands, which is
that I used the same NOT_HP token for all of LIKE ILIKE SIMILAR BETWEEN
and IN productions.  Because the grammar currently has BETWEEN and IN as
their own precedence levels, we'd need three different NOT-lookahead
tokens if we wanted NOT BETWEEN and NOT IN to share those precedence
levels.  I think that's probably overkill, especially if we're changing
nearby precedence levels.  What I propose we really do is adopt this
solution, but fold all of LIKE ILIKE SIMILAR BETWEEN and IN into a
single %nonassoc precedence level.  This would only change the behavior
for cases where you'd written two of these constructs adjacently without
parens, eg
    a LIKE b BETWEEN c AND d
which doesn't seem like something that would occur in working applications
anyway.

Another reason this is a prototype is that it does result in one change in
the regression test results:

*** 63,69 ****
  CONTEXT:  SQL function "test1"
  CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL
      AS 'not even SQL';
! ERROR:  syntax error at or near "not"
  LINE 2:     AS 'not even SQL';
                  ^
  CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL
--- 63,69 ----
  CONTEXT:  SQL function "test1"
  CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL
      AS 'not even SQL';
! ERROR:  syntax error at or near "not even"
  LINE 2:     AS 'not even SQL';
                  ^
  CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL

The reason that's happening is that we aren't doing quite enough in
base_yylex to back up the lexer state completely after a lookahead.
You can exhibit similar oddities around the existing substitutions
such as "nulls first".  I think this can be fixed relatively cheaply
by looking at yyleng, but haven't experimented yet.

Also, it strikes me that we could significantly reduce, maybe even fully
eliminate, the funny behaviors around the existing base_yylex()
substitutions if we made them use the same idea, ie replace the leading
token with something special but keep the second token's separate
identity.  Unless somebody sees a hole in this idea, I'll probably go
do that and then come back to the precedence issues.

            regards, tom lane

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d91bedd..f60c863 100644
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*************** static Node *makeRecursiveViewSelect(cha
*** 637,643 ****
   * list and so can never be entered directly.  The filter in parser.c
   * creates these tokens when required.
   */
! %token            NULLS_FIRST NULLS_LAST WITH_ORDINALITY WITH_TIME


  /* Precedence: lowest to highest */
--- 637,643 ----
   * list and so can never be entered directly.  The filter in parser.c
   * creates these tokens when required.
   */
! %token            NOT_HP NULLS_FIRST NULLS_LAST WITH_ORDINALITY WITH_TIME


  /* Precedence: lowest to highest */
*************** static Node *makeRecursiveViewSelect(cha
*** 649,656 ****
  %right        NOT
  %nonassoc    IS ISNULL NOTNULL    /* IS sets precedence for IS NULL, etc */
  %nonassoc    '<' '>' '=' LESS_EQUALS GREATER_EQUALS NOT_EQUALS
! %nonassoc    LIKE ILIKE SIMILAR
! %nonassoc    ESCAPE
  %nonassoc    OVERLAPS
  %nonassoc    BETWEEN
  %nonassoc    IN_P
--- 649,656 ----
  %right        NOT
  %nonassoc    IS ISNULL NOTNULL    /* IS sets precedence for IS NULL, etc */
  %nonassoc    '<' '>' '=' LESS_EQUALS GREATER_EQUALS NOT_EQUALS
! %nonassoc    LIKE ILIKE SIMILAR NOT_HP
! %nonassoc    ESCAPE            /* must be just above LIKE/ILIKE/SIMILAR */
  %nonassoc    OVERLAPS
  %nonassoc    BETWEEN
  %nonassoc    IN_P
*************** a_expr:        c_expr                                    { $$ = $1; }
*** 11223,11228 ****
--- 11223,11230 ----
                  { $$ = makeOrExpr($1, $3, @2); }
              | NOT a_expr
                  { $$ = makeNotExpr($2, @1); }
+             | NOT_HP a_expr %prec NOT
+                 { $$ = makeNotExpr($2, @1); }

              | a_expr LIKE a_expr
                  {
*************** a_expr:        c_expr                                    { $$ = $1; }
*** 11237,11248 ****
                      $$ = (Node *) makeSimpleA_Expr(AEXPR_LIKE, "~~",
                                                     $1, (Node *) n, @2);
                  }
!             | a_expr NOT LIKE a_expr
                  {
                      $$ = (Node *) makeSimpleA_Expr(AEXPR_LIKE, "!~~",
                                                     $1, $4, @2);
                  }
!             | a_expr NOT LIKE a_expr ESCAPE a_expr
                  {
                      FuncCall *n = makeFuncCall(SystemFuncName("like_escape"),
                                                 list_make2($4, $6),
--- 11239,11250 ----
                      $$ = (Node *) makeSimpleA_Expr(AEXPR_LIKE, "~~",
                                                     $1, (Node *) n, @2);
                  }
!             | a_expr NOT_HP LIKE a_expr %prec NOT_HP
                  {
                      $$ = (Node *) makeSimpleA_Expr(AEXPR_LIKE, "!~~",
                                                     $1, $4, @2);
                  }
!             | a_expr NOT_HP LIKE a_expr ESCAPE a_expr %prec NOT_HP
                  {
                      FuncCall *n = makeFuncCall(SystemFuncName("like_escape"),
                                                 list_make2($4, $6),
*************** a_expr:        c_expr                                    { $$ = $1; }
*** 11263,11274 ****
                      $$ = (Node *) makeSimpleA_Expr(AEXPR_ILIKE, "~~*",
                                                     $1, (Node *) n, @2);
                  }
!             | a_expr NOT ILIKE a_expr
                  {
                      $$ = (Node *) makeSimpleA_Expr(AEXPR_ILIKE, "!~~*",
                                                     $1, $4, @2);
                  }
!             | a_expr NOT ILIKE a_expr ESCAPE a_expr
                  {
                      FuncCall *n = makeFuncCall(SystemFuncName("like_escape"),
                                                 list_make2($4, $6),
--- 11265,11276 ----
                      $$ = (Node *) makeSimpleA_Expr(AEXPR_ILIKE, "~~*",
                                                     $1, (Node *) n, @2);
                  }
!             | a_expr NOT_HP ILIKE a_expr %prec NOT_HP
                  {
                      $$ = (Node *) makeSimpleA_Expr(AEXPR_ILIKE, "!~~*",
                                                     $1, $4, @2);
                  }
!             | a_expr NOT_HP ILIKE a_expr ESCAPE a_expr
                  {
                      FuncCall *n = makeFuncCall(SystemFuncName("like_escape"),
                                                 list_make2($4, $6),
*************** a_expr:        c_expr                                    { $$ = $1; }
*** 11293,11299 ****
                      $$ = (Node *) makeSimpleA_Expr(AEXPR_SIMILAR, "~",
                                                     $1, (Node *) n, @2);
                  }
!             | a_expr NOT SIMILAR TO a_expr            %prec SIMILAR
                  {
                      FuncCall *n = makeFuncCall(SystemFuncName("similar_escape"),
                                                 list_make2($5, makeNullAConst(-1)),
--- 11295,11301 ----
                      $$ = (Node *) makeSimpleA_Expr(AEXPR_SIMILAR, "~",
                                                     $1, (Node *) n, @2);
                  }
!             | a_expr NOT_HP SIMILAR TO a_expr            %prec NOT_HP
                  {
                      FuncCall *n = makeFuncCall(SystemFuncName("similar_escape"),
                                                 list_make2($5, makeNullAConst(-1)),
*************** a_expr:        c_expr                                    { $$ = $1; }
*** 11301,11307 ****
                      $$ = (Node *) makeSimpleA_Expr(AEXPR_SIMILAR, "!~",
                                                     $1, (Node *) n, @2);
                  }
!             | a_expr NOT SIMILAR TO a_expr ESCAPE a_expr
                  {
                      FuncCall *n = makeFuncCall(SystemFuncName("similar_escape"),
                                                 list_make2($5, $7),
--- 11303,11309 ----
                      $$ = (Node *) makeSimpleA_Expr(AEXPR_SIMILAR, "!~",
                                                     $1, (Node *) n, @2);
                  }
!             | a_expr NOT_HP SIMILAR TO a_expr ESCAPE a_expr
                  {
                      FuncCall *n = makeFuncCall(SystemFuncName("similar_escape"),
                                                 list_make2($5, $7),
*************** a_expr:        c_expr                                    { $$ = $1; }
*** 11441,11447 ****
                                                     (Node *) list_make2($4, $6),
                                                     @2);
                  }
!             | a_expr NOT BETWEEN opt_asymmetric b_expr AND b_expr    %prec BETWEEN
                  {
                      $$ = (Node *) makeSimpleA_Expr(AEXPR_NOT_BETWEEN,
                                                     "NOT BETWEEN",
--- 11443,11449 ----
                                                     (Node *) list_make2($4, $6),
                                                     @2);
                  }
!             | a_expr NOT_HP BETWEEN opt_asymmetric b_expr AND b_expr    %prec NOT_HP
                  {
                      $$ = (Node *) makeSimpleA_Expr(AEXPR_NOT_BETWEEN,
                                                     "NOT BETWEEN",
*************** a_expr:        c_expr                                    { $$ = $1; }
*** 11457,11463 ****
                                                     (Node *) list_make2($4, $6),
                                                     @2);
                  }
!             | a_expr NOT BETWEEN SYMMETRIC b_expr AND b_expr        %prec BETWEEN
                  {
                      $$ = (Node *) makeSimpleA_Expr(AEXPR_NOT_BETWEEN_SYM,
                                                     "NOT BETWEEN SYMMETRIC",
--- 11459,11465 ----
                                                     (Node *) list_make2($4, $6),
                                                     @2);
                  }
!             | a_expr NOT_HP BETWEEN SYMMETRIC b_expr AND b_expr        %prec NOT_HP
                  {
                      $$ = (Node *) makeSimpleA_Expr(AEXPR_NOT_BETWEEN_SYM,
                                                     "NOT BETWEEN SYMMETRIC",
*************** a_expr:        c_expr                                    { $$ = $1; }
*** 11485,11491 ****
                          $$ = (Node *) makeSimpleA_Expr(AEXPR_IN, "=", $1, $3, @2);
                      }
                  }
!             | a_expr NOT IN_P in_expr
                  {
                      /* in_expr returns a SubLink or a list of a_exprs */
                      if (IsA($4, SubLink))
--- 11487,11493 ----
                          $$ = (Node *) makeSimpleA_Expr(AEXPR_IN, "=", $1, $3, @2);
                      }
                  }
!             | a_expr NOT_HP IN_P in_expr  %prec NOT_HP
                  {
                      /* in_expr returns a SubLink or a list of a_exprs */
                      if (IsA($4, SubLink))
*************** subquery_Op:
*** 12557,12567 ****
                      { $$ = $3; }
              | LIKE
                      { $$ = list_make1(makeString("~~")); }
!             | NOT LIKE
                      { $$ = list_make1(makeString("!~~")); }
              | ILIKE
                      { $$ = list_make1(makeString("~~*")); }
!             | NOT ILIKE
                      { $$ = list_make1(makeString("!~~*")); }
  /* cannot put SIMILAR TO here, because SIMILAR TO is a hack.
   * the regular expression is preprocessed by a function (similar_escape),
--- 12559,12569 ----
                      { $$ = $3; }
              | LIKE
                      { $$ = list_make1(makeString("~~")); }
!             | NOT_HP LIKE
                      { $$ = list_make1(makeString("!~~")); }
              | ILIKE
                      { $$ = list_make1(makeString("~~*")); }
!             | NOT_HP ILIKE
                      { $$ = list_make1(makeString("!~~*")); }
  /* cannot put SIMILAR TO here, because SIMILAR TO is a hack.
   * the regular expression is preprocessed by a function (similar_escape),
diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c
index db49275..6170c0c 100644
*** a/src/backend/parser/parser.c
--- b/src/backend/parser/parser.c
*************** base_yylex(YYSTYPE *lvalp, YYLTYPE *lloc
*** 101,106 ****
--- 101,136 ----
      /* Do we need to look ahead for a possible multiword token? */
      switch (cur_token)
      {
+         case NOT:
+
+             /*
+              * If we have NOT LIKE etc, change NOT to NOT_HP
+              */
+             cur_yylval = lvalp->core_yystype;
+             cur_yylloc = *llocp;
+             next_token = core_yylex(&(lvalp->core_yystype), llocp, yyscanner);
+             switch (next_token)
+             {
+                 case BETWEEN:
+                 case IN_P:
+                 case LIKE:
+                 case ILIKE:
+                 case SIMILAR:
+                     cur_token = NOT_HP;
+                     break;
+                 default:
+                     break;
+             }
+             /* save the lookahead token for next time */
+             yyextra->lookahead_token = next_token;
+             yyextra->lookahead_yylval = lvalp->core_yystype;
+             yyextra->lookahead_yylloc = *llocp;
+             yyextra->have_lookahead = true;
+             /* and back up the output info to cur_token */
+             lvalp->core_yystype = cur_yylval;
+             *llocp = cur_yylloc;
+             break;
+
          case NULLS_P:

              /*

Re: Precedence of NOT LIKE, NOT BETWEEN, etc

От
Peter Eisentraut
Дата:
On 2/23/15 3:08 PM, Tom Lane wrote:
> I thought of another possibility:
> 
> 3. Leave everything as-is but mark the NOT-operator productions as having
> the precedence of NOT rather than of LIKE etc.  This would change the
> behavior only for the NOT-LIKE-followed-by-< example, and would make the
> two cases for NOT LIKE consistent though they'd remain inconsistent with
> LIKE.  This behavior seems at least somewhat explainable/documentable
> ("NOT-foo operators have the precedence of NOT"), whereas what we have
> seems about impossible to justify.

I don't like this third option.  If we're going to change anything, it
should be changed so that LIKE and NOT LIKE have the same precedence.

I realize that the other options are also ugly.




Re: Precedence of NOT LIKE, NOT BETWEEN, etc

От
Tom Lane
Дата:
Peter Eisentraut <peter_e@gmx.net> writes:
> On 2/23/15 3:08 PM, Tom Lane wrote:
>> I thought of another possibility:
>> 
>> 3. Leave everything as-is but mark the NOT-operator productions as having
>> the precedence of NOT rather than of LIKE etc.  This would change the
>> behavior only for the NOT-LIKE-followed-by-< example, and would make the
>> two cases for NOT LIKE consistent though they'd remain inconsistent with
>> LIKE.  This behavior seems at least somewhat explainable/documentable
>> ("NOT-foo operators have the precedence of NOT"), whereas what we have
>> seems about impossible to justify.

> I don't like this third option.  If we're going to change anything, it
> should be changed so that LIKE and NOT LIKE have the same precedence.

Yeah, I concur.  Working on patch to make that happen via token lookahead.
        regards, tom lane



Re: Precedence of NOT LIKE, NOT BETWEEN, etc

От
Greg Stark
Дата:
On Tue, Feb 24, 2015 at 5:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Also, it strikes me that we could significantly reduce, maybe even fully
> eliminate, the funny behaviors around the existing base_yylex()
> substitutions if we made them use the same idea, ie replace the leading
> token with something special but keep the second token's separate
> identity.  Unless somebody sees a hole in this idea, I'll probably go
> do that and then come back to the precedence issues.

IIRC that's exactly what the earlier patch for this did.

-- 
greg



Re: Precedence of NOT LIKE, NOT BETWEEN, etc

От
Tom Lane
Дата:
Greg Stark <stark@mit.edu> writes:
> On Tue, Feb 24, 2015 at 5:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Also, it strikes me that we could significantly reduce, maybe even fully
>> eliminate, the funny behaviors around the existing base_yylex()
>> substitutions if we made them use the same idea, ie replace the leading
>> token with something special but keep the second token's separate
>> identity.  Unless somebody sees a hole in this idea, I'll probably go
>> do that and then come back to the precedence issues.

> IIRC that's exactly what the earlier patch for this did.

Right, see d809fd0008a2e26de463f47b7aba0365264078f3
        regards, tom lane



Re: Precedence of NOT LIKE, NOT BETWEEN, etc

От
Bruce Momjian
Дата:
On Wed, Apr  8, 2015 at 01:14:38PM -0400, Tom Lane wrote:
> Greg Stark <stark@mit.edu> writes:
> > On Tue, Feb 24, 2015 at 5:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Also, it strikes me that we could significantly reduce, maybe even fully
> >> eliminate, the funny behaviors around the existing base_yylex()
> >> substitutions if we made them use the same idea, ie replace the leading
> >> token with something special but keep the second token's separate
> >> identity.  Unless somebody sees a hole in this idea, I'll probably go
> >> do that and then come back to the precedence issues.
> 
> > IIRC that's exactly what the earlier patch for this did.
> 
> Right, see d809fd0008a2e26de463f47b7aba0365264078f3

Where are we on this?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: Precedence of NOT LIKE, NOT BETWEEN, etc

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> On Wed, Apr  8, 2015 at 01:14:38PM -0400, Tom Lane wrote:
>> Greg Stark <stark@mit.edu> writes:
>>> On Tue, Feb 24, 2015 at 5:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Also, it strikes me that we could significantly reduce, maybe even fully
>>>> eliminate, the funny behaviors around the existing base_yylex()
>>>> substitutions if we made them use the same idea, ie replace the leading
>>>> token with something special but keep the second token's separate
>>>> identity.  Unless somebody sees a hole in this idea, I'll probably go
>>>> do that and then come back to the precedence issues.

>>> IIRC that's exactly what the earlier patch for this did.

>> Right, see d809fd0008a2e26de463f47b7aba0365264078f3

> Where are we on this?

It's done as far as seemed reasonable to push it.
        regards, tom lane