Discussion: Lexer patch question
I am confused why the following change Tom made to scan.l works.
Isn't that 'x' required so xqescape doesn't match '\x'?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Index: scan.l
===================================================================
RCS file: /cvsroot/pgsql/src/backend/parser/scan.l,v
retrieving revision 1.123
retrieving revision 1.124
diff -c -c -r1.123 -r1.124
*** scan.l 2 Jun 2005 01:23:08 -0000 1.123
--- scan.l 2 Jun 2005 17:45:17 -0000 1.124
***************
*** 193,199 ****
  xqstart {quote}
  xqdouble {quote}{quote}
  xqinside [^\\']+
! xqescape [\\][^0-7x]
  xqoctesc [\\][0-7]{1,3}
  xqhexesc [\\]x[0-9A-Fa-f]{1,2}
--- 193,199 ----
  xqstart {quote}
  xqdouble {quote}{quote}
  xqinside [^\\']+
! xqescape [\\][^0-7]
  xqoctesc [\\][0-7]{1,3}
  xqhexesc [\\]x[0-9A-Fa-f]{1,2}
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I am confused why the following change Tom made to scan.l works.
> Isn't that 'x' required so xqescape doesn't match '\x'?
> *** scan.l 2 Jun 2005 01:23:08 -0000 1.123
> --- scan.l 2 Jun 2005 17:45:17 -0000 1.124
> ***************
> *** 193,199 ****
> xqstart {quote}
> xqdouble {quote}{quote}
> xqinside [^\\']+
> ! xqescape [\\][^0-7x]
> xqoctesc [\\][0-7]{1,3}
> xqhexesc [\\]x[0-9A-Fa-f]{1,2}
> --- 193,199 ----
> xqstart {quote}
> xqdouble {quote}{quote}
> xqinside [^\\']+
> ! xqescape [\\][^0-7]
> xqoctesc [\\][0-7]{1,3}
> xqhexesc [\\]x[0-9A-Fa-f]{1,2}
No; if a match to xqhexesc is possible, the lexer will prefer that match
because it is longer. If a match to xqhexesc is not possible --- that
is, we have \x not followed by a hex digit --- then we *want* xqescape
to match. The original coding forced a backup to the <xq>. rule in this
situation, which is not how we want it to behave.
regards, tom lane
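
[Illustration] The longest-match argument above can be sketched outside flex. The snippet below is only a rough simulation in Python's `re` module, not the generated scanner: the rule table copies the four definitions from the diff, while `xqany` (standing in for the `<xq>.` rule), `make_rules`, `next_token` and `tokenize` are hypothetical names introduced here. The quote-handling rules of the real scan.l are left out, so the input is treated as text already inside a quoted string.

```python
import re

def make_rules(escape_pattern):
    """Build an ordered rule list; only the xqescape pattern varies."""
    return [
        ("xqinside", re.compile(r"[^\\']+")),
        ("xqescape", re.compile(escape_pattern)),
        ("xqoctesc", re.compile(r"[\\][0-7]{1,3}")),
        ("xqhexesc", re.compile(r"[\\]x[0-9A-Fa-f]{1,2}")),
        # Assumption: a one-character catch-all standing in for <xq>.
        ("xqany", re.compile(r".")),
    ]

OLD_RULES = make_rules(r"[\\][^0-7x]")  # revision 1.123
NEW_RULES = make_rules(r"[\\][^0-7]")   # revision 1.124

def next_token(rules, text, pos):
    """Pick the rule with the longest match at pos; earlier rules win ties."""
    best_name, best_len = None, 0
    for name, pattern in rules:
        m = pattern.match(text, pos)
        # Strict '>' means a later rule never displaces an equal-length match.
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    return best_name, text[pos:pos + best_len]

def tokenize(rules, text):
    """Split text into (rule name, lexeme) pairs using next_token."""
    pos, tokens = 0, []
    while pos < len(text):
        name, lexeme = next_token(rules, text, pos)
        if name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens

if __name__ == "__main__":
    for sample in (r"\x41", r"\xzz"):
        print(sample, "old:", tokenize(OLD_RULES, sample))
        print(sample, "new:", tokenize(NEW_RULES, sample))
```

Under these assumptions, `\x41` comes out as a single xqhexesc token with either rule set, because the four-character match beats the two-character xqescape match. For `\xzz`, the new rules yield xqescape for `\x` followed by xqinside, while the old rules fall through to the one-character catch-all for the backslash.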
Tom Lane wrote:
> No; if a match to xqhexesc is possible, the lexer will prefer that match
> because it is longer. If a match to xqhexesc is not possible --- that
> is, we have \x not followed by a hex digit --- then we *want* xqescape
> to match. The original coding forced a backup to the <xq>. rule in this
> situation, which is not how we want it to behave.
Oh, I didn't realize lexers would choose the longer token when given
multiple options.
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Oh, I didn't realize lexers would choose the longer token when given
> multiple options.
See lines 95-100 in scan.l:
* OK, here is a short description of lex/flex rules behavior.
* The longest pattern which matches an input string is always chosen.
* For equal-length patterns, the first occurring in the rules list is chosen.
* INITIAL is the starting state, to which all non-conditional rules apply.
* Exclusive states change parsing rules while the state is active. When in
* an exclusive state, only those rules defined for that state apply.
regards, tom lane
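
[Illustration] The rules quoted from the scan.l comment (longest match, first rule on ties, exclusive states) can be tried out with a toy scanner. This is a sketch in Python's `re` module, not flex itself. The rule table is hypothetical and far smaller than scan.l: the `keyword`, `identifier`, `space` and `xqstop` rules and the `scan` function are invented for the example, and only `xqstart` and `xqinside` borrow names from the patch.

```python
import re

# Hypothetical rule tables keyed by start state: (name, pattern, next state).
# A next state of None means "stay in the current state".
RULES = {
    "INITIAL": [
        ("keyword", re.compile(r"select|from"), None),
        ("identifier", re.compile(r"[a-z_]+"), None),
        ("space", re.compile(r"[ ]+"), None),
        ("xqstart", re.compile(r"'"), "xq"),      # enter the exclusive state
    ],
    "xq": [
        ("xqinside", re.compile(r"[^\\']+"), None),
        ("xqstop", re.compile(r"'"), "INITIAL"),  # leave the exclusive state
    ],
}

def scan(text):
    """Return (state, rule name, lexeme) triples for text."""
    state, pos, tokens = "INITIAL", 0, []
    while pos < len(text):
        best = None
        # Only the rules of the active state are candidates (exclusive state).
        for name, pattern, next_state in RULES[state]:
            m = pattern.match(text, pos)
            # Longest match wins; strict '>' keeps the earlier rule on ties.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m, next_state)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos} in state {state}")
        name, m, next_state = best
        tokens.append((state, name, m.group()))
        pos = m.end()
        if next_state is not None:
            state = next_state
    return tokens

if __name__ == "__main__":
    for token in scan("select selected 'from'"):
        print(token)
```

In this sketch, `select` matches both keyword and identifier at the same length, so the earlier keyword rule is chosen. `selected` goes to identifier because that match is longer. The `from` inside the quotes is scanned as xqinside, since the keyword rule is not active in the xq state.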