RE: [HACKERS] Postgres' lexer
From | Ansley, Michael |
---|---|
Subject | RE: [HACKERS] Postgres' lexer |
Date | |
Msg-id | 1BF7C7482189D211B03F00805F8527F748C02C@S-NATH-EXCH2 |
List | pgsql-hackers |
>> > To my mind, without spaces this construction *is* ambiguous, and frankly
>> > I'd have expected the second interpretation ('+-' is a single operator
>> > name). Almost every computer language in the world uses "greedy"
>> > tokenization where the next token is the longest series of characters
>> > that can validly be a token. I don't regard the above behavior as
>> > predictable, natural, nor obvious. In fact, I'd say it's a bug that
>> > "3+-2" and "3+-x" are not lexed in the same way.
>> >
>>
>> Completely agree with that. This differentiating behavior looks like a bug.
>>
>> > However, aside from arguing about whether the current behavior is good
>> > or bad, these examples seem to indicate that it doesn't take an infinite
>> > amount of lookahead to reproduce the behavior. It looks to me like we
>> > could preserve the current behavior by parsing a '-' as a separate token
>> > if it *immediately* precedes a digit, and otherwise allowing it to be
>> > folded into the preceding operator. That could presumably be done
>> > without VLTC.
>>
>> Ok. If we *have* to preserve old weird behavior, here is the patch.
>> It is to be applied over all my other patches. Though if I were to
>> decide whether to restore old behavior, I wouldn't do it. Because it
>> is inconsistency in grammar, i.e. a bug.
>>

If a construct is ambiguous, then the behaviour should be undefined (i.e.: we can do what we like, within reason). If the user wants something predictable, then she should use brackets ;-)

If 3+-2 presents an ambiguity (which it does) then make sure that you do this: 3+(-2). If you have an operator +- then you should do this (3)+-(2). However, if you have 3+-2 without brackets, then, because this is ambiguous (assuming no +- operator), this is undefined, and we can do pretty much whatever we feel like with it. Unless there is an operator +- defined, because then the behaviour is no longer ambiguous.
The longest possible identifier is always matched, and this means that the +- will be identified. Especially with the unary minus, my feeling is that it should be placed in brackets if correct behaviour is desired.

MikeA
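
To make the two lexing rules discussed in this thread concrete, here is a minimal sketch in Python. It is not the Postgres lexer and not the patch mentioned above. The function name `tokenize`, the flag `split_minus_before_digit`, and the operator alphabet `OP_CHARS` are all assumptions made for illustration. With the flag off, operators are matched greedily (longest run wins). With the flag on, a trailing '-' is handed back as a separate token when it immediately precedes a digit, which needs only one character of lookahead.

```python
# Illustrative only: a toy lexer, not the actual Postgres scanner.
OP_CHARS = set("+-*/<>=~!@#%^&|?")  # assumed operator alphabet


def tokenize(text, split_minus_before_digit=False):
    """Split text into integer, identifier, operator and bracket tokens.

    Operators are matched greedily: the longest run of operator
    characters becomes one token. If split_minus_before_digit is set,
    a multi-character run ending in '-' that is immediately followed
    by a digit gives that '-' back, so it is lexed as its own token.
    """
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < n and text[j].isdigit():
                j += 1
            tokens.append(text[i:j])
            i = j
        elif ch.isalpha() or ch == "_":
            j = i
            while j < n and (text[j].isalnum() or text[j] == "_"):
                j += 1
            tokens.append(text[i:j])
            i = j
        elif ch in OP_CHARS:
            j = i
            while j < n and text[j] in OP_CHARS:
                j += 1
            # One character of lookahead past the operator run.
            if (split_minus_before_digit and j - i > 1
                    and text[j - 1] == "-"
                    and j < n and text[j].isdigit()):
                j -= 1  # hand the trailing '-' back to the input
            tokens.append(text[i:j])
            i = j
        elif ch in "()":
            tokens.append(ch)
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens


if __name__ == "__main__":
    for expr in ("3+-2", "3+-x", "3+(-2)", "(3)+-(2)"):
        print(expr, tokenize(expr), tokenize(expr, split_minus_before_digit=True))
```

Under pure greedy matching, "3+-2" and "3+-x" are lexed the same way, with '+-' as one operator. With the digit rule switched on, only "3+-2" changes. The bracketed forms 3+(-2) and (3)+-(2) come out the same under both rules, which is why brackets remove the ambiguity.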