Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]
Дата
Msg-id CA+TgmoZq6SiDYgFrcSuKez_nCcW8V0FcN2_TbiymATfc1e--7Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Sun, Jun 23, 2013 at 10:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Fetter <david@fetter.org> writes:
>> On Sun, Jun 23, 2013 at 07:44:26AM -0700, Kevin Grittner wrote:
>>> I think it is OK if that gets a syntax error.  If that's the "worst
>>> case" I like this approach.
>
>> I think reducing the usefulness of error messages is something we need
>> to think extremely hard about before we do.  Is there maybe a way to
>> keep the error messages even if by some magic we manage to unreserve
>> the words?
>
> Of the alternatives discussed so far, I don't really like anything
> better than adding another special case to base_yylex().  Robert opined
> in the other thread about RESPECT NULLS that the potential failure cases
> with that approach are harder to reason about, which is true, but that
> doesn't mean that we should accept failures we know of because there
> might be failures we don't know of.

Sure, that's true; but the proposal on the other thread is just to
disallow invalid syntax early enough that it benefits the parser.  The
error message is different, but I don't think it's a BAD error
message.

> One thing that struck me while thinking about this is that it seems
> like we've been going about it wrong in base_yylex() in any case.
> For example, because we fold WITH followed by TIME into a special token
> WITH_TIME, we will fail on a statement beginning
>
>         WITH time AS ...
>
> even though "time" is a legal ColId.  But suppose that instead of
> merging the two tokens into one, we just changed the first token into
> something special; that is, base_yylex would return a special token
> WITH_FOLLOWED_BY_TIME and then TIME.  We could then fix the above
> problem by allowing either WITH or WITH_FOLLOWED_BY_TIME as the leading
> keyword of a statement; and similarly for the few other places where
> WITH can be followed by an arbitrary identifier.
>
> Going on the same principle, we could probably let FILTER be an
> unreserved keyword while FILTER_FOLLOWED_BY_PAREN could be a
> type_func_name_keyword.  (I've not tried this though.)

I think this whole direction is going to collapse under its own weight
VERY quickly.  The problems you're describing are essentially
shift/reduce conflicts that are invisible because they're hidden
behind lexer magic.  Part of the value of using a parser generator is
that it TELLS you when you've added ambiguous syntax.  But it doesn't
know about lexer hacks, so stuff will just silently break.  I think
this type of lexer hacks works reasonably well keyword-like things
that are used in just one place in the grammar.  As soon as you get up
to two, the wheels come off - as with RESPECT NULLS vs. NULLS FIRST.

> This idea doesn't help much for OVER because one of the alternatives for
> over_clause is "OVER ColId", and I doubt we want to have base_yylex know
> all the alternatives for ColId.  I also had no great success with the
> NULLS FIRST/LAST case: AFAICT the substitute token for NULLS still has
> to be fully reserved, meaning that something like "select nulls last"
> still doesn't work without quoting.  We could maybe fix that with enough
> denormalization of the index_elem productions, but it'd be ugly.

I don't think that particular example is very compelling - there's a
general rule that column aliases can't be keywords of any type.
That's not wonderful, and EnterpriseDB has had bug reports filed about
it, but the real-world impact is pretty minimal, certainly compared to
what we used to do which is not allow column aliases AT ALL.

> It'd sure be interesting to know what the SQL committee's target parsing
> algorithm is.  I find it hard to believe they're uninformed enough to
> not know that these random syntaxes they keep inventing are hard to deal
> with in LALR(1).  Or maybe they really don't give a damn about breaking
> applications every time they invent a new reserved word?

Does the SQL committee contemplate that SELECT * FROM somefunc()
filter (id, val) should act as a table alias and that SELECT * FROM
somefunc() filter (where x > 1) is an aggregate filter?  This all gets
much easier to understand if one of those constructs isn't allowed in
that particular context.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: Hash partitioning.
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Hash partitioning.