Re: benchmarking Flex practices

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: benchmarking Flex practices
Дата
Msg-id 16941.1562106939@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: benchmarking Flex practices  (John Naylor <john.naylor@2ndquadrant.com>)
Ответы Re: benchmarking Flex practices  (John Naylor <john.naylor@2ndquadrant.com>)
Re: benchmarking Flex practices  (John Naylor <john.naylor@2ndquadrant.com>)
Список pgsql-hackers
John Naylor <john.naylor@2ndquadrant.com> writes:
> 0001 is a small patch to remove some unneeded generality from the
> current rules. This lowers the number of elements in the yy_transition
> array from 37045 to 36201.

I don't particularly like 0001.  The two bits like this

-whitespace        ({space}+|{comment})
+whitespace        ({space}|{comment})

seem likely to create performance problems for runs of whitespace, in that
the lexer will now have to execute the associated action once per space
character not just once for the whole run.  Those actions are empty, but
I don't think flex optimizes for that, and it's really flex's per-action
overhead that I'm worried about.  Note the comment in the "Performance"
section of the flex manual:

    Another area where the user can increase a scanner's performance (and
    one that's easier to implement) arises from the fact that the longer
    the tokens matched, the faster the scanner will run.  This is because
    with long tokens the processing of most input characters takes place
    in the (short) inner scanning loop, and does not often have to go
    through the additional work of setting up the scanning environment
    (e.g., `yytext') for the action.

There are a bunch of higher-order productions that use "{whitespace}*",
which is surely a bit redundant given the contents of {whitespace}.
But maybe we could address that by replacing "{whitespace}*" with
"{opt_whitespace}" defined as

opt_whitespace        ({space}*|{comment})

Not sure what impact if any that'd have on table size, but I'm quite sure
that {whitespace} was defined with an eye to avoiding unnecessary
lexer action cycles.

As for the other two bits that are like

-<xe>.            {
-                    /* This is only needed for \ just before EOF */
+<xe>\\            {

my recollection is that those productions are defined that way to avoid a
flex warning about not all possible input characters being accounted for
in the <xe> (resp. <xdolq>) state.  Maybe that warning is
flex-version-dependent, or maybe this was just a worry and not something
that actually produced a warning ... but I'm hesitant to change it.
If we ever did get to flex's default action, that action is to echo the
current input character to stdout, which would be Very Bad.

As far as I can see, the point of 0002 is to have just one set of
flex rules for the various variants of quotecontinue processing.
That sounds OK, though I'm a bit surprised it makes this much difference
in the table size. I would suggest that "state_before" needs a less
generic name (maybe "state_before_xqs"?) and more than no comment.
Possibly more to the point, it's not okay to have static state variables
in the core scanner, so that variable needs to be kept in yyextra.
(Don't remember offhand whether it's any more acceptable in the other
scanners.)

            regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: "long" type is not appropriate for counting tuples
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: Fix two issues after moving to unified logging system forcommand-line utils