Обсуждение: Summer of Code idea

Поиск
Список
Период
Сортировка

Summer of Code idea

От
"Jesper Pedersen"
Дата:
Hi.

I have been thinking about this for a while and now that Google Summer of Code
is coming I thought I would share this idea.

The GCC people have traded their bison/flex parser with a hand written
recursive-descent parser for a nice speed up.

So it would be interesting to see if PostgreSQL would benefit from the same
'switch'.

By the looks of it *) the job could be completed within the time frame and
maybe pgbench could serve as the testing framework for the performance
measurements.

I think it has an academic angle to it -- something fun for the student and
maybe a speed up for PostgreSQL :)

Just an idea...

Keep up the great work you are doing !

Best regards,Jesper

*) http://gcc.gnu.org/ml/gcc-patches/2004-10/msg01969.html



Re: Summer of Code idea

От
Martijn van Oosterhout
Дата:
On Tue, Apr 25, 2006 at 10:30:26PM +0200, Jesper Pedersen wrote:
> Hi.
>
> I have been thinking about this for a while and now that Google Summer of Code
> is coming I thought I would share this idea.
>
> The GCC people have traded their bison/flex parser with a hand written
> recursive-descent parser for a nice speed up.

Nice? The figures I'm seeing are 2%. Is that even noticable?

> So it would be interesting to see if PostgreSQL would benefit from the same
> 'switch'.
>
> By the looks of it *) the job could be completed within the time frame and
> maybe pgbench could serve as the testing framework for the performance
> measurements.

I think it's worth a try, but you have to consider that unlike C, SQL
as a language keeps changing in ways we have no idea about yet.
Whatever the result is, it has to be more maintainable than what we
have now...

> I think it has an academic angle to it -- something fun for the student and
> maybe a speed up for PostgreSQL :)

Abosolutly. It'd be a fun experiment, if one were so inclined.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Summer of Code idea

От
Alvaro Herrera
Дата:
Jesper Pedersen wrote:

> I have been thinking about this for a while and now that Google Summer of Code
> is coming I thought I would share this idea.
> 
> The GCC people have traded their bison/flex parser with a hand written
> recursive-descent parser for a nice speed up.
> 
> So it would be interesting to see if PostgreSQL would benefit from the same
> 'switch'.

We talked about it when GCC announced their switch.  The conclusion was
that our grammar is still too much a moving target, so it would be too
difficult to mantain such a grammar.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Summer of Code idea

От
"Jonah H. Harris"
Дата:
On 4/27/06, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> We talked about it when GCC announced their switch.  The conclusion was
> that our grammar is still too much a moving target, so it would be too
> difficult to mantain such a grammar.

For the sake of saying again, I already have a recursive-descent
parser for PostgreSQL written in a PCCTS grammar.  It's something I
started writing years ago, but I'd be willing to consider open
sourcing it if the PostgreSQL community will really entertain the
thought of switching.

Unfortunately, this discussion usually ends up with, "why would we
want to change what we have now when it already works?"

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324


Re: Summer of Code idea

От
Stephen Frost
Дата:
* Jonah H. Harris (jonah.harris@gmail.com) wrote:
> Unfortunately, this discussion usually ends up with, "why would we
> want to change what we have now when it already works?"

The answer to that can certainly be "performance" provided other factors
(such as maintainability) don't change much.  If you could show that
then I think such a switch would be very seriously considered.  Of
course, the performance improvment would have to be some substantial
amount (I would guess at least 5%, maybe 10%) to warrent the learning
curve associated with changing it.

If you're not interested or not able to do the performance comparison
then I guess you'd need to find someone who is and then work out if
they'd be willing to do the testing under some NDA in case it doesn't
pan out.
Thanks,
    Stephen

Re: Summer of Code idea

От
Tom Lane
Дата:
"Jonah H. Harris" <jonah.harris@gmail.com> writes:
> Unfortunately, this discussion usually ends up with, "why would we
> want to change what we have now when it already works?"

... and is far more maintainable than an RD parser, and is not a
performance bottleneck.  I've never seen yyparse occupy as much as 2%
of a backend profile ...
        regards, tom lane


Re: Summer of Code idea

От
"Jonah H. Harris"
Дата:
On 4/27/06, Stephen Frost <sfrost@snowman.net> wrote:
> The answer to that can certainly be "performance" provided other factors
> (such as maintainability) don't change much.  If you could show that
> then I think such a switch would be very seriously considered.

IMHO, switching parser-types (and parser generators) is more about
maintainability than performance itself.  SQL is much nicer in
recursive descent where you don't have yacc/bison limitations such as
1 token of lookahead and non-ebnf grammars.  The sort-of odd thing is
that PCCTS (like its much younger brother ANTLR) is intended on
generating ASTs whereas yacc/bison requires you to build the parse
tree manually (as we do).

Don't get me wrong, this was taken into consideration with PCCTS, but
it's not as optimal or beautiful as it would be to have PCCTS itself
generate the parse tree.  Still, it's nicer to maintain than a
yacc/bison grammar.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324


Re: Summer of Code idea

От
"Jonah H. Harris"
Дата:
On 4/27/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ... and is far more maintainable than an RD parser, and is not a
> performance bottleneck.  I've never seen yyparse occupy as much as 2%
> of a backend profile ...

Not more maintainable by any stretch of the imagination.  For example,
try and remove the AS alias for columns in our bison grammar without
having to remove keywords from unreserved_keywords :)  In PCCTS (or a
hand-written RD parser), it's VERY easy because you can control the
lookahead as necessary.

Now, in terms of performance, it's hard to beat a LALR parser, but LL
parsers are comparable if done correctly; especially if written by
hand.  Besides, parsing itself isn't what kills us, it's the lack of
caching unprepared statements.  Yes, this is another topic in and of
itself, but I know there was discussion about it between you and Neil;
did anything ever come of it?

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324


Re: Summer of Code idea

От
Christopher Kings-Lynne
Дата:
> For the sake of saying again, I already have a recursive-descent
> parser for PostgreSQL written in a PCCTS grammar.  It's something I
> started writing years ago, but I'd be willing to consider open
> sourcing it if the PostgreSQL community will really entertain the
> thought of switching.
> 
> Unfortunately, this discussion usually ends up with, "why would we
> want to change what we have now when it already works?"

Is it faster?  How much faster?




Re: Summer of Code idea

От
"Jonah H. Harris"
Дата:
On 4/27/06, Christopher Kings-Lynne <chris.kings-lynne@calorieking.com> wrote:
> Is it faster?  How much faster?

I'm not sure, I haven't done direct timings on it vs. the bison
version.  When I wrote it, I wasn't really concerned with the time it
took to parse.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324


Re: Summer of Code idea

От
Alvaro Herrera
Дата:
Jonah H. Harris wrote:
> On 4/27/06, Christopher Kings-Lynne <chris.kings-lynne@calorieking.com> wrote:
> > Is it faster?  How much faster?
> 
> I'm not sure, I haven't done direct timings on it vs. the bison
> version.  When I wrote it, I wasn't really concerned with the time it
> took to parse.

Is the source easier to maintain?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Summer of Code idea

От
"Jonah H. Harris"
Дата:
On 4/27/06, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Is the source easier to maintain?

Yes, aside from extra lookahead, that was my main motivation.


--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324