Обсуждение: first cut at PL/PgSQL table functions

Поиск
Список
Период
Сортировка

first cut at PL/PgSQL table functions

От
Neil Conway
Дата:
The attached patch is my first attempt at implementing table functions
for PL/PgSQL. The approach I used is (apparently) the method suggested
by Tom:

        - nodeFunctionscan.c special-cases set-returning PL/PgSQL
          functions. Rather than calling the SRF repeatedly until all
          the tuples have been produced, the SRF is called once and
          is expected to return a pointer to a tuple store containing
          the result set.

        - inside PL/PgSQL, the application developer can add another
          tuple to the result set by using RETURN NEXT. PL/PgSQL takes
          care of creating a tuple store, etc etc.

Still remaining to be done:

        - memory allocation: the tuple store needs to be accessible
          after the PL/PgSQL function returns, so it must be allocated
          in a sufficiently long-lived memory context (e.g. SPI_palloc
          won't work). I chose to just piggy back on
          TopTransactionContext, but there's likely a better way to do
          this...

        - syntax: the current patch uses 'NEXT', not 'RETURN
          NEXT'. That's because the latter syntax causes the parsing a
          regular RETURN statement to go haywire, for some reason I
          haven't yet determined. Any ideas on how to solve this
          (and/or use alternative syntax) would be welcome.

        - returning data: currently, the patch only allows you to call
          NEXT to return a tuple stored in a 'record' variable. This
          should be expanded to allow for set-returning scalar
          functions, as well as tuples stored in row-type variables,
          and perhaps even 'NEXT' followed by an SQL query. I'll be
          improving this shortly.

        - regression tests & documentation: this will be coming as
          soon as I've whipped the patch into better shape.

Here's an example set-returning function in PL/PgSQL:

create or replace function test_func() returns setof pg_class AS '
DECLARE
    r RECORD;
BEGIN
    FOR r in SELECT * FROM pg_class LOOP
        NEXT r;
    END LOOP;
    RETURN;
END;' language 'plpgsql';

Cheers,

Neil

--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC

Вложения

Re: first cut at PL/PgSQL table functions

От
Neil Conway
Дата:
Neil Conway <neilc@samurai.com> writes:
> The attached patch is my first attempt at implementing table functions
> for PL/PgSQL.

Oh, one more thing: the patch also includes a minor cleanup to the
FunctionMode enum, which I discussed with Joe Conway via private
email. The "use a tuple store provided by the function" mechanism is a
bit of a hack right now, but Joe and I discussed making it more
general (adding another FunctionMode for it, and allowing it to be
specified when executing CREATE FUNCTION). That will wait for 7.4,
however.

I also removed some spi.c functions that were just wrappers over the
appropriate "regular" postgres functions (e.g. SPI_pfree), and
replaced them with macros. IMHO that communicates the intent more
clearly.

Cheers,

Neil

--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC

Re: first cut at PL/PgSQL table functions

От
Tom Lane
Дата:
Neil Conway <neilc@samurai.com> writes:

[ new improved address, eh? ]

> The attached patch is my first attempt at implementing table functions
> for PL/PgSQL. The approach I used is (apparently) the method suggested
> by Tom:

>         - nodeFunctionscan.c special-cases set-returning PL/PgSQL
>           functions.

<<itch>> No, that wasn't something I'd ever have suggested.  The
nodeFunctionscan.c code should let the called function *tell* it which
return convention the function wants to use.  We don't want to prevent,
say, dblink.c from using this feature once it's in.

The cleanest way to handle this would be to add an additional field to
the ReturnSetInfo struct, into which the callee could store an enum
value indicating his desired convention.  (We'd initialize the field
to the default mode before call, so as to provide backwards
compatibility for existing SRFs that don't know to do this.)

This will need a little work since the ReturnSetInfo struct is presently
a few layers down in execQual.c, and thus not easily manipulated by
nodeFunctionscan.c.  I'm not sure whether we should modify execQual.c
to provide a more general API, or change things so that
nodeFunctionscan.c executes the function call directly (using execQual.c
only to evaluate the function parameters).  Comments welcome.

>         - memory allocation: the tuple store needs to be accessible
>           after the PL/PgSQL function returns, so it must be allocated
>           in a sufficiently long-lived memory context (e.g. SPI_palloc
>           won't work). I chose to just piggy back on
>           TopTransactionContext, but there's likely a better way to do
>           this...

It would probably be appropriate for ReturnSetInfo to include a field
showing which context to create the returned tuplestore in.  Or it might
do to just have the function return it in the calling context, in which
case SPI_palloc *would* work.  I need to look at the memory management
in nodeFunctionscan.c and see whether it needs revisions...

>         - syntax: the current patch uses 'NEXT', not 'RETURN
>           NEXT'. That's because the latter syntax causes the parsing a
>           regular RETURN statement to go haywire, for some reason I
>           haven't yet determined. Any ideas on how to solve this
>           (and/or use alternative syntax) would be welcome.

The plpgsql parser is pretty grotty, but I imagine we can fix this.
Will look.  In the meantime I agree you shouldn't let this block
progress on other issues.


Now that I've finally dug out from under Rod Taylor's patches, I'm
hoping to start reviewing other stuff that's in my to-look-at queue,
including all the SRF code.  Should be able to give you a hand soon.

            regards, tom lane

Re: first cut at PL/PgSQL table functions

От
Neil Conway
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Neil Conway <neilc@samurai.com> writes:
>
> [ new improved address, eh? ]

heh, yeah :-) FYI, my old address will work for the next two weeks or
so, at which it will probably bounce...

> >         - nodeFunctionscan.c special-cases set-returning PL/PgSQL
> >           functions.
>
> <<itch>> No, that wasn't something I'd ever have suggested.  The
> nodeFunctionscan.c code should let the called function *tell* it
> which return convention the function wants to use.  We don't want to
> prevent, say, dblink.c from using this feature once it's in.

Yeah, I agree that special-casing PL/PgSQL is kludgy. Did you read the
addendum I posted to my original mail on -patches? Joe Conway and I
were discussing how to improve this mechanism -- the idea Joe
suggested was to add a new FunctionMode for "this function provides a
tuple store", and then add the necessary grammar smarts to make the
function mode a property of CREATE FUNCTION. Let me know if I haven't
explained Joe's idea in sufficient depth.

Any comment on the scheme we had been thinking of using, as opposed to
the design you suggested?

Also, we had intended that this could wait for 7.4, given how little
time there is remaining before the beta.

> >         - memory allocation

[...]

> It would probably be appropriate for ReturnSetInfo to include a
> field showing which context to create the returned tuplestore in.

Ok, that might work.

> Or it might do to just have the function return it in the calling
> context, in which case SPI_palloc *would* work.

I was originally allocating the tuple store in the same context used
by SPI_palloc(), but the backend was segfaulting after the PL/PgSQL
function returned (IIRC it was during the second call to
FunctionNext(), but my memory's a bit fuzzy). I haven't looked into
this a lot -- if you think SPI_palloc() is the right way to go, let me
know and I'll try to track down the problem.

Cheers,

Neil

--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC

Re: first cut at PL/PgSQL table functions

От
Tom Lane
Дата:
Neil Conway <neilc@samurai.com> writes:
> Yeah, I agree that special-casing PL/PgSQL is kludgy. Did you read the
> addendum I posted to my original mail on -patches? Joe Conway and I
> were discussing how to improve this mechanism -- the idea Joe
> suggested was to add a new FunctionMode for "this function provides a
> tuple store", and then add the necessary grammar smarts to make the
> function mode a property of CREATE FUNCTION. Let me know if I haven't
> explained Joe's idea in sufficient depth.

I see no reason to push it into the grammar.  Any one SRF is unlikely to
support more than one return mode (eg, plpgsql SRFs won't) so having the
function tell the appropriate mode on first call would be sufficient.
Also, if we push it into the grammar how do we get the user to get it
right?

> Also, we had intended that this could wait for 7.4, given how little
> time there is remaining before the beta.

No, this is something that has to be gotten right the first time, or
there are going to be compatibility constraints in the way of improving
it.

> I was originally allocating the tuple store in the same context used
> by SPI_palloc(), but the backend was segfaulting after the PL/PgSQL
> function returned (IIRC it was during the second call to
> FunctionNext(), but my memory's a bit fuzzy). I haven't looked into
> this a lot -- if you think SPI_palloc() is the right way to go, let me
> know and I'll try to track down the problem.

It's quite likely that that won't work at present, depending on what
sort of context-flinging is done in nodeFunctionscan.  I need to review
that code and see if I think the context handling is good or not.
Will try to look today.

            regards, tom lane

Re: first cut at PL/PgSQL table functions

От
Jan Wieck
Дата:
Neil Conway wrote:
>
> The attached patch is my first attempt at implementing table functions
> for PL/PgSQL.
> [...]
>
>         - inside PL/PgSQL, the application developer can add another
>           tuple to the result set by using RETURN NEXT. PL/PgSQL takes
>           care of creating a tuple store, etc etc.

If memory serves, the syntax another major database is using is

    RETURN ... AND RESUME;

And you're right, the parser is a nightmare. Sorry for that.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

Re: first cut at PL/PgSQL table functions

От
Jan Wieck
Дата:
Tom Lane wrote:
>
> Neil Conway <neilc@samurai.com> writes:
> > Yeah, I agree that special-casing PL/PgSQL is kludgy. Did you read the
> > addendum I posted to my original mail on -patches? Joe Conway and I
> > were discussing how to improve this mechanism -- the idea Joe
> > suggested was to add a new FunctionMode for "this function provides a
> > tuple store", and then add the necessary grammar smarts to make the
> > function mode a property of CREATE FUNCTION. Let me know if I haven't
> > explained Joe's idea in sufficient depth.
>
> I see no reason to push it into the grammar.  Any one SRF is unlikely to
> support more than one return mode (eg, plpgsql SRFs won't) so having the
> function tell the appropriate mode on first call would be sufficient.
> Also, if we push it into the grammar how do we get the user to get it
> right?

I object too and object again.

A PL/pgSQL function might (in the future) want to return a refcursor on
one call, but use RETURN ... AND RESUME on another. So this has to be
done for every SET.

My original idea was to make a tuplestore part of the cursor (Portal
structure). This way the tuplestore access would be hidden behind the
fetching and the caller doesn't have to care what the function really
returns. Also it'd avoid the memory context problem, because the
tuplestore would be part of the Portal memory context and go away when
the cursor is closed.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

Re: first cut at PL/PgSQL table functions

От
Joe Conway
Дата:
Tom Lane wrote:
> Neil Conway <neilc@samurai.com> writes:
>
>>Yeah, I agree that special-casing PL/PgSQL is kludgy. Did you read the
>>addendum I posted to my original mail on -patches? Joe Conway and I
>>were discussing how to improve this mechanism -- the idea Joe
>>suggested was to add a new FunctionMode for "this function provides a
>>tuple store", and then add the necessary grammar smarts to make the
>>function mode a property of CREATE FUNCTION. Let me know if I haven't
>>explained Joe's idea in sufficient depth.
>
> I see no reason to push it into the grammar.  Any one SRF is unlikely to
> support more than one return mode (eg, plpgsql SRFs won't) so having the
> function tell the appropriate mode on first call would be sufficient.
> Also, if we push it into the grammar how do we get the user to get it
> right?

Well one reason is that in the future we might want to be able to
specify a *sorted* FunctionMode and make use of that in the optimizer.
But I suppose the concept of "sorted" is orthogonal to the "mode" (i.e.
we can send sorted results either by steaming or as a tuplestore batch).

So how do we pass the mode info from the function back up the line to
FunctionNext? Would it make sense to add a member to ExprContext? It
gets passed in to the function as part of ReturnSetInfo.


> It's quite likely that that won't work at present, depending on what
> sort of context-flinging is done in nodeFunctionscan.  I need to review
> that code and see if I think the context handling is good or not.
> Will try to look today.

Does a good primer on proper backend memory-context handling exist? If
so I'd love a cheat-sheet version that I could laminate ;-)

Joe


Re: first cut at PL/PgSQL table functions

От
Tom Lane
Дата:
Joe Conway <mail@joeconway.com> writes:
> So how do we pass the mode info from the function back up the line to
> FunctionNext? Would it make sense to add a member to ExprContext?

Probably not, because the same ExprContext would be used for evaluation
of the function's arguments.  If you have
    select ... from foo(bar(22))
bar is not being called in a context that should accept a set result,
I think.  Possibly this example says that we'd better change the way
nodeFunctionscan.c uses execQual.c --- instead of letting
ExecEvalExpression take the whole expression, just use it to evaluate
the function arguments, and then call the function directly from
nodeFunctionscan.c.  (I have a vague recollection that you did it that
way originally and I encouraged you to use ExecEvalExpression instead;
looks like I was wrong...)

> Does a good primer on proper backend memory-context handling exist?

The original design document is in src/backend/utils/mmgr/README;
somebody needs to recast that into present tense and put it into the
Developer's Guide SGML docs.

If you read that and feel you understand it, next read
executor/nodeAgg.c and see if you follow the memory management there...
AFAIR that's the most complex use of short-term contexts in the system.

            regards, tom lane

Re: first cut at PL/PgSQL table functions

От
Joe Conway
Дата:
Jan Wieck wrote:
> A PL/pgSQL function might (in the future) want to return a refcursor on
> one call, but use RETURN ... AND RESUME on another. So this has to be
> done for every SET.

How? They are going to be two different datatypes.


> My original idea was to make a tuplestore part of the cursor (Portal
> structure). This way the tuplestore access would be hidden behind the
> fetching and the caller doesn't have to care what the function really
> returns. Also it'd avoid the memory context problem, because the
> tuplestore would be part of the Portal memory context and go away when
> the cursor is closed.

I don't understand your concern. In the current implementation, table
functions really have very little to do with portals. That was why
pretty early on (7 May) Tom asked me to "s/portal/function/ throughout
the patch".

In any case, what Neil has proposed does hide the tuplestore behind the
fetching. The user only declares the tuple return type like they would
have to anyway. If you're referring to the functionmode being added to
the grammar, I think Tom has talked us out of that already ;-)

Joe


Re: first cut at PL/PgSQL table functions

От
Jan Wieck
Дата:
Joe Conway wrote:
>
> Jan Wieck wrote:
> > A PL/pgSQL function might (in the future) want to return a refcursor on
> > one call, but use RETURN ... AND RESUME on another. So this has to be
> > done for every SET.
>
> How? They are going to be two different datatypes.

It might be that the function declaration once says that the function
returns a refcursor, another time that it returns SETOF something.
That's no good reason for me that a function declared SETOF something
cannot internally open a cursor and return that, as long as the cursors
result structure matches the return type. Also why shouldn't a function
returning a refcursor internally say RETURN ... AND RESUME?

It is an efficiency question when dealing with bigger result sets. A
function that returns a tuplestore has to materialize the entire result
set allways before it gets returned. A cursor as of today is an executor
on hold, that if it's a simple scan or join generates the requested
result tuples on the fly (it might materialize them as well because of a
final sort step).

> > My original idea was to make a tuplestore part of the cursor (Portal
> > structure). This way the tuplestore access would be hidden behind the
> > fetching and the caller doesn't have to care what the function really
> > returns. Also it'd avoid the memory context problem, because the
> > tuplestore would be part of the Portal memory context and go away when
> > the cursor is closed.
>
> I don't understand your concern. In the current implementation, table
> functions really have very little to do with portals. That was why
> pretty early on (7 May) Tom asked me to "s/portal/function/ throughout
> the patch".

And who cares about the current implementation? Joe, if "the current
implementation" would ever become a good point, I will consider leaving
the project because new features aren't welcome any more thereafter. It
is an excuse why things aren't implemented yet. But it is no good
argument to reject ideas.

>
> In any case, what Neil has proposed does hide the tuplestore behind the
> fetching. The user only declares the tuple return type like they would
> have to anyway. If you're referring to the functionmode being added to
> the grammar, I think Tom has talked us out of that already ;-)

Maybe Tom can talk some people into or out of something, but I allways
try to have my own opinion. That it is often similar to Tom's doesn't
surprise me, because he is a very gifted and talented engineer ;-)

I meant that the detail if one particular result set is a tuplestore,
one tuple at a time or a cursor should be expected to change from one
invocation (initial invocation, not repeated for fetching single tuples
for one result set) to another. If that should be allowed, and it should
be, then the type of result has to be part of the functions return.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

Re: first cut at PL/PgSQL table functions

От
Joe Conway
Дата:
<Whew -- bad day to be away from my email for several hours!>

Jan Wieck wrote:
> It might be that the function declaration once says that the function
> returns a refcursor, another time that it returns SETOF something.
> That's no good reason for me that a function declared SETOF something
> cannot internally open a cursor and return that, as long as the cursors
> result structure matches the return type. Also why shouldn't a function
> returning a refcursor internally say RETURN ... AND RESUME?

Hmmm. Are you proposing that a plpgsql function which returns SETOF
might return  a cursor *or* a tuplestore, or a cursor *in place of* a
tuplestore? I agree that in the long term (i.e. after 7.3 is released)
we should support passing cursors, but I don't see it getting done in
time to support a 7.3 release. But we ought not preclude passing cursors
in the future.


> It is an efficiency question when dealing with bigger result sets. A
> function that returns a tuplestore has to materialize the entire result
> set allways before it gets returned. A cursor as of today is an executor
> on hold, that if it's a simple scan or join generates the requested
> result tuples on the fly (it might materialize them as well because of a
> final sort step).

Makes sense. We've discussed before three types of modes that we should
support:

1. materialized - that's the tuplestore method in place now.
2. stream - send everything straight on through (the big O calls this
    PIPE I believe). I previously proposed targeting this for post 7.3.
3. query - this one I'd forgotten about, but the idea was to use
    cursor support to allow start/stop/fetch etc, as you have described.
    I think this should also be post 7.3

> And who cares about the current implementation? Joe, if "the current
> implementation" would ever become a good point, I will consider leaving
> the project because new features aren't welcome any more thereafter. It
> is an excuse why things aren't implemented yet. But it is no good
> argument to reject ideas.

I didn't mean it as an argument that your ideas should be rejected (I
don't consider myself qualified to reject your ideas, just qualified
enough to comment on them ;-)). It was just an appeal to keep the scope
down to something we can get out in 7.3


> Maybe Tom can talk some people into or out of something, but I allways
> try to have my own opinion. That it is often similar to Tom's doesn't
> surprise me, because he is a very gifted and talented engineer ;-)

And we count on everyone to have their own opinions, otherwise this list
would be much more boring (and less productive) than it usually is! The
diversity of ideas here is one of the strengths of the project.


> I meant that the detail if one particular result set is a tuplestore,
> one tuple at a time or a cursor should be expected to change from one
> invocation (initial invocation, not repeated for fetching single tuples
> for one result set) to another. If that should be allowed, and it should
> be, then the type of result has to be part of the functions return.
>

Sure. I can see that point now. We just need to figure out the best way
to pass that info from the function back to FunctionNext.

Joe