Обсуждение: The flinfo->fn_extra question, from me this time.

Поиск

Список

Период

Сортировка

The flinfo->fn_extra question, from me this time.

От

Chapman Flack

Дата:

16 июня 2019 г., 01:04:04

Hi hackers,

I see evidence on this list that it's sort of a rite of passage
to ask the flinfo->fn_extra question, and my time has come.

So please let me know if I seem to correctly understand the limits
on its use.

I gather that various extensions use it to stash various things. But
(I assume) ... they will only touch fn_extra in FmgrInfo structs that
pertain to *their own functions*. (Please say that's true?)

IOW, it is assured that, if I am a language handler, when I am called
to handle a function in my language, fn_extra is mine to use as I please ...

... with the one big exception, if I am handling a function in my language
that returns a set, and I will use SFRM_ValuePerCall mode, I have to leave
fn_extra NULL before SRF_FIRSTCALL_INIT(), which plants its own gunk there,
and then I can stash my stuff in gunk->user_fctx for the duration of that
SRF call.

Does that seem to catch the essentials?

Thanks,
-Chap


p.s.: noticed in fmgr/README: "Note that simple "strict" functions can
ignore both isnull and args[i].isnull, since they won't even get called
when there are any TRUE values in args[].isnull."

I get why a strict function can ignore args[i].isnull, but is the part
about ignoring isnull a mistake? A strict function can be passed all
non-null arguments and still return null if it wants to, right?

Re: The flinfo->fn_extra question, from me this time.

От

Tom Lane

Дата:

16 июня 2019 г., 01:21:10

Chapman Flack <chap@anastigmatix.net> writes:
> So please let me know if I seem to correctly understand the limits
> on its use.

> I gather that various extensions use it to stash various things. But
> (I assume) ... they will only touch fn_extra in FmgrInfo structs that
> pertain to *their own functions*. (Please say that's true?)

> IOW, it is assured that, if I am a language handler, when I am called
> to handle a function in my language, fn_extra is mine to use as I please ...

Yup.

> ... with the one big exception, if I am handling a function in my language
> that returns a set, and I will use SFRM_ValuePerCall mode, I have to leave
> fn_extra NULL before SRF_FIRSTCALL_INIT(), which plants its own gunk there,
> and then I can stash my stuff in gunk->user_fctx for the duration of that
> SRF call.

Yup.  (Of course, you don't have to use the SRF_FIRSTCALL_INIT
infrastructure.)

Keep in mind that in most contexts, whatever you cache in fn_extra
will only be there for the life of the current query.

            regards, tom lane

Re: The flinfo->fn_extra question, from me this time.

От

Chapman Flack

Дата:

16 июня 2019 г., 01:46:55

On 06/15/19 21:21, Tom Lane wrote:
> Yup.  (Of course, you don't have to use the SRF_FIRSTCALL_INIT
> infrastructure.)

That had crossed my mind ... but it seems there's around 80 or 100
lines of good stuff there that'd be a shame to duplicate. If only
init_MultiFuncCall() took an extra void ** argument, and the stock
SRF_FIRSTCALL_INIT passed &(fcinfo->flinfo->fn_extra), seems like
most of it would be reusable. shutdown_MultiFuncCall would need to work
slightly differently, and a caller who wanted to be different would need
a customized variant of SRF_PERCALL_SETUP, but that's two lines.

Cheers,
-Chap

Re: The flinfo->fn_extra question, from me this time.

От

Chapman Flack

Дата:

21 июля 2019 г., 20:44:37

On 06/15/19 21:46, Chapman Flack wrote:
> On 06/15/19 21:21, Tom Lane wrote:
>> Yup.  (Of course, you don't have to use the SRF_FIRSTCALL_INIT
>> infrastructure.)
> 
> That had crossed my mind ... but it seems there's around 80 or 100
> lines of good stuff there that'd be a shame to duplicate. If only

I suppose that's only if I want to continue using SFRM_ValuePerCall mode.

SFRM_Materialize mode could remove a good deal of complexity currently
in PL/Java around managing memory contexts, SPI_connect, etc. through
multiple calls ... and I'd also have fn_extra all to myself.

Until now, I had assumed that SFRM_ValuePerCall mode might offer some
benefits, such as the possibility of pipelining certain queries and not
building up a whole tuplestore in advance.

But looking in the code, I'm getting the impression that those
benefits are only theoretical future ones, as ExecMakeTableFunctionResult
implements SFRM_ValuePerCall mode by ... repeatedly calling the function
to build up a whole tuplestore in advance.

Am I right about that? Are there other sites from which a SRF might be
called that I haven't found, where ValuePerCall mode might actually
support some form of pipelining? Are there actual cases where allowedModes
might not contain SFRM_Materialize?

Or is the ValuePerCall variant currently there just to support possible
future such cases, none of which exist at the moment?

Regards,
-Chap

Re: The flinfo->fn_extra question, from me this time.

От

Tom Lane

Дата:

21 июля 2019 г., 21:54:17

Chapman Flack <chap@anastigmatix.net> writes:
> Until now, I had assumed that SFRM_ValuePerCall mode might offer some
> benefits, such as the possibility of pipelining certain queries and not
> building up a whole tuplestore in advance.

> But looking in the code, I'm getting the impression that those
> benefits are only theoretical future ones, as ExecMakeTableFunctionResult
> implements SFRM_ValuePerCall mode by ... repeatedly calling the function
> to build up a whole tuplestore in advance.

Yes, that's the case for a SRF in FROM.  A SRF in the targetlist
actually does get the chance to pipeline, if it implements ValuePerCall.

The FROM case could be improved perhaps, if somebody wanted to put
time into it.  You'd still need to be prepared to build a tuplestore,
in case of rescan or backwards fetch; but in principle you could return
rows immediately while stashing them aside in a tuplestore.

            regards, tom lane

Re: The flinfo->fn_extra question, from me this time.

От

Dent John

Дата:

22 сентября 2019 г., 10:40:29

On 21 Jul 2019, at 22:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Chapman Flack <chap@anastigmatix.net> writes:
>> Until now, I had assumed that SFRM_ValuePerCall mode might offer some
>> benefits, such as the possibility of pipelining certain queries and not
>> building up a whole tuplestore in advance.
>
>> But looking in the code, I'm getting the impression that those
>> benefits are only theoretical future ones, as ExecMakeTableFunctionResult
>> implements SFRM_ValuePerCall mode by ... repeatedly calling the function
>> to build up a whole tuplestore in advance.
>
> Yes, that's the case for a SRF in FROM.  A SRF in the targetlist
> actually does get the chance to pipeline, if it implements ValuePerCall.
>
> The FROM case could be improved perhaps, if somebody wanted to put
> time into it.

While looking at whether REFCURSOR output could be pipelined into the executor [1], I’ve stumbled upon the same.

By any chance, do either of you know if there are initiatives to make the changes mentioned?

> You'd still need to be prepared to build a tuplestore,
> in case of rescan or backwards fetch; but […]

I’m also interested in your comment here. If the function was STABLE, could not the function scan simply be restarted?
(Ratherthan needing to create the tuplestore for all cases.) 

I guess perhaps the backwards scan is where it falls down though...

> […] in principle you could return
> rows immediately while stashing them aside in a tuplestore.

Does the planner have any view on this? When I first saw what was going on, I presumed the planner had decided the cost
ofmultiple function scans was greater than the cost of materialising it in a temporary store. 

It occurs to me that, if we made a switch towards pipelining the function scan results directly out, then we might be
looseefficiency where there are a significant number of scans and/or the function cost high. Is that why you were
suggestingto as well stash them aside? 

denty.

[1] https://www.postgresql.org/message-id/B2AFCAB5-FACD-44BF-963F-7DD2735FAB5D%40QQdd.eu

Re: The flinfo->fn_extra question, from me this time.

От

Tom Lane

Дата:

22 сентября 2019 г., 15:01:52

Dent John <denty@QQdd.eu> writes:
> On 21 Jul 2019, at 22:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Chapman Flack <chap@anastigmatix.net> writes:
>>> But looking in the code, I'm getting the impression that those
>>> benefits are only theoretical future ones, as ExecMakeTableFunctionResult
>>> implements SFRM_ValuePerCall mode by ... repeatedly calling the function
>>> to build up a whole tuplestore in advance.

>> Yes, that's the case for a SRF in FROM.  A SRF in the targetlist
>> actually does get the chance to pipeline, if it implements ValuePerCall.
>> The FROM case could be improved perhaps, if somebody wanted to put
>> time into it.

> By any chance, do either of you know if there are initiatives to make the changes mentioned?

I don't know of anybody working on it.

>> You'd still need to be prepared to build a tuplestore,
>> in case of rescan or backwards fetch; but […]

> I’m also interested in your comment here. If the function was STABLE, could not the function scan simply be
restarted?(Rather than needing to create the tuplestore for all cases.) 
> I guess perhaps the backwards scan is where it falls down though...

My point was that you can't simply remove the tuplestore-building code
path.  The exact boundary conditions for that might be negotiable.
But I'd be very dubious of an assumption that re-running the function
would be cheaper than building a tuplestore, regardless of whether it's
safe.

> Does the planner have any view on this?

cost_functionscan and cost_rescan would likely need some adjustment if
possible.  However, I'm not sure that the planner has any way to know
whether a given SRF will support ValuePerCall or not.

            regards, tom lane

Re: The flinfo->fn_extra question, from me this time.

От

Robert Haas

Дата:

24 сентября 2019 г., 18:09:32

On Sun, Jul 21, 2019 at 5:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The FROM case could be improved perhaps, if somebody wanted to put
> time into it.  You'd still need to be prepared to build a tuplestore,
> in case of rescan or backwards fetch; but in principle you could return
> rows immediately while stashing them aside in a tuplestore.

But you could skip it if you could prove that no rescans or backward
fetches are possible for a particular node, something that we also
want for Gather, as discussed not long ago.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: The flinfo->fn_extra question, from me this time.

От

Dent John

Дата:

05 октября 2019 г., 10:27:49

On 22 Sep 2019, at 16:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hi Tom,

I don't know of anybody working on it.

Okay. I had a look at this. I tried to apply Andre’s patch [1] from some time ago, but that turned out not so easy. I guess the code has moved on since. So I’ve attempted to re-invent the same spirit, stealing from his patch, and from how the tSRF code does things. The patch isn’t final, but it demonstrates a concept.

However, given your comments below, I wonder if you might comment on the approach before I go further?

(Patch is presently still against 12beta2.)

You'd still need to be prepared to build a tuplestore,
in case of rescan or backwards fetch; but […]

I do recognise this. The patch teaches ExecMaterializesOutput() and ExecSupportsBackwardScan() that T_FunctionScan nodes don't materialise their output.

(Actually, Andre’s patch did the educating of ExecMaterializesOutput() and ExecSupportsBackwardScan() — it’s not my invention.)

I haven’t worked out how to easily demonstrate the backward scan case, but joins (which presumably are the typical cause of rescan) now yield an intermediate Materialize node.

postgres=# explain (analyze, buffers) select * from unnest (array_fill ('scanner'::text, array[10])) t1, unnest (array_fill ('dummy'::text, array[10000000])) t2 limit 100;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.01..1.36 rows=100 width=64) (actual time=0.009..0.067 rows=100 loops=1)
   -> Nested Loop (cost=0.01..1350000.13 rows=100000000 width=64) (actual time=0.008..0.049 rows=100 loops=1)
   -> Function Scan on unnest t2 (cost=0.00..100000.00 rows=10000000 width=32) (actual time=0.003..0.006 rows=10 loops=1)
   -> Materialize (cost=0.00..0.15 rows=10 width=32) (actual time=0.000..0.002 rows=10 loops=10)
   -> Function Scan on unnest t1 (cost=0.00..0.10 rows=10 width=32) (actual time=0.001..0.004 rows=10 loops=1)
Planning Time: 127.875 ms
Execution Time: 0.102 ms
(7 rows)

My point was that you can't simply remove the tuplestore-building code
path. The exact boundary conditions for that might be negotiable.
But I'd be very dubious of an assumption that re-running the function
would be cheaper than building a tuplestore, regardless of whether it's
safe.

Understood, and I agree. I think it’s preferable to allow the planner control over when to explicitly materialise.

But if I’m not wrong, at present, the planner doesn’t really trade-off the cost of rescan versus materialisation, but instead adopts a simple heuristic of materialising one or other side during a join. We can see this in the plans if the unnest()s are moved into the target list and buried in a subquery. For example:

postgres=# explain (analyze, buffers) select * from (select unnest (array_fill ('scanner'::text, array[10]))) t1, (select unnest (array_fill ('dummy'::text, array[10000000]))) t2 limit 100;
   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1.40 rows=100 width=64) (actual time=0.011..0.106 rows=100 loops=1)
   -> Nested Loop (cost=0.00..1400000.21 rows=100000000 width=64) (actual time=0.010..0.081 rows=100 loops=1)
   -> ProjectSet (cost=0.00..50000.02 rows=10000000 width=32) (actual time=0.004..0.024 rows=10 loops=1)
   -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
   -> Materialize (cost=0.00..0.22 rows=10 width=32) (actual time=0.001..0.002 rows=10 loops=10)
   -> ProjectSet (cost=0.00..0.07 rows=10 width=32) (actual time=0.001..0.004 rows=10 loops=1)
   -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
Planning Time: 180.482 ms
Execution Time: 0.148 ms
(9 rows)

I am tempted to stop short of educating the planner about the possibility to re-scan (thus dropping the materialise node) during a join. It seems feasible, and sometimes advantageous. (Perhaps if the join quals would cause a huge amount of the output to be filtered??) But it also seems better to treat it as an entirely separate issue.

cost_functionscan and cost_rescan would likely need some adjustment if
possible.

I looked at cost_functionscan(), but I think it is already of the view that a function can pipeline. It computes a startup_cost and a run_cost, where run_cost is the per-tuple cost * num_rows. With this understanding, it is actually wrong given the current materialisation-always behaviour. I think this means I don’t need to make any fundamental changes in order to correctly cost the new behaviour.

However, I'm not sure that the planner has any way to know
whether a given SRF will support ValuePerCall or not.

Yes. There is a flaw. But with the costing support function interface, it’s possible to supply costs that correctly relate to the SRF’s abilities.

I guess there can be a case where the SRF supports ValuePerCall, and supplies costs accordingly, but at execution time, decides to not to use it. That seems a curious situation, but it will, at worst, cost us a bit more buffer space.

In the opposite case, where the SRF can’t support ValuePerCall, the risk is that the planner has decided it wants to interject a Materialize node, and the result will be buffer-to-buffer copying. If the function has a costing support function, it should all be costed correctly, but it’s obviously not ideal. Currently, my patch doesn’t do anything about this case. My plan would be to allow the Materialize node to be supplied with a tuplestore from the FunctionScan node at execution time. I guess this optimisation would similarly help non-ValuePerCall tSRFs.

After all this, I’m wondering how you view the proposal?

For sake of comparison, 12beta1 achieves the following plans:

postgres=# create or replace function test1() returns setof record language plpgsql as $$ begin return query (select 'a', generate_series (1, 1e6)); end; $$; -- using plpgsql because it can’t pipeline
CREATE FUNCTION
postgres=# explain (verbose, analyse, buffers) select key, count (value), sum (value) from test1() as (key text, value numeric) group by key;
...
Planning Time: 0.068 ms
Execution Time: 589.651 ms

postgres=# explain (verbose, analyse, buffers) select * from test1() as (key text, value numeric) limit 50;
...
Planning Time: 0.059 ms
Execution Time: 348.334 ms

postgres=# explain (analyze, buffers) select count (a.a), sum (a.a) from unnest (array_fill (1::numeric, array[10000000])) a;
...
Planning Time: 165.502 ms
Execution Time: 5629.094 ms

postgres=# explain (analyze, buffers) select * from unnest (array_fill (1::numeric, array[10000000])) limit 50;
...
Planning Time: 110.952 ms
Execution Time: 1080.609 ms

Versus 12beta2+patch, which seem favourable in the main, at least for these pathological cases:

postgres=# explain (verbose, analyse, buffers) select key, count (value), sum (value) from test1() as (key text, value numeric) group by key;
...
Planning Time: 0.068 ms
Execution Time: 591.749 ms

postgres=# explain (verbose, analyse, buffers) select * from test1() as (key text, value numeric) limit 50;
...
Planning Time: 0.051 ms
Execution Time: 289.820 ms

postgres=# explain (analyze, buffers) select count (a.a), sum (a.a) from unnest (array_fill (1::numeric, array[10000000])) a;
...
Planning Time: 169.260 ms
Execution Time: 4759.781 ms

postgres=# explain (analyze, buffers) select * from unnest (array_fill (1::numeric, array[10000000])) limit 50;
...
Planning Time: 163.374 ms
Execution Time: 0.051 ms

denty.

[1] https://www.postgresql.org/message-id/20160822214023.aaxz5l4igypowyri%40alap3.anarazel.de

Вложения

pipeline-functionscan.patch

Re: The flinfo->fn_extra question, from me this time.

От

Dent John

Дата:

02 ноября 2019 г., 22:42:56

Hi,

Turns out — to my embarrassment — that pretty much all of the regression tests failed with my patch. No idea if anyone spotted that and withheld reply in revenge, but I wouldn’t blame if you did!

I have spent a bit more time on it. The attached patch is a much better show, though there are still a few regressions and undoubtedly it’s still rough.

(Attached patch is against 12.0.)

As was perhaps predictable, some of the regression tests do indeed break in the case of rescan. To cite the specific class of fail, it’s this:

SELECT * FROM (VALUES (1),(2),(3)) v(r), ROWS FROM( rngfunc_sql(11,11), rngfunc_mat(10+r,13) );
r | i  | s | i  | s
---+----+---+----+—
1 | 11 | 1 | 11 | 1
1 |    |   | 12 | 2
1 |    |   | 13 | 3
- 2 | 11 | 1 | 12 | 4
+ 2 | 11 | 2 | 12 | 4
2 |    |   | 13 | 5
- 3 | 11 | 1 | 13 | 6
+ 3 | 11 | 3 | 13 | 6
(6 rows)

The reason for the change is that ’s' comes from rngfunc_mat(), which computes s as nextval(). The patch currently prefers to re-execute the function in place of materialising it into a tuplestore.

Tom suggested not dropping the tuplestore creation logic. I can’t fathom a way of avoiding change for folk that have gotten used to the current behaviour without doing that. So I’m tempted to pipeline the rows back from a function (if it returns ValuePerCall), and also record it in a tuplestore, just in case rescan happens. There’s still wastage in this approach, but it would save the current behaviour, while stil enabling the early abort of ValuePerCall SRFs at relatively low cost, which is certainly one of my goals.

I’d welcome opinion on whether there are downsides the that approach, as I might move to integrate that next.

But I would also like to kick around ideas for how to avoid entirely the tuplestore.

Earlier, I suggested that we might make the decision logic prefer to materialise a tuplestore for VOLATILE functions, and prefer to pipeline directly from STABLE (and IMMUTABLE) functions. The docs on volatility categories describe that the optimiser will evaluate a VOLATILE function for every row it is needed, whereas it might cache STABLE and IMMUTABLE with greater aggression. It’s basically the polar opposite of what I want to achieve.

It is arguably also in conflict with current behaviour. I think we should make the docs clearer about that.

So, on second thoughts, I don’t think overloading the meaning of STABLE, et al., is the right thing to do. I wonder if we could invent a new modifier to CREATE FUNCTION, perhaps “PIPELINED”, which would simply declare a function's ability and preference for ValuePerCall mode.

Or perhaps modify the ROWS FROM extension, and adopt WITH’s [ NOT ] MATERIALIZED clause. For example, the following would achieve the above proposed behaviour:

ROWS FROM( rngfunc_sql(11,11) MATERIALIZED, rngfunc_mat(10+r,13) MATERIALIZED )

Of course, NOT MATERIALIZED would achieve ValuePerCall mode, and omit materialisation. I guess MATERIALIZED would have to be the default.

I wonder if another alternative would be to decide materialization based on what the outer plan includes. I guess we can tell if we’re part of a join, or if the plan requires the ability to scan backwards. Could that work?

denty.

Re: The flinfo->fn_extra question, from me this time.

От

Dent John

Дата:

03 ноября 2019 г., 11:51:14

(And here’s aforementioned attachment… doh.)

Вложения

pipeline-functionscan-v2.patch

Re: The flinfo->fn_extra question, from me this time.

От

Pavel Stehule

Дата:

03 ноября 2019 г., 13:33:01

ne 3. 11. 2019 v 12:51 odesílatel Dent John <denty@qqdd.eu> napsal:

(And here’s aforementioned attachment… doh.)

can be nice, if patch has some regress tests - it is good for memory refreshing what is target of patch.

Regards

Pavel

Re: The flinfo->fn_extra question, from me this time.

От

Dent John

Дата:

03 ноября 2019 г., 15:53:26

> On 3 Nov 2019, at 13:33, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> can be nice, if patch has some regress tests - it is good for memory refreshing what is target of patch.

With a suitably small work_mem constraint, it is possible to show the absence of buffers resulting from the tuplestore.
It’llneed some commentary explaining what is being looked for, and why. But it’s a good idea. 

I’ll take a look.

denty.

Re: The flinfo->fn_extra question, from me this time.

От

Dent John

Дата:

09 ноября 2019 г., 10:51:58

>> On 3 Nov 2019, at 13:33, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>
>> can be nice, if patch has some regress tests - it is good for memory refreshing what is target of patch.

I’ve updated the patch, and added some regression tests.

denty.

Вложения

pipeline-functionscan-v3.patch

Re: The flinfo->fn_extra question, from me this time.

От

Dent John

Дата:

08 декабря 2019 г., 20:33:02

Hi folks,

I’ve updated the patch, addressed the rescan issue, and restructured the tests.

I’ve taken a slightly different approach this time, re-using the (already pipeline-supporting) machinery of the Materialize node, and extended it to allow an SFRM_Materialize SRF to donate the tuplestore it returns. I feel this yields a better code structure, as well getting as more reuse.

It also opens up more informative and transparent EXPLAIN output. For example, the following shows Materialize explicitly, whereas previously a FunctionScan would have silently materialised the result of both generate_series() invocations.

postgres=# explain (analyze, costs off, timing off, summary off) 

select * from generate_series(11,15) r, generate_series(11,14) s;

                            QUERY PLAN                            

------------------------------------------------------------------

 Nested Loop (actual rows=20 loops=1)

   ->  Function Scan on generate_series s (actual rows=4 loops=1)

         ->  SRF Scan (actual rows=4 loops=1)

               SFRM: ValuePerCall

   ->  Function Scan on generate_series r (actual rows=5 loops=4)

         ->  Materialize (actual rows=5 loops=4)

               ->  SRF Scan (actual rows=5 loops=1)

                     SFRM: ValuePerCall

I also thought again about when to materialise, and particularly Robert’s suggestion[1] (which is in also this thread, but I didn’t originally understand the implication of). If I’m not wrong, between occasional explicit use of a Materialize node by the planner, and more careful observation of EXEC_FLAG_REWIND and EXEC_FLAG_BACKWARD in FunctionScan’s initialisation, we do actually have what is needed to pipeline without materialisation in at least some cases. There is not a mechanism to preferentially re-execute a SRF rather than materialise it, but because materialisation only seems to be necessary in the face of a join or a scrollable cursor, I’m not considering much of a problem anymore.

The EXPLAIN output needs a bit of work, costing is still a sore point, and it’s not quite as straight-line performant as my first attempt, as well as there undoubtedly being some unanticipated breakages and rough edges.

But the concept seems to work roughly as I intended (i.e., allowing FunctionScan to pipeline). Unless there are any objections, I will push it into the January commit fest for progressing.

(Revised patch attached.)

denty.

[1] https://www.postgresql.org/message-id/CA%2BTgmobw%2BPhNVciLesd-mQQ4As9D8L2-F7AiKqv465RhDkPf2Q%40mail.gmail.com

Вложения

pipeline-functionscan-v4.patch

Re: The flinfo->fn_extra question, from me this time.

От

Tom Lane

Дата:

28 января 2020 г., 02:57:57

Dent John <denty@QQdd.eu> writes:
> I’ve updated the patch, addressed the rescan issue, and restructured the tests.
> [ pipeline-functionscan-v4.patch ]

FWIW, this patch doesn't apply to HEAD anymore.  The cfbot
has failed to notice because it is still testing the v3 patch.
Apparently the formatting of this email is weird enough that
neither the archives nor the CF app notice the embedded patch.

Please fix and repost.

            regards, tom lane

Re: The flinfo->fn_extra question, from me this time.

От

Dent John

Дата:

28 января 2020 г., 08:58:51

Thanks Tom.

I’ll look at it. Probably won’t be able to until after the commitfest closes though.

d.

> On 28 Jan 2020, at 02:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Dent John <denty@QQdd.eu> writes:
>> I’ve updated the patch, addressed the rescan issue, and restructured the tests.
>> [ pipeline-functionscan-v4.patch ]
>
> FWIW, this patch doesn't apply to HEAD anymore.  The cfbot
> has failed to notice because it is still testing the v3 patch.
> Apparently the formatting of this email is weird enough that
> neither the archives nor the CF app notice the embedded patch.
>
> Please fix and repost.
>
>            regards, tom lane

Re: The flinfo->fn_extra question, from me this time.

От

Thomas Munro

Дата:

28 января 2020 г., 09:56:26

On Tue, Jan 28, 2020 at 9:59 PM Dent John <denty@qqdd.eu> wrote:
> I’ll look at it. Probably won’t be able to until after the commitfest closes though.

(We've seen that hidden attachment problem from Apple Mail before,
discussion of the MIME details in the archives somewhere. I have no
idea what GUI interaction causes that, but most Apple Mail attachments
seem to be fine.)

Here's a quick rebase in case it helps. I mostly applied fine (see
below). The conflicts were just Makefile and expected output files,
which I tried to do the obvious thing with. I had to add a #include
"access/tupdesc.h" to plannodes.h to make something compile (because
it uses TupleDesc). Passes check-world here.

$ gpatch --merge -p1 < ~/pipeline-functionscan-v4.patch
patching file src/backend/access/common/tupdesc.c
patching file src/backend/commands/explain.c
patching file src/backend/executor/Makefile
Hunk #1 NOT MERGED at 19-29.
patching file src/backend/executor/execAmi.c
patching file src/backend/executor/execProcnode.c
patching file src/backend/executor/execSRF.c
patching file src/backend/executor/nodeFunctionscan.c
Hunk #1 merged at 4-20.
patching file src/backend/executor/nodeMaterial.c
patching file src/backend/executor/nodeNestloop.c
patching file src/backend/executor/nodeProjectSet.c
patching file src/backend/executor/nodeSRFScan.c
patching file src/include/access/tupdesc.h
patching file src/include/executor/executor.h
patching file src/include/executor/nodeFunctionscan.h
patching file src/include/executor/nodeMaterial.h
patching file src/include/executor/nodeSRFScan.h
patching file src/include/nodes/execnodes.h
patching file src/include/nodes/nodes.h
patching file src/include/nodes/plannodes.h
patching file src/test/regress/expected/aggregates.out
patching file src/test/regress/expected/groupingsets.out
patching file src/test/regress/expected/inherit.out
patching file src/test/regress/expected/join.out
Hunk #1 NOT MERGED at 3078-3087.
Hunk #3 NOT MERGED at 3111-3120, merged at 3127.
patching file src/test/regress/expected/misc_functions.out
patching file src/test/regress/expected/pg_lsn.out
patching file src/test/regress/expected/plpgsql.out
patching file src/test/regress/expected/rangefuncs.out
patching file src/test/regress/expected/union.out
patching file src/test/regress/sql/plpgsql.sql
patching file src/test/regress/sql/rangefuncs.sql

Вложения

0001-pipeline-functionscan-v5.patch

Re: The flinfo->fn_extra question, from me this time.

От

Dent John

Дата:

01 февраля 2020 г., 09:09:32

> On 28 Jan 2020, at 09:56, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> ([…]  I have no
> idea what GUI interaction causes that, but most Apple Mail attachments
> seem to be fine.)

I gathered from the other thread that posting plain text seems to attach the patches in a way that’s more acceptable.
Seemsto work, but doesn’t explain exactly what the issue is, and I’m pretty sure I’ve not always had to go via the
“makeplain text” menu item before. 

> Here's a quick rebase in case it helps.  I mostly applied fine (see
> below).  The conflicts were just Makefile and expected output files,
> which I tried to do the obvious thing with.  I had to add a #include
> "access/tupdesc.h" to plannodes.h to make something compile (because
> it uses TupleDesc).  Passes check-world here.

Thanks a lot for doing that. I tried it against 530609a, and indeed it seems to work.

I’m also watching the polymorphic table functions light thread[0], which at first glance would also seems to make
usefulSRF RECORD-returning functions when employed in the SELECT list. It’s not doing what this patch does, but people
mighthappy enough to transform their queries into SELECT … FROM (SELECT fn(…)) to achieve pipelining, at least in the
shortterm. 

[0] https://www.postgresql.org/message-id/46a1cb32-e9c6-e7a8-f3c0-78e6b3f70cfe@2ndquadrant.com

denty.

Re: The flinfo->fn_extra question, from me this time.

От

Tom Lane

Дата:

12 марта 2020 г., 18:51:09

The cfbot is still not happy with this, because you're ignoring the
project style rule against C99-like mixing of code and declarations.
I went to fix that, and soon found that the code doesn't compile,
much less pass regression tests, with --enable-cassert.  That's
really a serious error on your part: basically, nobody should ever
do backend code development in non-cassert builds, because there is
too much useful error checking you forego that way.  (Performance
testing is a different matter ... but you need to make the code
work before you worry about speed.)

Anyway, attached is a marginal update that gets this to the point
where it should compile in the cfbot, but it'll still fail regression
tests there.  (At least on the Linux side.  I guess the cfbot's
Windows builds are sans cassert, which seems like an odd choice.)

I didn't want to spend any more effort on it than that, because I'm
not really on board with this line of attack.  This patch seems
awfully invasive for what it's accomplishing, both at the code level
and in terms of what users will see in EXPLAIN.  No, I don't think
that adding additional "SRF Scan" nodes below FunctionScan is an
improvement, nor do I like your repurposing/abusing of Materialize.
It might be okay if you were just using Materialize as-is, but if
it's sort-of-materialize-but-not-always, I don't think that's going
to make anyone less confused.

More locally, this business with creating new "plan nodes" below the
FunctionScan at executor startup is a real abuse of a whole lot of stuff,
and I suspect that it's not unrelated to the assertion failures I'm
seeing.  Don't do that.  If you want to build some data structures at
executor start, fine, but they're not plans and shouldn't be mislabeled as
that.  On the other hand, if they do need to be plan nodes, they should be
made by the planner (which in turn would require a lot of infrastructure
you haven't built, eg copyfuncs/outfuncs/readfuncs/setrefs/...).

The v3 patch seemed closer to the sort of thing I was expecting
to get out of this (though I've not read it in any detail).

            regards, tom lane

diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c
index 1e743d7..86bb80a 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "common/hashfn.h"
 #include "miscadmin.h"
+#include "parser/parse_coerce.h"
 #include "parser/parse_type.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -927,3 +928,53 @@ BuildDescFromLists(List *names, List *types, List *typmods, List *collations)

     return desc;
 }
+
+/*
+ * Check that function result tuple type (src_tupdesc) matches or can
+ * be considered to match what the query expects (dst_tupdesc). If
+ * they don't match, ereport.
+ *
+ * We really only care about number of attributes and data type.
+ * Also, we can ignore type mismatch on columns that are dropped in the
+ * destination type, so long as the physical storage matches.  This is
+ * helpful in some cases involving out-of-date cached plans.
+ */
+void
+tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc)
+{
+    int            i;
+
+    if (dst_tupdesc->natts != src_tupdesc->natts)
+        ereport(ERROR,
+                (errcode(ERRCODE_DATATYPE_MISMATCH),
+                 errmsg("function return row and query-specified return row do not match"),
+                 errdetail_plural("Returned row contains %d attribute, but query expects %d.",
+                                  "Returned row contains %d attributes, but query expects %d.",
+                                  src_tupdesc->natts,
+                                  src_tupdesc->natts, dst_tupdesc->natts)));
+
+    for (i = 0; i < dst_tupdesc->natts; i++)
+    {
+        Form_pg_attribute dattr = TupleDescAttr(dst_tupdesc, i);
+        Form_pg_attribute sattr = TupleDescAttr(src_tupdesc, i);
+
+        if (IsBinaryCoercible(sattr->atttypid, dattr->atttypid))
+            continue;            /* no worries */
+        if (!dattr->attisdropped)
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("function return row and query-specified return row do not match"),
+                     errdetail("Returned type %s at ordinal position %d, but query expects %s.",
+                               format_type_be(sattr->atttypid),
+                               i + 1,
+                               format_type_be(dattr->atttypid))));
+
+        if (dattr->attlen != sattr->attlen ||
+            dattr->attalign != sattr->attalign)
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("function return row and query-specified return row do not match"),
+                     errdetail("Physical storage mismatch on dropped attribute at ordinal position %d.",
+                               i + 1)));
+    }
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d901dc4..71e7ab5 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,6 +19,8 @@
 #include "commands/defrem.h"
 #include "commands/prepare.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeFunctionscan.h"
+#include "executor/nodeSRFScan.h"
 #include "foreign/fdwapi.h"
 #include "jit/jit.h"
 #include "nodes/extensible.h"
@@ -1182,6 +1184,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
         case T_SubqueryScan:
             pname = sname = "Subquery Scan";
             break;
+        case T_SRFScanPlan:
+            pname = sname = "SRF Scan";
+            break;
         case T_FunctionScan:
             pname = sname = "Function Scan";
             break;
@@ -1770,6 +1775,31 @@ ExplainNode(PlanState *planstate, List *ancestors,
                 }
             }
             break;
+        case T_SRFScanPlan:
+            if (es->analyze)
+            {
+                SRFScanState *sss = (SRFScanState *) planstate;
+
+                if (sss->setexpr)
+                {
+                    SetExprState *setexpr = (SetExprState *) sss->setexpr;
+                    FunctionCallInfo fcinfo = setexpr->fcinfo;
+                    ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+                    if (rsinfo)
+                    {
+                        ExplainPropertyText("SFRM",
+                            rsinfo->returnMode == SFRM_ValuePerCall ? "ValuePerCall" :
+                                rsinfo->returnMode == SFRM_Materialize ? "Materialize" :
+                                    "Unknown",
+                                            es);
+
+                        if (rsinfo->returnMode == SFRM_Materialize)
+                            ExplainPropertyBool("Donated tuplestore",
+                                                setexpr->funcResultStoreDonated, es);
+                    }
+                }
+            }
         case T_FunctionScan:
             if (es->verbose)
             {
@@ -2002,6 +2032,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
         IsA(plan, BitmapAnd) ||
         IsA(plan, BitmapOr) ||
         IsA(plan, SubqueryScan) ||
+        IsA(plan, FunctionScan) ||
         (IsA(planstate, CustomScanState) &&
          ((CustomScanState *) planstate)->custom_ps != NIL) ||
         planstate->subPlan;
@@ -2026,6 +2057,17 @@ ExplainNode(PlanState *planstate, List *ancestors,
         ExplainNode(innerPlanState(planstate), ancestors,
                     "Inner", NULL, es);

+    /* FunctionScan subnodes */
+    if (IsA(planstate, FunctionScanState))
+        for(int i=0; i<((FunctionScanState *)planstate)->nfuncs; i++)
+        {
+            bool oldverbose = es->verbose;
+            es->verbose = false;
+            ExplainNode(&((FunctionScanState *)planstate)->funcstates[i].scanstate->ps,
+                        ancestors, "Function", NULL, es);
+            es->verbose = oldverbose;
+        }
+
     /* special child plans */
     switch (nodeTag(plan))
     {
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index a983800..9dae142 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -65,6 +65,7 @@ OBJS = \
     nodeSort.o \
     nodeSubplan.o \
     nodeSubqueryscan.o \
+    nodeSRFScan.o \
     nodeTableFuncscan.o \
     nodeTidscan.o \
     nodeUnique.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index b12aeb3..07ccca75 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -25,6 +25,7 @@
 #include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
+#include "executor/nodeSRFScan.h"
 #include "executor/nodeGather.h"
 #include "executor/nodeGatherMerge.h"
 #include "executor/nodeGroup.h"
@@ -204,6 +205,10 @@ ExecReScan(PlanState *node)
             ExecReScanFunctionScan((FunctionScanState *) node);
             break;

+        case T_SRFScanState:
+            ExecReScanSRF((SRFScanState *) node);
+            break;
+
         case T_TableFuncScanState:
             ExecReScanTableFuncScan((TableFuncScanState *) node);
             break;
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 7b2e84f..da39593 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -83,6 +83,7 @@
 #include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
+#include "executor/nodeSRFScan.h"
 #include "executor/nodeGather.h"
 #include "executor/nodeGatherMerge.h"
 #include "executor/nodeGroup.h"
@@ -252,6 +253,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
                                                          estate, eflags);
             break;

+        case T_SRFScanPlan:
+            result = (PlanState *) ExecInitSRFScan((SRFScanPlan *) node,
+                                                         estate, eflags);
+            break;
+
         case T_ValuesScan:
             result = (PlanState *) ExecInitValuesScan((ValuesScan *) node,
                                                       estate, eflags);
@@ -639,6 +645,10 @@ ExecEndNode(PlanState *node)
             ExecEndFunctionScan((FunctionScanState *) node);
             break;

+        case T_SRFScanState:
+            ExecEndSRFScan((SRFScanState *) node);
+            break;
+
         case T_TableFuncScanState:
             ExecEndTableFuncScan((TableFuncScanState *) node);
             break;
diff --git a/src/backend/executor/execSRF.c b/src/backend/executor/execSRF.c
index 2312cc7..e29296a 100644
--- a/src/backend/executor/execSRF.c
+++ b/src/backend/executor/execSRF.c
@@ -21,6 +21,9 @@
 #include "access/htup_details.h"
 #include "catalog/objectaccess.h"
 #include "executor/execdebug.h"
+#include "executor/nodeMaterial.h"
+#include "executor/nodeFunctionscan.h"
+#include "executor/nodeSRFScan.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
@@ -44,17 +47,17 @@ static void ExecPrepareTuplestoreResult(SetExprState *sexpr,
                                         ExprContext *econtext,
                                         Tuplestorestate *resultStore,
                                         TupleDesc resultDesc);
-static void tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc);


 /*
- * Prepare function call in FROM (ROWS FROM) for execution.
+ * Prepare function call in FROM (ROWS FROM) or targetlist SRF function
+ * call for execution for execution.
  *
- * This is used by nodeFunctionscan.c.
+ * This is used by nodeFunctionscan.c and nodeProjectSet.c.
  */
 SetExprState *
-ExecInitTableFunctionResult(Expr *expr,
-                            ExprContext *econtext, PlanState *parent)
+ExecInitFunctionResultSet(Expr *expr,
+                          ExprContext *econtext, PlanState *parent)
 {
     SetExprState *state = makeNode(SetExprState);

@@ -62,402 +65,54 @@ ExecInitTableFunctionResult(Expr *expr,
     state->expr = expr;
     state->func.fn_oid = InvalidOid;

-    /*
-     * Normally the passed expression tree will be a FuncExpr, since the
-     * grammar only allows a function call at the top level of a table
-     * function reference.  However, if the function doesn't return set then
-     * the planner might have replaced the function call via constant-folding
-     * or inlining.  So if we see any other kind of expression node, execute
-     * it via the general ExecEvalExpr() code.  That code path will not
-     * support set-returning functions buried in the expression, though.
-     */
     if (IsA(expr, FuncExpr))
     {
+        /*
+         * For a FunctionScan or ProjectSet, the passed expression tree can be a
+         * FuncExpr, since the grammar only allows a function call at the top
+         * level of a table function reference.
+         */
         FuncExpr   *func = (FuncExpr *) expr;

         state->funcReturnsSet = func->funcretset;
         state->args = ExecInitExprList(func->args, parent);
-
         init_sexpr(func->funcid, func->inputcollid, expr, state, parent,
-                   econtext->ecxt_per_query_memory, func->funcretset, false);
+                   econtext->ecxt_per_query_memory, func->funcretset, true);
     }
-    else
-    {
-        state->elidedFuncState = ExecInitExpr(expr, parent);
-    }
-
-    return state;
-}
-
-/*
- *        ExecMakeTableFunctionResult
- *
- * Evaluate a table function, producing a materialized result in a Tuplestore
- * object.
- *
- * This is used by nodeFunctionscan.c.
- */
-Tuplestorestate *
-ExecMakeTableFunctionResult(SetExprState *setexpr,
-                            ExprContext *econtext,
-                            MemoryContext argContext,
-                            TupleDesc expectedDesc,
-                            bool randomAccess)
-{
-    Tuplestorestate *tupstore = NULL;
-    TupleDesc    tupdesc = NULL;
-    Oid            funcrettype;
-    bool        returnsTuple;
-    bool        returnsSet = false;
-    FunctionCallInfo fcinfo;
-    PgStat_FunctionCallUsage fcusage;
-    ReturnSetInfo rsinfo;
-    HeapTupleData tmptup;
-    MemoryContext callerContext;
-    MemoryContext oldcontext;
-    bool        first_time = true;
-
-    callerContext = CurrentMemoryContext;
-
-    funcrettype = exprType((Node *) setexpr->expr);
-
-    returnsTuple = type_is_rowtype(funcrettype);
-
-    /*
-     * Prepare a resultinfo node for communication.  We always do this even if
-     * not expecting a set result, so that we can pass expectedDesc.  In the
-     * generic-expression case, the expression doesn't actually get to see the
-     * resultinfo, but set it up anyway because we use some of the fields as
-     * our own state variables.
-     */
-    rsinfo.type = T_ReturnSetInfo;
-    rsinfo.econtext = econtext;
-    rsinfo.expectedDesc = expectedDesc;
-    rsinfo.allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize | SFRM_Materialize_Preferred);
-    if (randomAccess)
-        rsinfo.allowedModes |= (int) SFRM_Materialize_Random;
-    rsinfo.returnMode = SFRM_ValuePerCall;
-    /* isDone is filled below */
-    rsinfo.setResult = NULL;
-    rsinfo.setDesc = NULL;
-
-    fcinfo = palloc(SizeForFunctionCallInfo(list_length(setexpr->args)));
-
-    /*
-     * Normally the passed expression tree will be a SetExprState, since the
-     * grammar only allows a function call at the top level of a table
-     * function reference.  However, if the function doesn't return set then
-     * the planner might have replaced the function call via constant-folding
-     * or inlining.  So if we see any other kind of expression node, execute
-     * it via the general ExecEvalExpr() code; the only difference is that we
-     * don't get a chance to pass a special ReturnSetInfo to any functions
-     * buried in the expression.
-     */
-    if (!setexpr->elidedFuncState)
+    else if (IsA(expr, OpExpr))
     {
         /*
-         * This path is similar to ExecMakeFunctionResultSet.
-         */
-        returnsSet = setexpr->funcReturnsSet;
-        InitFunctionCallInfoData(*fcinfo, &(setexpr->func),
-                                 list_length(setexpr->args),
-                                 setexpr->fcinfo->fncollation,
-                                 NULL, (Node *) &rsinfo);
-
-        /*
-         * Evaluate the function's argument list.
-         *
-         * We can't do this in the per-tuple context: the argument values
-         * would disappear when we reset that context in the inner loop.  And
-         * the caller's CurrentMemoryContext is typically a query-lifespan
-         * context, so we don't want to leak memory there.  We require the
-         * caller to pass a separate memory context that can be used for this,
-         * and can be reset each time through to avoid bloat.
-         */
-        MemoryContextReset(argContext);
-        oldcontext = MemoryContextSwitchTo(argContext);
-        ExecEvalFuncArgs(fcinfo, setexpr->args, econtext);
-        MemoryContextSwitchTo(oldcontext);
-
-        /*
-         * If function is strict, and there are any NULL arguments, skip
-         * calling the function and act like it returned NULL (or an empty
-         * set, in the returns-set case).
+         * For ProjectSet, the expression node could be an OpExpr.
          */
-        if (setexpr->func.fn_strict)
-        {
-            int            i;
+        OpExpr       *op = (OpExpr *) expr;

-            for (i = 0; i < fcinfo->nargs; i++)
-            {
-                if (fcinfo->args[i].isnull)
-                    goto no_function_result;
-            }
-        }
+        state->funcReturnsSet = op->opretset;
+        state->args = ExecInitExprList(op->args, parent);
+        init_sexpr(op->opfuncid, op->inputcollid, expr, state, parent,
+                   econtext->ecxt_per_query_memory, op->opretset, true);
     }
     else
     {
-        /* Treat setexpr as a generic expression */
-        InitFunctionCallInfoData(*fcinfo, NULL, 0, InvalidOid, NULL, NULL);
-    }
-
-    /*
-     * Switch to short-lived context for calling the function or expression.
-     */
-    MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
-
-    /*
-     * Loop to handle the ValuePerCall protocol (which is also the same
-     * behavior needed in the generic ExecEvalExpr path).
-     */
-    for (;;)
-    {
-        Datum        result;
-
-        CHECK_FOR_INTERRUPTS();
-
         /*
-         * reset per-tuple memory context before each call of the function or
-         * expression. This cleans up any local memory the function may leak
-         * when called.
+         * However, again for FunctionScan, if the function doesn't return set
+         * then the planner might have replaced the function call via constant-
+         * folding or inlining.  So if we see any other kind of expression node,
+         * execute it via the general ExecEvalExpr() code.  That code path will
+         * not support set-returning functions buried in the expression, though.
          */
-        ResetExprContext(econtext);
-
-        /* Call the function or expression one time */
-        if (!setexpr->elidedFuncState)
-        {
-            pgstat_init_function_usage(fcinfo, &fcusage);
-
-            fcinfo->isnull = false;
-            rsinfo.isDone = ExprSingleResult;
-            result = FunctionCallInvoke(fcinfo);
-
-            pgstat_end_function_usage(&fcusage,
-                                      rsinfo.isDone != ExprMultipleResult);
-        }
-        else
-        {
-            result =
-                ExecEvalExpr(setexpr->elidedFuncState, econtext, &fcinfo->isnull);
-            rsinfo.isDone = ExprSingleResult;
-        }
-
-        /* Which protocol does function want to use? */
-        if (rsinfo.returnMode == SFRM_ValuePerCall)
-        {
-            /*
-             * Check for end of result set.
-             */
-            if (rsinfo.isDone == ExprEndResult)
-                break;
-
-            /*
-             * If first time through, build tuplestore for result.  For a
-             * scalar function result type, also make a suitable tupdesc.
-             */
-            if (first_time)
-            {
-                oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-                tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
-                rsinfo.setResult = tupstore;
-                if (!returnsTuple)
-                {
-                    tupdesc = CreateTemplateTupleDesc(1);
-                    TupleDescInitEntry(tupdesc,
-                                       (AttrNumber) 1,
-                                       "column",
-                                       funcrettype,
-                                       -1,
-                                       0);
-                    rsinfo.setDesc = tupdesc;
-                }
-                MemoryContextSwitchTo(oldcontext);
-            }
-
-            /*
-             * Store current resultset item.
-             */
-            if (returnsTuple)
-            {
-                if (!fcinfo->isnull)
-                {
-                    HeapTupleHeader td = DatumGetHeapTupleHeader(result);
-
-                    if (tupdesc == NULL)
-                    {
-                        /*
-                         * This is the first non-NULL result from the
-                         * function.  Use the type info embedded in the
-                         * rowtype Datum to look up the needed tupdesc.  Make
-                         * a copy for the query.
-                         */
-                        oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-                        tupdesc = lookup_rowtype_tupdesc_copy(HeapTupleHeaderGetTypeId(td),
-                                                              HeapTupleHeaderGetTypMod(td));
-                        rsinfo.setDesc = tupdesc;
-                        MemoryContextSwitchTo(oldcontext);
-                    }
-                    else
-                    {
-                        /*
-                         * Verify all later returned rows have same subtype;
-                         * necessary in case the type is RECORD.
-                         */
-                        if (HeapTupleHeaderGetTypeId(td) != tupdesc->tdtypeid ||
-                            HeapTupleHeaderGetTypMod(td) != tupdesc->tdtypmod)
-                            ereport(ERROR,
-                                    (errcode(ERRCODE_DATATYPE_MISMATCH),
-                                     errmsg("rows returned by function are not all of the same row type")));
-                    }
-
-                    /*
-                     * tuplestore_puttuple needs a HeapTuple not a bare
-                     * HeapTupleHeader, but it doesn't need all the fields.
-                     */
-                    tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
-                    tmptup.t_data = td;
-
-                    tuplestore_puttuple(tupstore, &tmptup);
-                }
-                else
-                {
-                    /*
-                     * NULL result from a tuple-returning function; expand it
-                     * to a row of all nulls.  We rely on the expectedDesc to
-                     * form such rows.  (Note: this would be problematic if
-                     * tuplestore_putvalues saved the tdtypeid/tdtypmod from
-                     * the provided descriptor, since that might not match
-                     * what we get from the function itself.  But it doesn't.)
-                     */
-                    int            natts = expectedDesc->natts;
-                    bool       *nullflags;
-
-                    nullflags = (bool *) palloc(natts * sizeof(bool));
-                    memset(nullflags, true, natts * sizeof(bool));
-                    tuplestore_putvalues(tupstore, expectedDesc, NULL, nullflags);
-                }
-            }
-            else
-            {
-                /* Scalar-type case: just store the function result */
-                tuplestore_putvalues(tupstore, tupdesc, &result, &fcinfo->isnull);
-            }
-
-            /*
-             * Are we done?
-             */
-            if (rsinfo.isDone != ExprMultipleResult)
-                break;
-        }
-        else if (rsinfo.returnMode == SFRM_Materialize)
-        {
-            /* check we're on the same page as the function author */
-            if (!first_time || rsinfo.isDone != ExprSingleResult)
-                ereport(ERROR,
-                        (errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
-                         errmsg("table-function protocol for materialize mode was not followed")));
-            /* Done evaluating the set result */
-            break;
-        }
-        else
-            ereport(ERROR,
-                    (errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
-                     errmsg("unrecognized table-function returnMode: %d",
-                            (int) rsinfo.returnMode)));
-
-        first_time = false;
-    }
-
-no_function_result:
-
-    /*
-     * If we got nothing from the function (ie, an empty-set or NULL result),
-     * we have to create the tuplestore to return, and if it's a
-     * non-set-returning function then insert a single all-nulls row.  As
-     * above, we depend on the expectedDesc to manufacture the dummy row.
-     */
-    if (rsinfo.setResult == NULL)
-    {
-        MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-        tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
-        rsinfo.setResult = tupstore;
-        if (!returnsSet)
-        {
-            int            natts = expectedDesc->natts;
-            bool       *nullflags;
-
-            MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
-            nullflags = (bool *) palloc(natts * sizeof(bool));
-            memset(nullflags, true, natts * sizeof(bool));
-            tuplestore_putvalues(tupstore, expectedDesc, NULL, nullflags);
-        }
-    }
-
-    /*
-     * If function provided a tupdesc, cross-check it.  We only really need to
-     * do this for functions returning RECORD, but might as well do it always.
-     */
-    if (rsinfo.setDesc)
-    {
-        tupledesc_match(expectedDesc, rsinfo.setDesc);
-
-        /*
-         * If it is a dynamically-allocated TupleDesc, free it: it is
-         * typically allocated in a per-query context, so we must avoid
-         * leaking it across multiple usages.
-         */
-        if (rsinfo.setDesc->tdrefcount == -1)
-            FreeTupleDesc(rsinfo.setDesc);
-    }
-
-    MemoryContextSwitchTo(callerContext);
-
-    /* All done, pass back the tuplestore */
-    return rsinfo.setResult;
-}
-
+        MemoryContext oldcontext;

-/*
- * Prepare targetlist SRF function call for execution.
- *
- * This is used by nodeProjectSet.c.
- */
-SetExprState *
-ExecInitFunctionResultSet(Expr *expr,
-                          ExprContext *econtext, PlanState *parent)
-{
-    SetExprState *state = makeNode(SetExprState);
+        state->elidedFuncState = ExecInitExpr(expr, parent);

-    state->funcReturnsSet = true;
-    state->expr = expr;
-    state->func.fn_oid = InvalidOid;
+        oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);

-    /*
-     * Initialize metadata.  The expression node could be either a FuncExpr or
-     * an OpExpr.
-     */
-    if (IsA(expr, FuncExpr))
-    {
-        FuncExpr   *func = (FuncExpr *) expr;
+        /* By performing InitFunctionCallInfoData here, we avoid palloc0() */
+        state->fcinfo = palloc(SizeForFunctionCallInfo(list_length(state->args)));

-        state->args = ExecInitExprList(func->args, parent);
-        init_sexpr(func->funcid, func->inputcollid, expr, state, parent,
-                   econtext->ecxt_per_query_memory, true, true);
-    }
-    else if (IsA(expr, OpExpr))
-    {
-        OpExpr       *op = (OpExpr *) expr;
+        MemoryContextSwitchTo(oldcontext);

-        state->args = ExecInitExprList(op->args, parent);
-        init_sexpr(op->opfuncid, op->inputcollid, expr, state, parent,
-                   econtext->ecxt_per_query_memory, true, true);
+        InitFunctionCallInfoData(*state->fcinfo, NULL, 0, InvalidOid, NULL, NULL);
     }
-    else
-        elog(ERROR, "unrecognized node type: %d",
-             (int) nodeTag(expr));
-
-    /* shouldn't get here unless the selected function returns set */
-    Assert(state->func.fn_retset);

     return state;
 }
@@ -473,7 +128,7 @@ ExecInitFunctionResultSet(Expr *expr,
  * needs to live until all rows have been returned (i.e. *isDone set to
  * ExprEndResult or ExprSingleResult).
  *
- * This is used by nodeProjectSet.c.
+ * This is used by nodeProjectSet.c and nodeFunctionscan.c.
  */
 Datum
 ExecMakeFunctionResultSet(SetExprState *fcache,
@@ -486,7 +141,7 @@ ExecMakeFunctionResultSet(SetExprState *fcache,
     Datum        result;
     FunctionCallInfo fcinfo;
     PgStat_FunctionCallUsage fcusage;
-    ReturnSetInfo rsinfo;
+    ReturnSetInfo *rsinfo;
     bool        callit;
     int            i;

@@ -540,6 +195,28 @@ restart:
     }

     /*
+     * Prepare a resultinfo node for communication.  We always do this even if
+     * not expecting a set result, so that we can pass expectedDesc.  In the
+     * generic-expression case, the expression doesn't actually get to see the
+     * resultinfo, but set it up anyway because we use some of the fields as
+     * our own state variables.
+     */
+    fcinfo = fcache->fcinfo;
+    rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+    if (rsinfo == NULL)
+    {
+        MemoryContext oldContext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
+
+        rsinfo = makeNode (ReturnSetInfo);
+        rsinfo->econtext = econtext;
+        rsinfo->expectedDesc = fcache->funcResultDesc;
+        fcinfo->resultinfo = (Node *) rsinfo;
+
+        MemoryContextSwitchTo(oldContext);
+    }
+
+    /*
      * arguments is a list of expressions to evaluate before passing to the
      * function manager.  We skip the evaluation if it was already done in the
      * previous call (ie, we are continuing the evaluation of a set-valued
@@ -549,7 +226,6 @@ restart:
      * rows from this SRF have been returned, otherwise ValuePerCall SRFs
      * would reference freed memory after the first returned row.
      */
-    fcinfo = fcache->fcinfo;
     arguments = fcache->args;
     if (!fcache->setArgsValid)
     {
@@ -557,6 +233,14 @@ restart:

         ExecEvalFuncArgs(fcinfo, arguments, econtext);
         MemoryContextSwitchTo(oldContext);
+
+        /* Reset the rsinfo structure */
+        rsinfo->allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize);
+        /* note we do not set SFRM_Materialize_Random or _Preferred */
+        rsinfo->returnMode = SFRM_ValuePerCall;
+        /* isDone is filled below */
+        rsinfo->setResult = NULL;
+        rsinfo->setDesc = NULL;
     }
     else
     {
@@ -568,18 +252,6 @@ restart:
      * Now call the function, passing the evaluated parameter values.
      */

-    /* Prepare a resultinfo node for communication. */
-    fcinfo->resultinfo = (Node *) &rsinfo;
-    rsinfo.type = T_ReturnSetInfo;
-    rsinfo.econtext = econtext;
-    rsinfo.expectedDesc = fcache->funcResultDesc;
-    rsinfo.allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize);
-    /* note we do not set SFRM_Materialize_Random or _Preferred */
-    rsinfo.returnMode = SFRM_ValuePerCall;
-    /* isDone is filled below */
-    rsinfo.setResult = NULL;
-    rsinfo.setDesc = NULL;
-
     /*
      * If function is strict, and there are any NULL arguments, skip calling
      * the function.
@@ -599,16 +271,25 @@ restart:

     if (callit)
     {
-        pgstat_init_function_usage(fcinfo, &fcusage);
+        if (!fcache->elidedFuncState)
+        {
+            pgstat_init_function_usage(fcinfo, &fcusage);

-        fcinfo->isnull = false;
-        rsinfo.isDone = ExprSingleResult;
-        result = FunctionCallInvoke(fcinfo);
-        *isNull = fcinfo->isnull;
-        *isDone = rsinfo.isDone;
+            fcinfo->isnull = false;
+            rsinfo->isDone = ExprSingleResult;
+            result = FunctionCallInvoke(fcinfo);
+            *isNull = fcinfo->isnull;
+            *isDone = rsinfo->isDone;

-        pgstat_end_function_usage(&fcusage,
-                                  rsinfo.isDone != ExprMultipleResult);
+            pgstat_end_function_usage(&fcusage,
+                                      rsinfo->isDone != ExprMultipleResult);
+        }
+        else
+        {
+            result =
+                ExecEvalExpr(fcache->elidedFuncState, econtext, isNull);
+            *isDone = ExprSingleResult;
+        }
     }
     else
     {
@@ -619,11 +300,32 @@ restart:
     }

     /* Which protocol does function want to use? */
-    if (rsinfo.returnMode == SFRM_ValuePerCall)
+    if (rsinfo->returnMode == SFRM_ValuePerCall)
     {
         if (*isDone != ExprEndResult)
         {
             /*
+             * Obtain a suitable tupdesc, when we first encounter a non-NULL result.
+             */
+            if (rsinfo->setDesc == NULL)
+            {
+                if (fcache->funcReturnsTuple && !*isNull)
+                {
+                    HeapTupleHeader td = DatumGetHeapTupleHeader(result);
+
+                    /*
+                     * This is the first non-NULL result from the
+                     * function.  Use the type info embedded in the
+                     * rowtype Datum to look up the needed tupdesc.  Make
+                     * a copy for the query.
+                     */
+                    MemoryContext oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
+                    rsinfo->setDesc = lookup_rowtype_tupdesc_copy(HeapTupleHeaderGetTypeId(td),
HeapTupleHeaderGetTypMod(td));
+                    MemoryContextSwitchTo(oldcontext);
+                }
+            }
+
+            /*
              * Save the current argument values to re-use on the next call.
              */
             if (*isDone == ExprMultipleResult)
@@ -640,21 +342,34 @@ restart:
             }
         }
     }
-    else if (rsinfo.returnMode == SFRM_Materialize)
+    else if (rsinfo->returnMode == SFRM_Materialize)
     {
         /* check we're on the same page as the function author */
-        if (rsinfo.isDone != ExprSingleResult)
+        if (rsinfo->isDone != ExprSingleResult)
             ereport(ERROR,
                     (errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
                      errmsg("table-function protocol for materialize mode was not followed")));
-        if (rsinfo.setResult != NULL)
+        if (rsinfo->setResult != NULL)
         {
             /* prepare to return values from the tuplestore */
             ExecPrepareTuplestoreResult(fcache, econtext,
-                                        rsinfo.setResult,
-                                        rsinfo.setDesc);
-            /* loop back to top to start returning from tuplestore */
-            goto restart;
+                                        rsinfo->setResult,
+                                        rsinfo->setDesc);
+
+            /*
+             * If we are being invoked by a Materialize node, attempt
+             * to donate the returned tuplstore to it.
+             */
+            if (ExecSRFDonateResultTuplestore(fcache))
+            {
+                *isDone = ExprMultipleResult;
+                return 0;
+            }
+            else
+            {
+                /* loop back to top to start returning from tuplestore */
+                goto restart;
+            }
         }
         /* if setResult was left null, treat it as empty set */
         *isDone = ExprEndResult;
@@ -665,7 +380,7 @@ restart:
         ereport(ERROR,
                 (errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
                  errmsg("unrecognized table-function returnMode: %d",
-                        (int) rsinfo.returnMode)));
+                        (int) rsinfo->returnMode)));

     return result;
 }
@@ -712,6 +427,7 @@ init_sexpr(Oid foid, Oid input_collation, Expr *node,
     InitFunctionCallInfoData(*sexpr->fcinfo, &(sexpr->func),
                              numargs,
                              input_collation, NULL, NULL);
+    sexpr->fcinfo->resultinfo = NULL;

     /* If function returns set, check if that's allowed by caller */
     if (sexpr->func.fn_retset && !allowSRF)
@@ -782,6 +498,7 @@ init_sexpr(Oid foid, Oid input_collation, Expr *node,
     sexpr->funcResultStore = NULL;
     sexpr->funcResultSlot = NULL;
     sexpr->shutdown_reg = false;
+    sexpr->funcResultStoreDonationEnabled = false;
 }

 /*
@@ -792,6 +509,7 @@ static void
 ShutdownSetExpr(Datum arg)
 {
     SetExprState *sexpr = castNode(SetExprState, DatumGetPointer(arg));
+    ReturnSetInfo *rsinfo = castNode(ReturnSetInfo, sexpr->fcinfo->resultinfo);

     /* If we have a slot, make sure it's let go of any tuplestore pointer */
     if (sexpr->funcResultSlot)
@@ -802,6 +520,13 @@ ShutdownSetExpr(Datum arg)
         tuplestore_end(sexpr->funcResultStore);
     sexpr->funcResultStore = NULL;

+    /* Release the ReturnSetInfo structure */
+    if (rsinfo != NULL)
+    {
+        pfree(rsinfo);
+        sexpr->fcinfo->resultinfo = NULL;
+    }
+
     /* Clear any active set-argument state */
     sexpr->setArgsValid = false;

@@ -910,53 +635,3 @@ ExecPrepareTuplestoreResult(SetExprState *sexpr,
         sexpr->shutdown_reg = true;
     }
 }
-
-/*
- * Check that function result tuple type (src_tupdesc) matches or can
- * be considered to match what the query expects (dst_tupdesc). If
- * they don't match, ereport.
- *
- * We really only care about number of attributes and data type.
- * Also, we can ignore type mismatch on columns that are dropped in the
- * destination type, so long as the physical storage matches.  This is
- * helpful in some cases involving out-of-date cached plans.
- */
-static void
-tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc)
-{
-    int            i;
-
-    if (dst_tupdesc->natts != src_tupdesc->natts)
-        ereport(ERROR,
-                (errcode(ERRCODE_DATATYPE_MISMATCH),
-                 errmsg("function return row and query-specified return row do not match"),
-                 errdetail_plural("Returned row contains %d attribute, but query expects %d.",
-                                  "Returned row contains %d attributes, but query expects %d.",
-                                  src_tupdesc->natts,
-                                  src_tupdesc->natts, dst_tupdesc->natts)));
-
-    for (i = 0; i < dst_tupdesc->natts; i++)
-    {
-        Form_pg_attribute dattr = TupleDescAttr(dst_tupdesc, i);
-        Form_pg_attribute sattr = TupleDescAttr(src_tupdesc, i);
-
-        if (IsBinaryCoercible(sattr->atttypid, dattr->atttypid))
-            continue;            /* no worries */
-        if (!dattr->attisdropped)
-            ereport(ERROR,
-                    (errcode(ERRCODE_DATATYPE_MISMATCH),
-                     errmsg("function return row and query-specified return row do not match"),
-                     errdetail("Returned type %s at ordinal position %d, but query expects %s.",
-                               format_type_be(sattr->atttypid),
-                               i + 1,
-                               format_type_be(dattr->atttypid))));
-
-        if (dattr->attlen != sattr->attlen ||
-            dattr->attalign != sattr->attalign)
-            ereport(ERROR,
-                    (errcode(ERRCODE_DATATYPE_MISMATCH),
-                     errmsg("function return row and query-specified return row do not match"),
-                     errdetail("Physical storage mismatch on dropped attribute at ordinal position %d.",
-                               i + 1)));
-    }
-}
diff --git a/src/backend/executor/nodeFunctionscan.c b/src/backend/executor/nodeFunctionscan.c
index ccb66ce..7499705 100644
--- a/src/backend/executor/nodeFunctionscan.c
+++ b/src/backend/executor/nodeFunctionscan.c
@@ -1,7 +1,23 @@
 /*-------------------------------------------------------------------------
  *
  * nodeFunctionscan.c
- *      Support routines for scanning RangeFunctions (functions in rangetable).
+ *      Coordinates a scan over PL functions. It supports several use cases:
+ *
+ *      - single function scan, and multiple functions in ROWS FROM;
+ *      - SRFs and regular functions;
+ *      - tuple- and scalar-returning functions;
+ *      - it will materialise if eflags call for it;
+ *      - if possible, it will pipeline its output;
+ *      - it avoids double-materialisation in case of SFRM_Materialize.
+ *
+ *    To achieve these, it depends upon the Materialize (for materialisation
+ *    and pipelining) and SRFScan (for SRF handling, and tuple expansion,
+ *    and double-materialisation avoidance) nodes, and the actual function
+ *    invocation (for SRF- and regular functions alike) is done in execSRF.c.
+ *
+ *    The Planner knows nothing of the Materialize and SRFScan structures.
+ *    They are constructed by the Executor at execution time, and are reported
+ *    in the EXPLAIN output.
  *
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -24,26 +40,15 @@

 #include "catalog/pg_type.h"
 #include "executor/nodeFunctionscan.h"
+#include "executor/nodeSRFScan.h"
+#include "executor/nodeMaterial.h"
 #include "funcapi.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/makefuncs.h"
+#include "parser/parse_type.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
-
-
-/*
- * Runtime data for each function being scanned.
- */
-typedef struct FunctionScanPerFuncState
-{
-    SetExprState *setexpr;        /* state of the expression being evaluated */
-    TupleDesc    tupdesc;        /* desc of the function result type */
-    int            colcount;        /* expected number of result columns */
-    Tuplestorestate *tstore;    /* holds the function result set */
-    int64        rowcount;        /* # of rows in result set, -1 if not known */
-    TupleTableSlot *func_slot;    /* function result slot (or NULL) */
-} FunctionScanPerFuncState;
-
-static TupleTableSlot *FunctionNext(FunctionScanState *node);
+#include "utils/syscache.h"


 /* ----------------------------------------------------------------
@@ -82,37 +87,22 @@ FunctionNext(FunctionScanState *node)
          * into the scan result slot. No need to update ordinality or
          * rowcounts either.
          */
-        Tuplestorestate *tstore = node->funcstates[0].tstore;
+        TupleTableSlot *rs = node->funcstates[0].scanstate->ps.ps_ResultTupleSlot;

         /*
-         * If first time through, read all tuples from function and put them
-         * in a tuplestore. Subsequent calls just fetch tuples from
-         * tuplestore.
+         * Get the next tuple from the Scan node.
+         *
+         * If we have a rowcount for the function, and we know the previous
+         * read position was out of bounds, don't try the read. This allows
+         * backward scan to work when there are mixed row counts present.
          */
-        if (tstore == NULL)
-        {
-            node->funcstates[0].tstore = tstore =
-                ExecMakeTableFunctionResult(node->funcstates[0].setexpr,
-                                            node->ss.ps.ps_ExprContext,
-                                            node->argcontext,
-                                            node->funcstates[0].tupdesc,
-                                            node->eflags & EXEC_FLAG_BACKWARD);
+        rs = ExecProcNode(&node->funcstates[0].scanstate->ps);

-            /*
-             * paranoia - cope if the function, which may have constructed the
-             * tuplestore itself, didn't leave it pointing at the start. This
-             * call is fast, so the overhead shouldn't be an issue.
-             */
-            tuplestore_rescan(tstore);
-        }
+        if (TupIsNull(rs))
+            return NULL;
+
+        ExecCopySlot(scanslot, rs);

-        /*
-         * Get the next tuple from tuplestore.
-         */
-        (void) tuplestore_gettupleslot(tstore,
-                                       ScanDirectionIsForward(direction),
-                                       false,
-                                       scanslot);
         return scanslot;
     }

@@ -141,46 +131,22 @@ FunctionNext(FunctionScanState *node)
     for (funcno = 0; funcno < node->nfuncs; funcno++)
     {
         FunctionScanPerFuncState *fs = &node->funcstates[funcno];
+        TupleTableSlot *func_slot = fs->scanstate->ps.ps_ResultTupleSlot;
         int            i;

         /*
-         * If first time through, read all tuples from function and put them
-         * in a tuplestore. Subsequent calls just fetch tuples from
-         * tuplestore.
-         */
-        if (fs->tstore == NULL)
-        {
-            fs->tstore =
-                ExecMakeTableFunctionResult(fs->setexpr,
-                                            node->ss.ps.ps_ExprContext,
-                                            node->argcontext,
-                                            fs->tupdesc,
-                                            node->eflags & EXEC_FLAG_BACKWARD);
-
-            /*
-             * paranoia - cope if the function, which may have constructed the
-             * tuplestore itself, didn't leave it pointing at the start. This
-             * call is fast, so the overhead shouldn't be an issue.
-             */
-            tuplestore_rescan(fs->tstore);
-        }
-
-        /*
-         * Get the next tuple from tuplestore.
+         * Get the next tuple from the Scan node.
          *
          * If we have a rowcount for the function, and we know the previous
          * read position was out of bounds, don't try the read. This allows
          * backward scan to work when there are mixed row counts present.
          */
         if (fs->rowcount != -1 && fs->rowcount < oldpos)
-            ExecClearTuple(fs->func_slot);
+            ExecClearTuple(func_slot);
         else
-            (void) tuplestore_gettupleslot(fs->tstore,
-                                           ScanDirectionIsForward(direction),
-                                           false,
-                                           fs->func_slot);
+            func_slot = ExecProcNode(&fs->scanstate->ps);

-        if (TupIsNull(fs->func_slot))
+        if (TupIsNull(func_slot))
         {
             /*
              * If we ran out of data for this function in the forward
@@ -207,12 +173,12 @@ FunctionNext(FunctionScanState *node)
             /*
              * we have a result, so just copy it to the result cols.
              */
-            slot_getallattrs(fs->func_slot);
+            slot_getallattrs(func_slot);

             for (i = 0; i < fs->colcount; i++)
             {
-                scanslot->tts_values[att] = fs->func_slot->tts_values[i];
-                scanslot->tts_isnull[att] = fs->func_slot->tts_isnull[i];
+                scanslot->tts_values[att] = func_slot->tts_values[i];
+                scanslot->tts_isnull[att] = func_slot->tts_isnull[i];
                 att++;
             }

@@ -272,6 +238,53 @@ ExecFunctionScan(PlanState *pstate)
                     (ExecScanRecheckMtd) FunctionRecheck);
 }

+/*
+ * Helper function to build target list, which is required in order for
+ * normal processing of ExecInit, from the tupdesc.
+ */
+static void
+build_tlist_for_tupdesc(TupleDesc tupdesc, int colcount,
+                        List **mat_tlist, List **scan_tlist)
+{
+    Form_pg_attribute attr;
+    int attno;
+
+    for (attno = 1; attno <= colcount; attno++)
+    {
+        attr = TupleDescAttr(tupdesc, attno - 1);
+
+        if (attr->attisdropped)
+        {
+            *scan_tlist = lappend(*scan_tlist,
+                              makeTargetEntry((Expr *)
+                                  makeConst(INT2OID, -1,
+                                            0,
+                                            attr->attlen,
+                                            0 /* value */, true /* isnull */,
+                                            true),
+                                  attno, attr->attname.data,
+                                  attr->attisdropped));
+            *mat_tlist = lappend(*mat_tlist,
+                             makeTargetEntry((Expr *)
+                                 makeVar(1 /* varno */, attno, INT2OID, -1, 0, 0),
+                                 attno, attr->attname.data, attr->attisdropped));
+        }
+        else
+        {
+            *scan_tlist = lappend(*scan_tlist,
+                              makeTargetEntry((Expr *)
+                                  makeVar(1 /* varno */, attno, attr->atttypid,
+                                          attr->atttypmod, attr->attcollation, 0),
+                                  attno, attr->attname.data, attr->attisdropped));
+            *mat_tlist = lappend(*mat_tlist,
+                             makeTargetEntry((Expr *)
+                                 makeVar(1 /* varno */, attno, attr->atttypid,
+                                         attr->atttypmod, attr->attcollation, 0),
+                                 attno, attr->attname.data, attr->attisdropped));
+        }
+    }
+}
+
 /* ----------------------------------------------------------------
  *        ExecInitFunctionScan
  * ----------------------------------------------------------------
@@ -285,6 +298,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
     int            i,
                 natts;
     ListCell   *lc;
+    bool         needs_material;

     /* check for unsupported flags */
     Assert(!(eflags & EXEC_FLAG_MARK));
@@ -315,6 +329,9 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
     else
         scanstate->simple = false;

+    /* Only add a Mterialize node if required */
+    needs_material = eflags & (EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD);
+
     /*
      * Ordinal 0 represents the "before the first row" position.
      *
@@ -347,23 +364,17 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
         TypeFuncClass functypclass;
         Oid            funcrettype;
         TupleDesc    tupdesc;
+        SRFScanPlan *srfscan;
+        Plan *scan;
+        List /* TargetEntry* */ *mat_tlist = NIL;
+        List /* TargetEntry* */ *scan_tlist = NIL;
+        bool funcReturnsTuple;

-        fs->setexpr =
-            ExecInitTableFunctionResult((Expr *) funcexpr,
-                                        scanstate->ss.ps.ps_ExprContext,
-                                        &scanstate->ss.ps);
-
-        /*
-         * Don't allocate the tuplestores; the actual calls to the functions
-         * do that.  NULL means that we have not called the function yet (or
-         * need to call it again after a rescan).
-         */
-        fs->tstore = NULL;
         fs->rowcount = -1;

         /*
          * Now determine if the function returns a simple or composite type,
-         * and build an appropriate tupdesc.  Note that in the composite case,
+         * and build an appropriate targetlist.  Note that in the composite case,
          * the function may now return more columns than it did when the plan
          * was made; we have to ignore any columns beyond "colcount".
          */
@@ -379,6 +390,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
             Assert(tupdesc->natts >= colcount);
             /* Must copy it out of typcache for safety */
             tupdesc = CreateTupleDescCopy(tupdesc);
+            funcReturnsTuple = true;
         }
         else if (functypclass == TYPEFUNC_SCALAR)
         {
@@ -393,6 +405,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
             TupleDescInitEntryCollation(tupdesc,
                                         (AttrNumber) 1,
                                         exprCollation(funcexpr));
+            funcReturnsTuple = false;
         }
         else if (functypclass == TYPEFUNC_RECORD)
         {
@@ -407,6 +420,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
              * case it doesn't.)
              */
             BlessTupleDesc(tupdesc);
+            funcReturnsTuple = true;
         }
         else
         {
@@ -414,21 +428,45 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
             elog(ERROR, "function in FROM has unsupported return type");
         }

-        fs->tupdesc = tupdesc;
         fs->colcount = colcount;

-        /*
-         * We only need separate slots for the function results if we are
-         * doing ordinality or multiple functions; otherwise, we'll fetch
-         * function results directly into the scan slot.
-         */
-        if (!scanstate->simple)
+        /* Expand tupdesc into targetlists for the scan nodes */
+        build_tlist_for_tupdesc(tupdesc, colcount, &mat_tlist, &scan_tlist);
+
+        srfscan = makeNode(SRFScanPlan);
+        srfscan->funcexpr = funcexpr;
+        srfscan->rtfunc = (Node *) rtfunc;
+        srfscan->plan.targetlist = scan_tlist;
+        srfscan->plan.extParam = rtfunc->funcparams;
+        srfscan->plan.allParam = rtfunc->funcparams;
+        srfscan->funcResultDesc = tupdesc;
+        srfscan->funcReturnsTuple = funcReturnsTuple;
+        scan = &srfscan->plan;
+
+        if (needs_material)
         {
-            fs->func_slot = ExecInitExtraTupleSlot(estate, fs->tupdesc,
-                                                   &TTSOpsMinimalTuple);
+            Material *fscan = makeNode(Material);
+            fscan->plan.lefttree = scan;
+            fscan->plan.targetlist = mat_tlist;
+            fscan->plan.extParam = rtfunc->funcparams;
+            fscan->plan.allParam = rtfunc->funcparams;
+            scan = &fscan->plan;
+        }
+
+        fs->scanstate = (ScanState *) ExecInitNode (scan, estate, eflags);
+
+        if (needs_material)
+        {
+            /*
+             * Tell the SRFScan about its parent, so that it can donate
+             * the SRF's tuplestore if the SRF uses SFRM_Materialize.
+             */
+            MaterialState *ms = (MaterialState *) fs->scanstate;
+            SRFScanState *sss = (SRFScanState *) outerPlanState(ms);
+
+            sss->setexpr->funcResultStoreDonationEnabled = true;
+            sss->setexpr->funcResultStoreDonationTarget = &ms->ss.ps;
         }
-        else
-            fs->func_slot = NULL;

         natts += colcount;
         i++;
@@ -443,7 +481,11 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
      */
     if (scanstate->simple)
     {
-        scan_tupdesc = CreateTupleDescCopy(scanstate->funcstates[0].tupdesc);
+        SRFScanState *sss = IsA(scanstate->funcstates[0].scanstate, MaterialState) ?
+                (SRFScanState *) outerPlanState((MaterialState *) scanstate->funcstates[0].scanstate) :
+                (SRFScanState *) scanstate->funcstates[0].scanstate;
+
+        scan_tupdesc = CreateTupleDescCopy(sss->setexpr->funcResultDesc);
         scan_tupdesc->tdtypeid = RECORDOID;
         scan_tupdesc->tdtypmod = -1;
     }
@@ -458,8 +500,12 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)

         for (i = 0; i < nfuncs; i++)
         {
-            TupleDesc    tupdesc = scanstate->funcstates[i].tupdesc;
-            int            colcount = scanstate->funcstates[i].colcount;
+            SRFScanState *sss = IsA(scanstate->funcstates[i].scanstate, MaterialState) ?
+                    (SRFScanState *) outerPlanState((MaterialState *) scanstate->funcstates[i].scanstate) :
+                    (SRFScanState *) scanstate->funcstates[i].scanstate;
+
+            TupleDesc    tupdesc = sss->setexpr->funcResultDesc;
+            int            colcount = sss->colcount;
             int            j;

             for (j = 1; j <= colcount; j++)
@@ -536,20 +582,11 @@ ExecEndFunctionScan(FunctionScanState *node)
     ExecClearTuple(node->ss.ss_ScanTupleSlot);

     /*
-     * Release slots and tuplestore resources
+     * Release the Material scan resources
      */
     for (i = 0; i < node->nfuncs; i++)
     {
-        FunctionScanPerFuncState *fs = &node->funcstates[i];
-
-        if (fs->func_slot)
-            ExecClearTuple(fs->func_slot);
-
-        if (fs->tstore != NULL)
-        {
-            tuplestore_end(node->funcstates[i].tstore);
-            fs->tstore = NULL;
-        }
+        ExecEndNode(&node->funcstates[i].scanstate->ps);
     }
 }

@@ -568,23 +605,12 @@ ExecReScanFunctionScan(FunctionScanState *node)

     if (node->ss.ps.ps_ResultTupleSlot)
         ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-    for (i = 0; i < node->nfuncs; i++)
-    {
-        FunctionScanPerFuncState *fs = &node->funcstates[i];
-
-        if (fs->func_slot)
-            ExecClearTuple(fs->func_slot);
-    }

     ExecScanReScan(&node->ss);

     /*
-     * Here we have a choice whether to drop the tuplestores (and recompute
-     * the function outputs) or just rescan them.  We must recompute if an
-     * expression contains changed parameters, else we rescan.
-     *
-     * XXX maybe we should recompute if the function is volatile?  But in
-     * general the executor doesn't conditionalize its actions on that.
+     * We must recompute if an
+     * expression contains changed parameters.
      */
     if (chgparam)
     {
@@ -597,11 +623,9 @@ ExecReScanFunctionScan(FunctionScanState *node)

             if (bms_overlap(chgparam, rtfunc->funcparams))
             {
-                if (node->funcstates[i].tstore != NULL)
-                {
-                    tuplestore_end(node->funcstates[i].tstore);
-                    node->funcstates[i].tstore = NULL;
-                }
+                UpdateChangedParamSet(&node->funcstates[i].scanstate->ps,
+                                      node->ss.ps.chgParam);
+
                 node->funcstates[i].rowcount = -1;
             }
             i++;
@@ -611,10 +635,9 @@ ExecReScanFunctionScan(FunctionScanState *node)
     /* Reset ordinality counter */
     node->ordinal = 0;

-    /* Make sure we rewind any remaining tuplestores */
+    /* Rescan them all */
     for (i = 0; i < node->nfuncs; i++)
     {
-        if (node->funcstates[i].tstore != NULL)
-            tuplestore_rescan(node->funcstates[i].tstore);
+        ExecReScan(&node->funcstates[i].scanstate->ps);
     }
 }
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index dd077f4..2fde0dc 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -45,9 +45,12 @@ ExecMaterial(PlanState *pstate)
     Tuplestorestate *tuplestorestate;
     bool        eof_tuplestore;
     TupleTableSlot *slot;
+    bool         first_time = true;

     CHECK_FOR_INTERRUPTS();

+restart:
+
     /*
      * get state info from node
      */
@@ -126,12 +129,24 @@ ExecMaterial(PlanState *pstate)
         PlanState  *outerNode;
         TupleTableSlot *outerslot;

+        if (!first_time)
+            ereport(ERROR,
+                    (errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
+                     errmsg("attempt to scan donated result store failed")));
+
         /*
          * We can only get here with forward==true, so no need to worry about
          * which direction the subplan will go.
          */
         outerNode = outerPlanState(node);
         outerslot = ExecProcNode(outerNode);
+
+        if (node->tuplestore_donated)
+        {
+            first_time = false;
+            goto restart;
+        }
+
         if (TupIsNull(outerslot))
         {
             node->eof_underlying = true;
@@ -196,6 +211,7 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)

     matstate->eof_underlying = false;
     matstate->tuplestorestate = NULL;
+    matstate->tuplestore_donated = false;

     /*
      * Miscellaneous initialization
@@ -346,6 +362,7 @@ ExecReScanMaterial(MaterialState *node)
         {
             tuplestore_end(node->tuplestorestate);
             node->tuplestorestate = NULL;
+            node->tuplestore_donated = false;
             if (outerPlan->chgParam == NULL)
                 ExecReScan(outerPlan);
             node->eof_underlying = false;
@@ -361,8 +378,29 @@ ExecReScanMaterial(MaterialState *node)
          * if chgParam of subnode is not null then plan will be re-scanned by
          * first ExecProcNode.
          */
+        node->tuplestore_donated = false;
         if (outerPlan->chgParam == NULL)
             ExecReScan(outerPlan);
         node->eof_underlying = false;
     }
 }
+
+void
+ExecMaterialReceiveResultStore(MaterialState *node, Tuplestorestate *store)
+{
+    if (!node->tuplestore_donated)
+    {
+        if (node->tuplestorestate)
+        {
+            tuplestore_end(node->tuplestorestate);
+        }
+
+        node->tuplestorestate = store;
+        node->tuplestore_donated = true;
+        node->eof_underlying = true;
+    }
+    else
+        ereport(ERROR,
+                (errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
+                 errmsg("Result tuplestore donated more than once")));
+}
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b07c299..48d7db5 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -293,9 +293,16 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
      * such parameters, then there is no point in REWIND support at all in the
      * inner child, because it will always be rescanned with fresh parameter
      * values.
+     *
+     * The exception to this simple rule is a ROWS FROM function scan where it
+     * is possible that only some of the inolved functions are affected by the
+     * parameters. In this case, we blanket request support for REWIND. A more
+     * intelligent approch would request REWIND only for nodes unaffected by
+     * the parameters, but we aren't so intelligent yet.
      */
     outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
-    if (node->nestParams == NIL)
+    if (node->nestParams == NIL ||
+        IsA(innerPlan(node), FunctionScan))
         eflags |= EXEC_FLAG_REWIND;
     else
         eflags &= ~EXEC_FLAG_REWIND;
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index 4a1b060..66a1d30 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -283,6 +283,7 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
             state->elems[off] = (Node *)
                 ExecInitFunctionResultSet(expr, state->ps.ps_ExprContext,
                                           &state->ps);
+            Assert (((SetExprState *) state->elems[off])->funcReturnsSet);
         }
         else
         {
diff --git a/src/backend/executor/nodeSRFScan.c b/src/backend/executor/nodeSRFScan.c
new file mode 100644
index 0000000..35ef67e
--- /dev/null
+++ b/src/backend/executor/nodeSRFScan.c
@@ -0,0 +1,262 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSRFScan.c
+ *      Coordinates a scan over a single SRF function, or a non-SRF as if it
+ *    were an SRF returning a single row.
+ *
+ *    SRFScan expands the function's output if it returns a tuple. If the
+ *    SRF uses SFRM_Materialize, it will donate the returned tuplestore to
+ *    the parent Materialize node, if there is one, to avoid double-
+ *    materialisation.
+ *
+ *    The Planner knows nothing of the SRFScan structure. It is constructed
+ *    by the Executor at execution time, and is reported in the EXPLAIN
+ *    output.
+ *
+ * IDENTIFICATION
+ *      src/backend/executor/nodeSRFScan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "catalog/pg_type.h"
+#include "executor/nodeSRFScan.h"
+#include "executor/nodeMaterial.h"
+#include "funcapi.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/makefuncs.h"
+#include "parser/parse_type.h"
+#include "utils/builtins.h"
+#include "utils/memutils.h"
+#include "utils/syscache.h"
+
+static TupleTableSlot *            /* result tuple from subplan */
+ExecSRF(PlanState *node)
+{
+    SRFScanState *pstate = (SRFScanState *) node;
+    ExprContext *econtext = pstate->ss.ps.ps_ExprContext;
+    TupleTableSlot *resultSlot = pstate->ss.ps.ps_ResultTupleSlot;
+    Datum result;
+    ExprDoneCond *isdone = &pstate->elemdone;
+    bool       isnull;
+    SetExprState *setexpr = pstate->setexpr;
+    FunctionCallInfo fcinfo;
+    ReturnSetInfo *rsinfo;
+
+    /* We only support forward scans. */
+    Assert(ScanDirectionIsForward(pstate->ss.ps.state->es_direction));
+
+    ExecClearTuple(resultSlot);
+
+    /*
+     * Only execute something if we are not already complete...
+     */
+    if (*isdone == ExprEndResult)
+        return NULL;
+
+    /*
+     * Evaluate SRF - possibly continuing previously started output.
+     */
+    result = ExecMakeFunctionResultSet((SetExprState *) setexpr,
+                                        econtext, pstate->argcontext,
+                                        &isnull, isdone);
+
+    if (*isdone == ExprEndResult)
+        return NULL;
+
+    fcinfo = setexpr->fcinfo;
+    rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+    /* Have we donated the result store? */
+    if (setexpr->funcResultStoreDonated)
+        return 0;
+
+    /*
+     * If we obtained a tupdesc, check it is appropriate, but not in
+     * the case of SFRM_Materialize becuase is will have been checked
+     * already.
+     */
+    if (!pstate->tupdesc_checked &&
+        setexpr->funcReturnsTuple &&
+        rsinfo->returnMode != SFRM_Materialize &&
+        rsinfo->setDesc && setexpr->funcResultDesc)
+    {
+        tupledesc_match (setexpr->funcResultDesc, rsinfo->setDesc);
+        pstate->tupdesc_checked = true;
+    }
+
+    /*
+     * If returned a tupple, expand into multiple columns.
+     */
+    if (setexpr->funcReturnsTuple)
+    {
+        if (!isnull)
+        {
+            HeapTupleHeader td = DatumGetHeapTupleHeader(result);
+            HeapTupleData tmptup;
+
+            /*
+             * In SFRM_Materialize mode, the type will have been checked
+             * already.
+             */
+            if (rsinfo->returnMode != SFRM_Materialize)
+            {
+                /*
+                 * Verify all later returned rows have same subtype;
+                 * necessary in case the type is RECORD.
+                 */
+                if (HeapTupleHeaderGetTypeId(td) != rsinfo->setDesc->tdtypeid ||
+                    HeapTupleHeaderGetTypMod(td) != rsinfo->setDesc->tdtypmod)
+                    ereport(ERROR,
+                            (errcode(ERRCODE_DATATYPE_MISMATCH),
+                             errmsg("rows returned by function are not all of the same row type")));
+            }
+
+            /*
+             * tuplestore_puttuple needs a HeapTuple not a bare
+             * HeapTupleHeader, but it doesn't need all the fields.
+             */
+            tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
+            tmptup.t_data = td;
+
+            heap_deform_tuple (&tmptup, setexpr->funcResultDesc,
+                               resultSlot->tts_values,
+                               resultSlot->tts_isnull);
+        }
+        else
+        {
+            /*
+             * populate the result cols with nulls
+             */
+            int i;
+            for (i = 0; i < pstate->colcount; i++)
+            {
+                resultSlot->tts_values[i] = (Datum) 0;
+                resultSlot->tts_isnull[i] = true;
+            }
+        }
+    }
+    else
+    {
+        /* Scalar-type case: just store the function result */
+        resultSlot->tts_values[0] = result;
+        resultSlot->tts_isnull[0] = isnull;
+    }
+
+    /*
+     * If we achieved obtained a single result, don't execute again.
+     */
+    if (*isdone == ExprSingleResult)
+        *isdone = ExprEndResult;
+
+    ExecStoreVirtualTuple(resultSlot);
+    return resultSlot;
+}
+
+SRFScanState *
+ExecInitSRFScan(SRFScanPlan *node, EState *estate, int eflags)
+{
+    RangeTblFunction *rtfunc = (RangeTblFunction *) node->rtfunc;
+
+    SRFScanState *srfstate;
+
+    /*
+     * SRFScan should not have any children.
+     */
+    Assert(outerPlan(node) == NULL);
+    Assert(innerPlan(node) == NULL);
+
+    /*
+     * create state structure
+     */
+    srfstate = makeNode(SRFScanState);
+    srfstate->ss.ps.plan = (Plan *) node;
+    srfstate->ss.ps.state = estate;
+    srfstate->ss.ps.ExecProcNode = ExecSRF;
+
+    /*
+     * Miscellaneous initialization
+     *
+     * create expression context for node
+     */
+    ExecAssignExprContext(estate, &srfstate->ss.ps);
+
+    srfstate->setexpr =
+        ExecInitFunctionResultSet((Expr *) node->funcexpr,
+                                  srfstate->ss.ps.ps_ExprContext,
+                                  &srfstate->ss.ps);
+
+    srfstate->setexpr->funcResultDesc = node->funcResultDesc;
+    srfstate->setexpr->funcReturnsTuple = node->funcReturnsTuple;
+
+    srfstate->colcount = rtfunc->funccolcount;
+
+    srfstate->tupdesc_checked = false;
+
+    /* Start with the assumption we will get some result. */
+    srfstate->elemdone = ExprSingleResult;
+
+    /*
+     * Initialize result type and slot. No need to initialize projection info
+     * because this node doesn't do projections (ps_ResultTupleSlot).
+     *
+     * material nodes only return tuples from their materialized relation.
+     */
+    ExecInitScanTupleSlot(estate, &srfstate->ss, srfstate->setexpr->funcResultDesc,
+                          &TTSOpsMinimalTuple);
+    ExecInitResultTupleSlotTL(&srfstate->ss.ps, &TTSOpsMinimalTuple);
+    ExecAssignScanProjectionInfo(&srfstate->ss);
+
+    /*
+     * Create a memory context that ExecMakeFunctionResultSet can use to
+     * evaluate function arguments in.  We can't use the per-tuple context for
+     * this because it gets reset too often; but we don't want to leak
+     * evaluation results into the query-lifespan context either.  We use one
+     * context for the arguments of all tSRFs, as they have roughly equivalent
+     * lifetimes.
+     */
+    srfstate->argcontext = AllocSetContextCreate(CurrentMemoryContext,
+                                              "SRF function arguments",
+                                              ALLOCSET_DEFAULT_SIZES);
+    return srfstate;
+}
+
+void
+ExecEndSRFScan(SRFScanState *node)
+{
+    /* Nothing to do */
+}
+
+void
+ExecReScanSRF(SRFScanState *node)
+{
+    /* Expecting some results. */
+    node->elemdone = ExprSingleResult;
+
+    /* We must re-evaluate function call arguments. */
+    node->setexpr->setArgsValid = false;
+}
+
+bool
+ExecSRFDonateResultTuplestore(SetExprState *fcache)
+{
+    if (fcache->funcResultStoreDonationEnabled)
+    {
+        if (IsA (fcache->funcResultStoreDonationTarget, MaterialState))
+        {
+            MaterialState *target = (MaterialState *) fcache->funcResultStoreDonationTarget;
+
+            ExecMaterialReceiveResultStore(target, fcache->funcResultStore);
+
+            fcache->funcResultStore = NULL;
+
+            fcache->funcResultStoreDonated = true;
+
+            return true;
+        }
+    }
+
+    return false;
+}
diff --git a/src/include/access/tupdesc.h b/src/include/access/tupdesc.h
index d17af13..118fdcb 100644
--- a/src/include/access/tupdesc.h
+++ b/src/include/access/tupdesc.h
@@ -151,4 +151,6 @@ extern TupleDesc BuildDescForRelation(List *schema);

 extern TupleDesc BuildDescFromLists(List *names, List *types, List *typmods, List *collations);

+extern void tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc);
+
 #endif                            /* TUPDESC_H */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 9489051..8ada13e 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -409,13 +409,6 @@ extern bool ExecCheck(ExprState *state, ExprContext *context);
 /*
  * prototypes from functions in execSRF.c
  */
-extern SetExprState *ExecInitTableFunctionResult(Expr *expr,
-                                                 ExprContext *econtext, PlanState *parent);
-extern Tuplestorestate *ExecMakeTableFunctionResult(SetExprState *setexpr,
-                                                    ExprContext *econtext,
-                                                    MemoryContext argContext,
-                                                    TupleDesc expectedDesc,
-                                                    bool randomAccess);
 extern SetExprState *ExecInitFunctionResultSet(Expr *expr,
                                                ExprContext *econtext, PlanState *parent);
 extern Datum ExecMakeFunctionResultSet(SetExprState *fcache,
diff --git a/src/include/executor/nodeFunctionscan.h b/src/include/executor/nodeFunctionscan.h
index 74e8eef..ca89980 100644
--- a/src/include/executor/nodeFunctionscan.h
+++ b/src/include/executor/nodeFunctionscan.h
@@ -16,6 +16,16 @@

 #include "nodes/execnodes.h"

+/*
+ * Runtime data for each function being scanned.
+ */
+typedef struct FunctionScanPerFuncState
+{
+    int            colcount;        /* expected number of result columns */
+    int64        rowcount;        /* # of rows in result set, -1 if not known */
+    ScanState  *scanstate;        /* scan node: either SRFScan or Materialize */
+} FunctionScanPerFuncState;
+
 extern FunctionScanState *ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags);
 extern void ExecEndFunctionScan(FunctionScanState *node);
 extern void ExecReScanFunctionScan(FunctionScanState *node);
diff --git a/src/include/executor/nodeMaterial.h b/src/include/executor/nodeMaterial.h
index 99e7cbf..f55922c 100644
--- a/src/include/executor/nodeMaterial.h
+++ b/src/include/executor/nodeMaterial.h
@@ -21,5 +21,6 @@ extern void ExecEndMaterial(MaterialState *node);
 extern void ExecMaterialMarkPos(MaterialState *node);
 extern void ExecMaterialRestrPos(MaterialState *node);
 extern void ExecReScanMaterial(MaterialState *node);
+extern void ExecMaterialReceiveResultStore(MaterialState *node, Tuplestorestate *store);

 #endif                            /* NODEMATERIAL_H */
diff --git a/src/include/executor/nodeSRFScan.h b/src/include/executor/nodeSRFScan.h
new file mode 100644
index 0000000..2430de5
--- /dev/null
+++ b/src/include/executor/nodeSRFScan.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * IDENTIFICATION
+ *      src/include/executor/nodeSRFScan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef nodeSRFScan_h
+#define nodeSRFScan_h
+
+#include "nodes/execnodes.h"
+
+typedef struct
+{
+    ScanState        ss;                    /* its first field is NodeTag */
+    SetExprState     *setexpr;            /* state of the expression being evaluated */
+    ExprDoneCond    elemdone;
+    int                colcount;            /* # of columns */
+    bool            tupdesc_checked;    /* has the return tupdesc been checked? */
+    MemoryContext     argcontext;            /* context for SRF arguments */
+    PlanState        *parent;            /* the plan's parent node */
+} SRFScanState;
+
+extern SRFScanState *ExecInitSRFScan(SRFScanPlan *node, EState *estate, int eflags);
+extern void ExecEndSRFScan(SRFScanState *node);
+extern void ExecReScanSRF(SRFScanState *node);
+extern bool ExecSRFDonateResultTuplestore(SetExprState *fcache);
+
+#endif /* nodeSRFScan_h */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cd3ddf7..f1c8085 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -797,10 +797,16 @@ typedef struct SetExprState
     /*
      * For a set-returning function (SRF) that returns a tuplestore, we keep
      * the tuplestore here and dole out the result rows one at a time. The
-     * slot holds the row currently being returned.
+     * slot holds the row currently being returned. The boolean
+     * funcResultStoreDonationEnabled indicates whether the an SRF
+     * returning SFRM_Materialize tupleStore should attempt to donate its
+     * resultStore to a higher level Materialize node.
      */
     Tuplestorestate *funcResultStore;
     TupleTableSlot *funcResultSlot;
+    bool         funcResultStoreDonationEnabled;
+    bool         funcResultStoreDonated;
+    struct PlanState *funcResultStoreDonationTarget;

     /*
      * In some cases we need to compute a tuple descriptor for the function's
@@ -1651,6 +1657,7 @@ typedef struct SubqueryScanState
  *        funcstates            per-function execution states (private in
  *                            nodeFunctionscan.c)
  *        argcontext            memory context to evaluate function arguments in
+ *        pending_srf_tuples    still evaluating any SRFs?
  * ----------------
  */
 struct FunctionScanPerFuncState;
@@ -1978,6 +1985,7 @@ typedef struct MaterialState
     int            eflags;            /* capability flags to pass to tuplestore */
     bool        eof_underlying; /* reached end of underlying plan? */
     Tuplestorestate *tuplestorestate;
+    bool        tuplestore_donated; /* was duplestore donated by another node? */
 } MaterialState;

 /* ----------------
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 8a76afe..24a72a2 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -514,7 +514,9 @@ typedef enum NodeTag
     T_SupportRequestSelectivity,    /* in nodes/supportnodes.h */
     T_SupportRequestCost,        /* in nodes/supportnodes.h */
     T_SupportRequestRows,        /* in nodes/supportnodes.h */
-    T_SupportRequestIndexCondition    /* in nodes/supportnodes.h */
+    T_SupportRequestIndexCondition,    /* in nodes/supportnodes.h */
+    T_SRFScanPlan,
+    T_SRFScanState
 } NodeTag;

 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 4869fe7..3486ede 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -16,6 +16,7 @@

 #include "access/sdir.h"
 #include "access/stratnum.h"
+#include "access/tupdesc.h"
 #include "lib/stringinfo.h"
 #include "nodes/bitmapset.h"
 #include "nodes/lockoptions.h"
@@ -546,6 +547,14 @@ typedef struct TableFuncScan
     TableFunc  *tablefunc;        /* table function node */
 } TableFuncScan;

+typedef struct SRFScanPlan {
+    Plan        plan;
+    Node        *funcexpr;
+    Node         *rtfunc;
+    TupleDesc    funcResultDesc;        /* funciton output columns tuple descriptor */
+    bool        funcReturnsTuple;
+} SRFScanPlan;
+
 /* ----------------
  *        CteScan node
  * ----------------
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index f457b5b..ab8e222 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -514,13 +514,15 @@ order by 1, 2;
          ->  Function Scan on pg_catalog.generate_series s1
                Output: s1.s1
                Function Call: generate_series(1, 3)
+               ->  SRF Scan
          ->  HashAggregate
                Output: s2.s2, sum((s1.s1 + s2.s2))
                Group Key: s2.s2
                ->  Function Scan on pg_catalog.generate_series s2
                      Output: s2.s2
                      Function Call: generate_series(1, 3)
-(14 rows)
+                     ->  SRF Scan
+(16 rows)

 select s1, s2, sm
 from generate_series(1, 3) s1,
@@ -549,6 +551,7 @@ select array(select sum(x+y) s
  Function Scan on pg_catalog.generate_series x
    Output: (SubPlan 1)
    Function Call: generate_series(1, 3)
+   ->  SRF Scan
    SubPlan 1
      ->  Sort
            Output: (sum((x.x + y.y))), y.y
@@ -559,7 +562,8 @@ select array(select sum(x+y) s
                  ->  Function Scan on pg_catalog.generate_series y
                        Output: y.y
                        Function Call: generate_series(1, 3)
-(13 rows)
+                       ->  SRF Scan
+(15 rows)

 select array(select sum(x+y) s
             from generate_series(1,3) y group by y order by s)
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index c1f802c..5eb7dba 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -374,7 +374,8 @@ select g as alias1, g as alias2
    ->  Sort
          Sort Key: g
          ->  Function Scan on generate_series g
-(6 rows)
+               ->  SRF Scan
+(7 rows)

 select g as alias1, g as alias2
   from generate_series(1,3) g
@@ -1234,7 +1235,9 @@ explain (costs off)
          ->  Nested Loop
                ->  Values Scan on "*VALUES*"
                ->  Function Scan on gstest_data
-(8 rows)
+                     ->  Materialize
+                           ->  SRF Scan
+(10 rows)

 select *
   from (values (1),(2)) v(x),
@@ -1358,7 +1361,9 @@ explain (costs off)
          ->  Nested Loop
                ->  Values Scan on "*VALUES*"
                ->  Function Scan on gstest_data
-(10 rows)
+                     ->  Materialize
+                           ->  SRF Scan
+(12 rows)

 -- Verify that we correctly handle the child node returning a
 -- non-minimal slot, which happens if the input is pre-sorted,
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index dfd0ee4..c339722 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1684,6 +1684,7 @@ FROM generate_series(1, 3) g(i);
                            QUERY PLAN
 ----------------------------------------------------------------
  Function Scan on generate_series g
+   ->  SRF Scan
    SubPlan 1
      ->  Limit
            ->  Merge Append
@@ -1691,10 +1692,12 @@ FROM generate_series(1, 3) g(i);
                  ->  Sort
                        Sort Key: ((d.d + g.i))
                        ->  Function Scan on generate_series d
+                             ->  SRF Scan
                  ->  Sort
                        Sort Key: ((d_1.d + g.i))
                        ->  Function Scan on generate_series d_1
-(11 rows)
+                             ->  SRF Scan
+(14 rows)

 SELECT
     ARRAY(SELECT f.i FROM (
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 761376b..3650aee 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3403,7 +3403,8 @@ select * from mki8(1,2);
  Function Scan on mki8
    Output: q1, q2
    Function Call: '(1,2)'::int8_tbl
-(3 rows)
+   ->  SRF Scan
+(4 rows)

 select * from mki8(1,2);
  q1 | q2
@@ -3418,7 +3419,8 @@ select * from mki4(42);
  Function Scan on mki4
    Output: f1
    Function Call: '(42)'::int4_tbl
-(3 rows)
+   ->  SRF Scan
+(4 rows)

 select * from mki4(42);
  f1
@@ -3660,9 +3662,10 @@ left join unnest(v1ys) as u1(u1y) on u1y = v2y;
          Hash Cond: (u1.u1y = "*VALUES*_1".column2)
          Filter: ("*VALUES*_1".column1 = "*VALUES*".column1)
          ->  Function Scan on unnest u1
+               ->  SRF Scan
          ->  Hash
                ->  Values Scan on "*VALUES*_1"
-(8 rows)
+(9 rows)

 select * from
 (values (1, array[10,20]), (2, array[20,30])) as v1(v1x,v1ys)
@@ -4475,7 +4478,9 @@ select 1 from (select a.id FROM a left join b on a.b_id = b.id) q,
    ->  Seq Scan on a
    ->  Function Scan on generate_series gs
          Filter: (a.id = i)
-(4 rows)
+         ->  Materialize
+               ->  SRF Scan
+(6 rows)

 rollback;
 create temp table parent (k int primary key, pd int);
@@ -4814,7 +4819,9 @@ explain (costs off)
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
          ->  Function Scan on generate_series g
-(4 rows)
+               ->  Materialize
+                     ->  SRF Scan
+(6 rows)

 explain (costs off)
   select count(*) from tenk1 a cross join lateral generate_series(1,two) g;
@@ -4824,7 +4831,9 @@ explain (costs off)
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
          ->  Function Scan on generate_series g
-(4 rows)
+               ->  Materialize
+                     ->  SRF Scan
+(6 rows)

 -- don't need the explicit LATERAL keyword for functions
 explain (costs off)
@@ -4835,7 +4844,9 @@ explain (costs off)
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
          ->  Function Scan on generate_series g
-(4 rows)
+               ->  Materialize
+                     ->  SRF Scan
+(6 rows)

 -- lateral with UNION ALL subselect
 explain (costs off)
@@ -4846,12 +4857,13 @@ explain (costs off)
 ------------------------------------------
  Nested Loop
    ->  Function Scan on generate_series g
+         ->  SRF Scan
    ->  Append
          ->  Seq Scan on int8_tbl a
                Filter: (g.g = q1)
          ->  Seq Scan on int8_tbl b
                Filter: (g.g = q2)
-(7 rows)
+(8 rows)

 select * from generate_series(100,200) g,
   lateral (select * from int8_tbl a where g = q1 union all
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index e217b67..19b14f8 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -226,9 +226,10 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,1000) g ON a.unique1 = g;
  Hash Join
    Hash Cond: (g.g = a.unique1)
    ->  Function Scan on my_gen_series g
+         ->  SRF Scan
    ->  Hash
          ->  Seq Scan on tenk1 a
-(5 rows)
+(6 rows)

 EXPLAIN (COSTS OFF)
 SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
@@ -236,7 +237,8 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
 -------------------------------------------------
  Nested Loop
    ->  Function Scan on my_gen_series g
+         ->  SRF Scan
    ->  Index Scan using tenk1_unique1 on tenk1 a
          Index Cond: (unique1 = g.g)
-(4 rows)
+(5 rows)

diff --git a/src/test/regress/expected/pg_lsn.out b/src/test/regress/expected/pg_lsn.out
index 64d41df..e68adc1 100644
--- a/src/test/regress/expected/pg_lsn.out
+++ b/src/test/regress/expected/pg_lsn.out
@@ -87,13 +87,17 @@ SELECT DISTINCT (i || '/' || j)::pg_lsn f
          Group Key: ((((i.i)::text || '/'::text) || (j.j)::text))::pg_lsn
          ->  Nested Loop
                ->  Function Scan on generate_series k
+                     ->  SRF Scan
                ->  Materialize
                      ->  Nested Loop
                            ->  Function Scan on generate_series j
                                  Filter: ((j > 0) AND (j <= 10))
+                                 ->  SRF Scan
                            ->  Function Scan on generate_series i
                                  Filter: (i <= 10)
-(12 rows)
+                                 ->  Materialize
+                                       ->  SRF Scan
+(16 rows)

 SELECT DISTINCT (i || '/' || j)::pg_lsn f
   FROM generate_series(1, 10) i,
diff --git a/src/test/regress/expected/plpgsql.out b/src/test/regress/expected/plpgsql.out
index cd2c79f..67a6f39 100644
--- a/src/test/regress/expected/plpgsql.out
+++ b/src/test/regress/expected/plpgsql.out
@@ -3094,7 +3094,7 @@ select * from sc_test();

 create or replace function sc_test() returns setof integer as $$
 declare
-  c cursor for select * from generate_series(1, 10);
+  c scroll cursor for select * from generate_series(1, 10);
   x integer;
 begin
   open c;
@@ -4852,7 +4852,9 @@ select i, a from
    ->  Function Scan on public.consumes_rw_array i
          Output: i.i
          Function Call: consumes_rw_array((returns_rw_array(1)))
-(7 rows)
+         ->  Materialize
+               ->  SRF Scan
+(9 rows)

 select i, a from
   (select returns_rw_array(1) as a offset 0) ss,
@@ -4869,7 +4871,8 @@ select consumes_rw_array(a), a from returns_rw_array(1) a;
  Function Scan on public.returns_rw_array a
    Output: consumes_rw_array(a), a
    Function Call: returns_rw_array(1)
-(3 rows)
+   ->  SRF Scan
+(4 rows)

 select consumes_rw_array(a), a from returns_rw_array(1) a;
  consumes_rw_array |   a
diff --git a/src/test/regress/expected/rangefuncs.out b/src/test/regress/expected/rangefuncs.out
index a70060b..7f96baa 100644
--- a/src/test/regress/expected/rangefuncs.out
+++ b/src/test/regress/expected/rangefuncs.out
@@ -1841,7 +1841,8 @@ explain (verbose, costs off)
  Function Scan on public.array_to_set t
    Output: f1, f2
    Function Call: array_to_set('{one,two}'::text[])
-(3 rows)
+   ->  SRF Scan
+(4 rows)

 -- but without, it can be:
 create or replace function array_to_set(anyarray) returns setof record as $$
@@ -1879,7 +1880,8 @@ explain (verbose, costs off)
  Function Scan on pg_catalog.generate_subscripts i
    Output: i.i, ('{one,two}'::text[])[i.i]
    Function Call: generate_subscripts('{one,two}'::text[], 1)
-(3 rows)
+   ->  SRF Scan
+(4 rows)

 create temp table rngfunc(f1 int8, f2 int8);
 create function testrngfunc() returns record as $$
@@ -1950,7 +1952,8 @@ select * from testrngfunc();
  Function Scan on testrngfunc
    Output: f1, f2
    Function Call: '(7.136178,7.14)'::rngfunc_type
-(3 rows)
+   ->  SRF Scan
+(4 rows)

 select * from testrngfunc();
     f1    |  f2
@@ -1982,7 +1985,8 @@ select * from testrngfunc();
  Function Scan on public.testrngfunc
    Output: f1, f2
    Function Call: testrngfunc()
-(3 rows)
+   ->  SRF Scan
+(4 rows)

 select * from testrngfunc();
     f1    |  f2
@@ -2048,7 +2052,8 @@ select * from testrngfunc();
  Function Scan on public.testrngfunc
    Output: f1, f2
    Function Call: testrngfunc()
-(3 rows)
+   ->  SRF Scan
+(4 rows)

 select * from testrngfunc();
     f1    |  f2
@@ -2217,7 +2222,9 @@ select x from int8_tbl, extractq2(int8_tbl) f(x);
    ->  Function Scan on f
          Output: f.x
          Function Call: int8_tbl.q2
-(7 rows)
+         ->  Materialize
+               ->  SRF Scan
+(9 rows)

 select x from int8_tbl, extractq2(int8_tbl) f(x);
          x
@@ -2306,3 +2313,155 @@ select *, row_to_json(u) from unnest(array[]::rngfunc2[]) u;
 (0 rows)

 drop type rngfunc2;
+--------------------------------------------------------------------------------
+-- Start of tests for support of ValuePerCall-mode SRFs
+CREATE TEMPORARY SEQUENCE rngfunc_vpc_seq;
+CREATE TEMPORARY SEQUENCE rngfunc_mat_seq;
+CREATE TYPE rngfunc_vpc_t AS (i integer, s bigint);
+-- rngfunc_vpc is SQL, so will yield a ValuePerCall SRF
+CREATE FUNCTION rngfunc_vpc(int,int)
+    RETURNS setof rngfunc_vpc_t AS
+$$
+    SELECT i, nextval('rngfunc_vpc_seq')
+        FROM generate_series($1,$2) i;
+$$
+LANGUAGE SQL;
+-- rngfunc_mat is plpgsql, so will yield a Materialize SRF
+CREATE FUNCTION rngfunc_mat(int,int)
+    RETURNS setof rngfunc_vpc_t AS
+$$
+begin
+    for i in $1..$2 loop
+        return next (i, nextval('rngfunc_mat_seq'));
+    end loop;
+end;
+$$
+LANGUAGE plpgsql;
+-- A VPC SRF that is not part of a complex query should not materialize.
+--
+-- To illustrate this, we explain a simple VPC SRF scan, and note the
+-- absence of a Materialize node.
+--
+explain (costs off)
+    select * from rngfunc_vpc(1, 3) t;
+           QUERY PLAN
+--------------------------------
+ Function Scan on rngfunc_vpc t
+   ->  SRF Scan
+(2 rows)
+
+-- A VPC SRF that aborts early should do so without emitting all results.
+--
+-- To illustrate this, we show that an SRF that uses a sequence does not
+-- have its value incremented if the SRF is not invoked to generate a row.
+--
+select nextval('rngfunc_vpc_seq');
+ nextval
+---------
+       1
+(1 row)
+
+select * from rngfunc_vpc(1, 3) t limit 2;
+ i | s
+---+---
+ 1 | 2
+ 2 | 3
+(2 rows)
+
+select nextval('rngfunc_vpc_seq');
+ nextval
+---------
+       4
+(1 row)
+
+-- A Marerialize SRF should show Materialization if the query demand rescan.
+--
+-- To illustrate this, we construct a cross join, which forces rescan.
+--
+-- The same plan should be generated for both VPC and Materialize mode SRFs.
+--
+explain (costs off)
+    select * from generate_series (1, 3) n, rngfunc_vpc(1, 3) t;
+                QUERY PLAN
+------------------------------------------
+ Nested Loop
+   ->  Function Scan on generate_series n
+         ->  SRF Scan
+   ->  Function Scan on rngfunc_vpc t
+         ->  Materialize
+               ->  SRF Scan
+(6 rows)
+
+explain (costs off)
+    select * from generate_series (1, 3) n, rngfunc_mat(1, 3) t;
+                QUERY PLAN
+------------------------------------------
+ Nested Loop
+   ->  Function Scan on generate_series n
+         ->  SRF Scan
+   ->  Function Scan on rngfunc_mat t
+         ->  Materialize
+               ->  SRF Scan
+(6 rows)
+
+-- A Marerialize SRF should show donation of the returned tuplestore.
+--
+-- To illustrate this, we construct a cross join, which forces rescan.
+--
+-- Only the Materialize mode SRF should show donation.
+--
+explain (analyze, timing off, costs off, summary off)
+    select * from generate_series (1, 3) n, rngfunc_vpc(1, 3) t;
+                            QUERY PLAN
+------------------------------------------------------------------
+ Nested Loop (actual rows=9 loops=1)
+   ->  Function Scan on generate_series n (actual rows=3 loops=1)
+         ->  SRF Scan (actual rows=3 loops=1)
+               SFRM: ValuePerCall
+   ->  Function Scan on rngfunc_vpc t (actual rows=3 loops=3)
+         ->  Materialize (actual rows=3 loops=3)
+               ->  SRF Scan (actual rows=3 loops=1)
+                     SFRM: ValuePerCall
+(8 rows)
+
+explain (analyze, timing off, costs off, summary off)
+    select * from generate_series (1, 3) n, rngfunc_mat(1, 3) t;
+                            QUERY PLAN
+------------------------------------------------------------------
+ Nested Loop (actual rows=9 loops=1)
+   ->  Function Scan on generate_series n (actual rows=3 loops=1)
+         ->  SRF Scan (actual rows=3 loops=1)
+               SFRM: ValuePerCall
+   ->  Function Scan on rngfunc_mat t (actual rows=3 loops=3)
+         ->  Materialize (actual rows=3 loops=3)
+               ->  SRF Scan (actual rows=0 loops=1)
+                     SFRM: Materialize
+                     Donated tuplestore: true
+(9 rows)
+
+-- A Marerialize SRF that aborts early should still generate all results.
+--
+-- To illustrate this, we show that an SRF that uses a sequence still has
+-- its value incremented if even when SRF's rows are not emitted.
+--
+select nextval('rngfunc_mat_seq');
+ nextval
+---------
+       4
+(1 row)
+
+select * from rngfunc_mat(1, 3) t limit 2;
+ i | s
+---+---
+ 1 | 5
+ 2 | 6
+(2 rows)
+
+select nextval('rngfunc_mat_seq');
+ nextval
+---------
+       8
+(1 row)
+
+-- End of tests for support of ValuePerCall-mode SRFs
+--------------------------------------------------------------------------------
diff --git a/src/test/regress/expected/tsearch.out b/src/test/regress/expected/tsearch.out
index fe1cd9d..9f6deff 100644
--- a/src/test/regress/expected/tsearch.out
+++ b/src/test/regress/expected/tsearch.out
@@ -1669,8 +1669,9 @@ select * from test_tsquery, to_tsquery('new') q where txtsample @@ q;
  Nested Loop
    Join Filter: (test_tsquery.txtsample @@ q.q)
    ->  Function Scan on to_tsquery q
+         ->  SRF Scan
    ->  Seq Scan on test_tsquery
-(4 rows)
+(5 rows)

 -- to_tsquery(regconfig, text) is an immutable function.
 -- That allows us to get rid of using function scan and join at all.
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 6e72e92..6828582 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -577,8 +577,10 @@ select from generate_series(1,5) union select from generate_series(1,3);
  HashAggregate
    ->  Append
          ->  Function Scan on generate_series
+               ->  SRF Scan
          ->  Function Scan on generate_series generate_series_1
-(4 rows)
+               ->  SRF Scan
+(6 rows)

 explain (costs off)
 select from generate_series(1,5) intersect select from generate_series(1,3);
@@ -588,9 +590,11 @@ select from generate_series(1,5) intersect select from generate_series(1,3);
    ->  Append
          ->  Subquery Scan on "*SELECT* 1"
                ->  Function Scan on generate_series
+                     ->  SRF Scan
          ->  Subquery Scan on "*SELECT* 2"
                ->  Function Scan on generate_series generate_series_1
-(6 rows)
+                     ->  SRF Scan
+(8 rows)

 select from generate_series(1,5) union select from generate_series(1,3);
 --
@@ -626,8 +630,10 @@ select from generate_series(1,5) union select from generate_series(1,3);
  Unique
    ->  Append
          ->  Function Scan on generate_series
+               ->  SRF Scan
          ->  Function Scan on generate_series generate_series_1
-(4 rows)
+               ->  SRF Scan
+(6 rows)

 explain (costs off)
 select from generate_series(1,5) intersect select from generate_series(1,3);
@@ -637,9 +643,11 @@ select from generate_series(1,5) intersect select from generate_series(1,3);
    ->  Append
          ->  Subquery Scan on "*SELECT* 1"
                ->  Function Scan on generate_series
+                     ->  SRF Scan
          ->  Subquery Scan on "*SELECT* 2"
                ->  Function Scan on generate_series generate_series_1
-(6 rows)
+                     ->  SRF Scan
+(8 rows)

 select from generate_series(1,5) union select from generate_series(1,3);
 --
diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out
index d5fd404..d2cd0b5 100644
--- a/src/test/regress/expected/window.out
+++ b/src/test/regress/expected/window.out
@@ -3851,7 +3851,8 @@ EXPLAIN (costs off) SELECT * FROM pg_temp.f(2);
          ->  Sort
                Sort Key: s.s
                ->  Function Scan on generate_series s
-(5 rows)
+                     ->  SRF Scan
+(6 rows)

 SELECT * FROM pg_temp.f(2);
     f
diff --git a/src/test/regress/sql/plpgsql.sql b/src/test/regress/sql/plpgsql.sql
index d841d8c..4717b06 100644
--- a/src/test/regress/sql/plpgsql.sql
+++ b/src/test/regress/sql/plpgsql.sql
@@ -2646,7 +2646,7 @@ select * from sc_test();

 create or replace function sc_test() returns setof integer as $$
 declare
-  c cursor for select * from generate_series(1, 10);
+  c scroll cursor for select * from generate_series(1, 10);
   x integer;
 begin
   open c;
diff --git a/src/test/regress/sql/rangefuncs.sql b/src/test/regress/sql/rangefuncs.sql
index 476b4f2..4d39f39 100644
--- a/src/test/regress/sql/rangefuncs.sql
+++ b/src/test/regress/sql/rangefuncs.sql
@@ -730,3 +730,82 @@ select *, row_to_json(u) from unnest(array[null::rngfunc2, (1,'foo')::rngfunc2,
 select *, row_to_json(u) from unnest(array[]::rngfunc2[]) u;

 drop type rngfunc2;
+
+--------------------------------------------------------------------------------
+-- Start of tests for support of ValuePerCall-mode SRFs
+
+CREATE TEMPORARY SEQUENCE rngfunc_vpc_seq;
+CREATE TEMPORARY SEQUENCE rngfunc_mat_seq;
+CREATE TYPE rngfunc_vpc_t AS (i integer, s bigint);
+
+-- rngfunc_vpc is SQL, so will yield a ValuePerCall SRF
+CREATE FUNCTION rngfunc_vpc(int,int)
+    RETURNS setof rngfunc_vpc_t AS
+$$
+    SELECT i, nextval('rngfunc_vpc_seq')
+        FROM generate_series($1,$2) i;
+$$
+LANGUAGE SQL;
+
+-- rngfunc_mat is plpgsql, so will yield a Materialize SRF
+CREATE FUNCTION rngfunc_mat(int,int)
+    RETURNS setof rngfunc_vpc_t AS
+$$
+begin
+    for i in $1..$2 loop
+        return next (i, nextval('rngfunc_mat_seq'));
+    end loop;
+end;
+$$
+LANGUAGE plpgsql;
+
+-- A VPC SRF that is not part of a complex query should not materialize.
+--
+-- To illustrate this, we explain a simple VPC SRF scan, and note the
+-- absence of a Materialize node.
+--
+explain (costs off)
+    select * from rngfunc_vpc(1, 3) t;
+
+-- A VPC SRF that aborts early should do so without emitting all results.
+--
+-- To illustrate this, we show that an SRF that uses a sequence does not
+-- have its value incremented if the SRF is not invoked to generate a row.
+--
+select nextval('rngfunc_vpc_seq');
+select * from rngfunc_vpc(1, 3) t limit 2;
+select nextval('rngfunc_vpc_seq');
+
+-- A Marerialize SRF should show Materialization if the query demand rescan.
+--
+-- To illustrate this, we construct a cross join, which forces rescan.
+--
+-- The same plan should be generated for both VPC and Materialize mode SRFs.
+--
+explain (costs off)
+    select * from generate_series (1, 3) n, rngfunc_vpc(1, 3) t;
+explain (costs off)
+    select * from generate_series (1, 3) n, rngfunc_mat(1, 3) t;
+
+-- A Marerialize SRF should show donation of the returned tuplestore.
+--
+-- To illustrate this, we construct a cross join, which forces rescan.
+--
+-- Only the Materialize mode SRF should show donation.
+--
+explain (analyze, timing off, costs off, summary off)
+    select * from generate_series (1, 3) n, rngfunc_vpc(1, 3) t;
+explain (analyze, timing off, costs off, summary off)
+    select * from generate_series (1, 3) n, rngfunc_mat(1, 3) t;
+
+-- A Marerialize SRF that aborts early should still generate all results.
+--
+-- To illustrate this, we show that an SRF that uses a sequence still has
+-- its value incremented if even when SRF's rows are not emitted.
+--
+select nextval('rngfunc_mat_seq');
+select * from rngfunc_mat(1, 3) t limit 2;
+select nextval('rngfunc_mat_seq');
+
+-- End of tests for support of ValuePerCall-mode SRFs
+--------------------------------------------------------------------------------

Re: The flinfo->fn_extra question, from me this time.

От

Thomas Munro

Дата:

13 марта 2020 г., 04:28:59

On Fri, Mar 13, 2020 at 7:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ...  (At least on the Linux side.  I guess the cfbot's
> Windows builds are sans cassert, which seems like an odd choice.)

I tried turning that on by adding $config{asserts} = 1 in the build
script and adding some scripting to dump all relevant logs on
appveyor.  It had the desired effect, but I had some trouble getting
any useful information out of it.  Somehow the FailedAssertion message
is not making it to the log, which seems to be the bare minimum you'd
need for this to be useful, and ideally you'd also want a backtrace.
I'll look into that next week with the help of a Windows-enabled
colleague.

Re: The flinfo->fn_extra question, from me this time.

От

Dent John

Дата:

23 марта 2020 г., 00:13:17

> On 12 Mar 2020, at 18:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> […]
>
> I didn't want to spend any more effort on it than that, because I'm
> not really on board with this line of attack.

Appreciate that. It was about the approach that I was most keen to get feedback upon.

> This patch seems
> awfully invasive for what it's accomplishing, both at the code level
> and in terms of what users will see in EXPLAIN.  No, I don't think
> that adding additional "SRF Scan" nodes below FunctionScan is an
> improvement, nor do I like your repurposing/abusing of Materialize.
> It might be okay if you were just using Materialize as-is, but if
> it's sort-of-materialize-but-not-always, I don't think that's going
> to make anyone less confused.

Okay. Makes sense.

I wonder whether you think it's valuable to retain in the EXPLAIN output the mode the SRF operated in?

That information is not available to end users, yet it is important to understand when trying to create a pipeline-able
plan.

> More locally, this business with creating new "plan nodes" below the
> FunctionScan at executor startup is a real abuse of a whole lot of stuff,
> and I suspect that it's not unrelated to the assertion failures I'm
> seeing.  Don't do that.  If you want to build some data structures at
> executor start, fine, but they're not plans and shouldn't be mislabeled as
> that.

I felt that FunctionScan was duplicating a bunch of stuff that the Materialize node could be doing for it. But in the
end,I agree. Actually making re-use of Materialize turned out quite invasive. 

> On the other hand, if they do need to be plan nodes, they should be
> made by the planner (which in turn would require a lot of infrastructure
> you haven't built, eg copyfuncs/outfuncs/readfuncs/setrefs/...).
>
> The v3 patch seemed closer to the sort of thing I was expecting
> to get out of this (though I've not read it in any detail).

I did a bit more exploration down the route of pushing it into the planner. I figured perhaps some of the complexities
wouldshake out by approaching it at the planner level, but I learned enough along the way to realise that it is a long
journey.

I’ll dust off the v3 approach and resubmit. While I’m doing that, I'll pull it back from the CF.

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: The flinfo->fn_extra question, from me this time.

Вложения

Вложения

Вложения

Вложения

Вложения