Re: The flinfo->fn_extra question, from me this time.

From: Dent John
Subject: Re: The flinfo->fn_extra question, from me this time.
Date:
Msg-id: DD10EDF3-BF82-4122-BEFA-207ABBBCB34E@QQdd.eu
In reply to: Re: The flinfo->fn_extra question, from me this time.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: The flinfo->fn_extra question, from me this time.  (Dent John <denty@QQdd.eu>)
List: pgsql-hackers
> On 22 Sep 2019, at 16:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hi Tom,

> I don't know of anybody working on it.

Okay. I had a look at this. I tried to apply Andre’s patch [1] from some time ago, but that turned out not to be so easy. I guess the code has moved on since then. So I’ve attempted to re-invent the same spirit, stealing from his patch and from how the tSRF code does things. The patch isn’t final, but it demonstrates the concept.

However, given your comments below, I wonder if you might comment on the approach before I go further?

(Patch is presently still against 12beta2.)
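
(For concreteness: by ValuePerCall I mean the ordinary funcapi.h protocol, where the executor calls the SRF once per row instead of having it fill a tuplestore up front. A toy sketch, with a hypothetical countdown() function:)

#include "postgres.h"
#include "funcapi.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(countdown);

/*
 * Minimal ValuePerCall SRF: each call returns one row, so the executor
 * can stop early (e.g. under a LIMIT) without the function ever having
 * built a tuplestore.
 *
 * SQL declaration would be something like:
 *   CREATE FUNCTION countdown(int) RETURNS SETOF int
 *     AS 'MODULE_PATHNAME' LANGUAGE C STRICT;
 */
Datum
countdown(PG_FUNCTION_ARGS)
{
	FuncCallContext *funcctx;

	if (SRF_IS_FIRSTCALL())
	{
		/* first call: set up state that persists across calls */
		funcctx = SRF_FIRSTCALL_INIT();
		funcctx->max_calls = PG_GETARG_INT32(0);
	}

	funcctx = SRF_PERCALL_SETUP();

	if (funcctx->call_cntr < funcctx->max_calls)
		SRF_RETURN_NEXT(funcctx,
						Int32GetDatum((int32) (funcctx->max_calls -
											   funcctx->call_cntr)));
	else
		SRF_RETURN_DONE(funcctx);
}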

> You'd still need to be prepared to build a tuplestore,
> in case of rescan or backwards fetch; but […]

I do recognise this. The patch teaches ExecMaterializesOutput() and ExecSupportsBackwardScan() that T_FunctionScan nodes don't materialise their output.

(Actually, Andre’s patch did the educating of ExecMaterializesOutput() and ExecSupportsBackwardScan() — it’s not my invention.)
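
(A rough sketch of the shape of that change, abridged from execAmi.c; the actual case list varies by version, and ExecSupportsBackwardScan() gets a similar treatment:)

/* In src/backend/executor/execAmi.c, roughly: drop T_FunctionScan from
 * the set of node types that always materialise their output. */
bool
ExecMaterializesOutput(NodeTag plantype)
{
	switch (plantype)
	{
		case T_Material:
		case T_Sort:
		case T_CteScan:
		case T_WorkTableScan:
			/* case T_FunctionScan: removed, since a ValuePerCall SRF
			 * streams its rows and no longer guarantees a backing
			 * tuplestore */
			return true;

		default:
			break;
	}

	return false;
}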

I haven’t worked out how to easily demonstrate the backward scan case, but joins (which presumably are the typical cause of rescan) now yield an intermediate Materialize node.

postgres=# explain (analyze, buffers) select * from unnest (array_fill ('scanner'::text, array[10])) t1, unnest (array_fill ('dummy'::text, array[10000000])) t2 limit 100;
                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.01..1.36 rows=100 width=64) (actual time=0.009..0.067 rows=100 loops=1)
   ->  Nested Loop  (cost=0.01..1350000.13 rows=100000000 width=64) (actual time=0.008..0.049 rows=100 loops=1)
         ->  Function Scan on unnest t2  (cost=0.00..100000.00 rows=10000000 width=32) (actual time=0.003..0.006 rows=10 loops=1)
         ->  Materialize  (cost=0.00..0.15 rows=10 width=32) (actual time=0.000..0.002 rows=10 loops=10)
               ->  Function Scan on unnest t1  (cost=0.00..0.10 rows=10 width=32) (actual time=0.001..0.004 rows=10 loops=1)
 Planning Time: 127.875 ms
 Execution Time: 0.102 ms
(7 rows)

> My point was that you can't simply remove the tuplestore-building code
> path.  The exact boundary conditions for that might be negotiable.
> But I'd be very dubious of an assumption that re-running the function
> would be cheaper than building a tuplestore, regardless of whether it's
> safe.

Understood, and I agree. I think it’s preferable to allow the planner control over when to explicitly materialise.

But if I’m not wrong, at present the planner doesn’t really trade off the cost of rescan versus materialisation; instead it adopts a simple heuristic of materialising one or the other side of a join. We can see this in the plans if the unnest()s are moved into the target list and buried in a subquery. For example:

postgres=# explain (analyze, buffers) select * from (select unnest (array_fill ('scanner'::text, array[10]))) t1, (select unnest (array_fill ('dummy'::text, array[10000000]))) t2 limit 100;
                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1.40 rows=100 width=64) (actual time=0.011..0.106 rows=100 loops=1)
   ->  Nested Loop  (cost=0.00..1400000.21 rows=100000000 width=64) (actual time=0.010..0.081 rows=100 loops=1)
         ->  ProjectSet  (cost=0.00..50000.02 rows=10000000 width=32) (actual time=0.004..0.024 rows=10 loops=1)
               ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
         ->  Materialize  (cost=0.00..0.22 rows=10 width=32) (actual time=0.001..0.002 rows=10 loops=10)
               ->  ProjectSet  (cost=0.00..0.07 rows=10 width=32) (actual time=0.001..0.004 rows=10 loops=1)
                     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
 Planning Time: 180.482 ms
 Execution Time: 0.148 ms
(9 rows)

I am tempted to stop short of educating the planner about the possibility of rescanning (thus dropping the Materialize node) during a join. It seems feasible, and sometimes advantageous. (Perhaps when the join quals would cause a huge amount of the output to be filtered?) But it also seems better to treat that as an entirely separate issue.

> cost_functionscan and cost_rescan would likely need some adjustment if
> possible.

I looked at cost_functionscan(), but I think it is already of the view that a function can pipeline. It computes a startup_cost and a run_cost, where run_cost is the per-tuple cost times the estimated row count. On that understanding, it is actually wrong for the current materialise-always behaviour. I think this means I don’t need to make any fundamental changes in order to cost the new behaviour correctly.
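
(For reference, the cost shape, abridged from cost_functionscan() in costsize.c:)

/* Abridged from cost_functionscan() in src/backend/optimizer/path/costsize.c */
cost_qual_eval_node(&exprcost, (Node *) rte->functions, root);
startup_cost += exprcost.startup + exprcost.per_tuple;	/* function eval cost */

cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
run_cost += cpu_per_tuple * baserel->tuples;			/* per-tuple cost * num_rows */

path->startup_cost = startup_cost;
path->total_cost = startup_cost + run_cost;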

> However, I'm not sure that the planner has any way to know
> whether a given SRF will support ValuePerCall or not.

Yes. There is a flaw. But with the costing support function interface, it’s possible to supply costs that correctly relate to the SRF’s abilities.
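
(A sketch of what such a support function might look like. my_srf_support() is hypothetical, and a real one would presumably derive the numbers from the function’s arguments rather than hard-coding them:)

#include "postgres.h"
#include "fmgr.h"
#include "nodes/supportnodes.h"
#include "optimizer/cost.h"

PG_FUNCTION_INFO_V1(my_srf_support);

/*
 * Hypothetical planner support function, attached to an SRF via
 * CREATE FUNCTION ... SUPPORT my_srf_support. For a SupportRequestCost
 * request, it reports costs that reflect ValuePerCall streaming:
 * cheap startup, with the work spread across per-row calls.
 */
Datum
my_srf_support(PG_FUNCTION_ARGS)
{
	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
	Node	   *ret = NULL;

	if (IsA(rawreq, SupportRequestCost))
	{
		SupportRequestCost *req = (SupportRequestCost *) rawreq;

		req->startup = 0;					/* no up-front tuplestore to fill */
		req->per_tuple = cpu_operator_cost; /* paid as each row is fetched */
		ret = (Node *) req;
	}

	PG_RETURN_POINTER(ret);
}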

I guess there can be a case where the SRF supports ValuePerCall, and supplies costs accordingly, but at execution time decides not to use it. That seems a curious situation, but at worst it will cost us a bit more buffer space.
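
(That is legal under the SRF protocol, since the function picks its return mode at run time by inspecting rsinfo->allowedModes. A hypothetical three_ints() that could stream but opts to materialise:)

#include "postgres.h"
#include "funcapi.h"
#include "miscadmin.h"
#include "catalog/pg_type.h"
#include "utils/tuplestore.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(three_ints);

/*
 * Hypothetical SRF that chooses, at execution time, to materialise:
 * it returns its whole result in one call via a tuplestore.
 */
Datum
three_ints(PG_FUNCTION_ARGS)
{
	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
	Tuplestorestate *tupstore;
	TupleDesc	tupdesc;
	MemoryContext oldcxt;
	int			i;

	if (rsinfo == NULL || !(rsinfo->allowedModes & SFRM_Materialize))
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("materialize mode required, but it is not allowed in this context")));

	/* tuplestore and descriptor must outlive this call */
	oldcxt = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
	tupdesc = CreateTemplateTupleDesc(1);
	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "i", INT4OID, -1, 0);
	tupstore = tuplestore_begin_heap(false, false, work_mem);
	MemoryContextSwitchTo(oldcxt);

	for (i = 1; i <= 3; i++)
	{
		Datum		values[1];
		bool		nulls[1];

		values[0] = Int32GetDatum(i);
		nulls[0] = false;
		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
	}

	/* the execution-time decision: hand back a tuplestore, not rows */
	rsinfo->returnMode = SFRM_Materialize;
	rsinfo->setResult = tupstore;
	rsinfo->setDesc = tupdesc;

	return (Datum) 0;
}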

In the opposite case, where the SRF can’t support ValuePerCall, the risk is that the planner has decided it wants to interject a Materialize node, and the result will be buffer-to-buffer copying. If the function has a costing support function, it should all be costed correctly, but it’s obviously not ideal. Currently, my patch doesn’t do anything about this case. My plan would be to allow the Materialize node to be supplied with a tuplestore from the FunctionScan node at execution time. I guess this optimisation would similarly help non-ValuePerCall tSRFs.

After all this, I’m wondering how you view the proposal?

For the sake of comparison, 12beta1 achieves the following plans:

postgres=# create or replace function test1() returns setof record language plpgsql as $$ begin return query (select 'a', generate_series (1, 1e6)); end; $$; -- using plpgsql because it can’t pipeline
CREATE FUNCTION
postgres=# explain (verbose, analyse, buffers) select key, count (value), sum (value) from test1() as (key text, value numeric) group by key;
...
 Planning Time: 0.068 ms
 Execution Time: 589.651 ms

postgres=# explain (verbose, analyse, buffers) select * from test1() as (key text, value numeric) limit 50;
...
 Planning Time: 0.059 ms
 Execution Time: 348.334 ms

postgres=# explain (analyze, buffers) select count (a.a), sum (a.a) from unnest (array_fill (1::numeric, array[10000000])) a;
...
 Planning Time: 165.502 ms
 Execution Time: 5629.094 ms

postgres=# explain (analyze, buffers) select * from unnest (array_fill (1::numeric, array[10000000])) limit 50;
...
 Planning Time: 110.952 ms
 Execution Time: 1080.609 ms

Versus 12beta2+patch, whose results seem favourable in the main, at least for these pathological cases:

postgres=# explain (verbose, analyse, buffers) select key, count (value), sum (value) from test1() as (key text, value numeric) group by key;
...
 Planning Time: 0.068 ms
 Execution Time: 591.749 ms

postgres=# explain (verbose, analyse, buffers) select * from test1() as (key text, value numeric) limit 50;
...
 Planning Time: 0.051 ms
 Execution Time: 289.820 ms

postgres=# explain (analyze, buffers) select count (a.a), sum (a.a) from unnest (array_fill (1::numeric, array[10000000])) a;
...
 Planning Time: 169.260 ms
 Execution Time: 4759.781 ms

postgres=# explain (analyze, buffers) select * from unnest (array_fill (1::numeric, array[10000000])) limit 50;
...
 Planning Time: 163.374 ms
 Execution Time: 0.051 ms

denty.
