Обсуждение: slow SELECT expr INTO var in plpgsql

Поиск
Список
Период
Сортировка

slow SELECT expr INTO var in plpgsql

От
Pavel Stehule
Дата:
Hi


It compare T-SQL and PLpgSQL performance on some simple benchmark

do $$ declare x int; begin for i in 1..10000000 loop x := 0; end loop; end $$;
do $$ declare x int; begin for i in 1..10000000 loop select 0 into x; end loop; end $$;

SELECT expr INTO var is syntax used on old sybase and mssql systems. The positive result in this article is fact, so Postgres in all tests are very well comparable. More - the assignment is really fast and significantly faster than on MSSQL.

I remember the old discussion about this issue, and I thought that the performance of SELECT INTO and assignment should be almost the same. I repeated these tests on pg 9.4, 11 and master (asserts are disabled) with interesting results

release, assign time, select into time
9.4, 2900 ms, 20800 ms
11, 2041 ms, 16243 ms
master, 534ms, 15438 ms

Originally, I used gcc with O0, and master is really slow without O2 optimization

9.4, 2600 ms, 20800 ms --<< 9.4 is faster with O0
11, 2177 ms, 19128 ms
master, 1395 ms, 70060 ms -- << master is very slow with O0

Using SELECT expr INTO var is plpgsql's antipattern. plpgsql_check can detect it now. But it will still be nice if there will not be too big a difference like now. I didn't check the code yet, and I have no idea if there are some possibilities on how to execute this case better.

Regards

Pavel

tested on Fedora 43


Re: slow SELECT expr INTO var in plpgsql

От
Tom Lane
Дата:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I remember the old discussion about this issue, and I thought that the
> performance of SELECT INTO and assignment should be almost the same. I
> repeated these tests on pg 9.4, 11 and master (asserts are disabled) with
> interesting results

> release, assign time, select into time
> 9.4, 2900 ms, 20800 ms
> 11, 2041 ms, 16243 ms
> master, 534ms, 15438 ms

Yeah, we've sweated a good deal about optimizing plpgsql assignment,
but SELECT INTO is always done the hard way.

I experimented a little bit with converting simple-expression
SELECT INTO into an assignment, as attached.  It does reclaim
nearly all of the performance difference: for me, these two
test cases now take about 276 vs 337 ms.  However, I'm concerned
about the side-effects of substituting this other code path;
there's a lot of potential minor differences in behavior.
Two that you can see in the regression test changes are:

* SELECT INTO is tracked by pg_stat_statements, assignments aren't.

* The context report for an error can be different, because
_SPI_error_callback() doesn't get used.

We could probably eliminate the context-report difference by setting
up a custom error context callback in this new code path, but the
difference in pg_stat_statements output would be hard to mask.
There may be other discrepancies as well, such as variations in
error message wording.

Probably no one would notice such details if it had been like that
all along, but would they complain about a change?  I dunno.

            regards, tom lane

diff --git a/contrib/pg_stat_statements/expected/level_tracking.out
b/contrib/pg_stat_statements/expected/level_tracking.out
index a15d897e59b..832d65e97ca 100644
--- a/contrib/pg_stat_statements/expected/level_tracking.out
+++ b/contrib/pg_stat_statements/expected/level_tracking.out
@@ -1500,12 +1500,11 @@ SELECT PLUS_ONE(1);
 SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
  calls | rows |                       query
 -------+------+----------------------------------------------------
-     2 |    2 | SELECT (i + $2 + $3)::INTEGER
      2 |    2 | SELECT (i + $2)::INTEGER LIMIT $3
      2 |    2 | SELECT PLUS_ONE($1)
      2 |    2 | SELECT PLUS_TWO($1)
      1 |    1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
-(5 rows)
+(4 rows)

 -- immutable SQL function --- can be executed at plan time
 CREATE FUNCTION PLUS_THREE(i INTEGER) RETURNS INTEGER AS
@@ -1525,15 +1524,14 @@ SELECT PLUS_THREE(10);
 SELECT toplevel, calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
  toplevel | calls | rows |                                    query
 ----------+-------+------+------------------------------------------------------------------------------
- f        |     2 |    2 | SELECT (i + $2 + $3)::INTEGER
  f        |     2 |    2 | SELECT (i + $2)::INTEGER LIMIT $3
  t        |     2 |    2 | SELECT PLUS_ONE($1)
  t        |     2 |    2 | SELECT PLUS_THREE($1)
  t        |     2 |    2 | SELECT PLUS_TWO($1)
- t        |     1 |    5 | SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C"
+ t        |     1 |    4 | SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C"
  f        |     2 |    2 | SELECT i + $2 LIMIT $3
  t        |     1 |    1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
-(8 rows)
+(7 rows)

 SELECT pg_stat_statements_reset() IS NOT NULL AS t;
  t
diff --git a/contrib/pg_stat_statements/expected/plancache.out b/contrib/pg_stat_statements/expected/plancache.out
index e152de9f551..32bf913b286 100644
--- a/contrib/pg_stat_statements/expected/plancache.out
+++ b/contrib/pg_stat_statements/expected/plancache.out
@@ -159,11 +159,10 @@ SELECT calls, generic_plan_calls, custom_plan_calls, toplevel, query FROM pg_sta
  calls | generic_plan_calls | custom_plan_calls | toplevel |                       query
 -------+--------------------+-------------------+----------+----------------------------------------------------
      2 |                  0 |                 0 | t        | CALL select_one_proc($1)
-     4 |                  2 |                 2 | f        | SELECT $1
      1 |                  0 |                 0 | t        | SELECT pg_stat_statements_reset() IS NOT NULL AS t
      2 |                  0 |                 0 | t        | SELECT select_one_func($1)
      2 |                  0 |                 0 | t        | SET plan_cache_mode TO $1
-(5 rows)
+(4 rows)

 --
 -- EXPLAIN [ANALYZE] EXECUTE + functions/procedures
@@ -211,10 +210,9 @@ SELECT calls, generic_plan_calls, custom_plan_calls, toplevel, query FROM pg_sta
      2 |                  0 |                 0 | t        | EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF,
BUFFERSOFF) SELECT select_one_func($1) 
      4 |                  0 |                 0 | f        | EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF,
BUFFERSOFF) SELECT select_one_func($1); 
      2 |                  0 |                 0 | t        | EXPLAIN (COSTS OFF) SELECT select_one_func($1)
-     4 |                  2 |                 2 | f        | SELECT $1
      1 |                  0 |                 0 | t        | SELECT pg_stat_statements_reset() IS NOT NULL AS t
      2 |                  0 |                 0 | t        | SET plan_cache_mode TO $1
-(7 rows)
+(6 rows)

 RESET pg_stat_statements.track;
 --
diff --git a/src/pl/plpgsql/src/expected/plpgsql_cache.out b/src/pl/plpgsql/src/expected/plpgsql_cache.out
index 9df188ce56b..601e1ff3bc1 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_cache.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_cache.out
@@ -55,8 +55,7 @@ select show_result_type('select 1 as a');
 -- (but if debug_discard_caches is on, it will succeed)
 select show_result_type('select 2.0 as a');
 ERROR:  type of parameter 5 (numeric) does not match that when preparing the plan (integer)
-CONTEXT:  SQL statement "select pg_typeof(r.a)"
-PL/pgSQL function show_result_type(text) line 7 at SQL statement
+CONTEXT:  PL/pgSQL function show_result_type(text) line 7 at SQL statement
 -- but it's OK if we force plan rebuilding
 discard plans;
 select show_result_type('select 2.0 as a');
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 75325117ec9..b693d6cc829 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -4267,6 +4267,43 @@ exec_stmt_execsql(PLpgSQL_execstate *estate,
         stmt->mod_stmt_set = true;
     }

+    /*
+     * Some users write "SELECT expr INTO var" instead of "var := expr".  If
+     * the expression is simple and the INTO target is a single variable, we
+     * can bypass SPI and call ExecEvalExpr() directly.  (exec_assign_expr
+     * would actually work for non-simple expressions too, but such an
+     * expression might return more or less than one row, complicating matters
+     * greatly.  The potential performance win is small if it's non-simple,
+     * and any errors we might issue would likely look different, so avoid
+     * using this code path for non-simple cases.)
+     */
+    if (expr->expr_simple_expr && stmt->into)
+    {
+        PLpgSQL_datum *target = estate->datums[stmt->target->dno];
+
+        if (target->dtype == PLPGSQL_DTYPE_ROW)
+        {
+            PLpgSQL_row *row = (PLpgSQL_row *) target;
+
+            if (row->nfields == 1)
+            {
+                /* Evaluate the expression and assign to the INTO target */
+                exec_assign_expr(estate, estate->datums[row->varnos[0]],
+                                 expr);
+
+                /*
+                 * We must duplicate the other effects of the code below, as
+                 * well.  We know that exactly one row was returned, so it
+                 * doesn't matter whether the INTO was STRICT or not.
+                 */
+                exec_set_found(estate, true);
+                estate->eval_processed = 1;
+
+                return PLPGSQL_RC_OK;
+            }
+        }
+    }
+
     /*
      * Set up ParamListInfo to pass to executor
      */

Re: slow SELECT expr INTO var in plpgsql

От
Pavel Stehule
Дата:


so 31. 1. 2026 v 21:58 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I remember the old discussion about this issue, and I thought that the
> performance of SELECT INTO and assignment should be almost the same. I
> repeated these tests on pg 9.4, 11 and master (asserts are disabled) with
> interesting results

> release, assign time, select into time
> 9.4, 2900 ms, 20800 ms
> 11, 2041 ms, 16243 ms
> master, 534ms, 15438 ms

Yeah, we've sweated a good deal about optimizing plpgsql assignment,
but SELECT INTO is always done the hard way.

I experimented a little bit with converting simple-expression
SELECT INTO into an assignment, as attached.  It does reclaim
nearly all of the performance difference: for me, these two
test cases now take about 276 vs 337 ms.  However, I'm concerned
about the side-effects of substituting this other code path;
there's a lot of potential minor differences in behavior.
Two that you can see in the regression test changes are:

* SELECT INTO is tracked by pg_stat_statements, assignments aren't.

* The context report for an error can be different, because
_SPI_error_callback() doesn't get used.

We could probably eliminate the context-report difference by setting
up a custom error context callback in this new code path, but the
difference in pg_stat_statements output would be hard to mask.
There may be other discrepancies as well, such as variations in
error message wording.

Probably no one would notice such details if it had been like that
all along, but would they complain about a change?  I dunno.

This patch looks well. I can confirm massive speedup. 

I don't remember any report related to change of implementation of assign statement before, and I think it can be similar with this patch. 

In this specific case, I think so users suppose SELECT INTO is translated to assignment by default. And there are a lot of documents on the net that describe the transformation of the assignment statement to SELECT - so I think there is some grey zone where optimization can do some magic. More - the statistics for function execution can be covered by track_functions.

Regards

Pavel

 

                        regards, tom lane