Обсуждение: BUG #19386: Unnecessary Sort in query plan for SELECT literal with ORDER BY

Поиск
Список
Период
Сортировка

BUG #19386: Unnecessary Sort in query plan for SELECT literal with ORDER BY

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      19386
Logged by:          Chi Zhang
Email address:      798604270@qq.com
PostgreSQL version: 18.1
Operating system:   ubuntu 24.04 with docker
Description:

Hi,

In the following test case, there are two equivalent queries. One is a
normal SELECT, and the other is a prepared SELECT. In the query plan of the
normal SELECT, there is an unnecessary Sort, which causes it to be slower
than the prepared SELECT. In general, the prepared SELECT should be slower
than the normal SELECT, as its query plan is suboptimal. So there maybe
potential opportunities for further optimization in the query planning of
normal SELECT statements.

```
CREATE UNLOGGED TABLE IF NOT EXISTS t0(c0 DECIMAL  DEFAULT
(0.941408570867201) NULL, c1 boolean  PRIMARY KEY);
CREATE TEMPORARY TABLE IF NOT EXISTS t1(LIKE t0);
CREATE UNLOGGED TABLE t3(LIKE t0);
CREATE TEMPORARY TABLE IF NOT EXISTS t5(c0 integer , c1 money , c2 REAL
PRIMARY KEY) USING heap;
INSERT INTO t1 (c0, c1) VALUES (0.1, true), (0.2, false) ON CONFLICT DO
NOTHING;
INSERT INTO t3 (c0, c1) VALUES (0.3, true), (0.4, false) ON CONFLICT DO
NOTHING;
INSERT INTO t5 (c0, c1, c2) SELECT (random() * 10000)::int, (random() *
1000)::numeric::money, (random() + i)::real FROM generate_series(1, 10000)
i;
ANALYZE t1, t3, t5;
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT '178.229.172.255'::inet FROM
t1*, t5, ONLY t3 WHERE (('24186777'::text COLLATE "pg_c_utf8")!~'8E'::text)
IN (t1.c1) ORDER BY t1.c1;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1844.84..1894.84 rows=20000 width=33) (actual time=8.188..9.059
rows=20000.00 loops=1)
   Sort Key: t1.c1
   Sort Method: quicksort  Memory: 1237kB
   Buffers: shared hit=1, local hit=65
   ->  Nested Loop  (cost=0.00..416.06 rows=20000 width=33) (actual
time=0.021..5.233 rows=20000.00 loops=1)
         Buffers: shared hit=1, local hit=65
         ->  Seq Scan on t5  (cost=0.00..164.00 rows=10000 width=0) (actual
time=0.010..0.668 rows=10000.00 loops=1)
               Buffers: local hit=64
         ->  Materialize  (cost=0.00..2.07 rows=2 width=1) (actual
time=0.000..0.000 rows=2.00 loops=10000)
               Storage: Memory  Maximum Storage: 17kB
               Buffers: shared hit=1, local hit=1
               ->  Nested Loop  (cost=0.00..2.06 rows=2 width=1) (actual
time=0.006..0.008 rows=2.00 loops=1)
                     Buffers: shared hit=1, local hit=1
                     ->  Seq Scan on t1  (cost=0.00..1.02 rows=1 width=1)
(actual time=0.003..0.003 rows=1.00 loops=1)
                           Filter: c1
                           Rows Removed by Filter: 1
                           Buffers: local hit=1
                     ->  Seq Scan on t3  (cost=0.00..1.02 rows=2 width=0)
(actual time=0.003..0.003 rows=2.00 loops=1)
                           Buffers: shared hit=1
 Planning:
   Buffers: shared hit=48, local hit=1
 Planning Time: 0.315 ms
 Execution Time: 10.281 ms
(23 rows)


PREPARE prepare_query (inet, text, text) AS SELECT ALL $1 FROM t1*, t5*,
ONLY t3 WHERE (($2 COLLATE "pg_c_utf8")!~$3) IN (t1.c1) ORDER BY t1.c1;
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) EXECUTE
prepare_query('178.229.172.255', '24186777', '8E');
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..416.08 rows=20000 width=33) (actual
time=0.012..4.668 rows=20000.00 loops=1)
   Buffers: shared hit=1, local hit=65
   ->  Seq Scan on t5  (cost=0.00..164.00 rows=10000 width=0) (actual
time=0.004..0.538 rows=10000.00 loops=1)
         Buffers: local hit=64
   ->  Materialize  (cost=0.00..2.08 rows=2 width=1) (actual
time=0.000..0.000 rows=2.00 loops=10000)
         Storage: Memory  Maximum Storage: 17kB
         Buffers: shared hit=1, local hit=1
         ->  Nested Loop  (cost=0.00..2.07 rows=2 width=1) (actual
time=0.006..0.008 rows=2.00 loops=1)
               Buffers: shared hit=1, local hit=1
               ->  Seq Scan on t1  (cost=0.00..1.03 rows=1 width=1) (actual
time=0.004..0.005 rows=1.00 loops=1)
                     Filter: ((($2)::text !~ $3) = c1)
                     Rows Removed by Filter: 1
                     Buffers: local hit=1
               ->  Seq Scan on t3  (cost=0.00..1.02 rows=2 width=0) (actual
time=0.001..0.002 rows=2.00 loops=1)
                     Buffers: shared hit=1
 Planning Time: 0.052 ms
 Execution Time: 5.531 ms
(17 rows)
```





Re: BUG #19386: Unnecessary Sort in query plan for SELECT literal with ORDER BY

От
Andrei Lepikhov
Дата:
On 21/1/26 09:26, PG Bug reporting form wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      19386
> Logged by:          Chi Zhang
> Email address:      798604270@qq.com
> PostgreSQL version: 18.1
> Operating system:   ubuntu 24.04 with docker
> Description:
> 
> Hi,
> 
> In the following test case, there are two equivalent queries. One is a
> normal SELECT, and the other is a prepared SELECT. In the query plan of the
> normal SELECT, there is an unnecessary Sort, which causes it to be slower
> than the prepared SELECT. In general, the prepared SELECT should be slower
> than the normal SELECT, as its query plan is suboptimal. So there maybe
> potential opportunities for further optimization in the query planning of
> normal SELECT statements.

These queries aren't equivalent for me. The generic case may produce 
errors if a parameter has an incompatible type. The 'simple query' case 
validates constants and may simplify the clause, being sure no logical 
errors happen during clause evaluation.
Another question - should we do anything to optimise this quite narrow 
(at least it seems so for me) case and stop simplification of the clause?

-- 
regards, Andrei Lepikhov,
pgEdge



Re: BUG #19386: Unnecessary Sort in query plan for SELECT literal with ORDER BY

От
Pierre Forstmann
Дата:

It's only a detail but I don't understand why '*' is added to some table name if there are no table inheritance ?

https://www.postgresql.org/docs/18/sql-select.html  says

table_name

The name (optionally schema-qualified) of an existing table or view. If ONLY is specified before the table name, only that table is scanned. If ONLY is not specified, the table and all its descendant tables (if any) are scanned. Optionally, * can be specified after the table name to explicitly indicate that descendant tables are included. Le 21/01/2026 à 12:11, Andrei Lepikhov a écrit :

On 21/1/26 09:26, PG Bug reporting form wrote:
The following bug has been logged on the website:

Bug reference:      19386
Logged by:          Chi Zhang
Email address:      798604270@qq.com
PostgreSQL version: 18.1
Operating system:   ubuntu 24.04 with docker
Description:

Hi,

In the following test case, there are two equivalent queries. One is a
normal SELECT, and the other is a prepared SELECT. In the query plan of the
normal SELECT, there is an unnecessary Sort, which causes it to be slower
than the prepared SELECT. In general, the prepared SELECT should be slower
than the normal SELECT, as its query plan is suboptimal. So there maybe
potential opportunities for further optimization in the query planning of
normal SELECT statements.

These queries aren't equivalent for me. The generic case may produce errors if a parameter has an incompatible type. The 'simple query' case validates constants and may simplify the clause, being sure no logical errors happen during clause evaluation.
Another question - should we do anything to optimise this quite narrow (at least it seems so for me) case and stop simplification of the clause?

Andrei Lepikhov <lepihov@gmail.com> writes:
> Another question - should we do anything to optimise this quite narrow 
> (at least it seems so for me) case and stop simplification of the clause?

I don't think so.  Shorn of irrelevancies, what we have here is

regression=# create table foo (b bool);
CREATE TABLE
regression=# explain select * from foo where b = true order by b;
                         QUERY PLAN                          
-------------------------------------------------------------
 Sort  (cost=111.56..115.07 rows=1405 width=1)
   Sort Key: b
   ->  Seq Scan on foo  (cost=0.00..38.10 rows=1405 width=1)
         Filter: b

The planner can omit the sort step if it sees that the sort key has
been equated to a constant:

regression=# create table bar (i int);
CREATE TABLE
regression=# explain select * from bar where i = 42 order by i;
                     QUERY PLAN                      
-----------------------------------------------------
 Seq Scan on bar  (cost=0.00..41.88 rows=13 width=4)
   Filter: (i = 42)

However, in the case of a bool column we go out of our way to
convert "b = true" and similar cases to "b" or "NOT b" so that
we won't be fooled by variant phrasings of equivalent conditions.
That doesn't happen in the OP's generic-plan case because the
original is "b = parameter".  The planner knows the parameter
expression will be effectively constant in any one run, so it
elides the Sort even though it doesn't know the parameter's
value.  But it doesn't see plain "b" as a phrasing of "b = true".

We could, I imagine, allow the EquivalenceClass machinery to treat
"WHERE b" as a shorthand for "b = true", but that would lead to
cluttering the mechanism with a ton of almost-always-useless
EquivalenceClasses, because it's hard to see how such a decision
doesn't result in every random WHERE qual producing an
EquivalenceClass.  I think that would slow down planning of most
queries by more than such an optimization is worth.

            regards, tom lane