Wrapping a select in another select makes it slower

Поиск
Список
Период
Сортировка
От	Peter J. Holzer
Тема	Wrapping a select in another select makes it slower
Дата	22 июля 18:17:35
Msg-id	20250722151735.toridqqnb3krhimp@hjp.at обсуждение исходный текст
Ответы	Re: Wrapping a select in another select makes it slower
Список	pgsql-general
Дерево обсуждения
PostgreSQL version 17.5 on Ubuntu 24.04 (PGDG build).

Context: We use CheckMK to monitor our systems. This also reports
various health indicators for Postgres. Recently a colleague noticed
that the indicators for one database were missing, presumable since
upgrading to Ubuntu 24.04 and Postgres 17. Closer investigation showed
that the plugin was still running but took a long time (almost 4
minutes) to return a result (which caused a timeout).

So I checked which of the many queries was slow and found the culprit.
I'm not including the whole query here (it's a mess of of many nested
subqueries) only the interesting part.

This query returns a result quickly (less than 0.5 seconds):

SELECT
    ns.nspname, tbl.relname, hdr, ma, bs,
    SUM((1-coalesce(null_frac,0))*coalesce(avg_width, 2048)) AS datawidth,
    MAX(coalesce(null_frac,0)) AS maxfracsum,
    hdr+(
        SELECT 1+count(*)/8
        FROM pg_stats s2
        WHERE null_frac<>0 AND s2.schemaname = ns.nspname AND s2.tablename = tbl.relname
    ) AS nullhdr
FROM pg_attribute att
    JOIN pg_class tbl ON att.attrelid = tbl.oid
    JOIN pg_namespace ns ON ns.oid = tbl.relnamespace
    LEFT JOIN pg_stats s ON s.schemaname=ns.nspname AND s.tablename = tbl.relname AND s.inherited=false AND
s.attname=att.attname,
    (
        SELECT
            ( SELECT current_setting('block_size')::numeric) AS bs,
            CASE WHEN SUBSTRING(SPLIT_PART(v, ' ', 2) FROM '#\[0-9]+.[0-9]+#\%' for '#') IN ('8.0','8.1','8.2') THEN 27
ELSE23 END AS hdr, 
            CASE WHEN v ~ 'mingw32' OR v ~ '64-bit' THEN 8 ELSE 4 END AS ma
        FROM (
            SELECT version() AS v
        ) AS foo
    ) AS constants
WHERE att.attnum > 0 AND tbl.relkind='r'
GROUP BY 1,2,3,4,5
;

This is wrapped in a very simple select:

SELECT ma,bs,foo.nspname,foo.relname,
    (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,
    (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
FROM (
    -- the query above
) AS foo;

And now it takes 4 minutes.

Rewriting it as a normal CTE doesn't change that:

with foo as (
    -- the query above
)
SELECT ... FROM foo;

But a materialized CTE

with foo as materialized (
    -- the query above
)
SELECT ... FROM foo;

is fast again.

Here are the execution plans of the first two queries (from explain (analyze)):

GroupAggregate  (cost=7048.46..142133.79 rows=2500 width=188) (actual time=405.135..550.535 rows=625 loops=1)
  Group Key: tbl.relname, ns.nspname
  InitPlan 1
    ->  Result  (cost=0.00..0.02 rows=1 width=32) (actual time=0.017..0.017 rows=1 loops=1)
  ->  Merge Left Join  (cost=7048.44..7133.41 rows=8357 width=136) (actual time=404.807..434.347 rows=40460 loops=1)
        Merge Cond: ((tbl.relname = s.tablename) AND (ns.nspname = s.schemaname) AND (att.attname = s.attname))
        ->  Sort  (cost=2571.61..2592.50 rows=8357 width=192) (actual time=111.335..120.020 rows=40460 loops=1)
              Sort Key: tbl.relname, ns.nspname, att.attname
              Sort Method: external merge  Disk: 8008kB
              ->  Hash Join  (cost=137.38..2027.20 rows=8357 width=192) (actual time=57.024..75.192 rows=40460 loops=1)
                    Hash Cond: (tbl.relnamespace = ns.oid)
                    ->  Hash Join  (cost=136.29..1980.93 rows=8357 width=132) (actual time=0.617..13.882 rows=40460
loops=1)
                          Hash Cond: (att.attrelid = tbl.oid)
                          ->  Seq Scan on pg_attribute att  (cost=0.00..1719.62 rows=47572 width=68) (actual
time=0.026..6.804rows=47512 loops=1) 
                                Filter: (attnum > 0)
                                Rows Removed by Filter: 10218
                          ->  Hash  (cost=128.47..128.47 rows=625 width=72) (actual time=0.564..0.565 rows=625 loops=1)
                                Buckets: 1024  Batches: 1  Memory Usage: 72kB
                                ->  Seq Scan on pg_class tbl  (cost=0.00..128.47 rows=625 width=72) (actual
time=0.016..0.485rows=625 loops=1) 
                                      Filter: (relkind = 'r'::"char")
                                      Rows Removed by Filter: 2933
                    ->  Hash  (cost=1.04..1.04 rows=4 width=68) (actual time=56.360..56.360 rows=4 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Seq Scan on pg_namespace ns  (cost=0.00..1.04 rows=4 width=68) (actual
time=56.319..56.325rows=4 loops=1) 
        ->  Sort  (cost=4476.84..4477.18 rows=139 width=200) (actual time=293.367..300.227 rows=30921 loops=1)
              Sort Key: s.tablename, s.schemaname, s.attname
              Sort Method: external sort  Disk: 6480kB
              ->  Subquery Scan on s  (cost=2189.59..4471.89 rows=139 width=200) (actual time=207.797..253.934
rows=30921loops=1) 
                    ->  Hash Left Join  (cost=2189.59..4470.50 rows=139 width=469) (actual time=207.791..251.676
rows=30921loops=1) 
                          Hash Cond: (c.relnamespace = n.oid)
                          ->  Hash Join  (cost=2188.50..4468.65 rows=139 width=140) (actual time=207.723..246.208
rows=30921loops=1) 
                                Hash Cond: ((s_1.starelid = c.oid) AND (s_1.staattnum = a.attnum))
                                ->  Seq Scan on pg_statistic s_1  (cost=0.00..2008.21 rows=15460 width=14) (actual
time=0.024..6.845rows=30921 loops=1) 
                                      Filter: (NOT stainherit)
                                ->  Hash  (cost=1899.86..1899.86 rows=19243 width=142) (actual time=207.320..207.323
rows=57730loops=1) 
                                      Buckets: 65536 (originally 32768)  Batches: 2 (originally 1)  Memory Usage:
7681kB
                                      ->  Hash Join  (cost=172.95..1899.86 rows=19243 width=142) (actual
time=1.129..176.772rows=57730 loops=1) 
                                            Hash Cond: (a.attrelid = c.oid)
                                            Join Filter: has_column_privilege(c.oid, a.attnum, 'select'::text)
                                            ->  Seq Scan on pg_attribute a  (cost=0.00..1575.30 rows=57730 width=70)
(actualtime=0.013..9.457 rows=57730 loops=1) 
                                                  Filter: (NOT attisdropped)
                                            ->  Hash  (cost=128.47..128.47 rows=3558 width=72) (actual
time=0.993..0.994rows=3558 loops=1) 
                                                  Buckets: 4096  Batches: 1  Memory Usage: 394kB
                                                  ->  Seq Scan on pg_class c  (cost=0.00..128.47 rows=3558 width=72)
(actualtime=0.021..0.558 rows=3558 loops=1) 
                                                        Filter: ((NOT relrowsecurity) OR (NOT
row_security_active(oid)))
                          ->  Hash  (cost=1.04..1.04 rows=4 width=68) (actual time=0.023..0.024 rows=4 loops=1)
                                Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                ->  Seq Scan on pg_namespace n  (cost=0.00..1.04 rows=4 width=68) (actual
time=0.015..0.017rows=4 loops=1) 
  SubPlan 2
    ->  Aggregate  (cost=53.88..53.90 rows=1 width=8) (actual time=0.174..0.175 rows=1 loops=625)
          ->  Nested Loop  (cost=0.86..53.87 rows=1 width=469) (actual time=0.058..0.172 rows=39 loops=625)
                Join Filter: (s_2.starelid = c_1.oid)
                ->  Nested Loop  (cost=0.57..53.42 rows=1 width=10) (actual time=0.011..0.076 rows=71 loops=625)
                      ->  Nested Loop  (cost=0.28..9.36 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=625)
                            Join Filter: (c_1.relnamespace = n_1.oid)
                            ->  Index Scan using pg_class_relname_nsp_index on pg_class c_1  (cost=0.28..8.30 rows=1
width=8)(actual time=0.002..0.002 rows=1 loops=625) 
                                  Index Cond: (relname = tbl.relname)
                                  Filter: ((NOT relrowsecurity) OR (NOT row_security_active(oid)))
                            ->  Seq Scan on pg_namespace n_1  (cost=0.00..1.05 rows=1 width=4) (actual
time=0.001..0.001rows=1 loops=625) 
                                  Filter: (nspname = ns.nspname)
                                  Rows Removed by Filter: 3
                      ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a_1  (cost=0.29..43.98
rows=7width=6) (actual time=0.005..0.064 rows=71 loops=625) 
                            Index Cond: (attrelid = c_1.oid)
                            Filter: ((NOT attisdropped) AND has_column_privilege(c_1.oid, attnum, 'select'::text))
                ->  Index Scan using pg_statistic_relid_att_inh_index on pg_statistic s_2  (cost=0.29..0.44 rows=1
width=6)(actual time=0.001..0.001 rows=1 loops=44210) 
                      Index Cond: ((starelid = a_1.attrelid) AND (staattnum = a_1.attnum))
                      Filter: (stanullfrac <> '0'::double precision)
                      Rows Removed by Filter: 0
Planning Time: 9.752 ms
JIT:
  Functions: 115
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 6.386 ms (Deform 3.235 ms), Inlining 0.000 ms, Optimization 2.815 ms, Emission 53.702 ms, Total
62.903ms 
Execution Time: 592.748 ms



Subquery Scan on foo  (cost=5370.24..142883.56 rows=2500 width=204) (actual time=256.040..292421.584 rows=625 loops=1)
  ->  GroupAggregate  (cost=5370.24..142746.06 rows=2500 width=188) (actual time=256.022..292420.807 rows=625 loops=1)
        Group Key: ns.nspname, tbl.relname
        InitPlan 1
          ->  Result  (cost=0.00..0.02 rows=1 width=32) (actual time=0.014..0.015 rows=1 loops=1)
        ->  Incremental Sort  (cost=5370.23..7745.68 rows=8357 width=136) (actual time=255.739..292322.422 rows=40460
loops=1)
              Sort Key: ns.nspname, tbl.relname
              Presorted Key: ns.nspname
              Full-sort Groups: 2  Sort Method: quicksort  Average Memory: 34kB  Peak Memory: 34kB
              Pre-sorted Groups: 3  Sort Methods: quicksort, external merge  Average Memory: 38kB  Peak Memory: 88kB
AverageDisk: 1874kB  Peak Disk: 5624kB 
              ->  Merge Left Join  (cost=4613.25..7180.30 rows=8357 width=136) (actual time=222.037..292242.701
rows=40460loops=1) 
                    Merge Cond: (ns.nspname = s.schemaname)
                    Join Filter: ((s.tablename = tbl.relname) AND (s.attname = att.attname))
                    Rows Removed by Join Filter: 1215045276
                    ->  Nested Loop  (cost=136.42..2515.44 rows=8357 width=192) (actual time=65.105..188.427 rows=40460
loops=1)
                          Join Filter: (tbl.relnamespace = ns.oid)
                          Rows Removed by Join Filter: 121380
                          ->  Index Scan using pg_namespace_nspname_index on pg_namespace ns  (cost=0.13..12.19 rows=4
width=68)(actual time=0.021..0.039 rows=4 loops=1) 
                          ->  Materialize  (cost=136.29..2022.72 rows=8357 width=132) (actual time=16.114..36.529
rows=40460loops=4) 
                                ->  Hash Join  (cost=136.29..1980.93 rows=8357 width=132) (actual time=64.422..76.465
rows=40460loops=1) 
                                      Hash Cond: (att.attrelid = tbl.oid)
                                      ->  Seq Scan on pg_attribute att  (cost=0.00..1719.62 rows=47572 width=68)
(actualtime=63.875..69.458 rows=47512 loops=1) 
                                            Filter: (attnum > 0)
                                            Rows Removed by Filter: 10218
                                      ->  Hash  (cost=128.47..128.47 rows=625 width=72) (actual time=0.490..0.491
rows=625loops=1) 
                                            Buckets: 1024  Batches: 1  Memory Usage: 72kB
                                            ->  Seq Scan on pg_class tbl  (cost=0.00..128.47 rows=625 width=72) (actual
time=0.038..0.410rows=625 loops=1) 
                                                  Filter: (relkind = 'r'::"char")
                                                  Rows Removed by Filter: 2933
                    ->  Sort  (cost=4476.84..4477.18 rows=139 width=200) (actual time=156.864..178628.030
rows=1215036357loops=1) 
                          Sort Key: s.schemaname
                          Sort Method: external sort  Disk: 6480kB
                          ->  Subquery Scan on s  (cost=2189.59..4471.89 rows=139 width=200) (actual
time=81.840..131.044rows=30921 loops=1) 
                                ->  Hash Left Join  (cost=2189.59..4470.50 rows=139 width=469) (actual
time=81.835..128.788rows=30921 loops=1) 
                                      Hash Cond: (c.relnamespace = n.oid)
                                      ->  Hash Join  (cost=2188.50..4468.65 rows=139 width=140) (actual
time=81.787..123.140rows=30921 loops=1) 
                                            Hash Cond: ((s_1.starelid = c.oid) AND (s_1.staattnum = a.attnum))
                                            ->  Seq Scan on pg_statistic s_1  (cost=0.00..2008.21 rows=15460 width=14)
(actualtime=0.014..5.069 rows=30921 loops=1) 
                                                  Filter: (NOT stainherit)
                                            ->  Hash  (cost=1899.86..1899.86 rows=19243 width=142) (actual
time=81.660..81.661rows=57730 loops=1) 
                                                  Buckets: 65536 (originally 32768)  Batches: 2 (originally 1)  Memory
Usage:7681kB 
                                                  ->  Hash Join  (cost=172.95..1899.86 rows=19243 width=142) (actual
time=1.119..59.096rows=57730 loops=1) 
                                                        Hash Cond: (a.attrelid = c.oid)
                                                        Join Filter: has_column_privilege(c.oid, a.attnum,
'select'::text)
                                                        ->  Seq Scan on pg_attribute a  (cost=0.00..1575.30 rows=57730
width=70)(actual time=0.010..7.991 rows=57730 loops=1) 
                                                              Filter: (NOT attisdropped)
                                                        ->  Hash  (cost=128.47..128.47 rows=3558 width=72) (actual
time=1.062..1.063rows=3558 loops=1) 
                                                              Buckets: 4096  Batches: 1  Memory Usage: 394kB
                                                              ->  Seq Scan on pg_class c  (cost=0.00..128.47 rows=3558
width=72)(actual time=0.065..0.636 rows=3558 loops=1) 
                                                                    Filter: ((NOT relrowsecurity) OR (NOT
row_security_active(oid)))
                                      ->  Hash  (cost=1.04..1.04 rows=4 width=68) (actual time=0.025..0.026 rows=4
loops=1)
                                            Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                            ->  Seq Scan on pg_namespace n  (cost=0.00..1.04 rows=4 width=68) (actual
time=0.016..0.018rows=4 loops=1) 
        SubPlan 2
          ->  Aggregate  (cost=53.88..53.90 rows=1 width=8) (actual time=0.146..0.146 rows=1 loops=625)
                ->  Nested Loop  (cost=0.86..53.87 rows=1 width=469) (actual time=0.048..0.144 rows=39 loops=625)
                      Join Filter: (s_2.starelid = c_1.oid)
                      ->  Nested Loop  (cost=0.57..53.42 rows=1 width=10) (actual time=0.009..0.060 rows=71 loops=625)
                            ->  Nested Loop  (cost=0.28..9.36 rows=1 width=4) (actual time=0.005..0.005 rows=1
loops=625)
                                  Join Filter: (c_1.relnamespace = n_1.oid)
                                  ->  Index Scan using pg_class_relname_nsp_index on pg_class c_1  (cost=0.28..8.30
rows=1width=8) (actual time=0.002..0.002 rows=1 loops=625) 
                                        Index Cond: (relname = tbl.relname)
                                        Filter: ((NOT relrowsecurity) OR (NOT row_security_active(oid)))
                                  ->  Seq Scan on pg_namespace n_1  (cost=0.00..1.05 rows=1 width=4) (actual
time=0.001..0.001rows=1 loops=625) 
                                        Filter: (nspname = ns.nspname)
                                        Rows Removed by Filter: 3
                            ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a_1  (cost=0.29..43.98
rows=7width=6) (actual time=0.004..0.050 rows=71 loops=625) 
                                  Index Cond: (attrelid = c_1.oid)
                                  Filter: ((NOT attisdropped) AND has_column_privilege(c_1.oid, attnum,
'select'::text))
                      ->  Index Scan using pg_statistic_relid_att_inh_index on pg_statistic s_2  (cost=0.29..0.44
rows=1width=6) (actual time=0.001..0.001 rows=1 loops=44210) 
                            Index Cond: ((starelid = a_1.attrelid) AND (staattnum = a_1.attnum))
                            Filter: (stanullfrac <> '0'::double precision)
                            Rows Removed by Filter: 0
Planning Time: 5.314 ms
JIT:
  Functions: 110
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 11.353 ms (Deform 5.059 ms), Inlining 0.000 ms, Optimization 5.227 ms, Emission 58.982 ms, Total
75.563ms 
Execution Time: 292436.103 ms


What is happening here?

The plans are clearly different, with the slow one claiming to sort 1.2
billion rows (but using only 6.5MB of disk space) despite the node below
it only returning 30921 rows. 1215036357 is almost but not quite
30921*40460, but that nested loop is a sibling of the sort, so it
shouldn't be included, right?

And why are the plans different at all? Computing a few extra values per
row shouldn't change the cost of the query delivering the rows, IMHO.
But then the costs are very similar, so maybe it's just some random
variation.

Some estimates are quite a bit off, even on the tables. I did ANALYZE
the whole database (and then the tables, again) during my tests.

        hjp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
Вложения

signature.asc
В списке pgsql-general по дате отправления:
Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Wrapping a select in another select makes it slower

Вложения