Обсуждение: Question about nested loops..

Поиск

Список

Период

Сортировка

Question about nested loops..

От

Frits Jalvingh

Дата:

09 октября 2025 г., 11:52:12

Dear list,

We have databases with rather large tables, and we do bulk queries on them, i.e. those queries always have a large result set.
We execute those queries with enable_nestloop set to false, because having nested loops in those queries always cause large performance issues. This usually leads to perfectly acceptable execution plans which use table scans, hash joins and merge joins. We have work_mem set to 64MB to 512MB (depending on the actual database size) to help with these.

I have a query that does not want to work at all without a nested loop and a materialize, while a trivial (but for me strange) change will allow it to make a good plan, and I'm wondering why that is.
The below examples are stripped versions of the real queries (which have a lot more filter conditions) but they exhibit the issue. I tested them on both Postgres 16 and 18, on Ubuntu 25.04, with the same results.

The slow query is this one:

select
eenheid.id_s as id_s
, eenheid.identificatie as id
, exists(
select
1
from datavault.l_cluster_eenheid_ssm
inner join datavault.s_h_huurovereenkomst_ssm huurovereenkomst_s
on eenheid.id_h_eenheid = huurovereenkomst_s._l_eenheid
where eenheid.id_h_eenheid = l_cluster_eenheid_ssm.id_h_eenheid
and l_cluster_eenheid_ssm.dv_start_dts <= timestamp '2025-10-08 00:00:00'
and l_cluster_eenheid_ssm.dv_end_dts > timestamp '2025-10-08 00:00:00'
) as qd001_e
from datavault.s_h_eenheid_ssm eenheid

It produces the following execution plan:
Seq Scan on s_h_eenheid_ssm eenheid (cost=0.00..172110004562434.50 rows=17211 width=21)
SubPlan 1
-> Nested Loop (cost=10000000000.00..10000007950.48 rows=30 width=0)
-> Seq Scan on l_cluster_eenheid_ssm (cost=0.00..6768.46 rows=15 width=0)
Filter: ((dv_start_dts <= '2025-10-08 00:00:00'::timestamp without time zone) AND (dv_end_dts > '2025-10-08 00:00:00'::timestamp without time zone) AND (id_h_eenheid = eenheid.id_h_eenheid))
-> Materialize (cost=0.00..1181.65 rows=2 width=0)
-> Seq Scan on s_h_huurovereenkomst_ssm huurovereenkomst_s (cost=0.00..1181.64 rows=2 width=0)
Filter: (_l_eenheid = eenheid.id_h_eenheid)

I can make the query very fast by adding a self-join to the root table (s_h_eenheid_ssm) inside the subquery:

select
eenheid.id_s as id_s
, eenheid.identificatie as id
, exists(
select
1
from datavault.s_h_eenheid_ssm eenheid_s
inner join datavault.l_cluster_eenheid_ssm l_cluster_eenheid_ssm
on eenheid_s.id_h_eenheid = l_cluster_eenheid_ssm.id_h_eenheid
inner join datavault.s_h_huurovereenkomst_ssm huurovereenkomst_s
on eenheid_s.id_h_eenheid = huurovereenkomst_s._l_eenheid
where eenheid.id_h_eenheid = eenheid_s.id_h_eenheid
and l_cluster_eenheid_ssm.dv_start_dts <= timestamp '2025-10-08 00:00:00'
and l_cluster_eenheid_ssm.dv_end_dts > timestamp '2025-10-08 00:00:00'
) as DQ87_e
from datavault.s_h_eenheid_ssm eenheid

which produces a way better plan:

QUERY PLAN
Seq Scan on s_h_eenheid_ssm eenheid (cost=0.00..344220005300951.31 rows=17211 width=21)
SubPlan 2
-> Hash Join (cost=3244.40..14039.76 rows=371258 width=4)
Hash Cond: (l_cluster_eenheid_ssm.id_h_eenheid = eenheid_s.id_h_eenheid)
-> Seq Scan on l_cluster_eenheid_ssm (cost=0.00..6139.82 rows=251455 width=4)
Filter: ((dv_start_dts <= '2025-10-08 00:00:00'::timestamp without time zone) AND (dv_end_dts > '2025-10-08 00:00:00'::timestamp without time zone))
-> Hash (cost=2926.76..2926.76 rows=25411 width=8)
-> Hash Join (cost=1459.25..2926.76 rows=25411 width=8)
Hash Cond: (huurovereenkomst_s._l_eenheid = eenheid_s.id_h_eenheid)
-> Seq Scan on s_h_huurovereenkomst_ssm huurovereenkomst_s (cost=0.00..1118.11 rows=25411 width=4)
-> Hash (cost=1244.11..1244.11 rows=17211 width=4)
-> Seq Scan on s_h_eenheid_ssm eenheid_s (cost=0.00..1244.11 rows=17211 width=4)

The difference in execution speed is large; the slow query takes about 30 seconds on a test system while the fast one is under a second.

I do not understand why the simpler query (without the self join) produces a plan that seems to require nested loops, I hope someone can explain.

Re: Question about nested loops..

От

Andrei Lepikhov

Дата:

09 октября 2025 г., 13:00:29

On 9/10/2025 10:52, Frits Jalvingh wrote:
> I do not understand why the simpler query (without the self join) 
> produces a plan that seems to require nested loops, I hope someone can 
> explain.
It seems obvious. You have a join clause:
'enheid.id_h_eenheid = huurovereenkomst_s._l_eenheid'

One side of this clause fits the underlying relation, but another one 
does not (references the external relation).
According to the HashJoin rules, the hash join clause is quite strict: 
each side should depend on only the left or right side of the join. So, 
it is just impossible here.
What can be done here without rewriting the query? This subject requires 
investigation to determine if the outer part of the hash clause can be 
parameterised or not.

-- regards, Andrei Lepikhov
pgEdge

Re: Question about nested loops..

От

Andrei Lepikhov

Дата:

10 октября 2025 г., 12:28:44

On 10/10/2025 10:51, Frits Jalvingh wrote:
> Hi Andrei,
> Thanks a lot for your response. I do not fully get it though because the 
> well-performing query also references the outer (external?) relation:
> -> where eenheid.id_h_eenheid = eenheid_s.id_h_eenheid
> But your response made me think a but more.This is the schema of EXPLAIN for the second 'optimal' query:

Seq Scan on s_h_eenheid_ssm D
   SubPlan 2
     ->  Hash Join
           Hash Cond: (A.x = B.x)
           ->  Seq Scan on A
                 Filter: ...
           ->  Hash
                 ->  Hash Join
                       Hash Cond: (C.y = B.x)
                       ->  Seq Scan on s_h_huurovereenkomst_ssm C
                       ->  Hash
                             ->  Seq Scan on s_h_eenheid_ssm B

There is no reference to the upper relation D - each HashJoin refers to 
its left and right side only.

-- regards, Andrei Lepikhov, pgEdge

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Question about nested loops..

Question about nested loops..

Re: Question about nested loops..

Re: Question about nested loops..