Re: pgsql: Avoid mislabeling of lateral references when pulling up a subque

Поиск

Список

Период

Сортировка

От	Andrei Lepikhov
Тема	Re: pgsql: Avoid mislabeling of lateral references when pulling up a subque
Дата	3 декабря 2024 г. 12:15:20
Msg-id	7d8175ca-82bb-4878-8b25-7edb124fac21@gmail.com обсуждение исходный текст
Ответ на	pgsql: Avoid mislabeling of lateral references when pulling up a subque (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: pgsql: Avoid mislabeling of lateral references when pulling up a subque
Список	pgsql-committers

Дерево обсуждения

On 11/29/24 05:33, Tom Lane wrote:
> Avoid mislabeling of lateral references when pulling up a subquery.
> 
> If we are pulling up a subquery that's under an outer join, and
> the subquery's target list contains a strict expression that uses
> both a subquery variable and a lateral-reference variable, it's okay
> to pull up the expression without wrapping it in a PlaceHolderVar.
> That's safe because if the subquery variable is forced to NULL
> by the outer join, the expression result will come out as NULL too,
> so we don't have to force that outcome by evaluating the expression
> below the outer join.  It'd be correct to wrap in a PHV, but that can
> lead to very significantly worse plans, since we'd then have to use
> a nestloop plan to pass down the lateral reference to where the
> expression will be evaluated.
Pardon the noise, but I'm curious why the optimiser must choose NestLoop 
in the case of lateral reference.

It would be nice to provide alternatives. Because now we have some 
corner cases. For example, with pull-up correlated subqueries, we've got 
one degraded case. Look the following:

DROP TABLE IF EXISTS t1,t2;
CREATE TABLE t1(x int, y int);
CREATE TABLE t2(x int, y int);
INSERT INTO t1 (x,y)
   SELECT gs,-gs FROM generate_series(1,1E4) AS gs;
ANALYZE t1,t2;

EXPLAIN (ANALYZE, COSTS ON)
SELECT t1.* FROM t1 LEFT JOIN LATERAL (
   SELECT t3.* FROM t1 t3 WHERE t3.x=t1.x) AS t4
ON (t4.y IN (SELECT y FROM t2 WHERE t4.x=t2.x));

In previous versions Postgres executed this plan in milliseconds:

  Hash Left Join
    Hash Cond: (t1.x = t3.x)
    ->  Seq Scan on t1
    ->  Hash
          ->  Seq Scan on t1 t3
                Filter: (ANY (y = (SubPlan 1).col1))
                SubPlan 1
                  ->  Seq Scan on t2
                        Filter: (t3.x = x)
  Planning Time: 0.175 ms
  Execution Time: 6.396 ms

But now we have seconds:

  Nested Loop Left Join
    ->  Seq Scan on t1
    ->  Nested Loop Semi Join
          Join Filter: ((t3.x = t2.x) AND (t3.y = t2.y))
          ->  Seq Scan on t1 t3
                Filter: (x = t1.x)
          ->  Seq Scan on t2
  Planning Time: 1.309 ms
  Execution Time: 6780.217 ms

Correlated subquery pull-up is a nice optimisation, of course. So, why 
not let optimiser try a HashJoin like that (not a really generated 
plan, just my imagination):

  Hash Left Join
    Hash Cond: (t1.x = t3.x)
    ->  Seq Scan on t1
    Hash
    ->  Nested Loop Semi Join
          Join Filter: ((t3.x = t2.x) AND (t3.y = t2.y))
          ->  Seq Scan on t1 t3
          ->  Seq Scan on t2

Does the optimiser have some internal limits to let such a path?

-- 
regards, Andrei Lepikhov

В списке pgsql-committers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: pgsql: Avoid mislabeling of lateral references when pulling up a subque