Re: pgsql: Avoid mislabeling of lateral references when pulling up a subque
От | Andrei Lepikhov |
---|---|
Тема | Re: pgsql: Avoid mislabeling of lateral references when pulling up a subque |
Дата | |
Msg-id | 7d8175ca-82bb-4878-8b25-7edb124fac21@gmail.com обсуждение исходный текст |
Ответ на | pgsql: Avoid mislabeling of lateral references when pulling up a subque (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: pgsql: Avoid mislabeling of lateral references when pulling up a subque
|
Список | pgsql-committers |
On 11/29/24 05:33, Tom Lane wrote: > Avoid mislabeling of lateral references when pulling up a subquery. > > If we are pulling up a subquery that's under an outer join, and > the subquery's target list contains a strict expression that uses > both a subquery variable and a lateral-reference variable, it's okay > to pull up the expression without wrapping it in a PlaceHolderVar. > That's safe because if the subquery variable is forced to NULL > by the outer join, the expression result will come out as NULL too, > so we don't have to force that outcome by evaluating the expression > below the outer join. It'd be correct to wrap in a PHV, but that can > lead to very significantly worse plans, since we'd then have to use > a nestloop plan to pass down the lateral reference to where the > expression will be evaluated. Pardon the noise, but I'm curious why the optimiser must choose NestLoop in the case of lateral reference. It would be nice to provide alternatives. Because now we have some corner cases. For example, with pull-up correlated subqueries, we've got one degraded case. Look the following: DROP TABLE IF EXISTS t1,t2; CREATE TABLE t1(x int, y int); CREATE TABLE t2(x int, y int); INSERT INTO t1 (x,y) SELECT gs,-gs FROM generate_series(1,1E4) AS gs; ANALYZE t1,t2; EXPLAIN (ANALYZE, COSTS ON) SELECT t1.* FROM t1 LEFT JOIN LATERAL ( SELECT t3.* FROM t1 t3 WHERE t3.x=t1.x) AS t4 ON (t4.y IN (SELECT y FROM t2 WHERE t4.x=t2.x)); In previous versions Postgres executed this plan in milliseconds: Hash Left Join Hash Cond: (t1.x = t3.x) -> Seq Scan on t1 -> Hash -> Seq Scan on t1 t3 Filter: (ANY (y = (SubPlan 1).col1)) SubPlan 1 -> Seq Scan on t2 Filter: (t3.x = x) Planning Time: 0.175 ms Execution Time: 6.396 ms But now we have seconds: Nested Loop Left Join -> Seq Scan on t1 -> Nested Loop Semi Join Join Filter: ((t3.x = t2.x) AND (t3.y = t2.y)) -> Seq Scan on t1 t3 Filter: (x = t1.x) -> Seq Scan on t2 Planning Time: 1.309 ms Execution Time: 6780.217 ms Correlated subquery pull-up is a nice optimisation, of course. So, why not let optimiser try a HashJoin like that (not a really generated plan, just my imagination): Hash Left Join Hash Cond: (t1.x = t3.x) -> Seq Scan on t1 Hash -> Nested Loop Semi Join Join Filter: ((t3.x = t2.x) AND (t3.y = t2.y)) -> Seq Scan on t1 t3 -> Seq Scan on t2 Does the optimiser have some internal limits to let such a path? -- regards, Andrei Lepikhov
В списке pgsql-committers по дате отправления: