Обсуждение: Failure to reordering in case of a lateral join in combination with aleft join (not inner join) resulting in suboptimal nested loop plan

Поиск
Список
Период
Сортировка
Hi,

The queries in what follows can be executed on the following fiddle: https://dbfiddle.uk/?rdbms=postgres_10&fiddle=64542f2d987d3ce0d85bbc40ddadf7d6 - Please note that the queries/functions might look silly/pointless, I extracted the performance issue I am seeing to a minimal reproducible example.

I have the following schema:

create table parent
(
    id           int    primary key
);
create table child
(
    id           int    primary key,
    parent_id    int    references parent(id)
);
create index on child(parent_id);

Let's start with the following inlinable setof-returning function which basically returns the child rows for a given parent identifier:

create function child_get_1(int) returns table(a int) as
$$
    select    child.id
    from      child
    left join parent
    on        parent.id = child.parent_id
    where     child.parent_id = $1;
$$
language sql stable;

Note that the left join branch is intentionally not used, and thus could be eliminated by the planner.

When executing the following query, I get a satisfying hash (left) join (and the left join to parent is indeed eliminated):

explain analyze
select ch.* from parent p, child_get_1(p.id) ch;

+--------------------------------------------------------------------------------------------------------------------+
| Hash Join  (cost=3.25..17163.23 rows=999900 width=4) (actual time=0.025..194.279 rows=999900 loops=1)              |
|   Hash Cond: (child.parent_id = p.id)                                                                              |
|   ->  Seq Scan on child  (cost=0.00..14424.00 rows=999900 width=8) (actual time=0.005..47.215 rows=999900 loops=1) |
|   ->  Hash  (cost=2.00..2.00 rows=100 width=4) (actual time=0.016..0.016 rows=100 loops=1)                         |
|         Buckets: 1024  Batches: 1  Memory Usage: 12kB                                                              |
|         ->  Seq Scan on parent p  (cost=0.00..2.00 rows=100 width=4) (actual time=0.001..0.007 rows=100 loops=1)   |
+--------------------------------------------------------------------------------------------------------------------+


Now, I introduce a convenience function, also inlinable, which fetches the child rows by its parent id:

create function t(int) returns setof child as
$$
    select child.* from child where child.parent_id = $1;
$$
language sql stable;

I refactor `child_get_1(int)` from above as following:

create function child_get_2(int) returns table(a int) as
$$
    select    child.id
    from      t($1) child
    left join parent
    on        parent.id = child.parent_id;
$$
language sql stable;

explain analyze
select ch.* from parent p, child_get_2(p.id) ch;

Now, I get a nested loop, which as expected performs quite badly:

+--------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop  (cost=189.92..493990.48 rows=999900 width=4) (actual time=1.519..713.680 rows=999900 loops=1)                                 |
|   ->  Seq Scan on parent p  (cost=0.00..2.00 rows=100 width=4) (actual time=0.004..0.081 rows=100 loops=1)                                 |
|   ->  Bitmap Heap Scan on child  (cost=189.92..4739.90 rows=9999 width=4) (actual time=1.365..6.332 rows=9999 loops=100)                   |
|         Recheck Cond: (parent_id = p.id)                                                                                                   |
|         Heap Blocks: exact=442476                                                                                                          |
|         ->  Bitmap Index Scan on child_parent_id_idx  (cost=0.00..187.42 rows=9999 width=0) (actual time=0.838..0.838 rows=9999 loops=100) |
|               Index Cond: (parent_id = p.id)                                                                                               |
+--------------------------------------------------------------------------------------------------------------------------------------------+

For some reason I cannot explain we now end up with a nested loop, instead an hash join. The fairly trivial introduction of `t(int)` messes up with reordering, but I fail to see why. I manually upped the from and join collapse limit to 32 - just to be sure -, but no effect. Also, the left join branch could not be eliminated. I believe this is related to the usage of the implicit lateral join to `child_get_2(p.id)` in the main query, which somehow messes up with the reordering of `from t($1) as child` in `child_get_2(int)`, though I am not 100% sure.

Also, note that when we apply an inner join instead of a left join, the problem goes away. The planner now manages to end up with a hash join in both cases.

I am seeing this on v10 and v11.

Any ideas?

Thank you. Best regards.
Peter Billen <peter.billen@gmail.com> writes:
> For some reason I cannot explain we now end up with a nested loop, instead
> an hash join. The fairly trivial introduction of `t(int)` messes up with
> reordering, but I fail to see why.

I traced through this and determined that it's got nothing to do with
function inlining; you can reproduce the same plan with the functions
written out by hand:

explain
select ch.* from parent p,
lateral (    select    child.id
    from      
    ( select child.* from child where child.parent_id = p.id ) child
    left join parent
    on        parent.id = child.parent_id
 ) ch;

The problem here actually is that the planner refuses to flatten the
LATERAL subquery.  You don't see a SubqueryScan in the finished plan,
but that's just because it gets optimized away at the end.  Because
of the lack of flattening, we don't get a terribly good plan
for the outermost join.

The reason for the flattening failure is some probably-overly-conservative
analysis in is_simple_subquery and jointree_contains_lateral_outer_refs:

        /*
         * The subquery's WHERE and JOIN/ON quals mustn't contain any lateral
         * references to rels outside a higher outer join (including the case
         * where the outer join is within the subquery itself).  In such a
         * case, pulling up would result in a situation where we need to
         * postpone quals from below an outer join to above it, which is
         * probably completely wrong and in any case is a complication that
         * doesn't seem worth addressing at the moment.
         */

The lateral reference to p.id is syntactically underneath the LEFT JOIN
in the subquery, so this restriction is violated.

It seems like we could possibly conclude that the restriction doesn't
have to apply to the outer side of the LEFT JOIN, but proving that and
then tightening up the logic is not a task I care to undertake right now.

This code dates back to c64de21e9625acad57e2caf8f22435e1617fb1ce
if you want to do some excavation.

            regards, tom lane