Luc Vlaming <luc@swarm64.com> writes:
> Given the testcase we see that the outer semi join tries to join the
> outer with the inner table id columns, even though the middle table id
> column is also there. Is this expected behavior?

I don't see anything greatly wrong with it.  The planner has concluded
that _inner.id2 and middle.id1 are part of an equivalence class, so it
can form the top-level join by equating _outer.id3 to either of them.
AFAIR that choice is made at random --- there's certainly not any logic
that thinks about "well, the intermediate join output could be a bit
narrower if we choose this one instead of that one".

I think "made at random" actually boils down to "take the first usable
member of the equivalence class". If I switch around the wording of
the first equality condition:

    ... select 1 from _inner where middle.id1 = _inner.id2

then I get a plan where the top join uses middle.id1. However,
it's still propagating both middle.id1 and _inner.id2 up through
the bottom join, so that isn't buying anything efficiency-wise.
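
To make the "first usable member" idea concrete, here is a toy sketch in
Python.  It is not the planner's actual code or data structures; the
function name first_usable_member and the idea that member order simply
follows the textual order of the equality condition are assumptions made
for illustration, chosen to match the behavior described above.

```python
# Toy model only: the real planner is considerably more involved.

def first_usable_member(ec_members, available):
    """Return the first equivalence-class member that the join input supplies.

    ec_members: column names in the order they were added to the class
                (assumed here to follow the order written in the query).
    available:  set of column names emitted by the lower join.
    """
    for member in ec_members:
        if member in available:
            return member
    return None


# Both columns are propagated up through the bottom join in either case.
lower_join_output = {"middle.id1", "_inner.id2"}

# "_inner.id2 = middle.id1": _inner.id2 is seen first.
original_order = ["_inner.id2", "middle.id1"]

# "middle.id1 = _inner.id2": middle.id1 is seen first.
switched_order = ["middle.id1", "_inner.id2"]

print(first_usable_member(original_order, lower_join_output))  # _inner.id2
print(first_usable_member(switched_order, lower_join_output))  # middle.id1
```

Note that nothing in the sketch looks at how wide the lower join's output
is; the pick depends only on member order, and the set of columns
available from the lower join is the same either way.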

			regards, tom lane