Discussion: Remove inner joins based on foreign keys
Currently, the planner can remove useless left joins if the join
condition cannot match more than one RHS row, and the RHS rel is not
referenced above the join. I'd like to propose a similar optimization
for inner joins.
For an inner join to be removable, we need to make sure that the join
condition matches exactly one (no more, no less) RHS row. We can
leverage foreign key constraints to achieve that. But we need to be
careful about the case where the referencing relation's foreign key
columns are null. We can inject an IS NOT NULL filter for any foreign
key columns that are not defined as NOT NULL.
Here is an example to show this removal:
CREATE TABLE users (id int primary key, name text);
CREATE TABLE orders (id int, user_id int references users(id), amount int);
EXPLAIN (COSTS OFF)
SELECT o.* FROM orders o JOIN users u ON o.user_id = u.id;
QUERY PLAN
---------------------------------
Seq Scan on orders o
Filter: (user_id IS NOT NULL)
(2 rows)
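To make the removal conditions above concrete, here is a toy model of the decision (Python used purely for illustration; names like removal_plan are invented for this sketch and do not exist in the PostgreSQL source):

```python
# Toy model of the FK-based inner-join removal check described above.
# Hypothetical names throughout; the real logic would live in the
# planner's join-removal code, not behind an API like this.
from dataclasses import dataclass

@dataclass
class ForeignKey:
    referencing_cols: tuple  # columns on the FK (referencing) side
    referenced_cols: tuple   # columns on the PK/unique (referenced) side
    not_null: dict           # referencing column -> declared NOT NULL?

def removal_plan(fk, join_cols, rhs_referenced_above):
    """Decide whether the inner join is removable and which IS NOT NULL
    filters must be injected on the referencing side."""
    # The RHS rel must not be referenced above the join.
    if rhs_referenced_above:
        return None
    # The join condition must match the FK column-for-column, so each
    # LHS row joins to exactly the one RHS row its FK points at.
    if set(join_cols) != set(zip(fk.referencing_cols, fk.referenced_cols)):
        return None
    # A NULL FK column matches no RHS row, so the inner join would drop
    # that row; injecting IS NOT NULL preserves semantics after removal.
    filters = [c for c in fk.referencing_cols if not fk.not_null[c]]
    return {"removable": True, "is_not_null_filters": filters}

# orders.user_id references users.id and is nullable, as in the example:
fk = ForeignKey(("user_id",), ("id",), {"user_id": False})
print(removal_plan(fk, [("user_id", "id")], rhs_referenced_above=False))
# -> {'removable': True, 'is_not_null_filters': ['user_id']}
```

This mirrors the plan shown above: the join to users disappears and only the user_id IS NOT NULL filter remains.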
One interesting aspect of this patch is that we can actually allow the
referenced key columns to be used in equivalence classes (ECs) even
above the join, which can help bridge multi-hop joins. This is pretty
useful for the brand-new graph queries.
As an example, consider:
CREATE TABLE knows (
edge_id int primary key,
src_id int not null references users(id),
dst_id int not null references users(id),
creation_date date
);
CREATE PROPERTY GRAPH social_graph
VERTEX TABLES (users)
EDGE TABLES (knows
SOURCE KEY (src_id) REFERENCES users(id)
DESTINATION KEY (dst_id) REFERENCES users(id)
);
Suppose we only want to find the creation dates of a 3-hop connection
chain:
EXPLAIN (COSTS OFF)
SELECT e1_date, e2_date, e3_date
FROM GRAPH_TABLE (social_graph
MATCH (u1 IS users)
-[e1 IS knows]-> (u2 IS users)
-[e2 IS knows]-> (u3 IS users)
-[e3 IS knows]-> (u4 IS users)
COLUMNS (
e1.creation_date AS e1_date,
e2.creation_date AS e2_date,
e3.creation_date AS e3_date
)
);
QUERY PLAN
----------------------------------------------------
Hash Join
Hash Cond: (knows_1.dst_id = knows_2.src_id)
-> Hash Join
Hash Cond: (knows.dst_id = knows_1.src_id)
-> Seq Scan on knows
-> Hash
-> Seq Scan on knows knows_1
-> Hash
-> Seq Scan on knows knows_2
(9 rows)
As a result, all user nodes are removed, and this query is reduced
from a 7-table join down to a 3-table join.
Please see the draft commit message in the attached patch for the
implementation details. Any thoughts and feedback are very welcome.
- Richard
Attachments
On Sat, Mar 21, 2026 at 11:47:03AM +0900, Richard Guo wrote:
> Please see the draft commit message in the attached patch for the
> implementation details. Any thoughts and feedback are very welcome.

You did not mention anything about the version that would be impacted
by this change. At this stage of the release cycle, is it right to
imply that this would be for v20 and not v19?
--
Michael
On Sat, Mar 21, 2026 at 4:42 PM Michael Paquier <michael@paquier.xyz> wrote:
> You did not mention anything about the version that would be impacted
> by this change. At this stage of the release cycle, is it right to
> imply that this would be for v20 and not v19?

Right. This is for v20.

- Richard
On Sat, 21 Mar 2026 at 15:47, Richard Guo <guofenglinux@gmail.com> wrote:
> Currently, the planner can remove useless left joins if the join
> condition cannot match more than one RHS row, and the RHS rel is not
> referenced above the join. I'd like to propose a similar optimization
> for inner joins.

I tried this many years ago and it was pretty much a dead end because
the current foreign key implementation defers the cascade of the
foreign key until the end of the query.

There's plenty of discussion. See [1] and [2]. In particular, read and
follow along from what Heikki mentions in [3]. You should read all of
that and understand it to prevent prompting the same discussions all
over again.

It doesn't look like your patch has anything to protect against any of
the issues mentioned in [1], so I assume you weren't aware of that
work.

David

[1] https://www.postgresql.org/message-id/flat/CAApHDvpDXXvKE%2B%3Dug1kA--nKfa%3DbjrjvK8Gp9G8UYwv6nHckVg%40mail.gmail.com#2f33cb1e196e863eba0e1376084cdf4d
[2] https://www.postgresql.org/message-id/CAApHDvocUEYdt1uT%2BDLDPs2xEu%3Dv3qJGT6HeXKonQM4rY_OsSA%40mail.gmail.com
[3] https://www.postgresql.org/message-id/54240C51.8040304%40vmware.com
On Sun, Mar 22, 2026 at 6:09 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Sat, 21 Mar 2026 at 15:47, Richard Guo <guofenglinux@gmail.com> wrote:
> > Currently, the planner can remove useless left joins if the join
> > condition cannot match more than one RHS row, and the RHS rel is not
> > referenced above the join. I'd like to propose a similar optimization
> > for inner joins.

> I tried this many years ago and it was pretty much a dead end with how
> the current foreign key implementation deferring the cascade of the
> foreign key until the end of the query.

Thanks for pointing this out! I failed to find the prior work, and I
missed the fatal flaw introduced by the AFTER ROW trigger mechanism
for foreign key constraints. I had been making a mental analogy to
UNIQUE and NOT NULL constraints, but those are enforced immediately at
the heap/B-tree level.

Just for the sake of the archives, the timeline of the trap during a
cascading delete looks like this:

T0: DELETE FROM users WHERE id = 1;

T1: The executor finds users row 1 and sets its xmax, physically
marking it as dead.

T2: [The Gap] The executor pushes the RI trigger into a queue to deal
with orders later. Right now, the orders row still exists, but its
referenced row in users is dead.

T3: The statement finishes, the trigger fires, and the orders row is
finally deleted.

The "T2 Gap" is small, but there are several ways to execute user code
inside that window, such as RETURNING clauses, volatile functions, or
user-defined AFTER ROW triggers.

Since the planner operates on static logical schema guarantees and
cannot predict dynamic execution-time trigger queues, it seems any
plan-time optimization that relies on foreign keys for correctness is
effectively a dead end. Maybe the only solution would be to handle
the join removal in the executor (where the trigger state is known),
but I noticed you explored that a decade ago and it seems far too
invasive.

Thanks again for the save. You saved me a lot of time and effort
chasing a dead end.

- Richard
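The anomaly in that T2 gap can be demonstrated with a toy MVCC model (a deliberately simplified Python sketch, not PostgreSQL internals; the data and function names are invented for illustration):

```python
# Toy model of the "T2 gap": the parent row is already dead, but the
# RI trigger that cascades the delete to the child has not fired yet.
users = [{"id": 1, "dead": False}]
orders = [{"id": 10, "user_id": 1}]
ri_trigger_queue = []

def delete_user(uid):
    # T1: mark the parent row dead; T2: queue the cascade for later.
    for u in users:
        if u["id"] == uid:
            u["dead"] = True
    ri_trigger_queue.append(("cascade_delete_orders", uid))

def query_with_join():
    # The original plan: orders inner-joined to visible users rows.
    live = {u["id"] for u in users if not u["dead"]}
    return [o for o in orders if o["user_id"] in live]

def query_join_removed():
    # The "optimized" plan: join removed, only the NOT NULL filter kept.
    return [o for o in orders if o["user_id"] is not None]

delete_user(1)
# Inside the gap (trigger queue not yet drained), the plans disagree:
print(len(query_with_join()))     # 0 rows: parent row is dead
print(len(query_join_removed()))  # 1 row: cascade has not fired yet
```

Any user code that runs inside that window (RETURNING, a volatile function, a user AFTER ROW trigger) can observe the two plans producing different answers, which is exactly why the FK guarantee cannot be trusted at plan time.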
Hi
On Mon, Mar 23, 2026 at 7:13 AM, Richard Guo <guofenglinux@gmail.com> wrote:
> On Sun, Mar 22, 2026 at 6:09 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > On Sat, 21 Mar 2026 at 15:47, Richard Guo <guofenglinux@gmail.com> wrote:
> > > Currently, the planner can remove useless left joins if the join
> > > condition cannot match more than one RHS row, and the RHS rel is not
> > > referenced above the join. I'd like to propose a similar optimization
> > > for inner joins.
> > I tried this many years ago and it was pretty much a dead end with how
> > the current foreign key implementation deferring the cascade of the
> > foreign key until the end of the query.
>
> Thanks for pointing this out! I failed to find the prior work, and I
> missed the fatal flaw introduced by the AFTER ROW trigger mechanism
> for foreign key constraints. I had been making a mental analogy to
> UNIQUE and NOT NULL constraints, but those are enforced immediately at
> the heap/B-tree level.
>
> Just for the sake of the archives, the timeline of the trap during a
> cascading delete looks like this:
>
> T0: DELETE FROM users WHERE id = 1;
>
> T1: The executor finds users row 1 and sets its xmax, physically
> marking it as dead.
>
> T2: [The Gap] The executor pushes the RI trigger into a queue to deal
> with orders later. Right now, the orders row still exists, but its
> referenced row in users is dead.
>
> T3: The statement finishes, the trigger fires, and the orders row is
> finally deleted.
>
> The "T2 Gap" is small, but there are several ways to execute user code
> inside that window, such as RETURNING clauses, volatile functions, or
> user-defined AFTER ROW triggers.
>
> Since the planner operates on static logical schema guarantees and
> cannot predict dynamic execution-time trigger queues, it seems any
> plan-time optimization that relies on foreign keys for correctness is
> effectively a dead end. Maybe the only solution would be to handle
> the join removal in the executor (where the trigger state is known),
> but I noticed you explored that a decade ago and it seems far too
> invasive.
>
> Thanks again for the save. You saved me a lot of time and effort
> chasing a dead end.

Maybe you can push this analysis to some README in the code.

Regards

Pavel

> - Richard
On Mon, Mar 23, 2026 at 3:12 PM Richard Guo <guofenglinux@gmail.com> wrote:
> Just for the sake of the archives, the timeline of the trap during a
> cascading delete looks like this:
>
> T0: DELETE FROM users WHERE id = 1;
>
> T1: The executor finds users row 1 and sets its xmax, physically
> marking it as dead.
>
> T2: [The Gap] The executor pushes the RI trigger into a queue to deal
> with orders later. Right now, the orders row still exists, but its
> referenced row in users is dead.
>
> T3: The statement finishes, the trigger fires, and the orders row is
> finally deleted.
>
> The "T2 Gap" is small, but there are several ways to execute user code
> inside that window, such as RETURNING clauses, volatile functions, or
> user-defined AFTER ROW triggers.

I've been exploring ways to detect whether a query is executing within
the RI trigger gap. It seems one promising approach is to leverage the
lock manager. Every DML statement acquires RowExclusiveLock on its
target table before modifying any rows, so if the trigger gap is
active, the lock is guaranteed to be held. We can verify this during
planning by calling CheckRelationOidLockedByMe(), which is quite an
efficient check. (As a point of precedent, the planner already relies
on the local lock manager's state in several existing code paths.)

For prepared statements and cached plans, a generic plan that was
built with FK join removal could be reused in a context where the
trigger gap is active. To handle this, we can modify
choose_custom_plan() to check whether any relation involved in an
FK-based join removal currently holds RowExclusiveLock, and choose a
custom plan if so.

The trade-off is false positives: because RowExclusiveLock persists
for the entire transaction while the trigger gap is intra-statement,
the optimization is also skipped after the DML completes but within
the same transaction, after ROLLBACK TO a savepoint, or when
RowExclusiveLock is held for other reasons (e.g., LOCK TABLE). These
seem like uncommon cases, and the query simply falls back to executing
the join normally, as it would without the optimization at all, so I
think this is an acceptable trade-off.

Please see the v2 patch for the implementation details. I didn't find
any mention of this approach in the 2014 thread. I'd appreciate any
thoughts or feedback on this direction.

- Richard
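The proposed heuristic can be sketched as a toy model (Python for illustration only; check_relation_locked_by_me below merely mimics the shape of the real C function CheckRelationOidLockedByMe, and the lock bookkeeping is invented for this sketch):

```python
# Toy sketch of the lock-based gap detection: skip FK join removal
# whenever any involved relation holds RowExclusiveLock, since a DML
# statement (and hence a possible trigger gap) may be in progress.
ROW_EXCLUSIVE = "RowExclusiveLock"

# relation name -> set of lock modes held by this backend (toy state
# standing in for the local lock manager)
held_locks = {}

def check_relation_locked_by_me(rel, mode):
    return mode in held_locks.get(rel, set())

def can_apply_fk_join_removal(fk_rels):
    # Conservative: any held RowExclusiveLock disables the optimization.
    # This yields false positives for the rest of the transaction (e.g.
    # after the DML completes, or after LOCK TABLE), as noted above.
    return not any(check_relation_locked_by_me(r, ROW_EXCLUSIVE)
                   for r in fk_rels)

print(can_apply_fk_join_removal({"users", "orders"}))  # True: no DML active

held_locks["users"] = {ROW_EXCLUSIVE}   # a DELETE on users is in flight
print(can_apply_fk_join_removal({"users", "orders"}))  # False: fall back
```

The same predicate would be consulted both at plan time and (per the choose_custom_plan() idea) when deciding whether a cached generic plan built with the removal is safe to reuse.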