Обсуждение: Re: Eliminating SPI / SQL from some RI triggers - take 3
On Fri, Dec 20, 2024 at 1:23 PM Amit Langote <amitlangote09@gmail.com> wrote: > We discussed $subject at [1] and [2] and I'd like to continue that > work with the hope to commit some part of it for v18. I did not get a chance to do any further work on this in this cycle, but plan to start working on it after beta release, so moving this to the next CF. I will post a rebased patch after the freeze to keep the bots green for now. -- Thanks, Amit Langote
On Thu, Apr 3, 2025 at 7:19 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Fri, Dec 20, 2024 at 1:23 PM Amit Langote <amitlangote09@gmail.com> wrote: > > We discussed $subject at [1] and [2] and I'd like to continue that > > work with the hope to commit some part of it for v18. > > I did not get a chance to do any further work on this in this cycle, > but plan to start working on it after beta release, so moving this to > the next CF. I will post a rebased patch after the freeze to keep the > bots green for now. Sorry for the inactivity. I've moved the patch entry in the CF app to PG19-Drafts, since I don't plan to work on it myself in the immediate future. However, Junwang Zhao has expressed interest in taking this work forward, and I look forward to working with him on it. -- Thanks, Amit Langote
Hi
út 21. 10. 2025 v 6:07 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
On Thu, Apr 3, 2025 at 7:19 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Fri, Dec 20, 2024 at 1:23 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > We discussed $subject at [1] and [2] and I'd like to continue that
> > work with the hope to commit some part of it for v18.
>
> I did not get a chance to do any further work on this in this cycle,
> but plan to start working on it after beta release, so moving this to
> the next CF. I will post a rebased patch after the freeze to keep the
> bots green for now.
Sorry for the inactivity. I've moved the patch entry in the CF app to
PG19-Drafts, since I don't plan to work on it myself in the immediate
future. However, Junwang Zhao has expressed interest in taking this
work forward, and I look forward to working with him on it.
This is very interesting and important feature - I can help with testing and review if it will be necessary
Regards
Pavel
--
Thanks, Amit Langote
. On Tue, Oct 21, 2025 at 2:10 PM Pavel Stehule <pavel.stehule@gmail.com> wrote: > út 21. 10. 2025 v 6:07 odesílatel Amit Langote <amitlangote09@gmail.com> napsal: >> >> On Thu, Apr 3, 2025 at 7:19 PM Amit Langote <amitlangote09@gmail.com> wrote: >> > On Fri, Dec 20, 2024 at 1:23 PM Amit Langote <amitlangote09@gmail.com> wrote: >> > > We discussed $subject at [1] and [2] and I'd like to continue that >> > > work with the hope to commit some part of it for v18. >> > >> > I did not get a chance to do any further work on this in this cycle, >> > but plan to start working on it after beta release, so moving this to >> > the next CF. I will post a rebased patch after the freeze to keep the >> > bots green for now. >> >> Sorry for the inactivity. I've moved the patch entry in the CF app to >> PG19-Drafts, since I don't plan to work on it myself in the immediate >> future. However, Junwang Zhao has expressed interest in taking this >> work forward, and I look forward to working with him on it. > > > This is very interesting and important feature - I can help with testing and review if it will be necessary Thanks for the interest. Just to add a quick note on the current direction I’ve been discussing off-list with Junwang: The next iteration of this work will likely follow a hybrid "fast-path + fallback" design rather than the original pure fast-path approach. The idea is to keep the optimization for straightforward cases where the foreign key and referenced key can be verified by a direct index probe, while falling back to the existing SPI path only when the runtime behavior of the executor is non-trivial to replicate -- such as visibility rechecks under concurrent updates -- or when the constraint itself involves richer semantics, like temporal foreign keys that require range and aggregation logic. That keeps the optimization safe without changing the meaning of constraint enforcement. This direction comes partly in response to the feedback from Robert and Tom in the earlier Eliminating SPI threads, who raised concerns that a fast path might silently diverge from what the executor does at runtime in subtle cases. The fallback design aims to address that directly: it keeps the optimization where it’s clearly safe, but defers to the existing SPI-based implementation whenever correctness might depend on executor behavior that would otherwise be difficult or risky to reproduce locally. In practice, this means adding a guarded fast path that performs the index probe and tuple lock directly under the same snapshot and security context that SPI would use, while caching stable metadata such as index descriptors, scan keys, and operator information per constraint or per statement. The fallback to SPI remains for the few cases that either depend on executor behavior or need features beyond a simple index probe: * Concurrent updates or deletes: If table_tuple_lock() reports that the target tuple was updated or deleted, we delegate to the SPI path so that EvalPlanQual and visibility rules are applied as today. * Partitioned parents: Skipped in v1 for simplicity, since they require routing the probe through the correct partition using PartitionDirectory. This can be added later as a separate patch once the core mechanism is stable. * Temporal foreign keys: These use range overlap and containment semantics (&&, <@, range_agg()) that inherently involve aggregation and multiple-row reasoning, so they stay on the SPI path. Everything else -- multi-column keys, cross-type equality supported by the index opfamily, collation matching, and RLS/ACL enforcement -- will be handled directly in the fast path. The security behavior will mirror the existing SPI path by temporarily switching to the parent table's owner with SECURITY_LOCAL_USERID_CHANGE | SECURITY_NOFORCE_RLS around the probe, like ri_PerformCheck() does. For concurrency, the fast path locks the located parent tuple with LockTupleKeyShare under GetActiveSnapshot(). If that succeeds (TM_Ok), the check passes immediately. While non-TM_Ok cases fall back for now, a later refinement could follow the update chain with table_tuple_fetch_row_version() under the current snapshot and re-lock the visible version, making the fast path fully self-contained. That’s the direction Junwang and I plan to explore next. -- Thanks, Amit Langote
Hi, On Wed, Oct 22, 2025 at 9:56 PM Amit Langote <amitlangote09@gmail.com> wrote: > > . > On Tue, Oct 21, 2025 at 2:10 PM Pavel Stehule <pavel.stehule@gmail.com> wrote: > > út 21. 10. 2025 v 6:07 odesílatel Amit Langote <amitlangote09@gmail.com> napsal: > >> > >> On Thu, Apr 3, 2025 at 7:19 PM Amit Langote <amitlangote09@gmail.com> wrote: > >> > On Fri, Dec 20, 2024 at 1:23 PM Amit Langote <amitlangote09@gmail.com> wrote: > >> > > We discussed $subject at [1] and [2] and I'd like to continue that > >> > > work with the hope to commit some part of it for v18. > >> > > >> > I did not get a chance to do any further work on this in this cycle, > >> > but plan to start working on it after beta release, so moving this to > >> > the next CF. I will post a rebased patch after the freeze to keep the > >> > bots green for now. > >> > >> Sorry for the inactivity. I've moved the patch entry in the CF app to > >> PG19-Drafts, since I don't plan to work on it myself in the immediate > >> future. However, Junwang Zhao has expressed interest in taking this > >> work forward, and I look forward to working with him on it. > > > > > > This is very interesting and important feature - I can help with testing and review if it will be necessary > > Thanks for the interest. > > Just to add a quick note on the current direction I’ve been discussing > off-list with Junwang: > > The next iteration of this work will likely follow a hybrid "fast-path > + fallback" design rather than the original pure fast-path approach. > The idea is to keep the optimization for straightforward cases where > the foreign key and referenced key can be verified by a direct index > probe, while falling back to the existing SPI path only when the > runtime behavior of the executor is non-trivial to replicate -- such > as visibility rechecks under concurrent updates -- or when the > constraint itself involves richer semantics, like temporal foreign > keys that require range and aggregation logic. That keeps the > optimization safe without changing the meaning of constraint > enforcement. > > This direction comes partly in response to the feedback from Robert > and Tom in the earlier Eliminating SPI threads, who raised concerns > that a fast path might silently diverge from what the executor does at > runtime in subtle cases. The fallback design aims to address that > directly: it keeps the optimization where it’s clearly safe, but > defers to the existing SPI-based implementation whenever correctness > might depend on executor behavior that would otherwise be difficult or > risky to reproduce locally. > > In practice, this means adding a guarded fast path that performs the > index probe and tuple lock directly under the same snapshot and > security context that SPI would use, while caching stable metadata > such as index descriptors, scan keys, and operator information per > constraint or per statement. The fallback to SPI remains for the few > cases that either depend on executor behavior or need features beyond > a simple index probe: > > * Concurrent updates or deletes: If table_tuple_lock() reports that > the target tuple was updated or deleted, we delegate to the SPI path > so that EvalPlanQual and visibility rules are applied as today. > > * Partitioned parents: Skipped in v1 for simplicity, since they > require routing the probe through the correct partition using > PartitionDirectory. This can be added later as a separate patch once > the core mechanism is stable. > > * Temporal foreign keys: These use range overlap and containment > semantics (&&, <@, range_agg()) that inherently involve aggregation > and multiple-row reasoning, so they stay on the SPI path. > > Everything else -- multi-column keys, cross-type equality supported by > the index opfamily, collation matching, and RLS/ACL enforcement -- > will be handled directly in the fast path. The security behavior will > mirror the existing SPI path by temporarily switching to the parent > table's owner with SECURITY_LOCAL_USERID_CHANGE | SECURITY_NOFORCE_RLS > around the probe, like ri_PerformCheck() does. > > For concurrency, the fast path locks the located parent tuple with > LockTupleKeyShare under GetActiveSnapshot(). If that succeeds (TM_Ok), > the check passes immediately. While non-TM_Ok cases fall back for now, > a later refinement could follow the update chain with > table_tuple_fetch_row_version() under the current snapshot and re-lock > the visible version, making the fast path fully self-contained. > > That’s the direction Junwang and I plan to explore next. > > -- > Thanks, Amit Langote As Amit has already stated, we are approaching a hybrid "fast-path + fallback" design. 0001 adds a fast path optimization for foreign key constraint checks that bypasses the SPI executor, the fast path applies when the referenced table is not partitioned, and the constraint does not involve temporal semantics. With the following test: create table pk (a numeric primary key); create table fk (a bigint references pk); insert into pk select generate_series(1, 2000000); head: [local] zhjwpku@postgres:5432-90419=# insert into fk select generate_series(1, 2000000, 2); INSERT 0 1000000 Time: 13516.177 ms (00:13.516) [local] zhjwpku@postgres:5432-90419=# update fk set a = a + 1; UPDATE 1000000 Time: 15057.638 ms (00:15.058) patched: [local] zhjwpku@postgres:5432-98673=# insert into fk select generate_series(1, 2000000, 2); INSERT 0 1000000 Time: 8248.777 ms (00:08.249) [local] zhjwpku@postgres:5432-98673=# update fk set a = a + 1; UPDATE 1000000 Time: 10117.002 ms (00:10.117) 0002 cache fast-path metadata used by the index probe, at the current time only comparison operator hash entries, operator function OIDs and strategy numbers and subtypes for index scans. But this cache doesn't buy any performance improvement. Caching additional metadata should improve performance for foreign key checks. Amit suggested introducing a mechanism for ri_triggers.c to register a cleanup callback in the EState, which AfterTriggerEndQuery() could then invoke to release per-statement cached metadata (such as the IndexScanDesc). However, I haven't been able to implement this mechanism yet. Amit and I agree that we can post the patches here for review now. We are continuing to work on improving the metadata cache implementation. -- Regards Junwang Zhao
Вложения
Hi Junwang, On Mon, Dec 1, 2025 at 3:09 PM Junwang Zhao <zhjwpku@gmail.com> wrote: > As Amit has already stated, we are approaching a hybrid "fast-path + fallback" > design. > > 0001 adds a fast path optimization for foreign key constraint checks > that bypasses the SPI executor, the fast path applies when the referenced > table is not partitioned, and the constraint does not involve temporal > semantics. > > With the following test: > > create table pk (a numeric primary key); > create table fk (a bigint references pk); > insert into pk select generate_series(1, 2000000); > > head: > > [local] zhjwpku@postgres:5432-90419=# insert into fk select > generate_series(1, 2000000, 2); > INSERT 0 1000000 > Time: 13516.177 ms (00:13.516) > > [local] zhjwpku@postgres:5432-90419=# update fk set a = a + 1; > UPDATE 1000000 > Time: 15057.638 ms (00:15.058) > > patched: > > [local] zhjwpku@postgres:5432-98673=# insert into fk select > generate_series(1, 2000000, 2); > INSERT 0 1000000 > Time: 8248.777 ms (00:08.249) > > [local] zhjwpku@postgres:5432-98673=# update fk set a = a + 1; > UPDATE 1000000 > Time: 10117.002 ms (00:10.117) > > 0002 cache fast-path metadata used by the index probe, at the current > time only comparison operator hash entries, operator function OIDs > and strategy numbers and subtypes for index scans. But this cache > doesn't buy any performance improvement. > > Caching additional metadata should improve performance for foreign key checks. > > Amit suggested introducing a mechanism for ri_triggers.c to register a > cleanup callback in the EState, which AfterTriggerEndQuery() could then > invoke to release per-statement cached metadata (such as the IndexScanDesc). > However, I haven't been able to implement this mechanism yet. Thanks for working on this. I've taken your patches as a starting point and reworked the series into two patches (attached): 1st is your 0001+0002 as the core patch that adds a gated fast-path alternative to SPI and 2nd where I added per-statement resource caching. Doing the latter turned out to be not so hard thanks to the structure you chose to build the core fast path. Good call on adding the RLS and ACL test cases, btw. So, 0001 is a functionally complete fast path: concurrency handling, REPEATABLE READ crosscheck, cross-type operators, security context, and metadata caching. 0002 implements the per-statement resource caching we discussed, though instead of sharing the EState between trigger.c and ri_triggers.c it uses a new AfterTriggerBatchCallback mechanism that fires at the end of each trigger-firing cycle (per-statement for immediate constraints, or until COMMIT for deferred ones). It layers resource caching on top so that the PK relation, index, scan descriptor, and snapshot stay open across all FK trigger invocations within a single trigger-firing cycle rather than being opened and closed per row. Note that phe previous 0002 (metadata caching) is folded into 0001, and most of the new fast-path logic added in 0001 now lives in ri_FastPathCheck() rather than inline in RI_FKey_check(), so the RI_FKey_check diff is just the gating call and SPI fallback. I re-ran the benchmarks (same test as yours, different machine): create table pk (a numeric primary key); create table fk (a bigint references pk); insert into pk select generate_series(1, 2000000); insert into fk select generate_series(1, 2000000, 2); master: 2444 ms (median of 3 runs) 0001: 1382 ms (43% faster) 0001+0002: 1202 ms (51% faster, 13% over 0001 alone) Also, with int PK / int FK (1M rows): create table pk (a int primary key); create table fk (a int references pk); insert into pk select generate_series(1, 1000000); insert into fk select generate_series(1, 1000000); master: 1000 ms 0001: 520 ms (48% faster) 0001+0002: 432 ms (57% faster, 17% over 0001 alone) The incremental gain from 0002 comes from eliminating per-row relation open/close, scan begin/end, slot alloc/free, and replacing per-row GetSnapshotData() with only curcid adjustment on the registered snapshot copy in the cache. The two current limitations are partitioned referenced tables and temporal foreign keys. Partitioned PKs are relatively uncommon in practice, so the non-partitioned case should cover most FK workloads, so I'm not sure it's worth the added complexity to support them. Temporal FKs are inherently multi-row, so they're a poor fit for a single-probe fast path. David Rowley mentioned off-list that it might be worth batching multiple FK values into a single index probe, leveraging the ScalarArrayOp btree improvements from PostgreSQL 17. The idea would be to buffer FK values across trigger invocations in the per-constraint cache (0002 already has the right structure for this), build a SK_SEARCHARRAY scan key, and let the btree AM walk the matching leaf pages in one sorted traversal instead of one tree descent per row. The locking and recheck would still be per-tuple, but the index traversal cost drops significantly. Single-column FKs are the obvious starting point. That seems worth exploring but can be done as a separate patch on top of this. I think the series is in reasonable shape but would appreciate extra eyeballs, especially on the concurrency handling in ri_LockPKTuple() in 0001 and the snapshot lifecycle in 0002. Or anything else that catches one's eye. -- Thanks, Amit Langote