Обсуждение: Re: Eliminating SPI / SQL from some RI triggers - take 3

Поиск
Список
Период
Сортировка

Re: Eliminating SPI / SQL from some RI triggers - take 3

От
Amit Langote
Дата:
On Fri, Dec 20, 2024 at 1:23 PM Amit Langote <amitlangote09@gmail.com> wrote:
> We discussed $subject at [1] and [2] and I'd like to continue that
> work with the hope to commit some part of it for v18.

I did not get a chance to do any further work on this in this cycle,
but plan to start working on it after beta release, so moving this to
the next CF.  I will post a rebased patch after the freeze to keep the
bots green for now.

--
Thanks, Amit Langote



Re: Eliminating SPI / SQL from some RI triggers - take 3

От
Amit Langote
Дата:
On Thu, Apr 3, 2025 at 7:19 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Fri, Dec 20, 2024 at 1:23 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > We discussed $subject at [1] and [2] and I'd like to continue that
> > work with the hope to commit some part of it for v18.
>
> I did not get a chance to do any further work on this in this cycle,
> but plan to start working on it after beta release, so moving this to
> the next CF.  I will post a rebased patch after the freeze to keep the
> bots green for now.

Sorry for the inactivity. I've moved the patch entry in the CF app to
PG19-Drafts, since I don't plan to work on it myself in the immediate
future. However, Junwang Zhao has expressed interest in taking this
work forward, and I look forward to working with him on it.

--
Thanks, Amit Langote



Re: Eliminating SPI / SQL from some RI triggers - take 3

От
Pavel Stehule
Дата:
Hi

út 21. 10. 2025 v 6:07 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
On Thu, Apr 3, 2025 at 7:19 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Fri, Dec 20, 2024 at 1:23 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > We discussed $subject at [1] and [2] and I'd like to continue that
> > work with the hope to commit some part of it for v18.
>
> I did not get a chance to do any further work on this in this cycle,
> but plan to start working on it after beta release, so moving this to
> the next CF.  I will post a rebased patch after the freeze to keep the
> bots green for now.

Sorry for the inactivity. I've moved the patch entry in the CF app to
PG19-Drafts, since I don't plan to work on it myself in the immediate
future. However, Junwang Zhao has expressed interest in taking this
work forward, and I look forward to working with him on it.

This is very interesting and important feature - I can help with testing and review if it will be necessary

Regards

Pavel

 

--
Thanks, Amit Langote


Re: Eliminating SPI / SQL from some RI triggers - take 3

От
Amit Langote
Дата:
.
On Tue, Oct 21, 2025 at 2:10 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> út 21. 10. 2025 v 6:07 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
>>
>> On Thu, Apr 3, 2025 at 7:19 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> > On Fri, Dec 20, 2024 at 1:23 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> > > We discussed $subject at [1] and [2] and I'd like to continue that
>> > > work with the hope to commit some part of it for v18.
>> >
>> > I did not get a chance to do any further work on this in this cycle,
>> > but plan to start working on it after beta release, so moving this to
>> > the next CF.  I will post a rebased patch after the freeze to keep the
>> > bots green for now.
>>
>> Sorry for the inactivity. I've moved the patch entry in the CF app to
>> PG19-Drafts, since I don't plan to work on it myself in the immediate
>> future. However, Junwang Zhao has expressed interest in taking this
>> work forward, and I look forward to working with him on it.
>
>
> This is very interesting and important feature - I can help with testing and review if it will be necessary

Thanks for the interest.

Just to add a quick note on the current direction I’ve been discussing
off-list with Junwang:

The next iteration of this work will likely follow a hybrid "fast-path
+ fallback" design rather than the original pure fast-path approach.
The idea is to keep the optimization for straightforward cases where
the foreign key and referenced key can be verified by a direct index
probe, while falling back to the existing SPI path only when the
runtime behavior of the executor is non-trivial to replicate -- such
as visibility rechecks under concurrent updates -- or when the
constraint itself involves richer semantics, like temporal foreign
keys that require range and aggregation logic. That keeps the
optimization safe without changing the meaning of constraint
enforcement.

This direction comes partly in response to the feedback from Robert
and Tom in the earlier Eliminating SPI threads, who raised concerns
that a fast path might silently diverge from what the executor does at
runtime in subtle cases. The fallback design aims to address that
directly: it keeps the optimization where it’s clearly safe, but
defers to the existing SPI-based implementation whenever correctness
might depend on executor behavior that would otherwise be difficult or
risky to reproduce locally.

In practice, this means adding a guarded fast path that performs the
index probe and tuple lock directly under the same snapshot and
security context that SPI would use, while caching stable metadata
such as index descriptors, scan keys, and operator information per
constraint or per statement. The fallback to SPI remains for the few
cases that either depend on executor behavior or need features beyond
a simple index probe:

* Concurrent updates or deletes: If table_tuple_lock() reports that
the target tuple was updated or deleted, we delegate to the SPI path
so that EvalPlanQual and visibility rules are applied as today.

* Partitioned parents: Skipped in v1 for simplicity, since they
require routing the probe through the correct partition using
PartitionDirectory. This can be added later as a separate patch once
the core mechanism is stable.

* Temporal foreign keys: These use range overlap and containment
semantics (&&, <@, range_agg()) that inherently involve aggregation
and multiple-row reasoning, so they stay on the SPI path.

Everything else -- multi-column keys, cross-type equality supported by
the index opfamily, collation matching, and RLS/ACL enforcement --
will be handled directly in the fast path.  The security behavior will
mirror the existing SPI path by temporarily switching to the parent
table's owner with SECURITY_LOCAL_USERID_CHANGE | SECURITY_NOFORCE_RLS
around the probe, like ri_PerformCheck() does.

For concurrency, the fast path locks the located parent tuple with
LockTupleKeyShare under GetActiveSnapshot(). If that succeeds (TM_Ok),
the check passes immediately. While non-TM_Ok cases fall back for now,
a later refinement could follow the update chain with
table_tuple_fetch_row_version() under the current snapshot and re-lock
the visible version, making the fast path fully self-contained.

That’s the direction Junwang and I plan to explore next.

--
Thanks, Amit Langote



Re: Eliminating SPI / SQL from some RI triggers - take 3

От
Junwang Zhao
Дата:
Hi,

On Wed, Oct 22, 2025 at 9:56 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> .
> On Tue, Oct 21, 2025 at 2:10 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > út 21. 10. 2025 v 6:07 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
> >>
> >> On Thu, Apr 3, 2025 at 7:19 PM Amit Langote <amitlangote09@gmail.com> wrote:
> >> > On Fri, Dec 20, 2024 at 1:23 PM Amit Langote <amitlangote09@gmail.com> wrote:
> >> > > We discussed $subject at [1] and [2] and I'd like to continue that
> >> > > work with the hope to commit some part of it for v18.
> >> >
> >> > I did not get a chance to do any further work on this in this cycle,
> >> > but plan to start working on it after beta release, so moving this to
> >> > the next CF.  I will post a rebased patch after the freeze to keep the
> >> > bots green for now.
> >>
> >> Sorry for the inactivity. I've moved the patch entry in the CF app to
> >> PG19-Drafts, since I don't plan to work on it myself in the immediate
> >> future. However, Junwang Zhao has expressed interest in taking this
> >> work forward, and I look forward to working with him on it.
> >
> >
> > This is very interesting and important feature - I can help with testing and review if it will be necessary
>
> Thanks for the interest.
>
> Just to add a quick note on the current direction I’ve been discussing
> off-list with Junwang:
>
> The next iteration of this work will likely follow a hybrid "fast-path
> + fallback" design rather than the original pure fast-path approach.
> The idea is to keep the optimization for straightforward cases where
> the foreign key and referenced key can be verified by a direct index
> probe, while falling back to the existing SPI path only when the
> runtime behavior of the executor is non-trivial to replicate -- such
> as visibility rechecks under concurrent updates -- or when the
> constraint itself involves richer semantics, like temporal foreign
> keys that require range and aggregation logic. That keeps the
> optimization safe without changing the meaning of constraint
> enforcement.
>
> This direction comes partly in response to the feedback from Robert
> and Tom in the earlier Eliminating SPI threads, who raised concerns
> that a fast path might silently diverge from what the executor does at
> runtime in subtle cases. The fallback design aims to address that
> directly: it keeps the optimization where it’s clearly safe, but
> defers to the existing SPI-based implementation whenever correctness
> might depend on executor behavior that would otherwise be difficult or
> risky to reproduce locally.
>
> In practice, this means adding a guarded fast path that performs the
> index probe and tuple lock directly under the same snapshot and
> security context that SPI would use, while caching stable metadata
> such as index descriptors, scan keys, and operator information per
> constraint or per statement. The fallback to SPI remains for the few
> cases that either depend on executor behavior or need features beyond
> a simple index probe:
>
> * Concurrent updates or deletes: If table_tuple_lock() reports that
> the target tuple was updated or deleted, we delegate to the SPI path
> so that EvalPlanQual and visibility rules are applied as today.
>
> * Partitioned parents: Skipped in v1 for simplicity, since they
> require routing the probe through the correct partition using
> PartitionDirectory. This can be added later as a separate patch once
> the core mechanism is stable.
>
> * Temporal foreign keys: These use range overlap and containment
> semantics (&&, <@, range_agg()) that inherently involve aggregation
> and multiple-row reasoning, so they stay on the SPI path.
>
> Everything else -- multi-column keys, cross-type equality supported by
> the index opfamily, collation matching, and RLS/ACL enforcement --
> will be handled directly in the fast path.  The security behavior will
> mirror the existing SPI path by temporarily switching to the parent
> table's owner with SECURITY_LOCAL_USERID_CHANGE | SECURITY_NOFORCE_RLS
> around the probe, like ri_PerformCheck() does.
>
> For concurrency, the fast path locks the located parent tuple with
> LockTupleKeyShare under GetActiveSnapshot(). If that succeeds (TM_Ok),
> the check passes immediately. While non-TM_Ok cases fall back for now,
> a later refinement could follow the update chain with
> table_tuple_fetch_row_version() under the current snapshot and re-lock
> the visible version, making the fast path fully self-contained.
>
> That’s the direction Junwang and I plan to explore next.
>
> --
> Thanks, Amit Langote

As Amit has already stated, we are approaching a hybrid "fast-path + fallback"
design.

0001 adds a fast path optimization for foreign key constraint checks
that bypasses the SPI executor, the fast path applies when the referenced
table is not partitioned, and the constraint does not involve temporal
semantics.

With the following test:

create table pk (a numeric primary key);
create table fk (a bigint references pk);
insert into pk select generate_series(1, 2000000);

head:

[local] zhjwpku@postgres:5432-90419=# insert into fk select
generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 13516.177 ms (00:13.516)

[local] zhjwpku@postgres:5432-90419=# update fk set a = a + 1;
UPDATE 1000000
Time: 15057.638 ms (00:15.058)

patched:

[local] zhjwpku@postgres:5432-98673=# insert into fk select
generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 8248.777 ms (00:08.249)

[local] zhjwpku@postgres:5432-98673=# update fk set a = a + 1;
UPDATE 1000000
Time: 10117.002 ms (00:10.117)

0002 cache fast-path metadata used by the index probe, at the current
time only comparison operator hash entries, operator function OIDs
and strategy numbers and subtypes for index scans. But this cache
doesn't buy any performance improvement.

Caching additional metadata should improve performance for foreign key checks.

Amit suggested introducing a mechanism for ri_triggers.c to register a
cleanup callback in the EState, which AfterTriggerEndQuery() could then
invoke to release per-statement cached metadata (such as the IndexScanDesc).
However, I haven't been able to implement this mechanism yet.

Amit and I agree that we can post the patches here for review now. We are
continuing to work on improving the metadata cache implementation.

--
Regards
Junwang Zhao

Вложения

Re: Eliminating SPI / SQL from some RI triggers - take 3

От
Amit Langote
Дата:
Hi Junwang,

On Mon, Dec 1, 2025 at 3:09 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> As Amit has already stated, we are approaching a hybrid "fast-path + fallback"
> design.
>
> 0001 adds a fast path optimization for foreign key constraint checks
> that bypasses the SPI executor, the fast path applies when the referenced
> table is not partitioned, and the constraint does not involve temporal
> semantics.
>
> With the following test:
>
> create table pk (a numeric primary key);
> create table fk (a bigint references pk);
> insert into pk select generate_series(1, 2000000);
>
> head:
>
> [local] zhjwpku@postgres:5432-90419=# insert into fk select
> generate_series(1, 2000000, 2);
> INSERT 0 1000000
> Time: 13516.177 ms (00:13.516)
>
> [local] zhjwpku@postgres:5432-90419=# update fk set a = a + 1;
> UPDATE 1000000
> Time: 15057.638 ms (00:15.058)
>
> patched:
>
> [local] zhjwpku@postgres:5432-98673=# insert into fk select
> generate_series(1, 2000000, 2);
> INSERT 0 1000000
> Time: 8248.777 ms (00:08.249)
>
> [local] zhjwpku@postgres:5432-98673=# update fk set a = a + 1;
> UPDATE 1000000
> Time: 10117.002 ms (00:10.117)
>
> 0002 cache fast-path metadata used by the index probe, at the current
> time only comparison operator hash entries, operator function OIDs
> and strategy numbers and subtypes for index scans. But this cache
> doesn't buy any performance improvement.
>
> Caching additional metadata should improve performance for foreign key checks.
>
> Amit suggested introducing a mechanism for ri_triggers.c to register a
> cleanup callback in the EState, which AfterTriggerEndQuery() could then
> invoke to release per-statement cached metadata (such as the IndexScanDesc).
> However, I haven't been able to implement this mechanism yet.

Thanks for working on this.  I've taken your patches as a starting
point and reworked the series into two patches (attached): 1st is your
0001+0002 as the core patch that adds a gated fast-path alternative to
SPI and 2nd where I added per-statement resource caching.  Doing the
latter turned out to be not so hard thanks to the structure you chose
to build the core fast path.  Good call on adding the RLS and ACL test
cases, btw.

So, 0001 is a functionally complete fast path: concurrency handling,
REPEATABLE READ crosscheck, cross-type operators, security context,
and metadata caching. 0002 implements the per-statement resource
caching we discussed, though instead of sharing the EState between
trigger.c and ri_triggers.c it uses a new AfterTriggerBatchCallback
mechanism that fires at the end of each trigger-firing cycle
(per-statement for immediate constraints, or until COMMIT for deferred
ones). It layers resource caching on top so that the PK relation,
index, scan descriptor, and snapshot stay open across all FK trigger
invocations within a single trigger-firing cycle rather than being
opened and closed per row.

Note that phe previous 0002 (metadata caching) is folded into 0001,
and most of the new fast-path logic added in 0001 now lives in
ri_FastPathCheck() rather than inline in RI_FKey_check(), so the
RI_FKey_check diff is just the gating call and SPI fallback.

I re-ran the benchmarks (same test as yours, different machine):

create table pk (a numeric primary key);
create table fk (a bigint references pk);
insert into pk select generate_series(1, 2000000);
insert into fk select generate_series(1, 2000000, 2);

master: 2444 ms  (median of 3 runs)
0001: 1382 ms  (43% faster)
0001+0002: 1202 ms  (51% faster, 13% over 0001 alone)

Also, with int PK / int FK (1M rows):

create table pk (a int primary key);
create table fk (a int references pk);
insert into pk select generate_series(1, 1000000);
insert into fk select generate_series(1, 1000000);

master: 1000 ms
0001: 520 ms  (48% faster)
0001+0002: 432 ms  (57% faster, 17% over 0001 alone)

The incremental gain from 0002 comes from eliminating per-row relation
open/close, scan begin/end, slot alloc/free, and replacing per-row
GetSnapshotData() with only curcid adjustment on the registered
snapshot copy in the cache.

The two current limitations are partitioned referenced tables and
temporal foreign keys. Partitioned PKs are relatively uncommon in
practice, so the non-partitioned case should cover most FK workloads,
so I'm not sure it's worth the added complexity to support them.
Temporal FKs are inherently multi-row, so they're a poor fit for a
single-probe fast path.

David Rowley mentioned off-list that it might be worth batching
multiple FK values into a single index probe, leveraging the
ScalarArrayOp btree improvements from PostgreSQL 17. The idea would be
to buffer FK values across trigger invocations in the per-constraint
cache (0002 already has the right structure for this), build a
SK_SEARCHARRAY scan key, and let the btree AM walk the matching leaf
pages in one sorted traversal instead of one tree descent per row. The
locking and recheck would still be per-tuple, but the index traversal
cost drops significantly. Single-column FKs are the obvious starting
point. That seems worth exploring but can be done as a separate patch
on top of this.

I think the series is in reasonable shape but would appreciate extra
eyeballs, especially on the concurrency handling in ri_LockPKTuple()
in 0001 and the snapshot lifecycle in 0002. Or anything else that
catches one's eye.

--
Thanks, Amit Langote

Вложения