Обсуждение: Proposal for discussions: Allow reads to proceed during FK/trigger drops by reducing relation-level lock from AccessExclusive to ShareRowExclusive

Поиск
Список
Период
Сортировка
Hello hackers,

I stumbled upon a case and wanted to raise a quick thread to get some feedback on this.

Basically, dropping a foreign key constraint or a table that owns an FK currently blocks reads on the other table due to AccessExclusive locks taken while removing the FK’s internal triggers and constraint metadata. In busy systems, this short full-read outage can cause user-visible timeouts for otherwise read-only traffic. Similar topic discussed here as well [1]

Example setup
- pktable(id primary key)
- fktable(id primary key, fk references pktable(id))

Where reads get blocked today
- ALTER TABLE fktable DROP CONSTRAINT fkname:
  - The FK lives on fktable, and its RI action triggers live on pktable.
  - Both tables see AccessExclusive-level effects during removal, so SELECTs on either can be blocked while the FK and triggers are dropped.

- DROP TABLE fktable:
  - fktable is dropped with AccessExclusive (expected).
  - While removing RI action triggers on pktable, pktable also sees an AccessExclusive lock, so SELECTs on pktable can be blocked even though pktable is not being dropped.

- DROP TABLE pktable CASCADE:
  - pktable is dropped with AccessExclusive (expected).
  - FK removal on fktable (check triggers) also introduces an AccessExclusive effect on fktable, so SELECTs on fktable can be blocked.

Proposal
I wanted to see if we could reduce the relation-level lock used specifically for FK/trigger removal from AccessExclusive to ShareRowExclusive. This keeps readers moving while still blocking writers during the small window where RI triggers/constraint rows are removed and relcache is invalidated. Ideally, with a change like this `ALTER TABLE fktable DROP CONSTRAINT fkname` should take ShareRowExclusive for fktable and pktable, and `DROP TABLE fktable` should take ShareRowExclusive for fktable and AccessExclusive for pktable.

Next, dropping fktable would take ShareRowExclusive for fktable and AccessExclusive for pktable. And DROP TABLE pktable CASCADE would take AccessExclusive for pktable and ShareRowExclusive for fktable.

My understanding is that the table being dropped still uses AccessExclusive; reads/writes on that table remain blocked as today while it updates relcache, updating metadata/catalog entries and other cleanup tasks.

Also, I believe ShareRowExclusive would still serialize writers, so there should be no window where DML can bypass enforcement during removal. So this change should not affect correctness?

I just attached a small patch to get the idea across. If folks think this direction makes sense, I am happy to work on a more complete patch as well. If I’ve missed any critical cases that truly need AccessExclusive at the relation level during FK/trigger removal, especially around partitions, pending trigger events, hot standby, logical decoding, event triggers or something else perhaps please let me know.

[1] https://www.postgresql.org/message-id/flat/2bdf0fe5679f44ebf56890dfd5c5ead21a0c0bc3.camel%40cybertec.at#c50baf640c680c9e3f4def34b565d3b9

Thanks for your time and feedback.
Shayon
Вложения
Hello,

Following up on the previous thread - I took a stab at trying to see what a full patch for the proposal to reduce lock levels during FK/trigger drops would look like, and this is what I ended up with.

Recap on the problem space:

When dropping a foreign key constraint (or a table that owns one), PostgreSQL currently takes AccessExclusiveLock on both the table being altered and the referenced table for FKs. This blocks all access, including plain SELECTs, on the other table during FK/trigger removal.

Concrete example:
- Setup: pktable(id PK), fktable(id PK, fk → pktable)
- Action: BEGIN; DROP TABLE fktable
- From another shell: SELECTs on pktable are blocked while fktable's RI action triggers are removed from pktable, even though pktable itself is not being dropped

Why it matters:
In busy production systems, this creates a brief but total read outage on the referenced table. We've seen cases where webhook handlers doing simple SELECTs hit lock_timeout and failed during a routine table drop, because the referenced table (tenants) was locked exclusively to remove FK triggers.

My hope with this patch is that it reduces the lock on the referenced table from AccessExclusive to ShareRowExclusive, allowing SELECTs to proceed while still blocking writers during the brief FK/trigger removal window.

What this patch does:
- Reduces the relation-level lock from AccessExclusive to ShareRowExclusive in RemoveConstraintById(), RemoveTriggerById(), and dropconstraint_internal()
- This allows SELECTs on the other table (e.g., the referenced table in an FK relationship) to proceed during FK/trigger removal
- The table being directly altered/dropped still gets AccessExclusive as before
- Writers remain blocked (ShareRowExclusive conflicts with RowExclusiveLock), so there's no window where DML can bypass enforcement

What it also includes:
- Lock changes in 3 backend files (pg_constraint.c, trigger.c, tablecmds.c)
- Updated isolation tests (detach-partition-concurrently-4 + new fk-drop-constraint-concurrency)
- Documentation updates (alter_table.sgml)
- 8 test permutations covering: regular FKs, DROP TABLE, self-referential FKs, ALTER COLUMN TYPE, prepared plans

All FK-related isolation tests pass. The changes are minimal and surgical—just lock level adjustments with updated comments.

I'd be curious to hear any thoughts/feedback on this, especially:
- Whether the ShareRowExclusive approach makes sense or if there are edge cases I'm missing
- Hot Standby implications (this only affects primary; standbys still use AccessExclusive during WAL replay)
- Any concerns around event triggers, extensions, or operational tooling that might have relied on the stronger lock behavior

Patch attached.

Thanks,
Shayon

On Tue, Oct 7, 2025 at 12:54 PM Shayon Mukherjee <shayonj@gmail.com> wrote:
Hello hackers,

I stumbled upon a case and wanted to raise a quick thread to get some feedback on this.

Basically, dropping a foreign key constraint or a table that owns an FK currently blocks reads on the other table due to AccessExclusive locks taken while removing the FK’s internal triggers and constraint metadata. In busy systems, this short full-read outage can cause user-visible timeouts for otherwise read-only traffic. Similar topic discussed here as well [1]

Example setup
- pktable(id primary key)
- fktable(id primary key, fk references pktable(id))

Where reads get blocked today
- ALTER TABLE fktable DROP CONSTRAINT fkname:
  - The FK lives on fktable, and its RI action triggers live on pktable.
  - Both tables see AccessExclusive-level effects during removal, so SELECTs on either can be blocked while the FK and triggers are dropped.

- DROP TABLE fktable:
  - fktable is dropped with AccessExclusive (expected).
  - While removing RI action triggers on pktable, pktable also sees an AccessExclusive lock, so SELECTs on pktable can be blocked even though pktable is not being dropped.

- DROP TABLE pktable CASCADE:
  - pktable is dropped with AccessExclusive (expected).
  - FK removal on fktable (check triggers) also introduces an AccessExclusive effect on fktable, so SELECTs on fktable can be blocked.

Proposal
I wanted to see if we could reduce the relation-level lock used specifically for FK/trigger removal from AccessExclusive to ShareRowExclusive. This keeps readers moving while still blocking writers during the small window where RI triggers/constraint rows are removed and relcache is invalidated. Ideally, with a change like this `ALTER TABLE fktable DROP CONSTRAINT fkname` should take ShareRowExclusive for fktable and pktable, and `DROP TABLE fktable` should take ShareRowExclusive for fktable and AccessExclusive for pktable.

Next, dropping fktable would take ShareRowExclusive for fktable and AccessExclusive for pktable. And DROP TABLE pktable CASCADE would take AccessExclusive for pktable and ShareRowExclusive for fktable.

My understanding is that the table being dropped still uses AccessExclusive; reads/writes on that table remain blocked as today while it updates relcache, updating metadata/catalog entries and other cleanup tasks.

Also, I believe ShareRowExclusive would still serialize writers, so there should be no window where DML can bypass enforcement during removal. So this change should not affect correctness?

I just attached a small patch to get the idea across. If folks think this direction makes sense, I am happy to work on a more complete patch as well. If I’ve missed any critical cases that truly need AccessExclusive at the relation level during FK/trigger removal, especially around partitions, pending trigger events, hot standby, logical decoding, event triggers or something else perhaps please let me know.

[1] https://www.postgresql.org/message-id/flat/2bdf0fe5679f44ebf56890dfd5c5ead21a0c0bc3.camel%40cybertec.at#c50baf640c680c9e3f4def34b565d3b9

Thanks for your time and feedback.
Shayon
Вложения
Shayon Mukherjee <shayonj@gmail.com> writes:
> Following up on the previous thread - I took a stab at trying to see what a
> full patch for the proposal to reduce lock levels during FK/trigger drops
> would look like, and this is what I ended up with.

I don't think this is safe, at least not for FK removal.  There's a
comment in AlterTableGetLockLevel explaining why:

                /*
                 * Removing constraints can affect SELECTs that have been
                 * optimized assuming the constraint holds true. See also
                 * CloneFkReferenced.
                 */

Adding a foreign key can (and I think does) take a lesser lock,
because the additional constraint won't invalidate any proofs the
optimizer may have made beforehand.  But dropping one seems
problematic.

Another issue is that the proposed patch looks like it reduces
the locking level for more types of constraints than just FKs.
That would require further analysis, but I believe that (for
example) dropping a unique constraint likewise risks breaking
existing query plans, even when they aren't directly using that
index.

            regards, tom lane



Thank you so much for the review and feedback Tom

On Wed, Oct 8, 2025 at 5:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Shayon Mukherjee <shayonj@gmail.com> writes:
> Following up on the previous thread - I took a stab at trying to see what a
> full patch for the proposal to reduce lock levels during FK/trigger drops
> would look like, and this is what I ended up with.

I don't think this is safe, at least not for FK removal.  There's a
comment in AlterTableGetLockLevel explaining why:

                /*
                 * Removing constraints can affect SELECTs that have been
                 * optimized assuming the constraint holds true. See also
                 * CloneFkReferenced.
                 */

I tried to map my mental model and walk down the code paths, I knew I was missing something :^). Thank you for pointing this out. 

Adding a foreign key can (and I think does) take a lesser lock,
because the additional constraint won't invalidate any proofs the
optimizer may have made beforehand.  But dropping one seems
problematic.

You're 100% correct that the table owning the constraint (fktable) needs AccessExclusive because of the requirement that dropping the FK can invalidate plans that used it for proofs. That said, I wonder if we can reliably do something where only the referenced table’s RI action triggers are weakened to ShareRowExclusive in trigger deletion, which would help us achieve the criteria that the weaker locks on the referenced table (not the table being dropped), can accept SELECTs/reads?


Another issue is that the proposed patch looks like it reduces
the locking level for more types of constraints than just FKs.
That would require further analysis, but I believe that (for
example) dropping a unique constraint likewise risks breaking
existing query plans, even when they aren't directly using that
index.

That makes sense, yes. I attempted at an updated patch with reduced locks such that it's only for FK RI action triggers on the referenced table with no lock reduction applied to referenced tables for other constraint types. That is, if my understanding and mental model is right here.

I attached an updated patch with the new discoveries and the changes made are:

- RemoveConstraintById(): Reverted to AccessExclusiveLock
- RemoveTriggerById(): Now only uses ShareRowExclusiveLock for internal RI action triggers on the referenced table (confrelid). All other triggers (user triggers, RI check triggers on the FK table) still use AccessExclusiveLock.
- dropconstraint_internal(): ShareRowExclusiveLock only for FK drops on the referenced table (already inside `if CONSTRAINT_FOREIGN` block)
- ATPostAlterTypeCleanup(): Added constraint type check; only FK rebuilds use ShareRowExclusive

I feel like I could still be missing something, so I really appreciate any feedback. Definitely not trying to push hard on this and more so just using this as a learning opportunity as well. 

Thank you
Shayon 
Вложения