Discussion: [Patch] Add WHERE clause support to REFRESH MATERIALIZED VIEW
Attached is a patch implementing support for a WHERE clause in REFRESH MATERIALIZED VIEW.
The syntax allows for targeted refreshes:
REFRESH MATERIALIZED VIEW mv WHERE invoice_id = ANY('{1,2,3}');
REFRESH MATERIALIZED VIEW CONCURRENTLY mv WHERE customer_id = 42;
REFRESH MATERIALIZED VIEW mv WHERE order_date >= '2023-01-01';
I was inspired to implement this feature after watching the Hacking Postgres discussion on the topic: https://www.youtube.com/watch?v=6cZvHjDrmlQ
This allows the user to restrict the refresh operation to a subset of the view. The qualification is applied to the view's output columns. The optimizer can then push this condition down to the underlying base tables, avoiding a full scan when only a known subset of data has changed.
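To make the pushdown concrete, here is an illustrative sketch modeled on the benchmark schema described later in this mail (names like `invoice_summary` and `invoice_lines` are from that schema; the exact plan depends on the view's query):

```sql
-- The MV aggregates per-invoice totals from a detail table.
CREATE MATERIALIZED VIEW invoice_summary AS
SELECT invoice_id, sum(amount) AS total
FROM invoice_lines
GROUP BY invoice_id;

-- The qual below is applied to the view's output columns; because the
-- predicate is on the GROUP BY key, the planner can push
-- invoice_id = ANY(...) down to the scan of invoice_lines, so only the
-- changed invoices are recomputed instead of the whole view.
REFRESH MATERIALIZED VIEW invoice_summary
WHERE invoice_id = ANY('{1,2,3}');
```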
Implementation notes:
1. The grammar accepts an optional WHERE clause. We forbid volatile functions in the clause to ensure correctness.
2. Non-Concurrent Partial Refresh: When `CONCURRENTLY` is not specified, the operation performs an in-place modification using a `ROW EXCLUSIVE` lock.
* This mode requires a unique index to ensure constraint violations are handled correctly (e.g., when a row's values change such that it "drifts" into or out of the `WHERE` clause scope).
* It executes a Prune + Upsert strategy:
* `DELETE` all rows in the materialized view that match the `WHERE` clause.
* `INSERT` the new data from the source query.
* It uses `ON CONFLICT DO UPDATE` during the insert phase to handle concurrency edge cases, ensuring the refresh is robust against constraint violations.
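Roughly, the non-concurrent path executes the equivalent of the following (the actual SQL is generated internally via SPI; the table and column names here are illustrative, matching the benchmark schema):

```sql
-- Step 1 (Prune): remove all MV rows in scope of the predicate.
DELETE FROM invoice_summary WHERE invoice_id = ANY('{1,2,3}');

-- Step 2 (Upsert): re-insert from the view's defining query, restricted
-- to the same predicate.  ON CONFLICT absorbs rows that concurrently
-- "drifted" into scope and would otherwise hit the unique index.
INSERT INTO invoice_summary (invoice_id, total)
SELECT invoice_id, sum(amount)
FROM invoice_lines
WHERE invoice_id = ANY('{1,2,3}')
GROUP BY invoice_id
ON CONFLICT (invoice_id) DO UPDATE SET total = EXCLUDED.total;
```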
3. Concurrent Partial Refresh: When `CONCURRENTLY` is specified, it uses the existing diff/merge infrastructure (`refresh_by_match_merge`), limiting the scope of the diff (and the temporary table population) to the rows matching the predicate. This requires an `EXCLUSIVE` lock and a unique index, consistent with existing concurrent refresh behavior. It is much slower than the non-concurrent partial refresh.
4. The execution logic uses SPI to inject the predicate into the source queries during execution.
I have attached a benchmark suite to validate performance and correctness:
* `setup.sql`: Creates a schema `mv_benchmark` modeling an invoicing system (`invoices` and `invoice_lines`). It includes an aggregated materialized view (`invoice_summary`) and a control table (`invoice_summary_table`).
* `workload_*.sql`: pgbench scripts simulating a high-churn environment (45% inserts, 10% updates, 45% deletes) to maintain roughly stable dataset sizes while generating significant refresh work.
* `run_benchmark_comprehensive.sh`: Orchestrates the benchmark across multiple scale factors and concurrency levels.
The benchmark compares strategies for keeping a summary up to date (vs baseline):
* Partial Refresh: Triggers on the base table collect modified IDs and execute `REFRESH MATERIALIZED VIEW ... WHERE ...`.
* Materialized Table (Control): A standard table maintained via complex PL/pgSQL triggers (the traditional manual workaround).
* Full Refresh (Legacy): Manually refresh the view after changes.
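For reference, the trigger side of the Partial Refresh strategy boils down to something like the sketch below (simplified and hypothetical; the attached `setup.sql` contains the real, more involved version). It uses a statement-level trigger with a transition table to collect the modified IDs:

```sql
-- Statement-level trigger: collect modified invoice ids from the
-- transition table and refresh only those rows of the MV.
CREATE OR REPLACE FUNCTION refresh_summary_for_delta() RETURNS trigger AS $$
DECLARE
    ids int[];
BEGIN
    SELECT array_agg(DISTINCT invoice_id) INTO ids FROM new_rows;
    IF ids IS NOT NULL THEN
        EXECUTE format(
            'REFRESH MATERIALIZED VIEW invoice_summary WHERE invoice_id = ANY(%L)',
            ids);
    END IF;
    RETURN NULL;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER invoice_lines_refresh
AFTER INSERT ON invoice_lines
REFERENCING NEW TABLE AS new_rows
FOR EACH STATEMENT EXECUTE FUNCTION refresh_summary_for_delta();
```

(A complete version would cover UPDATE and DELETE with their own transition tables as well.)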
Results are below:
Concurrency: 1 client(s)
----------------------------------------------------------------------------------
Scale Batch | Baseline TPS | Full (Rel) Partial (Rel) Table (Rel)
---------- ------ | ------------ | ------------ ------------ ------------
20000 1 | 5309.05 | 0.002x 0.437x 0.470x
20000 50 | 1209.32 | 0.010x 0.600x 0.598x
20000 1000 | 56.05 | 0.164x 0.594x 0.576x
400000 1 | 5136.91 | 0x 0.450x 0.487x
400000 50 | 1709.17 | 0x 0.497x 0.482x
400000 1000 | 110.35 | 0.006x 0.507x 0.460x
Concurrency: 4 client(s)
----------------------------------------------------------------------------------
Scale Batch | Baseline TPS | Full (Rel) Partial (Rel) Table (Rel)
---------- ------ | ------------ | ------------ ------------ ------------
20000 1 | 19197.50 | 0x 0.412x 0.435x
20000 50 | 1016.14 | 0.007x 0.966x 1.036x
20000 1000 | 9.94 | 0.708x 1.401x 1.169x
400000 1 | 19637.36 | 0x 0.436x 0.483x
400000 50 | 4669.32 | 0x 0.574x 0.566x
400000 1000 | 23.26 | 0.029x 1.147x 0.715x
Concurrency: 8 client(s)
----------------------------------------------------------------------------------
Scale Batch | Baseline TPS | Full (Rel) Partial (Rel) Table (Rel)
---------- ------ | ------------ | ------------ ------------ ------------
20000 1 | 30358.32 | 0x 0.440x 0.457x
20000 50 | 262.75 | 0.026x 2.943x 2.740x
20000 1000 | 11.28 | 0.575x 0.840x 0.578x
400000 1 | 36007.15 | 0x 0.430x 0.464x
400000 50 | 6664.58 | 0x 0.563x 0.494x
400000 1000 | 11.61 | 0.058x 1.000x 1.277x
In these tests, the partial refresh behaves as O(delta) rather than O(total), performing comparably to the manual PL/pgSQL approach but with significantly lower code complexity for the user.
I recognize that adding a WHERE clause to REFRESH is an extension to the SQL standard. I believe the syntax is intuitive, but I am open to discussion regarding alternative implementation strategies or syntax if the community feels a different approach is warranted.
New regression tests are included in the patch.
This is my first time submitting a patch to PostgreSQL, so please bear with me if I've missed anything or made any procedural mistakes. I'm happy to address any feedback.
Thanks,
Adam Brusselback
On Mon, Dec 8, 2025 at 3:58 PM Adam Brusselback <adambrusselback@gmail.com> wrote:
> Attached is a patch implementing support for a WHERE clause in REFRESH MATERIALIZED VIEW.
> The syntax allows for targeted refreshes:
> REFRESH MATERIALIZED VIEW mv WHERE invoice_id = ANY('{1,2,3}');
> REFRESH MATERIALIZED VIEW CONCURRENTLY mv WHERE customer_id = 42;
> REFRESH MATERIALIZED VIEW mv WHERE order_date >= '2023-01-01';
> I was inspired to implement this feature after watching the Hacking Postgres discussion on the topic: https://www.youtube.com/watch?v=6cZvHjDrmlQ
+1 (But I was in that hacking session).
Our situation was a wonderful MV with all the columns we needed (some hard to calculate) to augment search data queried millions of times/day. It was a thing of beauty, until we realized we needed to update 1 record (vendor inventory UPDATE date/time) every time we processed a file (something we do 24x7, a hundred times each hour).
For that ONE field, we ended up doing REFRESH MV CONCURRENTLY over 2,000 times per day.
Our understanding is that many people run into this exact issue. The cache needs small frequent updates.
(After reading the code that handles MVs, we just created our own TABLE, maintained by a scheduler that rebuilds it HOURLY; when we process the file, a simple UPDATE is issued for the one column.)
While this "works", the CONCEPT of this patch (untested by me, as of yet) would have fixed this with far less effort, and would be easier to maintain.
After I review the code, I will add additional comments.
I am curious what others think. (And FWIW, I believe that the larger the MV, the MORE this feature is needed, vs refreshing the ENTIRE view.)
Regards...
On Tue, Dec 09, 2025 at 12:08:58AM -0500, Kirk Wolak wrote:
> Our understanding is that many people run into this exact issue. The
> cache needs small frequent updates.
> (After reading the code that handles MVs, we just created our own TABLE,
> and maintain it with a scheduler to rebuild HOURLY, and when we process
> the file, a Simple UPDATE is issued for the one column).
>
> While this "Works", the CONCEPT of this patch (untested by me, as of
> yet), would have fixed this with far less effort, and would be easier
> to maintain.
>
> After I review the code, I will add additional comments.
>
> I am curious what others think? (And FWIW, I believe that the larger the
> MV, the MORE this feature is needed, vs refreshing the ENTIRE view).

What I do is I have my own materialized view infrastructure, written entirely in PlPgSQL, and I completely avoid PG's MV support. This alternative MV scheme creates an actual table for each MV, which means:

- one can update the MV directly (and I do, via triggers)
- one can have triggers on the MV (e.g., to record history)

This has been very handy for me.

I also have a state table in which to keep track of whether an MV needs a refresh, and I have a function I can use to mark an MV as needing a refresh. Marking an MV as needing a refresh sends a NOTIFY, and then I have a daemon that will refresh views as needed (with some debouncing/coalescing of notifications). This way I can have MVs with very complex underlying queries: for some kinds of updates I can easily write fast triggers, and for others where I can't (or where they would slow down transactions too much) I simply mark the MV as needing a refresh.

Typical MV queries I have that this works very well for include transitive reachability closure computations (e.g., all the groups a thing is a member of, directly and indirectly, or vice versa -- recursive CTEs basically). Though I do now have triggers that can do a reasonably good job of synchronously and quickly updating MVs with such queries, it's just I didn't always. Refreshes are always 'concurrent'.

This is my 80/20 solution to the "Incremental View Maintenance" (IVM) problem. A not very current version is here: https://github.com/twosigma/postgresql-contrib/blob/master/mat_views.sql

If you like it I might be able to get a newer version out. The version above has a few minor issues:

- it uses DELETE FROM instead of TRUNCATE for its sort-of temp tables; using TRUNCATE ends up requiring some care to avoid occasional deadlocks with VACUUM that are due to using tables as types of the columns of the deltas tables
- logging -- lots of logging in the newest version

Another issue is that I rely on NATURAL FULL OUTER JOIN to avoid having to generate ON conditions, but that means that all columns of the underlying VIEW must not have NULLs. As I've not needed to support nullable columns in these MVs, I don't mind.

Nico
--
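The mark-and-notify part of such a scheme can be sketched as follows (hypothetical names; the linked mat_views.sql is the real implementation):

```sql
-- State table tracking which table-backed "MVs" need a refresh.
CREATE TABLE mv_state (
    mv_name       text PRIMARY KEY,
    needs_refresh boolean NOT NULL DEFAULT false,
    last_refresh  timestamptz
);

-- Mark an MV dirty and wake the refresh daemon, which LISTENs on the
-- 'mv_refresh' channel and debounces/coalesces notifications.
CREATE OR REPLACE FUNCTION mark_mv_needs_refresh(name text) RETURNS void AS $$
BEGIN
    UPDATE mv_state SET needs_refresh = true WHERE mv_name = name;
    PERFORM pg_notify('mv_refresh', name);
END $$ LANGUAGE plpgsql;
```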
> Our situation was a wonderful MV with all the columns we needed (some
> hard to calculate) to augment search data done millions of times/day. It
> was a thing of beauty. Until we realized we needed to update 1 record
> (vendor inventory UPDATE date/time) every time we processed a file
> (something we do 24x7, a hundred times each hour!
>
> For that ONE field, we ended up doing REFRESH MV concurrently; OVER
> 2,000 times per day.
Thanks for the feedback and the use case, Kirk.
Regarding that specific scenario where a single column ("last updated" or similar) churns significantly faster than the heavy-computation columns:
Even with this patch, you might find it beneficial to separate that high-velocity column into its own small materialized view (or regular view) and join it to the main MV at query time. That will reduce the bloat you get on the main MV by quite a lot, especially if you have very wide rows (which it seems like you do).
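A sketch of that split, with hypothetical names standing in for your schema:

```sql
-- Heavy, rarely-changing computations stay in the big MV (search_mv).
-- The high-velocity timestamp lives in a small side table that is cheap
-- to UPDATE (and HOT-updatable, since it has no indexed churn columns),
-- joined in at query time.
CREATE TABLE vendor_inventory_refreshed (
    vendor_id    int PRIMARY KEY,
    refreshed_at timestamptz NOT NULL
);

CREATE VIEW search_data AS
SELECT m.*, r.refreshed_at
FROM search_mv m
LEFT JOIN vendor_inventory_refreshed r USING (vendor_id);

-- File processing then issues a plain UPDATE instead of any refresh:
-- UPDATE vendor_inventory_refreshed SET refreshed_at = now()
-- WHERE vendor_id = $1;
```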
I initially tried to implement logic that would allow for direct UPDATEs (which would enable HOT updates). However, to handle rows that matched the predicate but were no longer present in the new source data, I had to run an anti-join to identify them for deletion. That approach caused performance issues, so I settled on the "Prune + Upsert" strategy (DELETE matching rows, then INSERT from source).
Because this patch performs a delete/insert cycle, updating that one timestamp column will still result in rewriting the whole tuple in the MV.
> For that ONE field, we ended up doing REFRESH MV concurrently; OVER
> 2,000 times per day.
That said, 2,000 refreshes per day is nothing for this implementation, provided your updates are selective enough and your queries allow for predicate push-down to the base tables.
I look forward to your thoughts after reviewing the code.
Thanks,
Adam Brusselback
Hi all,
I've been running some more concurrency tests against this patch (specifically looking for race conditions), and I found a flaw in the implementation for the REFRESH ... WHERE ... mode (without CONCURRENTLY).
I believe the issue is that the DELETE -> INSERT strategy leaves a consistency gap. Since we relied on ROW EXCLUSIVE locks to allow concurrent reads, the moment we delete the rows, we lose the physical lock on them. If a concurrent transaction inserts a colliding row during that gap, the materialized view ends up inconsistent with the base query (or hits a constraint violation).
I was initially using SELECT ... FOR UPDATE to lock the rows before modification, but (as I now realize) that lock is lost the moment the row is deleted.
My plan is to replace that row-locking strategy with transaction-level advisory locks inside the refresh logic:
Before the DELETE, run a SELECT pg_advisory_xact_lock(mv_oid, hashtext(ROW(unique_keys)::text)) for the rows matching the WHERE clause.
This effectively locks the "logical" ID of the row, preventing concurrent refreshes on the same ID even while the physical tuple is temporarily gone. Hash collisions should not have any correctness issues that I can think of.
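In SQL terms, the locking step would look roughly like this (illustrative only; the real code would run it internally via SPI, and key names depend on the MV's unique index):

```sql
-- Take a transaction-scoped advisory lock on the logical identity of
-- each row in scope, before the DELETE removes the physical tuples.
-- Hash collisions only cause extra serialization, never corruption.
-- Sorting in the subquery acquires locks in a consistent order, which
-- avoids deadlocks between two overlapping partial refreshes.
SELECT pg_advisory_xact_lock('invoice_summary'::regclass::oid::int,
                             hashtext(ROW(invoice_id)::text))
FROM (SELECT invoice_id
      FROM invoice_summary
      WHERE invoice_id = ANY('{1,2,3}')
      ORDER BY invoice_id) s;
```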
However, before I sink time into implementing that fix:
Is there general interest in having REFRESH MATERIALIZED VIEW ... WHERE ... in core?
If the community feels this feature is a footgun or conceptually wrong for Postgres, I'd rather know now before spending more time on this.
If the feature concept is sound, does the advisory lock approach seem like the right way to handle the concurrency safety here?
Thanks,
Adam Brusselback
Hey Adam,
Apologies for the delay. As promised on Discord, I did a review of the current patch (cf 6305) and wanted to share findings that line up with the thread’s design discussion, plus one additional correctness bug that I could reproduce.
1. In the non-concurrent REFRESH ... WHERE .... path, the UPSERT SQL is built using the unique index metadata. The code currently uses indnatts when building the ON Conflict (...) target list. That includes INCLUDE columns, so for an index like:
CREATE UNIQUE INDEX ON mv(id) INCLUDE (extra);
the generated statement becomes effectively ON CONFLICT (id, extra) ..., which fails with:
ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification
The fix appears straightforward: use indnkeyatts (key attributes only) when generating the conflict target, and also when deciding which columns are “key” for the UPDATE SET clause. I’ve attached a minimal repro SQL script (repro_include_issue.sql).
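The underlying ON CONFLICT behavior can be demonstrated directly, without the patch (table name here is illustrative): inference only matches the index's key columns, so listing an INCLUDE column yields exactly the error above.

```sql
CREATE TABLE mv_like (id int, extra int, val int);
CREATE UNIQUE INDEX ON mv_like (id) INCLUDE (extra);

-- Works: the conflict target lists only the key column.
INSERT INTO mv_like VALUES (1, 1, 1)
ON CONFLICT (id) DO UPDATE SET val = EXCLUDED.val;

-- Fails as described, because INCLUDE columns are not key columns:
-- ERROR: there is no unique or exclusion constraint matching the
--        ON CONFLICT specification
INSERT INTO mv_like VALUES (1, 1, 2)
ON CONFLICT (id, extra) DO UPDATE SET val = EXCLUDED.val;
```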
2. Another small test quality issue: the regression script has a comment “Subqueries -> Error” but the expected output shows no error for the schema-qualified subquery. There is no explicit check forbidding subqueries in transformRefreshWhereClause(), so schema-qualified subqueries appear allowed.
Moving on to broader questions
> I believe the issue is that the DELETE -> INSERT strategy leaves a consistency gap. Since we relied on ROW EXCLUSIVE locks to allow concurrent reads, the moment we delete the rows, we lose the physical lock on them. If a concurrent transaction inserts a colliding row during that gap, the materialized view ends up inconsistent with the base query (or hits a constraint violation).
The consistency gap in the non-concurrent mode matches what I’d expect: with ROW EXCLUSIVE you allow concurrent readers/writers, and a pure DELETE → INSERT approach creates a window where the old tuple is gone and a concurrent session can insert a conflicting logical row.
That said, I think it would help the patch to explicitly define the intended safety model:
1. Is the goal to be safe against concurrent DML on base tables only (i.e., refresh sees a snapshot and updates MV accordingly), or also to be safe against concurrent partial refreshes and direct writes to the MV (when maintenance is enabled)?
2. Should the non-concurrent partial refresh be “best effort” like normal DML (user coordinates), or should it be “maintenance-like” (serialized / logically safe by default)?
If the intent is “safe by default”, I’d encourage documenting very clearly what’s guaranteed, and adding regression/README-style notes for footguns.
From a reviewer standpoint, I think the feature concept is sound and valuable, but it needs a crisp statement of semantics and safety boundaries. The tricky part is exactly what you called out: incremental refresh implies concurrency questions that aren’t present with full rebuild + strong locks.
I’m happy to keep reviewing iterations (especially around the advisory lock approach), and I’ll attach the reproduction scripts and notes I used.
As a possible staging approach: it might be simplest to start with a conservative serialization model for non-concurrent WHERE (while still allowing readers), and then iterate toward finer-grained logical locking if/when needed for throughput.
Thanks,
Dharin
On Sun, Jan 4, 2026 at 3:56 AM Adam Brusselback <adambrusselback@gmail.com> wrote:
> Hi all,
> I've been running some more concurrency tests against this patch (specifically looking for race conditions), and I found a flaw in the implementation for the REFRESH ... WHERE ... mode (without CONCURRENTLY).
> I believe the issue is that the DELETE -> INSERT strategy leaves a consistency gap. Since we relied on ROW EXCLUSIVE locks to allow concurrent reads, the moment we delete the rows, we lose the physical lock on them. If a concurrent transaction inserts a colliding row during that gap, the materialized view ends up inconsistent with the base query (or hits a constraint violation).
> I initially was using SELECT ... FOR UPDATE to lock the rows before modification, but that lock is (now that I know) obviously lost when the row is deleted.
> My plan is to replace that row-locking strategy with transaction-level advisory locks inside the refresh logic:
> Before the DELETE, run a SELECT pg_advisory_xact_lock(mv_oid, hashtext(ROW(unique_keys)::text)) for the rows matching the WHERE clause.
> This effectively locks the "logical" ID of the row, preventing concurrent refreshes on the same ID even while the physical tuple is temporarily gone. Hash collisions should not have any correctness issues that I can think of.
> However, before I sink time into implementing that fix:
> Is there general interest in having REFRESH MATERIALIZED VIEW ... WHERE ... in core?
> If the community feels this feature is a footgun or conceptually wrong for Postgres, I'd rather know now before spending more time on this.
> If the feature concept is sound, does the advisory lock approach seem like the right way to handle the concurrency safety here?
> Thanks,
> Adam Brusselback
> (repro_include_issue.sql)
Typo fix: test_include_bug.sql (attached file).
Thanks,
Dharin
On Thu, Jan 15, 2026 at 7:46 PM Dharin Shah <dharinshah95@gmail.com> wrote: