Discussion: Proposal: Add a UNIQUE NOT ENFORCED constraint
I am working through a case that requires very high throughput inserts that would be slowed down by a unique index, but that has externally guaranteed unique values and could benefit from the query planning and documentation provided by a unique constraint.

To fill this need, I propose adding a UNIQUE NOT ENFORCED constraint, which would tell Postgres to assume uniqueness without creating a unique index to enforce it, leaving ensuring uniqueness to the developer, similar to unenforced CHECK and foreign key constraints (https://commitfest.postgresql.org/patch/5285/).

-- Reasoning --

In many cases, the idiomatic/generally best way to write a query requires a uniqueness check (a SELECT WHERE ... = ANY()/IN or really any semi-join when optimized to an inner join, UNION, etc), meaning a Unique/HashAggregate node will be added, increasing overhead unless there is an explicit unique constraint. An unenforced unique constraint would allow developers to use their knowledge of the data/previous validation procedures to eliminate the extra node. This would also help with documentation/observability tools by providing more context on the data without adding overhead.

A number of SQL databases already implement this, not only including most data warehouses but also many more traditional databases (Db2 (https://www.ibm.com/docs/en/ias?topic=constraints-informational), SingleStore (https://docs.singlestore.com/cloud/create-a-database/specifying-unenforced-unique-constraints/), etc).

-- Implementation --

The hardest part of implementing this (as far as I can tell) would be updating the query planner to consider these unenforced constraints. I see two ways to do that:

1. Modify relation/query_has_unique_index (or some other function in the chain used to determine uniqueness) to consider constraints instead of just indexes. At plan time, Postgres could include unenforced unique constraints in RelOptInfo to be retrieved by the relevant function.

2. Create a new "dummy index" index type. This would not include any update triggers and would have an infinite cost to prevent usage in query planning, but it would still serve the purpose of proving the existence of a unique index.

I am leaning towards the first solution because it would be much simpler and less hacky to implement, although it would lose the flexibility given by custom unique indexes. On the other hand, a dummy index as described in the second solution would likely require much more code. It might also be less intuitive, because it redefines what an index means in that specific case and would turn unique indexes into their own type instead of a kind of B-Tree index.

Most other implementation details should be fairly similar to other unenforced constraints, with some differences when altering constraints. Unlike other unenforced constraints, we can't mark the constraint as NOT VALID, because a NOT VALID constraint doesn't validate any data already inserted, and an enforced UNIQUE must be able to scan all data to build the unique index and properly validate uniqueness. Instead, converting an unenforced unique constraint to an enforced one will follow the same logic as adding a unique constraint: either block for an extended period to create the index when altering the constraint, or build the index concurrently first and then alter the constraint.
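For concreteness, here is roughly the DDL I have in mind (table and column names are made up, and the exact grammar is only a sketch modelled on the existing NOT ENFORCED syntax):

    CREATE TABLE events (
        source_id   integer NOT NULL,
        external_id bigint  NOT NULL,
        payload     jsonb,
        -- uniqueness of (source_id, external_id) is guaranteed upstream,
        -- so no index is built to enforce it
        CONSTRAINT events_source_external_key
            UNIQUE (source_id, external_id) NOT ENFORCED
    );

    -- converting to an enforced constraint has to build the index,
    -- just like a plain ADD CONSTRAINT ... UNIQUE does today
    ALTER TABLE events
        ALTER CONSTRAINT events_source_external_key ENFORCED;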
Unenforced unique constraints also differ from other unenforced constraints in that their use in query planning can yield incorrect results, rather than just worse planning estimates. This problem should be clearly documented, and error handling will likely need some changes, but ultimately it is not fundamentally different from the many other ways that developers can write faulty queries.

Let me know what you think.

Thanks!
Jacob
If anyone has thoughts on the feasibility, outward design, or implementation plans for this proposal, I would appreciate any feedback you have.

Thanks again,
Jacob

On Sun, Jan 4, 2026 at 12:34 PM Jacob Jackson <jej.jackson.08@gmail.com> wrote:
>
> I am working through a case that requires very high throughput inserts
> that would be slowed down by a unique index, but that has externally
> guaranteed unique values and could benefit from the query planning and
> documentation provided by a unique constraint.
>
> To fill this need, I propose adding a UNIQUE NOT ENFORCED constraint,
> which would tell Postgres to assume uniqueness without creating a
> unique index to enforce it, leaving ensuring uniqueness to the
> developer, similar to unenforced CHECK and foreign key constraints
> (https://commitfest.postgresql.org/patch/5285/).
>
> [...]
On Sun, Jan 4, 2026, at 18:34, Jacob Jackson wrote:
> I am working through a case that requires very high throughput inserts
> that would be slowed down by a unique index, but that has externally
> guaranteed unique values and could benefit from the query planning and
> documentation provided by a unique constraint.
>
> To fill this need, I propose adding a UNIQUE NOT ENFORCED constraint,
> which would tell Postgres to assume uniqueness without creating a
> unique index to enforce it, leaving ensuring uniqueness to the
> developer, similar to unenforced CHECK and foreign key constraints
> (https://commitfest.postgresql.org/patch/5285/).

Can you please share some more details on your use-case?
I'm curious to learn more about this "unique" column.
Don't you ever need to do lookups/joins on it (e.g. WHERE col = ?)?

/Joel
Hi Jacob,
On Mon, Jan 5, 2026 at 1:34 AM Jacob Jackson <jej.jackson.08@gmail.com> wrote:
> In many cases, the idiomatic/generally best way to write a query
> requires a uniqueness check (a SELECT WHERE ... = ANY()/IN or really
> any semi-join when optimized to an inner join, UNION, etc), meaning a
> Unique/HashAggregate node will be added, increasing overhead unless
> there is an explicit unique constraint. An unenforced unique
> constraint would allow developers to use their knowledge of the
> data/previous validation procedures to eliminate the extra node. This
> would also help with documentation/observability tools by providing
> more context on the data without adding overhead.
I'm not very familiar with the UNIQUE NOT ENFORCED constraint, so apologies if I make any mistakes here. If we want the query planner to generate an execution plan as if a column were unique, would setting n_distinct = -1 in the table statistics achieve the same effect?
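For reference, the way to set that today would be something like this (illustrative table and column names):

    -- record in pg_statistic that the column's values are all distinct;
    -- picked up at the next ANALYZE
    ALTER TABLE events ALTER COLUMN external_id SET (n_distinct = -1);
    ANALYZE events;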
On Sun, 4 Jan 2026, 18:34 Jacob Jackson, <jej.jackson.08@gmail.com> wrote:
> [...]
>
> In many cases, the idiomatic/generally best way to write a query
> requires a uniqueness check (a SELECT WHERE ... = ANY()/IN or really
> any semi-join when optimized to an inner join, UNION, etc), meaning a
> Unique/HashAggregate node will be added, increasing overhead unless
> there is an explicit unique constraint. An unenforced unique
> constraint would allow developers to use their knowledge of the
> data/previous validation procedures to eliminate the extra node.

Unenforced constraints should be considered invalid (how would we prove they're valid?), and because the planner is not allowed to rely on invalid constraints, why would the planner be allowed to remove these nodes for unenforced-and-therefore-invalid unique constraints?

It's quite likely that the query output would be incorrect when we expected the data to be unique for unenforced constraints, while the data in the table isn't actually unique for that unenforced unique constraint; and knowingly giving incorrect results is not something I can agree to.

> This
> would also help with documentation/observability tools by providing
> more context on the data without adding overhead.

Adding such constraints as metadata (without backing index) seems fine, but we can't use the constraint in the planner, nor in the executor, for the reason mentioned above.

> A number of SQL databases already implement this, not only including
> most data warehouses but also many more traditional databases (Db2
> (https://www.ibm.com/docs/en/ias?topic=constraints-informational),
> SingleStore (https://docs.singlestore.com/cloud/create-a-database/specifying-unenforced-unique-constraints/),
> etc).

I'm concerned they don't actually comply with the SQL standard when they process data from those tables, if they assume unenforced unique constraints to always be valid.

> 2. Create a new "dummy index" index type. This would not include any
> update triggers and would have an infinite cost to prevent usage in
> query planning, but it would still serve the purpose of proving the
> existence of a unique index.

It would be faster than a btree, but it'd still have the issue of adding the overhead of column projection costs, and it'd be a (false?) target for LR's replication identity, which I'm concerned about.

> Unenforced unique constraints also differ from other unenforced
> constraints in that their use in query planning can yield poor
> results, rather than just worsening planning estimates.

Does a table with unenforced constraints produce different plans than one without those unenforced constraints? That seems wrong to me; there is no material difference to the information we have on such a table that we can consider true.

> This problem
> should be clearly documented, and error handling will likely need some
> changes, but ultimately, it is not fundamentally different from the
> many other ways that developers can write faulty queries.
>
> Let me know what you think.

I think this feature would be a loaded footgun, especially if the planner starts to consider unenforced constraints as valid.

Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)
On 07.01.26 14:21, Matthias van de Meent wrote:
>> In many cases, the idiomatic/generally best way to write a query
>> requires a uniqueness check (a SELECT WHERE ... = ANY()/IN or really
>> any semi-join when optimized to an inner join, UNION, etc), meaning a
>> Unique/HashAggregate node will be added, increasing overhead unless
>> there is an explicit unique constraint. An unenforced unique
>> constraint would allow developers to use their knowledge of the
>> data/previous validation procedures to eliminate the extra node.
>
> Unenforced constraints should be considered invalid (how would we
> prove they're valid?), and because the planner is not allowed to rely
> on invalid constraints, why would the planner be allowed to remove
> these nodes for unenforced-and-therefore-invalid unique constraints?
>
> It's quite likely that the query output would be incorrect when we
> expected the data to be unique for unenforced constraints, while the
> data in the table isn't actually unique for that unenforced unique
> constraint; and knowingly giving incorrect results is not something I
> can agree to.

Some other SQL implementations have optional implementation-specific modes on top of NOT ENFORCED that mean, the constraint is not enforced, but you can assume it holds for query planning purposes. This is probably what the thinking here was, but that is not how PostgreSQL currently works, so additional work would need to be done to add this additional mode.

There are probably also multiple levels to this. It's one thing to use an unenforced-but-probably-true constraint for things like selectivity estimation, where you just get a worse plan if it's not quite true, but it's another to assume something is unique when it might not be. So there is some research to be done here.
On Tue, Jan 6, 2026 at 1:44 AM Joel Jacobson <joel@compiler.org> wrote:
> Can you please share some more details on your use-case?
> I'm curious to learn more about this "unique" column.
> Don't you ever need to do lookups/joins on it (e.g. WHERE col = ?)?

The unique column (technically 2 columns that combined are unique) functions as a pretty typical id. Yes, I do run joins on it, but the performance advantage of an additional index is outweighed by the ingest performance hit due to the cardinality of the outer relation (hash joins are typically used) and the relatively small row size.
On Wed, Jan 7, 2026 at 5:22 AM Neil Chen <carpenter.nail.cz@gmail.com> wrote:
> If we want the query planner to generate an execution plan as if a column were unique, would setting n_distinct = -1 in the table statistics achieve the same effect?

Setting n_distinct is less clear in describing the data (it isn't tied to the table schema itself and can be ambiguous in whether values are actually totally unique or just close enough, which can complicate things), and, because it only impacts statistics estimations, it still doesn't help in queries where uniqueness is required. Postgres will still add a node to ensure uniqueness in that case. There are also some other limitations with n_distinct (e.g. extended stats are required for multi-column pairs, which further complicates documentation and adds complexity/overhead).
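To make that concrete, this is the kind of situation I mean (illustrative schema; the comments describe the plan behaviour I would expect, not output from an actual run):

    -- For the planner to turn this semi-join into a plain inner join it has
    -- to prove the subquery unique on (source_id, external_id); with only
    -- n_distinct it cannot, so it still has to de-duplicate the inner side
    -- (a HashAggregate/Unique step) or fall back to a semi-join plan.
    SELECT o.*
    FROM   orders o
    WHERE  (o.source_id, o.external_id) IN (SELECT source_id, external_id
                                            FROM   events);

    -- And per-column n_distinct says nothing about the combination of the
    -- two columns; an extended statistics object is needed on top of it:
    CREATE STATISTICS events_src_ext_ndistinct (ndistinct)
        ON source_id, external_id FROM events;
    ANALYZE events;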
On Wed, Jan 7, 2026 at 8:21 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> I'm concerned they don't actually comply with the SQL standard when
> they process data from those tables, if they assume unenforced unique
> constraints to always be valid.

If you are referring to how they violate rules around consistent query results, some of them include a TRUSTED/RELY option (see the SingleStore link) that allows developers to enable the documentation/statistics features without potentially changing results if the data isn't truly unique. That could be implemented similarly in Postgres to force developers to opt in to potentially inconsistent behaviour while still giving them the chance to take advantage of the other features of UNIQUE NOT ENFORCED.

> > 2. Create a new "dummy index" index type. This would not include any
> > update triggers and would have an infinite cost to prevent usage in
> > query planning, but it would still serve the purpose of proving the
> > existence of a unique index.
>
> It would be faster than a btree, but it'd still have the issue of
> adding the overhead of column projection costs, and it'd be a (false?)
> target for LR's replication identity, which I'm concerned about.

Yeah, I am thinking a constraint-only approach would be better. I don't see any reason why the LR index search couldn't be rewritten to exclude any newly flagged dummy index, but it does seem like unnecessary added complexity, and, of course, the additional overhead of the index in general isn't great.

> I think this feature would be a loaded footgun, especially if the
> planner starts to consider unenforced constraints as valid.

As mentioned above, there are ways we could mitigate the potential risks by isolating the riskier functionality in the constraint interface.
On Wed, Jan 7, 2026 at 2:12 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> Some other SQL implementations have optional implementation-specific
> modes on top of NOT ENFORCED that mean, the constraint is not enforced,
> but you can assume it holds for query planning purposes. This is
> probably what the thinking here was, but that is not how PostgreSQL
> currently works, so additional work would need to be done to add this
> additional mode.
>
> There are probably also multiple levels to this. It's one thing to use
> an unenforced-but-probably-true constraint for things like selectivity
> estimation, where you just get a worse plan if it's not quite true, but
> it's another to assume something is unique when it might not be. So
> there is some research to be done here.

Yeah, after reading through Matthias's and your comments, I am increasingly convinced that the best option is to add a separate trusted/untrusted flag that would decide whether Postgres allows any query plans that could change results if the data isn't actually unique. It seems feasible, but I haven't worked on implementing it yet.
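Roughly what I am picturing, as entirely hypothetical syntax just to illustrate the two levels:

    -- Hypothetical: metadata and estimates only; the planner must not build
    -- plans whose results depend on uniqueness actually holding.
    ALTER TABLE events
        ADD CONSTRAINT events_source_external_key
        UNIQUE (source_id, external_id) NOT ENFORCED;

    -- Hypothetical: explicit opt-in; the planner may also build plans that
    -- assume uniqueness, and results can be wrong if the data isn't
    -- actually unique.
    ALTER TABLE events
        ADD CONSTRAINT events_source_external_key
        UNIQUE (source_id, external_id) NOT ENFORCED TRUSTED;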
On Wed, Jan 7, 2026 at 8:39 PM Jacob Jackson <jej.jackson.08@gmail.com> wrote:
> On Wed, Jan 7, 2026 at 5:22 AM Neil Chen <carpenter.nail.cz@gmail.com> wrote:
> > If we want the query planner to generate an execution plan as if a column were unique, would setting n_distinct = -1 in the table statistics achieve the same effect?
>
> Setting n_distinct is less clear in describing the data (it isn't tied
> to the table schema itself and can be ambiguous in whether values are
> actually totally unique or just close enough, which can complicate
> things), and, because it only impacts statistics estimations, it still
> doesn't help in queries where uniqueness is required. Postgres will
> still add a node to ensure uniqueness in that case. There are also
> some other limitations with n_distinct (e.g. extended stats are
> required for multi-column pairs, which further complicates
> documentation and adds complexity/overhead).

Just thinking out loud here, but I wonder if you might be able to modify pg_hint_plan to have a "NoUnique" (or "NoOpUnique") hint which causes the planner to either skip any unique nodes, or have them function as just a passthrough. I'm not entirely sure that's doable actually; we don't currently have hints for other types of aggregation nodes, and at the moment I don't think we have any hints that can produce wrong answers, and this one feels like it could...

Robert Treat
https://xzilla.net
On Wed, Jan 7, 2026 at 11:13 PM Robert Treat <rob@xzilla.net> wrote:
> Just thinking out loud here, but I wonder if you might be able to
> modify pg_hint_plan to have a "NoUnique" (or "NoOpUnique") hint which
> causes the planner to either skip any unique nodes, or have them
> function as just a passthrough.

This might have its own use cases, but I wouldn't consider it a true replacement for a unique constraint. It still doesn't solve the table metadata documentation problem, and it would require modifying all relevant queries. It also makes the mental model a little convoluted: instead of associating uniqueness with the data itself, we would now be associating it with individual queries.

> I'm not entirely sure that's doable
> actually; we don't currently have hints for other types of aggregation
> nodes, and at the moment I don't think we have any hints that can
> produce wrong answers, and this one feels like it could...

It does seem like the ability of these nodes to change results would introduce difficulties. As I understand it, pg_hint_plan's negative hints primarily operate by removing plans that contain the node and therefore forcing Postgres to choose a plan without it. In this case, I don't think Postgres would generate plans without the node in the first place (or, at least, if the planner did, the plans wouldn't be otherwise equivalent), because the query planner, not understanding the unique constraint, treats a plan without the unique node as a fundamentally different query that would return different results.