Обсуждение: Exclusion constraints on partitioned tables
Hello Hackers,
I'm trying to get things going again on my temporal tables work, and 
here is a small patch to move that forward.
It lets you create exclusion constraints on partitioned tables, similar 
to today's rules for b-tree primary keys & unique constraints:
just as we permit a PK on a partitioned table when the PK's columns are 
a superset of the partition keys, so we could also allow an exclusion 
constraint when its columns are a superset of the partition keys.
This patch also requires the matching constraint columns to use equality 
comparisons (`(foo WITH =)`), so it is really equivalent to the existing 
b-tree rule. Perhaps that is more conservative than necessary, but we 
can't permit an arbitrary operator, since some might require testing 
rows that fall into other partitions. For example `(foo WITH <>)` would 
obviously cause problems.
The exclusion constraint may still include other columns beyond the 
partition keys, and those may use equality operators or something else.
This patch is required to support temporal partitioned tables, because 
temporal tables use exclusion constraints as their primary key.
Essentially they are `(id WITH =, valid_at with &&)`. Since the primary 
key is not a b-tree, partitioning them would be forbidden prior to this 
patch. But now you could partition that table on `id`, and we could 
still correctly validate the temporal PK without requiring rows from 
other partitions.
This patch may be helpful beyond just temporal tables (or for DIY 
temporal tables), so it seems worth submitting it separately.
Yours,
-- 
Paul              ~{:-)
pj@illuminatedcomputing.com
			
		Вложения
Paul Jungwirth <pj@illuminatedcomputing.com> writes:
> It lets you create exclusion constraints on partitioned tables, similar 
> to today's rules for b-tree primary keys & unique constraints:
> just as we permit a PK on a partitioned table when the PK's columns are 
> a superset of the partition keys, so we could also allow an exclusion 
> constraint when its columns are a superset of the partition keys.
OK.  AFAICS that works in principle.
> This patch also requires the matching constraint columns to use equality 
> comparisons (`(foo WITH =)`), so it is really equivalent to the existing 
> b-tree rule.
That's not quite good enough: you'd better enforce that it's the same
equality operator (and same collation, if relevant) as is being used
in the partition key.  Remember that we don't have a requirement that
a datatype have only one equality operator; and these days I think
collation can affect equality, too.
Another problem is that while we can safely assume that we know what
BTEqualStrategyNumber means in btree, we can NOT assume that we know
what gist opclass strategy numbers mean: each opclass is free to
define those as it sees fit.  The part of your patch that is looking
at RTEqualStrategyNumber seems dangerously broken to me.
It might work better to consider the operator itself and ask if
it's equality in the same btree opfamily that's used by the
partition key.  (Hm, do we use btree opfamilies for all types of
partitioning?)
Anyway, I think something can be made of this, but you need to be less
fuzzy about matching the equality semantics of the partition key.
            regards, tom lane
			
		On 12/15/22 16:12, Tom Lane wrote:
>> This patch also requires the matching constraint columns to use equality
>> comparisons (`(foo WITH =)`), so it is really equivalent to the existing
>> b-tree rule.
> 
> That's not quite good enough: you'd better enforce that it's the same
> equality operator (and same collation, if relevant) as is being used
> in the partition key.
> [snip]
> It might work better to consider the operator itself and ask if
> it's equality in the same btree opfamily that's used by the
> partition key.
Thank you for taking a look! Here is a comparison on just the operator 
itself.
I included a collation check too, but I'm not sure it's necessary. 
Exclusion constraints don't have a collation per se; it comes from the 
index, and we choose it just a little above in this function. (I'm not 
even sure how to elicit that new error message in a test case.)
I'm not sure what to do about matching the opfamily. In practice an 
exclusion constraint will typically use gist, but the partition key will 
always use btree/hash. You're saying that the equals operator can be 
inconsistent between those access methods? That is surprising to me, but 
I admit op classes/families are still sinking in. (Even prior to this 
patch, isn't the code for hash-based partitions looking up ptkey_eqop 
via the hash opfamily, and then comparing it to idx_eqop looked up via 
the btree opfamily?)
If partitions can only support btree-based exclusion constraints, you 
still wouldn't be able to partition a temporal table, because those 
constraints would always be gist. So I guess what I really want is to 
support gist index constraints on partitioned tables.
Regards,
-- 
Paul              ~{:-)
pj@illuminatedcomputing.com
			
		Вложения
Le vendredi 16 décembre 2022, 06:11:49 CET Paul Jungwirth a écrit : > On 12/15/22 16:12, Tom Lane wrote: > >> This patch also requires the matching constraint columns to use equality > >> comparisons (`(foo WITH =)`), so it is really equivalent to the existing > >> b-tree rule. > > > > That's not quite good enough: you'd better enforce that it's the same > > equality operator (and same collation, if relevant) as is being used > > in the partition key. > > [snip] > > It might work better to consider the operator itself and ask if > > it's equality in the same btree opfamily that's used by the > > partition key. > > Thank you for taking a look! Here is a comparison on just the operator > itself. > I've taken a look at the patch, and I'm not sure why you keep the restriction on the Gist operator being of the RTEqualStrategyNumber strategy. I don't think we have any other place where we expect those strategy numbers to match. For hash it's different, as the hash-equality is the only operator strategy and as such there is no other way to look at it. Can't we just enforce partition_operator == exclusion_operator without adding the RTEqualStrategyNumber for the opfamily into the mix ?
On 1/24/23 06:38, Ronan Dunklau wrote:
> I've taken a look at the patch, and I'm not sure why you keep the restriction
> on the Gist operator being of the RTEqualStrategyNumber strategy. I don't
> think  we have any other place where we expect those strategy numbers to
> match. For hash it's different, as the hash-equality is the only operator
> strategy and as such there is no other way to look at it. Can't we just
> enforce partition_operator == exclusion_operator without adding the
> RTEqualStrategyNumber for the opfamily into the mix ?
Thank you for taking a look! I did some research on the history of the 
code here, and I think I understand Tom's concern about making sure the 
index uses the same equality operator as the partition. I was confused 
about his remarks about the opfamily, but I agree with you that if the 
operator is the same, we should be okay.
I added the code about RTEqualStrategyNumber because that's what we need 
to find an equals operator when the index is GiST (except if it's using 
an opclass from btree_gist; then it needs to be BTEqual again). But then 
I realized that for exclusion constraints we have already figured out 
the operator (in RelationGetExclusionInfo) and put it in 
indexInfo->ii_ExclusionOps. So we can just compare against that. This 
works whether your index uses btree_gist or not.
Here is an updated patch with that change (also rebased).
I also included a more specific error message. If we find a matching 
column in the index but with the wrong operator, we should say so, and 
not say there is no matching column.
Thanks,
-- 
Paul              ~{:-)
pj@illuminatedcomputing.com
			
		Вложения
Le vendredi 17 mars 2023, 17:03:09 CET Paul Jungwirth a écrit : > I added the code about RTEqualStrategyNumber because that's what we need > to find an equals operator when the index is GiST (except if it's using > an opclass from btree_gist; then it needs to be BTEqual again). But then > I realized that for exclusion constraints we have already figured out > the operator (in RelationGetExclusionInfo) and put it in > indexInfo->ii_ExclusionOps. So we can just compare against that. This > works whether your index uses btree_gist or not. > > Here is an updated patch with that change (also rebased). Thanks ! This looks fine to me like this. > > I also included a more specific error message. If we find a matching > column in the index but with the wrong operator, we should say so, and > not say there is no matching column. > I agree that's a nicer improvement. Regards, -- Ronan Dunklau
On 17.03.23 17:03, Paul Jungwirth wrote:
> Thank you for taking a look! I did some research on the history of the 
> code here, and I think I understand Tom's concern about making sure the 
> index uses the same equality operator as the partition. I was confused 
> about his remarks about the opfamily, but I agree with you that if the 
> operator is the same, we should be okay.
> 
> I added the code about RTEqualStrategyNumber because that's what we need 
> to find an equals operator when the index is GiST (except if it's using 
> an opclass from btree_gist; then it needs to be BTEqual again). But then 
> I realized that for exclusion constraints we have already figured out 
> the operator (in RelationGetExclusionInfo) and put it in 
> indexInfo->ii_ExclusionOps. So we can just compare against that. This 
> works whether your index uses btree_gist or not.
> 
> Here is an updated patch with that change (also rebased).
> 
> I also included a more specific error message. If we find a matching 
> column in the index but with the wrong operator, we should say so, and 
> not say there is no matching column.
This looks all pretty good to me.  A few more comments:
It seems to me that many of the test cases added in indexing.sql are 
redundant with create_table.sql/alter_table.sql (or vice versa).  Is 
there a reason for this?
This is not really a problem in your patch, but I think in
-   if (partitioned && (stmt->unique || stmt->primary))
+   if (partitioned && (stmt->unique || stmt->primary || 
stmt->excludeOpNames != NIL))
the stmt->primary is redundant and should be removed.  Right now 
"primary" is always a subset of "unique", but presumably a future patch 
of yours wants to change that.
Furthermore, I think it would be more elegant in your patch if you wrote 
stmt->excludeOpNames without the "== NIL" or "!= NIL", so that it 
becomes a peer of stmt->unique.  (I understand some people don't like 
that style.  But it is already used in that file.)
I would consider rearranging some of the conditionals more as a 
selection of cases, like "is it a unique constraint?", "else, is it an 
exclusion constraint?" -- rather than the current "is it an exclusion 
constraint?, "else, various old code".  For example, instead of
     if (stmt->excludeOpNames != NIL)
         idx_eqop = indexInfo->ii_ExclusionOps[j];
     else
         idx_eqop = get_opfamily_member(..., eq_strategy);
consider
     if (stmt->unique)
         idx_eqop = get_opfamily_member(..., eq_strategy);
     else if (stmt->excludeOpNames)
         idx_eqop = indexInfo->ii_ExclusionOps[j];
     Assert(idx_eqop);
Also, I would push the code
     if (accessMethodId == BTREE_AM_OID)
         eq_strategy = BTEqualStrategyNumber;
further down into the loop, so that you don't have to remember in which 
cases eq_strategy is assigned or not.
(It's also confusing that the eq_strategy variable is used for two 
different things in this function, and that would clean that up.)
Finally, this code
+                           att = TupleDescAttr(RelationGetDescr(rel),
+                                               key->partattrs[i] - 1);
+                           ereport(ERROR,
+                                   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                                    errmsg("cannot match partition key 
to index on column \"%s\" using non-equal operator \"%s\".",
+                                           NameStr(att->attname), 
get_opname(indexInfo->ii_ExclusionOps[j]))));
could be simplified by using get_attname().
This is all just a bit of polishing.  I think it would be good to go 
after that.
			
		On Thu, Jul 6, 2023 at 1:03 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> This looks all pretty good to me.  A few more comments:
Thanks for the feedback! New patch attached here. Responses below:
> It seems to me that many of the test cases added in indexing.sql are
> redundant with create_table.sql/alter_table.sql (or vice versa).  Is
> there a reason for this?
Yes, there is some overlap. I think that's just because there was
overlap before, and I didn't want to delete the old tests completely.
But since indexing.sql has a fuller list of tests and is a superset of
the others, this new patch removes the redundant tests from
{create,alter}_table.sql.
Btw speaking of tests, I want to make sure this new feature will still
work when you're using btree_gist and and `EXCLUDE WITH (myint =,
mytsrange &&)` (and not just `(myint4range =, mytsrange &&)`). Some of
my early attempts writing this patch worked w/o btree_gist but not w/
(or vice versa). But as far as I know there's no way to test that in
regress. I wound up writing a private shell script that just does
this:
```
--------
-- test against btree_gist since we can't do that in the postgres
regress test suite:
CREATE EXTENSION btree_gist;
create table partitioned (id int, valid_at tsrange, exclude using gist
(id with =, valid_at with &&)) partition by range (id);
-- should fail with a good error message:
create table partitioned2 (id int, valid_at tsrange, exclude using
gist (id with <>, valid_at with &&)) partition by range (id);
```
Is there some place in the repo to include a test like that? It seems
a little funny to put it in the btree_gist suite, but maybe that's the
right answer.
> This is not really a problem in your patch, but I think in
>
> -   if (partitioned && (stmt->unique || stmt->primary))
> +   if (partitioned && (stmt->unique || stmt->primary ||
> stmt->excludeOpNames != NIL))
>
> the stmt->primary is redundant and should be removed.  Right now
> "primary" is always a subset of "unique", but presumably a future patch
> of yours wants to change that.
Done! I don't think my temporal work changes that primary ⊆ unique. It
does change that some primary/unique constraints will have non-null
excludeOpNames, which will require small changes here eventually. But
that should be part of the temporal patches, not this one.
> Furthermore, I think it would be more elegant in your patch if you wrote
> stmt->excludeOpNames without the "== NIL" or "!= NIL", so that it
> becomes a peer of stmt->unique.  (I understand some people don't like
> that style.  But it is already used in that file.)
Done.
> I would consider rearranging some of the conditionals more as a
> selection of cases, like "is it a unique constraint?", "else, is it an
> exclusion constraint?" -- rather than the current "is it an exclusion
> constraint?, "else, various old code".
Done.
> Also, I would push the code
>
>      if (accessMethodId == BTREE_AM_OID)
>          eq_strategy = BTEqualStrategyNumber;
>
> further down into the loop, so that you don't have to remember in which
> cases eq_strategy is assigned or not.
>
> (It's also confusing that the eq_strategy variable is used for two
> different things in this function, and that would clean that up.)
Agreed that it's confusing. Done.
> Finally, this code
>
> +                           att = TupleDescAttr(RelationGetDescr(rel),
> +                                               key->partattrs[i] - 1);
> +                           ereport(ERROR,
> +                                   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                                    errmsg("cannot match partition key
> to index on column \"%s\" using non-equal operator \"%s\".",
> +                                           NameStr(att->attname),
> get_opname(indexInfo->ii_ExclusionOps[j]))));
>
> could be simplified by using get_attname().
Okay, done. I changed the similar error message just below too.
> This is all just a bit of polishing.  I think it would be good to go
> after that.
Thanks!
--
Paul              ~{:-)
pj@illuminatedcomputing.com
			
		Вложения
On 09.07.23 03:21, Paul A Jungwirth wrote:
>> It seems to me that many of the test cases added in indexing.sql are
>> redundant with create_table.sql/alter_table.sql (or vice versa).  Is
>> there a reason for this?
> 
> Yes, there is some overlap. I think that's just because there was
> overlap before, and I didn't want to delete the old tests completely.
> But since indexing.sql has a fuller list of tests and is a superset of
> the others, this new patch removes the redundant tests from
> {create,alter}_table.sql.
This looks better.
> Btw speaking of tests, I want to make sure this new feature will still
> work when you're using btree_gist and and `EXCLUDE WITH (myint =,
> mytsrange &&)` (and not just `(myint4range =, mytsrange &&)`). Some of
> my early attempts writing this patch worked w/o btree_gist but not w/
> (or vice versa). But as far as I know there's no way to test that in
> regress. I wound up writing a private shell script that just does
> this:
> Is there some place in the repo to include a test like that? It seems
> a little funny to put it in the btree_gist suite, but maybe that's the
> right answer.
I'm not sure what value we would get from testing this with btree_gist, 
but if we wanted to do that, then adding a new test file to the 
btree_gist sql/ directory would seem reasonable to me.
(I would make the test a little bit bigger than you had shown, like 
insert a few values.)
If you want to do that, please send another patch.  Otherwise, I'm ok to 
commit this one.
			
		On Mon, Jul 10, 2023 at 7:05 AM Peter Eisentraut <peter@eisentraut.org> wrote: > I'm not sure what value we would get from testing this with btree_gist, > but if we wanted to do that, then adding a new test file to the > btree_gist sql/ directory would seem reasonable to me. > > (I would make the test a little bit bigger than you had shown, like > insert a few values.) > > If you want to do that, please send another patch. Otherwise, I'm ok to > commit this one. I can get you a patch tonight or tomorrow. I think it's worth it since btree_gist uses different strategy numbers than ordinary gist. Thanks! Paul
On Mon, Jul 10, 2023 at 8:06 AM Paul A Jungwirth <pj@illuminatedcomputing.com> wrote: > > On Mon, Jul 10, 2023 at 7:05 AM Peter Eisentraut <peter@eisentraut.org> wrote: > > I'm not sure what value we would get from testing this with btree_gist, > > but if we wanted to do that, then adding a new test file to the > > btree_gist sql/ directory would seem reasonable to me. > > > > (I would make the test a little bit bigger than you had shown, like > > insert a few values.) > > > > If you want to do that, please send another patch. Otherwise, I'm ok to > > commit this one. > > I can get you a patch tonight or tomorrow. I think it's worth it since > btree_gist uses different strategy numbers than ordinary gist. Patch attached. Regards, Paul
Вложения
On 11.07.23 07:52, Paul A Jungwirth wrote: > On Mon, Jul 10, 2023 at 8:06 AM Paul A Jungwirth > <pj@illuminatedcomputing.com> wrote: >> >> On Mon, Jul 10, 2023 at 7:05 AM Peter Eisentraut <peter@eisentraut.org> wrote: >>> I'm not sure what value we would get from testing this with btree_gist, >>> but if we wanted to do that, then adding a new test file to the >>> btree_gist sql/ directory would seem reasonable to me. >>> >>> (I would make the test a little bit bigger than you had shown, like >>> insert a few values.) >>> >>> If you want to do that, please send another patch. Otherwise, I'm ok to >>> commit this one. >> >> I can get you a patch tonight or tomorrow. I think it's worth it since >> btree_gist uses different strategy numbers than ordinary gist. > > Patch attached. Looks good, committed. I had some second thoughts about the use of get_attname(). It seems the previous code used the dominant style of extracting the attribute name from the open relation handle, so I kept it that way. That's also more efficient than going via the syscache.