Обсуждение: Unique constraints for non-btree indexes

Поиск
Список
Период
Сортировка

Unique constraints for non-btree indexes

От
Martijn van Oosterhout
Дата:
Hi,

Currently due to the way unique constraints are tied to btree there is
no way to allow GiST indexes to do the same thing. The thing I'm
specifically interested in is an index where you insert ranges
(start,end) and if unique, the index will complain if they overlap. As
a side-effect, this may make progress toward the goal of deferrable
unique indexes.

Part of the solution is to remove the layering violation from the btree
code, it really shouldn't be accessing the heap directly. What I'm
proposing is to move the bulk of _bt_check_unique into a new function
(say check_unique_index) in the general index machinary and have the
b-tree code do just:

check_unique_index( ctid of inserting tuple, ctid of possibly conflicting tuple)

The point being that GiST indexes could use exactly the same function
to check for duplicates. The function would return InvalidTransactionId
if there's no conflict, or an actual transaction id to wait on, just
like the btree code does now.

It would require some changes to the GiST code since a lot more of the
index may need to be checked for duplicates. I suppose in the general
case, since a key can appear in multiple places, the concurrency issues
could be difficult. I suppose you would insert your key first, then
check for duplicates thus ensuring that at least one of the two
conflicting transactions will see it.

Now, one side-effect is that you could build deferrable unique
constraints on top of this by having the check function always return
InvalidTransactionId but storing the conflicts for later checking. But
I first want to know if there are any real issues with the above.

Any thoughts?
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Unique constraints for non-btree indexes

От
"Jonah H. Harris"
Дата:
I thought gistinsert had checkUnique, it was just ifdef'd out because there was no code to enforce it... and as such, during bootstrap it was marked as amcanunique = false.  Would it be that hard to enable it?


On 1/18/06, Martijn van Oosterhout <kleptog@svana.org > wrote:
Hi,

Currently due to the way unique constraints are tied to btree there is
no way to allow GiST indexes to do the same thing. The thing I'm
specifically interested in is an index where you insert ranges
(start,end) and if unique, the index will complain if they overlap. As
a side-effect, this may make progress toward the goal of deferrable
unique indexes.

Part of the solution is to remove the layering violation from the btree
code, it really shouldn't be accessing the heap directly. What I'm
proposing is to move the bulk of _bt_check_unique into a new function
(say check_unique_index) in the general index machinary and have the
b-tree code do just:

check_unique_index( ctid of inserting tuple, ctid of possibly conflicting tuple)

The point being that GiST indexes could use exactly the same function
to check for duplicates. The function would return InvalidTransactionId
if there's no conflict, or an actual transaction id to wait on, just
like the btree code does now.

It would require some changes to the GiST code since a lot more of the
index may need to be checked for duplicates. I suppose in the general
case, since a key can appear in multiple places, the concurrency issues
could be difficult. I suppose you would insert your key first, then
check for duplicates thus ensuring that at least one of the two
conflicting transactions will see it.

Now, one side-effect is that you could build deferrable unique
constraints on top of this by having the check function always return
InvalidTransactionId but storing the conflicts for later checking. But
I first want to know if there are any real issues with the above.

Any thoughts?
--
Martijn van Oosterhout   < kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFDzkmWIB7bNG8LQkwRArgRAJ9E34krswmsSEsMv6h/1d1KJc7crACgg1kp
m32u4QtjXCqd53fjUP6WKUE=
=E0+I
-----END PGP SIGNATURE-----



Re: Unique constraints for non-btree indexes

От
Martijn van Oosterhout
Дата:
On Wed, Jan 18, 2006 at 09:15:04AM -0500, Jonah H. Harris wrote:
> I thought gistinsert had checkUnique, it was just ifdef'd out because there
> was no code to enforce it... and as such, during bootstrap it was marked as
> amcanunique = false.  Would it be that hard to enable it?

Well, it has the argument to gistinsert but it is commented out and
there is no other reference to unique anywhere in the GiST code. Once
the support infrastructure is there we can talk about enabling it. At
the very least we need to decide how to indicate what "unique" is.

For example: saying the two ranges (1,3) and (2,4) cannot co-exist in
the same index is not really what most people would consider the
behaviour of a "unique" index. Indeed, for any particular data-type,
there may be multiple ways of defining a conflict. For 2-D objects it
may refer to having no objects overlap, but it could also refer to no
overlaps in the X or Y axes.

I guess what you're talking about is a constrained index, of which a
unique index is just a particular type. I suppose the actual constraint
would be one of the operators defined for the operator class (since
whatever the test is, it needs to be indexable). Although some would
obviously be more useful than others...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Unique constraints for non-btree indexes

От
"Jonah H. Harris"
Дата:
I think I understand what you're saying, just that I don't think the btree index has anything to do with it.

The extensibility is there for indexes to handle uniques in any way they choose.  If you wanted to add a common unique index checking function for GIST, I'd just add it to GIST.  It just seems to me like the access methods should keep the handling internal to themselves.

On the chance that I'm not be understanding what you're saying, sorry.


On 1/18/06, Martijn van Oosterhout <kleptog@svana.org> wrote:
On Wed, Jan 18, 2006 at 09:15:04AM -0500, Jonah H. Harris wrote:
> I thought gistinsert had checkUnique, it was just ifdef'd out because there
> was no code to enforce it... and as such, during bootstrap it was marked as
> amcanunique = false.  Would it be that hard to enable it?

Well, it has the argument to gistinsert but it is commented out and
there is no other reference to unique anywhere in the GiST code. Once
the support infrastructure is there we can talk about enabling it. At
the very least we need to decide how to indicate what "unique" is.

For example: saying the two ranges (1,3) and (2,4) cannot co-exist in
the same index is not really what most people would consider the
behaviour of a "unique" index. Indeed, for any particular data-type,
there may be multiple ways of defining a conflict. For 2-D objects it
may refer to having no objects overlap, but it could also refer to no
overlaps in the X or Y axes.

I guess what you're talking about is a constrained index, of which a
unique index is just a particular type. I suppose the actual constraint
would be one of the operators defined for the operator class (since
whatever the test is, it needs to be indexable). Although some would
obviously be more useful than others...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org >   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFDzm26IB7bNG8LQkwRAiUCAJ9MURp34CmKaxWFPrESKqvx2DDsYQCePSLv
JrKzcRQU7wf25oDv42Oeosc=
=y0WG
-----END PGP SIGNATURE-----



Re: Unique constraints for non-btree indexes

От
Tom Lane
Дата:
Martijn van Oosterhout <kleptog@svana.org> writes:
> check_unique_index( ctid of inserting tuple, ctid of possibly
> conflicting tuple)

I agree it's pretty ugly to have the index AM directly poking into
the heap, but adding a level of subroutine doesn't really make that
a whole lot nicer :-(.

In any case, you've underestimated the amount of coupling here: if
the conflicting tuple is dead, _bt_check_unique also wants to know
just how dead it is, so it can possibly set LP_DELETE on the old index
entry.

> Now, one side-effect is that you could build deferrable unique
> constraints on top of this by having the check function always return
> InvalidTransactionId but storing the conflicts for later checking.

I think this is not as easy as all that; consider race conditions
against VACUUM for instance (the tuples might not be there anymore
when you want to check the conflict).  Also, we really do want to go
back and set LP_DELETE if the conflict tuple is sufficiently dead.
Having that not happen is unappetizing, because you could end up
repeating the check a large number of times over successive updates.
N updates will take O(N^2) time.

My own thoughts about deferred unique checks have been along the lines
of storing the possibly-conflicting key value when the initial check
notes a problem, and then repeating the index search at commit.
        regards, tom lane


Re: Unique constraints for non-btree indexes

От
Tom Lane
Дата:
Martijn van Oosterhout <kleptog@svana.org> writes:
> I guess what you're talking about is a constrained index, of which a
> unique index is just a particular type. I suppose the actual constraint
> would be one of the operators defined for the operator class (since
> whatever the test is, it needs to be indexable). Although some would
> obviously be more useful than others...

I think the generalization that would be appropriate for GIST is that
a "unique" index guarantees there are no two entries x, y such that
x ~ y, where ~ is some boolean operator nominated by the opclass.  We'd
probably have to insist that ~ is commutative (x ~ y iff y ~ x).

Concurrent insertion into a unique GIST index seems a bit nasty.  In
btree we can identify a unique page to lock for any given key value
to ensure that no one else is concurrently inserting a conflicting
key, thus usually allowing concurrent insertions of different keys.
But I don't see how you do that for an arbitrary ~ operator.
        regards, tom lane


Re: Unique constraints for non-btree indexes

От
Martijn van Oosterhout
Дата:
On Wed, Jan 18, 2006 at 04:10:16PM -0500, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > check_unique_index( ctid of inserting tuple, ctid of possibly
> > conflicting tuple)
>
> I agree it's pretty ugly to have the index AM directly poking into
> the heap, but adding a level of subroutine doesn't really make that
> a whole lot nicer :-(.

Well, the raionale is that in theory the same logic would be applied
for GiST indexes, so it'd be nice to do it in one place rather than
repeat it for each index AM.

> In any case, you've underestimated the amount of coupling here: if
> the conflicting tuple is dead, _bt_check_unique also wants to know
> just how dead it is, so it can possibly set LP_DELETE on the old index
> entry.

Hmm, ok. There's more info that the index AM would like, but the same
info would be required for both GiST and b-tree, no? (assuming GiST has
the same delete optimisation)

> My own thoughts about deferred unique checks have been along the lines
> of storing the possibly-conflicting key value when the initial check
> notes a problem, and then repeating the index search at commit.

Well, I didn't want to exclude that possibility. Racing with VACUUM is
a tricky one. If we keep an array of ctid in memory, we need to know if
VACUUM removes one of them. OTOH, the recheck will then return either
a blank or a tuple which definitly doesn't match.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Unique constraints for non-btree indexes

От
Martijn van Oosterhout
Дата:
On Wed, Jan 18, 2006 at 04:18:10PM -0500, Tom Lane wrote:
> I think the generalization that would be appropriate for GIST is that
> a "unique" index guarantees there are no two entries x, y such that
> x ~ y, where ~ is some boolean operator nominated by the opclass.  We'd
> probably have to insist that ~ is commutative (x ~ y iff y ~ x).

Commutative, that's the criteria I was looking for. To be senseble for
this purpose, the operator has to be commutative (the commutator is
itself). This works for b-tree by including = and excluding < and >.
Similarly for GiST indexes, contains no, overlaps yes. That's a fairly
easy test.

> Concurrent insertion into a unique GIST index seems a bit nasty.  In
> btree we can identify a unique page to lock for any given key value
> to ensure that no one else is concurrently inserting a conflicting
> key, thus usually allowing concurrent insertions of different keys.
> But I don't see how you do that for an arbitrary ~ operator.

Well, the best I could come up with was to just do the insert for value
X and then do a full index scan for X across the given constraint
operator (~). Any matches would need to go through the
check_unique_index function as defined earlier. The only issue is that
we can't really do easy optimisation. In the case of deferred
constraints you can't even remember where in the tree you were because
new keys could be added anywhere so you always have to do from the top.

The issue I get is deadlocks:

1. Process A inserts value X
2. Process B inserts value Y   (where X ~ Y is true)
3. Process A begins scan, finds Y and waits for B
4. Process B begins scan, finds X and waits for A

Oops. The only way I can think of solving that is by marking the
entries tentative until the scan is complete and provide a way of
resolving conflicts between two tentative entries. Requires more
thinking.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Unique constraints for non-btree indexes

От
Tom Lane
Дата:
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Wed, Jan 18, 2006 at 04:18:10PM -0500, Tom Lane wrote:
>> In btree we can identify a unique page to lock for any given key value
>> to ensure that no one else is concurrently inserting a conflicting
>> key, thus usually allowing concurrent insertions of different keys.
>> But I don't see how you do that for an arbitrary ~ operator.

> The issue I get is deadlocks:

Right, the deadlock risk is exactly the reason you need some secret
sauce or other.  Btree's page-level lock ensures that two insertions of
conflicting keys can't overlap (even if they ultimately get stored on
different pages).  That's not the only way to fix this but it's a pretty
good way.

BTW, the deadlock risk also applies to deferred uniqueness checks.
Again, in btree it's possible to avoid this if you do a fresh indexscan
(and take a lock on the first scanned page while you do that).  If you
try to do it without consulting the index then you need some other way
to break "ties".
        regards, tom lane


Re: Unique constraints for non-btree indexes

От
Greg Stark
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Martijn van Oosterhout <kleptog@svana.org> writes:
> > I guess what you're talking about is a constrained index, of which a
> > unique index is just a particular type. I suppose the actual constraint
> > would be one of the operators defined for the operator class (since
> > whatever the test is, it needs to be indexable). Although some would
> > obviously be more useful than others...
> 
> I think the generalization that would be appropriate for GIST is that
> a "unique" index guarantees there are no two entries x, y such that
> x ~ y, where ~ is some boolean operator nominated by the opclass.  We'd
> probably have to insist that ~ is commutative (x ~ y iff y ~ x).

I have no big contribution here. I just want to say this is a cool idea.
These Generalized uniqueish constraints could make a lot of neat things
possible.

-- 
greg



Re: Unique constraints for non-btree indexes

От
Martijn van Oosterhout
Дата:
On Wed, Jan 18, 2006 at 05:27:29PM -0500, Tom Lane wrote:
> Right, the deadlock risk is exactly the reason you need some secret
> sauce or other.  Btree's page-level lock ensures that two insertions of
> conflicting keys can't overlap (even if they ultimately get stored on
> different pages).  That's not the only way to fix this but it's a pretty
> good way.

Ok, I'm not that great with locking issues, but how about this:

1. In the root page of the GiST index you store a counter, let's call
it the Insertion ID (IID).
2. An index tuple has a state where it is not visible yet and contains
an IID.
3. When you go to insert, you follow the normal GiST method all the way
down to the leaf node.
4. Once you know where you're going to put it, take an exclusive lock
on the page.
5. Now, take an exclusive lock on the root page, read the IID (this one
is yours) and store the next one back in the root page.
6. Now write the index tuple to the leaf page recording your IID.
7. Release the root page lock, then the leaf page lock.

The thing to note is that once someone has gotten a particular IID, all
IIDs that are less than it will already have been written to the index
leaf pages. Also, there should be no risk of deadlock since we lock the
pages in a consistant order.

8. Do a normal index scan with your ~ operator. Ignore any provisional
tuples with IID greater than yours. If a match has IID less than yours
you may block on it using the same logic as for b-tree indexes.
9. Once the scan in completed and you know that there are no duplicates
with your IID or less, make the tuple fully visible (for your
transaction anyway). At this point the IID is no longer relevent and
you can forget it.

What I tried to acheive was avoiding holding any locks while doing
scans, since for GiST indexes they may cover a lot of ground.

Perhaps a better name for IID is Generation ID?

Storing the IID could take extra space, but you could probably overlap
it with the ctid since no-one's going to lookup the data tuple before
it's fully visible.

OTOH, if the backend dies before completing the process, how does one
clean up the provisional index tuple? The ctid would provide a way for
VACUUM to know when to remove it.

> BTW, the deadlock risk also applies to deferred uniqueness checks.
> Again, in btree it's possible to avoid this if you do a fresh indexscan
> (and take a lock on the first scanned page while you do that).  If you
> try to do it without consulting the index then you need some other way
> to break "ties".

I think the above should work for deferred checks as well, as long as
you store the list as "tuples to check" for each inserted tuple. These
lists should form a directed acyclic graph so there should be no risk
of deadlock. Conflicts with VACUUM is still an issue.

Any thoughts?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.