Обсуждение: Unique Constraints using Non-Unique Indexes

Поиск
Список
Период
Сортировка

Unique Constraints using Non-Unique Indexes

От
Simon Riggs
Дата:
The current Unique constraint relies directly upon a Unique index to
test for uniqueness.

This has two disadvantages: 

* only B-Trees currently support Uniqueness
* We need to create an index on *all* of the columns of the primary key,
which may end up being a very large index as a result

The uniqueness check code searches for the value being inserted and if a
value is found that is visible, then we reject. We currently use the
same index scan key for the uniqueness check as we do for the index
search.

If the uniqueness check used a scan key that consisted of all of the
Primary Key columns, rather than just the index columns then it would be
able to scan through non-unique index entries to check uniqueness.
Interestingly, the current uniqueness check code already scans through
multiple tuples because of the possible existence of multiple row
versions. So we just need to supply a different scan key.

If we extended the definition of a PRIMARY KEY to include an existing
index like this

ALTER TABLE foo ADD PRIMARY KEY (...) USING INDEX index_name;

then we would be able to specify what we want.

This would then allow us to use a Hash Index or other index as the basis
for a Unique Constraint and/or considerably reduce size of indexes.

Frequently the full unique key could be 5 or 6 columns, even though the
leading columns might be sufficiently unique to make this technique
worthwhile. It's also common to want to store the hash() of a value
rather than the value itself, but the hash typically won't be
guaranteeably unique, even though the probability of collisions may be
very low.

Thoughts?

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 
 PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk



Re: Unique Constraints using Non-Unique Indexes

От
Martijn van Oosterhout
Дата:
On Thu, Mar 20, 2008 at 02:35:38PM +0000, Simon Riggs wrote:
> This would then allow us to use a Hash Index or other index as the basis
> for a Unique Constraint and/or considerably reduce size of indexes.

I was under the impression that the reason only b-tree supported unique
indexes was because it could lock one page while to checking visibility
to block concurrent inserts. AIUI other index types (notably gist) would
not be able to easily block concurrent inserts because the place a new
item is entered into the index is not unique nor necessarily deterministic.

Whether hash could support this usage I don't know.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: Unique Constraints using Non-Unique Indexes

От
Simon Riggs
Дата:
On Thu, 2008-03-20 at 18:37 +0100, Martijn van Oosterhout wrote:
> On Thu, Mar 20, 2008 at 02:35:38PM +0000, Simon Riggs wrote:
> > This would then allow us to use a Hash Index or other index as the basis
> > for a Unique Constraint and/or considerably reduce size of indexes.
> 
> I was under the impression that the reason only b-tree supported unique
> indexes was because it could lock one page while to checking visibility
> to block concurrent inserts. AIUI other index types (notably gist) would
> not be able to easily block concurrent inserts because the place a new
> item is entered into the index is not unique nor necessarily deterministic.

The existing code can already move onto other pages. It needs to be able
to do this to support >1 row version(s) in the index. So all the
concepts and code are already there to do what I've suggested.

> Whether hash could support this usage I don't know.

Hash could do it too, but that's other code I've not looked at yet. But
the locking issues are similar are least.

The main barrier to this is really just being able to specify that's
what you want and then have that info passed thru to where its needed.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 
 PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk



Re: Unique Constraints using Non-Unique Indexes

От
Kenneth Marshall
Дата:
On Thu, Mar 20, 2008 at 02:35:38PM +0000, Simon Riggs wrote:
> The current Unique constraint relies directly upon a Unique index to
> test for uniqueness.
> 
> This has two disadvantages: 
> 
> * only B-Trees currently support Uniqueness
> * We need to create an index on *all* of the columns of the primary key,
> which may end up being a very large index as a result
> 
> The uniqueness check code searches for the value being inserted and if a
> value is found that is visible, then we reject. We currently use the
> same index scan key for the uniqueness check as we do for the index
> search.
> 
> If the uniqueness check used a scan key that consisted of all of the
> Primary Key columns, rather than just the index columns then it would be
> able to scan through non-unique index entries to check uniqueness.
> Interestingly, the current uniqueness check code already scans through
> multiple tuples because of the possible existence of multiple row
> versions. So we just need to supply a different scan key.
> 
> If we extended the definition of a PRIMARY KEY to include an existing
> index like this
> 
> ALTER TABLE foo ADD PRIMARY KEY (...) USING INDEX index_name;
> 
> then we would be able to specify what we want.
> 
> This would then allow us to use a Hash Index or other index as the basis
> for a Unique Constraint and/or considerably reduce size of indexes.
> 
> Frequently the full unique key could be 5 or 6 columns, even though the
> leading columns might be sufficiently unique to make this technique
> worthwhile. It's also common to want to store the hash() of a value
> rather than the value itself, but the hash typically won't be
> guaranteeably unique, even though the probability of collisions may be
> very low.
> 
> Thoughts?
> 
> -- 
>   Simon Riggs
>   2ndQuadrant  http://www.2ndQuadrant.com 
> 
>   PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk

I like the idea. I am planning to provide unique index support
in the hash index update but wanted to be able to use it for
all of the primary key columns at once. This would allow that
to happen.

Ken


Re: Unique Constraints using Non-Unique Indexes

От
Simon Riggs
Дата:
On Thu, 2008-03-20 at 17:38 +0000, Gregory Stark wrote:
> I don't immediately see any problems aside from reduced concurrency 

Agreed. The index would need to be nearly unique in most cases to make
it sensible. But that's a common situation in complex data models.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 
 PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk



Re: Unique Constraints using Non-Unique Indexes

От
Gregory Stark
Дата:
"Simon Riggs" <simon@2ndquadrant.com> writes:

> If the uniqueness check used a scan key that consisted of all of the
> Primary Key columns, rather than just the index columns then it would be
> able to scan through non-unique index entries to check uniqueness.
> Interestingly, the current uniqueness check code already scans through
> multiple tuples because of the possible existence of multiple row
> versions. So we just need to supply a different scan key.

The tricky part about unique constraints is guaranteeing that only one
transaction can see their insert as "first". If you just did a simple scan and
went ahead if you don't see any conflicts you would risk having two inserters
not see each others insert and go ahead and index their insert and proceed.

As I understand it we normally protect against that by holding a lock on the
first page of the key we're inserting while we perform the uniqueness check.
Could you still do that in this case? I don't immediately see any problems
aside from reduced concurrency but this code isn't really my forté.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production
Tuning