Обсуждение: Add primary key/unique constraint using prefix columns of an index

Поиск
Список
Период
Сортировка

Add primary key/unique constraint using prefix columns of an index

От
Jeff Janes
Дата:
Now that there are index only scans, there is a use case for having a
composite index which has the primary key or a unique key as the
prefix column(s) but with extra columns after that.  Currently you
would also need another index with exactly the primary/unique key,
which seems like a waste of storage and maintenance.

Should there be a way to declare a "unique" index with the unique
property applying to a prefix of the indexed columns/expression?  And
having that, a way to turn that prefix into a primary key constraint?

Of course this is easier said then done, but is there some reason for
it not to be a to-do item?

Thanks,

Jeff


Re: Add primary key/unique constraint using prefix columns of an index

От
Simon Riggs
Дата:
On 22 May 2012 18:24, Jeff Janes <jeff.janes@gmail.com> wrote:
> Now that there are index only scans, there is a use case for having a
> composite index which has the primary key or a unique key as the
> prefix column(s) but with extra columns after that.  Currently you
> would also need another index with exactly the primary/unique key,
> which seems like a waste of storage and maintenance.
>
> Should there be a way to declare a "unique" index with the unique
> property applying to a prefix of the indexed columns/expression?  And
> having that, a way to turn that prefix into a primary key constraint?
>
> Of course this is easier said then done, but is there some reason for
> it not to be a to-do item?

+1

Very useful

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Add primary key/unique constraint using prefix columns of an index

От
Tom Lane
Дата:
Jeff Janes <jeff.janes@gmail.com> writes:
> Now that there are index only scans, there is a use case for having a
> composite index which has the primary key or a unique key as the
> prefix column(s) but with extra columns after that.  Currently you
> would also need another index with exactly the primary/unique key,
> which seems like a waste of storage and maintenance.

> Should there be a way to declare a "unique" index with the unique
> property applying to a prefix of the indexed columns/expression?  And
> having that, a way to turn that prefix into a primary key constraint?

> Of course this is easier said then done, but is there some reason for
> it not to be a to-do item?

Um ... other than it being ugly as sin?  I can't say that I can get
excited about this concept.  It'd be better to work on index-organized
tables, which is really more or less what you're wishing for here.
Duplicating most of a table into an index is always going to be a loser
in the end because of the redundant storage.
        regards, tom lane


Re: Add primary key/unique constraint using prefix columns of an index

От
Robert Haas
Дата:
On Tue, May 22, 2012 at 1:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> Now that there are index only scans, there is a use case for having a
>> composite index which has the primary key or a unique key as the
>> prefix column(s) but with extra columns after that.  Currently you
>> would also need another index with exactly the primary/unique key,
>> which seems like a waste of storage and maintenance.
>
>> Should there be a way to declare a "unique" index with the unique
>> property applying to a prefix of the indexed columns/expression?  And
>> having that, a way to turn that prefix into a primary key constraint?
>
>> Of course this is easier said then done, but is there some reason for
>> it not to be a to-do item?
>
> Um ... other than it being ugly as sin?  I can't say that I can get
> excited about this concept.  It'd be better to work on index-organized
> tables, which is really more or less what you're wishing for here.
> Duplicating most of a table into an index is always going to be a loser
> in the end because of the redundant storage.

An index on pgbench_accounts (aid, abalance) is the same size as an
index on pgbench_accounts (aid), but even if it were larger, there's
no theoretical reason it couldn't have enough utility to justify its
existence.   A bigger problem is that creating such an index turns all
of pgbench's write traffic from HOT updates into non-HOT updates,
which means this is probably only going to be a win if the write
volume is miniscule.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Add primary key/unique constraint using prefix columns of an index

От
Jeff Janes
Дата:
On Tue, May 22, 2012 at 10:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> Now that there are index only scans, there is a use case for having a
>> composite index which has the primary key or a unique key as the
>> prefix column(s) but with extra columns after that.  Currently you
>> would also need another index with exactly the primary/unique key,
>> which seems like a waste of storage and maintenance.
>
>> Should there be a way to declare a "unique" index with the unique
>> property applying to a prefix of the indexed columns/expression?  And
>> having that, a way to turn that prefix into a primary key constraint?
>
>> Of course this is easier said then done, but is there some reason for
>> it not to be a to-do item?
>
> Um ... other than it being ugly as sin?  I can't say that I can get
> excited about this concept.  It'd be better to work on index-organized
> tables, which is really more or less what you're wishing for here.

IOT would probably be a nice feature too, but adding one more
strategically chosen column to an index is quite different from adding
every column into the index.  At least in the general case.

Cheers,

Jeff


Re: Add primary key/unique constraint using prefix columns of an index

От
Vik Reykja
Дата:
On Tue, May 22, 2012 at 1:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 22 May 2012 18:24, Jeff Janes <jeff.janes@gmail.com> wrote:
> Now that there are index only scans, there is a use case for having a
> composite index which has the primary key or a unique key as the
> prefix column(s) but with extra columns after that.  Currently you
> would also need another index with exactly the primary/unique key,
> which seems like a waste of storage and maintenance.
>
> Should there be a way to declare a "unique" index with the unique
> property applying to a prefix of the indexed columns/expression?  And
> having that, a way to turn that prefix into a primary key constraint?
>
> Of course this is easier said then done, but is there some reason for
> it not to be a to-do item?

+1

Very useful


Semi-private note to Simon: isn't this pretty much what I was advocating at the London meetup last month?

Re: Add primary key/unique constraint using prefix columns of an index

От
Simon Riggs
Дата:
On 22 May 2012 19:01, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, May 22, 2012 at 1:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Jeff Janes <jeff.janes@gmail.com> writes:
>>> Now that there are index only scans, there is a use case for having a
>>> composite index which has the primary key or a unique key as the
>>> prefix column(s) but with extra columns after that.  Currently you
>>> would also need another index with exactly the primary/unique key,
>>> which seems like a waste of storage and maintenance.
>>
>>> Should there be a way to declare a "unique" index with the unique
>>> property applying to a prefix of the indexed columns/expression?  And
>>> having that, a way to turn that prefix into a primary key constraint?
>>
>>> Of course this is easier said then done, but is there some reason for
>>> it not to be a to-do item?
>>
>> Um ... other than it being ugly as sin?  I can't say that I can get
>> excited about this concept.  It'd be better to work on index-organized
>> tables, which is really more or less what you're wishing for here.
>> Duplicating most of a table into an index is always going to be a loser
>> in the end because of the redundant storage.
>
> An index on pgbench_accounts (aid, abalance) is the same size as an
> index on pgbench_accounts (aid), but even if it were larger, there's
> no theoretical reason it couldn't have enough utility to justify its
> existence.

Agreed

>  A bigger problem is that creating such an index turns all
> of pgbench's write traffic from HOT updates into non-HOT updates,
> which means this is probably only going to be a win if the write
> volume is miniscule.

Not sure whether you see that as an argument against the proposal.
This argument applies to any index. In particular covered indexes are
specifically encouraged by index only scans, so is not a reason to
avoid implementing the feature as Jeff describes.

The main reason for the feature as described by Jeff is that it avoids
having 2 indexes when only one is required. In 9.2, with index only
scans, Jeff showed elsewhere that we can get an amazing speed up by
having a covered index. However, what Jeff is noticing is that he
needs 2 indexes on the table: 1 PK on (aid) and another index on (aid,
abalance). The first index can be avoided altogether, allowing a good
improvement in cache efficiency and general performance.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Add primary key/unique constraint using prefix columns of an index

От
Simon Riggs
Дата:
On 22 May 2012 18:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> It'd be better to work on index-organized tables

My earlier analysis showed that IOTs are essentially the same thing as
block-level indexes, referred to as GITs by Heikki. (Robert referred
to these as Lossy Indexes recently, which was not the case - that
aspect was exactly the reason for rejection previously, so we should
not retread that path - indexes can operate at block level without
being lossy).

The number of index pointers is identical in each case, so IOTs are
not any more efficient in terms of space usage or I/O.

IOTs are much more difficult to implement, so I can't see any reason
to work on them. For example, having heap rows migrate on a block
split will cause havoc with our index implementation. We haven't
worked out how to re-join blocks that have split while maintaining
concurrency, so IOTs would require some pretty drastic repacking with
a severe lock type. Please lets avoid IOTs.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Add primary key/unique constraint using prefix columns of an index

От
Robert Haas
Дата:
On Tue, May 22, 2012 at 5:26 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>  A bigger problem is that creating such an index turns all
>> of pgbench's write traffic from HOT updates into non-HOT updates,
>> which means this is probably only going to be a win if the write
>> volume is miniscule.
>
> Not sure whether you see that as an argument against the proposal.
> This argument applies to any index. In particular covered indexes are
> specifically encouraged by index only scans, so is not a reason to
> avoid implementing the feature as Jeff describes.

I don't object to the feature, but I think it's real-world utility
will be more limited than we might hope.   When covering indexes are
not in play, someone might choose to index only, say, the primary key.And maybe the primary key doesn't change very
often,so HOT still 
applies to nearly all updates.  But then when they try to make a
covering index, they're now indexing columns that they *do* update
pretty regularly.  Now the gain from index-only scans is fighting with
the loss from some updates no longer being HOT.  I think in many cases
losing the benefit of HOT will cost more than the covering index
gains, at least in 9.2.  I hope we'll be able to improve it in future
releases.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Add primary key/unique constraint using prefix columns of an index

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> I don't object to the feature, but I think it's real-world utility
> will be more limited than we might hope.   When covering indexes are
> not in play, someone might choose to index only, say, the primary key.
>  And maybe the primary key doesn't change very often, so HOT still
> applies to nearly all updates.  But then when they try to make a
> covering index, they're now indexing columns that they *do* update
> pretty regularly.  Now the gain from index-only scans is fighting with
> the loss from some updates no longer being HOT.  I think in many cases
> losing the benefit of HOT will cost more than the covering index
> gains, at least in 9.2.  I hope we'll be able to improve it in future
> releases.

Well, if the table is getting enough update traffic that it matters much
whether your updates are HOT or not, then index-only scans are probably
a dead letter anyhow, because too small a fraction of the pages will be
all-visible.  I think the IOS feature is only good for something on
read-mostly tables.
        regards, tom lane


Re: Add primary key/unique constraint using prefix columns of an index

От
Jeff Janes
Дата:
On Tue, May 22, 2012 at 11:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, May 22, 2012 at 1:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Jeff Janes <jeff.janes@gmail.com> writes:
>>> Now that there are index only scans, there is a use case for having a
>>> composite index which has the primary key or a unique key as the
>>> prefix column(s) but with extra columns after that.  Currently you
>>> would also need another index with exactly the primary/unique key,
>>> which seems like a waste of storage and maintenance.
>>
>>> Should there be a way to declare a "unique" index with the unique
>>> property applying to a prefix of the indexed columns/expression?  And
>>> having that, a way to turn that prefix into a primary key constraint?
>>
>>> Of course this is easier said then done, but is there some reason for
>>> it not to be a to-do item?
>>
>> Um ... other than it being ugly as sin?  I can't say that I can get
>> excited about this concept.  It'd be better to work on index-organized
>> tables, which is really more or less what you're wishing for here.
>> Duplicating most of a table into an index is always going to be a loser
>> in the end because of the redundant storage.
>
> An index on pgbench_accounts (aid, abalance) is the same size as an
> index on pgbench_accounts (aid), but even if it were larger, there's
> no theoretical reason it couldn't have enough utility to justify its
> existence.   A bigger problem is that creating such an index turns all
> of pgbench's write traffic from HOT updates into non-HOT updates,
> which means this is probably only going to be a win if the write
> volume is miniscule.

That seems overly pessimistic to me.  pgbench_accounts only has one
index on it, and that index is already being used to find the row in
the first place, so the relevant leaf block is already in memory.  If
you have a table with 12 indexes on it, then the cost of non-HOT would
be much higher.  But then again, with that number of indexes it is
probably already non-HOT anyway.

Since pgbench does not actually attempt to violate the PK constraint,
I can drop it without altering the behavior of the system.  This
neglects the overhead of checking the "prefix" constraint were that to
be possible, but that overhead should be almost entirely CPU, and so
is negligible to this IO bound workload.

I'm running some tests where I mix the work load of pgbench by doing
"TPC-B (sort of)" transaction mixed in with a variable number of
SELECT-only transactions, at a ratio varying between 1:0 to 1:10.

It is often said that the default pgbench is an unrealistically
write-heavy workload.  So mixing in some SELECT-only is probably only
going to improve its real-world alignment.  In fact I wondering if it
would make sense to add a feature to pgbench to make such admixture
easy to do, rather than the current pain of creating multiple sql
files, specifying a bunch of -f switches in various ratios, and
remembering to always specify the correct -s flag.


From preliminary test it looks like an index on (aid, abalance) wins
at all ratios from 1:1 to 1:10, and at 1:0 it is mostly a toss up.  I
still want to do a few overnight runs to see how the decay in the
visibility map, and perhaps autovacuum kicking in or failing to kick
in, effect things.


Cheers,

Jeff


Re: Add primary key/unique constraint using prefix columns of an index

От
Robert Haas
Дата:
On Wed, May 23, 2012 at 2:00 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Tue, May 22, 2012 at 11:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, May 22, 2012 at 1:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Jeff Janes <jeff.janes@gmail.com> writes:
>>>> Now that there are index only scans, there is a use case for having a
>>>> composite index which has the primary key or a unique key as the
>>>> prefix column(s) but with extra columns after that.  Currently you
>>>> would also need another index with exactly the primary/unique key,
>>>> which seems like a waste of storage and maintenance.
>>>
>>>> Should there be a way to declare a "unique" index with the unique
>>>> property applying to a prefix of the indexed columns/expression?  And
>>>> having that, a way to turn that prefix into a primary key constraint?
>>>
>>>> Of course this is easier said then done, but is there some reason for
>>>> it not to be a to-do item?
>>>
>>> Um ... other than it being ugly as sin?  I can't say that I can get
>>> excited about this concept.  It'd be better to work on index-organized
>>> tables, which is really more or less what you're wishing for here.
>>> Duplicating most of a table into an index is always going to be a loser
>>> in the end because of the redundant storage.
>>
>> An index on pgbench_accounts (aid, abalance) is the same size as an
>> index on pgbench_accounts (aid), but even if it were larger, there's
>> no theoretical reason it couldn't have enough utility to justify its
>> existence.   A bigger problem is that creating such an index turns all
>> of pgbench's write traffic from HOT updates into non-HOT updates,
>> which means this is probably only going to be a win if the write
>> volume is miniscule.
>
> That seems overly pessimistic to me.  pgbench_accounts only has one
> index on it, and that index is already being used to find the row in
> the first place, so the relevant leaf block is already in memory.  If
> you have a table with 12 indexes on it, then the cost of non-HOT would
> be much higher.  But then again, with that number of indexes it is
> probably already non-HOT anyway.
>
> Since pgbench does not actually attempt to violate the PK constraint,
> I can drop it without altering the behavior of the system.  This
> neglects the overhead of checking the "prefix" constraint were that to
> be possible, but that overhead should be almost entirely CPU, and so
> is negligible to this IO bound workload.
>
> I'm running some tests where I mix the work load of pgbench by doing
> "TPC-B (sort of)" transaction mixed in with a variable number of
> SELECT-only transactions, at a ratio varying between 1:0 to 1:10.
>
> It is often said that the default pgbench is an unrealistically
> write-heavy workload.  So mixing in some SELECT-only is probably only
> going to improve its real-world alignment.  In fact I wondering if it
> would make sense to add a feature to pgbench to make such admixture
> easy to do, rather than the current pain of creating multiple sql
> files, specifying a bunch of -f switches in various ratios, and
> remembering to always specify the correct -s flag.
>
> From preliminary test it looks like an index on (aid, abalance) wins
> at all ratios from 1:1 to 1:10, and at 1:0 it is mostly a toss up.  I
> still want to do a few overnight runs to see how the decay in the
> visibility map, and perhaps autovacuum kicking in or failing to kick
> in, effect things.

Interesting!  If that holds up under more careful testing, it would be
a great result.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Add primary key/unique constraint using prefix columns of an index

От
Simon Riggs
Дата:
On 23 May 2012 18:13, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, May 22, 2012 at 5:26 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>  A bigger problem is that creating such an index turns all
>>> of pgbench's write traffic from HOT updates into non-HOT updates,
>>> which means this is probably only going to be a win if the write
>>> volume is miniscule.
>>
>> Not sure whether you see that as an argument against the proposal.
>> This argument applies to any index. In particular covered indexes are
>> specifically encouraged by index only scans, so is not a reason to
>> avoid implementing the feature as Jeff describes.
>
> I don't object to the feature, but I think it's real-world utility
> will be more limited than we might hope.   When covering indexes are
> not in play, someone might choose to index only, say, the primary key.
>  And maybe the primary key doesn't change very often, so HOT still
> applies to nearly all updates.  But then when they try to make a
> covering index, they're now indexing columns that they *do* update
> pretty regularly.  Now the gain from index-only scans is fighting with
> the loss from some updates no longer being HOT.  I think in many cases
> losing the benefit of HOT will cost more than the covering index
> gains, at least in 9.2.  I hope we'll be able to improve it in future
> releases.

Seems like very poor logic to me. Why would adding a column to an
index mean that column was heavily updated?

HOT provides a negative effect if you index certain columns.
IOS provides a positive effect if you add columns to an index, and you
have a high percentage of access to now-readonly data.

The area of effect of those two things are not mutually exclusive, not
even close. Or put another way HOT has got nothing at all to do with
covered indexes.

The proposal allows us to have 1 index instead of 2, with obvious
gains in performance. This only has benefit if IOS are effective. If
we believe them to be effective, and I really hope you believe that,
then this will also have benefit to in all the same cases.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Add primary key/unique constraint using prefix columns of an index

От
Jeff Davis
Дата:
On Tue, 2012-05-22 at 10:24 -0700, Jeff Janes wrote:
> Now that there are index only scans, there is a use case for having a
> composite index which has the primary key or a unique key as the
> prefix column(s) but with extra columns after that.  Currently you
> would also need another index with exactly the primary/unique key,
> which seems like a waste of storage and maintenance.
> 
> Should there be a way to declare a "unique" index with the unique
> property applying to a prefix of the indexed columns/expression?  And
> having that, a way to turn that prefix into a primary key constraint?
> 
> Of course this is easier said then done, but is there some reason for
> it not to be a to-do item?

Technically, this can be done today using exclusion constraints if you
define an indexable operator that always returns true.

A similar idea, which I brought up during 9.0 development, is that it
might be useful to have one index that can enforce two UNIQUE
constraints. For instance, an index on (a,b,c) could be used to enforce
UNIQUE(a,b) and UNIQUE(a,c) using the exclusion constraint mechanism.

I didn't offer a lot of convincing evidence of practical value, so it
was left out. But it might be worth a second look.

Either way, it seems like the deviations from normal UNIQUE would be
better expressed in terms of an exclusion constraint, which offers a
little more room in the language.

Regards,Jeff Davis



Re: Add primary key/unique constraint using prefix columns of an index

От
Jeff Janes
Дата:
On Wed, May 23, 2012 at 11:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, May 23, 2012 at 2:00 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>
>> I'm running some tests where I mix the work load of pgbench by doing
>> "TPC-B (sort of)" transaction mixed in with a variable number of
>> SELECT-only transactions, at a ratio varying between 1:0 to 1:10.
>>
>> It is often said that the default pgbench is an unrealistically
>> write-heavy workload.  So mixing in some SELECT-only is probably only
>> going to improve its real-world alignment.  In fact I wondering if it
>> would make sense to add a feature to pgbench to make such admixture
>> easy to do, rather than the current pain of creating multiple sql
>> files, specifying a bunch of -f switches in various ratios, and
>> remembering to always specify the correct -s flag.
>>
>> From preliminary test it looks like an index on (aid, abalance) wins
>> at all ratios from 1:1 to 1:10, and at 1:0 it is mostly a toss up.  I
>> still want to do a few overnight runs to see how the decay in the
>> visibility map, and perhaps autovacuum kicking in or failing to kick
>> in, effect things.
>
> Interesting!  If that holds up under more careful testing, it would be
> a great result.

I'm using -s2000 with 2GB of RAM (meaning there is little opportunity
for write-combining on pgbench_accounts table or the leaf pages of its
index, as they generally are written to disk before a second update is
expected).

The comparisons are between the normal PK index, which gets HOT
updates, and the (aid, abalance) index which can't be HOT updated but
can benefit from IOS.

The IOS performance is much more variable than the HOT performance,
measured as TPS over 30 second windows.  The oscillations for IOS seem
to be associated with the 5 minute checkpoint interval.  Turning off
FPW doesn't smooth things out any, against my expectations.

On a workload of only TPC-B, HOT just barely wins on average (81.9 to
77.8 TPS, averaged over 50 windows of 30 seconds) but clearly wins on
if you look for the best worst-case or similar (75.2 to 48.3 TPS for
the 10th percentile performance on the same data)

If you mix in one -S transaction for each TPC-B transaction, than the
IOS start to win on average and at the 10th percentile, but still
loses on the absolute worst 30 second window.

At 2 -S per 1 TPC-B, then the IOS wins on all measures.  However, at
finer windows than 30 TPS it very well might still lose the
worst-case--I never recorded individual latencies.

So it is probably kind of a toss-up, depending on how much weight you
put on worst-case latency.  But once you get up to 5 or 10 -S per
TPC-B, then the IOS becomes more clearly a win.

Over longer time frames, the IOS performance degrades as the vm
degrades.  I never ran it long enough for autovacuum to kick in, as
that would take many days.  I also never ran it long enough for it to
degrade enough so the IOS stopped winning at higher -S ratios, it was
just winning by less.

I'm not really sure what to make of this, except it would probably be
good to have a way to set vm bits without having to do a vacuum.

Cheers,

Jeff