Обсуждение: changing multiple pk's in one update

Поиск
Список
Период
Сортировка

changing multiple pk's in one update

От
Stuart McGraw
Дата:
Hello all, 

I have a table with a primary key column
that contains sequential numbers.

Sometimes I need to shift them all up or down
by a fixed amount.  For example, if I have
four rows with primary keys, 2, 3, 4, 5, I 
might want to shift them down by 1 by doing:
 UPDATE mytable SET id=id-1

(where "id" is the pk column) so that the pk's 
are now 1, 2, 3, 4.

When I try to shift them up by using +1 in the
above update statement, I get (not surprisingly)
a duplicate key error.  I also realize that the
-1 case above works only by luck.

So my question:
Is there some way, perhaps with ORDER BY, that
I can achieve the change I want with a single 
update statement?  (If I have an unused key
range large enough, I suppose I could update 
all the keys to that range, and then back to 
my target range but the requires two updates 
(there are a lot of foreign keys referencing 
these primary keys) and requires that I have 
an available range, so a single update statement
would be preferable.)

Thanks for any enlightenment.



Re: changing multiple pk's in one update

От
Jasen Betts
Дата:
On 2009-04-08, Stuart McGraw <smcg2297@frii.com> wrote:
> Hello all, 
>
> I have a table with a primary key column
> that contains sequential numbers.
>
> Sometimes I need to shift them all up or down
> by a fixed amount.  For example, if I have
> four rows with primary keys, 2, 3, 4, 5, I 
> might want to shift them down by 1 by doing:
>
>   UPDATE mytable SET id=id-1
>
> (where "id" is the pk column) so that the pk's 
> are now 1, 2, 3, 4.
>
> When I try to shift them up by using +1 in the
> above update statement, I get (not surprisingly)
> a duplicate key error.  I also realize that the
> -1 case above works only by luck.
>
> So my question:
> Is there some way, perhaps with ORDER BY, that
> I can achieve the change I want with a single 
> update statement?  (If I have an unused key
> range large enough, I suppose I could update 
> all the keys to that range, and then back to 
> my target range but the requires two updates 
> (there are a lot of foreign keys referencing 
> these primary keys) and requires that I have 
> an available range, so a single update statement
> would be preferable.)
>
> Thanks for any enlightenment.

begin a transaction
suspend the constraint (use SET CONSTRAINTS ... DEFERRED)
drop the index 
do the update(s)
recreate the index
commit the transaction.

I see no reason to keep the index (and its associated UNIQUE
constraint) during the update, AFAICT all it does is slow the process
down.



Re: changing multiple pk's in one update

От
Scott Marlowe
Дата:
2009/4/7 Stuart McGraw <smcg2297@frii.com>:
> Hello all,
>
> I have a table with a primary key column
> that contains sequential numbers.
>
> Sometimes I need to shift them all up or down
> by a fixed amount.  For example, if I have
> four rows with primary keys, 2, 3, 4, 5, I
> might want to shift them down by 1 by doing:
>

Generally speaking, when you need to do this more than once or twice
in the lifetime of your data, there's something wrong with your data
model.


Re: changing multiple pk's in one update

От
Stuart McGraw
Дата:
Jasen Betts wrote:
> On 2009-04-08, Stuart McGraw <smcg2297@frii.com> wrote:
>> Hello all, 
>>
>> I have a table with a primary key column
>> that contains sequential numbers.
>>
>> Sometimes I need to shift them all up or down
>> by a fixed amount.  For example, if I have
>> four rows with primary keys, 2, 3, 4, 5, I 
>> might want to shift them down by 1 by doing:
>>
>>   UPDATE mytable SET id=id-1
>>
>> (where "id" is the pk column) so that the pk's 
>> are now 1, 2, 3, 4.
>>
>> When I try to shift them up by using +1 in the
>> above update statement, I get (not surprisingly)
>> a duplicate key error.  I also realize that the
>> -1 case above works only by luck.
>>
>> So my question:
>> Is there some way, perhaps with ORDER BY, that
>> I can achieve the change I want with a single 
>> update statement?  (If I have an unused key
>> range large enough, I suppose I could update 
>> all the keys to that range, and then back to 
>> my target range but the requires two updates 
>> (there are a lot of foreign keys referencing 
>> these primary keys) and requires that I have 
>> an available range, so a single update statement
>> would be preferable.)
>>
>> Thanks for any enlightenment.
> 
> begin a transaction
> suspend the constraint (use SET CONSTRAINTS ... DEFERRED)
> drop the index 
> do the update(s)
> recreate the index
> commit the transaction.
> 
> I see no reason to keep the index (and its associated UNIQUE
> constraint) during the update, AFAICT all it does is slow the process
> down.

Thanks for the suggestion.

Unfortunately I am doing this key renumbering in
an interactive gui app and as there are several million 
rows involved, rebuilding indexes take too long.

I have continued searching for other ways to do this
but most of my google results are floating in a sea
of "update if insert fails, like mysql" results so 
I still have a little hope it is possible.
I thought I remember seeing, a year or two ago, an 
update statement with an ordered subquery that avoided 
duplicate key errors but I am probably misrembering.


Re: changing multiple pk's in one update

От
Richard Broersma
Дата:
On Fri, Apr 10, 2009 at 11:20 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

> Generally speaking, when you need to do this more than once or twice
> in the lifetime of your data, there's something wrong with your data
> model.

True, but there are a few non-traditional data models that would
benefit from this feature.   The two that come to my mind are the
Celko nested set tree model, and sequenced temporal tables.


-- 
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug


Re: changing multiple pk's in one update

От
Stuart McGraw
Дата:
Scott Marlowe wrote:
> 2009/4/7 Stuart McGraw <smcg2297@frii.com>:
>> Hello all,
>>
>> I have a table with a primary key column
>> that contains sequential numbers.
>>
>> Sometimes I need to shift them all up or down
>> by a fixed amount.  For example, if I have
>> four rows with primary keys, 2, 3, 4, 5, I
>> might want to shift them down by 1 by doing:
>>
> 
> Generally speaking, when you need to do this more than once or twice
> in the lifetime of your data, there's something wrong with your data
> model.

Generally speaking, I agree.  But in this case I am not sure
how to do it any better.  Simplified, the database models 
a dictionary.  An entry in the dictionary can have a number
senses where each sense is a sentence that gives the meaning 
of the sense.  The order of the senses is significant.  The 
sense table has three columns, an fk to the dictionary entry 
(id number) it is part of, a sense number (small number from
1 to the number of sentences in the entry) and the sentence 
text.  The pk is of course the entry id and the sense number.
There are other tables that have fk's to the senses.

I could dispense with the sentence number and use the 
sentence text as the second part of the composite key but 
the sentence text is far more volatile than the number, and 
can be very long making it quite unwieldy to use as a key.

I guess I could add an "order"[1] column and use the sense number 
as a surrogate partial key to avoid the need for key renumbering,
but all the api's (and the normal human way of thinking) are based
on "sense number 1 of entry x", "sense number 2 of entry y", so
one would need to maintain "order" as a gapless sequence (or add 
a new mapping layer to map from/to a arbitrary monotonic sequence
to a 1,2,3,... sequence) -- the gain doesn't seem that big.

Or maybe there is some other approach?

[1] yes, I know "order" is a reserved word.




Re: changing multiple pk's in one update

От
Glenn Maynard
Дата:
(JMdict?  I was playing with importing that into a DB a while back,
but the attributes in that XML are such a pain--and then my email died
while I was trying to get those changed, and I never picked it up
again.)

On Mon, Apr 13, 2009 at 1:20 PM, Stuart McGraw <smcg2297@frii.com> wrote:
> 1 to the number of sentences in the entry) and the sentence text.  The pk is
> of course the entry id and the sense number.
> There are other tables that have fk's to the senses.

Your PK is a composite of (entry, order)?  Won't your foreign keys
elsewhere all break when you shift the order around?

> I guess I could add an "order"[1] column and use the sense number as a
> surrogate partial key to avoid the need for key renumbering,
> but all the api's (and the normal human way of thinking) are based
> on "sense number 1 of entry x", "sense number 2 of entry y", so
> one would need to maintain "order" as a gapless sequence (or add a new
> mapping layer to map from/to a arbitrary monotonic sequence
> to a 1,2,3,... sequence) -- the gain doesn't seem that big.

Why not do this in the straightforward way: three separate fields: a
regular, sequential PK; an FK to the entry; and an order number.  Add
an index on (entry_key, order_number).  It's a little more expensive
since you have a new column and index (the PK), but in a table with a
lot of plain text that's probably insignificant.  Now you can use the
plain PK for your FK's.

I'd anticipate other problems with changing your PKs.  If you're
referring to sense (100,3), and a concurrent user adds a sense, you
may suddenly find yourself referring to a different sense due to them
being reordered out from under you.  You'd have similar problems if
your sense is deleted entirely: instead of the row simply ceasing to
exist (and resulting in predictable, checkable errors), you may end up
silently referring to another sense.

Maybe I'm misunderstanding what you're doing, though.

You'd have to have no UNIQUE constraint on the (entry, order)
composite index, though, or you'll have the same problem when you
reorder them.

--
Glenn Maynard


Re: changing multiple pk's in one update

От
Richard Broersma
Дата:
On Mon, Apr 13, 2009 at 2:09 PM, Glenn Maynard <glennfmaynard@gmail.com> wrote:

> Your PK is a composite of (entry, order)?  Won't your foreign keys
> elsewhere all break when you shift the order around?

If there really are foreign keys, then an update will not be allowed
to shift a primary key unless the foreign key is set with ON UPDATE
CASCADE then the shifts will be cascaded to all references
automatically.

--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug


Re: changing multiple pk's in one update

От
Glenn Maynard
Дата:
On Mon, Apr 13, 2009 at 5:18 PM, Richard Broersma
<richard.broersma@gmail.com> wrote:
>> Your PK is a composite of (entry, order)?  Won't your foreign keys
>> elsewhere all break when you shift the order around?
>
> If there really are foreign keys, then an update will not be allowed
> to shift a primary key unless the foreign key is set with ON UPDATE
> CASCADE then the shifts will be cascaded to all references
> automatically.

Right.  I'm still dusting off my SQL after a long period of corrosion.  :)

The same applies, though, if you have external references to the PK;
for example, if you have a web interface where "/sense/10/3" refers to
entry 10, sense 3.  If your senses are reordered by someone else, and
you operate on /10/3, you may suddenly find yourself viewing or
modifying (or deleting!) a different sense.  This could even happen
within the same transaction, if you're not very careful with
locking...

--
Glenn Maynard


Re: changing multiple pk's in one update

От
Richard Broersma
Дата:
On Mon, Apr 13, 2009 at 2:32 PM, Glenn Maynard <glennfmaynard@gmail.com> wrote:

> If your senses are reordered by someone else, and
> you operate on /10/3, you may suddenly find yourself viewing or
> modifying (or deleting!) a different sense.  This could even happen
> within the same transaction, if you're not very careful with
> locking...


From what I've seen, this problem can affect both surrogate and
natural key designs.  In both cases, care must be taken to ensure that
an underling tuple hasn't been changed by any other clients before it
attempts to commit its changed.   Probably the most common solution is
to use optimistic locking, another solution that I know of is to use
serialized transaction isolation.


--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug


Re: changing multiple pk's in one update

От
Glenn Maynard
Дата:
On Mon, Apr 13, 2009 at 5:43 PM, Richard Broersma
<richard.broersma@gmail.com> wrote:
> From what I've seen, this problem can affect both surrogate and
> natural key designs.  In both cases, care must be taken to ensure that
> an underling tuple hasn't been changed by any other clients before it
> attempts to commit its changed.   Probably the most common solution is
> to use optimistic locking, another solution that I know of is to use
> serialized transaction isolation.

Right, but if you have a delayed UI, you probably don't want to hold a
lock open--if the user is viewing "/100/3" and clicks "delete", you
need to make sure that the one you delete is the same /100/3 that the
user was viewing at the time.  That's harder to do...

--
Glenn Maynard


Re: changing multiple pk's in one update

От
Stuart McGraw
Дата:
Glenn Maynard wrote:
> (JMdict?  

yup.  ;-)

> I was playing with importing that into a DB a while back,
> but the attributes in that XML are such a pain--and then my email died
> while I was trying to get those changed, and I never picked it up
> again.)
> 
> On Mon, Apr 13, 2009 at 1:20 PM, Stuart McGraw <smcg2297@frii.com> wrote:
>> 1 to the number of sentences in the entry) and the sentence text.  The pk is
>> of course the entry id and the sense number.
>> There are other tables that have fk's to the senses.
> 
> Your PK is a composite of (entry, order)?  Won't your foreign keys
> elsewhere all break when you shift the order around?

As Richard Broersma pointed out, the fk's are all 
"cascade update".  

>> I guess I could add an "order"[1] column and use the sense number as a
>> surrogate partial key to avoid the need for key renumbering,
>> but all the api's (and the normal human way of thinking) are based
>> on "sense number 1 of entry x", "sense number 2 of entry y", so
>> one would need to maintain "order" as a gapless sequence (or add a new
>> mapping layer to map from/to a arbitrary monotonic sequence
>> to a 1,2,3,... sequence) -- the gain doesn't seem that big.
> 
> Why not do this in the straightforward way: three separate fields: a
> regular, sequential PK; an FK to the entry; and an order number.  Add
> an index on (entry_key, order_number).  It's a little more expensive
> since you have a new column and index (the PK), but in a table with a
> lot of plain text that's probably insignificant.  Now you can use the
> plain PK for your FK's.

I have done a few other simple database apps in the 
past and always did things that way.  The current app 
also started that way but I tried, experimentally, using 
composite keys, and was amazed how much simpler things 
became.  For example, to copy an entry one duplicates 
the entry's rows (in all the related tables) and need
only change the entry id number in the duplicated rows.  
When each child table has a independent serial id, this 
becomes more complicated and requires looping in the 
copy function (AFAICT).  I've noted that a number of 
other things I was doing in the database api and in 
Postgresql functions similarly became simpler.

> I'd anticipate other problems with changing your PKs.  If you're
> referring to sense (100,3), and a concurrent user adds a sense, you
> may suddenly find yourself referring to a different sense due to them
> being reordered out from under you.  You'd have similar problems if
> your sense is deleted entirely: instead of the row simply ceasing to
> exist (and resulting in predictable, checkable errors), you may end up
> silently referring to another sense.

I try to avoid a lot of problems like that by doing very 
little actual updating (in the database sense).  Most 
changes are made to a new copy of an entry.  When an 
update is actually done (such as to delete the original
copy of an entry), it is done in a very short time interval
with serializable isolation.

> Maybe I'm misunderstanding what you're doing, though.
> 
> You'd have to have no UNIQUE constraint on the (entry, order)
> composite index, though, or you'll have the same problem when you
> reorder them.

Since the data in the database is of primary importance,
and the app(s) secondary, I really would like as much
constraint checking in the database as possible. 


Re: changing multiple pk's in one update

От
Steve Midgley
Дата:
>
> Date: Mon, 13 Apr 2009 17:09:49 -0400
> From: Glenn Maynard <glennfmaynard@gmail.com>
> To: pgsql-sql@postgresql.org
> Subject: Re: changing multiple pk's in one update
> Message-ID: <d18085b50904131409g10d43d6cs35dd14ede13bc02@mail.gmail.com>
>
> (JMdict?  I was playing with importing that into a DB a while back,
> but the attributes in that XML are such a pain--and then my email died
> while I was trying to get those changed, and I never picked it up
> again.)
>
> On Mon, Apr 13, 2009 at 1:20 PM, Stuart McGraw <smcg2297@frii.com> wrote:
>   
>> > 1 to the number of sentences in the entry) and the sentence text. Â The pk is
>> > of course the entry id and the sense number.
>> > There are other tables that have fk's to the senses.
>>     
>
> Your PK is a composite of (entry, order)?  Won't your foreign keys
> elsewhere all break when you shift the order around?
>
>   
>> > I guess I could add an "order"[1] column and use the sense number as a
>> > surrogate partial key to avoid the need for key renumbering,
>> > but all the api's (and the normal human way of thinking) are based
>> > on "sense number 1 of entry x", "sense number 2 of entry y", so
>> > one would need to maintain "order" as a gapless sequence (or add a new
>> > mapping layer to map from/to a arbitrary monotonic sequence
>> > to a 1,2,3,... sequence) -- the gain doesn't seem that big.
>>     
>
> Why not do this in the straightforward way: three separate fields: a
> regular, sequential PK; an FK to the entry; and an order number.  Add
> an index on (entry_key, order_number).  It's a little more expensive
> since you have a new column and index (the PK), but in a table with a
> lot of plain text that's probably insignificant.  Now you can use the
> plain PK for your FK's.
>
>   
I'd agree with this approach. I have a number of tables which are 
sensitive to arbitrary ordering and they sound roughly similar to your 
use-case (though my tables are probably smaller).

My approach is to create a string column in the table which permits 
defining arbitrary ordering. I use a string field b/c it's easier for me 
to stuff (by hand) new ordered records in between other existing 
records. But an integer would work just as well, so long as you make 
sure you keep enough space between the integers (e.g. 1000, 2000, 3000).

Also, if your ordered list is truly "ordinal" (each record is either 
1st, 2nd, 3rd, etc in a single list) you could just use 1,2,3,4 for the 
ordering, but then you have to mess with two records in order to swap 
the positions of (say) item 2 and 3. Of course you can do this pretty 
easily inside a transaction, and you don't have to worry about the mess 
of moving PK's.

Steve




Re: changing multiple pk's in one update

От
Jasen Betts
Дата:
On 2009-04-13, Stuart McGraw <smcg2297@frii.com> wrote:
> Jasen Betts wrote:

>> I see no reason to keep the index (and its associated UNIQUE
>> constraint) during the update, AFAICT all it does is slow the process
>> down.
>
> Thanks for the suggestion.
>
> Unfortunately I am doing this key renumbering in
> an interactive gui app and as there are several million 
> rows involved, rebuilding indexes take too long.

the update takes a long time too if it's updating all the rows.
and updating the index piecewise at the same time.
with the index extant I get from 20 (if the start and end ranges don't overlap) and 28s (with ,
to 28 seconds (maximum overlap) for a table with 1000000 (short) rows

it takes 18 seconds if I first drop the index, then update, then 
restore the index.

so, about the same amount of time or slightly faster by dropping the
index.

if wrap them in a transaction it takes 30s each way

> I have continued searching for other ways to do this
> but most of my google results are floating in a sea
> of "update if insert fails, like mysql" results so 
> I still have a little hope it is possible.
> I thought I remember seeing, a year or two ago, an 
> update statement with an ordered subquery that avoided 
> duplicate key errors but I am probably misrembering.

maybe you can do it using a cursor? I've not looked at them yet.



Re: changing multiple pk's in one update

От
Glenn Maynard
Дата:
On Wed, Apr 15, 2009 at 8:43 AM, Jasen Betts <jasen@xnet.co.nz> wrote:
> the update takes a long time too if it's updating all the rows.
> and updating the index piecewise at the same time.
> with the index extant I get from 20 (if the start and end ranges don't overlap) and 28s (with ,
> to 28 seconds (maximum overlap) for a table with 1000000 (short) rows

He's using a compound key as his PK, eg. (entry, order), where there
are only a few order values for each entry, so I think the real case
is only updating order for a specific set of entries.

>> I have continued searching for other ways to do this
>> but most of my google results are floating in a sea
>> of "update if insert fails, like mysql" results so
>> I still have a little hope it is possible.
>> I thought I remember seeing, a year or two ago, an
>> update statement with an ordered subquery that avoided
>> duplicate key errors but I am probably misrembering.

Bear in mind that the update is failing based on the order the data is
in the table, not the PK order.

create table test (id integer primary key);
insert into test (id) values (2), (1), (3);
update test set id=id+1;
ERROR:  duplicate key value violates unique constraint "test_pkey"
update test set id=id-1;
ERROR:  duplicate key value violates unique constraint "test_pkey"

Both fail, because it tries to update 2 first.

I suppose in a real pinch, you could renumber in two steps.  For
example, if you wanted to delete id 6 and move everything else down:

insert into test (id) values (2), (7), (3), (1), (4), (5), (6), (8), (9);
begin;
set transaction isolation level serializable;
delete from test where id=6;
update test set id = id+1000000 where id >= 6;
update test set id = id-1000001 where id >= 6;
commit;

Not very nice, but if "id" is really a sequence number starting at 1
in your case and not an always-increasing generated regular serial (so
there's no chance of it actually reaching the arbitrarily large number
1000000), it should work.  (It'd probably be workable for real
serials, too, with a much larger offset.)

If someone else creates a new sense for that entry after the first
update, it'll sit on the order number you were about to use and the
operation will fail.  Serialize so nobody else will insert until
you're done.

-- 
Glenn Maynard