Discussion: Better alternative for Primary Key than serial??
Hi - I'm new to Postgres, but have used MSSQL for about a year. I'm going through the documentation, but after reading about serials I have a lot of worries about keeping referential integrity in place, among other things. Specifically, here are a few scenarios:

a.)
CREATE TABLE TestTable (
    TestID SERIAL NOT NULL PRIMARY KEY,
    TestData varchar(20) NOT NULL
);

INSERT INTO TestTable(TestData) VALUES ('Data1');
INSERT INTO TestTable(TestData) VALUES ('Data2');
INSERT INTO TestTable(TestData) VALUES ('Data3');

UPDATE TestTable SET TestID = 10 WHERE TestID = 1;

Ok, red flag for me right here. The above works just fine. Granted, if another table referenced the row with TestID = 1, it would violate the foreign key constraint and error out. However, with the use of SERIAL, this is going to run into another problem down the road, right?? Demonstrated here:

b.)
CREATE TABLE TestTable2 (
    TestID SERIAL NOT NULL PRIMARY KEY,
    TestData varchar(20) NOT NULL
);

INSERT INTO TestTable2(TestID, TestData) VALUES (1, 'DataData');

INSERT INTO TestTable2(TestData) VALUES ('NextData');
-- duplicate key violation occurs

INSERT INTO TestTable2(TestData) VALUES ('NextData');
-- works fine

To spell out what happens: the next number from the serial's sequence is 1, but that number was already explicitly entered. The next call works, because the following sequence number is 2. Ideally, the first insert would -never- happen and TestID would never be explicitly given a value, but if it were, it's a guaranteed error some ways down the road.

Does stuff like this cause any aches and pains to developers out there, or do I just need to get into a new mindset??? Also, is there a way to be sure the primary key is *ONLY* ever given a value by the serial, and not subject to updates???

Thanks,

--
View this message in context: http://www.nabble.com/Better-alternative-for-Primary-Key-then-serial---tp14289409p14289409.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On Tuesday 11 December 2007 9:42 pm, pilzner wrote:
> Hi - I'm new to Postgres, but have used MSSQL for about a year. I'm going
> through the documentation, but after reading about serials I have a lot of
> worries about keeping referential integrity in place and other things.
> Specifically, here are a few scenarios:
>
> a.)
> CREATE TABLE TestTable (
>     TestID SERIAL NOT NULL PRIMARY KEY,
>     TestData varchar(20) NOT NULL
> );
>
> INSERT INTO TestTable(TestData) VALUES ('Data1');
> INSERT INTO TestTable(TestData) VALUES ('Data2');
> INSERT INTO TestTable(TestData) VALUES ('Data3');
>
> UPDATE TestTable SET TestID = 10 WHERE TestID = 1;
>
> Ok, red flag for me right here. The above works just fine. Granted, if
> another table referenced the row w/ TestID = 1, it should violate foreign
> key constraints and error out. However, with the use of serial, this is
> going to run into another problem, down the road right?? Demonstrated here:
>
> b.)
> CREATE TABLE TestTable2 (
>     TestID SERIAL NOT NULL PRIMARY KEY,
>     TestData varchar(20) NOT NULL
> );
>
> INSERT INTO TestTable2(TestID, TestData) VALUES (1, 'DataData');
>
> INSERT INTO TestTable2(TestData) VALUES ('NextData');
> -- duplicate key violation occurs
>
> INSERT INTO TestTable2(TestData) VALUES ('NextData');
> -- works fine
>
> To phrase what happens, the next number from serial is '1', but that number
> was already explicitly entered. The next call works, because the next
> serial number is '2'. Ideally, the first insert would -never- happen and
> TestID wouldn't ever be explicitly given a value, but if it were, it's a
> guaranteed error down the road some ways.
>
> Does stuff like this cause any aches and pains to developers out there, or
> do I just need to get in a new mindset??? Also, is there a way to be sure
> the primary key is *ONLY* ever given a value by serial, and not subject to
> updates???
> Thanks,

Each table that has a SERIAL column gets its own sequence, so there will be no conflict between tables. That case would only arise if you assigned the same sequence to multiple tables using DEFAULT nextval('some_sequence') and mixed manual updating of the sequence with auto updating.

By default a sequence always increments forward, so you will have a fresh number for the next request. This means a sequence can have holes, since it increments even if a transaction fails.

You can create a duplicate key violation within a single table by manually entering a SERIAL id that was already generated. This applies to any PRIMARY KEY and is sort of the point. The best thing to do is let the SERIAL's sequence work on its own. If you want to deal with sequences directly, take a look at:
http://www.postgresql.org/docs/8.2/interactive/functions-sequence.html

As to preventing updates, you have a couple of choices:
1) Do not let that field be changed by the user. I usually either hid the field or prevented data entry on it.
2) Create an ON UPDATE trigger that does what you want with the field.

--
Adrian Klaver
aklaver@comcast.net
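[Editor's note: a minimal sketch of repairing the b.) scenario with the sequence functions linked above. Table and column names are the original poster's; pg_get_serial_sequence() looks up the sequence that SERIAL created behind the scenes, and setval() moves it past any hand-entered ids. This assumes a live PostgreSQL connection.]

```sql
-- Someone hand-entered an id, leaving the sequence behind:
INSERT INTO TestTable2(TestID, TestData) VALUES (1, 'DataData');

-- Bump the sequence to the largest existing id so the next
-- nextval() cannot collide (column name is given folded to
-- lower case, as pg_get_serial_sequence expects):
SELECT setval(pg_get_serial_sequence('TestTable2', 'testid'),
              (SELECT max(TestID) FROM TestTable2));

-- Now plain inserts work again; this row gets TestID = 2:
INSERT INTO TestTable2(TestData) VALUES ('NextData');
```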
pilzner wrote:
> Does stuff like this cause any aches and pains to developers out there, or
> do I just need to get in a new mindset??? Also, is there a way to be sure
> the primary key is *ONLY* ever given a value by serial, and not subject to
> updates???

It doesn't. Just do not update the ID -- what use do you have for that anyway? If you want to prevent it, you can put a trigger on the column, but IMHO it would be a waste of your time and machine resources.

--
Alvaro Herrera                 http://www.flickr.com/photos/alvherre/
"Living and ceasing to live are imaginary solutions.
Existence is elsewhere" (André Breton)
Alvaro Herrera-3 wrote:
> Just do not update the ID -- what use do you have for that
> anyway? If you want to prevent it, you can put a trigger to the column,
> but IMHO it would be a waste of your time and machine resources.

I have absolutely no use for updating the ID. I'm not sure why anyone ever would, and I guess I was a little shocked to find that Postgres even allows it. In MSSQL, an identity() is used instead of a serial, and once in place that sucker is pretty much set in stone without a little know-how (it won't happen by accident). I'm definitely not here for a "my way is better because that's what I'm familiar with" discussion, just to get a feel for why it's done that way, whether I'm doing anything wrong, and whether there is an accepted way to lock it down.
pilzner wrote:
> Alvaro Herrera-3 wrote:
> > Just do not update the ID -- what use do you have for that
> > anyway? If you want to prevent it, you can put a trigger to the column,
> > but IMHO it would be a waste of your time and machine resources.
>
> I have absolutely no use for updating the ID. I'm not sure why anyone ever
> would, and I guess I was a little shocked to find that Postgres even allows
> it.

Oh, I see. Fact is, Postgres defines it like a plain integer column and attaches a DEFAULT clause. That's it. There's no magic to prevent it from being modified.

The SQL standard specifies a strange beast called GENERATED BY, which has all sorts of funny behaviors, one of which is what you describe. There have been attempts at implementing the semantics described by the spec, but they are so contorted that so far there hasn't been definitive success. It is expected that in the next release (8.4) something will get done about it.

--
Alvaro Herrera                 http://www.advogato.org/person/alvherre
"How strange it is to find the words "Perl" and "saner" in such close
proximity, with no apparent sense of irony. I doubt that Larry himself
could have managed it." (ncm, http://lwn.net/Articles/174769/)
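[Editor's note: the SQL-standard identity columns Alvaro describes did eventually land in PostgreSQL (in release 10, not 8.4). A sketch of the semantics, for readers finding this thread later; the syntax below was not available in the releases current when this was written.]

```sql
-- A standard identity column in place of SERIAL:
CREATE TABLE TestTable3 (
    TestID   integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    TestData varchar(20) NOT NULL
);

-- A plain insert draws from the column's internal sequence:
INSERT INTO TestTable3(TestData) VALUES ('Data1');

-- An explicit value is rejected with an error, which is exactly
-- the lock-down the original poster asked for:
INSERT INTO TestTable3(TestID, TestData) VALUES (1, 'DataData');

-- You can still override, but only by saying so explicitly:
INSERT INTO TestTable3(TestID, TestData)
    OVERRIDING SYSTEM VALUE VALUES (99, 'Forced');
```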
On Wednesday 12 December 2007 03:42:55, pilzner wrote:
> Does stuff like this cause any aches and pains to developers out there, or
> do I just need to get in a new mindset??? Also, is there a way to be sure
> the primary key is *ONLY* ever given a value by serial, and not subject to
> updates???

Shouldn't the definition of a primary key be an immutable thing that is unique to the row? If you change it, then it is not immutable anymore...

You can also have triggers to prevent PK updates... but I wouldn't go that route. Why do you need to update the PKs of your data?

--
Jorge Godoy <jgodoy@gmail.com>
On Dec 13, 2007, at 10:19, Jorge Godoy wrote:
> On Wednesday 12 December 2007 03:42:55, pilzner wrote:
>> Does stuff like this cause any aches and pains to developers out there, or
>> do I just need to get in a new mindset??? Also, is there a way to be sure
>> the primary key is *ONLY* ever given a value by serial, and not subject to
>> updates???
>
> Shouldn't the definition of a primary key be an immutable thing that is
> unique to the row? If you change it, then it is not immutable anymore...

Why? If you're worried about foreign keys pointing to them, you can either define them to follow the change (ON UPDATE CASCADE) or to throw an integrity violation error (the default behaviour).

Now I realise this isn't particularly useful for surrogate (primary) keys, but it sure is for natural keys. They're both keys; is there a reason to handle surrogate keys differently from natural keys?

The problem the OP is pointing out seems difficult to solve. A sequence doesn't know about existing records with a possibly higher number than the sequence is at.

This may be worked around by keeping a list of numbers used up beyond the current sequence value, so the sequence knows what numbers to skip, but that has problems of its own (if there are many such numbers, or if the sequence gets created after data has been added to the list). It gets ugly.

The convention with sequences is that if you use a sequence on a column (beyond defining one), you don't insert records with hand-coded values for that column (unless you're sure you're using an existing gap below the sequence's current value).

Regards,
--
Alban Hertroys

Sometimes you wake up thinking: "Galileo was right, the world does turn"
Alban Hertroys wrote:
> The problem the OP is pointing out seems difficult to solve. A
> sequence doesn't know about existing records with a possibly higher
> number than the sequence is at.
>
> This may be worked around by keeping a list of numbers used up
> beyond the current sequence value so the sequence knows what
> numbers to skip, but that has problems of its own (if there are
> many such numbers, or if the sequence gets created after data has
> been added to the list). It gets ugly.

I would just have a trigger that sets the serial to NEW.id + 1. Dunno if this has concurrency issues, though, and it may leave huge gaps in the key space and (more importantly) use up your sequence too quickly.

I have, in fact, had situations where I wanted a serial PK =and= I needed to insert with external IDs sometimes - essentially a mix of natural and surrogate keys (shudder). It turned out that the natural keys were always positive, so I set up the sequence to range =downward= from 0.

- John D. Burger
  MITRE
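[Editor's note: a sketch of the downward-ranging sequence John describes; the names here are hypothetical. Natural keys occupy the positive integers, surrogate keys are handed out from -1 downward, so the two ranges can never collide.]

```sql
-- For a descending sequence, START and MAXVALUE of -1 keep all
-- generated values strictly negative:
CREATE SEQUENCE mixed_ids_seq
    INCREMENT BY -1
    START WITH -1
    MAXVALUE -1;

CREATE TABLE mixed_ids (
    id   bigint PRIMARY KEY DEFAULT nextval('mixed_ids_seq'),
    data text NOT NULL
);

INSERT INTO mixed_ids(data) VALUES ('surrogate');       -- gets id -1
INSERT INTO mixed_ids(id, data) VALUES (42, 'natural'); -- positive, no clash
```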
On Wed, Dec 12, 2007 at 12:28:37PM -0800, pilzner wrote:
> thats what I'm familiar with" discussion, just to get a feel of why its done
> that way, if I'm doing anything wrong, or if there is an accepted way to
> lock it down.

It'd be easy to lock down with a trigger that RAISEs an ERROR in case OLD.id != NEW.id.

A
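[Editor's note: a minimal sketch of that trigger against the OP's TestTable; function and trigger names are made up for illustration. It uses a plain BEFORE UPDATE row trigger in PL/pgSQL, so any attempt to change the key is rejected.]

```sql
CREATE FUNCTION forbid_id_change() RETURNS trigger AS $$
BEGIN
    -- Reject the update if the key column would change:
    IF NEW.TestID <> OLD.TestID THEN
        RAISE EXCEPTION 'TestID is immutable (tried to change % to %)',
            OLD.TestID, NEW.TestID;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER testtable_lock_id
    BEFORE UPDATE ON TestTable
    FOR EACH ROW EXECUTE PROCEDURE forbid_id_change();

-- The OP's original red-flag statement now fails:
-- UPDATE TestTable SET TestID = 10 WHERE TestID = 1;
```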
On Dec 13, 2007, at 14:12, John D. Burger wrote:
> Alban Hertroys wrote:
>> The problem the OP is pointing out seems difficult to solve. A
>> sequence doesn't know about existing records with a possibly
>> higher number than the sequence is at.
>
> I would just have a trigger that sets the serial to NEW.id + 1.
> Dunno if this has concurrency issues, though, and it may leave huge
> gaps in the key space, and (more importantly) use up your sequence
> too quickly.

That thought had crossed my mind, but I rejected it. Besides those issues, you'd need to add that trigger to every table that potentially has this "issue", and it does indeed have concurrency issues. You're also in for some fun if someone decides to use the last possible value of the serial field for a special case (although the sequence could be defined to end before that value, of course): that causes immediate wraparound of the sequence (provided it's allowed to rotate), and thus duplicate key violations as soon as the sequence reaches the first record in the table.

I was looking for a general solution that would work as automatically as sequences already do. Not that it's a big issue anyway; you just have to keep in mind that it works like it does. Just to prevent any mistakes: I am not requesting a change in behaviour, I'm fine with how it works (and has worked for ages in many database servers). It is an interesting puzzle, though ;)

> I have, in fact, had situations where I wanted a serial PK, =and= I
> needed to insert with external IDs sometimes - essentially a mix of
> natural and surrogate keys (shudder).
> It turned out that the
> natural keys were always positive, so I set up the sequence to
> range =downward= from 0.

That's a common solution to the problem, although it leaves open the possibility that someone cleverly hand-enters negative integers, exactly the values meant to prevent this problem. And of course you can no longer rely on sorting by the index to get your data more or less in insertion order.

--
Alban Hertroys

"If you lose your memory, you can't remember where you left it."