Обсуждение: Surrogate VS natural keys
OK so which is the "correct" way to do it? E.g., Say I have a table with users, and a table with clubs, and a table that links them. Each user can be in more than one club and each club has more than one member. Standard M:M relationship. Which link table is the "right" way to do it? This: CREATE TABLE ( userid INTEGER NOT NULL REFERENCES users, clubid INTEGER NOT NULL REFERENCES clubs, PRIMARY KEY (userid, clubid) ); Or this: CREATE TABLE ( id SERIAL PRIMARY KEY, userid INTEGER NOT NULL REFERENCES users, clubid INTEGER NOT NULL REFERENCES clubs ); I've always favored natural keys (the first option) as it just seems to me a more natural and semantic representation of the data, however I often get app designers complaining about it being more complex or something. Comments?
Naz Gassiep wrote: > OK so which is the "correct" way to do it? > > E.g., Say I have a table with users, and a table with clubs, and a table > that links them. Each user can be in more than one club and each club > has more than one member. Standard M:M relationship. Which link table is > the "right" way to do it? > > This: > > CREATE TABLE ( > userid INTEGER NOT NULL REFERENCES users, > clubid INTEGER NOT NULL REFERENCES clubs, > PRIMARY KEY (userid, clubid) > ); > > Or this: > > CREATE TABLE ( > id SERIAL PRIMARY KEY, > userid INTEGER NOT NULL REFERENCES users, > clubid INTEGER NOT NULL REFERENCES clubs > ); > The former uses a primary key across both columns to enforce a unique constraint. In the latter, you have a seperate ID column, which does not enforce that constraint. And you have to ask yourself if you'll ever be referencing that ID column for anything at all. I doubt i ever would. Generally, you'd be using this to relate rows from a more generalised table using either the club ID or the user ID. I can't see how having a seperate serial ID column would be useful for any kind of select. brian
On Wed, 20 Jun 2007, brian wrote: > The former uses a primary key across both columns to enforce a unique > constraint. In the latter, you have a seperate ID column, which does not > enforce that constraint. And you have to ask yourself if you'll ever be > referencing that ID column for anything at all. I doubt i ever would. > Generally, you'd be using this to relate rows from a more generalised > table using either the club ID or the user ID. I can't see how having a > seperate serial ID column would be useful for any kind of select. Also, the reason for a third, M-M, table is to relate multiple players and multiple clubs. If you think of the logic involved, your third table has only one row for each player-club combination. Therefore, each row is unique by definition and a surrogate key adds no value. Rich -- Richard B. Shepard, Ph.D. | The Environmental Permitting Applied Ecosystem Services, Inc. | Accelerator(TM) <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
On Wed, Jun 20, 2007 at 08:39:23AM -0700, Rich Shepard wrote: > Also, the reason for a third, M-M, table is to relate multiple players and > multiple clubs. If you think of the logic involved, your third table has > only one row for each player-club combination. Therefore, each row is unique > by definition and a surrogate key adds no value. While true in this simple case, it can quickly become more complicated if your relationship starts gaining attributes. For example, if you add start and stop dates, so the (player,club) combination is not unique anymore. If you track invoices, games or scores it may be easier to reference the relatioship via a surrogate key rather than copying the other IDs around everywhere. For simple tables like this I generally don't bother, but sometimes I find myself adding a surrogate key later. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Вложения
Martijn van Oosterhout wrote: > On Wed, Jun 20, 2007 at 08:39:23AM -0700, Rich Shepard wrote: >> Also, the reason for a third, M-M, table is to relate multiple players and >> multiple clubs. If you think of the logic involved, your third table has >> only one row for each player-club combination. Therefore, each row is unique >> by definition and a surrogate key adds no value. > > While true in this simple case, it can quickly become more complicated > if your relationship starts gaining attributes. For example, if you add > start and stop dates, so the (player,club) combination is not unique > anymore. If you track invoices, games or scores it may be easier to > reference the relatioship via a surrogate key rather than copying the > other IDs around everywhere. > > For simple tables like this I generally don't bother, but sometimes I > find myself adding a surrogate key later. The value of a surrogate key is easy retrieval and really has nothing to do with normalization or proper modeling. I often add a surrogate key, even when one is not required just so I don't have to worry about have a 4 element where clause. Joshua D. Drake > > Have a nice day, -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
On Wed, 20 Jun 2007, Martijn van Oosterhout wrote: > While true in this simple case, it can quickly become more complicated if > your relationship starts gaining attributes. For example, if you add start > and stop dates, so the (player,club) combination is not unique anymore. If > you track invoices, games or scores it may be easier to reference the > relatioship via a surrogate key rather than copying the other IDs around > everywhere. That's very true, Martijn. It did not seem to be the case in the original post. When you add a time history and need to identify a player's team at a specified point in time, it gets much more complicated. That's when reading Rick Snodgrass' book helps a lot. Rich -- Richard B. Shepard, Ph.D. | The Environmental Permitting Applied Ecosystem Services, Inc. | Accelerator(TM) <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
--- "Joshua D. Drake" <jd@commandprompt.com> wrote: > The value of a surrogate key is easy retrieval and really has nothing to > do with normalization or proper modeling. > > I often add a surrogate key, even when one is not required just so I > don't have to worry about have a 4 element where clause. I've often wondered about this. Since PostgreSQL allows FOREIGN KEYS to be referenced from UNIQUE (non-primary) natural keys, couldn't the schema be designed so that every table has a surrogate PRIMARY KEY and yet still maintain the relationships using UNIQUE natural keys. Would a design like this be practical? Regards, Richard Broersma Jr.
Richard Broersma Jr wrote: > --- "Joshua D. Drake" <jd@commandprompt.com> wrote: >> The value of a surrogate key is easy retrieval and really has nothing to >> do with normalization or proper modeling. >> >> I often add a surrogate key, even when one is not required just so I >> don't have to worry about have a 4 element where clause. > > > I've often wondered about this. Since PostgreSQL allows FOREIGN KEYS to be referenced from UNIQUE > (non-primary) natural keys, couldn't the schema be designed so that every table has a surrogate > PRIMARY KEY and yet still maintain the relationships using UNIQUE natural keys. > > Would a design like this be practical? I would do it the other way. Have your primary keys be natural. Joshua D. Drake > > Regards, > Richard Broersma Jr. > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
On Jun 20, 2007, at 11:28 , Joshua D. Drake wrote: > Richard Broersma Jr wrote: > >> I've often wondered about this. Since PostgreSQL allows FOREIGN >> KEYS to be referenced from UNIQUE >> (non-primary) natural keys, couldn't the schema be designed so >> that every table has a surrogate >> PRIMARY KEY and yet still maintain the relationships using UNIQUE >> natural keys. >> Would a design like this be practical? > > I would do it the other way. Have your primary keys be natural. The albeit small advantage of using PRIMARY KEY on your surrogate if you're using the surrogate for foreign key constraints is that you can leave off the column name when using REFERENCES: it'll default to the PRIMARY KEY column(s). For example: CREATE TABLE foos ( foo_id SERIAL PRIMARY KEY , foo TEXT NOT NULL , bal TEXT NOT NULL, UNIQUE (foo, bal) ); CREATE TABLE bars ( bar_id SERIAL PRIMARY KEY , bar TEXT NOT NULL , foo_id INTEGER NOT NULL REFERENCES foos ); \d bars Table "public.bars" Column | Type | Modifiers --------+--------- +------------------------------------------------------- bar_id | integer | not null default nextval('bars_bar_id_seq'::regclass) bar | text | not null foo_id | integer | not null Indexes: "bars_pkey" PRIMARY KEY, btree (bar_id) Foreign-key constraints: "bars_foo_id_fkey" FOREIGN KEY (foo_id) REFERENCES foos(foo_id) It does work for multi-column primary keys as well: CREATE TABLE baz_quuxen ( baz TEXT NOT NULL , quux TEXT NOT NULL , PRIMARY KEY (baz, quux) ); CREATE TABLE blurfls ( blurfl TEXT PRIMARY KEY , baz TEXT NOT NULL , quux TEXT NOT NULL , FOREIGN KEY (baz, quux) REFERENCES baz_quuxen ); \d blurfls Table "public.blurfls" Column | Type | Modifiers --------+------+----------- blurfl | text | not null baz | text | not null quux | text | not null Indexes: "blurfls_pkey" PRIMARY KEY, btree (blurfl) Foreign-key constraints: "blurfls_baz_fkey" FOREIGN KEY (baz, quux) REFERENCES baz_quuxen (baz, quux) Having PRIMARY KEY on your natural key does provide some additional documentation. Michael Glaesemann grzm seespotcode net
Michael Glaesemann wrote: > > On Jun 20, 2007, at 11:28 , Joshua D. Drake wrote: > >> Richard Broersma Jr wrote: >> >>> I've often wondered about this. Since PostgreSQL allows FOREIGN KEYS >>> to be referenced from UNIQUE >>> (non-primary) natural keys, couldn't the schema be designed so that >>> every table has a surrogate >>> PRIMARY KEY and yet still maintain the relationships using UNIQUE >>> natural keys. >>> Would a design like this be practical? >> >> I would do it the other way. Have your primary keys be natural. > > The albeit small advantage of using PRIMARY KEY on your surrogate if > you're using the surrogate for foreign key constraints is that you can > leave off the column name when using REFERENCES: it'll default to the > PRIMARY KEY column(s). For example: Sure but for the sake of doing normalization correctly ;) a primary key should be natural. Joshua D. Drake > > CREATE TABLE foos > ( > foo_id SERIAL PRIMARY KEY > , foo TEXT NOT NULL > , bal TEXT NOT NULL, UNIQUE (foo, bal) > ); > > CREATE TABLE bars > ( > bar_id SERIAL PRIMARY KEY > , bar TEXT NOT NULL > , foo_id INTEGER NOT NULL > REFERENCES foos > ); > > \d bars > Table "public.bars" > Column | Type | Modifiers > --------+---------+------------------------------------------------------- > bar_id | integer | not null default nextval('bars_bar_id_seq'::regclass) > bar | text | not null > foo_id | integer | not null > Indexes: > "bars_pkey" PRIMARY KEY, btree (bar_id) > Foreign-key constraints: > "bars_foo_id_fkey" FOREIGN KEY (foo_id) REFERENCES foos(foo_id) > > It does work for multi-column primary keys as well: > > CREATE TABLE baz_quuxen > ( > baz TEXT NOT NULL > , quux TEXT NOT NULL > , PRIMARY KEY (baz, quux) > ); > > CREATE TABLE blurfls > ( > blurfl TEXT PRIMARY KEY > , baz TEXT NOT NULL > , quux TEXT NOT NULL > , FOREIGN KEY (baz, quux) REFERENCES baz_quuxen > ); > > \d blurfls > Table "public.blurfls" > Column | Type | Modifiers > --------+------+----------- > blurfl | text | not null > baz | text | not null > quux | text | not null > Indexes: > "blurfls_pkey" PRIMARY KEY, btree (blurfl) > Foreign-key constraints: > "blurfls_baz_fkey" FOREIGN KEY (baz, quux) REFERENCES > baz_quuxen(baz, quux) > > Having PRIMARY KEY on your natural key does provide some additional > documentation. > > Michael Glaesemann > grzm seespotcode net > > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
On 6/20/07, Richard Broersma Jr <rabroersma@yahoo.com> wrote: > > --- "Joshua D. Drake" <jd@commandprompt.com> wrote: > > The value of a surrogate key is easy retrieval and really has nothing to > > do with normalization or proper modeling. > > > > I often add a surrogate key, even when one is not required just so I > > don't have to worry about have a 4 element where clause. > > > I've often wondered about this. Since PostgreSQL allows FOREIGN KEYS to be referenced from UNIQUE > (non-primary) natural keys, couldn't the schema be designed so that every table has a surrogate > PRIMARY KEY and yet still maintain the relationships using UNIQUE natural keys. > > Would a design like this be practical? yeah, although I prefer to throw the primary key on the natural. Either way, the natural key is identified...my major issue with the surrogate design style is that the natural key is often not identified which inevitably leads to a mess. I also find databases with natural keys to be much easier to follow and feel much 'cleaner' to me. People who've never seen a large database without surrogates will be amazed at how much more expressive the tables are. Surrogates have certain advantages but I classify them as an optimization, meaning they should be introduced at the last possible moment in the design. merlin