Обсуждение: Re: Surrogate keys (Was: enums)

Поиск

Список

Период

Сортировка

Re: Surrogate keys (Was: enums)

От

"Dann Corbit"

Дата:

18 января 2006 г., 20:03:53

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Josh Berkus
> Sent: Wednesday, January 18, 2006 3:59 PM
> To: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Surrogate keys (Was: enums)
>
> Martjin,
>
> > Interesting. However, in my experience very few things have "natural
> > keys". There are no combination of attributes for people, phone
calls
> > or even real events that make useful natural keys.
>
> I certainly hope that I never have to pick up one of your projects.
A
> table without a natural key is a data management disaster.   Without a
> key, it's not data, it's garbage.

I have a different opinion.

The data should absolutely never use a natural key as a primary key.

The data should use something like a sequence for the primary key.

Examples:
SSN --> believe it or not, SSN's sometimes change.

First, Middle, Last names --> Not really unique

Street Address --> More than one person can live there.  They can move.

Basically, every physical attribute or logical attribute is a terrible
choice for a primary key.  They won't cause problems very often, it's
true.  But when they do cause problems, it is a terrible doozie of a
problem.

Now, on the other hand, if we are talking about INDEXES here, that's a
horse of a different color.   Lots of natural attributes and
combinations of natural attributes make excellent candidates for keys.
Such things as SSN, names, addresses, phone numbers, etc.

Therefore, I am guessing the two posters upstream in this thread that I
am responding to were therefore talking about different subjects
altogether.

One was talking about using natural attributes for indexes, which is a
superior idea that I agree with.

The other was talking about never using natural attributes for keys,
which I also agree with.

Therefore, I am guessing that everyone is in complete agreement, but it
is a nomenclature thing.

Just a guess.

Re: Surrogate keys (Was: enums)

От

"Dann Corbit"

Дата:

18 января 2006 г., 20:06:17

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Dann Corbit
> Sent: Wednesday, January 18, 2006 4:04 PM
> To: josh@agliodbs.com; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Surrogate keys (Was: enums)
>
> > -----Original Message-----
> > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> > owner@postgresql.org] On Behalf Of Josh Berkus
> > Sent: Wednesday, January 18, 2006 3:59 PM
> > To: pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] Surrogate keys (Was: enums)
> >
> > Martjin,
> >
> > > Interesting. However, in my experience very few things have
"natural
> > > keys". There are no combination of attributes for people, phone
> calls
> > > or even real events that make useful natural keys.
> >
> > I certainly hope that I never have to pick up one of your projects.
> A
> > table without a natural key is a data management disaster.   Without
a
> > key, it's not data, it's garbage.
>
> I have a different opinion.
>
> The data should absolutely never use a natural key as a primary key.
>
> The data should use something like a sequence for the primary key.
>
> Examples:
> SSN --> believe it or not, SSN's sometimes change.
>
> First, Middle, Last names --> Not really unique
>
> Street Address --> More than one person can live there.  They can
move.
>
> Basically, every physical attribute or logical attribute is a terrible
> choice for a primary key.  They won't cause problems very often, it's
> true.  But when they do cause problems, it is a terrible doozie of a
> problem.
>
> Now, on the other hand, if we are talking about INDEXES here, that's a
> horse of a different color.   Lots of natural attributes and
> combinations of natural attributes make excellent candidates for keys.

Make that: "combinations of natural attributes make excellent candidates
for indexes."
See. I even messed it up, when I was trying to highlight the
distinction.
Of course, we can probably just chalk that up to "dumb as a box of
hammers."

> Such things as SSN, names, addresses, phone numbers, etc.
>
> Therefore, I am guessing the two posters upstream in this thread that
I
> am responding to were therefore talking about different subjects
> altogether.
>
> One was talking about using natural attributes for indexes, which is a
> superior idea that I agree with.
>
> The other was talking about never using natural attributes for keys,
> which I also agree with.
>
> Therefore, I am guessing that everyone is in complete agreement, but
it
> is a nomenclature thing.
>
> Just a guess.
>
> ---------------------------(end of
broadcast)---------------------------
> TIP 6: explain analyze is your friend

Re: Surrogate keys (Was: enums)

От

"Dann Corbit"

Дата:

18 января 2006 г., 21:51:48

> -----Original Message-----
> From: Michael Glaesemann [mailto:grzm@myrealbox.com]
> Sent: Wednesday, January 18, 2006 5:48 PM
> To: Dann Corbit
> Cc: Leandro Guimarães Faria Corcete Dutra; Jim C. Nasby; pgsql-
> hackers@postgresql.org
> Subject: Re: [HACKERS] Surrogate keys (Was: enums)
>
>
> On Jan 19, 2006, at 10:34 , Dann Corbit wrote:
>
> > http://www.db.ucsd.edu/cse132B/Thirdmanifesto.pdf
>
> > "PROPOSITION 1.4: Unique Identifiers (UIDs) for records should be
> > assigned by the DBMS only if a user-defined primary key is not
> > available.
>
> <snip />
>
> > An immutable primary key has an extra advantage over a system-
> > assigned unique identifier because it has a natural, human readable
> > meaning. Consequently, in data interchange or debugging this may be
> > an advantage.  If no primary key is available for a collection,
> > then it is imperative that a system-assigned UID be provided.
>
> <snip />
>
> Dann Corbit:
>
> > The primary key should be immutable, meaning that its value should
> > not be changed during the course of normal operations of the
> > database.  What natural key is immutable?  The answer is that such
> > an attribute does not exist.  To use them for such a purpose is
> > begging for trouble.
>
> As far as I can tell, the only difference between your position,
> Dann, and Date and Darwen's, is that you think no natural key is
> immutable. If you *could* find an immutable natural key, would it be
> an acceptable key for you? Date and Darwen say explicitly that if no
> immutable (natural) (primary) key is available a system-assigned UID
> is required. If you think there is no immutable natural key
> available, Darwen and Date would agree that you should use a system-
> generated key. Or do you think I'm misreading you or The Third
> Manifesto?

If you could find an immutable natural key, it would be the *BEST* thing to use.  Unfortunately, I believe that
immutablenatural keys are rarer than horse feathers and pickle smoke.  Furthermore, because of statements like the one
thatI collected and pasted from the above document, I believe that people will choose totally inappropriate things (I
haveseen it many times and had to deal with the repercussions) to use as natural keys (e.g. SSN) and cause enormous
damagethrough those choices. 

But I suppose on a sort of "mathematical" level the statement is fully true.

Re: Surrogate keys (Was: enums)

От

"Pollard, Mike"

Дата:

19 января 2006 г., 10:37:13

Martijn van Oosterhout wrote:

> Please provides natural keys for any of the following:
>
> - A Person
> - A phone call: (from,to,date,time,duration) is not enough
> - A physical address
> - A phone line: (phone numbers arn't unique over time)
> - An internet account: (usernames not unique over time either)

Ahh, a challenge.  Hmm, not sure about all of them, but here goes:

A Person - well, you could use a bit map of their fingerprints, or maybe
their retinal scan.  Of course, that could change due to serious injury.
Maybe some kind of representation of their DNA?

A physical address - how about longitude/latitude/height from sea level?

The point here is two-fold.  First, what we call 'natural' is frequently
itself a surrogate key (yes, even your name is really just a surrogate
key.  As with all surrogate keys, it is a sequence of symbols that you
use to represent yourself).  The second point is even when you find a
truly 'natural' key (something not arbitrarily made up by anyone, and
uniquely identifying the data in question), it may be completely and
utterly inappropriate to use in a database.

What is 'natural' anyway?  If someone phones in an order, we usually
assign an order number to that request.  This order number is not the
actual order, and the customer couldn't care a less what it is, but I've
never heard a DBA argue we should get rid of it (well, to be fair, I've
never discussed order numbers with a DBA at all).  After all, would it
make sense for the key for that order to be the customer's name, the
date/time of the order, all the items ordered, and the address to ship
the order?  That isn't a key, but it's the only 'natural' thing that
identifies that order that immediately comes to my mind.

On the other hand, would anyone argue that an order_item table should
have a surrogate key?  Well, I wouldn't.  The key for the order_item
table should be something like the order number and the inventory item
number together (IMHO).

The point?  Surrogate keys and natural keys are two tools in the
database arsenal.  Just as it is unwise to use a hammer to drive a screw
just because you don't believe in screwdrivers, it is unwise to just off
hand discard either method of specifying a key.  Rather, use
intelligence and education (one of which is discussions such as this) in
deciding how best to represent your data to aide in performance, ease of
use, and adaptability.

Mike Pollard
SUPRA Server SQL Engineering and Support
Cincom Systems, Inc.

Re: Surrogate keys (Was: enums)

От

mark@mark.mielke.cc

Дата:

19 января 2006 г., 10:53:07

On Thu, Jan 19, 2006 at 09:37:12AM -0500, Pollard, Mike wrote:
> Martijn van Oosterhout wrote:
> > Please provides natural keys for any of the following:
> > - A Person
> > - A phone call: (from,to,date,time,duration) is not enough
> > - A physical address
> > - A phone line: (phone numbers arn't unique over time)
> > - An internet account: (usernames not unique over time either)
> Ahh, a challenge.  Hmm, not sure about all of them, but here goes:
> A Person - well, you could use a bit map of their fingerprints, or maybe
> their retinal scan.  Of course, that could change due to serious injury.
> Maybe some kind of representation of their DNA?

Yes. Representation of the DNA is probably best. But - that's a lot of
data to use as a key in multiple tables. :-)

> A physical address - how about longitude/latitude/height from sea level?

Planet? Solar system? Galaxy? Universe? :-)

I agreed with what you had to say (the stuff I deleted). Just felt
like being funny. Not sure if I'm successful. Hehe...

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/

Re: Surrogate keys

От

Chris Browne

Дата:

19 января 2006 г., 12:04:12

mark@mark.mielke.cc writes:

> On Thu, Jan 19, 2006 at 09:37:12AM -0500, Pollard, Mike wrote:
>> Martijn van Oosterhout wrote:
>> > Please provides natural keys for any of the following:
>> > - A Person
>> > - A phone call: (from,to,date,time,duration) is not enough
>> > - A physical address
>> > - A phone line: (phone numbers arn't unique over time)
>> > - An internet account: (usernames not unique over time either)
>> Ahh, a challenge.  Hmm, not sure about all of them, but here goes:
>> A Person - well, you could use a bit map of their fingerprints, or maybe
>> their retinal scan.  Of course, that could change due to serious injury.
>> Maybe some kind of representation of their DNA?
>
> Yes. Representation of the DNA is probably best. But - that's a lot of
> data to use as a key in multiple tables. :-)

That is arguably about the best choice possible, for a human being, as
DNA isn't supposed to be able to change (much).  

Mind you, there do exist odd cases where a person might have two sets
of DNA in different parts of their body.  This commonly (well, it's
not really very common...) happens when non-identical twins share a
blood supply; that twins were involved may not be noticed if one does
not survive to birth...

>> A physical address - how about longitude/latitude/height from sea level?
>
> Planet? Solar system? Galaxy? Universe? :-)
>
> I agreed with what you had to say (the stuff I deleted). Just felt
> like being funny. Not sure if I'm successful. Hehe...

Well, that's useful for representing a key for a piece of real estate.

It's fairly interestingly useless for representing a human attribute,
at least in terms of being a primary key...
-- 
output = ("cbbrowne" "@" "acm.org")
http://cbbrowne.com/info/linux.html
"It's a pretty rare beginner who isn't clueless.  If beginners weren't
clueless, the infamous Unix learning cliff wouldn't be a problem."
-- david parsons

Re: Surrogate keys (Was: enums)

От

"Jim C. Nasby"

Дата:

19 января 2006 г., 13:42:37

On Thu, Jan 19, 2006 at 09:37:12AM -0500, Pollard, Mike wrote:
> The point?  Surrogate keys and natural keys are two tools in the
> database arsenal.  Just as it is unwise to use a hammer to drive a screw
> just because you don't believe in screwdrivers, it is unwise to just off
> hand discard either method of specifying a key.  Rather, use
> intelligence and education (one of which is discussions such as this) in
> deciding how best to represent your data to aide in performance, ease of
> use, and adaptability.

There is one thing to consider: consistency. If you mix and match
'natural' keys and surrogate keys as PK, then how do you know which one
you're supposed to be joining on? How does everyone else on the team
know?

Sure, there's many examples where you don't really need a surrogate key.
But there's just as many (if not more) where you want a surrogate key so
that you don't have to deal with the pain of a multiple-field key. (Note
that I don't consider simply defining a multiple-field key to be unique
as painful). So ISTM it's much easier to just use surrogate keys and be
done with it. Only deviate when you have a good reason to do so.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Surrogate keys (Was: enums)

От

Josh Berkus

Дата:

19 января 2006 г., 14:17:49

Jim,
> So ISTM it's much easier to just use surrogate keys and be
> done with it. Only deviate when you have a good reason to do so.

"The lazy man's guide to SQL database design", but Jim Nasby.

;-)

--Josh

Re: Surrogate keys (Was: enums)

От

mark@mark.mielke.cc

Дата:

19 января 2006 г., 15:02:08

On Thu, Jan 19, 2006 at 10:11:51AM -0800, Josh Berkus wrote:
> > So ISTM it's much easier to just use surrogate keys and be
> >done with it. Only deviate when you have a good reason to do so.
> "The lazy man's guide to SQL database design", but Jim Nasby.
> ;-)

Hehe... I was thinking the same thing. I've definately seen cases
where the use of surrogate keys verges on ridiculous. It hasn't
harmed the application, except it terms of complexity. It still works.
It still performs fine. The SQL queries are awful looking. :-)

That's where I would tend to draw the line. For me, I find
implementation and maintenance to be the most expensive part of my
applications. My data hasn't yet become large enough to make disk
space, compute resources, or I/O bandwidth a serious concern.

If I think the use of surrogate keys may make my life harder, I'll try
not to use them. If I think they may make my life easier, I'll use
them without blinking an eye. Harder vs. easier = cost to implement.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...

                          http://mark.mielke.cc/

Re: Surrogate keys (Was: enums)

От

"Dann Corbit"

Дата:

19 января 2006 г., 15:22:27


> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Josh Berkus
> Sent: Thursday, January 19, 2006 10:09 AM
> To: Martijn van Oosterhout
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Surrogate keys (Was: enums)
>
> Martjin,
>
> > In any of these either misspellings, changes of names, ownership or
> > even structure over time render the obvious useless as keys. There
are
> > techniques for detecting and reducing duplication but the point is
that
> > for any of these duplicates *can* be valid data.
>
> Please point me out where, in the writings of E.F. Codd or in the SQL
> Standard, it says that keys have to be immutable for the life of the
row.

Only do that for data that you care about.  If you think that the data
has no value, there is no need to have a way to identify a row.

> Duplicate *values* can be valid data.  Duplicate *tuples* show some
> serious flaws in your database design.  If you have a personnel
> directory on which you've not bothered to define any unique
constraints
> other than the ID column, then you can't match your data to reality.
If
> you have two rows with the same first and last name, you don't know if
> they are two different people or the same person, duplicated.  Which
> will be a big problem come paycheck time.
>
> Per E.F. Codd, each tuple is a *unique* predicate (or key) comprising
a
> set of values definining a *unique* data entity.  i.e. "The employeee
> named "John" "Little" at extension "4531".  There is nothing anywhere
> said about keys never changing.
>
> This is Databases 101 material.  Really!

I give it an 'F.'

When the data changes, the problems generated are not just due to
repercussions related to the child and parent tables related through the
primary key.

Someone has an invoice, and they call in with a question. A combination
of their name and address was used as a primary key.  They moved, and
sent in a forwarding address.  The DBA was smart enough to design the
database to cascade results, so that there are no orphan records and we
have not compromised the structure of the database.
The customer calls in with a question about an old invoice.
"We have no record of that transaction."

I was a DBA for a database for a company with many millions of customers
worldwide (e.g. the product registration table was 24 GB).

Their design had natural keys in it.  It caused dozens of problems,
every single day.

I content that most people are not smart enough to decide when a natural
key is a good idea.  The engineers that designed the database were
probably pretty smart, since it sort of worked and had thousands of
tables and hundreds of millions of rows in it.  But one bad decision on
a natural key will cause literally millions of dollars of damage.

The primary defense I have heard so far is that the Oids are hard to
understand.  They are nothing in comparison with understanding what to
do when you have 25 changes to primary keys on various tables every
single day.

Once you get used to Oids, I find it hard to believe that any
intelligent person finds them confusing.  Confusion resulting from
having primary keys that are a moving target?  Now that's confusion for
you.

IMO-YMMV.

I think it is time for me to give it a rest, though.  My experience may
be very atypical and I feel strangely passionate about it.

Re: Surrogate keys (Was: enums)

От

"Jim C. Nasby"

Дата:

19 января 2006 г., 15:48:23

On Thu, Jan 19, 2006 at 02:01:14PM -0500, mark@mark.mielke.cc wrote:
> On Thu, Jan 19, 2006 at 10:11:51AM -0800, Josh Berkus wrote:
> > > So ISTM it's much easier to just use surrogate keys and be
> > >done with it. Only deviate when you have a good reason to do so.
> > "The lazy man's guide to SQL database design", but Jim Nasby.
> > ;-)
> 
> Hehe... I was thinking the same thing. I've definately seen cases
> where the use of surrogate keys verges on ridiculous. It hasn't
> harmed the application, except it terms of complexity. It still works.
> It still performs fine. The SQL queries are awful looking. :-)

Got an example?

> That's where I would tend to draw the line. For me, I find
> implementation and maintenance to be the most expensive part of my
> applications. My data hasn't yet become large enough to make disk
> space, compute resources, or I/O bandwidth a serious concern.

Which is exactly what my thought process is. If you mix surrogate and
non-surrogate keys, how do you know which table has which? Sure, while
you're actively writing the code it's not an issue, but what about 6
months later? What about if someone else picks up the code?

I know Josh was poking fun with his comment about me being lazy, but
lazy can make for better code. I can go back to code I wrote 3 years ago
and I know that 99% of tables will have something_id (where something is
almost certain to be the table name) as a surrogate key to join on;
there's no need for me to go and figure out what does and what doesn't
have a surrogate key. The 1% that don't fall into that generally aren't
an issue because they're normally very large tables that nothing joins
to.

There's actually an article floating around somewhere about how lazy
coders are good coders... :)
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Surrogate keys (Was: enums)

От

"Jim C. Nasby"

Дата:

19 января 2006 г., 15:59:14

On Thu, Jan 19, 2006 at 11:22:24AM -0800, Dann Corbit wrote:
> > > In any of these either misspellings, changes of names, ownership or
> > > even structure over time render the obvious useless as keys. There
> are
> > > techniques for detecting and reducing duplication but the point is
> that
> > > for any of these duplicates *can* be valid data.
> > 
> > Please point me out where, in the writings of E.F. Codd or in the SQL
> > Standard, it says that keys have to be immutable for the life of the
> row.
> 
> Only do that for data that you care about.  If you think that the data
> has no value, there is no need to have a way to identify a row.

Erm... if you don't care, why are you storing it? :)

> I was a DBA for a database for a company with many millions of customers
> worldwide (e.g. the product registration table was 24 GB).
> 
> Their design had natural keys in it.  It caused dozens of problems,
> every single day.
> 
> I content that most people are not smart enough to decide when a natural
> key is a good idea.  The engineers that designed the database were
> probably pretty smart, since it sort of worked and had thousands of
> tables and hundreds of millions of rows in it.  But one bad decision on
> a natural key will cause literally millions of dollars of damage.
> 
> The primary defense I have heard so far is that the Oids are hard to
> understand.  They are nothing in comparison with understanding what to
> do when you have 25 changes to primary keys on various tables every
> single day.
> 
> Once you get used to Oids, I find it hard to believe that any
> intelligent person finds them confusing.  Confusion resulting from
> having primary keys that are a moving target?  Now that's confusion for
> you.

Well, I wouldn't use OIDs as in the PostgreSQL OID, but I agree. If
nothing else an ID gives you a fallback... if you absolutely can't find
a customer (or whatever else) through natural keys, you ask them for
their customer ID/number, which has no reason to ever change.

BTW, if you want to see a mess*, take a look at the distributed.net stats
code, which unfortunately uses email as the means to identify
participants. It made perfect sense originally, anyone running the
client was bound to have an email address, and they all had to be
unique, right? Worked great until the first person contacted us
wondering how to change his email address in stats because he'd changed
ISPs. If you look at todays statscode (at least the database portion of
it) approximately 50% of it is there to deal with people retiring one
email address into another, and I'd say that 90%+ of the bugs are in
this code. Had we just required new users to register to get a nice
shiny unique numeric ID (or a unique username...), none of that code
would exist.

* note that I'm not trying to rag on any of the numerous people who've
been involved in the stats code over the years, but it is insightful to
look at some of the 'dumb mistakes' that have been made and the large
amount of pain that it's caused.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Surrogate keys (Was: enums)

От

mark@mark.mielke.cc

Дата:

19 января 2006 г., 16:11:59

On Thu, Jan 19, 2006 at 01:48:18PM -0600, Jim C. Nasby wrote:
> On Thu, Jan 19, 2006 at 02:01:14PM -0500, mark@mark.mielke.cc wrote:
> > On Thu, Jan 19, 2006 at 10:11:51AM -0800, Josh Berkus wrote:
> > > > So ISTM it's much easier to just use surrogate keys and be
> > > >done with it. Only deviate when you have a good reason to do so.
> > > "The lazy man's guide to SQL database design", but Jim Nasby.
> > > ;-)
> > Hehe... I was thinking the same thing. I've definately seen cases
> > where the use of surrogate keys verges on ridiculous. It hasn't
> > harmed the application, except it terms of complexity. It still works.
> > It still performs fine. The SQL queries are awful looking. :-)
> Got an example?

Sure, but I have to be vague, because it's a company DB. :-)

The DB has a primary table, that maps the primary key to a surrogate
key. The surrogate key is used in several 1:1 and 1:N relationships.
Pretty straight forward. (I tend to put the primary key in the most
primary table that would have a 1:1 relationship, of which there is
one in this database - but whatever) The primary key is an identifier
used for all inputs and outputs to the application. It is used by
manual and automatic processes internal and external to the company.
Definately a primary key / surrogate key scenario.

The problem here, is that the primary key *is* a natural key. It
is generated to be unique, and it is immutable. There are no interfaces
provided to allow the rename of the key. It is a short character
string of 5 to 20 characters.

All queries to the table are joined with this primary key/surrogate
key table, to allow lookup by the primary key, for records only
identified by the surrogate key.

The database is only likely to have a few thousands records, with
the 1:N relationships not exceeding 5 or 10, and not recursive.
For performance, or disk space, it doesn't really matter which way
they went.

The confusion, though, of joining using a surrogate, that is
intended to be opaque (the value is never queried), ensures that
the program has no simple queries. All queries involve at least
one join.

I said almost ridiculous. It's not enough for me to complain, and
request a re-design. I don't really care what it does, as long as
it accepts my data, and allows me to query my data. But, it does
seem silly to me.

> > That's where I would tend to draw the line. For me, I find
> > implementation and maintenance to be the most expensive part of my
> > applications. My data hasn't yet become large enough to make disk
> > space, compute resources, or I/O bandwidth a serious concern.
> Which is exactly what my thought process is. If you mix surrogate and
> non-surrogate keys, how do you know which table has which? Sure, while
> you're actively writing the code it's not an issue, but what about 6
> months later? What about if someone else picks up the code?

It's usually pretty obvious, looking at a database diagram. You look
up the primary key, and see that it only shows up in one table. :-)

> I know Josh was poking fun with his comment about me being lazy, but
> lazy can make for better code. I can go back to code I wrote 3 years ago
> and I know that 99% of tables will have something_id (where something is
> almost certain to be the table name) as a surrogate key to join on;
> there's no need for me to go and figure out what does and what doesn't
> have a surrogate key. The 1% that don't fall into that generally aren't
> an issue because they're normally very large tables that nothing joins
> to.

I don't disagree with you. I just don't mind deciding to use a surrogate
key if I'm unsure, and not using a surrogate if it seems more effort than
gain.

> There's actually an article floating around somewhere about how lazy
> coders are good coders... :)

Dunno where it started, but that's one of the tenets of the developers
of Perl. Of course, with Perl 6, they admitted to having made quite a
few deisgn errors with Perl 5 and earlier... :-)

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...

                          http://mark.mielke.cc/

Re: Surrogate keys (Was: enums)

От

Christopher Kings-Lynne

Дата:

19 января 2006 г., 21:44:44

> Yes. Representation of the DNA is probably best. But - that's a lot of
> data to use as a key in multiple tables. :-)

No then you have problems with identical twins :)

Chris

Re: Surrogate keys (Was: enums)

От

Michael Glaesemann

Дата:

19 января 2006 г., 22:41:48

On Jan 20, 2006, at 10:50 , Christopher Kings-Lynne wrote:

>> Yes. Representation of the DNA is probably best. But - that's a  
>> lot of
>> data to use as a key in multiple tables. :-)
>
> No then you have problems with identical twins :)

And, looking forward, clones.

Michael Glaesemann
grzm myrealbox com

Re: Surrogate keys (Was: enums)

От

Richard Huxton

Дата:

20 января 2006 г., 10:01:05

Dann Corbit wrote:
> 
> When the data changes, the problems generated are not just due to
> repercussions related to the child and parent tables related through the
> primary key.
> 
> Someone has an invoice, and they call in with a question. A combination
> of their name and address was used as a primary key.  They moved, and
> sent in a forwarding address.  The DBA was smart enough to design the
> database to cascade results, so that there are no orphan records and we
> have not compromised the structure of the database.
> The customer calls in with a question about an old invoice.
> "We have no record of that transaction."

Aside:
Even if not using name+address as a primary key, a separate record 
should be kept of these details *at the time of the invoice* otherwise 
you'll never be able to match up a printed invoice with its digital 
source. Usually of course this is by inv_name, inv_address columns in 
the invoice header, but it could be by some fancy temporal versioning on 
client details.

--   Richard Huxton  Archonet Ltd

Re: Surrogate keys (Was: enums)

От

Martijn van Oosterhout

Дата:

20 января 2006 г., 10:37:27

On Fri, Jan 20, 2006 at 01:58:16PM +0000, Richard Huxton wrote:
> Aside:
> Even if not using name+address as a primary key, a separate record
> should be kept of these details *at the time of the invoice* otherwise
> you'll never be able to match up a printed invoice with its digital
> source. Usually of course this is by inv_name, inv_address columns in
> the invoice header, but it could be by some fancy temporal versioning on
> client details.

Aside: Using name+address is terrible way for customers to identify
themselves. After you have a couple of experiences of ex-spouses trying
to hijack accounts you get fairly strict about what they have to know.
Bill identifiers, account identifiers, other surrogate keys are fine
but information that's reasonably general knowledge is nice, but not
enough.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Surrogate keys (Was: enums)

От

Mike Rylander

Дата:

20 января 2006 г., 17:34:23

On 1/19/06, Pollard, Mike <mpollard@cincom.com> wrote:
> Martijn van Oosterhout wrote:
>
> > Please provides natural keys for any of the following:
> >
> > - A Person
> > - A phone call: (from,to,date,time,duration) is not enough
> > - A physical address
> > - A phone line: (phone numbers arn't unique over time)
> > - An internet account: (usernames not unique over time either)
>
> Ahh, a challenge.  Hmm, not sure about all of them, but here goes:
>
> A Person - well, you could use a bit map of their fingerprints, or maybe
> their retinal scan.  Of course, that could change due to serious injury.
> Maybe some kind of representation of their DNA?

Unless the person in question happens to be a chimera (yes, they do exist).

;-)

--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org

Re: Surrogate keys

От

Christopher Browne

Дата:

20 января 2006 г., 23:38:07

> On 1/19/06, Pollard, Mike <mpollard@cincom.com> wrote:
>> Martijn van Oosterhout wrote:
>>
>> > Please provides natural keys for any of the following:
>> >
>> > - A Person
>> > - A phone call: (from,to,date,time,duration) is not enough
>> > - A physical address
>> > - A phone line: (phone numbers arn't unique over time)
>> > - An internet account: (usernames not unique over time either)
>>
>> Ahh, a challenge.  Hmm, not sure about all of them, but here goes:
>>
>> A Person - well, you could use a bit map of their fingerprints, or maybe
>> their retinal scan.  Of course, that could change due to serious injury.
>> Maybe some kind of representation of their DNA?
>
> Unless the person in question happens to be a chimera (yes, they do exist).
>
> ;-)

Yeah, I saw that episode of CSI!  :-)
-- 
"cbbrowne","@","gmail.com"
http://linuxdatabases.info/info/lsf.html
"Cat's motto: No matter what you've done wrong, always try to make it
look like the dog did it." -- Unknown

Re: Surrogate keys

От

Mike Rylander

Дата:

21 января 2006 г., 00:51:18

On 1/21/06, Christopher Browne <cbbrowne@acm.org> wrote:
> > On 1/19/06, Pollard, Mike <mpollard@cincom.com> wrote:
> >> Martijn van Oosterhout wrote:
> >>
> >> > Please provides natural keys for any of the following:
> >> >
> >> > - A Person
> >> > - A phone call: (from,to,date,time,duration) is not enough
> >> > - A physical address
> >> > - A phone line: (phone numbers arn't unique over time)
> >> > - An internet account: (usernames not unique over time either)
> >>
> >> Ahh, a challenge.  Hmm, not sure about all of them, but here goes:
> >>
> >> A Person - well, you could use a bit map of their fingerprints, or maybe
> >> their retinal scan.  Of course, that could change due to serious injury.
> >> Maybe some kind of representation of their DNA?
> >
> > Unless the person in question happens to be a chimera (yes, they do exist).
> >
> > ;-)
>
> Yeah, I saw that episode of CSI!  :-)

Heh.  I didn't realize they did that already.  I was thinking of the
show "I Am My Own Twin" from the Discovery Health Channel.

Couldn't find a link from the official page, but:
http://www.globalspin.com/mt/archives/000547.html .

> --
> "cbbrowne","@","gmail.com"
> http://linuxdatabases.info/info/lsf.html
> "Cat's motto: No matter what you've done wrong, always try to make it
> look like the dog did it." -- Unknown
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>


--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org

Re: Surrogate keys (Was: enums)

От

Bruno Wolff III

Дата:

24 января 2006 г., 05:57:27

On Thu, Jan 19, 2006 at 09:53:11 -0500, mark@mark.mielke.cc wrote:
> 
> Yes. Representation of the DNA is probably best. But - that's a lot of
> data to use as a key in multiple tables. :-)

On a simple level, this would be a problem for twins.
There are other complications as well. People are going to have slightly
different DNA in different cells due to mutations. Though you could probably
do some averaging over a number of cells to get a single value.
For people that have had transplants, you could probably define something
for doing the sample for original material.

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Re: Surrogate keys (Was: enums)