FW: Surrogate keys (Was: enums)

Поиск
Список
Период
Сортировка
От Jim Nasby
Тема FW: Surrogate keys (Was: enums)
Дата
Msg-id D1D2D51E3BE3FC4E98598248901F759402C890D0@ausmail2k4.aus.pervasive.com
обсуждение исходный текст
Ответы Re: FW: Surrogate keys (Was: enums)  ("Jim C. Nasby" <jnasby@pervasive.com>)
Список pgsql-hackers
Ooops, fat-finger'd -hackers...

-----Original Message-----
Adding -hackers back to the list.

> From: Leandro Guimarães Faria Corcete Dutra
> Em Seg, 2006-01-16 às 12:52 -0600, Jim C. Nasby escreveu:
> > On Sat, Jan 14, 2006 at 07:28:21PM +0900, Michael Glaesemann wrote:
> > > 
> > > For UPDATEs and INSERTs, the "proper" primary key also 
> needs to be  
> > > checked, but keys are used for more than just checking 
> uniqueness:  
> > > they're also often used in JOINs. Joining against a 
> single integer  
> > > I'd think it quite a different proposition (I'd think 
> faster in terms  
> > > of performance) than joining against, say, a text column or a  
> > > composite key.
> 
>     How different is that?

Comparing two ints is much, much faster than comparing two text fields. For a small number of comparisons, it doesn't
matter.When you're joining tables together, it's a different story.
 

> > a) the optimizer does a really poor job on multi-column 
> index statistics
> 
>     Then it should eventually be fixed?

It's on the to-do, but it's not an easy nut to crack.

> > b) If each parent record will have many children, the space 
> savings from
> > using a surrogate key can be quite large
> 
>     Only where the surrogate is significantly smaller than 
> the natural?

#define significant

Here's a real-life example: the primary table for stats.distributed.net has about 120M rows. One field in that table
(participant_id)links back to the participant table; it's an int. If instead we used participant_name and that averaged
8characters in length, that would grow the main table by 1GB (8 chars takes 8 bytes instead of 4, plus there's the
varlenaheader of 4 bytes). The machine that stats runs on has 4G of memory, so cutting 1G of wasted space out of that
tablehelps quite a bit.
 

(In actuality, there isn't participant_name... participants are identified by email address (not a great idea, but I
wasn'taround when that was chosen). As you can imagine, email addresses are substantially longer than 4 bytes. When we
normalizedemail out of that main table things got substantially faster. That was a number of years ago, so the table
wasprobably 15-25% of it's current size, but it still made a huge difference.)
 

> > c) depending on how you view things, putting actual keys 
> all over the
> > place is denormalized
> 
>     How come?

See my other reply... :)

> > Generally, I just use surrogate keys for everything unless 
> performance
> > dictates something else.
> 
>     Shouldn't it be the other way round, for the user's sake?

Why should it? It's trivial to create views that abstract surrogate keys out, and if you really want to you can even
makethe views updatable. But here's two other things to consider:
 

In many cases you can't define a single field as a unique key. So you end up with having to add many extra keys to all
yourjoin clauses. Not very friendly, and prone to error.
 

Not every language has equal support for text comparisons (and in my experience, almost all real keys are mostly
text).
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Jim C. Nasby"
Дата:
Сообщение: Re: No heap lookups on index
Следующее
От: "Jim C. Nasby"
Дата:
Сообщение: Re: FW: Surrogate keys (Was: enums)