Re: Avoiding surrogate keys

Поиск
Список
Период
Сортировка
От Merlin Moncure
Тема Re: Avoiding surrogate keys
Дата
Msg-id t2ub42b73151004211256r80eda9e3h4fe9e0548cc06dfb@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Avoiding surrogate keys  (Bill Moran <wmoran@potentialtech.com>)
Список pgsql-general
On Wed, Apr 21, 2010 at 3:32 PM, Bill Moran <wmoran@potentialtech.com> wrote:
> In response to "Joshua D. Drake" <jd@commandprompt.com>:
>
>> On Wed, 2010-04-21 at 15:07 -0400, Bill Moran wrote:
>> > I'd use an ENUM for the status, as that's not liable to change.
>> >
>> > The only problem I see with avoiding the surrogate key for the country
>> > is that the table might require more disk space if a lot of the country
>> > names end up being very long.
>> >
>>
>> Yeah but that isn't generally a problem now and the gain he gets from
>> the lack of join performance is more than worth it.
>
> Agreed.  I guess I didn't explain properly: the disk space _may_ be an
> issue if you have LOTS of REALLY LONG names.
>
> Like, if the shortest name in your country table is 'Federated States
> of Micronesia (Esquire)', then the disk space used by the table and
> index might become an issue.
>
> Also, if this is an embedded application where disk is at a premium ...
>
> As with many things, "big" and "small" are relative, ambiguous and
> copy-written by someone who thinks they can charge me every time I
> use them.
>
> One thing that a lot of people seem to get confused about is that they
> subconsciously think that ints or bigints take up less space when the
> numbers are small.  I.e.: I want to use an int for my state identifier
> instead of the 2-digit code, because it will use less space -- wrong,
> an int is 4 bytes, but a 2 byte char column is 1/2 that ... even if the
> number never gets higher than 50.
>
> Personally, I think the only good reason to use a surrogate key is when
> there's a high likelihood that your primary text identifier might change.
> Although ON UPDATE CASCADE can even handle that, it just might take a
> while.

Another semi-related case is when the keys are large which in turn
gives you larger indexes.  Large indexes pressure your cache which in
turn gives you more disk faults which can really nail you if your
database is large.  Both of these reasons fall away if/when storage
catches up with the rest of hardware IMO.

Another penalty of surrogate keys I neglected to mention is more
sorts.  It's typical to want to pull data in natural order which is
obfuscated behind the surrogate key.  So besides giving you a 'free
join', you get a free sort as well if you pull data off the index.

Interesting aside: I would estimate that a fairly large percentage of
-performance problems (say, 25%) can be directly traced to poor
indexing strategy and loose relationships between tables.  Natural
keys tend to deflect both of those problems.

merlin

В списке pgsql-general по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: Avoiding surrogate keys
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: Performance impact of hundreds of partitions