Обсуждение: UUID performance as primary key

Поиск
Список
Период
Сортировка

UUID performance as primary key

От
Navkirat Singh
Дата:
Hi Guys,

I am interested in finding out the pros/cons of using UUID as a primary key field. My requirement states that UUID
wouldbe perfect in my case as I will be having many small databases which will link up to a global database using the
UUID.Hence, the need for a unique key across all databases. It would be extremely helpful if someone could help me
figurethis out, as it is critical for my project. 

Thanks in advance,

Nav

Re: UUID performance as primary key

От
Craig Ringer
Дата:
On 16/10/2010 9:58 AM, Navkirat Singh wrote:
> Hi Guys,
>
> I am interested in finding out the pros/cons of using UUID as a primary key field. My requirement states that UUID
wouldbe perfect in my case as I will be having many small databases which will link up to a global database using the
UUID.Hence, the need for a unique key across all databases. It would be extremely helpful if someone could help me
figurethis out, as it is critical for my project. 

Pro: No need for (serverid,serverseq) pair primary keys or hacks with
modulus based key generation. Doesn't set any pre-determined limit on
how many servers/databases may be in a cluster.

Con: Slower than modulo key generation approach, uses more storage.
Foreign key relationships may be slower too.

Overall, UUIDs seem to be a favoured approach. The other way people seem
to do this is by assigning a unique instance id to each server/database
out of a maximum "n" instances decided at setup time. Every key
generation sequence increments by "n" whenever it generates a key, with
an offset of the server/database id. That way, if n=100, server 1 will
generate primary keys 001, 101, 201, 301, ..., server 2 will generate
primary keys 002, 102, 202, 302, ... and so on.

That works great until you need more than 100 instances, at which point
you're really, REALLY boned. In really busy systems it also limits the
total amount of primary key space - but with BIGINT primary keys, that's
unlikely to be something you need to worry about.

The composite primary key (serverid,sequenceid) approach avoids the need
for a pre-defined maximum number of servers, but can be slow to index
and can require more storage, especially because of tuple headers.

I have no firsthand experience with any of these approaches so I can't
offer you a considered opinion. I know that the MS-SQL crowd at least
strongly prefer UUIDs, but they have very strong in-database UUID
support. MySQL folks seem to mostly favour the modulo primary key
generation approach. I don't see much discussion of the issue here - I
get the impression Pg doesn't see heavy use in sharded environments.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: UUID performance as primary key

От
Andy
Дата:
Wouldn't UUID PK cause a significant drop in insert performance because every insert is now out of order, which leads to a constant re-arranging of the B+ tree? The amount of random IO's that's going to generate would just kill the performance.

--- On Fri, 10/15/10, Craig Ringer <craig@postnewspapers.com.au> wrote:

From: Craig Ringer <craig@postnewspapers.com.au>
Subject: Re: [PERFORM] UUID performance as primary key
To: "Navkirat Singh" <navkirats@gmail.com>
Cc: pgsql-performance@postgresql.org
Date: Friday, October 15, 2010, 10:59 PM

On 16/10/2010 9:58 AM, Navkirat Singh wrote:
> Hi Guys,
>
> I am interested in finding out the pros/cons of using UUID as a primary key field. My requirement states that UUID would be perfect in my case as I will be having many small databases which will link up to a global database using the UUID. Hence, the need for a unique key across all databases. It would be extremely helpful if someone could help me figure this out, as it is critical for my project.

Pro: No need for (serverid,serverseq) pair primary keys or hacks with modulus based key generation. Doesn't set any pre-determined limit on how many servers/databases may be in a cluster.

Con: Slower than modulo key generation approach, uses more storage. Foreign key relationships may be slower too.

Overall, UUIDs seem to be a favoured approach. The other way people seem to do this is by assigning a unique instance id to each server/database out of a maximum "n" instances decided at setup time. Every key generation sequence increments by "n" whenever it generates a key, with an offset of the server/database id. That way, if n=100, server 1 will generate primary keys 001, 101, 201, 301, ..., server 2 will generate primary keys 002, 102, 202, 302, ... and so on.

That works great until you need more than 100 instances, at which point you're really, REALLY boned. In really busy systems it also limits the total amount of primary key space - but with BIGINT primary keys, that's unlikely to be something you need to worry about.

The composite primary key (serverid,sequenceid) approach avoids the need for a pre-defined maximum number of servers, but can be slow to index and can require more storage, especially because of tuple headers.

I have no firsthand experience with any of these approaches so I can't offer you a considered opinion. I know that the MS-SQL crowd at least strongly prefer UUIDs, but they have very strong in-database UUID support. MySQL folks seem to mostly favour the modulo primary key generation approach. I don't see much discussion of the issue here - I get the impression Pg doesn't see heavy use in sharded environments.

-- Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: UUID performance as primary key

От
Craig James
Дата:
On 10/15/10 6:58 PM, Navkirat Singh wrote:
> I am interested in finding out the pros/cons of using UUID as a
>  primary key field. My requirement states that UUID would be perfect
>  in my case as I will be having many small databases which will link
> up to a global database using the UUID. Hence, the need for a unique
>  key across all databases.

You left out one piece of information: How many keys per second do you need?

We put a sequence in the global database that all secondary databases use to get their IDs.  It means an extra
connect/disconnect(a pooler can minimize this), so if you're issuing thousands of IDs per second, this isn't a good
idea. But for a small-ish number of IDs per second, it gets you the benefit of a universal ID without the size of the
UUIDfield. 

Craig (the other one)

Re: UUID performance as primary key

От
Merlin Moncure
Дата:
On Fri, Oct 15, 2010 at 10:59 PM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
> On 16/10/2010 9:58 AM, Navkirat Singh wrote:
>>
>> Hi Guys,
>>
>> I am interested in finding out the pros/cons of using UUID as a primary
>> key field. My requirement states that UUID would be perfect in my case as I
>> will be having many small databases which will link up to a global database
>> using the UUID. Hence, the need for a unique key across all databases. It
>> would be extremely helpful if someone could help me figure this out, as it
>> is critical for my project.
>
> Pro: No need for (serverid,serverseq) pair primary keys or hacks with
> modulus based key generation. Doesn't set any pre-determined limit on how
> many servers/databases may be in a cluster.
>
> Con: Slower than modulo key generation approach, uses more storage. Foreign
> key relationships may be slower too.
>
> Overall, UUIDs seem to be a favoured approach. The other way people seem to
> do this is by assigning a unique instance id to each server/database out of
> a maximum "n" instances decided at setup time. Every key generation sequence
> increments by "n" whenever it generates a key, with an offset of the
> server/database id. That way, if n=100, server 1 will generate primary keys
> 001, 101, 201, 301, ..., server 2 will generate primary keys 002, 102, 202,
> 302, ... and so on.
>
> That works great until you need more than 100 instances, at which point
> you're really, REALLY boned. In really busy systems it also limits the total
> amount of primary key space - but with BIGINT primary keys, that's unlikely
> to be something you need to worry about.
>
> The composite primary key (serverid,sequenceid) approach avoids the need for
> a pre-defined maximum number of servers, but can be slow to index and can
> require more storage, especially because of tuple headers.
>
> I have no firsthand experience with any of these approaches so I can't offer
> you a considered opinion. I know that the MS-SQL crowd at least strongly
> prefer UUIDs, but they have very strong in-database UUID support. MySQL
> folks seem to mostly favour the modulo primary key generation approach. I
> don't see much discussion of the issue here - I get the impression Pg
> doesn't see heavy use in sharded environments.

I think your analysis is right on the money except for one thing: the
composite approach doesn't need server_id as part of the key and could
be left off the index.  In fact, it can be left off the table
completely since the value is static for the entire database.  You
obviously can't check RI between databases so storing the value
everywhere is of no value.  server_id only matters when comparing data
from one database to another, which will rarely happen inside a
particular client database (and if it does, you'd have to store the
foreign server_id).

Any 'master' database that did control operations would of course have
to store server_id for each row but I suspect that's not where the
bulk of the data would be.  Ditto any application code...it would have
to do something like this:

select server_id(), foo_id from foo where ..

server_id() is of course immutable function.  Since you are not
managing 2 billion+ servers, this will be an 'int', or even a
smallint.  I think this approach is stronger than UUID approach in
every way.  Even stronger would be to not use surrogate keys at all,
but involve what ever makes the decision that routes data between
databases as part of a more natural key (no way to know for sure if
this works for OP w/available info).

I personally dislike sequence hacks.

merlin