Обсуждение: UUID column as pimrary key?

Поиск
Список
Период
Сортировка

UUID column as pimrary key?

От
Dennis Gearon
Дата:
I haven't been able to find anywhere, easily, in the documentation using google
where a list of allowed data types for primary keys is.

So, UUIDs can be primary keys?
Any issues wtih them on sorting or paging of index tables, etc.?

Also, the documentation says that UUIDs are 128 bit value, but never explicitly
says that's how it's stored. Nor does it use one of the nice, blue headered
tables for UUID (or ENUM) showing storage and other attributes as it does for
numeric, character,boolean, date/time, binary, monetary, geometric, or network
types.



 Dennis Gearon


Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better
idea to learn from others’ mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.


Re: UUID column as pimrary key?

От
"Joshua D. Drake"
Дата:
On Tue, 2011-01-04 at 11:07 -0800, Dennis Gearon wrote:
> I haven't been able to find anywhere, easily, in the documentation using google
> where a list of allowed data types for primary keys is.

Anything that can be UNIQUE NOT NULL

>
> So, UUIDs can be primary keys?

Yes.

JD


--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


Re: UUID column as pimrary key?

От
David Wall
Дата:
We're using UUID for primary keys in PG 8.4 without any issues.  I have
no real insights into the details or performance issues, but always
figured it was stored as a binary 128-bit value, but with added benefits
of being able to enter and view them using a standard string format.  We
don't sort them as they have no real meaning for us.

On 1/4/2011 11:07 AM, Dennis Gearon wrote:
> I haven't been able to find anywhere, easily, in the documentation using google
> where a list of allowed data types for primary keys is.
>
> So, UUIDs can be primary keys?
> Any issues wtih them on sorting or paging of index tables, etc.?
>
> Also, the documentation says that UUIDs are 128 bit value, but never explicitly
> says that's how it's stored. Nor does it use one of the nice, blue headered
> tables for UUID (or ENUM) showing storage and other attributes as it does for
> numeric, character,boolean, date/time, binary, monetary, geometric, or network
> types.

Re: UUID column as pimrary key?

От
Chris Browne
Дата:
d.wall@computer.org (David Wall) writes:
> We're using UUID for primary keys in PG 8.4 without any issues.  I
> have no real insights into the details or performance issues, but
> always figured it was stored as a binary 128-bit value, but with added
> benefits of being able to enter and view them using a standard string
> format.  We don't sort them as they have no real meaning for us.

In principle, this might be a reason to want to do the long-outstanding
work on hash indexes; with UUIDs, it mayn't be useful to sort the
values, but you *do* want to be able to validate that they're unique.
--
output = ("cbbrowne" "@" "gmail.com")
http://www3.sympatico.ca/cbbrowne/
"Computers are  like air conditioners:  They stop working  properly if
you open windows."

Re: UUID column as pimrary key?

От
Radosław Smogura
Дата:
 On Tue, 4 Jan 2011 11:07:00 -0800 (PST), Dennis Gearon
 <gearond@sbcglobal.net> wrote:
> I haven't been able to find anywhere, easily, in the documentation
> using google
> where a list of allowed data types for primary keys is.
>
> So, UUIDs can be primary keys?
> Any issues wtih them on sorting or paging of index tables, etc.?

 Disadvantage
 * aren't ordered in way as records are added to DB.
 * 128bit length (PSQL stores them as 128bit value)
 * slower to generate you need to use random number generator
 * ... if you do select * on table with two uuids you will need to
 scroll GUI to see data :)
 * ... really unhandy if you want to make manual updates :)

 Advantage:
 * simple to generate, and 128bit random is almost globally unique,
 * you have your id, before executing query, (in contrast to all this
 autoincrement) so you may put it in dependant rows
 * almost every platform has UUID generator

 Advantage / disadvantage
 * depending on UUID generator, UUID can store some "privacy"
 information e.g. MAC address of your card, such UUID.

 Personally I prefer pooled incremental id's. Fast, unique, you have Id
 before query - but you need to write "code" by self.

> Also, the documentation says that UUIDs are 128 bit value, but never
> explicitly
> says that's how it's stored. Nor does it use one of the nice, blue
> headered
> tables for UUID (or ENUM) showing storage and other attributes as it
> does for
> numeric, character,boolean, date/time, binary, monetary, geometric,
> or network
> types.
>
>
>
>  Dennis Gearon
>
>
> Signature Warning
> ----------------
> It is always a good idea to learn from your own mistakes. It is
> usually a better
> idea to learn from others’ mistakes, so you do not have to make them
> yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
>
> EARTH has a Right To Life,
> otherwise we all die.


Re: UUID column as pimrary key?

От
Craig Ringer
Дата:
On 01/05/2011 07:31 PM, Radosław Smogura wrote:

> * you have your id, before executing query, (in contrast to all this
> autoincrement) so you may put it in dependant rows

Do you mean that with a UUID, you don't need to talk to the database at
all, you can generate an ID with no interaction with / involvement with
the database at all? Because other than that, there's not much
difference in how you normally work with them.


With a sequence, you might:

CREATE SEQUENCE x_id_seq;
CREATE TABLE x (
     id integer PRIMIARY KEY DEFAULT nextval('x_id_seq'),
     y integer
);
INSERT INTO x(y) VALUES (1);


With a uuid, you'd:

CREATE TABLE x (
     id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
     y integer
);
INSERT INTO x(y) VALUES (1);


In either case, you can explicitly call the generator function for
seq/uuid - nextval(seqname) or uuid_generate_v4() respectively - or you
can omit the PK column in your inserts and let the database generate it.

> Personally I prefer pooled incremental id's. Fast, unique, you have Id
> before query - but you need to write "code" by self.

Many libraries / ORMs / etc that interact with Pg will happily take care
of this for you. In fact, I had to fight to convince Hibernate that I
*didn't* want it to increment all my counters in steps of 50.

--
Craig Ringer

Re: UUID column as pimrary key?

От
Radosław Smogura
Дата:
 On Wed, 05 Jan 2011 21:50:11 +1100, Craig Ringer
 <craig@postnewspapers.com.au> wrote:
> On 01/05/2011 07:31 PM, Radosław Smogura wrote:
>
>> * you have your id, before executing query, (in contrast to all this
>> autoincrement) so you may put it in dependant rows
>
> Do you mean that with a UUID, you don't need to talk to the database
> at all, you can generate an ID with no interaction with / involvement
> with the database at all? Because other than that, there's not much
> difference in how you normally work with them.
>
>
> With a sequence, you might:
>
> CREATE SEQUENCE x_id_seq;
> CREATE TABLE x (
>     id integer PRIMIARY KEY DEFAULT nextval('x_id_seq'),
>     y integer
> );
> INSERT INTO x(y) VALUES (1);
>
 I mean situation, when You create e.g. in one transaction, book and
 chapters, in some way You need retrieve book's id, by returning clause
 of insert, or by obtaining id form sequence.
 It's simpler to write:
 book_id = new uuid();
 insert into book values(book_id,....);
 insert into chapters values(new uuid(), book_id, ...);
 isn't it?

Re: UUID column as pimrary key?

От
Szymon Guz
Дата:


On 5 January 2011 15:28, Radosław Smogura <rsmogura@softperience.eu> wrote:
On Wed, 05 Jan 2011 21:50:11 +1100, Craig Ringer <craig@postnewspapers.com.au> wrote:
On 01/05/2011 07:31 PM, Radosław Smogura wrote:

* you have your id, before executing query, (in contrast to all this
autoincrement) so you may put it in dependant rows

Do you mean that with a UUID, you don't need to talk to the database
at all, you can generate an ID with no interaction with / involvement
with the database at all? Because other than that, there's not much
difference in how you normally work with them.


With a sequence, you might:

CREATE SEQUENCE x_id_seq;
CREATE TABLE x (
   id integer PRIMIARY KEY DEFAULT nextval('x_id_seq'),
   y integer
);
INSERT INTO x(y) VALUES (1);

I mean situation, when You create e.g. in one transaction, book and chapters, in some way You need retrieve book's id, by returning clause of insert, or by obtaining id form sequence.
It's simpler to write:
book_id = new uuid();
insert into book values(book_id,....);
insert into chapters values(new uuid(), book_id, ...);
isn't it?



For me it is simpler just to write this:

bookid = insert into books(...) values(...) returning book_id;
insert into chapters(book_id, ...) values( bookid, ...);

but it's a matter of taste, I think.

regards
Szymon

Re: UUID column as pimrary key?

От
Scott Ribe
Дата:
On Jan 5, 2011, at 1:31 AM, Radosław Smogura wrote:

> * simple to generate, and 128bit random is almost globally unique,

Almost? Should be totally unique, as long as your random source is decent quality.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





Re: UUID column as pimrary key?

От
Scott Ribe
Дата:
On Jan 5, 2011, at 7:28 AM, Radosław Smogura wrote:

> It's simpler to write:
...
> isn't it?

Depends on the situation, the libraries you're using, and so on.

Now, if you're generating records in a distributed system, where your node might be disconnected when it's creating a
record,it is *much* simpler in that case ;-) 

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





Re: UUID column as pimrary key?

От
Grzegorz Jaśkiewicz
Дата:
On Wed, Jan 5, 2011 at 2:37 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
> On Jan 5, 2011, at 1:31 AM, Radosław Smogura wrote:
>
>> * simple to generate, and 128bit random is almost globally unique,
>
> Almost? Should be totally unique, as long as your random source is decent quality.

But I would never rely on that alone. You always have a strategy in
place, in case there's a duplicate.


--
GJ

Re: UUID column as pimrary key?

От
Mike Christensen
Дата:
2011/1/5 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
> On Wed, Jan 5, 2011 at 2:37 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
>> On Jan 5, 2011, at 1:31 AM, Radosław Smogura wrote:
>>
>>> * simple to generate, and 128bit random is almost globally unique,
>>
>> Almost? Should be totally unique, as long as your random source is decent quality.
>
> But I would never rely on that alone. You always have a strategy in
> place, in case there's a duplicate.

As long as all your UUIDs are generated with the same algorithm, they
are guaranteed to be unique.

Re: UUID column as pimrary key?

От
Scott Ribe
Дата:
On Jan 5, 2011, at 7:55 AM, Grzegorz Jaśkiewicz wrote:

> But I would never rely on that alone. You always have a strategy in
> place, in case there's a duplicate.

That's really unnecessary, basically a total waste of effort.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





Re: UUID column as pimrary key?

От
Bill Moran
Дата:
In response to Scott Ribe <scott_ribe@elevated-dev.com>:

> On Jan 5, 2011, at 1:31 AM, Radosław Smogura wrote:
>
> > * simple to generate, and 128bit random is almost globally unique,
>
> Almost? Should be totally unique, as long as your random source is decent quality.

This is going off-topic, but I did some research on this because we
were considering using UUIDs for various keys ...

Fact is, UUIDs are not _guaranteed_ to be unique.  If you use the generating
system that includes a MAC address, then in theory, they are guaranteed
to be unique, but in practice, MAC addresses aren't guaranteed to be
unique either, so that's not 100% either.

Beyond that, the namespace size for a UUID is so incomprehensibly huge
that the chance of two randomly generated UUIDs having the same value
is incomprehensibly unlikely ... it is, however, not a 100% guarantee.

Anyway, in our case, we determined that the chance of UUID collision
for the dataset in question was extremely unlikely, however, the
consequences of such a collision were pretty bad.  We also determined
that we were able to control a "unit ID" for each independent system
that would generate IDs, which could (a) be part of a unique seed for
UUIDs, or (b) be a prefix to a autonumber ID that would be a lot easier
to read and work with manually.  In the end, we chose b for the human
factor.

Face it, reading, remembering, and typing UUIDs kinda sucks.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: UUID column as pimrary key?

От
Scott Ribe
Дата:
On Jan 5, 2011, at 8:03 AM, Mike Christensen wrote:

> As long as all your UUIDs are generated with the same algorithm, they
> are guaranteed to be unique.

There is no requirement that they be generated with the same algorithm in order to be unique. A MAC/time-based UUID
cannotduplicate a random one, and vice versa. (Also applies to the 3rd flavor of UUID whose details I do not remember.) 

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





Re: UUID column as pimrary key?

От
Scott Ribe
Дата:
On Jan 5, 2011, at 8:05 AM, Bill Moran wrote:

> Beyond that, the namespace size for a UUID is so incomprehensibly huge
> that the chance of two randomly generated UUIDs having the same value
> is incomprehensibly unlikely

Yes, as in: it is *far* more likely that all of your team members and all of your client contacts will be
simultaneouslystruck by lightning and killed in their sleep, and it is *far* more likely that all life on earth will be
wipedout by an asteroid impact, and it is *far* more likely that the solar system orbits are not actually stable and
earthwill fly off into space... If you're worried about UUID collisions, then either your priorities are completely
wrong,or you live in a bomb shelter--that's not sarcasm by the way, it's simply true, the chance of a collision is so
vanishinglysmall that it is dwarfed by all sorts of risks that we all ignore because the chances are so low, including
thechance that all drives in all your RAIDs across all your replicas will simultaneously fail on the same day that
firesstart in all the locations where your tapes are stored and all the sprinkler systems fail... (By "far" more
likely,I mean many many many orders of magnitude...) 

> In the end, we chose b for the human
> factor.

A very good decision, in the case where you're actually able to control each independent system.

> Face it, reading, remembering, and typing UUIDs kinda sucks.

Lots of copy & paste, or custom GUI tools for devs & DBAs, or abuse like '...%', all of them painful in their own way.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





Re: UUID column as pimrary key?

От
Bill Moran
Дата:
In response to Scott Ribe <scott_ribe@elevated-dev.com>:

> On Jan 5, 2011, at 8:05 AM, Bill Moran wrote:
>
> > Beyond that, the namespace size for a UUID is so incomprehensibly huge
> > that the chance of two randomly generated UUIDs having the same value
> > is incomprehensibly unlikely
>
> Yes, as in: it is *far* more likely that all of your team members and all of your client contacts will be
simultaneouslystruck by lightning and killed in their sleep, and it is *far* more likely that all life on earth will be
wipedout by an asteroid impact, and it is *far* more likely that the solar system orbits are not actually stable and
earthwill fly off into space... If you're worried about UUID collisions, then either your priorities are completely
wrong,or you live in a bomb shelter--that's not sarcasm by the way, it's simply true, the chance of a collision is so
vanishinglysmall that it is dwarfed by all sorts of risks that we all ignore because the chances are so low, including
thechance that all drives in all your RAIDs across all your replicas will simultaneously fail on the same day that
firesstart in all the locations where your tapes are stored and all the sprinkler systems fail... (By "far" more
likely,I mean many many many orders of magnitude...) 

That statement demonstrates a lack of investigation and/or consideration
of the circumstances.

I can't find my math or I'd reproduce it here, but consider this:

If you have 50 devices, each generating 100 UUIDs per hour, and you'll
keep records for 1 year, then your argument above is probably accurate.

However, if there are 5000 devices generating 100 UUIDs per hour, and you'll
be keeping those records for 10+ years, the chances of collisions near
the end of that 10 year span get high enough to actually make developers
nervous.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: UUID column as pimrary key?

От
Tom Lane
Дата:
Scott Ribe <scott_ribe@elevated-dev.com> writes:
> On Jan 5, 2011, at 8:05 AM, Bill Moran wrote:
>> Beyond that, the namespace size for a UUID is so incomprehensibly huge
>> that the chance of two randomly generated UUIDs having the same value
>> is incomprehensibly unlikely

> Yes, as in: it is *far* more likely that all of your team members and all of your client contacts will be
simultaneouslystruck by lightning and killed in their sleep, and it is *far* more likely that all life on earth will be
wipedout by an asteroid impact, 

I see those sorts of arguments all the time, and I consider them pure BS.
Yes, the namespace is theoretically large.  However, the questions you
really have to answer are (a) how much of the namespace is actually
being used by the UUID generation methods in use in a particular
application; (b) how sure can you be that there is not correlation
between UUIDs generated in different places/sessions.

In practical use I think the odds of a collision are *far* higher than
you are suggesting, unless the UUID generation is being done with a lot
more care than is likely if the user takes these sorts of claims at face
value.  The odds may still be low enough to be a very good risk, but
you need to think about it not just bet your database on it without
thinking.

Being paranoid is a good thing.  It's what DBAs are paid for.

            regards, tom lane

Re: UUID column as pimrary key?

От
Rob Sargent
Дата:

On 01/05/2011 08:55 AM, Bill Moran wrote:
> In response to Scott Ribe <scott_ribe@elevated-dev.com>:
>
>> On Jan 5, 2011, at 8:05 AM, Bill Moran wrote:
>>
>>> Beyond that, the namespace size for a UUID is so incomprehensibly huge
>>> that the chance of two randomly generated UUIDs having the same value
>>> is incomprehensibly unlikely
>>
>> Yes, as in: it is *far* more likely that all of your team members and all of your client contacts will be
simultaneouslystruck by lightning and killed in their sleep, and it is *far* more likely that all life on earth will be
wipedout by an asteroid impact, and it is *far* more likely that the solar system orbits are not actually stable and
earthwill fly off into space... If you're worried about UUID collisions, then either your priorities are completely
wrong,or you live in a bomb shelter--that's not sarcasm by the way, it's simply true, the chance of a collision is so
vanishinglysmall that it is dwarfed by all sorts of risks that we all ignore because the chances are so low, including
thechance that all drives in all your RAIDs across all your replicas will simultaneously fail on the same day that
firesstart in all the locations where your tapes are stored and all the sprinkler systems fail... (By "far" more
likely,I mean many many many orders of magnitude...) 
>
> That statement demonstrates a lack of investigation and/or consideration
> of the circumstances.
>
> I can't find my math or I'd reproduce it here, but consider this:
>
> If you have 50 devices, each generating 100 UUIDs per hour, and you'll
> keep records for 1 year, then your argument above is probably accurate.
>
> However, if there are 5000 devices generating 100 UUIDs per hour, and you'll
> be keeping those records for 10+ years, the chances of collisions near
> the end of that 10 year span get high enough to actually make developers
> nervous.
>

But we're talking about a primary key.  Postgres guarantees the
uniqueness.  1 transaction  in 10^^100 rolls back due to a second
instance of an (otherwise/almost) uuid.  Big deal.

Re: UUID column as pimrary key?

От
Scott Ribe
Дата:
On Jan 5, 2011, at 8:55 AM, Bill Moran wrote:

> That statement demonstrates a lack of investigation and/or consideration
> of the circumstances.

No, it doesn't.

> However, if there are 5000 devices generating 100 UUIDs per hour, and you'll
> be keeping those records for 10+ years, the chances of collisions near
> the end of that 10 year span get high enough to actually make developers
> nervous.

No, they don't. At the end of your hypothetical 10-year period, you will have used about 43,000,000,000 UUIDs, or about
1/100,000,000,000,000,000,000,000,000thof the UUID space (assuming random UUIDs). Leaving you with a chance of a single
collisionof about 1/18,000,000,000,000,000. 

Assuming of course good entropy. If the generation of random numbers is bad, then UUIDs are not so useful ;-)

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





Re: UUID column as pimrary key?

От
Leif Biberg Kristensen
Дата:
On Wednesday 5. January 2011 16.05.29 Bill Moran wrote:

> Beyond that, the namespace size for a UUID is so incomprehensibly huge
> that the chance of two randomly generated UUIDs having the same value
> is incomprehensibly unlikely ... it is, however, not a 100% guarantee.

I can't help thinking of the «Birthday Paradox»:

http://en.wikipedia.org/wiki/Birthday_problem

regards, Leif

Re: UUID column as pimrary key?

От
Scott Ribe
Дата:
On Jan 5, 2011, at 9:01 AM, Tom Lane wrote:

> In practical use I think the odds of a collision are *far* higher than
> you are suggesting, unless the UUID generation is being done with a lot
> more care than is likely if the user takes these sorts of claims at face
> value.

Eh? The user taking such claims at face value has no bearing whatsoever on the quality of the UUID generation algorithm
providedby the OS. So, unless we're talking about users coming up with their own algorithms, it seems reasonable to
assumethat the generation is done with a great deal of care. (And if we are talking about users coming up with their
ownalgorithms, then all bets are off; feel free to assume the worst.) I know that is the case on OS X & Linux. I would
beshocked if it were not the case on Solaris. I would even be surprised if it were not the case on Windows. 

The IETF Network Working Group designed UUIDs to ensure that their uniqueness guarantee would be strong enough that no
applicationwould need to worry about duplicates, ever. Claims that collisions are too likely to depend on UUIDs being
uniquereally are claims that the IETF Network Working Group didn't know what it was doing, which I find a bit
ridiculous.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





Re: UUID column as pimrary key?

От
Scott Ribe
Дата:
On Jan 5, 2011, at 9:19 AM, Leif Biberg Kristensen wrote:

> I can't help thinking of the «Birthday Paradox»:

Yes, the calculation of the probability of a collision is the same for the "birthday paradox" as for random UUID
collisions.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





Re: UUID column as pimrary key?

От
Grzegorz Jaśkiewicz
Дата:
On Wed, Jan 5, 2011 at 3:03 PM, Mike Christensen <mike@kitchenpc.com> wrote:
> 2011/1/5 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
>> On Wed, Jan 5, 2011 at 2:37 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
>>> On Jan 5, 2011, at 1:31 AM, Radosław Smogura wrote:
>>>
>>>> * simple to generate, and 128bit random is almost globally unique,
>>>
>>> Almost? Should be totally unique, as long as your random source is decent quality.
>>
>> But I would never rely on that alone. You always have a strategy in
>> place, in case there's a duplicate.
>
> As long as all your UUIDs are generated with the same algorithm, they
> are guaranteed to be unique.
>

Good luck with that ....



--
GJ

Re: UUID column as pimrary key?

От
Adrian Klaver
Дата:
On 01/05/2011 08:29 AM, Scott Ribe wrote:
> On Jan 5, 2011, at 9:01 AM, Tom Lane wrote:
>
>> In practical use I think the odds of a collision are *far* higher than
>> you are suggesting, unless the UUID generation is being done with a lot
>> more care than is likely if the user takes these sorts of claims at face
>> value.
>
> Eh? The user taking such claims at face value has no bearing whatsoever on the quality of the UUID generation
algorithmprovided by the OS. So, unless we're talking about users coming up with their own algorithms, it seems
reasonableto assume that the generation is done with a great deal of care. (And if we are talking about users coming up
withtheir own algorithms, then all bets are off; feel free to assume the worst.) I know that is the case on OS X&
Linux.I would be shocked if it were not the case on Solaris. I would even be surprised if it were not the case on
Windows.
>
> The IETF Network Working Group designed UUIDs to ensure that their uniqueness guarantee would be strong enough that
noapplication would need to worry about duplicates, ever. Claims that collisions are too likely to depend on UUIDs
beingunique really are claims that the IETF Network Working Group didn't know what it was doing, which I find a bit
ridiculous.
>

Maybe or maybe not:)

http://www.ietf.org/rfc/rfc4122.txt

"4.3.  Algorithm for Creating a Name-Based UUID

    The version 3 or 5 UUID is meant for generating UUIDs from "names"
    that are drawn from, and unique within, some "name space".  The
    concept of name and name space should be broadly construed, and not
    limited to textual names.  For example, some name spaces are the
    domain name system, URLs, ISO Object IDs (OIDs), X.500 Distinguished
    Names (DNs), and reserved words in a programming language.  The
    mechanisms or conventions used for allocating names and ensuring
    their uniqueness within their name spaces are beyond the scope of
    this specification.

    The requirements for these types of UUIDs are as follows:

    o  The UUIDs generated at different times from the same name in the
       same namespace MUST be equal.

    ......
"

--
Adrian Klaver
adrian.klaver@gmail.com

Re: UUID column as pimrary key?

От
Radosław Smogura
Дата:
Sorry for not citation...

When I was talking about "almost unique", I was meaning that the UUID is
random so there is no guarantee that you will not generate two indencital
UUIDs even in subsequent calls, but it has looooow probability (you have
greater chances to win in LOTTO).

128bits is huge for now, but what will happen in next 2,3 years? In 20th
century, people think storing only last two digit of year will be enaugh!!!
Suppose we are creating gloabl, distributed database for storing information
about mobile device and base station it logged in.

If we want to guarantee uniquness of UUID across calls, we could talk about
much more far _pseudo_ random generator, then "normal" pseudo - randoms, and
what I think we need to keep state of such random generator, and share and
lock it for multiple concurrent calls. So it will not be something different
then ordinal serial column...

Now I think it's clear there is no "magical" algorithm for UUIDs.

My opinion about all of those UUID with MAC, IP addresses, Microsoft "growing"
UUIDs. All of this decrases chance of uniqness of UUID. If we will keep n
first bits of UUID constant, then we have only 128-n bits random (truly in
UUIDs we have one decimal field reserved for UUID version). If we want next
UUID to be greater then previous, at each call we will remove (next-prev) of
possible values.

Shouldn't this be enaugh for namespace UUIDs
new UUID("namespece".hashCode(), "name".hashChode())

or a little joke...
new UUID(1,1) meats this condition
>     o  The UUIDs generated at different times from the same name in the
>
>        same namespace MUST be equal.




Re: *****SPAM***** Re: UUID column as pimrary key?

От
Scott Ribe
Дата:
On Jan 5, 2011, at 10:30 AM, Radosław Smogura wrote:

> 128bits is huge for now, but what will happen in next 2,3 years?

It will still be large. When you get up to around 100 trillion UUIDs, you'll be getting up to around a 1 in a billion
chanceof a single collision. Before you claim that we'll outgrow them in a few years, you might want to consider the
actualnumbers, the growth rate of storage density, the number of disks required to hold that many records. I have a
feelingthe answers might surprise you ;-) 

> If we want to guarantee uniquness of UUID across calls, we could talk about
> much more far _pseudo_ random generator, then "normal" pseudo - randoms, and
> what I think we need to keep state of such random generator, and share and
> lock it for multiple concurrent calls. So it will not be something different
> then ordinal serial column...

No, we don't really have to think about it much at all. The IETF and OS engineers already did. Random UUIDs are not
generatedwith some sloppy amateurish algorithm. 

> My opinion about all of those UUID with MAC, IP addresses, Microsoft "growing"
> UUIDs. All of this decrases chance of uniqness of UUID.

Well, a decrease from one insanely small chance to another insanely small chance is not anything to worry about. After
all,you could argue that 128 bits is a "decrease" from 256 bits. It's the same argument. UUIDs were designed to avoid
collisions,by people who knew what they were doing. More significant bits would lower the chance, from "already low
enough"to "even more low enough". 

> Shouldn't this be enaugh for namespace UUIDs
> new UUID("namespece".hashCode(), "name".hashChode())
>
> or a little joke...
> new UUID(1,1) meats this condition
>>    o  The UUIDs generated at different times from the same name in the
>>
>>       same namespace MUST be equal.
>

People, people, people, please. *Namespace* UUIDs are intended to map *unique* names to UUIDs in the case where you
alreadyhave *unique* names and just need to map them to a more compact form, thus the requirement that the same
namespace+ name always yields the same UUID. This is *not* a weakness in UUIDs, nor is it the kind of UUID you get from
asimple uuid_gen or similar call, nor is it a possible source of collisions for database UUIDs (unless you do something
enormouslystupid, like use database fields to construct a name to give to a UUID generator). 

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





Re: UUID column as pimrary key?

От
Radosław Smogura
Дата:
Scott Ribe <scott_ribe@elevated-dev.com> Wednesday 05 January 2011 17:33:51
> On Jan 5, 2011, at 9:19 AM, Leif Biberg Kristensen wrote:
> > I can't help thinking of the «Birthday Paradox»:
> Yes, the calculation of the probability of a collision is the same for the
> "birthday paradox" as for random UUID collisions.

Depends on probability measure we will choose. The true is that probability
that in two coin drops we will get two reverses is 1/4, but true is, too, as
Newton said, it's 1/3, because if in 1st drop we don't get reverse we don't
need to drop again.

Re: UUID column as pimrary key?

От
Bill Moran
Дата:
In response to Rob Sargent <robjsargent@gmail.com>:

>
>
> On 01/05/2011 08:55 AM, Bill Moran wrote:
> > In response to Scott Ribe <scott_ribe@elevated-dev.com>:
> >
> >> On Jan 5, 2011, at 8:05 AM, Bill Moran wrote:
> >>
> >>> Beyond that, the namespace size for a UUID is so incomprehensibly huge
> >>> that the chance of two randomly generated UUIDs having the same value
> >>> is incomprehensibly unlikely
> >>
> >> Yes, as in: it is *far* more likely that all of your team members and all of your client contacts will be
simultaneouslystruck by lightning and killed in their sleep, and it is *far* more likely that all life on earth will be
wipedout by an asteroid impact, and it is *far* more likely that the solar system orbits are not actually stable and
earthwill fly off into space... If you're worried about UUID collisions, then either your priorities are completely
wrong,or you live in a bomb shelter--that's not sarcasm by the way, it's simply true, the chance of a collision is so
vanishinglysmall that it is dwarfed by all sorts of risks that we all ignore because the chances are so low, including
thechance that all drives in all your RAIDs across all your replicas will simultaneously fail on the same day that
firesstart in all the locations where your tapes are stored and all the sprinkler systems fail... (By "far" more
likely,I mean many many many orders of magnitude...) 
> >
> > That statement demonstrates a lack of investigation and/or consideration
> > of the circumstances.
> >
> > I can't find my math or I'd reproduce it here, but consider this:
> >
> > If you have 50 devices, each generating 100 UUIDs per hour, and you'll
> > keep records for 1 year, then your argument above is probably accurate.
> >
> > However, if there are 5000 devices generating 100 UUIDs per hour, and you'll
> > be keeping those records for 10+ years, the chances of collisions near
> > the end of that 10 year span get high enough to actually make developers
> > nervous.
> >
>
> But we're talking about a primary key.  Postgres guarantees the
> uniqueness.  1 transaction  in 10^^100 rolls back due to a second
> instance of an (otherwise/almost) uuid.  Big deal.

That doesn't make any sense.  If you're using a single PostgreSQL instance,
then why not just use the built in SERIAL mechanism that guarantees that
you will NEVER have a conflict?

In our case (and I expect it's the case with most people considering UUIDs)
we're talking about independent devices that occasionally synchronize
data between themselves.  These devices need to generate a unique ID
of some sort without having to check with every other device.  This is
one of the problems that UUIDs were intended to fix.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: UUID column as pimrary key?

От
Bill Moran
Дата:
In response to Scott Ribe <scott_ribe@elevated-dev.com>:

> On Jan 5, 2011, at 8:55 AM, Bill Moran wrote:
>
> > That statement demonstrates a lack of investigation and/or consideration
> > of the circumstances.
>
> No, it doesn't.
>
> > However, if there are 5000 devices generating 100 UUIDs per hour, and you'll
> > be keeping those records for 10+ years, the chances of collisions near
> > the end of that 10 year span get high enough to actually make developers
> > nervous.
>
> No, they don't. At the end of your hypothetical 10-year period, you will have used about 43,000,000,000 UUIDs, or
about1/100,000,000,000,000,000,000,000,000th of the UUID space (assuming random UUIDs). Leaving you with a chance of a
singlecollision of about 1/18,000,000,000,000,000. 

For crying out loud.  If you're going to pick me apart with numbers, then
actually do it with some intelligence.

I could easily counter your argument by upping the numbers to 500,000
mobile devices generating 10000 UUIDs per hour over 20 years ... or raise
it even higher if you come back with that same argument ...

But the point (that you are trying to sidestep) is that the UUID namespace
is finite, so therefore you WILL hit a problem with conflicts at some point.
Just because that point is larger than most people have to concern themselves
with isn't an invalidation.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: *****SPAM***** Re: UUID column as pimrary key?

От
Scott Ribe
Дата:
On Jan 5, 2011, at 11:31 AM, Radosław Smogura wrote:

> The true is that probability
> that in two coin drops we will get two reverses is 1/4, but true is, too, as
> Newton said, it's 1/3, because if in 1st drop we don't get reverse we don't
> need to drop again.

Nonsense. You don't stop generating UUIDs just because you haven't yet got a collision.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





Re: UUID column as pimrary key?

От
Dennis Gearon
Дата:
Yes, the ID is generated before the database is contacted, in my case anyway.

The type of UUID that I want is a hybrid, so I would have to write a stored
procedure and then a trigger upon insert to get the UUID. Not rocket science, I
just have more on my plate than I can handle. So PHP is my 'main thing', so I
quickly wrote it in that. Plus, it makes it more database agnostic.

I don't get next/currval behavior, but I don't think I'll need it.

BTW, Switching from Postgres? Not likely anytime soon. I'd have to be up in the
500M+ rows and be in the  data warehousing/map reducing  arena before I'd
consider THAT. And there's 'flavors' of Postgres that will do that, anyway.



Dennis Gearon


Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better

idea to learn from others’ mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



----- Original Message ----
From: Craig Ringer <craig@postnewspapers.com.au>
To: Radosław Smogura <rsmogura@softperience.eu>
Cc: Dennis Gearon <gearond@sbcglobal.net>; pgsql-general@postgresql.org
Sent: Wed, January 5, 2011 2:50:11 AM
Subject: Re: [GENERAL] UUID column as pimrary key?

On 01/05/2011 07:31 PM, Radosław Smogura wrote:

> * you have your id, before executing query, (in contrast to all this
> autoincrement) so you may put it in dependant rows

Do you mean that with a UUID, you don't need to talk to the database at all, you

can generate an ID with no interaction with / involvement with the database at
all? Because other than that, there's not much difference in how you normally
work with them.


With a sequence, you might:

CREATE SEQUENCE x_id_seq;
CREATE TABLE x (
    id integer PRIMIARY KEY DEFAULT nextval('x_id_seq'),
    y integer
);
INSERT INTO x(y) VALUES (1);


With a uuid, you'd:

CREATE TABLE x (
    id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    y integer
);
INSERT INTO x(y) VALUES (1);


In either case, you can explicitly call the generator function for seq/uuid -
nextval(seqname) or uuid_generate_v4() respectively - or you can omit the PK
column in your inserts and let the database generate it.

> Personally I prefer pooled incremental id's. Fast, unique, you have Id
> before query - but you need to write "code" by self.

Many libraries / ORMs / etc that interact with Pg will happily take care of this

for you. In fact, I had to fight to convince Hibernate that I *didn't* want it
to increment all my counters in steps of 50.

--
Craig Ringer

Re: UUID column as pimrary key?

От
Scott Ribe
Дата:
On Jan 5, 2011, at 12:03 PM, Bill Moran wrote:

> For crying out loud.  If you're going to pick me apart with numbers, then
> actually do it with some intelligence.

If you're going to get nasty, at least try to be accurate.

> I could easily counter your argument by upping the numbers to 500,000
> mobile devices generating 10000 UUIDs per hour over 20 years ... or raise
> it even higher if you come back with that same argument ...

Yeah, then you get into the realm of 1 in 10s of millions of a chance of collision. But you would need to explain to me
howyou would get that many records into the database committed to disk, when the UUIDs alone without any other data
representa stream of 22MB/s ;-) Or, looked at another way, inserting 1,388,889 rows/second would indeed be difficult to
sustain. 

> But the point (that you are trying to sidestep) is that the UUID namespace
> is finite, so therefore you WILL hit a problem with conflicts at some point.
> Just because that point is larger than most people have to concern themselves
> with isn't an invalidation.

I'm not sidestepping the point at all. The point is that the finiteness of the space is a red herring. The space is
largeenough that there's no chance of collision in any realistic scenario. In order to get to a point where the
probabilityof collision is high enough to worry about, you have to generate (and collect) UUIDs at a rate that is
simplynot realistic--as in your second example quoted above. If you just keep raising your numbers, you could go for
100,000,000,000,000devices generating 100,000,000,000,000 UUIDs an hour for 10,000 years. Collisions would be
guaranteed,but that does not make it a useful scenario to consider. 

2^256 is a finite space as well. Would you argue that because it "is finite, so therefore you WILL hit a problem with
conflictsat some point"? How about 2^512? (Bearing in mind that even though finite that space would be large enough to
assignapproximately 10^74 UUIDs to every atom in the observable universe, or 10^51 UUIDs to every atom in the total
universeusing high-end estimates of the size of the non-observable universe)? 


--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





Re: UUID column as pimrary key?

От
Rob Sargent
Дата:

On 01/05/2011 11:57 AM, Bill Moran wrote:
> In response to Rob Sargent <robjsargent@gmail.com>:
>
>>
>>
>> On 01/05/2011 08:55 AM, Bill Moran wrote:
>>> In response to Scott Ribe <scott_ribe@elevated-dev.com>:
>>>
>>>> On Jan 5, 2011, at 8:05 AM, Bill Moran wrote:
>>>>
>>>>> Beyond that, the namespace size for a UUID is so incomprehensibly huge
>>>>> that the chance of two randomly generated UUIDs having the same value
>>>>> is incomprehensibly unlikely
>>>>
>>>> Yes, as in: it is *far* more likely that all of your team members and all of your client contacts will be
simultaneouslystruck by lightning and killed in their sleep, and it is *far* more likely that all life on earth will be
wipedout by an asteroid impact, and it is *far* more likely that the solar system orbits are not actually stable and
earthwill fly off into space... If you're worried about UUID collisions, then either your priorities are completely
wrong,or you live in a bomb shelter--that's not sarcasm by the way, it's simply true, the chance of a collision is so
vanishinglysmall that it is dwarfed by all sorts of risks that we all ignore because the chances are so low, including
thechance that all drives in all your RAIDs across all your replicas will simultaneously fail on the same day that
firesstart in all the locations where your tapes are stored and all the sprinkler systems fail... (By "far" more
likely,I mean many many many orders of magnitude...) 

>>>
>>> That statement demonstrates a lack of investigation and/or consideration
>>> of the circumstances.
>>>
>>> I can't find my math or I'd reproduce it here, but consider this:
>>>
>>> If you have 50 devices, each generating 100 UUIDs per hour, and you'll
>>> keep records for 1 year, then your argument above is probably accurate.
>>>
>>> However, if there are 5000 devices generating 100 UUIDs per hour, and you'll
>>> be keeping those records for 10+ years, the chances of collisions near
>>> the end of that 10 year span get high enough to actually make developers
>>> nervous.
>>>
>>
>> But we're talking about a primary key.  Postgres guarantees the
>> uniqueness.  1 transaction  in 10^^100 rolls back due to a second
>> instance of an (otherwise/almost) uuid.  Big deal.
>
> That doesn't make any sense.  If you're using a single PostgreSQL instance,
> then why not just use the built in SERIAL mechanism that guarantees that
> you will NEVER have a conflict?
>
> In our case (and I expect it's the case with most people considering UUIDs)
> we're talking about independent devices that occasionally synchronize
> data between themselves.  These devices need to generate a unique ID
> of some sort without having to check with every other device.  This is
> one of the problems that UUIDs were intended to fix.
>
Indeed there is a finite space.  A very large, but finite space, just as
sequence has I suspect.  If your software cannot handle a rollback for
whatever reason, you have much bigger problem on your hand than the the
remote chance of a collision in uuid generation.

From wikipedia, "only after generating 1 billion UUIDs every second for
the next 100 years, the probability of creating just one duplicate would
be about 50%. The probability of one duplicate would be about 50% if
every person on earth owns 600 million UUIDs."

Re: *****SPAM***** Re: UUID column as pimrary key?

От
Scott Ribe
Дата:
On Jan 5, 2011, at 1:13 PM, Raymond O'Donnell wrote:

> Is that taking dark matter into account? :-)

It's not clear to me ;-)

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





Re: UUID column as pimrary key?

От
Raymond O'Donnell
Дата:
On 05/01/2011 19:41, Scott Ribe wrote:
> to every atom in the observable universe, or 10^51 UUIDs to every
> atom in the total universe using high-end estimates of the size of
> the non-observable universe)?

Is that taking dark matter into account? :-)

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie

Re: UUID column as pimrary key?

От
Bill Moran
Дата:
In response to Rob Sargent <robjsargent@gmail.com>:
> >
> > In our case (and I expect it's the case with most people considering UUIDs)
> > we're talking about independent devices that occasionally synchronize
> > data between themselves.  These devices need to generate a unique ID
> > of some sort without having to check with every other device.  This is
> > one of the problems that UUIDs were intended to fix.
>
> Indeed there is a finite space.  A very large, but finite space, just as
> sequence has I suspect.  If your software cannot handle a rollback for
> whatever reason, you have much bigger problem on your hand than the the
> remote chance of a collision in uuid generation.

You missed the point.

Despite the fact that the chance of a collision is very, very small, there
is no easy way to fix it if it happens.  Zero.  It can't be done without
shutting the system down, recalling all the remote devices and manually
reconciling the problem ... which is not an option.

Also, it's hard to DETECT the collision.  If a device creates a new record
with a duplicate UUID, how do we tell that apart, during synchronization,
from an update to the record?  Now I have your SSN or private medical
data included as part of my record, which violates laws, could lead to
incorrect medical care that could kill someone ... etc.

So, despite the chance being very small, the consequences are HUGE.  I
can't just detect it and roll back.

If you have a single DB backend, these problem are easy to solve -- so
easy, in fact, that using UUIDs is overkill.

But when you have mission-critical data that must be correct and available
under every circumstance, you have to consider that UUIDs are not a
guarantee.  And when a method that DOES have a guarantee is one of the
options, why would you take ANY risk at all, no matter how small?

(BTW: I hope that the people who think that the risk is acceptable aren't
writing medical software.  Even if it only kills one person every 10,000
years because they were given the wrong medicine, that's too often in
my opinion)

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: UUID column as pimrary key?

От
Rob Sargent
Дата:

On 01/05/2011 01:43 PM, Bill Moran wrote:
> In response to Rob Sargent <robjsargent@gmail.com>:
>>>
>>> In our case (and I expect it's the case with most people considering UUIDs)
>>> we're talking about independent devices that occasionally synchronize
>>> data between themselves.  These devices need to generate a unique ID
>>> of some sort without having to check with every other device.  This is
>>> one of the problems that UUIDs were intended to fix.
>>
>> Indeed there is a finite space.  A very large, but finite space, just as
>> sequence has I suspect.  If your software cannot handle a rollback for
>> whatever reason, you have much bigger problem on your hand than the the
>> remote chance of a collision in uuid generation.
>
> You missed the point.
>
> Despite the fact that the chance of a collision is very, very small, there
> is no easy way to fix it if it happens.  Zero.  It can't be done without
> shutting the system down, recalling all the remote devices and manually
> reconciling the problem ... which is not an option.
>
> Also, it's hard to DETECT the collision.  If a device creates a new record
> with a duplicate UUID, how do we tell that apart, during synchronization,
> from an update to the record?  Now I have your SSN or private medical
> data included as part of my record, which violates laws, could lead to
> incorrect medical care that could kill someone ... etc.
>
> So, despite the chance being very small, the consequences are HUGE.  I
> can't just detect it and roll back.
>
> If you have a single DB backend, these problem are easy to solve -- so
> easy, in fact, that using UUIDs is overkill.
>
> But when you have mission-critical data that must be correct and available
> under every circumstance, you have to consider that UUIDs are not a
> guarantee.  And when a method that DOES have a guarantee is one of the
> options, why would you take ANY risk at all, no matter how small?
>
> (BTW: I hope that the people who think that the risk is acceptable aren't
> writing medical software.  Even if it only kills one person every 10,000
> years because they were given the wrong medicine, that's too often in
> my opinion)
>

Seems the software would need to know whether an update or an add was
begin performed: is this a new patient or one all ready in the db.  New
record patient record and new id (of some arbitrary nature) and any new
subsidiary data associated appropriately.


BTW, you're opinion of efficacy of current medical practice is also out
of whack.  From a JAMA artical: "Similar to previous studies, almost a
quarter (22.7%) of active-care patient deaths were rated as at least
possibly preventable by optimal care, with 6.0% rated as probably or
definitely preventable. " That's 3 in 50. How d'ya like them odds?

Re: UUID column as pimrary key?

От
Andrew Sullivan
Дата:
On Wed, Jan 05, 2011 at 12:41:43PM -0700, Scott Ribe wrote:
> I'm not sidestepping the point at all.

You may be missing it, however, because. . .

> The point is that the finiteness of the space is a red herring. The
> space is large enough that there's no chance of collision in any
> realistic scenario.
> In order to get to a point where the probability
> of collision is high enough to worry about, you have to generate
> (and collect) UUIDs at a rate that is simply not realistic--as in
> your second example quoted above.

. . .the example was not that UUIDs are being generated and collected
in one place at that rate, but that they're being generated in several
independent places at a time, and if the cost of the collision is
extremely high, there might be reasons not to use the UUID strategy
but instead to use something else that is generated algorithmically by
the database.  There's a trade-off in having distributed systems
acting completely independently, and while I have lots of confidence
in my colleagues at the IETF (and agree with you that for the
overwhelming majority of cases UUIDs are guaranteed-unique enough),
correctly making these trade-offs still requires thought and
analysis.  It's exactly the kind of of analysis that professional
paranoids like DBAs are for.

A

--
Andrew Sullivan
ajs@crankycanuck.ca

Re: UUID column as pimrary key?

От
Alban Hertroys
Дата:
>> From wikipedia, "only after generating 1 billion UUIDs every second for
> the next 100 years, the probability of creating just one duplicate would
> be about 50%. The probability of one duplicate would be about 50% if
> every person on earth owns 600 million UUIDs."


Even if the chances of one person encountering a UUID collision are about once in say 100 billion years, that could be
tomorrow.It could also not happen at all in that time span. That's how chance works. You can never assume it won't
happen.If the costs incurred by a collision are lower than the costs of preventing it, you can choose to just take the
hit,but that doesn't go for _every_ problem domain. The medical systems mentioned up-thread are an example of that. 

Next to that, UUID's are generated by computers. I have no doubts that the numeric space that makes up a UUID allows
forcollision chances as low as described, but are computers capable of generating those numbers sufficiently random
thatthey actually achieve that low a chance? I think that's pushing it. 

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4d24f31d11541020617287!



Re: UUID column as pimrary key?

От
Rob Sargent
Дата:

On 01/05/2011 03:39 PM, Alban Hertroys wrote:
>>> From wikipedia, "only after generating 1 billion UUIDs every second for
>> the next 100 years, the probability of creating just one duplicate would
>> be about 50%. The probability of one duplicate would be about 50% if
>> every person on earth owns 600 million UUIDs."
>
>
> Even if the chances of one person encountering a UUID collision are about once in say 100 billion years, that could
betomorrow. It could also not happen at all in that time span. That's how chance works. You can never assume it won't
happen.If the costs incurred by a collision are lower than the costs of preventing it, you can choose to just take the
hit,but that doesn't go for _every_ problem domain. The medical systems mentioned up-thread are an example of that. 
>
> Next to that, UUID's are generated by computers. I have no doubts that the numeric space that makes up a UUID allows
forcollision chances as low as described, but are computers capable of generating those numbers sufficiently random
thatthey actually achieve that low a chance? I think that's pushing it. 
>
> Alban Hertroys
>
> --
> If you can't see the forest for the trees,
> cut the trees and you'll see there is no forest.
>
>
> !DSPAM:1214,4d24f31711541026711723!
>
>
I'm not talking about chance. I'm talking about robust software.  You
have to expect some sort of rollback for any of a number of reasons and
deal with them.  The potential UUID collision is just one example of a
contingency one might plan for after having taken care of all the more
probable failure points.

Re: UUID column as pimrary key?

От
Michael Satterwhite
Дата:
I've been enjoying this discussion, but it seems to me there are two
possibilities that guarantee unique results, use less storage, and neither
would use a UUID for the primary key.

The first would assume that only one machine is doing the work on the database.
As has been pointed out earlier, in this case the simplest solution would be
to simply use a series. As that is guaranteed unique on a single machine and
is incredibly simple to implement.

Once multiple machines are linked to maintain the database, this has a flaw in
it as a series is not guaranteed to be unique between machines (actually I
think they're pretty well guaranteed *NOT* to be, but that's another issue. In
that case, another field is needed as follows:

     Create table test
    (  id serial,
      machine_id integer,
    .....
    primary_key(machine_id, id);
    )

Each machine would have a unique machine_id. This would guarantee uniqueness
and be very easy to maintain. In addition - *IF* it were ever necessary - it
would be easy to determine which machine generated records.

OK, I'm probably going to regret jumping in here, but I couldn't resist.

Sorry

Re: UUID column as pimrary key?

От
dennis jenkins
Дата:
On Wed, Jan 5, 2011 at 1:03 PM, Bill Moran <wmoran@potentialtech.com> wrote:
>
> But the point (that you are trying to sidestep) is that the UUID namespace
> is finite, so therefore you WILL hit a problem with conflicts at some point.
> Just because that point is larger than most people have to concern themselves
> with isn't an invalidation.

The UUID itself is 128 bits.  Some of those bits are pre-determined.
I don't recall, but I think that a "normal" UUID has 121 bits of
randomness.

How many would one have to store in a database before a collision
would even be a concern.  Such a database would be freaking huge.
Probably far larger than anything that anyone has.

Lets say (I'm pulling numbers out of my ass here), that you wanted to
store 2^100 rows in a table.  Each row would have a UUID and some
other meaningful data.  Maybe a short string or something.  I don't
recall what the postgresql row overhead is (~20 bytes?), but lets say
that each row in your magic table of death required 64 bytes.  A table
with 2^100 rows would require nearly 10^31 bytes ( = log_10(64 *
2^100)).  How on Earth would you store that much data?  And why would
you ever need to?

I postulate that UUID collisions in Postgresql, using a "good" source
for UUID generation, is unlikely to have collisions for any reasonable
database.

Food for thought:
http://blogs.sun.com/dcb/entry/zfs_boils_the_ocean_consumes

ps- If my math is off, I apologize.  Its been a long day...

Re: UUID column as pimrary key?

От
Radosław Smogura
Дата:
 Pooled id.. child machine connects to main and says give 10000 of ids,
 main increments counter by 10000, child allocates in given pool.

 On Wed, 5 Jan 2011 17:11:35 -0600, Michael Satterwhite
 <michael@weblore.com> wrote:
> Once multiple machines are linked to maintain the database, this has
> a flaw in
> it as a series is not guaranteed to be unique between machines
> (actually I
> think they're pretty well guaranteed *NOT* to be

Re: UUID column as pimrary key?

От
Szymon Guz
Дата:


On 6 January 2011 00:32, dennis jenkins <dennis.jenkins.75@gmail.com> wrote:
On Wed, Jan 5, 2011 at 1:03 PM, Bill Moran <wmoran@potentialtech.com> wrote:
>
> But the point (that you are trying to sidestep) is that the UUID namespace
> is finite, so therefore you WILL hit a problem with conflicts at some point.
> Just because that point is larger than most people have to concern themselves
> with isn't an invalidation.

The UUID itself is 128 bits.  Some of those bits are pre-determined.
I don't recall, but I think that a "normal" UUID has 121 bits of
randomness.

How many would one have to store in a database before a collision
would even be a concern.  Such a database would be freaking huge.
Probably far larger than anything that anyone has.

Not necessarily.

Bad luck: two inserts, and one collision.
Good luck: many many inserts, without any collision.

If you have ten pairs of socks... how many do you need to choose, to have two from the same pair (to have a collision)?
Good luck: 2
Bad luck: 11

With uuid you can write a program that most of the time works correctly, but sometimes not.
If you need a program that works correctly all the time, you can use UUID, with the overhead of a procedure, that checks the collisions, and does something with that, or you could generate the PK without UUID, and be sure that the generator won't have collisions, just because it is a normal sequence.

regards
Szymon

Re: UUID column as pimrary key?

От
Chris Browne
Дата:
dalroi@solfertje.student.utwente.nl (Alban Hertroys) writes:
>>> From wikipedia, "only after generating 1 billion UUIDs every second for
>> the next 100 years, the probability of creating just one duplicate would
>> be about 50%. The probability of one duplicate would be about 50% if
>> every person on earth owns 600 million UUIDs."
>
>
> Even if the chances of one person encountering a UUID collision are
> about once in say 100 billion years, that could be tomorrow. It could
> also not happen at all in that time span. That's how chance works. You
> can never assume it won't happen. If the costs incurred by a collision
> are lower than the costs of preventing it, you can choose to just take
> the hit, but that doesn't go for _every_ problem domain. The medical
> systems mentioned up-thread are an example of that.
>
> Next to that, UUID's are generated by computers. I have no doubts that
> the numeric space that makes up a UUID allows for collision chances as
> low as described, but are computers capable of generating those
> numbers sufficiently random that they actually achieve that low a
> chance? I think that's pushing it.

I think I'd rather analyze this by:
a) Reading the RFC, and
b) Looking at some of the common implementations likely to get used

rather than to merely have "a doubt."

RFC 4122 does NOT point to randomness as the only criterion to
discourage collisions, and treating UUIDs as if they were merely about
being "sufficiently random to achieve low chance of collision" is
insulting to the drafters of the standard, because they were certainly
NOT so naive as to think that was sufficient.
--
"cbbrowne","@","acm.org"
"What I find most amusing  about com and .NET  is that they are trying
to solve a problem I only had when programming using MS tools."
-- Max M <maxm@mxm.dk> (on comp.lang.python)

Re: UUID column as pimrary key?

От
Chris Browne
Дата:
ajs@crankycanuck.ca (Andrew Sullivan) writes:
> On Wed, Jan 05, 2011 at 12:41:43PM -0700, Scott Ribe wrote:
>> I'm not sidestepping the point at all.
>
> You may be missing it, however, because. . .
>
>> The point is that the finiteness of the space is a red herring. The
>> space is large enough that there's no chance of collision in any
>> realistic scenario.
>> In order to get to a point where the probability
>> of collision is high enough to worry about, you have to generate
>> (and collect) UUIDs at a rate that is simply not realistic--as in
>> your second example quoted above.
>
> . . .the example was not that UUIDs are being generated and collected
> in one place at that rate, but that they're being generated in several
> independent places at a time, and if the cost of the collision is
> extremely high, there might be reasons not to use the UUID strategy
> but instead to use something else that is generated algorithmically by
> the database.  There's a trade-off in having distributed systems
> acting completely independently, and while I have lots of confidence
> in my colleagues at the IETF (and agree with you that for the
> overwhelming majority of cases UUIDs are guaranteed-unique enough),
> correctly making these trade-offs still requires thought and
> analysis.  It's exactly the kind of of analysis that professional
> paranoids like DBAs are for.

But it seems to me that some of the analytics are getting a little *too*
paranoid, on the "perhaps UUIDs are the wrong answer" side of the
column.

There's no panaceas, here; if the process that is using IDs is fragile,
then things can break down whether one is using UUID or SERIAL.

I prefer the "probably unique enough" side of the fence, myself.

And the process that uses the IDs needs to be robust enough that things
won't just fall apart in tatters if it runs into non-uniqueness.

I'd expect that to not need to be a terribly big deal - if there's a
UNIQUE index on a UUID-based column, then an insert will fail, and the
process can pick between things like:
 - Responding that it had a problem, or
 - Retrying.

And if the system isn't prepared for that sort of condition, then it's
also not prepared for some seemingly more likely error conditions such
as:
 - The DB connection timed out because something fuzzed out on the
   network
 - The DB server fell over and is restarting because (power failed,
   someone kicked the switch, disk ran out, ...)

It seems rather silly to be *totally* paranoid about the
not-infinite-uniqueness of UUIDs when there are plenty of other risks
lurking around that also need erro checking.
--
"cbbrowne","@","gmail.com"
http://linuxdatabases.info/info/slony.html
"How can you dream the impossible dream when you can't get any sleep?"
-- Sam Robb

Re: *****SPAM***** Re: UUID column as pimrary key?

От
Scott Ribe
Дата:
On Jan 5, 2011, at 3:03 PM, Andrew Sullivan wrote:

> ...the example was not that UUIDs are being generated and collected
> in one place at that rate, but that they're being generated in several
> independent places at a time...

In order for a collision to matter, really in order for there to *be* a collision, the duplicate pair has to be
collectedin one place. 

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





Re: *****SPAM***** Re: UUID column as pimrary key?

От
Scott Ribe
Дата:
On Jan 5, 2011, at 4:11 PM, Michael Satterwhite wrote:

> Each machine would have a unique machine_id. This would guarantee uniqueness
> and be very easy to maintain.

And if somebody clones the disk to a new machine, but leaves the old one in service? Or do you use the MAC address and
hopethat's unique? 

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





Re: UUID column as pimrary key?

От
Andrew Sullivan
Дата:
On Wed, Jan 05, 2011 at 06:22:08PM -0500, Chris Browne wrote:
> But it seems to me that some of the analytics are getting a little *too*
> paranoid, on the "perhaps UUIDs are the wrong answer" side of the
> column.

That could be.  I was simply noting that there are cases where one
could legitimately decide that the UUID is a bad fit.  I was just
objecting to the seeming (to me anyway) claim that UUIDs can always be
used.

> There's no panaceas, here; if the process that is using IDs is fragile,
> then things can break down whether one is using UUID or SERIAL.

Not necessarily.  In a distributed system where some rows will
sometimes (and unpredictably) be shared, guaranteeing that two systems
cannot generate the same ID could be really important.  In those
cases, it's probably better to have a fixed generator ID plus a serial
(or, for that matter, a UUID) because then you can be sure you don't
run into one another.  (Of course, some UUID examples already have
this built in.  It sort of depends on the algorithm one is using.)

> I prefer the "probably unique enough" side of the fence, myself.

Me too, most of the time.

> And the process that uses the IDs needs to be robust enough that things
> won't just fall apart in tatters if it runs into non-uniqueness.

But that robustness requirement might be impossible in a distributed
system, was all I was trying to point out.

> It seems rather silly to be *totally* paranoid about the
> not-infinite-uniqueness of UUIDs when there are plenty of other risks
> lurking around that also need erro checking.

I fully agree with this.

A
--
Andrew Sullivan
ajs@crankycanuck.ca

Re: UUID column as pimrary key?

От
Andrew Sullivan
Дата:
On Wed, Jan 05, 2011 at 07:44:29PM -0700, Scott Ribe wrote:

> In order for a collision to matter, really in order for there to *be* a collision, the duplicate pair has to be
collectedin one place. 

Not at the time of generation, though.  They only have to end up in
the same place at once.  For most cases, the value of the data
compared to the improbably low liklihood of collision means that you
deal with this case by saying, "Yuck.  Well, someone loses."  But if
the data is valuable enough (say, unique identifiers for nuclear
warheads), that's just not an acceptable trade-off.

A

--
Andrew Sullivan
ajs@crankycanuck.ca

Re: UUID column as pimrary key?

От
Andrew Sullivan
Дата:
On Wed, Jan 05, 2011 at 07:46:18PM -0700, Scott Ribe wrote:
> And if somebody clones the disk to a new machine, but leaves the old one in service? Or do you use the MAC address
andhope that's unique? 
>

Excellent questions, and exactly the sort (as I was arguing elsewhere
in this thread) one has to answer in order to decide what the right
strategy is.

For instance, it may be that your application need is dealing with
incredibly valuable identifiers, that not cloning the disk is a matter
of national security, that the cost of finding and fixing a failed
unique identification case runs into the billions of dollars (not to
mention the millions of deaths), and that the assignment is performed
in real time by distributed systems that only eventually, if at all,
deliver data to a unified system.  In this case, the cost of the
failure of uniqueness is very high, and it would be prudent to arrange
something that guaranteed that no assigner node could possibly create
a duplicate identifier.  There are UUID approaches that can do this;
they mostly work by guaranteeing a fixed and assignable local part.
UUID vs. serial for one of those columns seems to me to be nothing but
an application efficiency problem.

What is far more likely is that ordinary-value identifiers are being
generated, and that you are using them for more or less pedestrian
reasons.  The cost of an undetected duplicate is still maybe millions
of dollars, but the chances of a duplicate going undetected at
creation time (like, say, the MAC address of the creating machine is
used, the useful lifetime of the UUID is on the order of single-digit
years, and there are no more than 10 database back ends involved) are
fairly low.  In this case, using a proven UUID generator seems like a
no-brainer to me (and in fact, I work on an application where we do
this).

I don't understand the number of knees in this conversation that seem
to be jerking against the answer, "It depends."  Of course it depends.
If there were one answer for everything, developing good
database-backed systems wouldn't be something people would pay any of
us for.

A

--
Andrew Sullivan
ajs@crankycanuck.ca

Re: UUID column as pimrary key?

От
Alban Hertroys
Дата:
On 6 Jan 2011, at 24:27, Chris Browne wrote:

>> Next to that, UUID's are generated by computers. I have no doubts that
>> the numeric space that makes up a UUID allows for collision chances as
>> low as described, but are computers capable of generating those
>> numbers sufficiently random that they actually achieve that low a
>> chance? I think that's pushing it.
>
> RFC 4122 does NOT point to randomness as the only criterion to
> discourage collisions, and treating UUIDs as if they were merely about
> being "sufficiently random to achieve low chance of collision" is
> insulting to the drafters of the standard, because they were certainly
> NOT so naive as to think that was sufficient.

I'm sure the designers knew what they were getting into. This comment was aimed at people claiming things like "with
thisand that huge number of events a collision won't occur in 100 billion years", which - to me at least - looks like
they'reonly looking at the big number of bits involved without understanding statistical analysis. 
Let's just say, if the developers of "Microsoft Visual Nuclear Power Plant Designer Professional" were claiming things
likethat, would you trust their product? 

The main point with the randomness of UUID's remains that you _can_ have a collision at any given moment. It's unlikely
toever happen, but you can't predict when it will happen if it does. The possible consequences of a collision matter a
lotin designing whether and how to handle these collisions. Maybe it doesn't matter at all, maybe you should get a
reallyhefty insurance, or maybe you need to evacuate the country. 

Opposed to that, a sequence isn't random and therefore you can predict when you will run into collisions - namely once
thesequence wraps. Considering that even a 32-bit sequence allows for several billions of rows before collisions _can_
occur,you can be certain that your problem is pretty far into the future. 
It _will_ be a big problem without an obvious solution if it occurs though, as from that point on you will run into a
lotof collisions and the resolution to the problem is rather dependent on what you're working on. 

Now that is not an argument against protecting your application against collisions, if there is a chance that you will
runinto collisions (you won't in a 10-record lookup table, for example) then you need to take that into consideration
inyour designs, but there are many (usually obvious) cases in which it's safe to omit it. With UUID's that's a little
morecomplicated. 

I don't think anyone in this discussion is saying "Don't use UUID's!". Just be aware of their limitations and the
problemdomains where they are sensible to use. The same goes for sequences. 
It would, for example, be (obviously) pretty insane to use UUID's for a 10-record lookup table. There's plenty of
examplesin this thread where they shine, I don't need to repeat that. 

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4d25777f11541886517442!



Re: UUID column as pimrary key?

От
Stuart Bishop
Дата:
>>> Next to that, UUID's are generated by computers. I have no doubts that
>>> the numeric space that makes up a UUID allows for collision chances as
>>> low as described, but are computers capable of generating those
>>> numbers sufficiently random that they actually achieve that low a
>>> chance? I think that's pushing it.

> The main point with the randomness of UUID's remains that you _can_ have a collision at any given moment. It's
unlikelyto ever happen, but you can't predict when it will happen if it does. The possible consequences of a collision
mattera lot in designing whether and how to handle these collisions. Maybe it doesn't matter at all, maybe you should
geta really hefty insurance, or maybe you need to evacuate the country. 

Given the math, I suspect the chance of a UUID collision generated by
a respected library is roughly the same or less than any other method
of choosing a unique id. If you are looking at these extreme
improbabilities, your SERIAL isn't guaranteed unique either when you
take into account cosmic rays flipping the right bits in your ECC
memory or on your disk platter.

So if you are worried about such things, give up now because your
application is going to have to run with hardware and software in this
universe with all its limitations.

Maybe I should start a business in providing UUID collision insurance?

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/

Re: UUID column as pimrary key?

От
Jasen Betts
Дата:
On 2011-01-05, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
> On Jan 5, 2011, at 9:01 AM, Tom Lane wrote:
>
>> In practical use I think the odds of a collision are *far* higher than
>> you are suggesting, unless the UUID generation is being done with a lot
>> more care than is likely if the user takes these sorts of claims at face
>> value.
>
> Eh? The user taking such claims at face value has no bearing whatsoever on the quality of the UUID generation
algorithmprovided by the OS. So, unless we're talking about users coming up with their own algorithms, it seems
reasonableto assume that the generation is done with a great deal of care. (And if we are talking about users coming up
withtheir own algorithms, then all bets are off; feel free to assume the worst.) I know that is the case on OS X &
Linux.I would be shocked if it were not the case on Solaris. I would even be surprised if it were not the case on
Windows.


> The IETF Network Working Group designed UUIDs to ensure that their
> uniqueness guarantee would be strong enough that no application would
> need to worry about duplicates, ever. Claims that collisions are too
> likely to depend on UUIDs being unique really are claims that the IETF
> Network Working Group didn't know what it was doing, which I find a
> bit ridiculous.

Who was it that decided on 32 bits for IP addresses?

--
⚂⚃ 100% natural

Re: UUID column as pimrary key?

От
Scott Ribe
Дата:
On Jan 6, 2011, at 2:51 AM, Jasen Betts wrote:

> Who was it that decided on 32 bits for IP addresses?

Nice try, but that was rather long before the IETF existed ;-)

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





Re: UUID column as pimrary key?

От
Scott Ribe
Дата:
On Jan 6, 2011, at 1:52 AM, Stuart Bishop wrote:

> If you are looking at these extreme
> improbabilities, your SERIAL isn't guaranteed unique either when you
> take into account cosmic rays flipping the right bits in your ECC
> memory or on your disk platter.

Yes, that's rather the point, the probability is so extremely low that it in most cases it should be treated as 0. Some
peopleseem to have a problem wrapping their heads around relative magnitudes that extreme. 

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





Re: UUID column as pimrary key?

От
Bill Moran
Дата:
In response to Scott Ribe <scott_ribe@elevated-dev.com>:

> On Jan 6, 2011, at 1:52 AM, Stuart Bishop wrote:
>
> > If you are looking at these extreme
> > improbabilities, your SERIAL isn't guaranteed unique either when you
> > take into account cosmic rays flipping the right bits in your ECC
> > memory or on your disk platter.
>
> Yes, that's rather the point, the probability is so extremely low that it in most cases it should be treated as 0.
Somepeople seem to have a problem wrapping their heads around relative magnitudes that extreme. 

My head is totally wrapped around probability -- yours is not.  It's
amazing the number of people on this list who are confused by a
bunch of x in y #s being thrown around.

If the chance of a duplicate is 1 in a hundred gazillion, then I can
still hit a dupe the VERY FIRST TIME I USE IT.

I'm writing software that is intended to be used to save lives in the
event of an earthquake or flood or cosmic ray flipping bits or any
other massive disaster.  The last thing I need while a bunch of EMTs
are digging bodies out of the rubble trying to save their lives is to
need to resolve a key conflict with a bunch of mobile devices, most of
which are not reachable because the cell network is down because of
the disaster.  There's going to be enough other shit going wrong ...
my software is supposed to be part of the solution.

I don't give a fuck how small the chance of conflict is, the only
viable option for that chance is 0.  Period.  Any argument to the
contrary is stupid, asinine and outright negligent.

Now, if I were writing facebook or twitter or some other pointless
entertainment app, the odd chance of losing a post or whatever to a
key conflict is not something I'm going to worry about for more than
the time it takes to yawn.

And that's been my point all along, despite people trying to dilute it
with nonsense numbers that they don't understand: UUIDs are sufficiently
unique for 99.99999999999% of the applications out there.  However, you
should always take 5 or 10 minutes to consider whether your application
is one of the .00000000001% that can't tolerate the tiny risk.

And also, if your entire solution to that risk is to rollback the
transaction in the event of a conflict, then your application is simple
enough that UUIDs are overkill anyway.  Use them anyway if you want.

As far as statistics are concerned, the chance that someone as batfuck
insane as Hitler would rise to power in a major country is 1 in
1,102,196,287,287,859,992,396,273,293,203 -- yet it happened.

There.  I Godwined the damn thing.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: *****SPAM***** Re: UUID column as pimrary key?

От
Michael Satterwhite
Дата:
On Wednesday, January 05, 2011 08:46:18 pm Scott Ribe wrote:
> On Jan 5, 2011, at 4:11 PM, Michael Satterwhite wrote:
> > Each machine would have a unique machine_id. This would guarantee
> > uniqueness and be very easy to maintain.
>
> And if somebody clones the disk to a new machine, but leaves the old one in
> service? Or do you use the MAC address and hope that's unique?

That would be a matter of incompetent administration. *NOTHING* can protect
against that.

Re: UUID column as pimrary key?

От
Scott Ribe
Дата:
On Jan 6, 2011, at 8:14 AM, Bill Moran wrote:

> I don't give a fuck how small the chance of conflict is, the only
> viable option for that chance is 0.  Period.  Any argument to the
> contrary is stupid, asinine and outright negligent.

Do you give a fuck about the chance that bits will flip in the RAM and not be detected? Do you give a fuck about the
chancethat bits will flip in whatever persistent storage is in your device and not be detected? Do you give a fuck
aboutthe chance that bits will be flipped in the network and not be detected? Do you give a fuck about the chance that
atransistor will go into a metastable state instead of flipping, and lock up the device and lose data not yet saved?
Well,of course you do. But now what are the relative odds? All of these things can happen already (and all of them can
happenthe very first time you use it), so already your system is not precisely 100% reliable. Now is the risk of UUID
collisionsomewhere near the same as these risks, or orders of magnitude higher, or orders of magnitude lower? It does
matter.

> However, you
> should always take 5 or 10 minutes to consider whether your application
> is one of the .00000000001% that can't tolerate the tiny risk.

If an application/device truly can't tolerate a risk of an error of 1 in 10^-16, that's a problem because you can't
builda device without risk of error below some threshold (in that general neighborhood I think). You can only push
risksto lower & lower probability, and it makes no sense to focus on a single risk and spend time and effort to push it
toorders of magnitude lower probability than all the other risks in the system. (As long as there are risks at orders
ofmagnitude higher priority, those should get the time & expense.) 

> And that's been my point all along, despite people trying to dilute it
> with nonsense numbers that they don't understand...

No, it hasn't been your point all along. Your point has shifted twice now as you've been shown to be wrong about the
odds.And the numbers used are not nonsense at all. All of which somewhat contradicts your statement that your "head is
totallywrapped around probability" ;-) Added to your apparent ignoring of other error sources in order to focus on one
extremelyunlikely one, well... 

> And also, if your entire solution to that risk is to rollback the
> transaction in the event of a conflict, then your application is simple
> enough that UUIDs are overkill anyway.

I kind of doubt that the person who posted that intended it as the entire solution. It seemed to me that was intended
asjust the event that triggers conflict resolution and the next step would be to inform the device that the conflicting
recordis getting a new UUID, update appropriately, and so on. 

Just so you know, I'm done talking to you. Your arrogance, rudeness, insults, condescension and personal attacks are
notsomething that I will deal with anymore. 

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





Re: UUID column as pimrary key?

От
Adrian Klaver
Дата:
On Thursday 06 January 2011 7:14:00 am Bill Moran wrote:
> In response to Scott Ribe <scott_ribe@elevated-dev.com>:
> > On Jan 6, 2011, at 1:52 AM, Stuart Bishop wrote:
> > > If you are looking at these extreme
> > > improbabilities, your SERIAL isn't guaranteed unique either when you
> > > take into account cosmic rays flipping the right bits in your ECC
> > > memory or on your disk platter.
> >
> > Yes, that's rather the point, the probability is so extremely low that it
> > in most cases it should be treated as 0. Some people seem to have a
> > problem wrapping their heads around relative magnitudes that extreme.
>

>
> There.  I Godwined the damn thing.
>
> --
> Bill Moran

Maybe a wrap up is in order:)
As I said earlier this is one of those arguments that could go forever because
everyone is right, so to summarize:
1) UUIDs can have a very to extremely large namespace but less than infinite.
2) There are other alternatives i.e SERIAL
3) Managing the above is based on the interaction of three components -
software,hardware,wetware(people). Any one of which can have a weakness and in
combination their are many permutations.
4) DBAs need to plan for the worse. Worse being somewhat contextual. Real time
control of a nuclear plant versus Web social media. Choosing a unique number
generator and dealing with possible collisions is contingent on this context.


--
Adrian Klaver
adrian.klaver@gmail.com

Re: UUID column as pimrary key?

От
Scott Ribe
Дата:
On Jan 6, 2011, at 8:19 AM, Michael Satterwhite wrote:

> That would be a matter of incompetent administration. *NOTHING* can protect
> against that.

Well, no, not necessarily. It might well be a goal (in fact, is a goal with some software that I'm developing), that
users/adminsdon't have to worry about data caches moving across machines. My primary point, which I stated
incompletely,was that in order to depend on node ids as part of unique ids, requires a degree of control over the
administrationof nodes, and for a given application this might or might not be practical. For instance, if your app
runson cell phones, and the OSs you deploy on give you access to the device id, and you don't mind using a rather long
prefixto form your unique ids, then you have an obvious solution that, as far as I know, is guaranteed to be unique.
(Ignoringthe possibility of hacking of the device id, because no matter what you choose as a prefix, if an adversary
managesto deliberately change the prefix, you can get duplicates.) My secondary point was that this is rather difficult
todetect in time to prevent conflicts. 

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





Re: UUID column as pimrary key?

От
Anthony
Дата:
On Wed, Jan 5, 2011 at 3:43 PM, Bill Moran <wmoran@potentialtech.com> wrote:
> Despite the fact that the chance of a collision is very, very small, there
> is no easy way to fix it if it happens.  Zero.  It can't be done without
> shutting the system down, recalling all the remote devices and manually
> reconciling the problem ... which is not an option.

Sounds like a security problem - an intentional collision is much more
likely than a random one.

Re: UUID column as pimrary key?

От
Chris Browne
Дата:
dennis.jenkins.75@gmail.com (dennis jenkins) writes:
> The UUID itself is 128 bits.  Some of those bits are pre-determined.
> I don't recall, but I think that a "normal" UUID has 121 bits of
> randomness.

That doesn't match RFC 4122 very well...

It indicates 5 forms of UUIDs:

1) Time-based, where about 1/2 the data comes from local timestamp data,
   and 48 bits come from MAC address (or similar)

2) "DCE Security" (about which it says little)

3) Name-based, using MD5 hashing

4) Randomly generated UUIDs (which are quite likely what you're thinking
   about) have 122 bits of random data

5) Name-based, using SHA-1 hashing

The reasonable choices for a would-be artificial primary key seem to be
1 and 3; in a distributed system, I'd expect to prefer 1, as the time +
host data are likely to eliminate the "oh, it might just randomly match"
problem.

Note the set of functions in the uuid-ossp contrib module fit this, down
to omitting Version 2 :-).

test@localhost->  \df+ public.uuid_generate*
                                                                     List of functions
 Schema |        Name        | Result data type |    Argument data types    |  Type  | Volatility |  Owner   | Language
|   Source code     | Description 

--------+--------------------+------------------+---------------------------+--------+------------+----------+----------+--------------------+-------------
 public | uuid_generate_v1   | uuid             |                           | normal | volatile   | postgres | c
|uuid_generate_v1   | 
 public | uuid_generate_v1mc | uuid             |                           | normal | volatile   | postgres | c
|uuid_generate_v1mc | 
 public | uuid_generate_v3   | uuid             | namespace uuid, name text | normal | immutable  | postgres | c
|uuid_generate_v3   | 
 public | uuid_generate_v4   | uuid             |                           | normal | volatile   | postgres | c
|uuid_generate_v4   | 
 public | uuid_generate_v5   | uuid             | namespace uuid, name text | normal | immutable  | postgres | c
|uuid_generate_v5   | 
(5 rows)
--
"I'm all for advancement.  However  rich text on an ephemeral media is
a totally brain-dead idea.  Nobody  in their right mind would take the
effort to prettyfy documents that are going to be gone in a few days."
-- Jay Denebeim <denebeim@deepthot.ml.org>

Re: UUID column as pimrary key?

От
Chris Browne
Дата:
wmoran@potentialtech.com (Bill Moran) writes:
> If the chance of a duplicate is 1 in a hundred gazillion, then I can
> still hit a dupe the VERY FIRST TIME I USE IT.
>
> I'm writing software that is intended to be used to save lives in the
> event of an earthquake or flood or cosmic ray flipping bits or any
> other massive disaster.  The last thing I need while a bunch of EMTs
> are digging bodies out of the rubble trying to save their lives is to
> need to resolve a key conflict with a bunch of mobile devices, most of
> which are not reachable because the cell network is down because of
> the disaster.  There's going to be enough other shit going wrong ...
> my software is supposed to be part of the solution.
>
> I don't give a fuck how small the chance of conflict is, the only
> viable option for that chance is 0.  Period.  Any argument to the
> contrary is stupid, asinine and outright negligent.

If your system is sufficiently negligently designed that this particular
conflict causes it to kill people, then I wouldn't be too inclined to
point at this issue with UUIDs being the Real Problem with the system.

This is NOT the only risk that the system faces; you can't get *nearly*
as low probabilities attached to hardware and network issues such as:
 - Disks failing
 - Cosmic rays twiddling bits in memory
 - Network connections failing part way through the work
 - Dumb techs blindly cloning the same "host key" onto every one of the
   EMTs' data collection devices

That last item is an argument in *FAVOR* of using UUIDs for the "EMTs
digging bodies out of rubble" scenario...  The typical alternative to
UUIDs would be to have (Sequence + Host Data), and I'd expect there to
be a considerable risk of Dumb Techs making mistakes there.  It wouldn't
be difficult for such a scenario to induce systematic key collisions.

It's never just about one risk; it's about *all the risks.*

> And that's been my point all along, despite people trying to dilute it
> with nonsense numbers that they don't understand: UUIDs are sufficiently
> unique for 99.99999999999% of the applications out there.  However, you
> should always take 5 or 10 minutes to consider whether your application
> is one of the .00000000001% that can't tolerate the tiny risk.

If your application is so fragile that it cannot tolerate *that* tiny
risk, then I have to wonder if your system isn't Just Plain Too Fragile,
because there are other unavoidable risks likely to be of much greater
probability.
--
"Have  you noticed   that,  when we were   young,  we were   told that
`everybody else   is doing  it' was   a  really  stupid reason  to  do
something,  but now it's the standard  reason for picking a particular
software package?" -- Barry Gehm

Re: UUID column as pimrary key?

От
Scott Ribe
Дата:
On Jan 6, 2011, at 9:31 AM, Chris Browne wrote:

> The reasonable choices for a would-be artificial primary key seem to be
> 1 and 3; in a distributed system, I'd expect to prefer 1, as the time +
> host data are likely to eliminate the "oh, it might just randomly match"
> problem.

In some contexts, 1 is considered a security weakness, as it reveals information about which machine generated it and
when,which is why most OS-supplied uuid generators now default to 4 (random). This tends to be more of a concern with
encryption/securityuses, and if it's not a concern for your db[*], then your are correct that 1 is likely the best
choice.

[*] After all, in many dbs we log all sorts of explicit where/who/when for auditing purposes. In that case, having ids
thatprovide a clue of where/when most certainly does not add any legitimate security concern. 

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





Apology

От
Bill Moran
Дата:
It was recently brought to my attention that in my enthusiasm to
have my point of view understood, that I may have been crude,
or "self-important" or something else that people find offensive.

My apologies to anyone who was offended, and to anyone who considers
this thread a bikeshed or flamewar that I have been contributing to.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: UUID column as pimrary key?

От
Alban Hertroys
Дата:
On 6 Jan 2011, at 17:51, Chris Browne wrote:

> wmoran@potentialtech.com (Bill Moran) writes:
> If your system is sufficiently negligently designed that this particular
> conflict causes it to kill people, then I wouldn't be too inclined to
> point at this issue with UUIDs being the Real Problem with the system.
>
> This is NOT the only risk that the system faces; you can't get *nearly*
> as low probabilities attached to hardware and network issues such as:
> - Disks failing
> - Cosmic rays twiddling bits in memory
> - Network connections failing part way through the work
> - Dumb techs blindly cloning the same "host key" onto every one of the
>   EMTs' data collection devices


Let's say that you actually build a mission critical system for which you'd need to evacuate the country if it fails.
Youpick the best ECC RAM you can find, the most reliable type of disk storage available, your fallback network has a
fallbacknetwork of it's own, etc. Basically you have done everything you could to ensure that the chances of the system
failingare as small as technically possible. 

All those little failure chances add up to a certain number. Using UUID's for your ID's is not required for the design
ofthe system, yet you chose to do so. You added a nearly infinite chance of UUID collisions to the accumulated chance
ofthe system failing. 

Now the system miraculously fails and the country needs evacuating. A committee is going to investigate why it failed.
Ifthe dumb techy above is responsible, they just found themselves a scape-goat. If they didn't, but stumble upon your
unnecessaryusage of UUID's instead... Let's just say I don't want to be that person. 

I have to agree with Bill here, if lives depend on your system then anything that adds to the failure chances is very
hardto defend. In the end it often boils down to responsibility in case of failure, not to mention what it does to your
ownpeace of mind. 

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4d260af211541129314545!



Re: UUID column as pimrary key?

От
Chris Browne
Дата:
dalroi@solfertje.student.utwente.nl (Alban Hertroys) writes:
> On 6 Jan 2011, at 17:51, Chris Browne wrote:
>
>> wmoran@potentialtech.com (Bill Moran) writes:
>> If your system is sufficiently negligently designed that this particular
>> conflict causes it to kill people, then I wouldn't be too inclined to
>> point at this issue with UUIDs being the Real Problem with the system.
>>
>> This is NOT the only risk that the system faces; you can't get *nearly*
>> as low probabilities attached to hardware and network issues such as:
>> - Disks failing
>> - Cosmic rays twiddling bits in memory
>> - Network connections failing part way through the work
>> - Dumb techs blindly cloning the same "host key" onto every one of the
>>   EMTs' data collection devices
>
> Let's say that you actually build a mission critical system for which
> you'd need to evacuate the country if it fails. You pick the best ECC
> RAM you can find, the most reliable type of disk storage available,
> your fallback network has a fallback network of it's own,
> etc. Basically you have done everything you could to ensure that the
> chances of the system failing are as small as technically possible.
>
> All those little failure chances add up to a certain number. Using
> UUID's for your ID's is not required for the design of the system, yet
> you chose to do so. You added a nearly infinite chance of UUID
> collisions to the accumulated chance of the system failing.

Infinite?  The probability can't conceivably exceed 1.

It's scarcely likely to exceed "infinitesimal."

I've built clustered systems, and frequently, the resulting Rube
Goldberg apparatus that tries to protect against failures of the other
apparatus trying to protect against failures of further apparatus that
tries to protect against failures introduces a tall and unwieldy stack
of intricately interwoven components such that operators need to be
*mighty* careful not to tip anything over lest the protective apparatus
collapse, knocking over the system it was supposed to protect.

> Now the system miraculously fails and the country needs evacuating. A
> committee is going to investigate why it failed. If the dumb techy
> above is responsible, they just found themselves a scape-goat. If they
> didn't, but stumble upon your unnecessary usage of UUID's
> instead... Let's just say I don't want to be that person.

If the system is that mission critical, then it well and truly warrants
doing enough proper analysis of the risks to *know* the risks of the
various expectable failure conditions, and to do so in rather more
detail than the oversimplification of characterizing them as
"infinitesimal" or "infinite."

> I have to agree with Bill here, if lives depend on your system then
> anything that adds to the failure chances is very hard to defend. In
> the end it often boils down to responsibility in case of failure, not
> to mention what it does to your own peace of mind.

It seems to me that using serially assigned values, along with manually
assigned server IDs, to construct a would-be-unique value, is likely to
introduce quite a lot *more risk* of system failure than would the use
of UUIDs.

So someone that rules out UUIDs based on some fallacious imagined
"infinite chance of collisions" is jumping away from a small risk, and
accepting one much more likely to take lives.

We haven't seen any indication that would distinguish between "infinite"
and "infinitesimal," beyond the fact that "infinite" is infinitely
larger than the largest probability that one can find for an event,
which is 1.
--
(format nil "~S@~S" "cbbrowne" "gmail.com")
"But   life  wasn't yes-no,   on-off.   Life was shades   of gray, and
rainbows not in the order of the spectrum."
-- L. E. Modesitt, Jr., _Adiamante_

Re: UUID column as pimrary key?

От
Bill Moran
Дата:
In response to Chris Browne <cbbrowne@acm.org>:
>
> It seems to me that using serially assigned values, along with manually
> assigned server IDs, to construct a would-be-unique value, is likely to
> introduce quite a lot *more risk* of system failure than would the use
> of UUIDs.

First off, server IDs are not randomly assigned.  They are assigned
automatically when the software is activated and tracked by the master
server.

Second, there is a very simple mechanism to prevent attempted "device
cloning" from causing a problem. (it's not a particularly difficult
problem to solve, actually)

Third, and very important, is the following conversation we had during
planning:
"Can we get good quality random data on mobile devices?"
"We'll have to research it, we don't know yet."
"Wait, before we do the research -- if we find that some mobile devices
have good quality random data and others don't -- can we tell the client
which devices they're allowed to use?"
"No, the client will dictate which devices it must work on."

So basically, we realized that even if our concerns about UUID
collisions we're unreasonably paranoid, we still couldn't guarantee
that the devices would be up to the task of generating UUIDs with
sufficient randomness.

As I tried to point out earlier in the thread, the collision issue was
one of MANY things we considered in this design.  It is, however, the
ONLY one that's directly relevant to the original conversation.  Now
that my description of our design seems to be coming under fire, I
feel the need to at least point out that we thought it through more
than that.  I suppose that's mostly my own fault for trying to bring
in an example that it's impractical to discuss in full detail.

As other people have pointed out, there are concerns about malicious
impersonation, device failure, data corruption ... each of these may
be more or less likely or dangerous than UUID collision, but not all
of them are _intended_ to be solved by not using UUIDs, so claiming
that abandoning UUIDs does not fix these problems is completely
correct, and also a straw man.

> So someone that rules out UUIDs based on some fallacious imagined
> "infinite chance of collisions" is jumping away from a small risk, and
> accepting one much more likely to take lives.

The possibility of collisions is not fallacious, however, the use of
"infinite" (I don't remember who wrote that) is obviously not correct.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: UUID column as pimrary key?

От
"A.M."
Дата:
On Jan 6, 2011, at 3:52 AM, Stuart Bishop wrote:
> Maybe I should start a business in providing UUID collision insurance?

Your ideas are intriguing to me and I wish to subscribe to your newsletter.

-M

Re: UUID column as pimrary key?

От
Dennis Gearon
Дата:
Hey guys and gals,
     As the originator of this topic, I've received a lot of good answers,
opinions, and advice. Thank  you.

    I'm not sure that more conversation on this will go anywhere but down. It
seems that UUID vs. Integer is one of those 'values' subjects, like:

  Sexual practices and preferences.
  The right way to raise children.
  How to handle money in a family.
  What are the defintions of a woman and a man?
  What religion is best and correct?
  Which political party really has the best interests of the country as a goal,
and is the better party?
  What's an appropriate lifestyle?

Uh . . . . my mother never taught me this(deprived childhood), but I learned the
hardway. NEVER discuss these things in public, (especially drunk in a bar in a
part of town at a different socio economic level than you), but learn what any
possible mate things of these ;-)

    And UUID vs. Integer? Seems to be something not to discuss too long in a
newsgroup. Seriously, there's some emotions here on this subject that as the
originator of this topic, I'd like to see calm down and get back to productive
work. Is it a deal, guys and gals?

   This newsgroup I always brag about to my other geek friends. 'Those people on
the postgres news group are the steadiest, most helpful, most logical people I
know on the Internet. The don't jump on fancy new trends nor hide technology in
market speak. Their work revolves around the best tested, theory based
practices. They just make the tech world a better place with some of the most
reliable, best supported OSS out there.' And you guys and gals ARE that
description.

 Dennis Gearon


Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better
idea to learn from others’ mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.


Re: UUID column as pimrary key?

От
Alban Hertroys
Дата:
On 6 Jan 2011, at 20:36, Chris Browne wrote:

> Infinite?  The probability can't conceivably exceed 1.


Don't start picking om words please, "infinitely small" or "infinitesimal" is obviously what I meant to write there.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4d263d0511543498117537!



Re: UUID column as pimrary key?

От
Jasen Betts
Дата:
On 2011-01-06, Chris Browne <cbbrowne@acm.org> wrote:

> (Sequence + Host Data), and I'd expect there to
> be a considerable risk of Dumb Techs making mistakes there.  It wouldn't
> be difficult for such a scenario to induce systematic key collisions.

I've seen that happen. cleanup of the effected records was tricky,
a few customers were inconvenienced

--
⚂⚃ 100% natural