Обсуждение: Using PK value as a String
I have a table named table_Users: CREATE TABLE table_Users ( UserID character(40) NOT NULL default '', Username varchar(256) NOT NULL default '', Email varchar(256) NOT NULL default '' etc... ); The UserID is a character(40) and is generated using UUID function. We started making making other tables and ended up not really using UserID, but instead using Username as the unique identifier for the other tables. Now, we pass and insert the Username to for discussions, wikis, etc, for all the modules we have developed. I was wondering if it would be a performance improvement to use the 40 Character UserID instead of Username when querying the other tables, or if we should change the UserID to a serial value and use that to query the other tables. Or just keep the way things are because it doesn't really make much a difference. We are still in development and its about half done, but if there is going to be performance issues because using PK as a String value, we can just take a day change it before any production as been started. Anyway advice you can give would be much appreciated. Postgres performance guru where are you?
"Jay" <arrival123@gmail.com> writes: > I have a table named table_Users: > > CREATE TABLE table_Users ( > UserID character(40) NOT NULL default '', > Username varchar(256) NOT NULL default '', > Email varchar(256) NOT NULL default '' > etc... > ); > > The UserID is a character(40) and is generated using UUID function. We > started making making other tables and ended up not really using > UserID, but instead using Username as the unique identifier for the > other tables. Now, we pass and insert the Username to for discussions, > wikis, etc, for all the modules we have developed. I was wondering if > it would be a performance improvement to use the 40 Character UserID > instead of Username when querying the other tables, or if we should > change the UserID to a serial value and use that to query the other > tables. Or just keep the way things are because it doesn't really make > much a difference. Username would not be any slower than UserID unless you have a lot of usernames longer than 40 characters. However making UserID an integer would be quite a bit more efficient. It would take 4 bytes instead of as the length of the Username which adds up when it's in all your other tables... Also internationalized text collations are quite a bit more expensive than a simple integer comparison. But the real question here is what's the better design. If you use Username you'll be cursing if you ever want to provide a facility to allow people to change their usernames. You may not want such a facility now but one day... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
--- On Mon, 11/8/08, Gregory Stark <stark@enterprisedb.com> wrote: > From: Gregory Stark <stark@enterprisedb.com> > Subject: Re: [PERFORM] Using PK value as a String > To: "Jay" <arrival123@gmail.com> > Cc: pgsql-performance@postgresql.org > Date: Monday, 11 August, 2008, 10:30 AM > "Jay" <arrival123@gmail.com> writes: > > > I have a table named table_Users: > > > > CREATE TABLE table_Users ( > > UserID character(40) NOT NULL default > '', > > Username varchar(256) NOT NULL default > '', > > Email varchar(256) NOT NULL default > '' > > etc... > > ); > > > > The UserID is a character(40) and is generated using > UUID function. We > > started making making other tables and ended up not > really using > > UserID, but instead using Username as the unique > identifier for the > > other tables. Now, we pass and insert the Username to > for discussions, > > wikis, etc, for all the modules we have developed. I > was wondering if > > it would be a performance improvement to use the 40 > Character UserID > > instead of Username when querying the other tables, or > if we should > > change the UserID to a serial value and use that to > query the other > > tables. Or just keep the way things are because it > doesn't really make > > much a difference. > > Username would not be any slower than UserID unless you > have a lot of > usernames longer than 40 characters. > > However making UserID an integer would be quite a bit more > efficient. It would > take 4 bytes instead of as the length of the Username which > adds up when it's > in all your other tables... Also internationalized text > collations are quite a > bit more expensive than a simple integer comparison. > > But the real question here is what's the better design. > If you use Username > you'll be cursing if you ever want to provide a > facility to allow people to > change their usernames. You may not want such a facility > now but one day... > I don't understand Gregory's suggestion about the design. I thought using natural primary keys as opposed to surrogate onesis a better design strategy, even when it comes to performance considerations and even more so if there are complex relationshipswithin the database. Regards, Valentin > -- > Gregory Stark > EnterpriseDB http://www.enterprisedb.com > Ask me about EnterpriseDB's On-Demand Production > Tuning > > -- > Sent via pgsql-performance mailing list > (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance __________________________________________________________ Not happy with your email address?. Get the one you really want - millions of new email addresses available now at Yahoo! http://uk.docs.yahoo.com/ymail/new.html
If UserID just be unique internal key and the unique id of other tables, I'd like sequence, which is unique and just use 8 bytes(bigint) When it querying other tables, it will faster , and disk space smaller than UUID(40 bytes). 莫建祥 阿里巴巴软件(上海)有限公司 研发中心-IM服务端开发部 联系方式:86-0571-85022088-13072 贸易通ID:jaymo 淘宝ID:jackem 公司网站:www.alisoft.com wiki:http://10.0.32.21:1688/confluence/pages/viewpage.action?pageId=10338 -----邮件原件----- 发件人: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] 代表 Jay 发送时间: 2008年8月11日 15:35 收件人: pgsql-performance@postgresql.org 主题: [PERFORM] Using PK value as a String I have a table named table_Users: CREATE TABLE table_Users ( UserID character(40) NOT NULL default '', Username varchar(256) NOT NULL default '', Email varchar(256) NOT NULL default '' etc... ); The UserID is a character(40) and is generated using UUID function. We started making making other tables and ended up not really using UserID, but instead using Username as the unique identifier for the other tables. Now, we pass and insert the Username to for discussions, wikis, etc, for all the modules we have developed. I was wondering if it would be a performance improvement to use the 40 Character UserID instead of Username when querying the other tables, or if we should change the UserID to a serial value and use that to query the other tables. Or just keep the way things are because it doesn't really make much a difference. We are still in development and its about half done, but if there is going to be performance issues because using PK as a String value, we can just take a day change it before any production as been started. Anyway advice you can give would be much appreciated. Postgres performance guru where are you? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
On Aug 11, 2008, at 4:30 AM, Gregory Stark wrote: > "Jay" <arrival123@gmail.com> writes: > >> I have a table named table_Users: >> >> CREATE TABLE table_Users ( >> UserID character(40) NOT NULL default '', >> Username varchar(256) NOT NULL default '', >> Email varchar(256) NOT NULL default '' >> etc... >> ); >> >> The UserID is a character(40) and is generated using UUID function. >> We >> started making making other tables and ended up not really using >> UserID, but instead using Username as the unique identifier for the >> other tables. Now, we pass and insert the Username to for >> discussions, >> wikis, etc, for all the modules we have developed. I was wondering if >> it would be a performance improvement to use the 40 Character UserID >> instead of Username when querying the other tables, or if we should >> change the UserID to a serial value and use that to query the other >> tables. Or just keep the way things are because it doesn't really >> make >> much a difference. > > Username would not be any slower than UserID unless you have a lot of > usernames longer than 40 characters. > > However making UserID an integer would be quite a bit more > efficient. It would > take 4 bytes instead of as the length of the Username which adds up > when it's > in all your other tables... Also internationalized text collations > are quite a > bit more expensive than a simple integer comparison. > > But the real question here is what's the better design. If you use > Username > you'll be cursing if you ever want to provide a facility to allow > people to > change their usernames. You may not want such a facility now but one > day... > If you generate UUID's with the UUID function and you are on 8.3, why not use the UUID type to store it? Ries > -- > Gregory Stark > EnterpriseDB http://www.enterprisedb.com > Ask me about EnterpriseDB's On-Demand Production Tuning > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org > ) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance Ries van Twisk tags: Freelance TYPO3 Glassfish JasperReports JasperETL Flex Blaze-DS WebORB PostgreSQL DB-Architect email: ries@vantwisk.nl web: http://www.rvantwisk.nl/ skype: callto://r.vantwisk
Valentin Bogdanov wrote: > --- On Mon, 11/8/08, Gregory Stark <stark@enterprisedb.com> wrote: > >> From: Gregory Stark <stark@enterprisedb.com> >> Subject: Re: [PERFORM] Using PK value as a String >> To: "Jay" <arrival123@gmail.com> >> Cc: pgsql-performance@postgresql.org >> Date: Monday, 11 August, 2008, 10:30 AM >> "Jay" <arrival123@gmail.com> writes: >> >>> I have a table named table_Users: >>> >>> CREATE TABLE table_Users ( >>> UserID character(40) NOT NULL default >> '', >>> Username varchar(256) NOT NULL default >> '', >>> Email varchar(256) NOT NULL default >> '' >>> etc... >>> ); >>> ... >> But the real question here is what's the better design. >> If you use Username >> you'll be cursing if you ever want to provide a >> facility to allow people to >> change their usernames. You may not want such a facility >> now but one day... >> > > I don't understand Gregory's suggestion about the design. I thought > using natural primary keys as opposed to surrogate ones is a better > design strategy, even when it comes to performance considerations > and even more so if there are complex relationships within the database. No, exactly the opposite. Data about users (such as name, email address, etc.) are rarely a good choice as a foreign key,and shouldn't be considered "keys" in most circumstances. As Gregory points out, you're spreading the user's name acrossthe database, effectively denormalizing it. Instead, you should have a user record, with an arbitrary key, an integer or OID, that you use as the foreign key for allother tables. That way, when the username changes, only one table will be affected. And it's much more efficient touse an integer as the key than a long string. Craig
Valentin Bogdanov schrieb: > --- On Mon, 11/8/08, Gregory Stark <stark@enterprisedb.com> wrote: > > >> From: Gregory Stark <stark@enterprisedb.com> >> Subject: Re: [PERFORM] Using PK value as a String >> To: "Jay" <arrival123@gmail.com> >> Cc: pgsql-performance@postgresql.org >> Date: Monday, 11 August, 2008, 10:30 AM >> "Jay" <arrival123@gmail.com> writes: >> >> >>> I have a table named table_Users: >>> >>> CREATE TABLE table_Users ( >>> UserID character(40) NOT NULL default >>> >> '', >> >>> Username varchar(256) NOT NULL default >>> >> '', >> >>> Email varchar(256) NOT NULL default >>> >> '' >> >>> etc... >>> ); >>> >>> The UserID is a character(40) and is generated using >>> >> UUID function. We >> >>> started making making other tables and ended up not >>> >> really using >> >>> UserID, but instead using Username as the unique >>> >> identifier for the >> >>> other tables. Now, we pass and insert the Username to >>> >> for discussions, >> >>> wikis, etc, for all the modules we have developed. I >>> >> was wondering if >> >>> it would be a performance improvement to use the 40 >>> >> Character UserID >> >>> instead of Username when querying the other tables, or >>> >> if we should >> >>> change the UserID to a serial value and use that to >>> >> query the other >> >>> tables. Or just keep the way things are because it >>> >> doesn't really make >> >>> much a difference. >>> >> Username would not be any slower than UserID unless you >> have a lot of >> usernames longer than 40 characters. >> >> However making UserID an integer would be quite a bit more >> efficient. It would >> take 4 bytes instead of as the length of the Username which >> adds up when it's >> in all your other tables... Also internationalized text >> collations are quite a >> bit more expensive than a simple integer comparison. >> >> But the real question here is what's the better design. >> If you use Username >> you'll be cursing if you ever want to provide a >> facility to allow people to >> change their usernames. You may not want such a facility >> now but one day... >> >> > > I don't understand Gregory's suggestion about the design. I thought using natural primary keys as opposed to surrogateones is a better design strategy, even when it comes to performance considerations and even more so if there arecomplex relationships within the database. > > Regards, > Valentin > > UUID is already a surrogate key not a natural key, in no aspect better than a numeric key, just taking a lot more space. So why not use int4/int8?
You guys totally rock!
I guess, bottom line, we should take that extra day to convert our PK and FK to a numerical value, using BIG INT to be on the save side. (Even though Wikipedia's UserID uses just an integer as data type)
To Gregory: Thank you for you valuable statement.
"But the real question here is what's the better design. If you use Username you'll be cursing if you ever want to provide a facility to allow people to change their usernames. You may not want such a facility now but one day" I think you hit the nail on the head with this comment. If a member really wants to change their username, IE: Choose to go with IloveUSara, only to be dumped on the alter, who am I to say no.
To Valentin: I wish someone would prove us both wrong or right. I still thought it wasn't a bad idea to use username a varchar(256) to interact with all the modules... Well thats what I thought when I first started writing the tables...
To Jay: Thanks for keeping it short and simple. "I'd like sequence, which is unique and just use 8 bytes(bigint) When it querying other tables, it will faster , and disk space smaller than UUID(40 bytes)." I'm taking your advice on this^^ Although wikipedia's postgresql database schema still stands.
To Craig: Yes, I agree. Please see my comment on IloveUSara.
To Mario: Let's go! I'm Mario... Sorry, I love Mario Kart. Especially on the old super famacon. Going with int8, thank you for the advice.
--
Regards,
Jay Kang
This e-mail is intended only for the proper person to whom it is addressed and may contain legally privileged and/or confidential information. If you received this communication erroneously, please notify me by reply e-mail, delete this e-mail and all your copies of this e-mail and do not review, disseminate, redistribute, make other use of, rely upon, or copy this communication. Thank you.
I guess, bottom line, we should take that extra day to convert our PK and FK to a numerical value, using BIG INT to be on the save side. (Even though Wikipedia's UserID uses just an integer as data type)
To Gregory: Thank you for you valuable statement.
"But the real question here is what's the better design. If you use Username you'll be cursing if you ever want to provide a facility to allow people to change their usernames. You may not want such a facility now but one day" I think you hit the nail on the head with this comment. If a member really wants to change their username, IE: Choose to go with IloveUSara, only to be dumped on the alter, who am I to say no.
To Valentin: I wish someone would prove us both wrong or right. I still thought it wasn't a bad idea to use username a varchar(256) to interact with all the modules... Well thats what I thought when I first started writing the tables...
To Jay: Thanks for keeping it short and simple. "I'd like sequence, which is unique and just use 8 bytes(bigint) When it querying other tables, it will faster , and disk space smaller than UUID(40 bytes)." I'm taking your advice on this^^ Although wikipedia's postgresql database schema still stands.
To Craig: Yes, I agree. Please see my comment on IloveUSara.
To Mario: Let's go! I'm Mario... Sorry, I love Mario Kart. Especially on the old super famacon. Going with int8, thank you for the advice.
On Tue, Aug 12, 2008 at 6:58 PM, Mario Weilguni <mweilguni@sime.com> wrote:
Valentin Bogdanov schrieb:UUID is already a surrogate key not a natural key, in no aspect better than a numeric key, just taking a lot more space.--- On Mon, 11/8/08, Gregory Stark <stark@enterprisedb.com> wrote:
From: Gregory Stark <stark@enterprisedb.com>
Subject: Re: [PERFORM] Using PK value as a String
To: "Jay" <arrival123@gmail.com>
Cc: pgsql-performance@postgresql.org
Date: Monday, 11 August, 2008, 10:30 AM
"Jay" <arrival123@gmail.com> writes:
I have a table named table_Users:'',
CREATE TABLE table_Users (
UserID character(40) NOT NULL default
Username varchar(256) NOT NULL default'',
Email varchar(256) NOT NULL default''
etc...UUID function. We
);
The UserID is a character(40) and is generated using
started making making other tables and ended up notreally using
UserID, but instead using Username as the uniqueidentifier for the
other tables. Now, we pass and insert the Username tofor discussions,
wikis, etc, for all the modules we have developed. Iwas wondering if
it would be a performance improvement to use the 40Character UserID
instead of Username when querying the other tables, orif we should
change the UserID to a serial value and use that toquery the other
tables. Or just keep the way things are because itdoesn't really make
much a difference.Username would not be any slower than UserID unless you
have a lot of
usernames longer than 40 characters.
However making UserID an integer would be quite a bit more
efficient. It would
take 4 bytes instead of as the length of the Username which
adds up when it's
in all your other tables... Also internationalized text
collations are quite a
bit more expensive than a simple integer comparison.
But the real question here is what's the better design.
If you use Username
you'll be cursing if you ever want to provide a
facility to allow people to
change their usernames. You may not want such a facility
now but one day...
I don't understand Gregory's suggestion about the design. I thought using natural primary keys as opposed to surrogate ones is a better design strategy, even when it comes to performance considerations and even more so if there are complex relationships within the database.
Regards,
Valentin
So why not use int4/int8?
--
Regards,
Jay Kang
This e-mail is intended only for the proper person to whom it is addressed and may contain legally privileged and/or confidential information. If you received this communication erroneously, please notify me by reply e-mail, delete this e-mail and all your copies of this e-mail and do not review, disseminate, redistribute, make other use of, rely upon, or copy this communication. Thank you.
"Mario Weilguni" <mweilguni@sime.com> writes: > UUID is already a surrogate key not a natural key, in no aspect better than a > numeric key, just taking a lot more space. > > So why not use int4/int8? The main reason to use UUID instead of sequences is if you want to be able to generate unique values across multiple systems. So, for example, if you want to be able to send these userids to another system which is taking registrations from lots of places. Of course that only works if that other system is already using UUIDs and you're all using good generators. You only need int8 if you might someday have more than 2 *billion* users... Probably not an urgent issue. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
In response to Gregory Stark <stark@enterprisedb.com>: > "Mario Weilguni" <mweilguni@sime.com> writes: > > > UUID is already a surrogate key not a natural key, in no aspect better than a > > numeric key, just taking a lot more space. > > > > So why not use int4/int8? > > The main reason to use UUID instead of sequences is if you want to be able to > generate unique values across multiple systems. So, for example, if you want > to be able to send these userids to another system which is taking > registrations from lots of places. Of course that only works if that other > system is already using UUIDs and you're all using good generators. Note that in many circumstances, there are other options than UUIDs. If you have control over all the systems generating values, you can prefix each generated value with a system ID (i.e. make the high 8 bits the system ID and the remaining bits come from a sequence) This allows you to still use int4 or int8. UUID is designed to be a universal solution. But universal solutions are frequently less efficient than custom-tailored solutions. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023
Bill Moran wrote:
Other benefits include:
- Reduced management cost. As described above, one would have to allocate keyspace in each system. By using a UUID, one can skip this step.
- Increased keyspace. Even if keyspace allocation is performed, an int4 only has 32-bit of keyspace to allocate. The IPv4 address space is already over 85% allocated as an example of how this can happen. 128-bits has a LOT more keyspace than 32-bits or 64-bits.
- Reduced sequence predictability. Certain forms of exploits when the surrogate key is exposed to the public, are rendered ineffective as guessing the "next" or "previous" generated key is far more difficult.
- Used as the key into a cache or other lookup table. Multiple types of records can be cached to the same storage as the sequence is intended to be universally unique.
- Flexibility to merge systems later, even if unplanned. For example, System A and System B are run independently for some time. Then, it is determined that they should be merged. If unique keys are specific to the system, this becomes far more difficult to implement than if the unique keys are universal.
That said, most uses of UUID do not require any of the above. It's a "just in case" measure, that suffers the performance cost, "just in case."
Cheers,
mark
The main reason to use UUID instead of sequences is if you want to be able to generate unique values across multiple systems. So, for example, if you want to be able to send these userids to another system which is taking registrations from lots of places. Of course that only works if that other system is already using UUIDs and you're all using good generators.Note that in many circumstances, there are other options than UUIDs. If you have control over all the systems generating values, you can prefix each generated value with a system ID (i.e. make the high 8 bits the system ID and the remaining bits come from a sequence) This allows you to still use int4 or int8. UUID is designed to be a universal solution. But universal solutions are frequently less efficient than custom-tailored solutions.
Other benefits include:
- Reduced management cost. As described above, one would have to allocate keyspace in each system. By using a UUID, one can skip this step.
- Increased keyspace. Even if keyspace allocation is performed, an int4 only has 32-bit of keyspace to allocate. The IPv4 address space is already over 85% allocated as an example of how this can happen. 128-bits has a LOT more keyspace than 32-bits or 64-bits.
- Reduced sequence predictability. Certain forms of exploits when the surrogate key is exposed to the public, are rendered ineffective as guessing the "next" or "previous" generated key is far more difficult.
- Used as the key into a cache or other lookup table. Multiple types of records can be cached to the same storage as the sequence is intended to be universally unique.
- Flexibility to merge systems later, even if unplanned. For example, System A and System B are run independently for some time. Then, it is determined that they should be merged. If unique keys are specific to the system, this becomes far more difficult to implement than if the unique keys are universal.
That said, most uses of UUID do not require any of the above. It's a "just in case" measure, that suffers the performance cost, "just in case."
Cheers,
mark
-- Mark Mielke <mark@mielke.cc>
"Mark Mielke" <mark@mark.mielke.cc> writes: > - Increased keyspace. Even if keyspace allocation is performed, an int4 only > has 32-bit of keyspace to allocate. The IPv4 address space is already over 85% > allocated as an example of how this can happen. 128-bits has a LOT more > keyspace than 32-bits or 64-bits. The rest of your points are valid (though not particularly convincing to me for most applications) but this example is bogus. The IPv4 address space is congested because of the hierarchic nature of allocations. Not because there is an actual shortage of IPv4 addresses themselves. There would be enough IPv4 for every ethernet device on the planet for decades to come if we could allocate them individually -- but we can't. That is, when allocating an organization 100 addresses if they want to be able to treat them as a contiguous network they must be allocated 128 addresses. And if they might ever grow to 129 they're better off just justifying 256 addresses today. That's not an issue for a sequence generated primary key. Arguably it *is* a problem for UUID which partitions up that 128-bits much the way the original pre-CIDR IPv4 addressing scheme partitioned up the address. But 128-bits is so much bigger it avoids running into the issue. The flip side is that sequence generated keys have to deal with gaps if record is deleted later. So the relevant question is not whether you plan to have 2 billion users at any single point in the future but rather whether you plan to ever have had 2 billion users total over your history. I suspect large networks like Yahoo or Google might be nearing or past that point now even though they probably only have a few hundred million current users. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
Hi! We use normal sequences to generate id's across multiple nodes. We use the "increment" parameter for the sequence and we specify each node to increment its sequence with for example 10 and the the first node to start the sequence at 1 and the second at 2 and so on. In that way you get an unique ID across each nodes thats an INT. Not in chronological order but it's unique ;) The only issue with this is that the value you chose for increment value is your node limit. Cheers! Mathias On 12 aug 2008, at 14.51, Gregory Stark wrote: > "Mario Weilguni" <mweilguni@sime.com> writes: > >> UUID is already a surrogate key not a natural key, in no aspect >> better than a >> numeric key, just taking a lot more space. >> >> So why not use int4/int8? > > The main reason to use UUID instead of sequences is if you want to > be able to > generate unique values across multiple systems. So, for example, if > you want > to be able to send these userids to another system which is taking > registrations from lots of places. Of course that only works if that > other > system is already using UUIDs and you're all using good generators. > > You only need int8 if you might someday have more than 2 *billion* > users... > Probably not an urgent issue. > > -- > Gregory Stark > EnterpriseDB http://www.enterprisedb.com > Ask me about EnterpriseDB's Slony Replication support! > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org > ) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance
Вложения
Gregory Stark wrote:
I don't disagree. Obviously, most systems people work with do not require 2**32 records. You trimmed my bottom statement where "most systems don't require any of these benefits - it's only a just in case." :-)
The point is valid - 128-bits has more keyspace than 32-bits or 64-bits. The relevance of this point to a particular application other than Facebook, Google, or Yahoo, is probably low or non-existent.
Cheers,
mark
"Mark Mielke" <mark@mark.mielke.cc> writes:- Increased keyspace. Even if keyspace allocation is performed, an int4 only has 32-bit of keyspace to allocate. The IPv4 address space is already over 85% allocated as an example of how this can happen. 128-bits has a LOT more keyspace than 32-bits or 64-bits.The rest of your points are valid (though not particularly convincing to me for most applications) but this example is bogus. The IPv4 address space is congested because of the hierarchic nature of allocations. Not because there is an actual shortage of IPv4 addresses themselves. There would be enough IPv4 for every ethernet device on the planet for decades to come if we could allocate them individually -- but we can't.
I don't disagree. Obviously, most systems people work with do not require 2**32 records. You trimmed my bottom statement where "most systems don't require any of these benefits - it's only a just in case." :-)
The point is valid - 128-bits has more keyspace than 32-bits or 64-bits. The relevance of this point to a particular application other than Facebook, Google, or Yahoo, is probably low or non-existent.
Cheers,
mark
That is, when allocating an organization 100 addresses if they want to be able to treat them as a contiguous network they must be allocated 128 addresses. And if they might ever grow to 129 they're better off just justifying 256 addresses today. That's not an issue for a sequence generated primary key. Arguably it *is* a problem for UUID which partitions up that 128-bits much the way the original pre-CIDR IPv4 addressing scheme partitioned up the address. But 128-bits is so much bigger it avoids running into the issue. The flip side is that sequence generated keys have to deal with gaps if record is deleted later. So the relevant question is not whether you plan to have 2 billion users at any single point in the future but rather whether you plan to ever have had 2 billion users total over your history. I suspect large networks like Yahoo or Google might be nearing or past that point now even though they probably only have a few hundred million current users.
-- Mark Mielke <mark@mielke.cc>
On Tue, Aug 12, 2008 at 9:46 AM, Gregory Stark <stark@enterprisedb.com> wrote: > "Mark Mielke" <mark@mark.mielke.cc> writes: > >> - Increased keyspace. Even if keyspace allocation is performed, an int4 only >> has 32-bit of keyspace to allocate. The IPv4 address space is already over 85% >> allocated as an example of how this can happen. 128-bits has a LOT more >> keyspace than 32-bits or 64-bits. > > The rest of your points are valid (though not particularly convincing to me > for most applications) but this example is bogus. The IPv4 address space is > congested because of the hierarchic nature of allocations. Not because there > is an actual shortage of IPv4 addresses themselves. There would be enough IPv4 > for every ethernet device on the planet for decades to come if we could > allocate them individually -- but we can't. Only because of NAT. There are a _lot_ of IP devices out there maybe not billions, but maybe so, and 'enough for decades' is quite a stretch. merlin
We chose UUID as PK because there is still some information in an integer key. You can see if a user has registered before someone else (user1.id < user2.id) or you can see how many new users registered in a specific period of time (compare the id of the newest user to the id a week ago). This is information which is in some cases critical. moritz
In response to Moritz Onken <onken@houseofdesign.de>: > We chose UUID as PK because there is still some information in an > integer key. > You can see if a user has registered before someone else (user1.id < > user2.id) > or you can see how many new users registered in a specific period of > time > (compare the id of the newest user to the id a week ago). This is > information > which is in some cases critical. So you're accidentally storing critical information in magic values instead of storing it explicitly? Good luck with that. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023
Am 12.08.2008 um 17:04 schrieb Bill Moran: > In response to Moritz Onken <onken@houseofdesign.de>: > >> We chose UUID as PK because there is still some information in an >> integer key. >> You can see if a user has registered before someone else (user1.id < >> user2.id) >> or you can see how many new users registered in a specific period of >> time >> (compare the id of the newest user to the id a week ago). This is >> information >> which is in some cases critical. > > So you're accidentally storing critical information in magic values > instead of storing it explicitly? > > Good luck with that. > How do I store critical information? I was just saying that it easy to get some information out of a primary key which is an incrementing integer. And it makes sense, in some rare cases, to have a PK which is some kind of random like UUIDs where you cannot guess the next value. moritz
In response to Moritz Onken <onken@houseofdesign.de>: > > Am 12.08.2008 um 17:04 schrieb Bill Moran: > > > In response to Moritz Onken <onken@houseofdesign.de>: > > > >> We chose UUID as PK because there is still some information in an > >> integer key. > >> You can see if a user has registered before someone else (user1.id < > >> user2.id) > >> or you can see how many new users registered in a specific period of > >> time > >> (compare the id of the newest user to the id a week ago). This is > >> information > >> which is in some cases critical. > > > > So you're accidentally storing critical information in magic values > > instead of storing it explicitly? > > > > Good luck with that. > > How do I store critical information? I was just saying that it easy > to get some information out of a primary key which is an incrementing > integer. And it makes sense, in some rare cases, to have a PK which > is some kind of random like UUIDs where you cannot guess the next value. I just repeated your words. Read above "this is information which is in some cases critical." If I misunderstood, then I misunderstood. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023
Am 12.08.2008 um 17:21 schrieb Bill Moran: > In response to Moritz Onken <onken@houseofdesign.de>: > >> >> Am 12.08.2008 um 17:04 schrieb Bill Moran: >> >>> In response to Moritz Onken <onken@houseofdesign.de>: >>> >>>> We chose UUID as PK because there is still some information in an >>>> integer key. >>>> You can see if a user has registered before someone else >>>> (user1.id < >>>> user2.id) >>>> or you can see how many new users registered in a specific period >>>> of >>>> time >>>> (compare the id of the newest user to the id a week ago). This is >>>> information >>>> which is in some cases critical. >>> >>> So you're accidentally storing critical information in magic values >>> instead of storing it explicitly? >>> >>> Good luck with that. >> >> How do I store critical information? I was just saying that it easy >> to get some information out of a primary key which is an incrementing >> integer. And it makes sense, in some rare cases, to have a PK which >> is some kind of random like UUIDs where you cannot guess the next >> value. > > I just repeated your words. Read above "this is information which > is in > some cases critical." > > If I misunderstood, then I misunderstood. If you are using incrementing integers as pk then you are storing this data implicitly with your primary key. Using UUIDs is a way to avoid that.
On Aug 12, 2008, at 8:21 AM, Bill Moran wrote: > In response to Moritz Onken <onken@houseofdesign.de>: > >> >> Am 12.08.2008 um 17:04 schrieb Bill Moran: >> >>> In response to Moritz Onken <onken@houseofdesign.de>: >>> >>>> We chose UUID as PK because there is still some information in an >>>> integer key. >>>> You can see if a user has registered before someone else >>>> (user1.id < >>>> user2.id) >>>> or you can see how many new users registered in a specific period >>>> of >>>> time >>>> (compare the id of the newest user to the id a week ago). This is >>>> information >>>> which is in some cases critical. >>> >>> So you're accidentally storing critical information in magic values >>> instead of storing it explicitly? >>> >>> Good luck with that. >> >> How do I store critical information? I was just saying that it easy >> to get some information out of a primary key which is an incrementing >> integer. And it makes sense, in some rare cases, to have a PK which >> is some kind of random like UUIDs where you cannot guess the next >> value. > > I just repeated your words. Read above "this is information which > is in > some cases critical." > > If I misunderstood, then I misunderstood. > I think Moritz is more concerned about leakage of critical information, rather than intentional storage of it. When a simple incrementing integer is used as an identifier in publicly visible places (webapps, ticketing systems) then that may leak more information than intended. Cheers, Steve
In response to Steve Atkins <steve@blighty.com>: > > On Aug 12, 2008, at 8:21 AM, Bill Moran wrote: > > > In response to Moritz Onken <onken@houseofdesign.de>: > > > >> > >> Am 12.08.2008 um 17:04 schrieb Bill Moran: > >> > >>> In response to Moritz Onken <onken@houseofdesign.de>: > >>> > >>>> We chose UUID as PK because there is still some information in an > >>>> integer key. > >>>> You can see if a user has registered before someone else > >>>> (user1.id < > >>>> user2.id) > >>>> or you can see how many new users registered in a specific period > >>>> of > >>>> time > >>>> (compare the id of the newest user to the id a week ago). This is > >>>> information > >>>> which is in some cases critical. > >>> > >>> So you're accidentally storing critical information in magic values > >>> instead of storing it explicitly? > >>> > >>> Good luck with that. > >> > >> How do I store critical information? I was just saying that it easy > >> to get some information out of a primary key which is an incrementing > >> integer. And it makes sense, in some rare cases, to have a PK which > >> is some kind of random like UUIDs where you cannot guess the next > >> value. > > > > I just repeated your words. Read above "this is information which > > is in > > some cases critical." > > > > If I misunderstood, then I misunderstood. > > > > I think Moritz is more concerned about leakage of critical information, > rather than intentional storage of it. When a simple incrementing > integer > is used as an identifier in publicly visible places (webapps, ticketing > systems) then that may leak more information than intended. Then I did misunderstand. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023
Bill Moran wrote: >>>>>> We chose UUID as PK because there is still some information in an >>>>>> integer key. >>>>>> You can see if a user has registered before someone else >>>>>> (user1.id < >>>>>> user2.id) >>>>>> or you can see how many new users registered in a specific period >>>>>> of >>>>>> time >>>>>> (compare the id of the newest user to the id a week ago). This is >>>>>> information >>>>>> which is in some cases critical. >> I think Moritz is more concerned about leakage of critical information, >> rather than intentional storage of it. When a simple incrementing >> integer >> is used as an identifier in publicly visible places (webapps, ticketing >> systems) then that may leak more information than intended. >> While we are on this distraction - UUID will sometimes encode "critical" information such as: 1) The timestamp (allowing users to be compared), and 2) The MAC address of the computer that generated it. So, I wouldn't say that UUID truly protects you here unless you are sure to use one of the UUID formats that is not timestamp or MAC address based. The main benefit of UUID here is the increased keyspace, so predicting sequence becomes more difficult. (Note that an all-random UUID is not better than two pairs of all-random 64-bit integers with a good random number source. :-) ) Cheers, mark -- Mark Mielke <mark@mielke.cc>
Bill Moran wrote: > In response to Steve Atkins <steve@blighty.com>: > > >> On Aug 12, 2008, at 8:21 AM, Bill Moran wrote: >> >> >>> In response to Moritz Onken <onken@houseofdesign.de>: >>> >>> >>>> Am 12.08.2008 um 17:04 schrieb Bill Moran: >>>> >>>> >>>>> In response to Moritz Onken <onken@houseofdesign.de>: >>>>> >>>>> >>>>>> We chose UUID as PK because there is still some information in an >>>>>> integer key. >>>>>> You can see if a user has registered before someone else >>>>>> (user1.id < >>>>>> user2.id) >>>>>> or you can see how many new users registered in a specific period >>>>>> of >>>>>> time >>>>>> (compare the id of the newest user to the id a week ago). This is >>>>>> information >>>>>> which is in some cases critical. >>>>>> >>>>> So you're accidentally storing critical information in magic values >>>>> instead of storing it explicitly? >>>>> >>>>> Good luck with that. >>>>> >>>> How do I store critical information? I was just saying that it easy >>>> to get some information out of a primary key which is an incrementing >>>> integer. And it makes sense, in some rare cases, to have a PK which >>>> is some kind of random like UUIDs where you cannot guess the next >>>> value. >>>> Interesting. Ordered chronologically and the next value is unguessable. >>> I just repeated your words. Read above "this is information which >>> is in >>> some cases critical." >>> >>> If I misunderstood, then I misunderstood. >>> >>> >> I think Moritz is more concerned about leakage of critical information, >> rather than intentional storage of it. When a simple incrementing >> integer >> is used as an identifier in publicly visible places (webapps, ticketing >> systems) then that may leak more information than intended. >> I think there are better ways to accomplish this than encoding and decoding/decrypting a PK. Store the sensitive data in a session variable or store session data in the database neither of which is accessible to users. It is usually a big mistake to de-normalize a table by encoding several fields in a single column PK or not. If you want to do something with the encoded data such as find one or more rows with an encoded value then you will have to inspect every row and decode it and then compare it or add it or whatever. > > Then I did misunderstand. > > -- H. Hall ReedyRiver Group LLC http://www.reedyriver.com