Обсуждение: Proposal for GUID datatype
Folks, I would like to submit the following proposal regarding the implementation of the GUID datatype. Based on the findings, thoughts and the discussion we have had in the past, I am going to propose the following: 1) Datatype name would be "uuid" or "guid". example: create table tbl (fld uuid, fld2 ....); 2) Accepted input/output datatype and formats: The input/output datatype would be string(36) 2a) Three input formats are supported. example: insert into tbl (fld) values('1dfb39af-b56a-40b8-a903-b5b31567c3ce'); insert into tbl (fld) values('{1dfb39af-b56a-40b8-a903-b5b31567c3ce}'); insert into tbl (fld) values('1dfb39afb56a40b8a903b5b31567c3ce'); 2b) Only one default output format is supported. example: # select fld from tbl; fld --------------------------------------+ 1dfb39af-b56a-40b8-a903-b5b31567c3ce | 2b.a) An additional function will be available to provide other output formats or an existing function like to_char will support the additional formatting options. 3) Internal datatype Because there is not going to be any kind of (mathematically meaningful) calculation on the guid values, the internal datatype would be just a simple 16 byte unsigned char (uint8). This would help when comparing these values and can also be compressed inline Proposed data structure would be: typedef struct uuid_t { char data[16]; } uuid_t; 4) Comparing functions and operators The default comparing functions and operators like = < != > etc, etc.. would be implemented as required. Note that guid >= guid would not mean anything. The values will internally be compared as strings. 5) support functions: because uuid could also be used as PK or unique values, additional function(s) will be available to produce a uuid value to be used in a field's default value like sequences or PL/pgSQL etc.. etc... example; create table tbl( ID uuid default ('new_uuid()'),.... ); 5.a) If needed an additional macro-type like SERIAL could also be developed in later stage. 6) pg_type layout: typname = uuid typnamespace = pg_catalog typowner = (default) // db owner typlen = 16 typbyval = FALSE // type is byref typtype = b // built-in type typisdefiled = true typdelim = ',' // ',' seperator for array of uuid typrelid = 0 typelem = 0 typinput = to be defined later typoutput = to be defined later typreceive = not supported typsend = not supported typanalyze = 0 // default analyze typalign = c typstorage = m // stored compressed inline typnotnull = false // can be null other pg_type attributes are set to default values. Please send your comments and suggestions to complete or modify this proposal. Regards, Gevik
Martijn van Oosterhout wrote: > Just a few comments, > > On Fri, Sep 08, 2006 at 09:18:20PM +0200, Gevik Babakhani wrote: > >> 5) support functions: >> because uuid could also be used as PK or unique values, additional >> function(s) will be available to produce a uuid value to be used in >> a field's default value like sequences or PL/pgSQL etc.. etc... >> >> example; >> >> create table tbl( >> ID uuid default ('new_uuid()'), >> .... >> ); >> > > That would be: ID uuid default new_uuid(); > > >> typstorage = m // stored compressed inline >> > > Compression is not going to work on such short values, the header will > almost longer, just use 'p' like every other fixed length type. > > Have a nice day, > Point taken, thank you.
Gevik Babakhani <pgdev@xs4all.nl> writes: > typreceive = not supported > typsend = not supported Really? Why not? I would suggest that the default output format just be 32 hex characters, since that would render the type useful for purposes other than one narrow definition of UUID. regards, tom lane
Just a few comments, On Fri, Sep 08, 2006 at 09:18:20PM +0200, Gevik Babakhani wrote: > 5) support functions: > because uuid could also be used as PK or unique values, additional > function(s) will be available to produce a uuid value to be used in > a field's default value like sequences or PL/pgSQL etc.. etc... > > example; > > create table tbl( > ID uuid default ('new_uuid()'), > .... > ); That would be: ID uuid default new_uuid(); > typstorage = m // stored compressed inline Compression is not going to work on such short values, the header will almost longer, just use 'p' like every other fixed length type. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
On Fri, 2006-09-08 at 16:17 -0400, Tom Lane wrote: > Gevik Babakhani <pgdev@xs4all.nl> writes: > > typreceive = not supported > > typsend = not supported > > Really? Why not? You are right, typreceive/typsend are also needed. How would you advice to test this? > I would suggest that the default output format just be 32 hex > characters, since that would render the type useful for purposes > other than one narrow definition of UUID. Agreed. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly >
On Friday 08 September 2006 15:18, Gevik Babakhani wrote: > 2a) Three input formats are supported. > example: > insert into tbl (fld) values('1dfb39af-b56a-40b8-a903-b5b31567c3ce'); > insert into tbl (fld) values('{1dfb39af-b56a-40b8-a903-b5b31567c3ce}'); > insert into tbl (fld) values('1dfb39afb56a40b8a903b5b31567c3ce'); Please extend your list. java's RMI package generates UIDs with the following format: [head order 21:19]$ bsh.sh BeanShell 1.3.0 - by Pat Niemeyer (pat@pat.net) bsh % print(new java.rmi.server.UID()); 3b732da7:10d9029b3eb:-8000 bsh % So forms that use colons instead of dashes seem appropriate. Or better still, make it configurable. jan -- -------------------------------------------------------------- Jan de Visser jdevisser@digitalfairway.com Baruk Khazad! Khazad ai-menu! --------------------------------------------------------------
On Fri, Sep 08, 2006 at 09:24:19PM -0400, Jan de Visser wrote: > On Friday 08 September 2006 15:18, Gevik Babakhani wrote: > > 2a) Three input formats are supported. > > example: > > insert into tbl (fld) values('1dfb39af-b56a-40b8-a903-b5b31567c3ce'); > > insert into tbl (fld) values('{1dfb39af-b56a-40b8-a903-b5b31567c3ce}'); > > insert into tbl (fld) values('1dfb39afb56a40b8a903b5b31567c3ce'); > Please extend your list. java's RMI package generates UIDs with the > following format: > [head order 21:19]$ bsh.sh > BeanShell 1.3.0 - by Pat Niemeyer (pat@pat.net) > bsh % print(new java.rmi.server.UID()); > 3b732da7:10d9029b3eb:-8000 > bsh % That doesn't look like a UUID/GUID - unless it trims leading zeroes? UUID/GUID = 128-bits = 32 hexadecimal characters. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
On Friday 08 September 2006 21:34, mark@mark.mielke.cc wrote: > On Fri, Sep 08, 2006 at 09:24:19PM -0400, Jan de Visser wrote: > > On Friday 08 September 2006 15:18, Gevik Babakhani wrote: > > > 2a) Three input formats are supported. > > > example: > > > insert into tbl (fld) values('1dfb39af-b56a-40b8-a903-b5b31567c3ce'); > > > insert into tbl (fld) values('{1dfb39af-b56a-40b8-a903-b5b31567c3ce}'); > > > insert into tbl (fld) values('1dfb39afb56a40b8a903b5b31567c3ce'); > > > > Please extend your list. java's RMI package generates UIDs with the > > following format: > > > > [head order 21:19]$ bsh.sh > > BeanShell 1.3.0 - by Pat Niemeyer (pat@pat.net) > > bsh % print(new java.rmi.server.UID()); > > 3b732da7:10d9029b3eb:-8000 > > bsh % > > That doesn't look like a UUID/GUID - unless it trims leading zeroes? > > UUID/GUID = 128-bits = 32 hexadecimal characters. I know, it's a UID. Not a GUID. But many people (including myself) use it as a base to generate GUIDs. And that's not really the point. The point is that I'm afraid he's going to restrict himself to much. > > Cheers, > mark jan -- -------------------------------------------------------------- Jan de Visser jdevisser@digitalfairway.com Baruk Khazad! Khazad ai-menu! --------------------------------------------------------------
On Fri, Sep 08, 2006 at 10:49:21PM -0400, Jan de Visser wrote: > On Friday 08 September 2006 21:34, mark@mark.mielke.cc wrote: > > On Fri, Sep 08, 2006 at 09:24:19PM -0400, Jan de Visser wrote: > > > On Friday 08 September 2006 15:18, Gevik Babakhani wrote: > > > > 2a) Three input formats are supported. > > > > example: > > > > insert into tbl (fld) values('1dfb39af-b56a-40b8-a903-b5b31567c3ce'); > > > > insert into tbl (fld) values('{1dfb39af-b56a-40b8-a903-b5b31567c3ce}'); > > > > insert into tbl (fld) values('1dfb39afb56a40b8a903b5b31567c3ce'); > > > > > > Please extend your list. java's RMI package generates UIDs with the > > > following format: > > > > > > [head order 21:19]$ bsh.sh > > > BeanShell 1.3.0 - by Pat Niemeyer (pat@pat.net) > > > bsh % print(new java.rmi.server.UID()); > > > 3b732da7:10d9029b3eb:-8000 > > > bsh % > > That doesn't look like a UUID/GUID - unless it trims leading zeroes? > > UUID/GUID = 128-bits = 32 hexadecimal characters. > I know, it's a UID. Not a GUID. But many people (including myself) > use it as a base to generate GUIDs. And that's not really the > point. The point is that I'm afraid he's going to restrict himself > to much. I'm not sure what it would do with this input though. Where do the bits go in a fixed 128-bit field? I think with and with dashes is fine. { and } starts to get into the unnecessary. At this point, it may as well ignore all punctuation characters, which stays fine. Less error checking required. But if the input isn't 32 hexadecimal characters - I don't see how it fits the UUID/GUID type. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
On Saturday 09 September 2006 00:42, mark@mark.mielke.cc wrote: > But if the input isn't 32 hexadecimal characters - I don't see how > it fits the UUID/GUID type. Again, it wasn't about that particular *value* (which, as I concurred, is not a [GU]UID). It was about the fact that different tools spit out stuff in different formats, and that it would suck royally if you would have to convert because Gevik didn't think of the particular format of your particular tool. But I guess your suggestion of just ignoring punctuation would work just fine. jan -- -------------------------------------------------------------- Jan de Visser jdevisser@digitalfairway.com Baruk Khazad! Khazad ai-menu! --------------------------------------------------------------
Jan de Visser wrote: > On Saturday 09 September 2006 00:42, mark@mark.mielke.cc wrote: > > But if the input isn't 32 hexadecimal characters - I don't see how > > it fits the UUID/GUID type. > > Again, it wasn't about that particular *value* (which, as I concurred, is not > a [GU]UID). It was about the fact that different tools spit out stuff in > different formats, and that it would suck royally if you would have to > convert because Gevik didn't think of the particular format of your > particular tool. > > But I guess your suggestion of just ignoring punctuation would work just fine. Would it? The output you showed was 3b732da7:10d9029b3eb:-8000 What to do with the :-8000 part? Do you discard the whole :-8000? Do you discard the :- and keep the 8000? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Sat, Sep 09, 2006 at 01:03:24AM -0400, Jan de Visser wrote: > On Saturday 09 September 2006 00:42, mark@mark.mielke.cc wrote: > > But if the input isn't 32 hexadecimal characters - I don't see how > > it fits the UUID/GUID type. > Again, it wasn't about that particular *value* (which, as I > concurred, is not a [GU]UID). It was about the fact that different > tools spit out stuff in different formats, and that it would suck > royally if you would have to convert because Gevik didn't think of > the particular format of your particular tool. > But I guess your suggestion of just ignoring punctuation would work > just fine. I don't think so. If it isn't 128 bits - and you want to fit it into 128 bits, it means padding. Where should the padding go? As application specific, it is up to the application to convert. I don't agree that a GUID/UUID type should do much more than allow a GUID/UUID to fit into it. It is not a type that can be used to represent any UID you happen to have. With a 64 bit UID or 96 bit UID - why would you want to store it in 128-bits anyways? This would be a waste of space. For the same reason I wouldn't want to store an MD5SUM into a UUID type. It's an abuse of the type. If the type was called hexstring32 or something like that, sure. Philosophy is involved. Making UUID understand every possible format, and know how to pad each format that doesn't use 32 hexadecimal characters is outside the scope of this discussion, and guarantees that it will never be implemented. There will always be one more format somebody wants to cram into it. UUID is not a UID. UUID is a 128-bit number with a fairly specific format that includes a version number, and each of the bit segments within it are used to signify a different type of data, based upon the version number. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
On Saturday 09 September 2006 01:33, mark@mark.mielke.cc wrote: > I don't think so. If it isn't 128 bits - and you want to fit it into > 128 bits, it means padding. Where should the padding go? As application > specific, it is up to the application to convert. I am not saying that. I am just saying that you shouldn't limit yourself to any particular input formats. I understand that the example I gave is not a full GUID. As I said, I use that result as a base for a 128 bit GUID. Aargh. jan -- -------------------------------------------------------------- Jan de Visser jdevisser@digitalfairway.com Baruk Khazad! Khazad ai-menu! --------------------------------------------------------------
On Sat, Sep 09, 2006 at 07:06:23AM -0400, Jan de Visser wrote: > On Saturday 09 September 2006 01:33, mark@mark.mielke.cc wrote: > > I don't think so. If it isn't 128 bits - and you want to fit it into > > 128 bits, it means padding. Where should the padding go? As application > > specific, it is up to the application to convert. > I am not saying that. I am just saying that you shouldn't limit yourself to > any particular input formats. I understand that the example I gave is not a > full GUID. As I said, I use that result as a base for a 128 bit GUID. > Aargh. You say Aargh - but you still haven't explained what the parser would do. It ignores punctuation - great - so now what? You provided a number that used an odd number of hexadecimal characters, that was less than 128-bits worth of data. Exactly which 128-bit value would you expect it to represent? Where does the padding go? The beginning? The end? Around the punctuation? When you say you use it as a base - what do you mean, and may your intention match the intention of anybody else who wishes to stuff a non-UUID into a UUID? I'm not trying to be difficult - you never answered this, and it is a decision that the software would need to make, if it were to do as you suggest. I think it is improper, and that it is up to the application to pad the non-128 bit value to 128-bits *before* passing it in - although I still think this is invalid. What "version" of UUID are you using? Please don't answer again that you aren't saying something. Three answers like this hasn't got us anywhere. Just tell me what 128-bit value you think your original suggestion would represent (where should the padding go?), and then justify why it should be a UUID at all. Thanks, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
On Sat, Sep 09, 2006 at 08:29:16AM -0400, mark@mark.mielke.cc wrote: > On Sat, Sep 09, 2006 at 07:06:23AM -0400, Jan de Visser wrote: > > On Saturday 09 September 2006 01:33, mark@mark.mielke.cc wrote: > > > I don't think so. If it isn't 128 bits - and you want to fit it into > > > 128 bits, it means padding. Where should the padding go? As application > > > specific, it is up to the application to convert. > > I am not saying that. I am just saying that you shouldn't limit yourself to > > any particular input formats. I understand that the example I gave is not a > > full GUID. As I said, I use that result as a base for a 128 bit GUID. > > Aargh. > ... > It ignores punctuation - great - so now what? You provided a number that > used an odd number of hexadecimal characters, that was less than 128-bits > worth of data. > Exactly which 128-bit value would you expect it to represent? Where does > the padding go? The beginning? The end? Around the punctuation? When you > say you use it as a base - what do you mean, and may your intention match > the intention of anybody else who wishes to stuff a non-UUID into a UUID? I'll be more specific, because I obviously wasn't getting through before. According to the JavaDoc, it is: 32-bit int process identifier 64-bit int time identiier 16-bit int counter If non-standard, and possibly allowing for overlap with a standard UUID, I believe this could map to a UUID. The 32-bit value would fit where a random number or MAC address is normally represented as a 48-bit value. Some 16-bit prefix, possibly all-zero might be used. The 64-bit value would fit where the time value is normally put into the UUID. The counter could be where the counter is normally put. A somewhat compatible version number could be put in the version number part. You see now? Punctuation is actually *significant* in this case, but only the Java RMI libraries know what the value represents, and how it might be mapped to UUID. If you look at your original value, you should note the odd number of hexadecimal characters. This probably means leading zeroes were stripped, and they would need to be restored in the correct place. This is intimate knowledge of a very specific UID type, with a specific number of bits, and automatic conversion to a different UUID, with a greater number of bits, where the bits are in the wrong order and not properly padded if interpretted raw. Make sense now? Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
On Sat, 9 Sep 2006, Jan de Visser wrote: > On Saturday 09 September 2006 01:33, mark@mark.mielke.cc wrote: > > I don't think so. If it isn't 128 bits - and you want to fit it into > > 128 bits, it means padding. Where should the padding go? As application > > specific, it is up to the application to convert. > > I am not saying that. I am just saying that you shouldn't limit yourself to > any particular input formats. I'd wonder if it'd be better to have a set of literal formats and "input" functions like to_guid(text, text) for more complicated cases. The broader we make the literal format, the harder it is to determine if the input actually is what was intended. For example, did the user mean to put that ipv6 address in this guid column and what about this other ipv6 address looking thing which is abbreviated, are we putting in what the user expects?
Hi, Gevik, Gevik Babakhani wrote: > typreceive = not supported > typsend = not supported Any reason why you don't want to support binary transmissions? Thanks, Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
Jan de Visser wrote: > On Friday 08 September 2006 15:18, Gevik Babakhani wrote: >> 2a) Three input formats are supported. >> example: >> insert into tbl (fld) values('1dfb39af-b56a-40b8-a903-b5b31567c3ce'); >> insert into tbl (fld) values('{1dfb39af-b56a-40b8-a903-b5b31567c3ce}'); >> insert into tbl (fld) values('1dfb39afb56a40b8a903b5b31567c3ce'); > > Please extend your list. java's RMI package generates UIDs with the following > format: > > [head order 21:19]$ bsh.sh > BeanShell 1.3.0 - by Pat Niemeyer (pat@pat.net) > bsh % print(new java.rmi.server.UID()); > 3b732da7:10d9029b3eb:-8000 > bsh % > > So forms that use colons instead of dashes seem appropriate. > > Or better still, make it configurable. > > jan > RMI UID's has nothing in common with UUID's so I fail to see how they have any relevance here. I've never seen any other representation of the UUID's that the two that Gevik wants to support. Please note that UUID is a very well known concept and not an arbitrary 128 bit storage. http://en.wikipedia.org/wiki/UUID is a good source of information. The appointed RFC actually contains source code. Kind Regards, Thomas Hallgren
On Monday 11 September 2006 11:05, Thomas Hallgren wrote: > Jan de Visser wrote: > > On Friday 08 September 2006 15:18, Gevik Babakhani wrote: > >> 2a) Three input formats are supported. > >> example: > >> insert into tbl (fld) values('1dfb39af-b56a-40b8-a903-b5b31567c3ce'); > >> insert into tbl (fld) values('{1dfb39af-b56a-40b8-a903-b5b31567c3ce}'); > >> insert into tbl (fld) values('1dfb39afb56a40b8a903b5b31567c3ce'); > > > > Please extend your list. java's RMI package generates UIDs with the > > following format: > > > > [head order 21:19]$ bsh.sh > > BeanShell 1.3.0 - by Pat Niemeyer (pat@pat.net) > > bsh % print(new java.rmi.server.UID()); > > 3b732da7:10d9029b3eb:-8000 > > bsh % > > > > So forms that use colons instead of dashes seem appropriate. > > > > Or better still, make it configurable. > > > > jan > > RMI UID's has nothing in common with UUID's so I fail to see how they > have any relevance here. I've never seen any other representation of the > UUID's that the two that Gevik wants to support. Please note that UUID > is a very well known concept and not an arbitrary 128 bit storage. > > http://en.wikipedia.org/wiki/UUID is a good source of information. The > appointed RFC actually contains source code. OK, good info, and something I was missing/didn't know. Apparently there is a JDK1.5 class that does "the right thing". > > Kind Regards, > Thomas Hallgren Crawling back under my rock, jan -- -------------------------------------------------------------- Jan de Visser jdevisser@digitalfairway.com Baruk Khazad! Khazad ai-menu! --------------------------------------------------------------
I forgot the mention that I did not want to support those two in my initial version. But yesterday I started to work on those anyway :) typreceive and typsend On Mon, 2006-09-11 at 15:58 +0200, Markus Schaber wrote: > Hi, Gevik, > > Gevik Babakhani wrote: > > > typreceive = not supported > > typsend = not supported > > Any reason why you don't want to support binary transmissions? > > Thanks, > Markus > >