Обсуждение: RFD: hexstring(n) data type
Following the discussion on making UUID data type to be much more liberal ( http://archives.postgresql.org/pgsql-hackers/2008-02/msg01214.php ) I have decided to try to approach it from more general perspective. The current state of code is available at: http://qnex.net/hexstring-0.1-2008-03-03.tgz And now for more details: The idea is to have a data type HEXSTRING(n) which can have an optional typemod specifying the size of data (in bytes). Internally the datatype is binary compatible with bytea, but I/O is done as hex-encoded strings. The format is liberal, isspace() and ispunct() characters are skipped while the digits are read. I have played with two versions of hexstringin() function, one which uses strtoul() function and the other which uses "home brew" code. The latter appears to be faster, so I stayed with that. But I would appreciate comments on this from more experienced. So, what are the use cases? CREATE DOMAIN liberal_uuid AS hexstring(16); CREATE DOMAIN liberal_macaddr AS hexstring(6); ...it allows for creating other standard hex-types, as for example: CREATE DOMAIN wwn AS hexstring(8); -- http://en.wikipedia.org/wiki/World_Wide_Name Also it can be a convenient alternative to bytea format (I know, the encode()/decode() pair), especially when you have to format output data as some fancy hex-string. The code is currently just a bunch of input/output/typemod functions which appear to work. I will add casts, operators, etc -- they most likely will be nicked from bytea. What I would like to also add is ubiquitous to_char(hex, format) function. For an UUID-compatilbe format it would be called as: SELECT to_char(hex, 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx') or SELECT to_char(hex, '8x-4x-4x-4x-12x') where x is expanded as [0-9a-f] digit and X is expanded as [0-9A-F]. I am not sure what to do about variable length hexstrings, I am considering something like to_char(hex, '8X-') which would produce something like '00000000-11111111-22222222' for a 12-byte hexstring (what to do about dangling '-' ?). ...but the original case against liberal UUID was that it would make the I/O slower. My simple test: postgres=# CREATE TABLE uuids AS SELECT uuid_generate_v4()::text AS u FROM generate_series(1,10000000); CREATE postgres=# CREATE TEMP TABLE text ON COMMIT DROP AS SELECT u::hexstring(16) FROM uuids; SELECT Time: 13058.486 ms postgres=# CREATE TEMP TABLE test ON COMMIT DROP AS SELECT u::uuid FROM uuids; SELECT Time: 13536.816 ms ...now hexstring is varlena type but does not use strtoul. Perhaps uuid might be more liberal too. What do you think about it? Regards, Dawid
I am confused how a hex type is any better than using the 'hex' decode()
format we already support:
test=> select decode('5476', 'hex'); decode-------- Tv(1 row)
---------------------------------------------------------------------------
Dawid Kuroczko wrote:
> Following the discussion on making UUID data type to be much more liberal
> ( http://archives.postgresql.org/pgsql-hackers/2008-02/msg01214.php )
> I have decided to try to approach it from more general perspective.
>
> The current state of code is available at:
> http://qnex.net/hexstring-0.1-2008-03-03.tgz
>
>
> And now for more details:
>
> The idea is to have a data type HEXSTRING(n) which can have an optional
> typemod specifying the size of data (in bytes).
>
> Internally the datatype is binary compatible with bytea, but I/O is done as
> hex-encoded strings. The format is liberal, isspace() and ispunct() characters
> are skipped while the digits are read.
>
> I have played with two versions of hexstringin() function, one which uses
> strtoul() function and the other which uses "home brew" code. The latter
> appears to be faster, so I stayed with that. But I would appreciate
> comments on this from more experienced.
>
> So, what are the use cases?
>
> CREATE DOMAIN liberal_uuid AS hexstring(16);
> CREATE DOMAIN liberal_macaddr AS hexstring(6);
>
> ...it allows for creating other standard hex-types, as for example:
> CREATE DOMAIN wwn AS hexstring(8); --
> http://en.wikipedia.org/wiki/World_Wide_Name
>
> Also it can be a convenient alternative to bytea format (I know, the
> encode()/decode() pair),
> especially when you have to format output data as some fancy hex-string.
>
> The code is currently just a bunch of input/output/typemod functions
> which appear
> to work. I will add casts, operators, etc -- they most likely will be
> nicked from bytea.
>
> What I would like to also add is ubiquitous to_char(hex, format) function.
> For an UUID-compatilbe format it would be called as:
> SELECT to_char(hex, 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx') or
> SELECT to_char(hex, '8x-4x-4x-4x-12x') where x is expanded as
> [0-9a-f] digit and X is expanded as [0-9A-F].
> I am not sure what to do about variable length hexstrings, I am
> considering something
> like to_char(hex, '8X-') which would produce something like
> '00000000-11111111-22222222'
> for a 12-byte hexstring (what to do about dangling '-' ?).
>
> ...but the original case against liberal UUID was that it would make
> the I/O slower.
> My simple test:
>
> postgres=# CREATE TABLE uuids AS SELECT uuid_generate_v4()::text AS u
> FROM generate_series(1,10000000);
> CREATE
>
> postgres=# CREATE TEMP TABLE text ON COMMIT DROP AS SELECT
> u::hexstring(16) FROM uuids;
> SELECT
> Time: 13058.486 ms
> postgres=# CREATE TEMP TABLE test ON COMMIT DROP AS SELECT u::uuid FROM uuids;
> SELECT
> Time: 13536.816 ms
>
> ...now hexstring is varlena type but does not use strtoul. Perhaps
> uuid might be more liberal too.
>
> What do you think about it?
>
> Regards,
> Dawid
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your Subscription:
> http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-hackers
-- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB
http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
I don't think we should define types based on how they print. We should define types based on what they contain. bytea is a perfectly good datatype for storing binary data, though perhaps we could use a bytea(n) for fixed size binary data. However bytea has an inconvenient string representation. Really I would prefer if bytea were *always* read and written in hex format. We could have a guc to enable this but it would make it hard to write code which would function consistently as part of a larger application. I think this is more akin to the MONEY data type. Really it would be better if we could declare columns as NUMERIC but attach a "default format" to them for use when string representation is desired. Similarly with bytea we could choose a default string representation different from the default in/out functions. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
Gregory Stark <stark@enterprisedb.com> writes:
> However bytea has an inconvenient string representation. Really I would prefer
> if bytea were *always* read and written in hex format.
Yeah, the escaping rule is a mess: hard to work with and even less
storage-efficient than hex would be.
> We could have a guc to
> enable this but it would make it hard to write code which would function
> consistently as part of a larger application.
Well, the datetime types have had DateStyle for years and people have
managed to deal with it, so maybe you're overestimating the problem.
regards, tom lane