Re: patch: Allow the UUID type to accept non-standard formats

Поиск
Список
Период
Сортировка
От Mark Mielke
Тема Re: patch: Allow the UUID type to accept non-standard formats
Дата
Msg-id 48EF6A1E.2020400@mark.mielke.cc
обсуждение исходный текст
Ответ на Re: patch: Allow the UUID type to accept non-standard formats  ("Robert Haas" <robertmhaas@gmail.com>)
Ответы Re: patch: Allow the UUID type to accept non-standard formats  ("Robert Haas" <robertmhaas@gmail.com>)
Re: patch: Allow the UUID type to accept non-standard formats  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Robert Haas wrote: <blockquote cite="mid:603c8f070810100519w65857a95h72bc2ab8ff6ab4a@mail.gmail.com"
type="cite"><blockquotetype="cite"><pre wrap="">  1) Reduced error checking. 2) The '-' is not the only character that
peoplehave used. ClearCase uses
 
'.' and ':' as punctuation. 3) People already have the option of translating the UUID from their
application to a standard format. 4) As you find below, and is probably possible to improve on, a fixed
format can be parsed more efficient.   </pre></blockquote><pre wrap="">
Scenario 1.  I have some standard format UUIDs and I want to parse
them.  This change doesn't bother me at all because if I'm parsing
anywhere enough UUIDs for it to matter, the speed of my CPU, disk, and
memory subsystems will vastly outweigh the difference between the two
implementations.  I measured the different between the two by running
them both in a tight loop on a fixed string.  I challenge anyone to
produce a measurable performance distinction by issuing SQL queries.
I doubt that it is possible. </pre></blockquote><br /> Put a few changes of 2%-3% impact together and you get 10% or
more.I'm not saying you are wrong, but I disagree that performance should be sacrificed for everybody without providing
substantialbenefit to everybody. The question is then, does relaxed UUID parsing provide substantial benefit to
everybody?<br/><br /><blockquote cite="mid:603c8f070810100519w65857a95h72bc2ab8ff6ab4a@mail.gmail.com" type="cite"><pre
wrap="">Scenario2. I have some non-standard format UUIDs and I want to parse
 
them.  This change helps me a lot, because I'm almost positive that
calling regexp_replace() and then uuid_in() is going to be MUCH slower
than just calling uuid_in().  And if I do that then my error checking
will be REALLY weak, unless I write a custom PL function to make sure
that dashes only occur where they're supposed to be, in which case it
will be even slower. </pre></blockquote><br /> You should know the non-standard format of the UUID, and your
applicationshould be doing the error checking. It might be slower for *you*, but *you* are the one with the special
needs.That is, unless you are representing a significant portion of the population. What percentage are you
representing?<br/><br /><blockquote cite="mid:603c8f070810100519w65857a95h72bc2ab8ff6ab4a@mail.gmail.com"
type="cite"><prewrap="">Scenario 3. I only want standard-format UUIDs to be accepted into my
 
database.  Any non-standard format UUIDs should be rejected at parse
time.  This change is pretty irritating, because now I have to use
regexp matching or something to make sure I've got the right format,
and it's going to be significantly slower.

My suspicion is that scenario 2 is a lot more common than scenario 3. </pre></blockquote><br /> I prefer strict formats
andearly failures. I like that PostgreSQL refuses to truncate on insertion. If I have a special format, I'm more than
willingto convert it from the special format to a standard format before doing INSERT/UPDATE. What percentage of people
outthere feel that they benefit from pedantic syntax checking? :-)<br /><br /> I don't know.<br /><br /><blockquote
cite="mid:603c8f070810100519w65857a95h72bc2ab8ff6ab4a@mail.gmail.com"type="cite"><blockquote type="cite"><pre wrap="">I
don'tknow which implementation was used for the PostgreSQL core, but any
 
hard coded constants would allow for the optimizer to generate instructions
that can run in parallel, or that are better aligned to machine words.

2-3% slow down for what gain? It still doesn't handle all cases, and it's
less able to check the format for correctness.   </pre></blockquote><pre wrap="">
This change is a long way from letting any old thing through as a
UUID.  I'm sure there are lots of crazy ways to write UUIDs, but
everything I found with a quick Google search would be covered by this
patch, so I think that's pretty good.  A key point for me is that it's
hard to imagine this patch accepting anything that was intended to be
something other than a UUID.  (I am sure someone will now write back
and tell me about their favorite non-UUID thing that happens to have
32 hex digits with dashes for separators, but come on.) </pre></blockquote><br /> It's not that long. If you get
ColdFusionsupport(?), somebody else will want the ':', and somebody else will want the '-'.<br /><br /> Anyways - I
onlysomewhat disagree. I remember the original discussions, and I remember agreeing with the points to keep PostgreSQL
UUIDsupport thin and rigid. It's valuable for it to be built-in to the database. It's not necessarily valuable for
PostgreSQLto support every UUID version or every format. Supporting additional formats is the direction of supporting
everyUUID format. Three months from now, somebody is going to propose allowing '-' or ':'. What should the answer be
then?<br/><br /> Cheers,<br /> mark<br /><br /><pre class="moz-signature" cols="72">-- 
 
Mark Mielke <a class="moz-txt-link-rfc2396E" href="mailto:mark@mielke.cc"><mark@mielke.cc></a>
</pre>

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: TODO item: adding VERBOSE option to CLUSTER [with patch]
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: head's linking problem