Re: patch: General purpose utility functions used by the JSON data type

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: patch: General purpose utility functions used by the JSON data type
Дата
Msg-id AANLkTi=BY1_TNakKmSAbz-NN1wn-d4hPCRvqHUUXQYC1@mail.gmail.com
обсуждение исходный текст
Ответ на patch: General purpose utility functions used by the JSON data type  (Joseph Adams <joeyadams3.14159@gmail.com>)
Ответы Re: patch: General purpose utility functions used by the JSON data type  (Joseph Adams <joeyadams3.14159@gmail.com>)
Список pgsql-hackers
On Fri, Aug 13, 2010 at 5:45 AM, Joseph Adams
<joeyadams3.14159@gmail.com> wrote:
> getEnumLabelOids
>  * Useful-ometer: ()-----------------------------------o
>  * Rationale: There is currently no streamlined way to return a custom
> enum value from a PostgreSQL function written in C.  This function
> performs a batch lookup of enum OIDs, which can then be cached with
> fn_extra.  This should be reasonably efficient, and it's quite elegant
> to use.

It's possible that there might be a contrib module out there someplace
that wants to do this, but it's clearly a waste of time for a core
datatype.  The OIDs are fixed.  Just get rid of the enum altogether
and use the OIDs directly wherever you would have used the enum.  Then
you don't need this.

Incidentally, if we were going to accept this function, I think we'd
need to add some kind of a check to throw an error if any of the
labels can't be mapped onto an Oid.  Otherwise, an error in this area
might lead to difficult-to-find misbehavior.

> FN_EXTRA, FN_EXTRA_ALLOC, FN_MCXT
>  * Useful-ometer: ()--------------------o
>  * Rationale: Using fcinfo->flinfo->fn_extra takes a lot of
> boilerplate.  These macros help cut down the boilerplate, and the
> comment explains what fn_extra is all about.

I think this is not a good idea.  Macros that use local variables
under the covers make it hard for new contributors to read the code
and understand what it does.  It's only worth doing if it saves a lot
of typing, and this doesn't.  If we were going to do this, the right
thing for your patch to do would be to update every instance of this
coding pattern in our source base, so that people who copy those
examples in the future do it the new way.  But I think there's no real
advantage in that: it would complicate back-patching future bug fixes
for no particularly compelling reason.

> getTypeInfo
>  * Useful-ometer: ()---------------------------o
>  * Rationale: The get_type_io_data "six-fer" function is very
> cumbersome to use, since one has to declare all the output variables.
> The getTypeInfo puts the results in a structure.  It also performs the
> fmgr_info_cxt step, which is a step done after every usage of
> get_type_io_data in the PostgreSQL code.
>  * Other thoughts: getTypeInfo also retrieves typcategory (and
> typispreferred), which is rather ad-hoc.  This benefits the JSON code
> because to_json() uses the typcategory to figure out what type of JSON
> value to convert something to (for instance, things in category 'A'
> become JSON arrays).  Other data types could care less about the
> typcategory.  Should getTypeInfo leave that step out?

Well, again, you have to decide whether this is a function that you're
adding just for the JSON code or whether it's really a general purpose
utility function.  If you want it to be a general purpose utility
function, you need to change all the callers that could potentially
leverage it.  If it's JSON specific, put it in the JSON code.  It
might be simpler to just declare the variables.

> pg_substring, pg_encoding_substring
>  * Useful-ometer: ()-------o
>  * Rationale: The JSONPath code uses it / will use it for extracting
> substrings, which is probably not a very useful feature (but who am I
> to say that).  This function could probably benefit the
> text_substring() function in varlena.c , but it would take a bit of
> work to ensure it continues to comply with standards.

I'm more positive about this idea.  If you can make this generally
useful, I'd encourage you to do that.  On a random coding style note,
I see that you have two copies of the following code, which can fairly
obviously be written in a single line rather than five, assuming it's
actually safe.

+               if (sub_end + len > e)
+               {
+                       Assert(false);          /* Clipped multibyte
character */
+                       break;
+               }


> server_to_utf8, utf8_to_server, text_to_utf8_cstring,
> utf8_cstring_to_text, utf8_cstring_to_text_with_len
>  * Useful-ometer: ()--------------o
>  * Rationale:  The JSON data type operates in UTF-8 rather than the
> server encoding because it needs to deal with Unicode escapes, but
> individual Unicode characters can't be converted to/from the server
> encoding simply and efficiently (as far as I know).  These routines
> made the conversions done by the JSON data type vastly simpler, and
> they could simplify other data types in the future (XML does a lot of
> server<->UTF-8 conversions too).

Sounds interesting.  But again, you would need to modify the XML code
to use these also, so that we can clearly see that this is reusable
code, and actually reuse it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Hitoshi Harada
Дата:
Сообщение: Re: "micro bucket sort" ...
Следующее
От: Robert Haas
Дата:
Сообщение: Re: typos in HS source code comment