Discussion: Is there a good reason why PL languages do not support cstring type arguments and return values ?


Is there a good reason why PL languages do not support cstring type arguments and return values ?

From
Hannu Krosing
Date:
Is the lack of support for cstring in PLs just laziness/oversight, or is
there a good reason why PL languages do not support cstring type
arguments and return values?

I'm currently adding this to pl/pythonu with an aim to prototype type io 
functions for a new type.

If it is just some security concern this could be alleviated by only 
allowing it in the untrusted languages.

--------------------
Hannu Krosing





Hannu Krosing <hannu@2ndQuadrant.com> writes:
> Is the lack of support for cstring in PLs just laziness/oversight, or is
> there a good reason why PL languages do not support cstring type
> arguments and return values?

In general I don't think we should encourage the use of cstring as a
user-level data type.  The number of text-like types in the system is
already enough to confuse users, and this one brings no redeeming social
value to the party.  Besides which, it has essentially no built-in
operators, and I *don't* want to have to add a pile of them for it.

> I'm currently adding this to pl/pythonu with an aim to prototype type io 
> functions for a new type.

The PLs aren't meant to be usable as I/O functions.  cstring is not the
problem there, it's access to the bit-level representation of the other
datatype.  It's hard for me to see how you'd make the above work without
circularity, ie the PL manager would end up recursively calling itself
trying to construct or deconstruct the value.
        regards, tom lane



Re: Is there a good reason why PL languages do not support cstring type arguments and return values ?

From
Hannu Krosing
Date:
On 10/10/2012 02:58 PM, Tom Lane wrote:
> Hannu Krosing <hannu@2ndQuadrant.com> writes:
>> Is the lack of support for cstring in PLs just laziness/oversight, or is
>> there a good reason why PL languages do not support cstring type
>> arguments and return values?
> In general I don't think we should encourage the use of cstring as a
> user-level data type.  The number of text-like types in the system is
> already enough to confuse users, and this one brings no redeeming social
> value to the party.  Besides which, it has essentially no built-in
> operators, and I *don't* want to have to add a pile of them for it.
>
>> I'm currently adding this to pl/pythonu with an aim to prototype type io
>> functions for a new type.
> The PLs aren't meant to be usable as I/O functions.  cstring is not the
> problem there, it's access to the bit-level representation of the other
> datatype.
I don't understand where you see the problem here. Python (and, I guess,
most other PL languages, possibly with the exception of PL/pgSQL) is
perfectly capable of accessing raw data.

> It's hard for me to see how you'd make the above work without
> circularity, ie the PL manager would end up recursively calling itself
> trying to construct or deconstruct the value.
Again, could you be a bit more specific?
Recursion itself should not be a problem (except maybe for performance).
We already support calling pl* functions from inside other pl functions,
at least by executing SELECT "plfunc()".
>
>             regards, tom lane
>
>




Re: Is there a good reason why PL languages do not support cstring type arguments and return values ?

From
Heikki Linnakangas
Date:
On 10.10.2012 16:58, Tom Lane wrote:
> Hannu Krosing <hannu@2ndQuadrant.com> writes:
>> Is the lack of support for cstring in PLs just laziness/oversight, or is
>> there a good reason why PL languages do not support cstring type
>> arguments and return values?
>
> In general I don't think we should encourage the use of cstring as a
> user-level data type.  The number of text-like types in the system is
> already enough to confuse users, and this one brings no redeeming social
> value to the party.  Besides which, it has essentially no built-in
> operators, and I *don't* want to have to add a pile of them for it.
>
>> I'm currently adding this to pl/pythonu with an aim to prototype type io
>> functions for a new type.
>
> The PLs aren't meant to be usable as I/O functions.  cstring is not the
> problem there, it's access to the bit-level representation of the other
> datatype.  It's hard for me to see how you'd make the above work without
> circularity, ie the PL manager would end up recursively calling itself
> trying to construct or deconstruct the value.

I don't see the problem. The input function converts a text string to an 
opaque chunk of bytes, and the output function does the reverse. In a 
nutshell, an input function is like this:

bytea mytype_in(text_representation text)

and the output function is like this:

text mytype_out(internal_representation bytea)

In reality, of course, input functions take a cstring as argument, not 
text, and return a "mytype" datum, not bytea. But I don't see why we
couldn't allow the above signatures with text/bytea instead. That would 
make it clear to the PL how to deal with the datums.

I've wanted to allow writing i/o functions in non-C languages for a long 
time as well, but never got around to doing anything about it. Custom
datatypes are really powerful, but as soon as you have to write C code, 
that raises the bar significantly. I/O functions written in, say, 
PL/pgSQL would be an order of magnitude slower than ones written in C, 
but for many applications it would be OK.

- Heikki
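A minimal Python sketch of the text/bytea pairing Heikki describes. The type (a hypothetical "point2d" stored as two float8 values), its "(x,y)" text form, and the little-endian layout are assumptions chosen for illustration; these are plain Python functions modeling what such a PL-level mytype_in/mytype_out pair would compute, not plpython function bodies or an existing API.

```python
import struct

# Hypothetical fixed-size type: a 2-D point stored as two float8 values.
# '<dd' = little-endian, two IEEE 754 doubles, 16 bytes, no padding.
_POINT = struct.Struct("<dd")


def mytype_in(text_representation: str) -> bytes:
    """Model of an input function: external text form -> opaque bytes."""
    body = text_representation.strip()
    if not (body.startswith("(") and body.endswith(")")):
        raise ValueError(f"invalid input syntax for point2d: {text_representation!r}")
    x_str, y_str = body[1:-1].split(",")  # ValueError if not exactly two parts
    return _POINT.pack(float(x_str), float(y_str))


def mytype_out(internal_representation: bytes) -> str:
    """Model of an output function: opaque bytes -> external text form."""
    x, y = _POINT.unpack(internal_representation)
    return f"({x},{y})"


raw = mytype_in("(1.5, -2)")
print(len(raw))         # 16
print(mytype_out(raw))  # (1.5,-2.0)
```

The round trip only works because both functions agree on the byte layout; the server itself never interprets those 16 bytes.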



Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 10.10.2012 16:58, Tom Lane wrote:
>> The PLs aren't meant to be usable as I/O functions.  cstring is not the
>> problem there, it's access to the bit-level representation of the other
>> datatype.  It's hard for me to see how you'd make the above work without
>> circularity, ie the PL manager would end up recursively calling itself
>> trying to construct or deconstruct the value.

> I don't see the problem. The input function converts a text string to an 
> opaque chunk of bytes, and the output function does the reverse. In a 
> nutshell, an input function is like this:

> bytea mytype_in(text_representation text)

> and the output function is like this:

> text mytype_out(internal_representation bytea)

OK, that would work as long as you were willing to confine the feature
to types that are representation-equivalent to bytea.  However, there's
a small problem, which is that I/O functions aren't actually *declared*
that way.

> In reality, of course, input functions take a cstring as argument, not 
> text, and return a "mytype" datum, not bytea. But I don't see why we
> couldn't allow the above signatures with text/bytea instead.

Because you'd totally destroy the tiny modicum of error checking that
exists now on whether CREATE TYPE's function arguments are sane.

> I've wanted to allow writing i/o functions in non-C languages for a long 
> time as well, but never got around to doing anything about it.

If we're going to do that, it should not be done by blowing a truck-size
hole in the semantics of I/O functions.  I would prefer to see some
kind of wart added to the PL manager that says, in effect, "treat this
argument or result as bytea even though it's declared differently".
For one thing, that would scale easily to cases that are not
representation-compatible to bytea but something else (eg int4), so long
as the underlying language had an equivalent native type.
        regards, tom lane
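Tom's per-argument "treat this as bytea (or int4)" override can be modeled as a lookup that prefers the override type over the declared one. This is a toy sketch under stated assumptions: the converter table, the convert_as parameter, and the type name mytype are hypothetical and do not correspond to any existing PL handler interface. Datums here are just payload bytes, with no varlena header.

```python
import struct
from typing import Callable, Dict, Optional

# Datum -> Python converters for a couple of built-in types (modeled only;
# a datum here is bare payload bytes, with no header).
BUILTIN_TO_PY: Dict[str, Callable[[bytes], object]] = {
    "bytea": lambda datum: bytes(datum),                  # raw payload as-is
    "int4": lambda datum: struct.unpack("=i", datum)[0],  # native byte order
}


def to_python(declared_type: str, datum: bytes,
              convert_as: Optional[str] = None) -> object:
    """Convert one argument, converting as `convert_as` if an override is given."""
    effective = convert_as or declared_type
    converter = BUILTIN_TO_PY.get(effective)
    if converter is None:
        # No override and no native converter: the handler would have to go
        # through the type's own output function, i.e. the circularity problem.
        raise TypeError(f"no direct conversion for type {effective!r}")
    return converter(datum)


print(to_python("mytype", struct.pack("=i", 42), convert_as="int4"))  # 42
print(to_python("mytype", b"\x01\x02", convert_as="bytea"))           # b'\x01\x02'
```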



Hannu Krosing <hannu@krosing.net> writes:
> On 10/10/2012 02:58 PM, Tom Lane wrote:
>> It's hard for me to see how you'd make the above work without
>> circularity, ie the PL manager would end up recursively calling itself
>> trying to construct or deconstruct the value.

> Again, could you be a bit more specific.

If you try to write "foo_out(foo) returns cstring" in Python, the
first thing plpython will try to do is convert the argument value
to a Python object, which for a non-built-in type such as "foo" is
going to reduce to conversion to text, which will result in ...
you guessed it ... calling foo_out to convert the argument value
to text.  Lather, rinse, repeat till stack overflow.

As I was mentioning to Heikki, it's possible that you could work around
that by somehow telling plpython to do the argument conversion as though
the argument were of some bit-compatible built-in type rather than foo.
But without some such type cheat you can't write an I/O function in a
PL, and it's not the cstring end of it that's the problem.
        regards, tom lane
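A toy model of the circularity Tom describes. It assumes a handler that converts every argument before running the function body and falls back to the type's output function for types it does not know natively. All names are hypothetical, and Python's RecursionError stands in for the backend's stack overflow.

```python
# Output functions by type name; all names here are hypothetical.
OUTPUT_FUNCS = {}


def convert_arg(type_name, datum):
    """Turn a datum into a Python object before the function body runs."""
    if type_name == "text":
        return datum.decode("utf-8")
    # Not a type the handler knows: fall back to text via the output function.
    return OUTPUT_FUNCS[type_name](datum)


def pl_call(body, arg_type, datum):
    """Model of the handler: convert the argument first, then run the body."""
    return body(convert_arg(arg_type, datum))


def foo_out(datum):
    # foo_out is itself a PL function, so it goes through pl_call, whose
    # argument conversion calls foo_out again, and so on.
    return pl_call(lambda value: value.upper(), "foo", datum)


OUTPUT_FUNCS["foo"] = foo_out


def overflows(datum):
    """True if converting a foo argument recurses until the stack runs out."""
    try:
        foo_out(datum)
    except RecursionError:
        return True
    return False


print(overflows(b"abc"))  # True
```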



Re: Is there a good reason why PL languages do not support cstring type arguments and return values ?

From
Hannu Krosing
Date:
On 10/10/2012 03:46 PM, Tom Lane wrote:
> Hannu Krosing <hannu@krosing.net> writes:
>> On 10/10/2012 02:58 PM, Tom Lane wrote:
>>> It's hard for me to see how you'd make the above work without
>>> circularity, ie the PL manager would end up recursively calling itself
>>> trying to construct or deconstruct the value.
>> Again, could you be a bit more specific.
> If you try to write "foo_out(foo) returns cstring" in Python, the
> first thing plpython will try to do is convert the argument value
> to a Python object, which for a non-built-in type such as "foo" is
> going to reduce to conversion to text,
This should still be solvable _inside_ the pl/python handler with a little
hackery.

My request for "reasons not to do it" was more about things in the
PL manager or in PostgreSQL itself.

One way would be to check that we are in an any --> cstring
function - perhaps just by setting some static flag at entry and resetting
it at exit - and pass the original byte representation as "bytes" (or
string for py2.x), or wrapped in a plpy-specific RawPg(rawinput) object.

> which will result in ...
> you guessed it ... calling foo_out to convert the argument value
> to text.  Lather, rinse, repeat till stack overflow.
>
> As I was mentioning to Heikki, it's possible that you could work around
> that by somehow telling plpython to do the argument conversion as though
> the argument were of some bit-compatible built-in type rather than foo.
> But without some such type cheat you can't write an I/O function in a
> PL, and it's not the cstring end of it that's the problem.
>
>             regards, tom lane
>
>




Hannu Krosing <hannu@krosing.net> writes:
> One way would be to check that we are in an any --> cstring
> function - perhaps just by setting some static flag at entry and resetting
> it at exit - and pass the original byte representation as "bytes" (or 
> string for py2.x)

Totally aside from the ugliness of driving that off the *other* end
being cstring, it seems quite insufficient to me.  For example, if the
data type in question is toastable, you don't really want to leave the
Python code with the problem of detoasting a toasted value.  Even if
it's just an int, your proposal saddles the Python code with endianness
problems.

I think my suggestion of a way to pretend the argument or result is of
some specified other type for conversion purposes is quite a lot superior.
In the toastable-type case, referencing bytea would be enough to get the
Python code out from under detoasting and length-word management.  There
might also be cases where the new type is really a skin over some
built-in type, and you can leverage that type's I/O behavior to simplify
what the Python code has to do.
        regards, tom lane
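To show what "length-word management" involves, here is a simplified sketch. It assumes the old-style varlena layout for a plain, untoasted value: a 4-byte native-endian length word that counts itself, followed by the payload. Current servers also use shifted 4-byte headers, 1-byte short headers, and TOAST pointers. None of that is handled here, and that is exactly the work a bytea conversion would take off the Python code's hands.

```python
import struct

# Simplified old-style varlena header: one native-endian 32-bit word holding
# the total length in bytes, including the header itself.  Short headers,
# shifted headers and TOAST pointers are deliberately not modeled.
_HEADER = struct.Struct("=I")


def varlena_pack(payload: bytes) -> bytes:
    """Prefix the payload with a length word that counts itself."""
    return _HEADER.pack(_HEADER.size + len(payload)) + payload


def varlena_unpack(datum: bytes) -> bytes:
    """Strip the length word, checking it against the buffer we were given."""
    (total,) = _HEADER.unpack_from(datum)
    if total < _HEADER.size or total > len(datum):
        raise ValueError(f"corrupt length word: {total}")
    return datum[_HEADER.size:total]


packed = varlena_pack(b"abc")
print(len(packed))             # 7
print(varlena_unpack(packed))  # b'abc'
```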



Re: Is there a good reason why PL languages do not support cstring type arguments and return values ?

From
Hannu Krosing
Date:
On 10/10/2012 04:34 PM, Tom Lane wrote:
> Hannu Krosing <hannu@krosing.net> writes:
>> One way would be to check that we are in an any --> cstring
>> function - perhaps just by setting some static flag at entry and resetting
>> it at exit - and pass the original byte representation as "bytes" (or
>> string for py2.x)
> Totally aside from the ugliness of driving that off the *other* end
> being cstring,
The cstring case seems trivial - you just have to omit the initial
conversion to cstring that currently happens for most types and only do
the second part, which is the cstring_to_python or cstring_to_postgresql
conversion, depending on whether it is an input or an output function.

> it seems quite insufficient to me.  For example, if the
> data type in question is toastable, you don't really want to leave the
> Python code with the problem of detoasting a toasted value.  Even if
> it's just an int, your proposal saddles the Python code with endianness
> problems.
>
> I think my suggestion of a way to pretend the argument or result is of
> some specified other type for conversion purposes is quite a lot superior.
Agreed, and it even seems that we can reuse the existing basetype
support in CREATE TYPE and pg_proc, if not functionally then at
least for storing the equivalent type info.

> In the toastable-type case, referencing bytea would be enough to get the
> Python code out from under detoasting and length-word management.  There
> might also be cases where the new type is really a skin over some
> built-in type, and you can leverage that type's I/O behavior to simplify
> what the Python code has to do.
>
>             regards, tom lane




Re: Is there a good reason why PL languages do not support cstring type arguments and return values ?

From
Dimitri Fontaine
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> I've wanted to allow writing i/o functions in non-C languages for a long
> time as well, but never got around to doing anything about it. Custom datatypes
> are really powerful, but as soon as you have to write C code, that raises
> the bar significantly. I/O functions written in, say, PL/pgSQL would be an
> order of magnitude slower than ones written in C, but for many applications
> it would be OK.

Do you want a crazy idea now? Yes, I do mean Yet Another One.

I'm thinking about what it would take to have a new PL/C language where
the backend would actually compile and link/load the C code at CREATE
FUNCTION time, using dynamic code generation techniques.

That would allow writing functions in C without having to ship a binary
executable file on the system, which would solve a bunch of problems.
With that tool and this use case, you could simply ship your C-coded
I/O functions inline in the middle of the PL/pythonu extension, using the
exact same mechanisms.

In the more general view of our offerings, that would fix C coded
extensions for Hot Standby, for one thing.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: Is there a good reason why PL languages do not support cstring type arguments and return values ?

From
Pavel Stehule
Date:
2012/10/11 Dimitri Fontaine <dimitri@2ndquadrant.fr>:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> I've wanted to allow writing i/o functions in non-C languages for a long
>> time as well, but never got around to doing anything about it. Custom datatypes
>> are really powerful, but as soon as you have to write C code, that raises
>> the bar significantly. I/O functions written in, say, PL/pgSQL would be an
>> order of magnitude slower than ones written in C, but for many applications
>> it would be OK.
>
> Do you want a crazy idea now? Yes, I do mean Yet Another One.
>
> I'm thinking about what it would take to have a new PL/C language where
> the backend would actually compile and link/load the C code at CREATE
> FUNCTION time, using dynamic code generation techniques.
>
> That would allow writing functions in C without having to ship a binary
> executable file on the system, which would solve a bunch of problems.
> With that tool and this use case, you could simply ship your C-coded
> I/O functions inline in the middle of the PL/pythonu extension, using the
> exact same mechanisms.
>
> In the more general view of our offerings, that would fix C coded
> extensions for Hot Standby, for one thing.

I have been thinking about it for a long time. I would like to use our
embedded C, but replace libpq with SPI. Another idea: compile PL/pgSQL
to C, and ...

Regards

Pavel

>
> Regards,
> --
> Dimitri Fontaine
> http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



On 10/10/2012 05:34 PM, Tom Lane wrote:
> Hannu Krosing <hannu@krosing.net> writes:
>> One way would be to check that we are in an any --> cstring
>> function - perhaps just by setting some static flag at entry and resetting
>> it at exit - and pass the original byte representation as "bytes" (or
>> string for py2.x)
> Totally aside from the ugliness of driving that off the *other* end
> being cstring, it seems quite insufficient to me.
Please find attached a patch which solves the type I/O function recursion
problem by simply testing whether the function we are currently in is a
type I/O function (fn_oid == argTypeStruct->typoutput ...).

This is definitely WIP, posted just to verify that the approach is mostly sane.
There are no integrated tests or docs in the patch yet.
See the attached test-pytypeio.sql for sample usage.
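The core of the check can be sketched in Python under stated assumptions. TypeInfo is a hypothetical stand-in for the relevant pg_type field, the OIDs are made up, and the rest of the condition (the message's "...") is not reconstructed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeInfo:
    """Stand-in for the pg_type fields the check needs; OIDs below are made up."""
    name: str
    typoutput: int  # OID of the type's output function


def convert_arg(fn_oid: int, arg_type: TypeInfo, datum: bytes):
    """Pass raw bytes through when we are running this type's own output function."""
    if fn_oid == arg_type.typoutput:
        # Converting via the output function here would call ourselves again,
        # so hand the Python code the unconverted representation instead.
        return datum
    # The normal path (call typoutput, then build a Python string) is not modeled.
    raise NotImplementedError(f"regular conversion for type {arg_type.name}")


FOO = TypeInfo(name="foo", typoutput=90002)

print(convert_arg(90002, FOO, b"\x2a\x00\x00\x00"))  # b'*\x00\x00\x00'
```

Comparing against the type's own output-function OID, rather than using a global flag, ties the raw pass-through to exactly the call that would otherwise recurse.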

It is usable for simple cases, like non-toastable fixed-length types
(both pass-by-value and pass-by-reference) and non-toastable varlen
types. It has no explicit support yet for anything fancier, like
toasting or the new short varlen headers.

It also has the beginnings of support for type "internal", so that send
and receive functions can also be written in plpython.

Some of the work also went into accepting shell types, so that you
can actually define type I/O functions.
>   For example, if the
> data type in question is toastable, you don't really want to leave the
> Python code with the problem of detoasting a toasted value.
The next thing I'll do is model my raw input/output functions for
toastable cases on bytea. Currently they are only tested with the simple
"old postgresql varlen type".
> Even if
> it's just an int, your proposal saddles the Python code with endianness
> problems.
This can also be seen as a feature: you _can_ encode the binary
exactly as you like. For example, you can have 4-byte strings encoded in
int4-sized pass-by-value chunks, or 16-digit decimals encoded as 16
4-bit nibbles.
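As an illustration of the nibble encoding, here is a sketch. It assumes exactly 16 decimal digits packed two per byte (8 bytes total), with the first digit in the high nibble. That layout is an assumption for illustration and is not taken from the patch.

```python
DIGITS = "0123456789"


def pack_bcd16(digits: str) -> bytes:
    """Pack exactly 16 decimal digits into 8 bytes, two 4-bit nibbles per byte."""
    if len(digits) != 16 or any(c not in DIGITS for c in digits):
        raise ValueError("expected exactly 16 ASCII decimal digits")
    out = bytearray()
    for i in range(0, 16, 2):
        high, low = int(digits[i]), int(digits[i + 1])
        out.append((high << 4) | low)  # first digit goes in the high nibble
    return bytes(out)


def unpack_bcd16(raw: bytes) -> str:
    """Inverse of pack_bcd16; rejects nibbles that are not decimal digits."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    chars = []
    for byte in raw:
        for nibble in (byte >> 4, byte & 0x0F):
            if nibble > 9:
                raise ValueError(f"invalid BCD nibble: {nibble:#x}")
            chars.append(DIGITS[nibble])
    return "".join(chars)


print(pack_bcd16("1234567890123456").hex())  # 1234567890123456
```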

And endianness is dead simple to handle in Python's struct module, as
it's just a one-character prefix in the format string.
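Here is a short illustration using only Python's standard struct module. The first character of the format string selects the byte order: "<" little-endian, ">" big-endian, "=" native, "!" network (big-endian), all with standard sizes. The int4_from_raw helper is hypothetical.

```python
import struct
import sys


def int4_from_raw(raw: bytes, byteorder: str = "=") -> int:
    """Decode a 4-byte signed integer; byteorder is '<', '>', '=' or '!'."""
    if byteorder not in ("<", ">", "=", "!"):
        raise ValueError(f"unsupported byte order prefix: {byteorder!r}")
    (value,) = struct.unpack(byteorder + "i", raw)
    return value


print(sys.byteorder)         # little or big, depending on the machine
print(struct.pack("<i", 1))  # b'\x01\x00\x00\x00'
print(struct.pack(">i", 1))  # b'\x00\x00\x00\x01'
print(int4_from_raw(b"\x00\x00\x00\x01", ">"))  # 1
```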
> I think my suggestion of a way to pretend the argument or result is of
> some specified other type for conversion purposes is quite a lot superior.
> In the toastable-type case, referencing bytea would be enough to get the
> Python code out from under detoasting and length-word management.
Will look into it.

----
Hannu Krosing








Attachments