Discussion: Function-manager redesign: second draft (long)
This is a followup to a message I wrote in June about reworking the fmgr
interface. I've had a couple of good ideas (I think ;-)) since then,
but there are some parts of the proposal that still need work before
implementation can begin.
I could particularly use some feedback from Jan and anyone else who's
worked with function-call handlers: does this design eliminate the
kluges that you've had to use in the past? If not, what else is needed?
regards, tom lane
Proposal for function-manager redesign
--------------------------------------
We know that the existing mechanism for calling Postgres functions needs
to be redesigned. It has portability problems because it makes
assumptions about parameter passing that violate ANSI C; it fails to
handle NULL arguments and results cleanly; and "function handlers" that
support a class of functions (such as fmgr_pl) can only be done via a
really ugly, non-reentrant kluge. (Global variable set during every
function call, forsooth.) Here is a proposal for fixing these problems.
In the past, the major objections to redoing the function-manager
interface have been (a) it'll be quite tedious to implement, since every
built-in function and everyplace that calls such functions will need to
be touched; (b) such wide-ranging changes will be difficult to make in
parallel with other development work; (c) it will break existing
user-written loadable modules that define "C language" functions. While
I have no solution to the "tedium" aspect, I believe I see an answer to
the other problems: by use of function handlers, we can support both old
and new interfaces in parallel for both callers and callees, at some
small efficiency cost for the old styles. That way, most of the changes
can be done on an incremental file-by-file basis --- we won't need a
"big bang" where everything changes at once. Support for callees
written in the old style can be left in place indefinitely, to provide
backward compatibility for user-written C functions.
The new function-manager interface
----------------------------------
The core of the new design is revised data structures for representing
the result of a function lookup and for representing the parameters
passed to a specific function invocation. (We want to keep function
lookup separate, since many parts of the system apply the same function
over and over; the lookup overhead should be paid once per query, not
once per tuple.)
When a function is looked up in pg_proc, the result is represented as
typedef struct
{
    PGFunction  fn_addr;    /* pointer to function or handler to be called */
    Oid         fn_oid;     /* OID of function (NOT of handler, if any) */
    int         fn_nargs;   /* 0..MAXFMGRARGS, or -1 if variable arg count */
    void       *fn_extra;   /* extra space for use by handler */
} FunctionLookupInfoData;

typedef FunctionLookupInfoData* FunctionLookupInfo;
For an ordinary built-in function, fn_addr is just the address of the C
routine that implements the function. Otherwise it is the address of a
handler for the class of functions that includes the target function.
The handler can use the function OID and perhaps also the fn_extra slot
to find the specific code to execute. (fn_oid = InvalidOid can be used
to denote a not-yet-initialized FunctionLookupInfoData struct. fn_extra
will always be NULL when a FunctionLookupInfoData is first filled by the
function lookup code, but a function handler could set it to avoid
making repeated lookups of its own when the same FunctionLookupInfoData
is used repeatedly during a query.) fn_nargs is the number of arguments
expected by the function.
FunctionLookupInfo replaces the present FmgrInfo structure (but I'm
giving it a new name so that the old struct definition can continue
to exist during the transition phase).
During a call of a function, the following data structure is created
and passed to the function:
typedef struct
{
    FunctionLookupInfo  flinfo;     /* ptr to lookup info used for this call */
    bool    isnull;                 /* input/output flag, see below */
    int     nargs;                  /* # arguments actually passed */
    Datum   arg[MAXFMGRARGS];       /* arguments passed to function */
    bool    argnull[MAXFMGRARGS];   /* T if arg[i] is actually NULL */
} FunctionCallInfoData;

typedef FunctionCallInfoData* FunctionCallInfo;
Note that all the arguments passed to a function (as well as its result
value) will now uniformly be of type Datum. As discussed below, callers
and callees should apply the standard Datum-to-and-from-whatever macros
to convert to the actual argument types of a particular function. The
value in arg[i] is unspecified when argnull[i] is true.
It is generally the responsibility of the caller to ensure that the
number of arguments passed matches what the callee is expecting: except
for callees that take a variable number of arguments, the callee will
typically ignore the nargs field and just grab values from arg[].
The meaning of the struct elements should be pretty obvious with the
exception of isnull. isnull must be set by the caller to the logical OR
of the argnull[i] flags --- ie, isnull is true if any argument is NULL.
(Of course, isnull is false if nargs == 0.) On return from the
function, isnull is the null flag for the function result: if it is true
the function's result is NULL, regardless of the actual function return
value. Overlapping the input and output flags in this way provides a
simple, convenient, fast implementation for the most common case of a
"strict" function (whose result is NULL if any input is NULL):
    if (finfo->isnull)
        return (Datum) 0;       /* specific value doesn't matter */
    ... else do normal calculation ignoring argnull[] ...
Non-strict functions can easily be implemented; they just need to check
the individual argnull[] flags and set the appropriate isnull value
before returning.
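For illustration, a non-strict function might look about like this under
the new convention (just a sketch using the structures proposed above;
int4_or_zero is a made-up example, not an existing catalog function):

/* Non-strict example: return the argument, or zero instead of NULL. */
Datum
int4_or_zero(FunctionCallInfo finfo)
{
    /* This function never returns NULL, even for NULL input. */
    finfo->isnull = false;

    if (finfo->argnull[0])
        return Int32GetDatum(0);

    return finfo->arg[0];       /* already an int32 Datum, pass it through */
}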
FunctionCallInfo replaces FmgrValues plus a bunch of ad-hoc parameter
conventions.
Callees, whether they be individual functions or function handlers,
shall always have this signature:
Datum function (FunctionCallInfo finfo);
which is represented by the typedef
typedef Datum (*PGFunction) (FunctionCallInfo finfo);
The function is responsible for setting finfo->isnull appropriately
as well as returning a result represented as a Datum. Note that since
all callees will now have exactly the same signature, and will be called
through a function pointer declared with exactly that signature, we
should have no portability or optimization problems.
When the function's result type is pass-by-reference, the result value
must always be stored in freshly-palloc'd space (it can't be a constant
or a copy of an input pointer). This rule will eventually allow
automatic reclamation of storage space during expression evaluation.
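To make the palloc rule concrete, here is a sketch of a function with a
pass-by-reference result (text_copy is hypothetical, shown only to
illustrate the convention):

/* Hypothetical example: return a freshly palloc'd copy of a text argument. */
Datum
text_copy(FunctionCallInfo finfo)
{
    text   *src;
    text   *result;

    if (finfo->isnull)          /* treat it as strict: NULL in, NULL out */
        return (Datum) 0;

    src = (text *) DatumGetPointer(finfo->arg[0]);

    /* The result must be freshly palloc'd; returning src itself is not allowed. */
    result = (text *) palloc(VARSIZE(src));
    memcpy(result, src, VARSIZE(src));

    return PointerGetDatum(result);
}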
Function coding conventions
---------------------------
As an example, int4 addition goes from old-style
int32
int4pl(int32 arg1, int32 arg2)
{
    return arg1 + arg2;
}
to new-style
Datum
int4pl(FunctionCallInfo finfo)
{
    if (finfo->isnull)
        return (Datum) 0;       /* value doesn't really matter ... */

    /* we can ignore flinfo, nargs and argnull */
    return Int32GetDatum(DatumGetInt32(finfo->arg[0]) +
                         DatumGetInt32(finfo->arg[1]));
}
This is, of course, much uglier than the old-style code, but we can
improve matters with some well-chosen macros for the boilerplate parts.
What we actually end up writing might look something like
Datum
int4pl(PG_FUNCTION_ARGS)
{
    PG_STRICT_FUNCTION;         /* encapsulates null check */
    {
        PG_ARG1_INT32;
        PG_ARG2_INT32;
        PG_RESULT_INT32(arg1 + arg2);
    }
}
where the macros expand to things like
"int32 arg1 = DatumGetInt32(finfo->arg[0])"
and "return Int32GetDatum( x )". I don't yet have a detailed proposal
for convenience macros for function authors, but I think it'd be well
worth while to define some.
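Purely as illustration, the macros might be defined along these lines
(names and exact shapes are not a final proposal, just a sketch to show
that the boilerplate can be hidden):

/* Illustrative definitions only --- the real macro set remains to be designed. */
#define PG_FUNCTION_ARGS        FunctionCallInfo finfo

#define PG_STRICT_FUNCTION \
    do { \
        if (finfo->isnull) \
            return (Datum) 0; \
    } while (0)

#define PG_ARG1_INT32           int32 arg1 = DatumGetInt32(finfo->arg[0])
#define PG_ARG2_INT32           int32 arg2 = DatumGetInt32(finfo->arg[1])
#define PG_RESULT_INT32(x)      return Int32GetDatum(x)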
For the standard pass-by-reference types (int8, float4, float8) these
macros should also hide the indirection and space allocation involved,
so that the function's code is not explicitly aware that the types are
pass-by-ref. This will allow future conversion of these types to
pass-by-value on machines where it's feasible to do that. (For example,
on an Alpha it's pretty silly to make int8 be pass-by-ref, since Datum
is going to be 64 bits anyway.)
Call-site coding conventions
----------------------------
There are many places in the system that call either a specific function
(for example, the parser invokes "textin" by name in places) or a
particular group of functions that have a common argument list (for
example, the optimizer invokes selectivity estimation functions with
a fixed argument list). These places will need to change, but we should
try to avoid making them significantly uglier than before.
Places that invoke an arbitrary function with an arbitrary argument list
can simply be changed to fill a FunctionCallInfoData structure directly;
that'll be no worse and possibly cleaner than what they do now.
When invoking a specific built-in function by name, we have generally
just written something like
	result = textin ( ... args ... )
which will not work after textin() is converted to the new call style.
I suggest that code like this be converted to use "helper" functions
that will create and fill in a FunctionCallInfoData struct. For
example, if textin is being called with one argument, it'd look
something like
	result = DirectFunctionCall1(textin, PointerGetDatum(argument));
These helper routines will have declarations like
	Datum DirectFunctionCall2(PGFunction func, Datum arg1, Datum arg2);
Note it will be the caller's responsibility to convert to and from
Datum; appropriate conversion macros should be used.
The DirectFunctionCallN routines will not bother to fill in
finfo->flinfo (indeed cannot, since they have no idea about an OID for
the target function); they will just set it NULL. This is unlikely to
bother any built-in function that could be called this way. Note also
that this style of coding cannot check for a NULL result (it couldn't
before, either!). We could reasonably make the helper routines elog an
error if they see that the function returns a NULL.
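Putting those pieces together, a sketch of one of these helpers might look
roughly like this (error behavior is only indicative, since whether to elog
on a NULL result is still open):

/* Sketch only: call a new-style function directly with one known-not-null argument. */
Datum
DirectFunctionCall1(PGFunction func, Datum arg1)
{
    FunctionCallInfoData fcinfo;
    Datum       result;

    fcinfo.flinfo = NULL;       /* no lookup info is available for a direct call */
    fcinfo.nargs = 1;
    fcinfo.arg[0] = arg1;
    fcinfo.argnull[0] = false;
    fcinfo.isnull = false;      /* no NULL inputs */

    result = (*func) (&fcinfo);

    if (fcinfo.isnull)
        elog(ERROR, "DirectFunctionCall1: function returned NULL");

    return result;
}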
(Note: direct calls like this will have to be changed at the same time
that the called routines are changed to the new style. But that will
still be a lot less of a constraint than a "big bang" conversion.)
When invoking a function that has a known argument signature, we have
usually written either
	result = fmgr(targetfuncOid, ... args ... );
or
	result = fmgr_ptr(FmgrInfo *finfo, ... args ... );
depending on whether an FmgrInfo lookup has been done yet or not.
This kind of code can be recast using helper routines, in the same
style as above:
	result = OidFunctionCall1(funcOid, PointerGetDatum(argument));
	result = FunctionCall2(funcCallInfo, PointerGetDatum(argument),
	                       Int32GetDatum(argument));
Again, this style of coding does not recognize the possibility of a
null result. We could provide variant helper routines that allow
a null return rather than raising an error, which could be called in
a style like
	if (FunctionCall1IsNull(&result,
	                        funcCallInfo,
	                        PointerGetDatum(argument)))
	{
		... cope with null result ...
	}
	else
	{
		... OK, use 'result' here ...
	}
But I'm unsure that there are enough places in the system that need this
to justify the extra set of helpers. If there are only a few places
that need a non-error response to a null result, they could just be
changed to fill and examine a FunctionCallInfoData structure directly.
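For instance, a call site that really cares about a NULL result could just
do something like this (a sketch; flinfo and argument stand for whatever
lookup info and input value the caller already has at hand):

/* Sketch: invoke a previously looked-up function and examine the NULL flag. */
FunctionCallInfoData fcinfo;
Datum       result;

fcinfo.flinfo = flinfo;                 /* filled earlier by function lookup */
fcinfo.nargs = 1;
fcinfo.arg[0] = PointerGetDatum(argument);
fcinfo.argnull[0] = false;
fcinfo.isnull = false;

result = (*flinfo->fn_addr) (&fcinfo);

if (fcinfo.isnull)
{
    /* ... cope with null result ... */
}
else
{
    /* ... OK, use 'result' here ... */
}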
As with the callee-side situation, I am strongly inclined to add
argument conversion macros that hide the pass-by-reference nature of
int8, float4, and float8, with an eye to making those types relatively
painless to convert to pass-by-value.
The existing helper functions fmgr(), fmgr_c(), etc will be left in
place until all uses of them are gone. Of course their internals will
have to change in the first step of implementation, but they can
continue to support the same external appearance.
Notes about function handlers
-----------------------------
Handlers for classes of functions should find life much easier and
cleaner in this design. The OID of the called function is directly
reachable from the passed parameters; we don't need the global variable
fmgr_pl_finfo anymore. Also, by modifying finfo->flinfo->fn_extra,
the handler can cache lookup info to avoid repeat lookups when the same
function is invoked many times. (fn_extra can only be used as a hint,
since callers are not required to re-use a FunctionLookupInfo struct.
But in performance-critical paths they normally will do so.)
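As a sketch of what a handler might do with fn_extra (everything named
"plsample" here is hypothetical, and the cached structure is whatever the
handler finds convenient):

/* Sketch of a PL call handler caching per-function info in fn_extra. */
Datum
plsample_call_handler(FunctionCallInfo finfo)
{
    PLSampleProcInfo *procinfo;         /* handler-private structure */

    procinfo = (PLSampleProcInfo *) finfo->flinfo->fn_extra;
    if (procinfo == NULL)
    {
        /*
         * First call through this FunctionLookupInfo: look up and compile
         * the function body, using finfo->flinfo->fn_oid to find it.
         */
        procinfo = plsample_compile(finfo->flinfo->fn_oid);
        finfo->flinfo->fn_extra = (void *) procinfo;
    }

    /* Execute the compiled function with the arguments in finfo. */
    return plsample_execute(procinfo, finfo);
}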
I observe that at least one other global variable, CurrentTriggerData,
is being used as part of the call convention for some function handlers.
That's just as grotty as fmgr_pl_finfo, so I'd like to get rid of it.
Any comments on the cleanest way to do so?
Are there any other things needed by the call handlers for PL/pgsql and
other languages?
During the conversion process, support for old-style builtin functions
and old-style user-written C functions will be provided by appropriate
function handlers. For example, the handler for old-style builtins
will look roughly like fmgr_c() does now.
System table updates
--------------------
In the initial phase, pg_language type 11 ("builtin") will be renamed
to "old_builtin", and a new language type named "builtin" will be
created with a new OID. Then pg_proc entries will be changed from
language code 11 to the new code piecemeal, as the associated routines
are rewritten. (This will imply several rounds of forced initdbs as
the contents of pg_proc change. It would be a good idea to add a
"catalog contents version number" to the database version info checked
at startup before we begin this process.)
The existing pg_language entry for "C" functions will continue to
describe user functions coded in the old style, and we will need to add
a new language name for user functions coded in the new style. (Any
suggestions for what the new name should be?) We should deprecate
old-style functions because of their portability problems, but the
support for them will only be one small function handler routine,
so we can leave them in place for as long as necessary.
The expected calling convention for PL call handlers will need to change
all-at-once, but fortunately there are not very many of them to fix.
> This is a followup to a message I wrote in June about reworking the fmgr
> interface. I've had a couple of good ideas (I think ;-)) since then,
> but there are some parts of the proposal that still need work before
> implementation can begin.
>
> I could particularly use some feedback from Jan and anyone else who's
> worked with function-call handlers: does this design eliminate the
> kluges that you've had to use in the past? If not, what else is needed?

Sounds good. My only question is whether people need backward
compatibility, and whether we can remove the compatibility part of the
interface and small overhead after 7.1 or later?

-- Bruce Momjian <maillist@candle.pha.pa.us>
>
> > This is a followup to a message I wrote in June about reworking the fmgr
> > interface. I've had a couple of good ideas (I think ;-)) since then,
> > but there are some parts of the proposal that still need work before
> > implementation can begin.
> >
> > I could particularly use some feedback from Jan and anyone else who's
> > worked with function-call handlers: does this design eliminate the
> > kluges that you've had to use in the past? If not, what else is needed?
>
> Sounds good. My only question is whether people need backward
> compatibility, and whether we can remove the compatibility part of the
> interface and small overhead after 7.1 or later?
Backward compatibility is a common source of problems, and I
don't like it generally. In the case of the fmgr it is quite
difficult, and I have thought about it already too. I
like the current interface for its simplicity from the user
function developer's point of view. And converting ALL
internal plus most of the contrib directories ones to
something new is really a huge project.
All function calls through the fmgr use the FmgrInfo
structure, but there are a lot of calls to internal functions
like textout() etc. too. Changing their interface would IMHO
introduce many problems. And there are only a few internal
functions where a new fmgr interface really is required due
to incomplete NULL handling or the like.
Therefore I would prefer an interface extension, that doesn't
require changes to existing functions. What about adding a
proifversion to pg_proc, telling the fmgr which call
interface the function uses? This is then held in the
FmgrInfo struct too so the fmgr can call a function using the
old and new interface.
First fmgr_info() is extended to put the interface version
into the FmgrInfo too. Then fmgr_faddr() is renamed to
fmgr_faddr_v1() and it has to check that only functions using
the old interface are called through it (there aren't that
many calls to it as you might think). After that you have
all the time in the world to implement another interface and
add a switch into fmgr() and sisters handling the different
versions.
My thoughts for the new interface:
o Each function argument must have its own separate NULL flag.
o The function's result must have another NULL flag too.
o Argument names and default values for omitted ones aren't
  IMHO something to go into the interface itself. The
  function is always called with all arguments positional;
  the parser must provide this list.
o The new interface must at least be designed for a later
  implementation of tuple set returns. I think this must be
  implemented as a temp table, collecting all return tuples,
  since for a procedural language it might be impossible to
  implement a real RETURN AND RESUME (as is already the case
  for PL/Tcl and would be for PL/Perl). Therefore
  another STRUCT kind of relation must be added too,
  providing the tupdesc for the returned set. This temp
  table is filled by calling the function the first time a
  tuple is needed; after that it is simply another RTE.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
> System table updates
> --------------------
>
> In the initial phase, pg_language type 11 ("builtin") will be renamed
> to "old_builtin", and a new language type named "builtin" will be
> created with a new OID. Then pg_proc entries will be changed from
> language code 11 to the new code piecemeal, as the associated routines
> are rewritten. (This will imply several rounds of forced initdbs as
> the contents of pg_proc change. It would be a good idea to add a
> "catalog contents version number" to the database version info checked
> at startup before we begin this process.)
>
> The existing pg_language entry for "C" functions will continue to
> describe user functions coded in the old style, and we will need to add
> a new language name for user functions coded in the new style. (Any
> suggestions for what the new name should be?) We should deprecate
> old-style functions because of their portability problems, but the
> support for them will only be one small function handler routine,
> so we can leave them in place for as long as necessary.
>
> The expected calling convention for PL call handlers will need to change
> all-at-once, but fortunately there are not very many of them to fix.
This approach nearly matches all my thoughts about the
redesign of the fmgr. In the system table section I miss
named arguments.
I think we need a new system table
pg_proargs (
Oid pargprooid,
int2 pargno,
name pargname,
bool pargdflnull,
text pargdefault
);
plus another flag in pg_proc that tells if this function
prototype information is available.
The parser then has to handle function calls like
... myfunc(userid = 123, username = 'hugo');
and build a complete function argument list that has all the
arguments in the correct order and defaults for omitted
arguments filled in as const nodes. This new prototype
information then must also be used in the PL handlers to
choose the given names for arguments.
In addition, we could add an argument type at this time
(INPUT, OUTPUT and INOUT) but only support INPUT ones for now
from the SQL level. PL functions calling other functions
could eventually use these argument types in the future.
Also I miss the interface for tuple set returns. I know that
this requires much more in other sections than only the fmgr,
but we need to cover it now or we'll not be able to do it
without another change to the fmgr interface at the time we
want to support real stored procedures. As said in another
mail, I think it must be done via some temp table since most
interpreter languages will not be able to do RETURN AND
RESUME in any other way - not even PL/pgSQL will be easy and
would need a total internal redesign of the bytecode
interpreter since it otherwise needs to recover the internal
call stack maybe across recursive calls.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Sounds good. My only question is whether people need backward
> compatibility, and whether we can remove the compatibility part of the
> interface and small overhead after 7.1 or later?
I think we could drop it after a decent interval, but I don't see any
reason to be in a hurry. I do think that we'll get complaints if 7.0
doesn't have any backward compatibility for existing user functions.
regards, tom lane
[ responding to both of Jan's messages in one ]
wieck@debis.com (Jan Wieck) writes:
> I like the current interface for its simplicity from the user
> function developer's point of view.
There is that; even with a good set of convenience macros there will be
more to learn about writing user functions. OTOH, the way it is now
is not exactly transparent --- in particular, NULL handling is so easy
to forget about/get wrong. We can make that much easier. We can also
simplify the interface noticeably for standard types like float4/float8.
BTW: I am not in nearly as big a hurry as Bruce is to rip out support
for the current user-function interface. I want to get rid of old-style
builtin functions before 7.0 because of the portability issue (see
below). But if a particular user is using old-style user functions
and isn't having portability problems on his machine, there's no need
to force him to convert, it seems to me.
> And converting ALL
> internal plus most of the contrib directories ones to
> something new is really a huge project.
It is. I was hoping to get some help ;-) ... but I will do it myself
if I have to.
> ... And there are only a few internal
> functions where a new fmgr interface really is required due
> to incomplete NULL handling or the like.
If your goal is *only* to deal with the NULL issue, or *only* to get
things working on a specific platform like Alpha, then yes we could
patch in a few dozen places and not undertake a complete changeover.
But I believe that we really need to fix this right, because it will
keep coming back to haunt us until we do. I will not be happy as
long as we have ports that have to compile "-O0" because of fmgr
brain-damage. I fear there are going to be more and more such ports
as people install smarter compilers that assume they are working with
ANSI-compliant source code. We're giving up performance system-wide
because of fmgr.
> Therefore I would prefer an interface extension, that doesn't
> require changes to existing functions. What about adding a
> proifversion to pg_proc, telling the fmgr which call
> interface the function uses?
I was going to let the prolang column tell that, by having different
language codes for old vs. new builtin function and old vs. new
dynamic-linked C function. We could add a version column instead,
but that seems like unnecessary complication.
> First fmgr_info() is extended to put the interface version
> into the FmgrInfo too. Then fmgr_faddr() is renamed to
> fmgr_faddr_v1() and it has to check that only functions using
> the old interface are called through it (there aren't that
> many calls to it as you might think).
I think it should be possible to make fmgr_faddr call both old and
new functions --- I haven't actually written code yet, but I think
I can do it. You're right that that's important in order to spread
out the repair work instead of having a "big bang".
> This approach nearly matches all my thoughts about the
> redesign of the fmgr. In the system table section I miss
> named arguments.
As you said in your earlier message, that is a parser-level feature
that has nothing to do with what happens at the fmgr level. I don't
want to worry about it for this revision.
> Also I miss the interface for tuple set returns. I know that
> this requires much more in other sections than only the fmgr,
> but we need to cover it now or we'll not be able to do it
> without another change to the fmgr interface at the time we
> want to support real stored procedures.
OK, I'm willing to worry about that. But what, exactly, needs to
be provided in the fmgr interface?
regards, tom lane
>
> [ responding to both of Jan's messages in one ]
>
> wieck@debis.com (Jan Wieck) writes:
> > I like the current interface for its simplicity from the user
> > function developer's point of view.
>
> There is that; even with a good set of convenience macros there will be
> more to learn about writing user functions. OTOH, the way it is now
> is not exactly transparent --- in particular, NULL handling is so easy
> to forget about/get wrong. We can make that much easier. We can also
> simplify the interface noticeably for standard types like float4/float8.
>
> BTW: I am not in nearly as big a hurry as Bruce is to rip out support
> for the current user-function interface. I want to get rid of old-style
> builtin functions before 7.0 because of the portability issue (see
> below). But if a particular user is using old-style user functions
> and isn't having portability problems on his machine, there's no need
> to force him to convert, it seems to me.
Personally, I could live with dropping the entire old
interface. That's not the problem. But at least Bruce and his
book need to know the final programming conventions if we
ought to change it at all, so it can be covered in his
manuscript when it is sent down to the paper.
> I was going to let the prolang column tell that, by having different
> language codes for old vs. new builtin function and old vs. new
> dynamic-linked C function. We could add a version column instead,
> but that seems like unnecessary complication.
Right - language is all that's needed to tell.
> > This approach nearly matches all my thoughts about the
> > redesign of the fmgr. In the system table section I miss
> > named arguments.
>
> As you said in your earlier message, that is a parser-level feature
> that has nothing to do with what happens at the fmgr level. I don't
> want to worry about it for this revision.
Right too. I just hoped to expand the scope of this change so
there would be only ONE change to the PL handlers, covering both
the new interface with proper NULL handling and enhanced function
prototypes.
>
> > Also I miss the interface for tuple set returns. I know that
> > this requires much more in other sections than only the fmgr,
> > but we need to cover it now or we'll not be able to do it
> > without another change to the fmgr interface at the time we
> > want to support real stored procedures.
>
> OK, I'm willing to worry about that. But what, exactly, needs to
> be provided in the fmgr interface?
First we need another relation type in pg_class. It's like a
table or view, but none of the NORMAL SQL statements can be
used with it (e.g. INSERT, SELECT, ...). It just describes a
row structure.
Then, a function returning a SET of any row type (this time,
regular relations or views too) can be used in the rangetable
as
SELECT A.col1, B.col2 FROM mysetfunc() A, anothertab B
WHERE A.col1 = B.col1;
Of course, it requires some new fields in the rangetable
entry. Anyway, at the beginning of execution for such a
query, the executor initializes those RTE's by creating a
temptable with the schema of the specified structure or
relation. Then it calls the user function, passing in some
handle to the temp table and the user function fills in the
tuples. Now the rest of query execution proceeds as if it were a regular
table. After execution, the temp table is dropped.
Isn't that all required for true stored procedures?
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > Sounds good. My only question is whether people need backward
> > compatibility, and whether we can remove the compatibility part of the
> > interface and small overhead after 7.1 or later?
>
> I think we could drop it after a decent interval, but I don't see any
> reason to be in a hurry. I do think that we'll get complaints if 7.0
> doesn't have any backward compatibility for existing user functions.

True.

-- Bruce Momjian <maillist@candle.pha.pa.us>
> > [ responding to both of Jan's messages in one ]
> >
> > wieck@debis.com (Jan Wieck) writes:
> > > I like the current interface for its simplicity from the user
> > > function developer's point of view.
> >
> > There is that; even with a good set of convenience macros there will be
> > more to learn about writing user functions. OTOH, the way it is now
> > is not exactly transparent --- in particular, NULL handling is so easy
> > to forget about/get wrong. We can make that much easier. We can also
> > simplify the interface noticeably for standard types like float4/float8.
> >
> > BTW: I am not in nearly as big a hurry as Bruce is to rip out support
> > for the current user-function interface. I want to get rid of old-style
> > builtin functions before 7.0 because of the portability issue (see
> > below). But if a particular user is using old-style user functions
> > and isn't having portability problems on his machine, there's no need
> > to force him to convert, it seems to me.
>
> Personally, I could live with dropping the entire old
> interface. That's not the problem. But at least Bruce and his
> book need to know the final programming conventions if we
> ought to change it at all, so it can be covered in his
> manuscript when it is sent down to the paper.

No. My coverage of that is going to be more conceptual than actual
programming. Do whatever you think is best.

-- Bruce Momjian <maillist@candle.pha.pa.us>
wieck@debis.com (Jan Wieck) writes:
>>>> Also I miss the interface for tuple set returns. I know that
>>>> this requires much more in other sections than only the fmgr,
>>>> but we need to cover it now or we'll not be able to do it
>>>> without another change to the fmgr interface at the time we
>>>> want to support real stored procedures.
>>
>> OK, I'm willing to worry about that. But what, exactly, needs to
>> be provided in the fmgr interface?
> First we need another relation type in pg_class. It's like a
> table or view, but none of the NORMAL SQL statements can be
> used with it (e.g. INSERT, SELECT, ...). It just describes a
> row structure.
Why bother? Just create a table --- if you don't want to put any
rows in it, you don't have to.
> Of course, it requires some new fields in the rangetable
> entry. Anyway, at the beginning of execution for such a
> query, the executor initializes those RTE's by creating a
> temptable with the schema of the specified structure or
> relation. Then it calls the user function, passing in some
> handle to the temp table and the user function fills in the
> tuples. Now the rest of query execution proceeds as if it were a regular
> table. After execution, the temp table is dropped.
OK, but this doesn't answer the immediate problem of what needs to
appear in the fmgr interface.
I have been thinking about this some, and I think maybe our best bet is
not to commit to *exactly* what needs to pass through fmgr, but rather
to put in a couple of generic hooks. I can see two hooks that are
needed: one is for passing "context" information, such as information
about the current trigger event when a function is called from the
trigger manager. (This'd replace the present CurrentTriggerData global,
which I hope you'll agree shouldn't be a global...) The other is for
passing and/or returning information about the function result --- maybe
returning a tuple descriptor, maybe passing in the name of a temp table
to put a result set in, maybe other stuff. So, I am thinking that we
should add two fields like this to the FunctionCallInfo struct:
	Node	   *resultinfo;	/* pass or return extra info about result */
	Node	   *context;	/* pass info about context of call */
We would restrict the usage of these fields only to the extent of saying
that they must point to some kind of Node --- that lets callers and
callees check the node tag to make sure that they understand what they
are looking at. Different node types can be used to handle different
situations. For an "ordinary" function call expecting a scalar return
value, both fields will be set to NULL. Other conventions for their use
will be developed as time goes on.
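For example, the trigger manager could pass its event data through the
context field, and a trigger function would verify the node tag before
using it. A sketch (this assumes TriggerData gets a node tag so that it
can be passed this way; the function name is made up):

/* Sketch: a function invoked by the trigger manager inspects the context field. */
Datum
sample_trigger_func(FunctionCallInfo finfo)
{
    TriggerData *trigdata;

    if (finfo->context == NULL || !IsA(finfo->context, TriggerData))
        elog(ERROR, "sample_trigger_func: not called by trigger manager");

    trigdata = (TriggerData *) finfo->context;

    /* ... examine trigdata->tg_event, trigdata->tg_trigtuple, etc. ... */

    finfo->isnull = false;
    return PointerGetDatum(trigdata->tg_trigtuple);
}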
Does this look good to you, or am I off the track entirely?
regards, tom lane
>
> Node *resultinfo; /* pass or return extra info about result */
> Node *context; /* pass info about context of call */
>
> Does this look good to you, or am I off the track entirely?
>
> regards, tom lane
>
Looks perfect. I appreciate getting rid of globals whenever
possible.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
Tom Lane wrote:
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > Sounds good. My only question is whether people need backward
> > compatibility, and whether we can remove the compatibility part of the
> > interface and small overhead after 7.1 or later?
>
> I think we could drop it after a decent interval, but I don't see any
> reason to be in a hurry. I do think that we'll get complaints if 7.0
> doesn't have any backward compatibility for existing user functions.
Right. A major release is what it is. And porting
applications to a new major release too, it is a conversion,
not an upgrade. Therefore a major release should drop as much
backward compatibility code for minor releases as possible.
Thus, we should think about getting rid of the broken design
for functions returning tuple sets in 7.0. As far as I
understand the books I have, there are a couple of different
types of functions/procedures out there, and not every database
implements all types, nor do they all use the same name for
the same type, so that something called a function in one
database is a stored procedure in another. Anyway, the
different types are:
1. Functions returning a scalar value taking only input-
arguments.
2. Functions returning a scalar value taking input-, output-
and in/out-arguments.
3. Functions returning nothing taking only input-arguments.
4. Functions returning nothing taking input-, output- and
in/out-arguments.
5. Functions returning a set of result rows taking only
input-arguments.
6. Functions returning a set of result rows taking input-,
output- and in/out-arguments.
I don't think that we have to implement everything, and since
we don't have host variables, output- and in/out-arguments
would make sense only for calls from procedural languages.
OTOH they would cause much trouble, so they are one detail to
leave out for PostgreSQL.
Three cases left. Type number 1. we have already. And it is
advanced, because the arguments can be either single values,
or single rows.
And type number 3. is easy, because invoking something that
returns a dummy that is thrown away is absolutely no work.
So the only thing that's really left is number 5. The funny
detail is that those functions or procedures can't be used
inside regular SELECT queries. Instead a CALL FUNCTION or
EXECUTE PROCEDURE statement is used from the client
application or inside a PL block. CALL FUNCTION then returns
a tuple set as a SELECT does. The result in our world
therefore has a tuple descriptor and depending on the invoker
is sent to the client or stored in an SPI tuple table.
So we do not need to call functions returning sets through
the normal function manager. It could completely deny calls to
set functions, and the interface for them can be a totally
different one. I have something in mind that could work
without temp tables, but it requires a redesign for PL/pgSQL
and causes some limitations for PL/Tcl. Let's leave that for
a post-7.0 release.
I correct my previous statements and vote to deny calls to
set functions through the default function manager in 7.0.
And there is another detail I found while browsing through
the books. Functions can be defined as [NOT] NULL CALL (IBM
DB2). Functions defined as NOT NULL CALL will be called only
if all their arguments aren't NULL. So we can prevent much
NULL handling inside the functions if we simply define that a
function that is NOT NULL CALL will always return NULL if
any of its input arguments is NULL. This case can then be
handled at the function manager level without calling the
function itself. Nearly all our builtin functions behave that
way but have all the tests inside.
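The fmgr-level check could be as simple as this sketch, assuming a
strictness flag (here called fn_strict) were copied from pg_proc into the
function lookup info:

/* Sketch: fmgr short-circuits a NOT NULL CALL function without invoking it. */
if (flinfo->fn_strict)          /* hypothetical flag, not in the proposal yet */
{
    int         i;

    for (i = 0; i < fcinfo.nargs; i++)
    {
        if (fcinfo.argnull[i])
        {
            fcinfo.isnull = true;   /* result is NULL, don't call the function */
            return (Datum) 0;
        }
    }
}
return (*flinfo->fn_addr) (&fcinfo);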
Another detail I'm missing now is a new, really defined
interface for type input/output functions. The fact that they
are defined taking one opaque (yepp, should be something
different as already discussed) argument but in fact get more
information from the attribute is ugly.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
Jan Wieck wrote:
> ...
>
> 5. Functions returning a set of result rows taking only
>    input-arguments.
> ...
> So the only thing that's really left is number 5. The funny
> detail is that those functions or procedures can't be used
> inside regular SELECT queries. Instead a CALL FUNCTION or
> EXECUTE PROCEDURE statement is used from the client
> application or inside a PL block. CALL FUNCTION then returns
> a tuple set as a SELECT does. The result in our world
> therefore has a tuple descriptor and depending on the invoker
> is sent to the client or stored in an SPI tuple table.
>
> So we do not need to call functions returning sets through
> the normal function manager. It could completely deny calls to
> set functions, and the interface for them can be a totally
> different one. I have something in mind that could work
> without temp tables, but it requires a redesign for PL/pgSQL
> and causes some limitations for PL/Tcl. Let's leave that for
> a post-7.0 release.
>
> I correct my previous statements and vote to deny calls to
> set functions through the default function manager in 7.0.

It would be very nice if we could use the tuple-set-returning
functions in place of tables/views,

    SELECT * FROM LOGGED_IN_USERS_INFO_PROC;

or at least define views on them:

    CREATE VIEW LOGGED_IN_USERS AS CALL FUNCTION LOGGED_IN_USERS_INFO_PROC;

We would not need to call them in place of functions that return
either single-value or tuple.

On the topic of 2x3=6 kinds of functions you mentioned I think we
could use yet another type of functions - the one returning a
tuple/row as is often the case in python and possibly other languages
that do automatic tuple packing/unpacking.

It could be used in cases like this:

    INSERT INTO MY_TABLE CALL FUNCTION MY_ROW_VALUE;

or

    DELETE FROM MY_TABLE WHERE * = CALL FUNCTION MY_ROW_VALUE;

(The last example is not ansi and does not work currently),

OTOH, these examples would just be redundant cases for your 5th case.

OTOOH, all the functions returning less than a set of rows are
redundant cases of the functions that do ;)

-----------
Hannu
Jan Wieck wrote:
>
> Another detail I'm missing now is a new, really defined
> interface for type input/output functions. The fact that they
> are defined taking one opaque (yepp, should be something
> different as already discussed) argument but in fact get more
> information from the attribute is ugly.

Can we currently return a list of the same type? I guess we can, as
lists (or arrays) are fundamental types in PostgreSQL, but I'm not sure.

I would like to define aggregate functions list() and set(). Could I
define them just once and specify that they return an array of their
input type? Half of that is currently done for count() - i.e. it can
take any type of argument, but I guess the return-array-of-input-type
is more complicated.

Also (probably off topic) how hard would it be to add another type of
aggregate function that operates on pairs of values? I would like to
have FOR_MIN and FOR_MAX (and possibly MIN_MIN and MAX_MAX) functions
that return _another_ field from a table for a minimal value in one
field.

-------------
Hannu
Hannu Krosing wrote:
>
> Jan Wieck wrote:
> >
> > I correct my previous statements and vote to deny calls to
> > set functions through the default function manager in 7.0.
> >
>
> It would be very nice if we could use the tuple-set-returning
> functions in place of tables/views,
>
> SELECT * FROM LOGGED_IN_USERS_INFO_PROC;
Exactly - that's something I have wanted for a long time now. Sticking
another querytree that returns a tuple set into a
rangetable entry. This other querytree could be either a
SELECT as in
SELECT A.x, A.y, B.z FROM table1 A, (SELECT ...) B
WHERE A.x = B.x;
or a function returning a set as in
SELECT A.x, A.y, B.z FROM table1 A, setfunc('arg') B
WHERE A.x = B.x;
Finally, the form
CALL setfunc('arg');
would be equivalent to a
SELECT * FROM setfunc('arg');
but closer to the IBM DB2 calling syntax. The first one is
required to get rid of some problems in the rule system,
especially views with aggregate columns that need their own
GROUP BY clause. The other ones are what we need to implement
stored procedures.
>
> or at least define views on them:
>
> CREATE VIEW LOGGED_IN_USERS AS CALL FUNCTION LOGGED_IN_USERS_INFO_PROC;
Wrong syntax since the statement after AS must be a SELECT.
But a
CREATE VIEW v AS SELECT * FROM setfunc();
would do the trick.
>
> We would not need to call them in place of functions that return
> either single-value or tuple.
>
> On the topic of 2x3=6 kinds of functions you mentioned I think we
> could use yet another type of functions - the one returning a
> tuple/row as is often the case in python and possibly other languages
> that do automatic tuple packing/unpacking.
>
> It could be used in cases like this:
>
> INSERT INTO MY_TABLE CALL FUNCTION MY_ROW_VALUE;
Let's clearly distinguish between scalar, row and set return
values. A scalar return value is one single datum. A row
return value is exactly one tuple of 1...n datums and a set
return value is a collection of 0...n rows.
What we have now (at least what works properly) are only
scalar return values from functions. And I don't see the
point of a row return, so I think we don't need them.
>
> or
>
> DELETE FROM MY_TABLE WHERE * = CALL FUNCTION MY_ROW_VALUE;
>
> (The last example is not ansi and does not work currently),
>
> OTOH, these examples would just be redundant cases for your 5th case.
>
> OTOOH, all the functions returning less than a set of rows are
> redundant cases of the functions that do ;)
But please don't forget that it isn't enough to write down
the syntax and specify the behaviour with some english words.
We must define the behaviour in C too, and in that language
it's a little more than a redundant case of something,
because we don't have that something.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
Jan Wieck wrote:
>
> Hannu Krosing wrote:
> >
> > Jan Wieck wrote:
>
> What we have now (at least what works properly) are only
> scalar return values from functions. And I don't see the
> point of a row return, so I think we don't need them.

That's what I meant by the OTOH below.

> > (The last example is not ansi and does not work currently),
> >
> > OTOH, these examples would just be redundant cases for your 5th case.
> >
> > OTOOH, all the functions returning less than a set of rows are
> > redundant cases of the functions that do ;)
>
> But please don't forget that it isn't enough to write down
> the syntax and specify the behaviour with some english words.
> We must define the behaviour in C too, and in that language
> it's a little more than a redundant case of something,
> because we don't have that something.

Yes, that's the hard part.

---------
Hannu
> Tom Lane wrote:
>
> > Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > > Sounds good. My only question is whether people need backward
> > > compatibility, and whether we can remove the compatibility part of the
> > > interface and small overhead after 7.1 or later?
> >
> > I think we could drop it after a decent interval, but I don't see any
> > reason to be in a hurry. I do think that we'll get complaints if 7.0
> > doesn't have any backward compatibility for existing user functions.
>
> Right. A major release is what it is. And porting
> applications to a new major release too, it is a conversion,
> not an upgrade. Therefore a major release should drop as much
> backward compatibility code for minor releases as possible.

If this was simple stuff, we could keep compatibility with little
problem. But, with complex stuff like this, keeping backward
compatibility sometimes makes things more confusing. They can code
things two ways, and that makes people get confused as to which one to
follow.

-- Bruce Momjian <maillist@candle.pha.pa.us>
I think that the proposals for the revision of the function interface
are all an improvement on what is there, and I hope to try to find time
to help implement whatever is decided.

Here are some more thoughts in relation to the question of set valued
and tuple valued (or complex type?) arguments.

Another place that user defined functions are used in the PostgreSQL
backend is in association with access methods, both as boolean
operators for search predicates and as auxiliary functions for the
access methods. Allowing set valued and tuple valued arguments and
return values for functions and operators in this setting can be
valuable.

For instance, suppose I have a table t that stores geometric objects in
some space, and I have a spatial index such as R*-tree, or even a GIST
index. Given a set of points pts I want to do a query of the form

    SELECT * FROM t WHERE t.object <intersects> pts;

Under these circumstances it would be really nice to be able to pass a
set of objects (as an SPI tuple table for instance) into the index.

Currently, the way I do this (with a custom access method) is to create
a temp table, put the key set into the temp table, and pass the name of
the temp table to the access method in the search key. The access
method then does an SPI select on the temp table and stores the
returned items into the private scan state for use during the scan.

While I realize that implementing this example requires much more than
a change to the function interface, I hope that it illustrates that it
is perhaps a good idea to keep as much flexibility in the function
interface as possible.

Bernie Franpitt
Bernie Franpitt wrote:
> SELECT * FROM t WHERE t.object <intersects> pts;
>
> Under these circumstances it would be really nice to be able to pass a
> set of objects (as an SPI tuple table for instance) into the index.
>
> Currently, the way I do this (with a custom access method) is to create
> a temp table, put the key set into the temp table, and pass the name of
> the temp table to the access method in the search key. The access
> method then does an SPI select on the temp table and stores the returned
> items into the private scan state for use during the scan.
>
> While I realize that implementing this example requires much more than a
> change to the function interface, I hope that it illustrates that it is
> perhaps a good idea to keep as much flexibility in the function
> interface as possible.
Uhhh - it is a good idea, but passing tuple sets as arguments
to functions, that will cause headaches, man.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
On Oct 30, Jan Wieck mentioned:

> Right. A major release is what it is. And porting
> applications to a new major release too, it is a conversion,
> not an upgrade. Therefore a major release should drop as much
> backward compatibility code for minor releases as possible.

Certainly true. But that would also mean that we'd have to keep
maintaining the 6.* series as well. At least bug-fixing and releasing
one or two more 6.5.x versions. Up until now the usual answer to a bug
was "upgrade to latest version". But if you break compatibility you
can't do that any more. (Compare to Linux 2.0 vs 2.2.) I'm just
wondering what the plans are in that regard.

-- Peter Eisentraut <peter_e@gmx.net>
Peter Eisentraut wrote:
> On Oct 30, Jan Wieck mentioned:
>
> > Right. A major release is what it is. And porting
> > applications to a new major release too, it is a conversion,
> > not an upgrade. Therefore a major release should drop as much
> > backward compatibility code for minor releases as possible.
>
> Certainly true. But that would also mean that we'd have to keep
> maintaining the 6.* series as well. At least bug-fixing and releasing one
> or two more 6.5.x versions. Up until now the usual answer to a bug was
> "upgrade to latest version". But if you break compatibility you can't do
> that any more. (Compare to Linux 2.0 vs 2.2) I'm just wondering what the
> plans are in that regard.
Some time ago, I already pointed out that we have to support
one or two older releases for some time. Not only because we
might drop some compatibility code. Each release usually
introduces one or another new keyword, making existing
applications probably fail with the new release. And no
amount of compatibility code would help in that case! It's a
deadlock trap: an application that cannot be easily ported to
a newer release because of incompatibilities in the
query language cannot use the last release it is compatible
with because of a bug.
There is a new aspect in this discussion since then. The new
corporation PostgreSQL Inc. offers commercial support for our
database (look at www.pgsql.com). If they offer support, they
must support older releases as well, so they need to
backpatch already.
Wouldn't it be a good idea if what they give back to our project
were bugfix releases of older versions (created by
backpatching release branches)? In the case of a customer's
accident, they have to do it anyway. And doing it for
critical bugs during idle time could avoid accidents, so it's
good customer service.
Marc, what do you think about such an agreement?
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
wieck@debis.com (Jan Wieck) writes:
> There is a new aspect in this discussion since then. The new
> corporation PostgreSQL Inc. offers commercial support for our
> database (look at www.pgsql.com). If they offer support, they
> must support older releases as well, so they need to
> backpatch already.
Yes, but who's the "them" here? If PostgreSQL Inc. has any warm
bodies other than the existing group of developers, I sure haven't
heard from them...
I agree 100% with Jan's basic point: we must provide a degree of
backwards compatibility from release to release. In some cases
that might create enough pain to be worth debating, but in this
particular case it seems like the choice is a no-brainer. We just
leave in the transition fmgr code that we're going to write anyway.
I don't understand why it even got to be a topic of discussion.
regards, tom lane