Обсуждение: SQL99 ARRAY support proposal
I'd like to implement SQL99/200x ARRAY support (well, at least the 
basics). Does anyone have objections to the following grammar/semantics?
===========================================================
Per SQL200x - examples
===========================================================
create table foo(f1 integer ARRAY[3]);  result same as create table foo(f1 integer[3]); the 3  really does nothing
(thisdeviates from spec but is  same as current implementation)
 
create table foo(f1 integer ARRAY);  result same as create table foo(f1 integer[]);
select ARRAY[1,2,3];  result '{1,2,3}'
select ARRAY[(select oid from pg_class order by relname)];  result is array of all the oid's in pg_class in relname
order
select ARRAY[1,2] || 3  result '{1,2,3}'
select ARRAY[1,2] || ARRAY[3,4]  result '{1,2,3,4}'
===========================================================
Proposed PostgreSQL extensions
===========================================================
select ARRAY[[1,2,3], [4,5,6]];  result '{{1,2,3},{4,5,6}}'
select ARRAY[[1,2],[3,4]] || 5  result '{{1},{3},{5}}'
select ARRAY[[1,2],[3,4]] || [5,6]  result '{{1,2},{3,4},{5,6}}'
use UNION's algorithm for deriving the element type when not specified
Comments, suggestions, objections?
Thanks,
Joe
			
		Joe Conway <mail@joeconway.com> writes:
> select ARRAY[1,2,3];
>    result '{1,2,3}'
The array type is determined how?  I'd like this syntax better if there
were a way to force the choice of array type...
> select ARRAY[(select oid from pg_class order by relname)];
>    result is array of all the oid's in pg_class in relname order
Puh-leez tell me that's not in the spec.  How is one supposed to
distinguish this usage from the scalar-subselect case?
> select ARRAY[1,2] || 3
>    result '{1,2,3}'
Datatypes?  How many variants of the || operator do you plan to offer?
What will be the side-effects on the parser's ability to pick one?
> select ARRAY[1,2] || ARRAY[3,4]
>    result '{1,2,3,4}'
Same worry as above; likewise for the "proposed extensions".
        regards, tom lane
			
		Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>>select ARRAY[1,2,3];
>>   result '{1,2,3}'
> 
> The array type is determined how?  I'd like this syntax better if there
> were a way to force the choice of array type...
What about:
select integer ARRAY[1,2,3];   result '{1,2,3}'::integer[]
>>select ARRAY[(select oid from pg_class order by relname)];
>>   result is array of all the oid's in pg_class in relname order
> 
> Puh-leez tell me that's not in the spec.  How is one supposed to
> distinguish this usage from the scalar-subselect case?
Well, SQL99 has this:
<array value constructor> ::=  <array value list constructor>
<array value list constructor> ::=  ARRAY <left bracket or trigraph>        <array element list>        <right bracket
ortrigraph>
 
but SQL200x has this:
<array value constructor> ::=  <array value constructor by enumeration> |  <array value constructor by query>
<array value constructor by enumeration> ::=  ARRAY <left bracket or trigraph>        <array element list>
<rightbracket or trigraph>
 
<array value constructor by query> ::=  ARRAY <left paren>        <query expression> [ <order by clause> ]
<rightparen>
 
>>select ARRAY[1,2] || 3
>>   result '{1,2,3}'
> 
> 
> Datatypes?
maybe?
select integer ARRAY[1,2] || 3   result '{1,2,3}'::integer[]
> How many variants of the || operator do you plan to offer?
One for each builtin datatype[]/datatype pair (e.g. integer[]/integer), 
and another for each datatype[] (e.g. integer[]/integer[])
> What will be the side-effects on the parser's ability to pick one?
Not really sure. I figured I'd cross that bridge when I got to it. Are 
there specific landmines you're thinking of in there?
>>select ARRAY[1,2] || ARRAY[3,4]>>   result '{1,2,3,4}'
select integer ARRAY[1,2] || integer ARRAY[3,4]   result '{1,2,3,4}'::integer[]
Or else the use UNION's algorithm for deriving the element type (you 
suggested this previously, but I may have misunderstood)
Joe
			
		Joe Conway <mail@joeconway.com> writes:
> Tom Lane wrote:
>> The array type is determined how?  I'd like this syntax better if there
>> were a way to force the choice of array type...
> What about:
> select integer ARRAY[1,2,3];
>     result '{1,2,3}'::integer[]
By analogy to the "type 'literal'" syntax?  I'd prefer not to go that
route, because that syntax for literals is a horrid kluge --- to keep
bison from spitting up, we've had to put a bunch of nasty restrictions
on the type names that can appear in such constructs.  All those
restrictions would have to apply here, too.
It's possible that we could use the other cast syntaxes:ARRAY[1,2,3]::integer[]CAST(ARRAY[1,2,3] AS integer[])
It would take some hackery to propagate the destination type down into
the ARRAY[] before the latter's type resolution is done, but at least
it'd be a quite localized hack.
>> Puh-leez tell me that's not in the spec.  How is one supposed to
>> distinguish this usage from the scalar-subselect case?
> but SQL200x has this:
> <array value constructor> ::=
>    <array value constructor by enumeration> |
>    <array value constructor by query>
> <array value constructor by enumeration> ::=
>    ARRAY <left bracket or trigraph>
>          <array element list>
>          <right bracket or trigraph>
> <array value constructor by query> ::=
>    ARRAY <left paren>
>          <query expression> [ <order by clause> ]
>          <right paren>
This I could live with --- note the difference in punctuation.  There
would be a clear syntactic and semantic difference between
ARRAY(SELECT ...) and ARRAY[(SELECT ...)].
>> How many variants of the || operator do you plan to offer?
> One for each builtin datatype[]/datatype pair (e.g. integer[]/integer), 
> and another for each datatype[] (e.g. integer[]/integer[])
That does not seem workable at all.  It'll be a maintenance nightmare
even worse than what we already have for array equality; and I suspect
you'll find that the parser will have trouble resolving which || to pick.
I have been toying with the idea of replacing all the array-equality
operators with a single pg_proc/pg_operator entry for "ANYARRAY = ANYARRAY".
Now that arrays carry their element type in the header, a single
function could implement all these variants.  Type mismatch (eg,
int[] = float[]) would become a runtime error instead of a parse error,
but that seems acceptable.
I see how the array || array cases could be handled similarly, but I
don't see how to make array || scalar work that way.  If we declared an
operator ANYARRAY || ANY then there'd be no way for it to check at
runtime that the right-hand operand matched the lefthand element type.
(Also, it'd create resolution conflicts with ANYARRAY || ANYARRAY.)
It'd be nice to have a general-purpose solution for this problem.
If we did, the stuff in contrib/array would benefit too, and perhaps
have (at last) a shot at becoming mainstream.
Speculating wildly: what if we invent a new pseudotype ANYARRAYELEMENT,
which would be allowed only in operator/function declarations that also
use ANYARRAY, egANYARRAY || ANYARRAYELEMENT
and would match only the element type of the array in the ANYARRAY
position.  I think it'd be possible to hack the parser to make such a
check in parse_coerce.  There might be a cleaner way to do this, but
I do feel that something along this line is a better approach than
creating N different pg_operator entries for the same operation.
        regards, tom lane
			
		Tom Lane wrote: > By analogy to the "type 'literal'" syntax? I'd prefer not to go that > route, because that syntax for literals is a horrid kluge --- to keep > bison from spitting up, we've had to put a bunch of nasty restrictions > on the type names that can appear in such constructs. All those > restrictions would have to apply here, too. > > It's possible that we could use the other cast syntaxes: > ARRAY[1,2,3]::integer[] > CAST(ARRAY[1,2,3] AS integer[]) > It would take some hackery to propagate the destination type down into > the ARRAY[] before the latter's type resolution is done, but at least > it'd be a quite localized hack. OK -- I'll try to make that work. I presume that in the non-specified case "ARRAY[1,2,3]" I should use something similar to UNION's resolution rules? >><array value constructor by enumeration> ::= >> ARRAY <left bracket or trigraph> >> <array element list> >> <right bracket or trigraph> >><array value constructor by query> ::= >> ARRAY <left paren> >> <query expression> [ <order by clause> ] >> <right paren> > > This I could live with --- note the difference in punctuation. There > would be a clear syntactic and semantic difference between > ARRAY(SELECT ...) and ARRAY[(SELECT ...)]. Sorry -- I guess I mis-read that. So "ARRAY(SELECT ...)" it is. > [...lots of good ideas regarding generalizing array operations...] I played with generalizing array functions a bit for plr and ran into some problems (which I can't specifically recall at the moment), but clearly that's the way to go. I'll start playing with your suggestions in C code, and report back for more feedback as it solidifies. Thanks! Joe
Joe Conway <mail@joeconway.com> writes:
> Tom Lane wrote:
>> It would take some hackery to propagate the destination type down into
>> the ARRAY[] before the latter's type resolution is done, but at least
>> it'd be a quite localized hack.
> OK -- I'll try to make that work. I presume that in the non-specified 
> case "ARRAY[1,2,3]" I should use something similar to UNION's resolution 
> rules?
Yeah, that'd make sense to me.
>>> [...lots of good ideas regarding generalizing array operations...]
> I played with generalizing array functions a bit for plr and ran into 
> some problems (which I can't specifically recall at the moment), but 
> clearly that's the way to go. I'll start playing with your suggestions 
> in C code, and report back for more feedback as it solidifies.
It'd be useful if you can reconstruct what problems you ran into.
After more thought I'm thinking that we should specify these pseudotypes
like so: a given set of actual operand types matches an
operator/function whose definition uses ANYARRAY or ANYELEMENT only if
the same element type is involved at all positions.  Thus inANYARRAY = ANYARRAY
the left and right operands must be the same array type (so we preserve
the existing parse-time check, but we only need one pg_operator entry
to do it).  ForANYARRAY || ANYELEMENT
we get the desired behavior.  And there are some other interesting
possibilities: for example, a function could be declaredfoo(ANY, ANYELEMENT, ANYELEMENT)
which would mean that it takes any three datatypes, but the second
and third arguments have to be the same datatype.
If we run into limitations of this scheme, it could be generalized
further by adding pseudotypes ANYARRAY2 and ANYELEMENT2 that have
the same restriction among themselves, but are independent of
ANYARRAY/ANYELEMENT.  Then for examplefoo(ANYELEMENT, ANYELEMENT, ANYELEMENT2, ANYELEMENT2)
takes four params that can be anything so long as the first two are
the same datatype and the second two are also the same datatype (but
possibly a different type from the first two).  And you could add
ANYARRAY3, etc, till you got bored.  But I'd not bother with this
until someone showed a concrete need for it.  At the moment, the
useful examples I can think of don't need more than one "free variable"
in their set of argument datatypes.
        regards, tom lane
			
		Tom Lane wrote:
>>I played with generalizing array functions a bit for plr and ran into 
>>some problems (which I can't specifically recall at the moment), but 
>>clearly that's the way to go. I'll start playing with your suggestions 
>>in C code, and report back for more feedback as it solidifies.
> 
> It'd be useful if you can reconstruct what problems you ran into.
> 
I've played around a bit more and refreshed my memory -- here are two 
problems:
CREATE OR REPLACE FUNCTION array_push (anyarray, any)
RETURNS anyarray
AS '$libdir/plr','array_push'
LANGUAGE 'C';
ERROR:  parser: parse error at or near "any" at character 50
It seems that "any" is not accepted as a function parameter type. From 
gram.y it appears that the cause is that "any" is a reserved keyword:
<snip>
/* * Name classification hierarchy. * * IDENT is the lexeme returned by the lexer for identifiers that match * no known
keyword. In most cases, we can accept certain keywords as
 
</snip>
<snip>
/* Type identifier --- names that can be type names. */
type_name:    IDENT    { $$ = $1; }    | unreserved_keyword { $$ = pstrdup($1); }    ;
</snip>
So for grins I did this:
regression=# select oid,typname from pg_type where typname like '%any%'; oid  | typname
------+---------- 2276 | any 2277 | anyarray
(2 rows)
regression=# update pg_type set typname = 'anyscalar' where oid = 2276;
UPDATE 1
CREATE OR REPLACE FUNCTION array_push (anyarray, anyscalar)
RETURNS anyarray
AS '$libdir/plr','array_push'
LANGUAGE 'C';
regression=# select array_push('{1,2}'::integer[],3::integer); array_push
------------ {1,2,3}
(1 row)
So far, so good. But now the second problem:
select f1[2] from   (select array_push('{1,2}'::integer[],3::integer) as f1) as t;
ERROR:  transformArraySubscripts: type anyarray is not an array
I'm just starting to dig into this one.
Joe
			
		Joe Conway <mail@joeconway.com> writes:
> It seems that "any" is not accepted as a function parameter type.
You have to double-quote it.
We could perhaps rename it to avoid the keyword conflict; I'd lean
towards "anytype" if we do ("anyscalar" seems misleading; I'd expect
that to exclude arrays).  I think I chose ANY because there was
precedent in CREATE AGGREGATE for that.
> So far, so good. But now the second problem:
> select f1[2] from
>     (select array_push('{1,2}'::integer[],3::integer) as f1) as t;
> ERROR:  transformArraySubscripts: type anyarray is not an array
Mph.  I'm not sure we can make that work readily ... unless you want
to make the parser assume that a function taking and returning ANYARRAY
actually returns the same array type as its input is.  Otherwise the
parser has no way to determine the datatype yielded by f1[2].
        regards, tom lane
			
		Joe Conway kirjutas E, 10.03.2003 kell 05:35: > CREATE OR REPLACE FUNCTION array_push (anyarray, anyscalar) > RETURNS anyarray > AS '$libdir/plr','array_push' > LANGUAGE 'C'; could you make it RETURNS typeof($1) ? -------------- Hannu
Hannu Krosing <hannu@tm.ee> writes: > Joe Conway kirjutas E, 10.03.2003 kell 05:35: >> CREATE OR REPLACE FUNCTION array_push (anyarray, anyscalar) >> RETURNS anyarray > could you make it > RETURNS typeof($1) Not directly --- we have to fit the return-type info into an OID field. We could fake it by inventing one or more pseudotypes, "SAMEASPARAMn". But I think I like better the notion of extending my bound-together- ANYARRAY-and-ANYELEMENT proposal, http://archives.postgresql.org/pgsql-hackers/2003-03/msg00319.php Suppose that we do that, and then further say that ANYARRAY or ANYELEMENT appearing as the return type implies that the return type is actually the common element or array type. Then we have such useful behaviors as: array_push(anyarray, anyelement) returns anyarrayarray_pop(anyarray) returns anyelementarray_subscript(anyarray, int) yieldsanyelementsingleton_array(anyelement) yields anyarray The last three cases cannot be handled by a SAMEASPARAM construct. regards, tom lane
Tom Lane wrote: > But I think I like better the notion of extending my bound-together- > ANYARRAY-and-ANYELEMENT proposal, > http://archives.postgresql.org/pgsql-hackers/2003-03/msg00319.php > > Suppose that we do that, and then further say that ANYARRAY or > ANYELEMENT appearing as the return type implies that the return type > is actually the common element or array type. Then we have such > useful behaviors as: > > array_push(anyarray, anyelement) returns anyarray > array_pop(anyarray) returns anyelement > array_subscript(anyarray, int) yields anyelement > singleton_array(anyelement) yields anyarray > > The last three cases cannot be handled by a SAMEASPARAM construct. That was my concern also. I like the above. So if I understand correctly, all instances of anyarray and anyelement in a function definition would need to be self-consistent, but the group could represent essentially any datatype with its corresponding array type. If we need more than one of these self consistent groups, we could resort to anyarray1/anyelement1, etc. Does this sound correct? Also, an implementation question: if I have a type oid for an element, what is the preferred method for determining the corresponding array? I'm thinking that the most efficient method might be to use the element-type name with a '_' prepended to get the array-type oid, but that seems ugly. Thoughts? Thanks, Joe
On Mon, Mar 10, 2003 at 09:49:47AM -0500, Tom Lane wrote: > Hannu Krosing <hannu@tm.ee> writes: > > Joe Conway kirjutas E, 10.03.2003 kell 05:35: > >> CREATE OR REPLACE FUNCTION array_push (anyarray, anyscalar) > >> RETURNS anyarray > > > could you make it > > RETURNS typeof($1) > > Not directly --- we have to fit the return-type info into an OID field. > We could fake it by inventing one or more pseudotypes, "SAMEASPARAMn". > > But I think I like better the notion of extending my bound-together- > ANYARRAY-and-ANYELEMENT proposal, > http://archives.postgresql.org/pgsql-hackers/2003-03/msg00319.php > > Suppose that we do that, and then further say that ANYARRAY or > ANYELEMENT appearing as the return type implies that the return type > is actually the common element or array type. Then we have such > useful behaviors as: > > array_push(anyarray, anyelement) returns anyarray > array_pop(anyarray) returns anyelement > array_subscript(anyarray, int) yields anyelement > singleton_array(anyelement) yields anyarray > > The last three cases cannot be handled by a SAMEASPARAM construct. ... typeof($1)[], or a ARRAYELEMSAMEASPARAM construct? I'm really liking this discussion. I know this is sort of "out there", but I have found in languages like StandardML and Objective CAML that templatized- type functions are _extremely_ useful. These languages type systems are amazingly powerful (the language syntax is another matter *sigh*). I'm not necessarily suggesting implementing this, but I just want to feed the debate a bit. I view the type system of these guys as "the ideal", and would be in ecstacy if PostgreSQL had it, but I realize implementing the thing would prolly be far from practical. First, there are templatized types. Arrays in PostgreSQL are sort of a kludge of templatized types, but they would be defined like so: type a' array = <some definition ...> which means that you are describing an array of some type a' (the apostrophe indicates a type variable). You can also create other neat templatized types as an aside: type a' Nullable = Null | Value of a' Which means the expressions: Value 47 --> of type int NullableNull --> of type a' Nullable (determined from context) But then, you could also say: int array array Or even: int Nullable array Which is somthing you can't in PostgreSQL but would be very nice. But then you could say: let invert_matrix m : a' array array -> a' array array = <code...> let multiply x : a', y : a' -> a' = <code ...> You could have more than one type variable in a templatized type or function, true, but I've never really needed more than one. I can imagine cases where it would be useful, but just haven't needed one. Plus:* get rid of horrible 'int4_' type hacks for array. Minus:* can't use oid to represent exact type, rather a string of oids.* need second table to hold function type constraintswhen function is templatized. (or could make it params "oid array array", aka oid[][]!) Reserve eight or tenoids for template parameter slots (in other words, for a' through j' or something). Warning: I have been called the "type nazi" <g> One other thing from StandardML that I have always wanted in PostgreSQL (or anywhere else I program, for that matter)- record types. (Warning, this is also very wishful thinking and "out there"). In ML/CAML, a record type is defined like so: type myrecord = {x : int,y : int,s : string}; "myrecord" is actually just type alias, the canonical record definition is: {s:string, x:int, y:int} ... with the attributes in alphabetical order, because unless you are mucking with pointers in C, it really doesn't matter what order they are in. The first advantage become very apparent: Any two records with the same named attributes of the same types are always of the same type. In PostgreSQL, this would mean that functions that operate on RECORD{x:int,y:int,s:string} could operate on a record from any relation with those attributes. Further, to make inheritance pretty much unnecesary, you could allow a record with more attributes to satisfy a parameter or return value constraint. In other words, you could call function foo(RECORD{x:int,y:int}) on a RECORD{s:string,x:int,y:int}. I've thought about this trick a lot. In theory there is a possibility of not getting what you want, but in practice it would almost never happen. The demostrative case would be calling distance_from_origin(RECORD{x:int,y:int}) on RECORD{x:int,y:int,z:int}, but in this case you need to make a distance_from_origin(RECORD{x:int,y:int,z:int}). This way, you could make a function which operates on RECORD{oid:oid} which could be called on any record from a table. I've wanted to do this sort of thing on several occasions- one application has notes which can be attached to any row, sort of like PostgreSQL comments. Another to keep track of what user changed which fields. Etc. Alright.... heheh I didn't mean to ramble. /ME gets head out of clouds and goes back to practical work <g> > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordo0mo@postgresql.org
Joe Conway <mail@joeconway.com> writes:
> So if I understand correctly, all instances of anyarray and anyelement 
> in a function definition would need to be self-consistent, but the group 
> could represent essentially any datatype with its corresponding array 
> type. If we need more than one of these self consistent groups, we could 
> resort to anyarray1/anyelement1, etc. Does this sound correct?
Right.
> Also, an implementation question: if I have a type oid for an element, 
> what is the preferred method for determining the corresponding array? 
> I'm thinking that the most efficient method might be to use the 
> element-type name with a '_' prepended to get the array-type oid, but 
> that seems ugly. Thoughts?
I was thinking about that earlier.  Right now there is no other way.
We could consider adding a column to pg_type to link to the array type,
but I'm not sure it's worth the trouble.  I think if you look around,
there's probably already a subroutine someplace that does the lookup
using the '_foo' approach.
        regards, tom lane
			
		> So if I understand correctly, all instances of anyarray and anyelement > in a function definition would need to be self-consistent, but the group > could represent essentially any datatype with its corresponding array > type. If we need more than one of these self consistent groups, we could > resort to anyarray1/anyelement1, etc. Does this sound correct? > > Also, an implementation question: if I have a type oid for an element, > what is the preferred method for determining the corresponding array? > I'm thinking that the most efficient method might be to use the > element-type name with a '_' prepended to get the array-type oid, but > that seems ugly. Thoughts? What about a cast? 1::arraytype Chris
Tom Lane wrote: > But I think I like better the notion of extending my bound-together- > ANYARRAY-and-ANYELEMENT proposal, > http://archives.postgresql.org/pgsql-hackers/2003-03/msg00319.php > > Suppose that we do that, and then further say that ANYARRAY or > ANYELEMENT appearing as the return type implies that the return type > is actually the common element or array type. Then we have such > useful behaviors as: > > array_push(anyarray, anyelement) returns anyarray > array_pop(anyarray) returns anyelement > array_subscript(anyarray, int) yields anyelement > singleton_array(anyelement) yields anyarray Before I get too far along, I'd like to get some feedback. The attached patch implements Tom's bound-together-ANYARRAY-and-ANYELEMENT proposal (and includes ANY as well, per earlier discussion). With it, the following works: CREATE OR REPLACE FUNCTION array_push (anyarray, anyelement) RETURNS anyarray AS '$libdir/plr','array_push' LANGUAGE 'C'; regression=# select f1[2] from (select array_push('{1,2}'::integer[],3) as f1) as t; f1 ---- 2 (1 row) Does it make sense to commit this now, or should it wait for the other changes described below? The following are my proposed next phases in array support changes. Please let me know now if you think any of these won't fly (conceptually): 1) Implement SQL99/200x ARRAY[] changes proposed here: http://archives.postgresql.org/pgsql-hackers/2003-03/msg00297.php as modified by http://archives.postgresql.org/pgsql-hackers/2003-03/msg00315.php 2) Implement the following new builtin functions array_push(anyarray, anyelement) returns anyarray array_pop(anyarray) returns anyelement array_subscript(anyarray, int) yields anyelement singleton_array(anyelement) returns anyarray - any reason not to call this one simply "array"? split(text, text) returns text[] - split string into array on delimiter implode(text[], text) returns text - join array elements into a string using given string delimiter 3) Modify contrib/array functions as needed and move to the backend. Or possibly write equivalent functions from scratch -- I just noticed this in contrib/array: * This software is distributed under the GNU General Public License * either version 2, or (at your option) any later version. Is anyone still in contact with Massimo Dal Zotto? Any chance he would change the license to BSD? 4) Update "User's Guide"->"Data Types"->"Arrays" documentation and create a new section: "User's Guide"-> "Functions and Operators"-> "Array Functions and Operators" Thoughts, comments, objections all welcomed. Thanks, Joe Index: src/backend/parser/parse_coerce.c =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/backend/parser/parse_coerce.c,v retrieving revision 2.93 diff -c -r2.93 parse_coerce.c *** src/backend/parser/parse_coerce.c 9 Feb 2003 06:56:28 -0000 2.93 --- src/backend/parser/parse_coerce.c 12 Mar 2003 16:17:39 -0000 *************** *** 188,194 **** ReleaseSysCache(targetType); } ! else if (targetTypeId == ANYOID || targetTypeId == ANYARRAYOID) { /* assume can_coerce_type verified that implicit coercion is okay */ --- 188,194 ---- ReleaseSysCache(targetType); } ! else if (targetTypeId == ANYOID || targetTypeId == ANYELEMENTOID || targetTypeId == ANYARRAYOID) { /* assume can_coerce_type verified that implicit coercion is okay */ *************** *** 325,332 **** continue; } ! /* accept if target is ANY */ ! if (targetTypeId == ANYOID) continue; /* --- 325,332 ---- continue; } ! /* accept if target is ANY or ANYELEMENT */ ! if (targetTypeId == ANYOID || targetTypeId == ANYELEMENTOID) continue; /* Index: src/backend/parser/parse_func.c =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/backend/parser/parse_func.c,v retrieving revision 1.144 diff -c -r1.144 parse_func.c *** src/backend/parser/parse_func.c 9 Feb 2003 06:56:28 -0000 1.144 --- src/backend/parser/parse_func.c 12 Mar 2003 17:46:44 -0000 *************** *** 41,46 **** --- 41,50 ---- List *fargs, Oid *input_typeids, Oid *function_typeids); + static Oid enforce_generic_type_consistency(Oid *oid_array, + Oid *true_oid_array, + int nargs, + Oid rettype); static int match_argtypes(int nargs, Oid *input_typeids, FuncCandidateList function_typeids, *************** *** 309,314 **** --- 313,324 ---- "\n\tYou may need to add explicit typecasts"); } + /* + * enforce consistency with ANY, ANYARRAY, and ANYELEMENT argument + * and return types, possibly modifying return type along the way + */ + rettype = enforce_generic_type_consistency(oid_array, true_oid_array, nargs, rettype); + /* perform the necessary typecasting of arguments */ make_arguments(nargs, fargs, oid_array, true_oid_array); *************** *** 347,352 **** --- 357,491 ---- return retval; } + /* + * If ANY, ANYARRAY, or ANYELEMENT is used for a function's arguments or + * return type, make sure the runtime types are consistent with + * each other. The argument consistency rules are like so: + * + * 1) All arguments declared ANY should have matching datatypes. + * 2) All arguments declared ANYARRAY should have matching datatypes. + * 3) All arguments declared ANYELEMENT should have matching datatypes. + * 4) If there are arguments of both ANYELEMENT and ANYARRAY, make sure + * the runtime scalar argument type is in fact the element type for + * the runtime array argument type. + * + * Rules are applied to the function's return type (possibly altering it) + * if it is declared ANY, ANYARRAY, or ANYELEMENT: + * + * 1) If return type is ANY, and any argument is ANY, use the + * arguments runtime type as the function's return type. + * 2) If return type is ANY but no argument is ANY, leave the return type as + * is. + * XXX should this case be rejected at the point of function creation? + * 3) If return type is ANYARRAY, and any argument is ANYARRAY, use the + * arguments runtime type as the function's return type. + * 4) If return type is ANYARRAY, no argument is ANYARRAY, but any argument + * is ANYELEMENT, use the runtime type of the argument to determine + * the function's return type, i.e. the element type's corresponding + * array type. + * 5) If return type is ANYARRAY, no argument is ANYARRAY or ANYELEMENT, + * leave the return type as is. + * XXX should this case be rejected at the point of function creation? + * 6) If return type is ANYELEMENT, and any argument is ANYELEMENT, use the + * arguments runtime type as the function's return type. + * 7) If return type is ANYELEMENT, no argument is ANYELEMENT, but any argument + * is ANYARRAY, use the runtime type of the argument to determine + * the function's return type, i.e. the array type's corresponding + * element type. + * 8) If return type is ANYELEMENT, no argument is ANYARRAY or ANYELEMENT + * leave the return type as is. + * XXX should this case be rejected at the point of function creation? + */ + static Oid + enforce_generic_type_consistency(Oid *oid_array, Oid *true_oid_array, int nargs, Oid rettype) + { + int j; + Oid any_typeid = InvalidOid; + Oid elem_typeid = InvalidOid; + Oid array_typeid = InvalidOid; + Oid array_typelem = InvalidOid; + + /* + * Loop through the arguments to see if we have any that are + * ANYARRAY or ANYELEMENT. If so, require the runtime types to be + * self-consistent + */ + for (j = 0; j < nargs; j++) + { + if (true_oid_array[j] == ANYOID) + { + if (OidIsValid(any_typeid) && oid_array[j] != any_typeid) + elog(ERROR, "Inconsistent use of arguments declared ANY"); + any_typeid = oid_array[j]; + } + + if (true_oid_array[j] == ANYELEMENTOID) + { + if (OidIsValid(elem_typeid) && oid_array[j] != elem_typeid) + elog(ERROR, "Inconsistent use of arguments declared ANYELEMENT"); + elem_typeid = oid_array[j]; + } + + if (true_oid_array[j] == ANYARRAYOID) + { + if (OidIsValid(array_typeid) && oid_array[j] != array_typeid) + elog(ERROR, "Inconsistent use of arguments declared ANYARRAY"); + array_typeid = oid_array[j]; + } + } + + /* + * Fast Track: if none of the arguments are ANY, ANYARRAY, or ANYELEMENT, + * return the original rettype now + */ + if (!OidIsValid(any_typeid) && + !OidIsValid(array_typeid) && + !OidIsValid(elem_typeid)) + return rettype; + + /* if we return ANYOID check consistency with any like arguments */ + if (rettype == ANYOID) + { + if (OidIsValid(any_typeid)) + return any_typeid; + else + return rettype; + } + + /* get the element type based on the array type, if we have one */ + if (OidIsValid(array_typeid)) + { + array_typelem = get_typelem(array_typeid); + + if (!OidIsValid(elem_typeid)) + { + /* if we don't have an element type yet, use the one we just got */ + elem_typeid = array_typelem; + } + else if (array_typelem != elem_typeid) + { + /* otherwise, they better match */ + elog(ERROR, "Argument declared ANYARRAY not consistent with " \ + "argument declared ANYELEMENT"); + } + } + + /* if we return ANYARRAYOID enforce consistency with any like arguments */ + if (rettype == ANYARRAYOID) + { + if (OidIsValid(array_typeid)) + return array_typeid; + else + return get_arraytype(elem_typeid); + } + + /* if we return ANYELEMENTOID check consistency with any like arguments */ + if (rettype == ANYELEMENTOID) + return elem_typeid; + + /* we don't return a generic type; send back the original return type */ + return rettype; + } /* match_argtypes() * Index: src/backend/utils/cache/lsyscache.c =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/backend/utils/cache/lsyscache.c,v retrieving revision 1.90 diff -c -r1.90 lsyscache.c *** src/backend/utils/cache/lsyscache.c 3 Feb 2003 21:15:44 -0000 1.90 --- src/backend/utils/cache/lsyscache.c 12 Mar 2003 16:17:39 -0000 *************** *** 1040,1045 **** --- 1040,1108 ---- } /* + * get_typelem + * + * Given the type OID, return the typelem field (element type OID + * for array types) + */ + Oid + get_typelem(Oid typid) + { + HeapTuple tp; + + tp = SearchSysCache(TYPEOID, + ObjectIdGetDatum(typid), + 0, 0, 0); + if (HeapTupleIsValid(tp)) + { + Form_pg_type typtup = (Form_pg_type) GETSTRUCT(tp); + Oid result; + + result = typtup->typelem; + ReleaseSysCache(tp); + return result; + } + else + return InvalidOid; + } + + /* + * get_arraytype + * + * Given an element type OID, return the OID the corresponding + * array type + */ + Oid + get_arraytype(Oid elem_typeid) + { + HeapTuple tp; + + tp = SearchSysCache(TYPEOID, + ObjectIdGetDatum(elem_typeid), + 0, 0, 0); + if (HeapTupleIsValid(tp)) + { + Form_pg_type typtup = (Form_pg_type) GETSTRUCT(tp); + char *elem_typename; + Oid elem_namespaceId; + char *array_typename; + + elem_typename = pstrdup(NameStr(typtup->typname)); + elem_namespaceId = typtup->typnamespace; + ReleaseSysCache(tp); + + array_typename = makeArrayTypeName(elem_typename); + + return GetSysCacheOid(TYPENAMENSP, + PointerGetDatum(array_typename), + ObjectIdGetDatum(elem_namespaceId), + 0, 0); + } + else + return InvalidOid; + } + + /* * get_typdefault * Given a type OID, return the type's default value, if any. * Index: src/include/catalog/pg_type.h =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/include/catalog/pg_type.h,v retrieving revision 1.139 diff -c -r1.139 pg_type.h *** src/include/catalog/pg_type.h 23 Jan 2003 23:39:06 -0000 1.139 --- src/include/catalog/pg_type.h 12 Mar 2003 16:17:39 -0000 *************** *** 523,528 **** --- 523,530 ---- #define ANYOID 2276 DATA(insert OID = 2277 ( anyarray PGNSP PGUID -1 f p t \054 0 0 anyarray_in anyarray_out i x f 0 -1 0 _null__null_ )); #define ANYARRAYOID 2277 + DATA(insert OID = 2283 ( anyelement PGNSP PGUID 4 t p t \054 0 0 any_in any_out i p f 0 -1 0 _null_ _null_ )); + #define ANYELEMENTOID 2283 DATA(insert OID = 2278 ( void PGNSP PGUID 4 t p t \054 0 0 void_in void_out i p f 0 -1 0 _null_ _null_ )); #define VOIDOID 2278 DATA(insert OID = 2279 ( trigger PGNSP PGUID 4 t p t \054 0 0 trigger_in trigger_out i p f 0 -1 0 _null_ _null_)); Index: src/include/utils/lsyscache.h =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/include/utils/lsyscache.h,v retrieving revision 1.67 diff -c -r1.67 lsyscache.h *** src/include/utils/lsyscache.h 3 Feb 2003 21:15:45 -0000 1.67 --- src/include/utils/lsyscache.h 12 Mar 2003 16:17:39 -0000 *************** *** 55,60 **** --- 55,62 ---- char *typalign); extern char get_typstorage(Oid typid); extern int32 get_typtypmod(Oid typid); + extern Oid get_typelem(Oid typid); + extern Oid get_arraytype(Oid elem_typeid); extern Node *get_typdefault(Oid typid); extern char get_typtype(Oid typid); extern Oid get_typ_typrelid(Oid typid); Index: src/test/regress/expected/type_sanity.out =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/test/regress/expected/type_sanity.out,v retrieving revision 1.17 diff -c -r1.17 type_sanity.out *** src/test/regress/expected/type_sanity.out 15 Jan 2003 19:35:48 -0000 1.17 --- src/test/regress/expected/type_sanity.out 12 Mar 2003 18:59:47 -0000 *************** *** 101,119 **** -----+---------+-----+--------- (0 rows) ! -- As of 7.3, this check finds SET and refcursor, which are borrowing ! -- other types' I/O routines SELECT p1.oid, p1.typname, p2.oid, p2.proname FROM pg_type AS p1, pg_proc AS p2 WHERE p1.typinput = p2.oid AND p1.typtype in ('b', 'p') AND NOT (p1.typelem != 0 AND p1.typlen < 0) AND NOT (p2.prorettype = p1.oid AND NOT p2.proretset) ORDER BY 1; ! oid | typname | oid | proname ! ------+-----------+-----+----------- ! 32 | SET | 109 | unknownin ! 1790 | refcursor | 46 | textin ! (2 rows) -- Varlena array types will point to array_in SELECT p1.oid, p1.typname, p2.oid, p2.proname --- 101,120 ---- -----+---------+-----+--------- (0 rows) ! -- As of 7.4, this check finds SET, refcursor, and anyelement, which are ! -- borrowing other types' I/O routines SELECT p1.oid, p1.typname, p2.oid, p2.proname FROM pg_type AS p1, pg_proc AS p2 WHERE p1.typinput = p2.oid AND p1.typtype in ('b', 'p') AND NOT (p1.typelem != 0 AND p1.typlen < 0) AND NOT (p2.prorettype = p1.oid AND NOT p2.proretset) ORDER BY 1; ! oid | typname | oid | proname ! ------+------------+------+----------- ! 32 | SET | 109 | unknownin ! 1790 | refcursor | 46 | textin ! 2283 | anyelement | 2294 | any_in ! (3 rows) -- Varlena array types will point to array_in SELECT p1.oid, p1.typname, p2.oid, p2.proname *************** *** 126,133 **** (0 rows) -- Check for bogus typoutput routines ! -- As of 7.3, this check finds SET and refcursor, which are borrowing ! -- other types' I/O routines SELECT p1.oid, p1.typname, p2.oid, p2.proname FROM pg_type AS p1, pg_proc AS p2 WHERE p1.typoutput = p2.oid AND p1.typtype in ('b', 'p') AND NOT --- 127,134 ---- (0 rows) -- Check for bogus typoutput routines ! -- As of 7.4, this check finds SET, refcursor, and anyelement, which are ! -- borrowing other types' I/O routines SELECT p1.oid, p1.typname, p2.oid, p2.proname FROM pg_type AS p1, pg_proc AS p2 WHERE p1.typoutput = p2.oid AND p1.typtype in ('b', 'p') AND NOT *************** *** 135,145 **** (p2.oid = 'array_out'::regproc AND p1.typelem != 0 AND p1.typlen = -1)) ORDER BY 1; ! oid | typname | oid | proname ! ------+-----------+-----+------------ ! 32 | SET | 110 | unknownout ! 1790 | refcursor | 47 | textout ! (2 rows) SELECT p1.oid, p1.typname, p2.oid, p2.proname FROM pg_type AS p1, pg_proc AS p2 --- 136,147 ---- (p2.oid = 'array_out'::regproc AND p1.typelem != 0 AND p1.typlen = -1)) ORDER BY 1; ! oid | typname | oid | proname ! ------+------------+------+------------ ! 32 | SET | 110 | unknownout ! 1790 | refcursor | 47 | textout ! 2283 | anyelement | 2295 | any_out ! (3 rows) SELECT p1.oid, p1.typname, p2.oid, p2.proname FROM pg_type AS p1, pg_proc AS p2 Index: src/test/regress/sql/type_sanity.sql =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/test/regress/sql/type_sanity.sql,v retrieving revision 1.17 diff -c -r1.17 type_sanity.sql *** src/test/regress/sql/type_sanity.sql 15 Jan 2003 19:35:48 -0000 1.17 --- src/test/regress/sql/type_sanity.sql 12 Mar 2003 18:59:42 -0000 *************** *** 84,91 **** p2.proargtypes[1] = 'oid'::regtype AND p2.proargtypes[2] = 'int4'::regtype)); ! -- As of 7.3, this check finds SET and refcursor, which are borrowing ! -- other types' I/O routines SELECT p1.oid, p1.typname, p2.oid, p2.proname FROM pg_type AS p1, pg_proc AS p2 WHERE p1.typinput = p2.oid AND p1.typtype in ('b', 'p') AND NOT --- 84,91 ---- p2.proargtypes[1] = 'oid'::regtype AND p2.proargtypes[2] = 'int4'::regtype)); ! -- As of 7.4, this check finds SET, refcursor, and anyelement, which are ! -- borrowing other types' I/O routines SELECT p1.oid, p1.typname, p2.oid, p2.proname FROM pg_type AS p1, pg_proc AS p2 WHERE p1.typinput = p2.oid AND p1.typtype in ('b', 'p') AND NOT *************** *** 102,109 **** -- Check for bogus typoutput routines ! -- As of 7.3, this check finds SET and refcursor, which are borrowing ! -- other types' I/O routines SELECT p1.oid, p1.typname, p2.oid, p2.proname FROM pg_type AS p1, pg_proc AS p2 WHERE p1.typoutput = p2.oid AND p1.typtype in ('b', 'p') AND NOT --- 102,109 ---- -- Check for bogus typoutput routines ! -- As of 7.4, this check finds SET, refcursor, and anyelement, which are ! -- borrowing other types' I/O routines SELECT p1.oid, p1.typname, p2.oid, p2.proname FROM pg_type AS p1, pg_proc AS p2 WHERE p1.typoutput = p2.oid AND p1.typtype in ('b', 'p') AND NOT
Joe Conway <mail@joeconway.com> writes:
> +  * If ANY, ANYARRAY, or ANYELEMENT is used for a function's arguments or
> +  * return type, make sure the runtime types are consistent with
> +  * each other. The argument consistency rules are like so:
> +  *
> +  * 1) All arguments declared ANY should have matching datatypes.
> +  * 2) All arguments declared ANYARRAY should have matching datatypes.
> +  * 3) All arguments declared ANYELEMENT should have matching datatypes.
> +  * 4) If there are arguments of both ANYELEMENT and ANYARRAY, make sure
> +  *    the runtime scalar argument type is in fact the element type for
> +  *    the runtime array argument type.
Hmm.  I don't see why we should drag ANY into this --- it should just be
a no-constraints placeholder, same as before.  What's the gain from
constraining it that you don't get from ANYELEMENT?
> +  * 5) If return type is ANYARRAY, no argument is ANYARRAY or ANYELEMENT,
> +  *    leave the return type as is.
> +  *    XXX should this case be rejected at the point of function creation?
Probably.  This case could be handled just as well by declaring the
output to be ANY, I'd think.
> +  * 8) If return type is ANYELEMENT, no argument is ANYARRAY or ANYELEMENT
> +  *    leave the return type as is.
> +  *    XXX should this case be rejected at the point of function creation?
Likewise.  The point of (this reinterpretation of) ANYARRAY and
ANYELEMENT is to let the parser deduce the actual output type.
If it's not going to be able to deduce anything, use ANY instead.
        regards, tom lane
			
		Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>>+  * If ANY, ANYARRAY, or ANYELEMENT is used for a function's arguments or
>>+  * return type, make sure the runtime types are consistent with
>>+  * each other. The argument consistency rules are like so:
> 
> Hmm.  I don't see why we should drag ANY into this --- it should just be
> a no-constraints placeholder, same as before.  What's the gain from
> constraining it that you don't get from ANYELEMENT?
I was thinking of the case  create function foo("any") returns "any"
but I guess you're right, it can just as easily be  create function foo(anyelement) returns anyelement
I'll pull the ANY stuff out.
>>+  * 5) If return type is ANYARRAY, no argument is ANYARRAY or ANYELEMENT,
>>+  *    leave the return type as is.
>>+  *    XXX should this case be rejected at the point of function creation?
> 
> Probably.  This case could be handled just as well by declaring the
> output to be ANY, I'd think.
> 
>>+  * 8) If return type is ANYELEMENT, no argument is ANYARRAY or ANYELEMENT
>>+  *    leave the return type as is.
>>+  *    XXX should this case be rejected at the point of function creation?
> 
> Likewise.  The point of (this reinterpretation of) ANYARRAY and
> ANYELEMENT is to let the parser deduce the actual output type.
> If it's not going to be able to deduce anything, use ANY instead.
OK -- I'll take care of that too.
Thanks,
Joe
			
		Joe Conway writes: > 2) Implement the following new builtin functions > array_push(anyarray, anyelement) returns anyarray > array_pop(anyarray) returns anyelement > array_subscript(anyarray, int) yields anyelement > singleton_array(anyelement) returns anyarray > - any reason not to call this one simply "array"? > split(text, text) returns text[] > - split string into array on delimiter > implode(text[], text) returns text > - join array elements into a string using given string delimiter I think this goes too far. It is just an invitation to people to create bad database designs by using arrays as lists. Create an array support package on gborg if you like, but I feel this should not be in the mainline. -- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut wrote:> Joe Conway writes:>>>2) Implement the following new builtin functions>> array_push(anyarray,anyelement) returns anyarray>> array_pop(anyarray) returns anyelement>> array_subscript(anyarray,int) yields anyelement>> singleton_array(anyelement) returns anyarray>> - any reasonnot to call this one simply "array"?>> split(text, text) returns text[]>> - split string into array ondelimiter>> implode(text[], text) returns text>> - join array elements into a string using given string delimiter > I think this goes too far. It is just an invitation to people to create > bad database designs by using arrays as lists. Create an array support > package on gborg if you like, but I feel this should not be in the > mainline. > Sorry, I don't agree with that assessment. There are lots of ways people can create bad database designs using the CREATE TABLE statement too ;-) Arrays *do* have a place, and they are supported in SQL99+. We get complaints almost daily regarding the poor array support -- this is just a step toward improving that. If you want to debate the merit of individual functions on that list, let's do that, but to dismiss them all with a wave-of-the-hand is too far. Joe
Joe Conway <mail@joeconway.com> writes:
> Peter Eisentraut wrote:
>> Create an array support package on gborg if you like, but I feel this
>> should not be in the mainline.
> Arrays *do* have a place, and they are supported in SQL99+.
FWIW, I'm with Joe on this one.  Arrays have their uses; and it's not
like there are no misusable constructs in SQL ...
        regards, tom lane
			
		OK, let's look at these more closely: > >> array_push(anyarray, anyelement) returns anyarray The standard spelling for that appears to be somearray || ARRAY[element] which also has the nice property that it is commutative. > >> array_pop(anyarray) returns anyelement That appears to mean that you return somearray[0] and alter the array as a side effect. How do you plan to do that? > >> array_subscript(anyarray, int) yields anyelement That's just somearray[x], no? > >> singleton_array(anyelement) returns anyarray That's ARRAY[element]. > >> split(text, text) returns text[] > >> - split string into array on delimiter > >> implode(text[], text) returns text > >> - join array elements into a string using given string delimiter I can live with these, but perhaps we should choose some not-so-generic names. -- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes:
> OK, let's look at these more closely:
>> array_push(anyarray, anyelement) returns anyarray
> The standard spelling for that appears to be
>     somearray || ARRAY[element]
> which also has the nice property that it is commutative.
Sure ... but that just means that || is the operator name for the
underlying array_push function.  We still need a way to declare this
operation as a function.
>> array_pop(anyarray) returns anyelement
> That appears to mean that you return somearray[0] and alter the array as a
> side effect.  How do you plan to do that?
Yeah, I wasn't thinking very clearly there...
>> array_subscript(anyarray, int) yields anyelement
> That's just somearray[x], no?
Yes.  But the fact that we can now represent the semantics of [] as a
function seems to me to indicate that we're on the right track in terms
of generalizing the capabilities of functions.
        regards, tom lane
			
		Peter Eisentraut wrote: >>>> array_pop(anyarray) returns anyelement > > That appears to mean that you return somearray[0] and alter the array > as a side effect. How do you plan to do that? I'll give you this one -- doesn't make sense. >>>> split(text, text) returns text[] - split string into array on >>>> delimiter There was a thread on this last September, and the consensus was that the function *should* be name split, in order to be consistent with the similar function existing in Perl and PHP (at least). >>>> implode(text[], text) returns text - join array elements into a >>>> string using given string delimiter I'm open to opinions on implode() -- I only picked implode() because that's what it is called in PHP. Any suggestions? Joe
On Thu, Mar 13, 2003 at 09:53:15AM -0800, Joe Conway wrote: > >>>>implode(text[], text) returns text - join array elements into a > >>>>string using given string delimiter > > I'm open to opinions on implode() -- I only picked implode() because > that's what it is called in PHP. Any suggestions? In both Perl and Python, that type of function is called "join". -johnnnnn
On Thu, Mar 13, 2003 at 12:00:46PM -0600, johnnnnnn wrote: > On Thu, Mar 13, 2003 at 09:53:15AM -0800, Joe Conway wrote: > > >>>>implode(text[], text) returns text - join array elements into a > > >>>>string using given string delimiter > > > > I'm open to opinions on implode() -- I only picked implode() because > > that's what it is called in PHP. Any suggestions? > > In both Perl and Python, that type of function is called "join". Yeah, and join is also consistent with the inverse function being called "split". IIRC the equivalent function in PHP is explode(). -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) Si no sabes adonde vas, es muy probable que acabes en otra parte.
johnnnnnn wrote: > On Thu, Mar 13, 2003 at 09:53:15AM -0800, Joe Conway wrote: >>>>>>implode(text[], text) returns text - join array elements into a >>>>>>string using given string delimiter > > In both Perl and Python, that type of function is called "join". Hmmm -- I doubt that would fly, although I see it is specifically allowed as a function name (func_name_keyword list). Anyone have opinions on this either way? Joe
Alvaro Herrera wrote: > Yeah, and join is also consistent with the inverse function being called > "split". IIRC the equivalent function in PHP is explode(). > Actually it looks like PHP supports both explode() and split(), and their inverse functions implode() and join(). split() appears to split the string by regular expression, whereas explode() splits by a simple string separator. I was not intending to support regex in split(), so maybe the best choice of names is explode() and implode()? Joe
On Thu, 2003-03-13 at 12:53, Joe Conway wrote: > Peter Eisentraut wrote: > >>>> array_pop(anyarray) returns anyelement > > > > That appears to mean that you return somearray[0] and alter the array > > as a side effect. How do you plan to do that? > > I'll give you this one -- doesn't make sense. > > >>>> split(text, text) returns text[] - split string into array on > >>>> delimiter > > There was a thread on this last September, and the consensus was that > the function *should* be name split, in order to be consistent with the > similar function existing in Perl and PHP (at least). > > >>>> implode(text[], text) returns text - join array elements into a > >>>> string using given string delimiter > > I'm open to opinions on implode() -- I only picked implode() because > that's what it is called in PHP. Any suggestions? I think implode() and explode() go together. split() and join() are a pair. Pick one ;) -- Rod Taylor <rbt@rbt.ca> PGP Key: http://www.rbt.ca/rbtpub.asc
Tom Lane kirjutas N, 13.03.2003 kell 19:12: > Peter Eisentraut <peter_e@gmx.net> writes: > > OK, let's look at these more closely: > > >> array_push(anyarray, anyelement) returns anyarray > > > The standard spelling for that appears to be > > somearray || ARRAY[element] > > which also has the nice property that it is commutative. > > Sure ... but that just means that || is the operator name for the > underlying array_push function. We still need a way to declare this > operation as a function. I think he mant that you just need to conacat for too arrays, no need for single-element push/append. OTOH a separate push may be more efficient contrib/intarray has the following functions (note that they use + for || above) OPERATIONS: int[] && int[] - overlap - returns TRUE if arrays has at least one common elements. int[] @ int[] -contains - returns TRUE if left array contains right array int[] ~ int[] - contained - returns TRUEif left array is contained in right array # int[] - return the number of elements in arrayint[] + int - push element to array ( add to end of array) int[] + int[] - merge of arrays (right array addedto the end of left one) int[] - int - remove entries matched by right argument from array int[]- int[] - remove right array from left int[] | int - returns intarray - union of arguments int[] | int[] - returnsintarray as a union of two arrays int[] & int[] - returns intersection of arrays int[] @@ query_int - returns TRUEif array satisfies query (like '1&(2|3)') query_int ~~ int[] - -/- ----------------- Hannu
Hannu Krosing <hannu@tm.ee> writes:
> Tom Lane kirjutas N, 13.03.2003 kell 19:12:
>>> The standard spelling for that appears to be
>>> somearray || ARRAY[element]
>>> which also has the nice property that it is commutative.
>> 
>> Sure ... but that just means that || is the operator name for the
>> underlying array_push function.  We still need a way to declare this
>> operation as a function.
> I think he mant that you just need to conacat for too arrays, no need
> for single-element push/append.
Oh, I see.  But my point remains: unless you want to take || out of the
domain of operators and make it something hard-wired into the parser,
there has to be an underlying function with a matching signature.  So
all these problems come up anyway.
> contrib/intarray has the following functions (note that they use + for
> || above)
The reason that stuff is still contrib, and not mainstream, is we didn't
have a way to make the functions polymorphic.  One-datatype-at-a-time
interface functions are not appealing, especially not when they have to
be hand-coded in C.  But with the features discussed in this thread, we
could make the intarray functionality datatype-independent --- whereupon
I for one would vote to move it into the mainstream.
        regards, tom lane
			
		Joe Conway <mail@joeconway.com> writes:
>> In both Perl and Python, that type of function is called "join".
> Hmmm -- I doubt that would fly, although I see it is specifically 
> allowed as a function name (func_name_keyword list). Anyone have 
> opinions on this either way?
Good point --- it would work today, but any small tweak in the JOIN
grammar might force us to reserve the keyword altogether.  It'd be
safer to use a name that is not an SQL keyword ...
        regards, tom lane
			
		Tom Lane wrote: > Joe Conway <mail@joeconway.com> writes: >>Hmmm -- I doubt that would fly, although I see it is specifically >>allowed as a function name (func_name_keyword list). Anyone have >>opinions on this either way? > > Good point --- it would work today, but any small tweak in the JOIN > grammar might force us to reserve the keyword altogether. It'd be > safer to use a name that is not an SQL keyword ... I'm leaning toward implode() and explode() now anyway because split() uses a regex for the delimiter in PHP (and probably Perl), and I was not planning to get that fancy. Joe
Joe Conway <mail@joeconway.com> writes: > I'm leaning toward implode() and explode() now anyway because split() uses a > regex for the delimiter in PHP (and probably Perl), and I was not planning to > get that fancy. PHP isn't exactly an exemplar for great language design. explode/implode are terribly non-self-descriptive names. Someone seeing them for the first time wouldn't really have any clue what they did and would have zero chance of guessing their names to find them in an index. I would suggest join_str() and split_str() if "join" is too sensitive a word for an sql language. -- greg
> >>>> implode(text[], text) returns text - join array elements into a > >>>> string using given string delimiter > > I'm open to opinions on implode() -- I only picked implode() because > that's what it is called in PHP. Any suggestions? It's also called 'join' in PHP... Chris
On 13 Mar 2003, Greg Stark wrote: > > Joe Conway <mail@joeconway.com> writes: > > > I'm leaning toward implode() and explode() now anyway because split() uses a > > regex for the delimiter in PHP (and probably Perl), and I was not planning to > > get that fancy. > > PHP isn't exactly an exemplar for great language design. > > explode/implode are terribly non-self-descriptive names. Someone seeing them > for the first time wouldn't really have any clue what they did and would have > zero chance of guessing their names to find them in an index. > > I would suggest join_str() and split_str() if "join" is too sensitive a word > for an sql language. I disagree on both your points. I'll not address the first one, but the use of explode / implode to me was obvious the first time I saw it years ago. It's used by many other languages than PHP, and is as obvious in meaning to me as join or split. In fact, to me, join and split are string functions, not array functions. explode and implode, otoh, are more obviously array functions. Here's the short list from 4 pages of google search for explode implode language -php: Lotusscript ML GIML Lisp vbscript Scheme ADA ICI SML REX You're quick to throw out a trollish barb against PHP without any real discussion as to why it's such a horrible language. I have to say the quality of code I've seen come out of the PHP development community is signifigantly better than what I've seen coming out of the Perl community lately, and often has much better use of OO methodologies and is generally more easily understood than the equivalent Perl code. If you don't like PHP, dont use it. If you don't like a proposal has, argue against that proposal. Please stop trolling against PHP. it's unsightly and unprofessional and doesn't belong in this list.
"scott.marlowe" <scott.marlowe@ihs.com> writes: > You're quick to throw out a trollish barb against PHP without any > real discussion as to why it's such a horrible language. No need to be so sensitive. It was just a humorous way of making the point that just because PHP does something doesn't mean it's necessarily the best idea. Suffice it to say I disagree about the quality of PHP language design but it wasn't the point of the comment nor is it on topic. Do you really think someone looking for a function to break up a string into a list of strings would ever think of looking up "explode" in an index if he hadn't already used PHP or (shudder) VBScript? Oops, there I go again. -- greg
On 14 Mar 2003, Greg Stark wrote: > > "scott.marlowe" <scott.marlowe@ihs.com> writes: > > > You're quick to throw out a trollish barb against PHP without any > > real discussion as to why it's such a horrible language. > > No need to be so sensitive. It was just a humorous way of making the point > that just because PHP does something doesn't mean it's necessarily the best > idea. Suffice it to say I disagree about the quality of PHP language design > but it wasn't the point of the comment nor is it on topic. > > Do you really think someone looking for a function to break up a string into a > list of strings would ever think of looking up "explode" in an index if he > hadn't already used PHP or (shudder) VBScript? > > Oops, there I go again. Let me answer your question with a question: Is anyone who is familiar with SQL syntax gonna look for "join" when thinking about arrays?
> Do you really think someone looking for a function to break up a > string into a list of strings would ever think of looking up "explode" > in an index if he hadn't already used PHP or (shudder) VBScript? It's also one of the classic examples of things used in introductory courses on Lisp that are absolutely counterproductive because no one would ever use (EXPLODE FOO) in /real/ code... -- If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me http://cbbrowne.com/info/advocacy.html Go, lemmings, go!
-*- Greg Stark <gsstark@mit.edu> [ 2003-03-14 17:43 ]: > Do you really think someone looking for a function to break up a string into a > list of strings would ever think of looking up "explode" in an index if he > hadn't already used PHP or (shudder) VBScript? If one had gotten used to Lotus Notes, sure. ;> -- Regards, Tolli tolli@tol.li
On Fri, 14 Mar 2003, Christopher Browne wrote: > > Do you really think someone looking for a function to break up a > > string into a list of strings would ever think of looking up "explode" > > in an index if he hadn't already used PHP or (shudder) VBScript? > > It's also one of the classic examples of things used in introductory > courses on Lisp that are absolutely counterproductive because no one > would ever use (EXPLODE FOO) in /real/ code... OK, that's enough. No one's been insulting your coding skills, your choice of language or how you use it. Please return the favor. That's all I'm asking. It may seem like a throw away line for you, but you are both directly insulting many people who have done you no harm. I vote for explode / implode as easier to type and remember than join_str. Also, in the SQL world, it's very likely that a back ground in ADA/LISP/REX/PHP is just as common as Perl or Java. If you've got a point to make, please try to do so in a manner that isn't a backhanded slap to others at the same time.
On Fri, 14 Mar 2003, Þórhallur Hálfdánarson wrote: > -*- Greg Stark <gsstark@mit.edu> [ 2003-03-14 17:43 ]: > > Do you really think someone looking for a function to break up a string into a > > list of strings would ever think of looking up "explode" in an index if he > > hadn't already used PHP or (shudder) VBScript? > > If one had gotten used to Lotus Notes, sure. ;> To try and get back on track... Let me ask you, if you were looking through a list of array functions and you saw explode and implode, and you had no other experience with a language that used those keywords, would you, upon seeing them, have some idea what they did? I certainly did the first time I saw them way back when on Rex. Since there was no one around to teach me the religious belief that explode and implode are anathema to "real" programmers, I started using them and was quite happy with their performance. (Sorry, that's a reference to a previouis post about "real" code. I'll stop the verbal sparring now :-) I would like to have a single word. Join and Split are a poor choice becuase join already has a very well defined meaning in SQL. any variation that keeps the word join in it is a poor choice because it can be confusing. What other languages have used before is of little consequence really, as long as Postgresql's choices are internally consistent, are a desriptive word and don't conflict with current key words. The fact that some languages that use explode / implode are not favored by certain folks means less than nothing to me. The names for functions should be something that is easily found in the docs both when you don't know what it is called and when you do. Explode may be a little harder to find when you don't know the name yet, but since you should be using the table of contents, not the index to look up a function whose name you don't know, then all that means is having a good table of contents and well layed out docs. If you do know the function name, you shouldn't have 200 choices in an index or a keyword search. Join, both as a keyword, and as a word, is very common in the docs. Searching for it, I currently find 412 entries for the word join in the 7.3.2 docs. Explode, on the other hand, returns 0 count. So, if someone goes to google and searches on site:techdocs.postgresql.org join, he will get hundreds of pages back. If he searches on explode, he would, in the future, only find those pages referencing the explode function. I'll repeat, the part about looking in an index for a keyword you don't know yet is a straw man, and a poor one at that. If you don't know the keyword for a cursor is "declare" why would you look in the index? You wouldn't, you'd look up cursors in the table of contents and find it that way. Or you'd search for it. If cursors were used for some other function in postgresql (widget counting et. al.) it would make it harder to find the entries you wanted for real cursors.
On Friday 14 March 2003 13:24, scott.marlowe wrote:
> I vote for explode / implode as easier to type and remember than join_str.
> Also, in the SQL world, it's very likely that a back ground in
> ADA/LISP/REX/PHP is just as common as Perl or Java.
If we're voting I vote for join/split.  Implode and explode just sound 
unprofessional to me.  Minor point but it's just preference, right?
-- 
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.
			
		Here's a list of synonyms for join (courtesy of m-w.com): associate, bracket, coadunate, coagment, coalesce, combine, compound, concrete, conjoin, conjugate, connect, couple, link, marry, one, relate, unite, wed, yoke and for split: carve, cleave, dissect, dissever, sever, slice, sunder, cleave, rend, rip, rive Personally I think having "marry" and "divorce" would be nice ;-) andrew (and when this debate is finished we can get back to emacs vs. vi)
On Fri, 14 Mar 2003, D'Arcy J.M. Cain wrote: > On Friday 14 March 2003 13:24, scott.marlowe wrote: > > I vote for explode / implode as easier to type and remember than join_str. > > Also, in the SQL world, it's very likely that a back ground in > > ADA/LISP/REX/PHP is just as common as Perl or Java. > > If we're voting I vote for join/split. Implode and explode just sound > unprofessional to me. Minor point but it's just preference, right? Sorry, but that's really not much of a reason. I listed several reasons why ANY OTHER CHOICE is better, not just explode / implode. I think those reasons make sense. You didn't address any of them in your choice, only the fact that you don't like explode/implode, so I'll assume you didn't have any real issues with the reasons I listed for not using join/split, just with the words explode/implode. If that's the case, the perhaps a compromise is in order, i.e. anything BUT join/split or explode/implode. But then the problem is that there are probably no other common words I'm familiar with that any other languages use to go from string to array and back that we are likely to glom onto.
"scott.marlowe" <scott.marlowe@ihs.com> writes: > On Fri, 14 Mar 2003, Þórhallur Hálfdánarson wrote: > > > -*- Greg Stark <gsstark@mit.edu> [ 2003-03-14 17:43 ]: > > > Do you really think someone looking for a function to break up a string into a > > > list of strings would ever think of looking up "explode" in an index if he > > > hadn't already used PHP or (shudder) VBScript? > > > > If one had gotten used to Lotus Notes, sure. ;> > > To try and get back on track... > > Let me ask you, if you were looking through a list of array functions > and you saw explode and implode, and you had no other experience with a > language that used those keywords, would you, upon seeing them, have some > idea what they did? It's all good Scott. Anyone wanting to use PostgreSQL arrays would undoubtedly open up the corresponding part of the manual that covers array functions. Since there is likely to be less than a page full of function definitions you could probably call the functions foo() and bar() and get away with it (please don't). While I personally think that join_str and split_str are somewhat more descriptive, implode and explode are fine. More importantly, since *you* are the one doing the actual legwork it's your call. IMHO that's one of the benefits of actually submitting code. You write the code, you get to pick the function names. Now, you might have some issues from the rest of the PostgreSQL hackers if you named the functions "marlowe-ify" and "un-marlowe-ify", but anything not completely ridiculous should be fine (and even marlowe-ify would have the advantage of not being a reserved word in any software I can think of off hand). As for the rest of the discussion, poking fun at development languages and tools is an age-old part of computers. PHP has the disadvantage of being both very popular, very new, and primarily a web technology (and of not being Lisp like :) so it draws more than its share of flames. It's all good fun. Jason
On 14 Mar 2003, Jason Earl wrote: > It's all good Scott. Anyone wanting to use PostgreSQL arrays would > undoubtedly open up the corresponding part of the manual that covers > array functions. Since there is likely to be less than a page full of > function definitions you could probably call the functions foo() and > bar() and get away with it (please don't). While I personally think > that join_str and split_str are somewhat more descriptive, implode and > explode are fine. > > More importantly, since *you* are the one doing the actual legwork > it's your call. IMHO that's one of the benefits of actually > submitting code. You write the code, you get to pick the function > names. Now, you might have some issues from the rest of the > PostgreSQL hackers if you named the functions "marlowe-ify" and > "un-marlowe-ify", but anything not completely ridiculous should be > fine (and even marlowe-ify would have the advantage of not being a > reserved word in any software I can think of off hand). > > As for the rest of the discussion, poking fun at development languages > and tools is an age-old part of computers. PHP has the disadvantage > of being both very popular, very new, and primarily a web technology > (and of not being Lisp like :) so it draws more than its share of > flames. It's all good fun. Actually, I think it was someone else (Joe???) that is doing the leg work, and he was the one choosing explode / implode and getting gruff for it, so I was just stepping in and defending his decision. I do think using a function name with the word join in it meaning anything other than a SQL join is a recipe for confusion though.
"scott.marlowe" <scott.marlowe@ihs.com> writes: > On 14 Mar 2003, Jason Earl wrote: > > > It's all good Scott. Anyone wanting to use PostgreSQL arrays > > would undoubtedly open up the corresponding part of the manual > > that covers array functions. Since there is likely to be less > > than a page full of function definitions you could probably call > > the functions foo() and bar() and get away with it (please don't). > > While I personally think that join_str and split_str are somewhat > > more descriptive, implode and explode are fine. > > > > More importantly, since *you* are the one doing the actual legwork > > it's your call. IMHO that's one of the benefits of actually > > submitting code. You write the code, you get to pick the function > > names. Now, you might have some issues from the rest of the > > PostgreSQL hackers if you named the functions "marlowe-ify" and > > "un-marlowe-ify", but anything not completely ridiculous should be > > fine (and even marlowe-ify would have the advantage of not being a > > reserved word in any software I can think of off hand). > > > > As for the rest of the discussion, poking fun at development > > languages and tools is an age-old part of computers. PHP has the > > disadvantage of being both very popular, very new, and primarily a > > web technology (and of not being Lisp like :) so it draws more > > than its share of flames. It's all good fun. > > Actually, I think it was someone else (Joe???) that is doing the leg > work, and he was the one choosing explode / implode and getting > gruff for it, so I was just stepping in and defending his decision. Oops, my bad. My brain must already think that it is the weekend. My reasoning still stands, though. Whoever writes the code gets to pick the names (assuming, of course, that they can get them past the rest of the PostgreSQL hackers). There's parts of PostgreSQL so cool that I would continue to use them even if the function were called jason_earl_is_a_stupid_head(). Heck, the reason that I don't like terms like explode and implode probably stems from the fact that I tend to have error functions with those sorts of dramatic names :). You know "lp0 is on fire" type of stuff. > I do think using a function name with the word join in it meaning > anything other than a SQL join is a recipe for confusion though. Perhaps. We'll have to see what gets submitted. Jason
Jason Earl wrote: >>Actually, I think it was someone else (Joe???) that is doing the leg >>work, and he was the one choosing explode / implode and getting >>gruff for it, so I was just stepping in and defending his decision. > > Oops, my bad. My brain must already think that it is the weekend. My > reasoning still stands, though. Whoever writes the code gets to pick > the names (assuming, of course, that they can get them past the rest > of the PostgreSQL hackers). <dons flame proof suit> Yup, that was me. I was watching from the sidelines ;-) I'll get on with coding and try to consider all of the input when it comes to picking the names. In the end, it will depend on whatever the guys with commit access will live with, so I'm not going to worry about it too much. </dons flame proof suit> Joe
Joe Conway wrote: > Jason Earl wrote: > >>Actually, I think it was someone else (Joe???) that is doing the leg > >>work, and he was the one choosing explode / implode and getting > >>gruff for it, so I was just stepping in and defending his decision. > > > >Oops, my bad. My brain must already think that it is the weekend. My > >reasoning still stands, though. Whoever writes the code gets to pick > >the names (assuming, of course, that they can get them past the rest > >of the PostgreSQL hackers). > > <dons flame proof suit> > Yup, that was me. I was watching from the sidelines ;-) > > I'll get on with coding and try to consider all of the input when it > comes to picking the names. In the end, it will depend on whatever the > guys with commit access will live with, so I'm not going to worry about > it too much. > </dons flame proof suit> My 2 cents: Use "split" and "merge". Avoids the "join" issue and avoids the "implode/explode" issue too. :-) -- Kevin Brown kevin@sysexperts.com
On Sun, 23 Mar 2003, Kevin Brown wrote: > Joe Conway wrote: > > Jason Earl wrote: > > >>Actually, I think it was someone else (Joe???) that is doing the leg > > >>work, and he was the one choosing explode / implode and getting > > >>gruff for it, so I was just stepping in and defending his decision. > > > > > >Oops, my bad. My brain must already think that it is the weekend. My > > >reasoning still stands, though. Whoever writes the code gets to pick > > >the names (assuming, of course, that they can get them past the rest > > >of the PostgreSQL hackers). > > > > <dons flame proof suit> > > Yup, that was me. I was watching from the sidelines ;-) > > > > I'll get on with coding and try to consider all of the input when it > > comes to picking the names. In the end, it will depend on whatever the > > guys with commit access will live with, so I'm not going to worry about > > it too much. > > </dons flame proof suit> > > My 2 cents: > > Use "split" and "merge". Avoids the "join" issue and avoids the > "implode/explode" issue too. :-) Isn't merge a new SQL keyword in SQL99 or SQL03?
scott.marlowe wrote: > On Sun, 23 Mar 2003, Kevin Brown wrote: >>Use "split" and "merge". Avoids the "join" issue and avoids the >>"implode/explode" issue too. :-) > > Isn't merge a new SQL keyword in SQL99 or SQL03? Yup, in SQL200x at least: 14.9 <merge statement> Function Conditionally update rows of a table, or insert new rows into a table, or both. Format <merge statement> ::= MERGE INTO <target table> [ [ AS ] <merge correlation name> ] USING <table reference> ON <search condition> <merge operation specification> At the risk of stirring up a debate again, my latest thoughts were to define the two functions as: str_to_array(str TEXT, delim TEXT) returns TEXT[] array_to_str(array ANYARRAY, delim TEXT) returns TEXT Joe