Обсуждение: array functions - request for opinions (was Re: [PATCHES] array support patch phase 1 patch)
The discussion below has been taking place on the PATCHES list. I'm
forwarding it to HACKERS to get a wider audience. We're looking for
opinions on (at a minimum):
1) Which of these functions should exist *and* be included in user documentation
2) Which of these functions are not worth having in the backend at all
3) Specifically WRT array_accum(), is there merit in having a userland function that aggregates row values into
arrays,for use in custom aggregates.
Note that all of these functions have already been implemented and are
either in CVS (all except str_to_array and array_to_str) or in a
submitted patch. Some are required to implement the SQL 99 standard
syntax, but some are not. Specifically, array_subscript, array_assign,
and singleton_array are not needed to implement other functionality and
would be the primary candidates for removal.
Thanks,
Joe
-------- Original Message --------
Peter Eisentraut wrote:
> Joe Conway writes:
>>I personally don't understand why we should hide them from users. If I
>>prefer to use array_append(var1, var2) rather than (var1 || var2),
>>what's the problem? Its a matter of taste as to which is better.
>
> The problem is that this approach leads to bloat without bound. Maybe
> tomorrow someone prefers append_array(var1, var2) or var1 + var2. The
> standard says it's var1 || var2, there is no technical or substantial
> aesthetical argument against it, so that's what we should use.
I can see your point. But is there any circumstance where the function
will work and the standard syntax won't? Until recently, it was not
possible to assign individual array elements in PL/pgSQL (I think Tom
fixed this). Are there any more examples?
Let's go down the list:
array_accum(anyarray, anyelement) -- no standard syntax; more on this later
array_append(anyarray, anyelement) -- can be easily done with "||" operator; is there any case where "||" won't
workthat array_append() will? If not, don't document as a user function.
array_assign(anyarray, integer, anyelement) -- can be easily done with "var[n] = X" syntax; is there any case
where"var[n] = X" won't work that array_assign() will? If not, don't document as a user function.
array_cat(anyarray, anyarray) -- see array_append
array_dims(anyarray) -- no standard syntax; should be documented.
array_lower(anyarray, integer) -- no standard syntax; should be documented.
array_prepend(anyelement, anyarray) -- see array_append
array_subscript(anyarray, integer) -- can be easily done with "var[n]" syntax; is there any case where "var[n]"
won'twork that array_subscript() will? If not, don't document as a user function.
array_to_str(anyarray, text) -- no standard syntax; should be documented.
array_upper(anyarray, integer) -- no standard syntax; should be documented.
singleton_array(anyelement) -- can be easily done with "array[x]" or "'{x}'" syntax; is there any case where one
ofthese won't work that singleton_array() will? If not, don't document as a user function.
str_to_array(text, text) -- no standard syntax; should be documented.
BTW, should this discussion be on HACKERS or even GENERAL in order to
get a wider audience of opinion?
>>And in any case, array_accum() is intended to be used for building
>>custom aggregates, so that needs to be documented.
> Can you give an example of an aggregate or a class of aggregates where
> this would be useful?
>
There are many discussions on the lists over the past few years in which
people have requested this kind of functionality. There is even a
contrib/intagg that does this for integers only. I don't think there is
any question that there is a demand for the feature. Some examples:
http://fts.postgresql.org/db/msg.html?mid=1021530
http://fts.postgresql.org/db/msg.html?mid=1096592
http://fts.postgresql.org/db/msg.html?mid=1031700
http://fts.postgresql.org/db/msg.html?mid=1353047
http://fts.postgresql.org/db/msg.html?mid=1063738
http://fts.postgresql.org/db/msg.html?mid=1050837
http://fts.postgresql.org/db/msg.html?mid=1066349
Some people would like to simply aggregate rows of data into arrays for
analysis, some want to write custom final functions to post-process the
resulting array (e.g. median). With array_accum() and some of the other
new functions (array_lower & array_upper specifically come to mind),
people have a relatively easy way to create there own custom aggregates
using PL/pgSQL.
With PL/R, I can use array_accum to create an aggregate for just about
any summary statistic that I'm interested in. And while, yes, there is
already an implementation of array_accum in PL/R, it isn't reasonable to
require anyone who wants to use it to install libR also. And, yes, there
are undoubtedly other ways to create those aggregates, but none as
simple to use.
Again, I'd suggest that if we want to debate this much further, we
should move the discussion to the GENERAL and SQL lists so that users
can have a chance to chime in.
Joe
Joe Conway writes: > >>And in any case, array_accum() is intended to be used for building > >>custom aggregates, so that needs to be documented. > > Can you give an example of an aggregate or a class of aggregates where > > this would be useful? > There are many discussions on the lists over the past few years in which > people have requested this kind of functionality. There is even a > contrib/intagg that does this for integers only. I don't think there is > any question that there is a demand for the feature. Some examples: These applications should use an empty array as initial value. Then the normal array concatenation can be used as transition function. -- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut wrote:
> Joe Conway writes:
>>There are many discussions on the lists over the past few years in which
>>people have requested this kind of functionality. There is even a
>>contrib/intagg that does this for integers only. I don't think there is
>>any question that there is a demand for the feature. Some examples:
>
> These applications should use an empty array as initial value. Then the
> normal array concatenation can be used as transition function.
How can you create an aggregate using an operator as a transition function?
=# CREATE AGGREGATE myagg
-# (
(# BASETYPE = text,
(# SFUNC = '||',
(# STYPE = text,
(# INITCOND = ''
(# );
ERROR: AggregateCreate: function ||(text, text) does not exist
Also (I suppose you could argue this should be fixed, but anyway...),
you can't currently add elements to an empty array.
regression=# create table t(f float8[]);
CREATE TABLE
regression=# insert into t values('{}');
INSERT 1865486 1
regression=# update t set f[1] = 1;
ERROR: Invalid array subscripts
regression=# select array_dims(f) from t; array_dims
------------
(1 row)
Joe
Joe Conway wrote: > How can you create an aggregate using an operator as a transition > function? > > =# CREATE AGGREGATE myagg > -# ( > (# BASETYPE = text, > (# SFUNC = '||', > (# STYPE = text, > (# INITCOND = '' > (# ); > ERROR: AggregateCreate: function ||(text, text) does not exist Use the function underlying the operator, in this case textcat. Regards, Andreas
Andreas Pflug wrote: > Joe Conway wrote: > >> How can you create an aggregate using an operator as a transition >> function? >> >> =# CREATE AGGREGATE myagg >> -# ( >> (# BASETYPE = text, >> (# SFUNC = '||', >> (# STYPE = text, >> (# INITCOND = '' >> (# ); >> ERROR: AggregateCreate: function ||(text, text) does not exist > > > Use the function underlying the operator, in this case textcat. > Right, but Peter's desire is to *not document* the underlying function (this was just an example, we're really discussing the new array functions array_accum and/or array_append), and my point is that it needs to be documented if we expect people to use it. Joe
Joe Conway <mail@joeconway.com> writes:
> How can you create an aggregate using an operator as a transition function?
You'd have to reference the underlying function. I think what Peter is
really asking is whether the implementation function for || will serve
for this purpose, rather than making an additional one.
> Also (I suppose you could argue this should be fixed, but anyway...),
> you can't currently add elements to an empty array.
That particular manifestation is not relevant; I think you mean this
one:
regression=# select '{}'::int4[] || 2;
ERROR: Arrays greater than one-dimension are not supported
In both cases I think it's mostly a matter of defining a defensible
behavior, which is probably not that hard, but no one's yet gotten
annoyed enough to try to fix it.
regards, tom lane
Tom Lane wrote:
> You'd have to reference the underlying function.
That's what I thought, but I thought maybe I was missing something.
> I think what Peter is really asking is whether the implementation
> function for || will serve for this purpose, rather than making an
> additional one.
But, as I said in my other post a moment ago, that then implies (at
least to me) that the underlying function needs to be documented as an
"end user" function, rather than an "internal" one.
> That particular manifestation is not relevant; I think you mean this
> one:
>
> regression=# select '{}'::int4[] || 2; ERROR: Arrays greater than
> one-dimension are not supported
I just wanted to show that the issue pre-existed the new functions. I
think they are both caused by the fact that an empty array has *no*
dimensions (clearly the error message in the new array functions could
be better).
> In both cases I think it's mostly a matter of defining a defensible
> behavior, which is probably not that hard, but no one's yet gotten
> annoyed enough to try to fix it.
It is probably easy enough to work around in the array concatenation
functions. Would it be defensible to say that ('{}'::int4[] || 2) should
produce ('{2}'::int4[]), i.e. a one-dimensional integer array with one
element?
Joe
Joe Conway <mail@joeconway.com> writes:
> Tom Lane wrote:
>> In both cases I think it's mostly a matter of defining a defensible
>> behavior, which is probably not that hard, but no one's yet gotten
>> annoyed enough to try to fix it.
> It is probably easy enough to work around in the array concatenation
> functions. Would it be defensible to say that ('{}'::int4[] || 2) should
> produce ('{2}'::int4[]), i.e. a one-dimensional integer array with one
> element?
You would also have to assume that the subscript lower bound is one,
which doesn't bother me but is an additional bit of state that has to
appear out of nowhere. (In the assignment case you don't have to assume
that, since the subscript tells you what to do.)
regards, tom lane
Tom Lane wrote:
> You would also have to assume that the subscript lower bound is one,
> which doesn't bother me but is an additional bit of state that has to
> appear out of nowhere. (In the assignment case you don't have to assume
> that, since the subscript tells you what to do.)
I've gotten this working for array concatenation and assignment.
Examples:
-- empty array concatenated with any element, return one element,
-- one-dimensional array, with lower bound set to 1
regression=# select '{}'::int4[] || 1; ?column?
---------- {1}
(1 row)
regression=# select 0 || '{}'::int4[]; ?column?
---------- {0}
(1 row)
regression=# select array_dims(0 || '{}'::int4[]); array_dims
------------ [1:1]
(1 row)
-- empty array concatenated with any non-empty, return the non-empty one
regression=# select '{}'::int4[] || array[[[1,2],[3,4]]]; ?column?
----------------- {{{1,2},{3,4}}}
(1 row)
-- concatenate two empty arrays, return empty array
regression=# select '{}'::int4[] || '{}'::int4[]; ?column?
---------- {}
(1 row)
-- assignment to empty array: determine number
-- of dimensions and array subscripts based on those
-- given in the assignment statement
regression=# create table t(f float8[]);
CREATE TABLE
regression=# insert into t values('{}');
INSERT 2011035 1
regression=# update t set f[-2:2] = array[1,2,3,4,5];
UPDATE 1
regression=# select * from t; f
------------- {1,2,3,4,5}
(1 row)
regression=# select array_dims(f) from t; array_dims
------------ [-2:2]
(1 row)
One question, should this work to create an empty array:
regression=# select array[];
ERROR: parser: parse error at or near "]" at character 14
or by analogy to '{}'::int4[]
regression=# select array[]::int4[];
ERROR: parser: parse error at or near "]" at character 14
Or is the current '{}'::int4[] syntax all we want/need?
Joe
Joe Conway <mail@joeconway.com> writes:
> One question, should this work to create an empty array:
> regression=# select array[];
> ERROR: parser: parse error at or near "]" at character 14
Only if you can specify what type it has. This seems to get back
to our off-list discussion about what to do with array of unknown.
regards, tom lane
Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>>regression=# select array[];
>>ERROR: parser: parse error at or near "]" at character 14
>
> Only if you can specify what type it has. This seems to get back
> to our off-list discussion about what to do with array of unknown.
>
Yeah, without an unknown[] data type, I think a special case hack would
be needed even for the array[]::int4[] example, because the coercion to
int4[] happens too late.
I guess '{}'::arraytype[] will have to do for now.
Joe