Обсуждение: SQL syntax extentions - to put postgres ahead in the race

Поиск
Список
Период
Сортировка

SQL syntax extentions - to put postgres ahead in the race

От
"Ram Nathaniel"
Дата:
Hi,
 
I am a developer working with many databases, as a part of my job. I use heavy SQL queries and have become somewhat of an expert in SQL, including tricks and workarounds of the limitation of the language.
 
I feel like a lot of the workarounds could be avoided with adding a few new operators to the SQL language that should be rather easy to support but would give a vast improvement and probably a leap of performance in many common queries. I write you about this hoping that you would support these operators (even though they are not in the ANSI) and thereby position PostGres as a leader and not just a follower! I personaly have a great interest in seeing open source software making it big time! So please - forward this to one of your more technical developers/executives and hopefully I will get to see it in the next version of PostGres:
 
1) The operator  "of max":
suppose I have a table "grades" of 3 fields: class/student/grade where I store many grades of many students of many classes. I want to get the name of the highest scoring student in each class. Note that there may be many students with the same grade, but for starters let's say there is a primary key of class+grade.
 
My query would now be:
select student from grades where class+'#'+grade in
(
   select class+'#'+max(grade) from grades group by class
) a
 
This means working the query twice - and relying on louzy conversion of the grade from numerical to textual.
We could also use:
 
select student from grades where student in
(
  select student from grades group by class
  having class+'#'+grade = max(class+'#'+grade)
) a
 
This is even worse!
 
The optimal would be to introduce a new operator "of max" that would be used as follows:
 
select student of max(grade) from grades group by class
 
simillarly one should support "of min" and "of any" (brings a representative of the group)
 
2) aggregated concatenation:
Traditionally the SQL language has referred from supporting order dependent operators from taking a role in aggregated functions. This means that since the query: "select class, grade from grades" does not ensure the order of the records returned, the operation sum() is supported (i.e. select class, sum(grade) from grades group by class) but other operations that would be order dependent are not supported.
I think this approach should be revised. In many cases one would want to get a list of the student names delimited with a comma. It would be great if one could write:
select class, list(student, ',') from grades group by class
and get
 
class  list
-----  ----
class1 john, ruth,...
.
.
.
 
This is of course an order dependent operation so the syntax can either be:
select class, list(student, ',') from grades group by class order by student, grade
in which case the list would be ordered before the list is created, or if no particular order is requested the concatenation of the names should be in an arbitrary order.
 
Well - that's all for now :-)
 
Good luck!
 
Ram

Re: SQL syntax extentions - to put postgres ahead in the race

От
Stephan Szabo
Дата:
On Thu, 5 Aug 2004, Ram Nathaniel wrote:

>
> 1) The operator  "of max":
> suppose I have a table "grades" of 3 fields: class/student/grade where I
> store many grades of many students of many classes. I want to get the
> name of the highest scoring student in each class. Note that there may
> be many students with the same grade, but for starters let's say there
> is a primary key of class+grade.
>
> My query would now be:
> select student from grades where class+'#'+grade in
> (
>    select class+'#'+max(grade) from grades group by class
> ) a

As a side note, I'd think that something like:select student from grades where (class,grade) in (select class,
max(grade)from grades group by class);
 
should avoid textual operations.  I'm assuming the + above are meant to be
concatenation (||).

> The optimal would be to introduce a new operator "of max" that would be used as follows:
>
> select student of max(grade) from grades group by class

PostgreSQL provides an extension called DISTINCT ON.

Something likeselect distinct on (class) student from grades order by class, gradedesc;
should get you one arbitrary student with the highest grade in his or her
class.

If you want to order by the grades, I think you need a layer around it.
If you don't care about the class order, you might consider making the
class ordering desc as well to make it easier to use a multi-column index
on (class,grade).

> 2) aggregated concatenation:

Theoretically, you should be able to do this right now in PostgreSQL with
user defined aggregates (although you can't pass a second argument
currently for the separator).  I believe that an ordered subquery in FROM
will currently allow you to get an ordered aggregate, or perhaps you'd
have to turn off hash aggregation, but I think you should be able to get
it to keep the ordering.


Re: SQL syntax extentions - to put postgres ahead in the race

От
Tom Lane
Дата:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Thu, 5 Aug 2004, Ram Nathaniel wrote:
>> 2) aggregated concatenation:

> Theoretically, you should be able to do this right now in PostgreSQL with
> user defined aggregates (although you can't pass a second argument
> currently for the separator).

There's nothing particularly stopping us from supporting
multiple-argument aggregates, except a lack of round tuits.
(I suppose we'd want to rethink the syntax of CREATE AGGREGATE,
but otherwise it ought to be pretty straightforward.)

> I believe that an ordered subquery in FROM
> will currently allow you to get an ordered aggregate, or perhaps you'd
> have to turn off hash aggregation, but I think you should be able to get
> it to keep the ordering.

I think you would want to ORDER BY twice:
   SELECT class, list(student) from(select class, student from grades order by class, student) ss   order by class;

It looks like (at least in CVS tip) planner.c will take into account the
relative costs of doing a GroupAgg vs doing a HashAgg and re-sorting,
but I'm too tired to try it right now...
        regards, tom lane


Re: SQL syntax extentions - to put postgres ahead in the race

От
Tom Lane
Дата:
I wrote:
> There's nothing particularly stopping us from supporting
> multiple-argument aggregates, except a lack of round tuits.

BTW, you can actually fake this pretty well in 8.0, by making an
aggregate that uses a rowtype input.  For example:

regression=# create type twostrings as (s1 text, s2 text);
CREATE TYPE
regression=# create function list_concat(text, twostrings) returns text as $$
regression$# select case when $1 is null then $2.s1
regression$# when $2.s1 is null then $1
regression$# else $1 || $2.s2 || $2.s1
regression$# end$$ language sql;
CREATE FUNCTION
regression=# create aggregate concat (
regression(# basetype = twostrings,
regression(# stype = text,
regression(# sfunc = list_concat);
CREATE AGGREGATE
regression=# select * from text_tbl;       f1
-------------------doh!hi de ho neighbormore stuffand more
(4 rows)

regression=# select concat((f1, '|')) from text_tbl;                  concat
--------------------------------------------doh!|hi de ho neighbor|more stuff|and more
(1 row)


This is somewhat inefficient compared to native support for
multi-argument aggregates, but at least we have something we can point
people to until we find time to make that happen.
        regards, tom lane