Обсуждение: [PATCH] GROUP BY ALL

Поиск
Список
Период
Сортировка

[PATCH] GROUP BY ALL

От
David Christensen
Дата:
I see that there'd been some chatter but not a lot of discussion about
a GROUP BY ALL feature/functionality.  There certainly is utility in
such a construct IMHO.

The grammar is unambiguous, so can support this construct in lieu of
the traditional GROUP BY clause.  Enclosed is a patch which adds this
via just scanning the TargetEntry list and adding anything that is not
an aggregate function call to the groupList.

Still need some docs; just throwing this out there and getting some feedback.

Thanks,

David

Вложения

Re: [PATCH] GROUP BY ALL

От
"David G. Johnston"
Дата:
On Mon, Jul 22, 2024 at 1:55 PM David Christensen <david@pgguru.net> wrote:
I see that there'd been some chatter but not a lot of discussion about
a GROUP BY ALL feature/functionality.  There certainly is utility in
such a construct IMHO.

Still need some docs; just throwing this out there and getting some feedback.


I strongly dislike adding this feature.  I'd only consider supporting it if it was part of the SQL standard.

Code is written once and read many times.  This feature caters to the writer, not the reader.  And furthermore usage of this is prone to be to the writer's detriment as well.

David J.

Re: [PATCH] GROUP BY ALL

От
Isaac Morland
Дата:
On Mon, 22 Jul 2024 at 17:34, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Mon, Jul 22, 2024 at 1:55 PM David Christensen <david@pgguru.net> wrote:
I see that there'd been some chatter but not a lot of discussion about
a GROUP BY ALL feature/functionality.  There certainly is utility in
such a construct IMHO.

Still need some docs; just throwing this out there and getting some feedback.


I strongly dislike adding this feature.  I'd only consider supporting it if it was part of the SQL standard.

Code is written once and read many times.  This feature caters to the writer, not the reader.  And furthermore usage of this is prone to be to the writer's detriment as well.

And for when this might be useful, the syntax for it already exists, although a spurious error message is generated:

odyssey=> select (uw_term).*, count(*) from uw_term group by uw_term;
ERROR:  column "uw_term.term_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select (uw_term).*, count(*) from uw_term group by uw_term;
                ^

I'm not sure exactly what's going on here — it's like it's still seeing the table name in the field list as only a table name and not the value corresponding to the whole table as a row value (But in general I'm not happy with the system's ability to figure out that a column's value has only one possibility given the grouping columns). You can work around:

odyssey=> with t as (select uw_term, count(*) from uw_term group by uw_term) select (uw_term).*, count from t;

This query works.

Re: [PATCH] GROUP BY ALL

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Mon, Jul 22, 2024 at 1:55 PM David Christensen <david@pgguru.net> wrote:
>> I see that there'd been some chatter but not a lot of discussion about
>> a GROUP BY ALL feature/functionality.  There certainly is utility in
>> such a construct IMHO.

> I strongly dislike adding this feature.  I'd only consider supporting it if
> it was part of the SQL standard.

Yeah ... my recollection is that we already rejected this idea.
If you want to re-litigate that, "throwing this out there" is
not a sufficient argument.

(Personally, I'd wonder exactly what ALL is quantified over: the
whole output of the FROM clause, or only columns mentioned in the
SELECT tlist, or what? And why that choice rather than another?)

            regards, tom lane



Re: [PATCH] GROUP BY ALL

От
Tom Lane
Дата:
Isaac Morland <isaac.morland@gmail.com> writes:
> And for when this might be useful, the syntax for it already exists,
> although a spurious error message is generated:

> odyssey=> select (uw_term).*, count(*) from uw_term group by uw_term;
> ERROR:  column "uw_term.term_id" must appear in the GROUP BY clause or be
> used in an aggregate function
> LINE 1: select (uw_term).*, count(*) from uw_term group by uw_term;
>                 ^

> I'm not sure exactly what's going on here

The SELECT entry is expanded into "uw_term.col1, uw_term.col2,
uw_term.col3, ...", and those single-column Vars don't match the
whole-row Var appearing in the GROUP BY list.  I guess if we
think this is important, we could add a proof rule saying that
a per-column Var is functionally dependent on a whole-row Var
of the same relation.  Odd that the point hasn't come up before
(though I guess that suggests that few people try this).

            regards, tom lane



Re: [PATCH] GROUP BY ALL

От
David Christensen
Дата:
On Mon, Jul 22, 2024 at 4:34 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Mon, Jul 22, 2024 at 1:55 PM David Christensen <david@pgguru.net> wrote:
>>
>> I see that there'd been some chatter but not a lot of discussion about
>> a GROUP BY ALL feature/functionality.  There certainly is utility in
>> such a construct IMHO.
>>
>> Still need some docs; just throwing this out there and getting some feedback.
>>
>
> I strongly dislike adding this feature.  I'd only consider supporting it if it was part of the SQL standard.
>
> Code is written once and read many times.  This feature caters to the writer, not the reader.  And furthermore usage
ofthis is prone to be to the writer's detriment as well. 

I'd say this feature (at least for me) caters to the investigator;
someone who is interactively looking at data hence why it would cater
to the writer.  Consider acquainting yourself with a large table that
has a large number of annoying-named fields where you want to look at
how different data is correlated or broken-down.  Something along the
lines of:

SELECT last_name, substring(first_name,1,1) as first_initial,
income_range, count(*) FROM census_data GROUP BY ALL;

If you are iteratively looking at things, adding or removing fields
from your breakdown, you only need to change it in a single place, the
tlist.  Additionally, expressions can be used transparently without
needing to repeat them.  (Yes, in practice, I'd often use GROUP BY 1,
2, say, but if you add more fields to this you need to edit in
multiple places.)

David



Re: [PATCH] GROUP BY ALL

От
David Christensen
Дата:
On Mon, Jul 22, 2024 at 5:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> "David G. Johnston" <david.g.johnston@gmail.com> writes:
> > On Mon, Jul 22, 2024 at 1:55 PM David Christensen <david@pgguru.net> wrote:
> >> I see that there'd been some chatter but not a lot of discussion about
> >> a GROUP BY ALL feature/functionality.  There certainly is utility in
> >> such a construct IMHO.
>
> > I strongly dislike adding this feature.  I'd only consider supporting it if
> > it was part of the SQL standard.
>
> Yeah ... my recollection is that we already rejected this idea.
> If you want to re-litigate that, "throwing this out there" is
> not a sufficient argument.

Heh, fair enough.  I actually wrote the patch after encountering the
syntax in DuckDB and it seemed easy enough to add to Postgres while
providing some utility, then ended up seeing a thread about it later.
I did not get the sense that this had been globally rejected; though
there were definitely voices against it, it seemed to trail off rather
than coming to a conclusion.

> (Personally, I'd wonder exactly what ALL is quantified over: the
> whole output of the FROM clause, or only columns mentioned in the
> SELECT tlist, or what? And why that choice rather than another?)

My intention here was to basically be a shorthand for "group by
specified non-aggregate fields in the select list".  Perhaps I'm not
being creative enough, but what is the interpretation/use case for
anything else? :-)

While there are other ways to accomplish these things, making an easy
way to GROUP BY with aggregate queries would be useful in the field,
particularly when doing iterative discovery work would save a lot of
time with a situation that is both detectable and hits users with
errors all the time.

I'm not married to the exact syntax of this feature; anything else
short and consistent could work if `ALL` is considered to potentially
gain a different interpretation in the future.

David



Re: [PATCH] GROUP BY ALL

От
David Christensen
Дата:
On Tue, Jul 23, 2024 at 8:21 AM Andrei Borodin <x4mmm@yandex-team.ru> wrote:
>
> On 23 Jul 2024, at 00:40, Isaac Morland <isaac.morland@gmail.com> wrote:
>
> odyssey=> select (uw_term).*, count(*) from uw_term group by uw_term;
> ERROR:  column "uw_term.term_id" must appear in the GROUP BY clause or be used in an aggregate function
> LINE 1: select (uw_term).*, count(*) from uw_term group by uw_term;
>
>
> AFAIR this problem was solved in my implementation [0]
>
> On 23 Jul 2024, at 01:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> (Personally, I'd wonder exactly what ALL is quantified over: the
> whole output of the FROM clause, or only columns mentioned in the
> SELECT tlist, or what? And why that choice rather than another?)
>
>
> I'd like to have GROUP BY AUTO (I also proposed version GROUP BY SURPRISE ME). But I wouldn't like to open pandora
boxof syntax sugar extensions which may will be incompatible with future standards. 
> If we could have extensible grammar - I'd be happy to have a lot of such enhancements. My top 2 are FROM table SELECT
columnand better GROUP BY. 

GROUP BY AUTO also seems fine here to me; I understand the desire to
avoid major incompatible syntax changes; GROUP BY ALL does exist in
multiple products so it's not unprecedented.

I wrote my patch before seeing your thread, sorry I missed that. :-)

David



Re: [PATCH] GROUP BY ALL

От
David Christensen
Дата:
On Mon, Jul 22, 2024 at 4:41 PM Isaac Morland <isaac.morland@gmail.com> wrote:

> And for when this might be useful, the syntax for it already exists, although a spurious error message is generated:
>
> odyssey=> select (uw_term).*, count(*) from uw_term group by uw_term;
> ERROR:  column "uw_term.term_id" must appear in the GROUP BY clause or be used in an aggregate function
> LINE 1: select (uw_term).*, count(*) from uw_term group by uw_term;
>                 ^

This is with my patch, or existing postgres?  Grouping by record is
not actually what this patch is trying to do, so perhaps there is some
ambiguity; this is intended to GROUP BY any select item that is not an
aggregate function.



Re: [PATCH] GROUP BY ALL

От
Laurenz Albe
Дата:
On Tue, 2024-07-23 at 08:37 -0500, David Christensen wrote:
> My intention here was to basically be a shorthand for "group by
> specified non-aggregate fields in the select list".  Perhaps I'm not
> being creative enough, but what is the interpretation/use case for
> anything else? :-)

I am somewhat against this feature.
It is too much magic for my taste.

It might be handy for interactive use, but I would frown at an application
that uses code like that, much like I'd frown at "SELECT *" in application code.

Yours,
Laurenz Albe



Re: [PATCH] GROUP BY ALL

От
David Christensen
Дата:
On Tue, Jul 23, 2024 at 10:57 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Tue, 2024-07-23 at 08:37 -0500, David Christensen wrote:
> > My intention here was to basically be a shorthand for "group by
> > specified non-aggregate fields in the select list".  Perhaps I'm not
> > being creative enough, but what is the interpretation/use case for
> > anything else? :-)
>
> I am somewhat against this feature.
> It is too much magic for my taste.
>
> It might be handy for interactive use, but I would frown at an application
> that uses code like that, much like I'd frown at "SELECT *" in application code.

Sure, not everything that makes things easier is strictly necessary;
we could require `CAST(field AS text)` instead of `::text`, make
subqueries required for transforming oids into specific system tables
instead of `::regfoo` casts,  any number of other choices, remove
`SELECT *` as a parse option, but making it easier to do common things
interactively as a DBA has value as well.

David



Re: [PATCH] GROUP BY ALL

От
"David G. Johnston"
Дата:
On Tue, Jul 23, 2024 at 9:48 AM David Christensen <david@pgguru.net> wrote:

Sure, not everything that makes things easier is strictly necessary;
we could require `CAST(field AS text)` instead of `::text`,

Probably should have...being standard and all.  Though syntactic sugar is quite different from new behavior - transforming :: to CAST is straight-forward.

make
subqueries required for transforming oids into specific system tables
instead of `::regfoo` casts,

Since OID is non-standard this falls within our purview.

  any number of other choices, remove
`SELECT *` as a parse option,

Again, standard dictated.

but making it easier to do common things
interactively as a DBA has value as well.


Agreed, but this isn't a clear-cut win, and doesn't have standard conformance to tip the scale over fully.

Also, there are so many better tools for data exploration.  Removing this quirk only marginally closes that gap.

David J.

Re: [PATCH] GROUP BY ALL

От
Paul Jungwirth
Дата:
On 7/22/24 15:43, Tom Lane wrote:
> Isaac Morland <isaac.morland@gmail.com> writes:
>> And for when this might be useful, the syntax for it already exists,
>> although a spurious error message is generated:
> 
>> odyssey=> select (uw_term).*, count(*) from uw_term group by uw_term;
>> ERROR:  column "uw_term.term_id" must appear in the GROUP BY clause or be
>> used in an aggregate function
>> LINE 1: select (uw_term).*, count(*) from uw_term group by uw_term;
>>                  ^
> 
>> I'm not sure exactly what's going on here
> 
> The SELECT entry is expanded into "uw_term.col1, uw_term.col2,
> uw_term.col3, ...", and those single-column Vars don't match the
> whole-row Var appearing in the GROUP BY list.  I guess if we
> think this is important, we could add a proof rule saying that
> a per-column Var is functionally dependent on a whole-row Var
> of the same relation.  Odd that the point hasn't come up before
> (though I guess that suggests that few people try this).

I was just using this group-by-row feature last week to implement a temporal outer join in a way 
that would work for arbitrary tables. Here is some example SQL:

https://github.com/pjungwir/temporal_ops/blob/b10d65323749faa6c47956db2e8f95441e508fce/sql/outer_join.sql#L48-L66

That does `GROUP BY a` then `SELECT (x.a).*`.[1]

It is very useful for writing queries that don't want to know about the structure of the row.

I noticed the same error as Isaac. I worked around the problem by wrapping it in a subquery and 
decomposing the row outside. It's already an obscure feature, and an easy workaround might be why 
you haven't heard complaints before. I wouldn't mind writing a patch for that rule when I get a 
chance (if no one else gets to it first.)

[1] Actually I see it does `GROUP BY a, a.valid_at`, but that is surely more than I need. I think 
that `a.valid_at` is leftover from a previous version of the query.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com



Re: [PATCH] GROUP BY ALL

От
Peter Eisentraut
Дата:
On 23.07.24 00:29, Tom Lane wrote:
> (Personally, I'd wonder exactly what ALL is quantified over: the
> whole output of the FROM clause, or only columns mentioned in the
> SELECT tlist, or what? And why that choice rather than another?)

Looks like the main existing implementations take it to mean all entries 
in the SELECT list that are not aggregate functions.

https://duckdb.org/docs/sql/query_syntax/groupby.html#group-by-all
https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-qry-select-groupby.html#parameters
https://docs.snowflake.com/en/sql-reference/constructs/group-by#parameters




Re: [PATCH] GROUP BY ALL

От
Jelte Fennema-Nio
Дата:
On Mon, 22 Jul 2024 at 22:55, David Christensen <david@pgguru.net> wrote:
> I see that there'd been some chatter but not a lot of discussion about
> a GROUP BY ALL feature/functionality.  There certainly is utility in
> such a construct IMHO.

+1 from me. When exploring data, this is extremely useful because you
don't have to update the GROUP BY clause every time

Regarding the arguments against this:
1. I don't think this is any more unreadable than being able to GROUP
BY 1, 2, 3. Or being able to use column aliases from the SELECT in the
GROUP BY clause. Again this is already allowed. Personally I actually
think it's more readable than specifying e.g. 5 columns in the group
by, because then I have to cross-reference with columns in the SELECT
clause to find out if they are the same. With ALL I instantly know
it's grouped by all
2. This is indeed not part of the standard. But we have many things
that are not part of the standard. I think as long as we use the same
syntax as snowflake, databricks and duckdb I personally don't see a
big problem. Then we could try and make this be part of the standard
in the next version of the standard.



Re: [PATCH] GROUP BY ALL

От
Jelte Fennema-Nio
Дата:
On Tue, 23 Jul 2024 at 15:22, Andrei Borodin <x4mmm@yandex-team.ru> wrote:
> I'd like to have GROUP BY AUTO (I also proposed version GROUP BY SURPRISE ME). But I wouldn't like to open pandora
boxof syntax sugar extensions which may will be incompatible with future standards.
 
> If we could have extensible grammar - I'd be happy to have a lot of such enhancements. My top 2 are FROM table SELECT
columnand better GROUP BY.
 

Personally my number one enhancement would be allowing a trailing
comma after the last column in the SELECT clause.



Re: [PATCH] GROUP BY ALL

От
Pavel Stehule
Дата:
Hi

st 24. 7. 2024 v 10:57 odesílatel Jelte Fennema-Nio <postgres@jeltef.nl> napsal:
On Mon, 22 Jul 2024 at 22:55, David Christensen <david@pgguru.net> wrote:
> I see that there'd been some chatter but not a lot of discussion about
> a GROUP BY ALL feature/functionality.  There certainly is utility in
> such a construct IMHO.

+1 from me. When exploring data, this is extremely useful because you
don't have to update the GROUP BY clause every time

Regarding the arguments against this:
1. I don't think this is any more unreadable than being able to GROUP
BY 1, 2, 3. Or being able to use column aliases from the SELECT in the
GROUP BY clause. Again this is already allowed. Personally I actually
think it's more readable than specifying e.g. 5 columns in the group
by, because then I have to cross-reference with columns in the SELECT
clause to find out if they are the same. With ALL I instantly know
it's grouped by all
2. This is indeed not part of the standard. But we have many things
that are not part of the standard. I think as long as we use the same
syntax as snowflake, databricks and duckdb I personally don't see a
big problem. Then we could try and make this be part of the standard
in the next version of the standard.

 Aggregation against more columns is pretty slow and memory expensive in Postgres.

DuckDB is an analytic database with different storage, different executors. I like it very much, but I am not sure if I want to see these features in Postgres.

Lot of developers are not very smart, and with proposed feature, then instead to write correct and effective query, then they use just GROUP BY ALL. Slow query should look like a slow query :-)

Regards

Pavel


Re: [PATCH] GROUP BY ALL

От
"Andrey M. Borodin"
Дата:

> On 24 Jul 2024, at 13:58, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> On Tue, 23 Jul 2024 at 15:22, Andrei Borodin <x4mmm@yandex-team.ru> wrote:
>> I'd like to have GROUP BY AUTO (I also proposed version GROUP BY SURPRISE ME). But I wouldn't like to open pandora
boxof syntax sugar extensions which may will be incompatible with future standards. 
>> If we could have extensible grammar - I'd be happy to have a lot of such enhancements. My top 2 are FROM table
SELECTcolumn and better GROUP BY. 
>
> Personally my number one enhancement would be allowing a trailing
> comma after the last column in the SELECT clause.

Yes, trailing comma sounds great too.

One more similar syntax sugar I can think of. I see lots of queries like
SELECT somtheing
FROM table1
WHERE 1=1
and id = x
--and col1 = val1
and col2 = val2

I was wondering where does that "1=1" comes from. It's because developer comment condition one by one like "--, col1 =
val1".And they do not want to cope with and\or continuation. 


Best regards, Andrey Borodin.

PS. Seems like Mail.App mangled my previous message despite using plain text. It's totally broken in archives...


Re: [PATCH] GROUP BY ALL

От
Ashutosh Bapat
Дата:
On Tue, Jul 23, 2024 at 6:53 PM David Christensen <david@pgguru.net> wrote:
>
> On Mon, Jul 22, 2024 at 4:34 PM David G. Johnston
> <david.g.johnston@gmail.com> wrote:
> >
> > On Mon, Jul 22, 2024 at 1:55 PM David Christensen <david@pgguru.net> wrote:
> >>
> >> I see that there'd been some chatter but not a lot of discussion about
> >> a GROUP BY ALL feature/functionality.  There certainly is utility in
> >> such a construct IMHO.
> >>
> >> Still need some docs; just throwing this out there and getting some feedback.
> >>
> >
> > I strongly dislike adding this feature.  I'd only consider supporting it if it was part of the SQL standard.
> >
> > Code is written once and read many times.  This feature caters to the writer, not the reader.  And furthermore
usageof this is prone to be to the writer's detriment as well. 
>
> I'd say this feature (at least for me) caters to the investigator;
> someone who is interactively looking at data hence why it would cater
> to the writer.  Consider acquainting yourself with a large table that
> has a large number of annoying-named fields where you want to look at
> how different data is correlated or broken-down.  Something along the
> lines of:

To me this looks like a feature that a data exploration tool may
implement instead of being part of the server. It would then provide
more statistics about each correlation/column set etc.

--
Best Wishes,
Ashutosh Bapat