Обсуждение: What's wrong with this group by clause?

Поиск
Список
Период
Сортировка

What's wrong with this group by clause?

От
Franco Bruno Borghesi
Дата:
Hi everyone.

Below you can find a simplified example of a real case.
I don't understand why I'm getting the "john" record twice.
I know there's no point in using constants in the group by, but makes sense in
the real much bigger query.
What I see is that when I issue an EXPLAIN with this query, it tells me that
the SORT KEY is "0".
When I remove one of the constant fields and try again the EXPLAIN, it says
that the SORT KEY is "0, name" (and the group by works, it shows only one
record for each person in the table).

Thanks anyone.

/*EXAMPLE*/
CREATE TABLE people
(  name TEXT
);
INSERT INTO people VALUES ('john');
INSERT INTO people VALUES ('john');
INSERT INTO people VALUES ('pete');
INSERT INTO people VALUES ('pete');
INSERT INTO people VALUES ('ernest');
INSERT INTO people VALUES ('john');
SELECT  0 AS field1,  0 AS field2,   name
FROM  people
GROUP BY  field1,  field2,  name;
field1 | field2 |  name
--------+--------+--------     0 |      0 | john     0 |      0 | pete     0 |      0 | ernest     0 |      0 | john
(4 rows)

Re: What's wrong with this group by clause?

От
Christoph Haller
Дата:
> Hi everyone.=20
>
> Below you can find a simplified example of a real case.=20
> I don't understand why I'm getting the "john" record twice.=20
> I know there's no point in using constants in the group by, but makes
sense=
>  in=20
> the real much bigger query.=20
> What I see is that when I issue an EXPLAIN with this query, it tells
me tha=
> t=20
> the SORT KEY is "0".=20
> When I remove one of the constant fields and try again the EXPLAIN, it
says=
> =20
> that the SORT KEY is "0, name" (and the group by works, it shows only
one=
> =20
> record for each person in the table).
>
> Thanks anyone.
>
> /*EXAMPLE*/
> CREATE TABLE people
> (
>    name TEXT
> );
> INSERT INTO people VALUES ('john');
> INSERT INTO people VALUES ('john');
> INSERT INTO people VALUES ('pete');
> INSERT INTO people VALUES ('pete');
> INSERT INTO people VALUES ('ernest');
> INSERT INTO people VALUES ('john');
> =20=20=20
> SELECT
>    0 AS field1,
>    0 AS field2,=20
>    name
> FROM
>    people
> GROUP BY
>    field1,
>    field2,
>    name;
>
I did your example and did not get the "john" record twice.
Instead I got what you expected.field1 | field2 |  name
--------+--------+--------     0 |      0 | ernest     0 |      0 | john     0 |      0 | pete
(3 rows)
What version are you using?

Regards, Christoph




Re: What's wrong with this group by clause?

От
Manfred Koizar
Дата:
[forwarding to -hackers]

On Tue, 4 Nov 2003 18:28:12 -0300, Franco Bruno Borghesi
<franco@akyasociados.com.ar> wrote:
>Below you can find a simplified example of a real case. 
>I don't understand why I'm getting the "john" record twice. 

ISTM you have found a Postgres 7.3 bug.

I get one john withPostgreSQL 7.1.3 on i686-pc-cygwin, compiled by GCC 2.95.3-5
andPostgreSQL 7.2 on i686-pc-linux-gnu, compiled by GCC 2.7.2.1

but two johns withPostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC 2.7.2.1

>/*EXAMPLE*/
>CREATE TABLE people
>(
>   name TEXT
>);
>INSERT INTO people VALUES ('john');
>INSERT INTO people VALUES ('john');
>INSERT INTO people VALUES ('pete');
>INSERT INTO people VALUES ('pete');
>INSERT INTO people VALUES ('ernest');
>INSERT INTO people VALUES ('john');
>   
>SELECT
>   0 AS field1,
>   0 AS field2, 
>   name
>FROM
>   people
>GROUP BY
>   field1,
>   field2,
>   name;
>
> field1 | field2 |  name
>--------+--------+--------
>      0 |      0 | john
>      0 |      0 | pete
>      0 |      0 | ernest
>      0 |      0 | john
>(4 rows)

Same forSELECT 0 AS field1, 0 AS field2, name  FROM people GROUP BY 1, 2, name;

ServusManfred


Re: [HACKERS] What's wrong with this group by clause?

От
Tom Lane
Дата:
Manfred Koizar <mkoi-pg@aon.at> writes:
> ISTM you have found a Postgres 7.3 bug.

Yeah.  Actually, the planner bug has been there a long time, but it was
only latent until the parser stopped suppressing duplicate GROUP BY
items:

2002-08-18 14:46  tgl
* src/backend/parser/parse_clause.c: Remove optimization wherebyparser would make only one sort-list entry when two
equal()targetlistitems were to be added to an ORDER BY or DISTINCT list. Although indeed this would make sorting
fractionallyfaster bysometimes saving a comparison, it confuses the heck out of laterstages of processing, because it
makesit look like the user wroteDISTINCT ON rather than DISTINCT.  Bug reported byjoe@piscitella.com.
 

7.3 patch is attached if you need it.
        regards, tom lane


*** src/backend/optimizer/plan/planner.c.orig    Wed Mar  5 13:38:26 2003
--- src/backend/optimizer/plan/planner.c    Thu Mar 13 11:21:16 2003
***************
*** 1498,1510 ****          * are just dummies with no extra execution cost.)          */         List
*sort_tlist= new_unsorted_tlist(subplan->targetlist);         int            keyno = 0;         List       *gl;
foreach(gl, groupClause)         {             GroupClause *grpcl = (GroupClause *) lfirst(gl);
 
!             TargetEntry *te = nth(grpColIdx[keyno] - 1, sort_tlist);             Resdom       *resdom = te->resdom;
          /*
 
--- 1498,1511 ----          * are just dummies with no extra execution cost.)          */         List
*sort_tlist= new_unsorted_tlist(subplan->targetlist);
 
+         int            grpno = 0;         int            keyno = 0;         List       *gl;          foreach(gl,
groupClause)        {             GroupClause *grpcl = (GroupClause *) lfirst(gl);
 
!             TargetEntry *te = nth(grpColIdx[grpno] - 1, sort_tlist);             Resdom       *resdom = te->resdom;
          /*
 
***************
*** 1518,1523 ****
--- 1519,1525 ----                 resdom->reskey = ++keyno;                 resdom->reskeyop = grpcl->sortop;
  }
 
+             grpno++;         }          Assert(keyno > 0);


Re: What's wrong with this group by clause?

От
"Len Morgan"
Дата:
From: Franco Bruno Borghesi <franco@akyasociados.com.ar>
>SELECT>  0 AS field1, > 0 AS field2,  >name
>FROM>  people
>GROUP BY>  field1, > field2, >name;

I think the problem is that you don't have a column to group on.  Try adding
SELECT ....,count(*).... so that there is an aggregate of some kind.
Alternatively, you could use DISTINCT ON (field1,field2) field1,field2,name
FROM ...  although this is a Postgres specific extension of the SQL spec.

Len Morgan




Re: What's wrong with this group by clause?

От
Manfred Koizar
Дата:
On Thu, 13 Mar 2003 01:34:34 -0600, "Len Morgan"
<len-morgan@crcom.net> wrote:
>>GROUP BY
> >  field1,
>  > field2,
>  >name;
>I think the problem is that you don't have a column to group on.

field1, field2, and name are the grouping columns.

>Try adding
>SELECT ....,count(*).... so that there is an aggregate of some kind.

You don't need an aggregate in a GROUP BY query.  A SELECT ... GROUP
BY without any aggregate behaves like SELECT DISTINCT.  There's
nothing wrong with it.  Performance might be a different story.

BTW, Franco's problem has been recognised as a bug and a patch has
already been published (cf. Tom Lane's mail in this thread).

ServusManfred


Re: [HACKERS] What's wrong with this group by clause?

От
Franco Bruno Borghesi
Дата:
Thanks Tom, I applied the patch and it works perfect now.
Thanks to you all.

On Thursday 13 March 2003 14:02, Tom Lane wrote:
> Manfred Koizar <mkoi-pg@aon.at> writes:
> > ISTM you have found a Postgres 7.3 bug.
>
> Yeah.  Actually, the planner bug has been there a long time, but it was
> only latent until the parser stopped suppressing duplicate GROUP BY
> items:
>
> 2002-08-18 14:46  tgl
>
>     * src/backend/parser/parse_clause.c: Remove optimization whereby
>     parser would make only one sort-list entry when two equal()
>     targetlist items were to be added to an ORDER BY or DISTINCT list.
>     Although indeed this would make sorting fractionally faster by
>     sometimes saving a comparison, it confuses the heck out of later
>     stages of processing, because it makes it look like the user wrote
>     DISTINCT ON rather than DISTINCT.  Bug reported by
>     joe@piscitella.com.
>
> 7.3 patch is attached if you need it.
>
>             regards, tom lane
>
>
> *** src/backend/optimizer/plan/planner.c.orig    Wed Mar  5 13:38:26 2003
> --- src/backend/optimizer/plan/planner.c    Thu Mar 13 11:21:16 2003
> ***************
> *** 1498,1510 ****
>            * are just dummies with no extra execution cost.)
>            */
>           List       *sort_tlist = new_unsorted_tlist(subplan->targetlist);
>           int            keyno = 0;
>           List       *gl;
>
>           foreach(gl, groupClause)
>           {
>               GroupClause *grpcl = (GroupClause *) lfirst(gl);
> !             TargetEntry *te = nth(grpColIdx[keyno] - 1, sort_tlist);
>               Resdom       *resdom = te->resdom;
>
>               /*
> --- 1498,1511 ----
>            * are just dummies with no extra execution cost.)
>            */
>           List       *sort_tlist = new_unsorted_tlist(subplan->targetlist);
> +         int            grpno = 0;
>           int            keyno = 0;
>           List       *gl;
>
>           foreach(gl, groupClause)
>           {
>               GroupClause *grpcl = (GroupClause *) lfirst(gl);
> !             TargetEntry *te = nth(grpColIdx[grpno] - 1, sort_tlist);
>               Resdom       *resdom = te->resdom;
>
>               /*
> ***************
> *** 1518,1523 ****
> --- 1519,1525 ----
>                   resdom->reskey = ++keyno;
>                   resdom->reskeyop = grpcl->sortop;
>               }
> +             grpno++;
>           }
>
>           Assert(keyno > 0);