Обсуждение: GROUP BY checks inadequate when set returning functions in column list

Поиск
Список
Период
Сортировка

GROUP BY checks inadequate when set returning functions in column list

От
Chris Travers
Дата:
Hi all;

In some of my tests regarding set-returning functions I came across
some very strange behavior.  Group by seems to have very strange (and
inconsistent) behavior when connected to the use of a set-returning
function in a column list.

Consider the following function which returns a list of rows from a table:

mtech_test=# select * from account_heading__list();
  id   | accno | parent_id |            description
-------+-------+-----------+-----------------------------------
 10001 | 1000  |           | CURRENT ASSETS
 10006 | 1500  |           | INVENTORY ASSETS
 10010 | 1800  |           | CAPITAL ASSETS
 10015 | 2000  |           | CURRENT LIABILITIES
 10027 | 2600  |           | LONG TERM LIABILITIES
 10451 | 2700  |           | Expense Accounts for Individuals
 10225 | 3000  |           | CAPITAL
 10030 | 3300  |           | SHARE CAPITAL
 10032 | 4000  |           | SALES REVENUE
 10036 | 4300  |           | CONSULTING REVENUE
 10039 | 4400  |           | OTHER REVENUE
 10043 | 5000  |           | COST OF GOODS SOLD
 10049 | 5400  |           | PAYROLL EXPENSES
 10055 | 5600  |           | GENERAL & ADMINISTRATIVE EXPENSES
(14 rows)

(Source code for function will be included below but I dont think this
is a function issue).

The above results are expected. Similarly if I run it in the column
list, I get tuple representations of the same data:

mtech_test=# select account_heading__list();
               account_heading__list
---------------------------------------------------
 (10001,1000,,"CURRENT ASSETS")
 (10006,1500,,"INVENTORY ASSETS")
 (10010,1800,,"CAPITAL ASSETS")
 (10015,2000,,"CURRENT LIABILITIES")
 (10027,2600,,"LONG TERM LIABILITIES")
 (10451,2700,,"Expense Accounts for Individuals")
 (10225,3000,,CAPITAL)
 (10030,3300,,"SHARE CAPITAL")
 (10032,4000,,"SALES REVENUE")
 (10036,4300,,"CONSULTING REVENUE")
 (10039,4400,,"OTHER REVENUE")
 (10043,5000,,"COST OF GOODS SOLD")
 (10049,5400,,"PAYROLL EXPENSES")
 (10055,5600,,"GENERAL & ADMINISTRATIVE EXPENSES")
(14 rows)

It's when we add group by that things appear broken.  Note it starts
returning 196 (14 x 14) records, which suggests a cross join against
itself.

mtech_test=# explain analyze select (account_heading__list()).* group by accno
mtech_test-# ;


                                         QUERY PLAN

---------------------------------------------------------------------------------
------------
 HashAggregate  (cost=0.26..1.27 rows=1 width=0) (actual
time=0.456..1.986 rows=1
96 loops=1)
   ->  Result  (cost=0.00..0.26 rows=1 width=0) (actual
time=0.170..0.194 rows=14
 loops=1)
 Total runtime: 2.076 ms
(3 rows)

My guess from looking at this deeper is that this is likely just
behavior that is prevented by group by column checks absent set
returning functions.  The behavior goes away when the return columns
are brought back in line with the group by:

mtech_test=# select count(*) from (select
(account_heading__list()).accno group by accno) c;
 count
-------
    14
(1 row)

Is this something we should be checking for and throwing exceptions based on?

mtech_test=# select version()
mtech_test-# ;
                                                  version

---------------------------------------------------------------------------------
--------------------------
 PostgreSQL 9.1.4 on i386-redhat-linux-gnu, compiled by gcc (GCC)
4.7.0 20120507
(Red Hat 4.7.0-5), 32-bit
(1 row)


mtech_test=# \df+ account_heading__list

                                    List of functions
 Schema |         Name          |   Result data type    | Argument
data types |
Type  | Volatility |  Owner   | Language |                   Source
code
          |                                  Description

--------+-----------------------+-----------------------+---------------------+--
------+------------+----------+----------+---------------------------------------
----------+----------------------------------------------------------------------
---------
 public | account_heading__list | SETOF account_heading |
       | n
ormal | stable     | postgres | sql      |  SELECT * FROM
account_heading order b
y accno;  |  Returns a list of all account headings, currently ordered
by account
 number.+
        |                       |                       |
       |
      |            |          |          |
          |
(1 row)

Best Wishes,
Chris Travers

Re: GROUP BY checks inadequate when set returning functions in column list

От
Tom Lane
Дата:
Chris Travers <chris@metatrontech.com> writes:
> It's when we add group by that things appear broken.  Note it starts
> returning 196 (14 x 14) records, which suggests a cross join against
> itself.

> mtech_test=# explain analyze select (account_heading__list()).* group by accno
> mtech_test-# ;

Hm, that really ought to throw an error, since you have ungrouped
columns in the result.  Not sure why it doesn't.

Beyond that, though, using a SRF in the target list this way is a bad
idea because the semantics are very ill-defined.  Stick to using it
in FROM.  (The upcoming LATERAL feature will remove the functional
limitations associated with that, so we're very unlikely to do anything
towards changing the existing behavior of SRFs-in-target-lists, no
matter how wacko they might appear.)

            regards, tom lane

Re: GROUP BY checks inadequate when set returning functions in column list

От
Chris Travers
Дата:
On Wed, Aug 22, 2012 at 8:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Chris Travers <chris@metatrontech.com> writes:
>> It's when we add group by that things appear broken.  Note it starts
>> returning 196 (14 x 14) records, which suggests a cross join against
>> itself.
>
>> mtech_test=# explain analyze select (account_heading__list()).* group by accno
>> mtech_test-# ;
>
> Hm, that really ought to throw an error, since you have ungrouped
> columns in the result.  Not sure why it doesn't.

Yeah, that was my point.  I don't know if it is worth fixing but
always better to report.
>
> Beyond that, though, using a SRF in the target list this way is a bad
> idea because the semantics are very ill-defined.

Yeah.  It looks like when you pair it with a general set in a from
clause you get an implicit cross join.

I have been avoiding it because I don't like implicit cross joins.

I will be following the LATERAL feature with interest.   Thanks for that.

Best Wishes,
Chris Travers

Re: GROUP BY checks inadequate when set returning functions in column list

От
Tom Lane
Дата:
Chris Travers <chris@metatrontech.com> writes:
> On Wed, Aug 22, 2012 at 8:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Chris Travers <chris@metatrontech.com> writes:
> mtech_test=# explain analyze select (account_heading__list()).* group by accno

>> Hm, that really ought to throw an error, since you have ungrouped
>> columns in the result.  Not sure why it doesn't.

> Yeah, that was my point.  I don't know if it is worth fixing but
> always better to report.

I looked into this, and found that the query basically expands to

SELECT (func()).field1, (func()).field2, (func()).field3, ... GROUP BY 2;

There are no Vars in this query; only FuncExprs and FieldSelects.
So that's why the check for ungrouped variables isn't complaining.
To make it complain, we'd have to have a notion that the different
occurrences of the function are generating related columns of a row,
which is something that quite disappears in the *-expansion.

As I said before, this isn't really an area that anybody is excited
about changing --- it's all legacy behavior, and if we change it
we're more likely to get complaints from people whose code broke
than compliments from people who can now use it for something.
Eventually I think SRFs in the targetlist will be deprecated in
favor of LATERAL constructs, though it will be a very long time
before we could consider removing the feature altogether.

            regards, tom lane