Обсуждение: BUG #13863: Select from views gives wrong results

Поиск
Список
Период
Сортировка

BUG #13863: Select from views gives wrong results

От
wrb@autistici.org
Дата:
The following bug has been logged on the website:

Bug reference:      13863
Logged by:          W
Email address:      wrb@autistici.org
PostgreSQL version: 9.5.0
Operating system:   Mac OS X 10.11.2 (postgres is from homebrew)
Description:

Hello,

I noticed strange behavior while using views with the new grouping sets
functionality.

Steps to reproduce are here:
https://gist.github.com/wrb/a73f45dcd335c8bc166c

On my machine, selecting from view gives zero results while selecting from
materialize view (or using CTE) gives the correct result. There's nothing
unusual in log.

Looking at explain analyze function, it seems that the filter is pushed too
deep and removes rows from the original table and not from the aggreage. It
works correctly when using CTE.

Thanks!

Re: BUG #13863: Select from views gives wrong results

От
Haribabu Kommi
Дата:
On Thu, Jan 14, 2016 at 5:35 AM,  <wrb@autistici.org> wrote:
> The following bug has been logged on the website:
>
> Bug reference:      13863
> Logged by:          W
> Email address:      wrb@autistici.org
> PostgreSQL version: 9.5.0
> Operating system:   Mac OS X 10.11.2 (postgres is from homebrew)
> Description:
>
> Hello,
>
> I noticed strange behavior while using views with the new grouping sets
> functionality.
>
> Steps to reproduce are here:
> https://gist.github.com/wrb/a73f45dcd335c8bc166c

I feel, It is always better to place the simple sql file in the mail itself,
instead of link.

Following are the sql statements that i took from the link.

-- table
create table emp (emp_id serial, department text, position text, sex
text, salary numeric);

-- data
begin;
insert into emp (department, position, sex, salary) values
('marketing', 'junior', 'F', 20000);
insert into emp (department, position, sex, salary) values
('marketing', 'junior', 'F', 25000);
insert into emp (department, position, sex, salary) values
('marketing', 'junior', 'M', 25000);
insert into emp (department, position, sex, salary) values ('it',
'senior', 'M', 125000);
insert into emp (department, position, sex, salary) values ('sales',
'normal', 'M', 50000);
insert into emp (department, position, sex, salary) values ('sales',
'junior', 'F', 30000);
insert into emp (department, position, sex, salary) values ('sales',
'junior', 'F', 30000);
commit;

-- view
create view v_emp_stats as  select department, position, sex,
avg(salary) from emp group by cube (department, position, sex);

-- materialized view
create materialized view mv_emp_stats as select department, position,
sex, avg(salary) from emp group by cube (department, position, sex);

-- those two selects should be identical, right?
select * from v_emp_stats where department is not null and position is null;
select * from mv_emp_stats where department is not null and position is null;

--CTE
with emp_data as (
select department, position, sex, avg(salary) from emp group by cube
(department, position, sex)
)
select * from emp_data where department is not null and position is null;

> On my machine, selecting from view gives zero results while selecting from
> materialize view (or using CTE) gives the correct result. There's nothing
> unusual in log.
>
> Looking at explain analyze function, it seems that the filter is pushed too
> deep and removes rows from the original table and not from the aggreage. It
> works correctly when using CTE.

As per my understanding of the result, this is correct.

>-- those two selects should be identical, right?
>select * from v_emp_stats where department is not null and position is null;

View's don't have storage, so any query on the view is rewritten to
the target relation.
In the above scenario, the same happened and the where clause is pushed into
the view select query. There is no row that is presented in the emp
table satisfies
the where clause that is provided. Because of this reason no data is selected.

>select * from mv_emp_stats where department is not null and position is null;
>
>with emp_data as (
>select department, position, sex, avg(salary) from emp group by cube (department, position, sex)
>)
>select * from emp_data where department is not null and position is null;

But where as with materialized view, it has the storage. The materialized view
is populated with the query result during creation. So adding an where clause
on materialized view satisfies with it and thus it returned the
results. The number
of records in the materialized view are 27 compared to the number of records in
table are 7.

The same with the CTE also, first the inner query is executed and on top of that
result the outer query is executed. Because of this reason, the where clause is
satisfied and the results are returned.

This is just an usage problem.

Regards,
Hari Babu
Fujitsu Australia

Re: BUG #13863: Select from views gives wrong results

От
wrb
Дата:
<body class=3D"ApplePlainTextBody" dir=3D"auto" =
style=3D"word-wrap: break-word; -webkit-nbsp-mode: space; =
-webkit-line-break: after-white-space;">Ooops, forgot to mail the =
list.WBegin forwarded =
message:From: wrb <wrb@autistici.org>Subject: Re: =
[BUGS] BUG #13863: Select from views gives wrong resultsDate: 14. =
ledna 2016 10:49:27 SE=C4=8CTo: Haribabu Kommi =
<kommi.haribabu@gmail.com>Followup: This query =
should show better what's wrong with the optimizer:select a, =
count(*)  from test group by cube (a);a | count =
---+-------1 |     12 | =
    1  |     2(3 =
rows)select * from (select a, count(*)  from test group by =
cube (a)) t where a is null;a | count ---+-------  | =
    0explain select * from (select a, =
count(*) from test group by cube (a)) t where a is null; =
            &n=
bsp;           &nbs=
p; QUERY PLAN =
            &n=
bsp;           &nbs=
p;  --------------------------------------------------------=
----------GroupAggregate  (cost=3D32.79..32.89 rows=3D2 =
width=3D4)  Group Key: test.a  Group Key: () =
 Filter: (test.a IS NULL)  ->  Sort =
 (cost=3D32.79..32.82 rows=3D11 width=3D4) =
       Sort Key: test.a =
       ->  Seq Scan on test =
 (cost=3D0.00..32.60 rows=3D11 width=3D4) =
            &n=
bsp;Filter: (a IS NULL)The predicate "a is null" in the =
outer select should refer to results of inner select and should =
therefore return (null, 2) instead of no rows, =
right?W=

Re: BUG #13863: Select from views gives wrong results

От
wrb
Дата:
> On 14. 1. 2016, at 0:50, Haribabu Kommi <kommi.haribabu@gmail.com> =
wrote:
>=20
> View's don't have storage, so any query on the view is rewritten to
> the target relation.
> In the above scenario, the same happened and the where clause is =
pushed into
> the view select query. There is no row that is presented in the emp
> table satisfies
> the where clause that is provided. Because of this reason no data is =
selected.
>=20
> But where as with materialized view, it has the storage. The =
materialized view
> is populated with the query result during creation. So adding an where =
clause
> on materialized view satisfies with it and thus it returned the
> results. The number
> of records in the materialized view are 27 compared to the number of =
records in
> table are 7.
>=20
> The same with the CTE also, first the inner query is executed and on =
top of that
> result the outer query is executed. Because of this reason, the where =
clause is
> satisfied and the results are returned.

I understand why it happens - with normal group by, the predicate *can* =
be pushed down
unless it is on generated column. But with cube, every column is a =
geenrated column because
of the way cube works.

It works correctly unless cube is involved:

create table test (a int, b int);
create view v_test as select a, count(*) from test group by a;
create view v_test2 as select a, count(*) from test group by cube (a);
insert into test values (1, 1), (2, 2);

this doesn't get pushed down either as CTE or from view:

explain with t as (select a, count(*) from test group by a) select * =
from t where count =3D 1;
                              QUERY PLAN                             =20
----------------------------------------------------------------------
 CTE Scan on t  (cost=3D45.90..50.40 rows=3D1 width=3D12)
   Filter: (count =3D 1)
   CTE t
     ->  HashAggregate  (cost=3D43.90..45.90 rows=3D200 width=3D4)
           Group Key: test.a
           ->  Seq Scan on test  (cost=3D0.00..32.60 rows=3D2260 =
width=3D4)
(6 rows)

explain select * from v_test where count =3D 1;
                          QUERY PLAN                         =20
--------------------------------------------------------------
 HashAggregate  (cost=3D49.55..52.05 rows=3D200 width=3D4)
   Group Key: test.a
   Filter: (count(*) =3D 1)
   ->  Seq Scan on test  (cost=3D0.00..32.60 rows=3D2260 width=3D4)
(4 rows)


This does get pushed down correctly:

explain select * from v_test where a =3D 1;
                         QUERY PLAN                        =20
------------------------------------------------------------
 GroupAggregate  (cost=3D0.00..38.32 rows=3D1 width=3D4)
   Group Key: test.a
   ->  Seq Scan on test  (cost=3D0.00..38.25 rows=3D11 width=3D4)
         Filter: (a =3D 1)
(4 rows)


This doesn't get pushed down, but it's a known issue with CTEs being =
optimization boundary in postgres:

explain with t as (select a, count(*) from test group by a) select * =
from t where a =3D 1;
                              QUERY PLAN                             =20
----------------------------------------------------------------------
 CTE Scan on t  (cost=3D45.90..50.40 rows=3D1 width=3D12)
   Filter: (a =3D 1)
   CTE t
     ->  HashAggregate  (cost=3D43.90..45.90 rows=3D200 width=3D4)
           Group Key: test.a
           ->  Seq Scan on test  (cost=3D0.00..32.60 rows=3D2260 =
width=3D4)
(6 rows)

But when cube gets involved:

with t as (select a, count(*) from test group by cube (a)) select * from =
t where a is null;
 a | count=20
---+-------
   |     2

CTE still works, because it doesn't get pushed down:

explain with t as (select a, count(*) from test group by cube (a)) =
select * from t where a is null;
                                 QUERY PLAN                              =
  =20
=
--------------------------------------------------------------------------=
--
 CTE Scan on t  (cost=3D177.47..181.49 rows=3D1 width=3D12)
   Filter: (a IS NULL)
   CTE t
     ->  GroupAggregate  (cost=3D158.51..177.47 rows=3D201 width=3D4)
           Group Key: test.a
           Group Key: ()
           ->  Sort  (cost=3D158.51..164.16 rows=3D2260 width=3D4)
                 Sort Key: test.a
                 ->  Seq Scan on test  (cost=3D0.00..32.60 rows=3D2260 =
width=3D4)
(9 rows)

but with a view, it gets pushed down to the table, even though it =
doesn't make any sense for cube,=20
and gives incorrect results:

select * from v_test2 where a is null;
 a | count=20
---+-------
   |     0

explain select * from v_test2 where a is null;
                            QUERY PLAN                           =20
------------------------------------------------------------------
 GroupAggregate  (cost=3D32.79..32.89 rows=3D2 width=3D4)
   Group Key: test.a
   Group Key: ()
   Filter: (test.a IS NULL)
   ->  Sort  (cost=3D32.79..32.82 rows=3D11 width=3D4)
         Sort Key: test.a
         ->  Seq Scan on test  (cost=3D0.00..32.60 rows=3D11 width=3D4)
               Filter: (a IS NULL)



No, I'm pretty sure this is incorrect result, because views are supposed =
to provide logical data independence and running the same query through =
nested select, CTE and view should give the same result, right? If not, =
I'm pretty sure this is the first case where it doesn't work like this.

W

>=20
> This is just an usage problem.
>=20
> Regards,
> Hari Babu
> Fujitsu Australia

Re: BUG #13863: Select from views gives wrong results

От
Dean Rasheed
Дата:
On 14 January 2016 at 09:19, wrb <wrb@autistici.org> wrote:
> No, I'm pretty sure this is incorrect result, because views are supposed =
to provide logical data independence and running the same query through nes=
ted select, CTE and view should give the same result, right? If not, I'm pr=
etty sure this is the first case where it doesn't work like this.
>

Yes, there appears to be a real bug here, but actually the underlying
cause is more to do with the way HAVING and GROUPING SETS interact.

When the clause is pushed down into the view it is turned from a WHERE
clause into a HAVING clause, because the view query has grouping. That
part is OK. However, the query planner then decides that since the new
HAVING clause doesn't contain any aggregates or volatile expressions,
it is safe to turn it back into a WHERE clause in the inner query.
That would be OK for normal grouping, because in that case the clause
would evaluate the same for all the rows in a group. However, that
doesn't work for grouping sets, since the output of the grouping sets
aggregation may contains nulls not present before grouping.

Here is a simpler example:

create table foo(a int, b int);
insert into foo values (1,1), (2,2);

select a, count(*) from foo group by cube(a);
 a | count
---+-------
 1 |     1
 2 |     1
   |     2
(3 rows)

select a, count(*) from foo group by cube(a) having a is null;
 a | count
---+-------
   |     0
(1 row)

select a, count(*) from foo group by cube(a) having a is distinct from 1;
 a | count
---+-------
 2 |     1
   |     1
(2 rows)

The results from these last 2 queries are incorrect -- they're not the
same as evaluating the HAVING clause after grouping.

This bug appears to have been introduced by commit
61444bfb809d3a088a270a59f383af3d4cd157b0, which was discussed here:
http://www.postgresql.org/message-id/20150726151456.GD12755@awork2.anarazel=
.de

Probably that change should be undone and the comment expanded to
explain why it isn't safe when the query has grouping sets.

Regards,
Dean

Re: BUG #13863: Select from views gives wrong results

От
Andres Freund
Дата:
Hi,

Thanks for digging!

On 2016-01-16 11:23:31 +0000, Dean Rasheed wrote:
> When the clause is pushed down into the view it is turned from a WHERE
> clause into a HAVING clause, because the view query has grouping. That
> part is OK. However, the query planner then decides that since the new
> HAVING clause doesn't contain any aggregates or volatile expressions,
> it is safe to turn it back into a WHERE clause in the inner query.
> That would be OK for normal grouping, because in that case the clause
> would evaluate the same for all the rows in a group. However, that
> doesn't work for grouping sets, since the output of the grouping sets
> aggregation may contains nulls not present before grouping.

> Probably that change should be undone and the comment expanded to
> explain why it isn't safe when the query has grouping sets.

Hm, I'm a bit hesitant to potentially noticeably regress plans here,
although it seems unikely that already "relies" on this. Can't we
essentially treat references to columns which can be set to null by due
to a grouping set, as containing an agg clause and only prevent moving
in that case?

Greetings,

Andres Freund

Re: BUG #13863: Select from views gives wrong results

От
Dean Rasheed
Дата:
On 16 January 2016 at 13:12, Andres Freund <andres@anarazel.de> wrote:
> Hm, I'm a bit hesitant to potentially noticeably regress plans here,
> although it seems unikely that already "relies" on this. Can't we
> essentially treat references to columns which can be set to null by due
> to a grouping set, as containing an agg clause and only prevent moving
> in that case?
>

I suppose it could be done, but it seems like a fair bit of work. It
would be a bit like check_ungrouped_columns() except that it would
only be safe to move the clause if it only referred to expressions
that appeared in the intersection of all the grouping sets. I think
that would rule out most real-world examples, including just about
anything using rollup or cube, so I'm not sure it would be worth the
effort.

Regards,
Dean

Re: BUG #13863: Select from views gives wrong results

От
Andrew Gierth
Дата:
>>>>> "Dean" == Dean Rasheed <dean.a.rasheed@gmail.com> writes:

 > On 16 January 2016 at 13:12, Andres Freund <andres@anarazel.de> wrote:
 >> Hm, I'm a bit hesitant to potentially noticeably regress plans here,
 >> although it seems unikely that already "relies" on this. Can't we
 >> essentially treat references to columns which can be set to null by
 >> due to a grouping set, as containing an agg clause and only prevent
 >> moving in that case?

 Dean> I suppose it could be done, but it seems like a fair bit of
 Dean> work. It would be a bit like check_ungrouped_columns() except
 Dean> that it would only be safe to move the clause if it only referred
 Dean> to expressions that appeared in the intersection of all the
 Dean> grouping sets. I think that would rule out most real-world
 Dean> examples, including just about anything using rollup or cube, so
 Dean> I'm not sure it would be worth the effort.

Obvious question is, can we keep the optimization in the case that the
qual is strict? and if so, should we?

--
Andrew (irc:RhodiumToad)

Re: BUG #13863: Select from views gives wrong results

От
Dean Rasheed
Дата:
On 18 January 2016 at 12:54, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
> Obvious question is, can we keep the optimization in the case that the
> qual is strict? and if so, should we?
>

Perhaps. I don't think it's entirely obvious what it means for a qual
to be strict.

I think that it would have to be the case that for every nullable
expression (every expression appearing in at least one grouping set,
but not appearing in all the grouping sets) the qual would have to be
provably false or null when that expression was null, or the qual
would have to be independent of that expression.

That seems like a pretty tricky piece of code to get right. It might
be more reasonable if you limited yourself to the case where all the
expressions were Vars, then you could use find_nonnullable_vars().

Regards,
Dean

Re: BUG #13863: Select from views gives wrong results

От
Andrew Gierth
Дата:
I think for now the following patch is best. This preserves the ability
to push degenerate HAVING clauses through an explicit GROUP BY () as
well as an implicit one, but prevents pushdown in all other cases.

Includes comment and regression test.

--
Andrew (irc:RhodiumToad)


Вложения

Re: BUG #13863: Select from views gives wrong results

От
Dean Rasheed
Дата:
On 7 February 2016 at 23:21, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
> I think for now the following patch is best. This preserves the ability
> to push degenerate HAVING clauses through an explicit GROUP BY () as
> well as an implicit one, but prevents pushdown in all other cases.
>
> Includes comment and regression test.
>

Yes, I think that's probably the best solution for now. Longer term I
think it might be possible to do better, along the lines discussed
above, but I haven't had time to explore any of those options.

Note that with this change to the "if" condition, the next "else if"
condition can be simplified back to just:

    else if (parse->groupClause)
    {
        /* move it to WHERE */
        ...

Andres, are you going to take this one?

Regards,
Dean

Re: BUG #13863: Select from views gives wrong results

От
Andres Freund
Дата:
On 2016-02-08 09:41:10 +0000, Dean Rasheed wrote:
> On 7 February 2016 at 23:21, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
> > I think for now the following patch is best. This preserves the ability
> > to push degenerate HAVING clauses through an explicit GROUP BY () as
> > well as an implicit one, but prevents pushdown in all other cases.
> >
> > Includes comment and regression test.
> >
>
> Yes, I think that's probably the best solution for now. Longer term I
> think it might be possible to do better, along the lines discussed
> above, but I haven't had time to explore any of those options.
>
> Note that with this change to the "if" condition, the next "else if"
> condition can be simplified back to just:
>
>     else if (parse->groupClause)
>     {
>         /* move it to WHERE */
>         ...

FWIW, I find the current phrasing clearer.

> Andres, are you going to take this one?

Working on it as we speak. I thought you might appreciate doing your
first commit under a littlebit less pressure.

Andres

Re: BUG #13863: Select from views gives wrong results

От
Dean Rasheed
Дата:
On 8 February 2016 at 09:44, Andres Freund <andres@anarazel.de> wrote:
>> Note that with this change to the "if" condition, the next "else if"
>> condition can be simplified back to just:
>>
>>     else if (parse->groupClause)
>>     {
>>         /* move it to WHERE */
>>         ...
>
> FWIW, I find the current phrasing clearer.
>

OK, fair enough.

>> Andres, are you going to take this one?
>
> Working on it as we speak. I thought you might appreciate doing your
> first commit under a littlebit less pressure.
>

Thanks. I'm a bit swamped with work for my day job right now.

Regards,
Dean

Re: BUG #13863: Select from views gives wrong results

От
Andres Freund
Дата:
On 2016-02-07 23:21:57 +0000, Andrew Gierth wrote:
> I think for now the following patch is best. This preserves the ability
> to push degenerate HAVING clauses through an explicit GROUP BY () as
> well as an implicit one, but prevents pushdown in all other cases.
>
> Includes comment and regression test.

I've pushed this now. I'd greatly appreciate sending/pinging earlier
than the day of wrapping, next time round...

Andres