Обсуждение: Postgres views cannot use both union and join/where

Поиск
Список
Период
Сортировка

Postgres views cannot use both union and join/where

От
Mithran Kulasekaran
Дата:
Hi,

We are trying to use the postgres view to accommodate some of the complex workflow related operations, we perform we saw like using union in a where clause inside a view actually pushed the where clause to both subqueries and we get good performance using the index , but when used in a join it does a full scan and filter of the table instead of pushing the filter column as a where clause. we also found that when used without any join/where in the union clause (i.e., select ... from template union all select ... from template_staging) works with joins just fine , i think the only problem is when we try to use both union and where/join the issue starts to happen is there any specific flag or release planned to address this issue.

Postgres version: PostgreSQL 12.7 (Debian 12.7-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit

SQL Steps:
create table template
(
id int primary key,
name varchar(30) unique,
description varchar(30)
);

create table template_staging
(
id int primary key,
name varchar(30) unique,
description varchar(30),
is_deleted bool
);

insert into template (id, name, description)
values (1, 'test1', 'hello'),
(2, 'test2', 'hello world 2'),
(3, 'test3', 'hello world 3');
insert into template_staging (id, name, description, is_deleted)
values (3, 'test3', 'revert hello world 3', false),
(4, 'test4', 'hello world 2', false),
(5, 'test5', 'hello world 3', false);

create  view template_view (id, name, description, is_staged) as
select t.id,t.name, t.description, false as is_staged
from template t
left join template_staging ts on t.name = ts.name and ts.name is null
UNION ALL
select t.id, t.name, t.description, true as is_stage
from template_staging t
where is_deleted is false;
create table tester(
id int primary key,
template_id int
);
insert into tester (id, template_id)
values (1, 1),
(2, 2),
(3, 3),(4, 4);

Analysis:
EXPLAIN ANALYZE select * from template_view where id=1;
Append  (cost=0.15..16.36 rows=2 width=161) (actual time=0.012..0.015 rows=1 loops=1)
  ->  Index Scan using template_pkey on template t  (cost=0.15..8.17 rows=1 width=161) (actual time=0.011..0.012 rows=1 loops=1)
        Index Cond: (id = 1)
  ->  Index Scan using template_staging_pkey on template_staging t_1  (cost=0.15..8.17 rows=1 width=161) (actual time=0.002..0.002 rows=0 loops=1)
        Index Cond: (id = 1)
        Filter: (is_deleted IS FALSE)

EXPLAIN ANALYZE select * from template_view where name='test1';
Append  (cost=0.15..16.36 rows=2 width=157) (actual time=0.012..0.015 rows=1 loops=1)
  ->  Index Scan using template_name_key on template t  (cost=0.15..8.17 rows=1 width=157) (actual time=0.012..0.012 rows=1 loops=1)
        Index Cond: ((name)::text = 'test1'::text)
  ->  Index Scan using template_staging_name_key on template_staging t_1  (cost=0.15..8.17 rows=1 width=157) (actual time=0.002..0.002 rows=0 loops=1)
        Index Cond: ((name)::text = 'test1'::text)
        Filter: (is_deleted IS FALSE)


EXPLAIN ANALYZE select * from tester t inner join template_view tv on tv.id = t.template_id
where t.id=1;
Hash Join  (cost=8.18..48.19 rows=3 width=169) (actual time=0.024..0.032 rows=1 loops=1)
  Hash Cond: (t_1.id = t.template_id)
  ->  Append  (cost=0.00..38.27 rows=645 width=161) (actual time=0.008..0.014 rows=6 loops=1)
        ->  Seq Scan on template t_1  (cost=0.00..14.30 rows=430 width=161) (actual time=0.008..0.009 rows=3 loops=1)
        ->  Seq Scan on template_staging t_2  (cost=0.00..14.30 rows=215 width=161) (actual time=0.003..0.004 rows=3 loops=1)
              Filter: (is_deleted IS FALSE)
  ->  Hash  (cost=8.17..8.17 rows=1 width=8) (actual time=0.011..0.011 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Index Scan using tester_pkey on tester t  (cost=0.15..8.17 rows=1 width=8) (actual time=0.008..0.009 rows=1 loops=1)
              Index Cond: (id = 1)
EXPLAIN (ANALYZE, BUFFERS) select * from template_view where id=1;
Append  (cost=0.15..16.36 rows=2 width=161) (actual time=0.011..0.015 rows=1 loops=1)
  Buffers: shared hit=3
  ->  Index Scan using template_pkey on template t  (cost=0.15..8.17 rows=1 width=161) (actual time=0.011..0.011 rows=1 loops=1)
        Index Cond: (id = 1)
        Buffers: shared hit=2
  ->  Index Scan using template_staging_pkey on template_staging t_1  (cost=0.15..8.17 rows=1 width=161) (actual time=0.002..0.002 rows=0 loops=1)
        Index Cond: (id = 1)
        Filter: (is_deleted IS FALSE)
        Buffers: shared hit=1

EXPLAIN (ANALYZE, BUFFERS) select * from tester t inner join template_view tv on tv.id = t.template_id
where t.id=1;
Hash Join  (cost=8.18..48.19 rows=3 width=169) (actual time=0.019..0.025 rows=1 loops=1)
  Hash Cond: (t_1.id = t.template_id)
  Buffers: shared hit=4
  ->  Append  (cost=0.00..38.27 rows=645 width=161) (actual time=0.007..0.011 rows=6 loops=1)
        Buffers: shared hit=2
        ->  Seq Scan on template t_1  (cost=0.00..14.30 rows=430 width=161) (actual time=0.006..0.007 rows=3 loops=1)
              Buffers: shared hit=1
        ->  Seq Scan on template_staging t_2  (cost=0.00..14.30 rows=215 width=161) (actual time=0.002..0.003 rows=3 loops=1)
              Filter: (is_deleted IS FALSE)
              Buffers: shared hit=1
  ->  Hash  (cost=8.17..8.17 rows=1 width=8) (actual time=0.008..0.009 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        Buffers: shared hit=2
        ->  Index Scan using tester_pkey on tester t  (cost=0.15..8.17 rows=1 width=8) (actual time=0.006..0.007 rows=1 loops=1)
              Index Cond: (id = 1)
              Buffers: shared hit=2

Please let me know if you need more info. 

Thanks,
Mithran


Re: Postgres views cannot use both union and join/where

От
"David G. Johnston"
Дата:
On Tue, Oct 19, 2021 at 2:48 PM Mithran Kulasekaran <mithranakulasekaran@gmail.com> wrote:
i think the only problem is when we try to use both union and where/join the issue starts to happen

I'm unconvinced this is actually an issue based upon what is presented here.  All I'm seeing is two decidedly different queries resulting in different query plans.  That the "problem one" isn't using an index isn't surprising given the volume of data involved and the change from specifying a literal value in the where clause to letting a join determine which results to return.

Assuming you have a real scenario you are testing with being able to demonstrate (probably through the use of the query planner GUCs) that PostgreSQL can produce a better plan but doesn't by default would be a more compelling case.  More generally, you probably need to either use your real scenario's data to help demonstrate the issue or create a self-contained test that is at least closer to what it produces (this approach still benefits from seeing what is happening for real).

David J.


Re: Postgres views cannot use both union and join/where

От
Benedict Holland
Дата:
I thought a union mashed together two queries. The where clause can appear in both. But the execution plan will almost certainly run the first query and the second query. It should throw an error if the types don't match or the number of columns don't match. 

There are so few use cases for unions that can't get fixed with better schema designs. I ran into a few over the years.

On Tue, Oct 19, 2021, 9:32 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tue, Oct 19, 2021 at 2:48 PM Mithran Kulasekaran <mithranakulasekaran@gmail.com> wrote:
i think the only problem is when we try to use both union and where/join the issue starts to happen

I'm unconvinced this is actually an issue based upon what is presented here.  All I'm seeing is two decidedly different queries resulting in different query plans.  That the "problem one" isn't using an index isn't surprising given the volume of data involved and the change from specifying a literal value in the where clause to letting a join determine which results to return.

Assuming you have a real scenario you are testing with being able to demonstrate (probably through the use of the query planner GUCs) that PostgreSQL can produce a better plan but doesn't by default would be a more compelling case.  More generally, you probably need to either use your real scenario's data to help demonstrate the issue or create a self-contained test that is at least closer to what it produces (this approach still benefits from seeing what is happening for real).

David J.


Re: Postgres views cannot use both union and join/where

От
Michael Lewis
Дата:
On Tue, Oct 19, 2021 at 3:48 PM Mithran Kulasekaran <mithranakulasekaran@gmail.com> wrote:
create  view template_view (id, name, description, is_staged) as
select t.id,t.name, t.description, false as is_staged
from template t
left join template_staging ts on t.name = ts.name and ts.name is null

Does that work? I've only seen that type of logic written as-

left join template_staging ts on t.name = ts.name
where ts.name is null

Re: Postgres views cannot use both union and join/where

От
"David G. Johnston"
Дата:
On Tuesday, October 19, 2021, Michael Lewis <mlewis@entrata.com> wrote:
On Tue, Oct 19, 2021 at 3:48 PM Mithran Kulasekaran <mithranakulasekaran@gmail.com> wrote:
create  view template_view (id, name, description, is_staged) as
select t.id,t.name, t.description, false as is_staged
from template t
left join template_staging ts on t.name = ts.name and ts.name is null

Does that work? I've only seen that type of logic written as-

left join template_staging ts on t.name = ts.name
where ts.name is null

The are functionally equivalent, though the timing of the expression evaluation differs slightly.

It could also be written as an anti-join:

Select * from template as t where not exists (select 1 from template_staging as ts where t.name = ts.name)

David J.

Re: Postgres views cannot use both union and join/where

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Tuesday, October 19, 2021, Michael Lewis <mlewis@entrata.com> wrote:
>> On Tue, Oct 19, 2021 at 3:48 PM Mithran Kulasekaran <
>> mithranakulasekaran@gmail.com> wrote:
>>> create  view template_view (id, name, description, is_staged) as
>>> select t.id,t.name, t.description, false as is_staged
>>> from template t
>>> left join template_staging ts on t.name = ts.name and ts.name is null

>> Does that work? I've only seen that type of logic written as-
>> left join template_staging ts on t.name = ts.name
>> where ts.name is null

> The are functionally equivalent, though the timing of the expression
> evaluation differs slightly.

No, not at all.  Michael's version correctly implements an anti-join,
where the first version does not.  The reason is that the WHERE clause
"sees" the column value post-JOIN, whereas the JOIN/ON clause "sees"
values pre-JOIN.

Assuming that the '=' operator is strict, the first query's ON clause
really reduces to constant false, so that you just get a null-extended
image of the left table.  That's almost surely not what's wanted.

            regards, tom lane



Re: Postgres views cannot use both union and join/where

От
"David G. Johnston"
Дата:
On Wed, Oct 20, 2021 at 6:58 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Tuesday, October 19, 2021, Michael Lewis <mlewis@entrata.com> wrote:
>> On Tue, Oct 19, 2021 at 3:48 PM Mithran Kulasekaran <
>> mithranakulasekaran@gmail.com> wrote:
>>> create  view template_view (id, name, description, is_staged) as
>>> select t.id,t.name, t.description, false as is_staged
>>> from template t
>>> left join template_staging ts on t.name = ts.name and ts.name is null

>> Does that work? I've only seen that type of logic written as-
>> left join template_staging ts on t.name = ts.name
>> where ts.name is null

> The are functionally equivalent, though the timing of the expression
> evaluation differs slightly.

No, not at all.  Michael's version correctly implements an anti-join,
where the first version does not.  The reason is that the WHERE clause
"sees" the column value post-JOIN, whereas the JOIN/ON clause "sees"
values pre-JOIN.

Yeah, my bad.  I was actually thinking this but then figured the OP wouldn't have written an anti-join that didn't actually work.

My original email was going to be:

Adding the single table expression to the ON clause is shorthand for writing:

SELECT t.* FROM template AS t LEFT JOIN (SELECT * FROM template_staging WHERE template_staging.name IS NULL) AS ts ON t.name = ts.name;

David J.