Problems pushing down WHERE-clause to underlying view

Поиск
Список
Период
Сортировка
От Nicklas Avén
Тема Problems pushing down WHERE-clause to underlying view
Дата
Msg-id 4d8a05b6-39de-9a06-7e86-85754fd94716@jordogskog.no
обсуждение исходный текст
Ответы Re: Problems pushing down WHERE-clause to underlying view  (Adrian Klaver <adrian.klaver@aklaver.com>)
Список pgsql-general
Hi


We have a system with 2 layers of views. It is about forestry.

The first layer contains the logic like grouping volumes in logs 
together to stems or harvesting areas and joining species names to codes 
and things like that.

The second layer just joins this underlying views to a table with user 
ids and machine ids. So, when used by the application there is a where 
clause containing the user id which gives access to the correct data

in the underlying view based on machine id.


The underlying view in this case can return approx 22000 rows, grouped 
from approx 8 million logs by harvest date, harvest object and so on.


The problem is that it always calculates all those 22000 rows even if 
the user id I use only gives 250 rows.

So, the query uses 4 seconds instead of under 100 ms.


I have tried this on 2 servers and my laptop, PostgreSQL 9.6 and 10 and 
get the same issue.


I have tried to pick the query apart to understand what is happening.


First, the underlying view looks like this except I removed some fields 
that doesn't affect the case to save some space:

CREATE OR REPLACE VIEW underlying_view AS
  SELECT
     l.machine_key,
     o.object_name,
     o.sub_object_name,
     s.species_group_name,
     p.product_group_name ,
     l.m3_sub AS volume_m3sub,
     l.number_of_logs,
     mi.basemachine_manufacturer,
     mi.basemachine_model
    FROM
     (
        SELECT
         hl.contractor_id,
         hl.machine_key,
         hl.operator_key,
         hl.object_key,
         hl.sub_object_key,
         date(hl.harvest_date) AS harvest_date, --this is timestamptz 
since we use the time in other places
         hl.species_group_key,
         hl.product_key,
         sum(hl.m3_sub) AS m3_sub,
         count(*) AS number_of_logs
     FROM version_union_tables_r02.harvester_logs hl
     GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key, 
hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), 
hl.species_group_key, hl.product_key
     ) l
      LEFT JOIN version_union_tables_r02.machine_info mi ON 
l.machine_key::text = mi.machine_key::text
      LEFT JOIN version_union_tables_r02.objects o ON 
l.machine_key::text = o.machine_key::text AND l.object_key = 
o.object_key AND l.sub_object_key = o.sub_object_key
      LEFT JOIN version_union_tables_r02.products p ON 
l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
      LEFT JOIN version_union_tables_r02.species s ON 
l.machine_key::text = s.machine_key::text AND l.species_group_key = 
s.species_group_key;


In the next level is a view that looks like this:


CREATE OR REPLACE VIEW top_level_view AS
  SELECT *
    FROM underlying_view a
      JOIN contractor_access b ON a.machine_key = b.machine_key WHERE 
b.active <> 0;


If I query this top_level_view like:

SELECT * FROM top_level_view WHERE user_id = 'name@email.address';

I haven't succeeded to avoid the underlying view to compute the full 
dataset.


The user_id 'name@email.address' returns 1 machine_key from 
contractor_access table that has any hits in the underlying view (4 in 
total but 3 machines are not represented in the underlying view)

We call the machine that we get from contractor_id table 'machine1'

Here is what I have tried to find out when the plan changes:


1) SELECT * FROM underlying_view WHERE machine_key = 'machine1'; -- 
returns 250 rows in approx 100 ms, so, works as expected


2) select * from (select 'machine1' machine_key) a, underlying_view b 
where a.machine_key = b.machine_key; --same as above, works as expected


3) select * from (select * from contractor_access where user_id = 
'name@email.address') a,
underlying_view b where a.machine_key = b.machine_key;      -- Here I am 
hit. this returns the same 250 rows, but in over 4 seconds


/*So I thought I should try to force down the machine_key to the 
underlying view with lateral like this*/


4) select * from (select * from contractor_access where user_id = 
'name@email.address') a,
lateral (select * from underlying_view where machine_key = a.machine_key) b;

But this doesn't work either. It returns the same 250 rows in approx 4 
seconds.


My question is, is there some trick to force the planner to push down 
the machine_key.

I cannot understand what is fooling the planner.

The table is analyzed, I have tried on several machines, so I do not 
think it is miss leading statistics.


I haven't done any configuration more than tried with 
max_parallel_workers_per_gather to 0 since the workers makes it harder 
to understand what is happening.


Here is the quer plan on query number 3 above:

EXPLAIN select * from (select * from contractor_access where user_id = 
'name@email.address') a,
underlying_view b where a.machine_key = b.machine_key;

Aggregate  (cost=543839.03..543839.04 rows=1 width=8)
   ->  Hash Join  (cost=395402.74..543798.72 rows=16123 width=0)
         Hash Cond: ((hl.machine_key)::text = contractor_access.machine_key)
         ->  Hash Left Join  (cost=395395.10..533563.59 rows=806147 
width=400)
               Hash Cond: (((hl.machine_key)::text = 
(s.machine_key)::text) AND (hl.species_group_key = s.species_group_key))
               ->  Hash Left Join  (cost=395380.73..485122.31 
rows=806147 width=32)
                     Hash Cond: (((hl.machine_key)::text = 
(p.machine_key)::text) AND (hl.product_key = p.product_key))
                     ->  Hash Left Join (cost=395320.48..444697.18 
rows=806147 width=36)
                           Hash Cond: (((hl.machine_key)::text = 
(o.machine_key)::text) AND (hl.object_key = o.object_key) AND 
(hl.sub_object_key = o.sub_object_key))
                           ->  Hash Left Join (cost=395257.01..417426.05 
rows=806147 width=36)
                                 Hash Cond: ((hl.machine_key)::text = 
(mi.machine_key)::text)
                                 Join Filter: (((hl.contractor_id)::text 
= (mi.contractor_id)::text) OR ((hl.contractor_id IS NULL) AND 
(mi.contractor_id IS NULL)))
                                 ->  HashAggregate 
(cost=395254.66..403316.13 rows=806147 width=86)
                                       Group Key: hl.machine_key, 
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, 
hl.harvest_date, hl.species_group_key, hl.product_key
                                       ->  Seq Scan on harvester_logs 
hl  (cost=0.00..234025.22 rows=8061472 width=54)
                                 ->  Hash  (cost=1.60..1.60 rows=60 
width=35)
                                       ->  Seq Scan on machine_info mi  
(cost=0.00..1.60 rows=60 width=35)
                           ->  Hash  (cost=33.26..33.26 rows=1726 width=23)
                                 ->  Seq Scan on objects o 
(cost=0.00..33.26 rows=1726 width=23)
                     ->  Hash  (cost=37.90..37.90 rows=1490 width=29)
                           ->  Seq Scan on products p (cost=0.00..37.90 
rows=1490 width=29)
               ->  Hash  (cost=7.55..7.55 rows=455 width=31)
                     ->  Seq Scan on species s  (cost=0.00..7.55 
rows=455 width=31)
         ->  Hash  (cost=7.59..7.59 rows=4 width=21)
               ->  Seq Scan on contractor_access  (cost=0.00..7.59 
rows=4 width=21)
                     Filter: (t4e_contractor_id = 
'name@email.address'::text)



Thanks


Nicklas Avén










В списке pgsql-general по дате отправления:

Предыдущее
От: Bruno Lavoie
Дата:
Сообщение: Re: Shared hosting with FDW on AWS RDS
Следующее
От: Adrian Klaver
Дата:
Сообщение: Re: PostgreSql Version Compatibility With Apache ActiveMQ