Re: Problems pushing down WHERE-clause to underlying view

Поиск
Список
Период
Сортировка
От Adrian Klaver
Тема Re: Problems pushing down WHERE-clause to underlying view
Дата
Msg-id 473540ab-5933-04e8-ce55-074f4faaab82@aklaver.com
обсуждение исходный текст
Ответ на Re: Problems pushing down WHERE-clause to underlying view  (Nicklas Avén <nicklas.aven@jordogskog.no>)
Ответы Re: Problems pushing down WHERE-clause to underlying view  (Nicklas Avén <nicklas.aven@jordogskog.no>)
Список pgsql-general
On 2/15/19 9:27 AM, Nicklas Avén wrote:
> 
> On 2/15/19 5:06 PM, Adrian Klaver wrote:
>  > On 2/15/19 7:28 AM, Nicklas Avén wrote:
>  >> Hi
>  >>
>  >> The problem is that it always calculates all those 22000 rows even 
> if the user id I use only gives 250 rows.
>  >>
>  >> So, the query uses 4 seconds instead of under 100 ms.
>  >
>  > https://www.postgresql.org/docs/10/sql-createview.html
>  >
>  > "CREATE VIEW defines a view of a query. The view is not physically 
> materialized. Instead, the query is run every time the view is 
> referenced in a query."
> 
>  >
> 
> 
> 
> Sorry, I must have expressed what I mean bad. Of course a view is not 
> materialized.
> I will explain without views what I mean here below
> 
> 
> 
>  > Might want to look at materialized view:
>  > https://www.postgresql.org/docs/10/sql-creatematerializedview.html
>  >
>  > "CREATE MATERIALIZED VIEW is similar to CREATE TABLE AS, except that 
> it also remembers the query used to initialize the view, so that it can 
> be refreshed later upon demand. A materialized view has many of the same 
> properties as a table, but there is no support for temporary 
> materialized views or automatic generation of OIDs."
> 
>  >
> 
> 
> No, materialized views is not an option. We get some data into those 
> tables daily. Recalculating the full dataset on a lot of views like this 
> doesn't make sense.
> Instead we have tables maintained with processed new data. But I want 
> this last part of logic on top as views for flexibility,
> to not need cached tables for each possible type of grouping that we need.
> Started out with materialized views and it didn't work out well.
> 
> 
> 
> 
>  >
>  > I would also suggest running the EXPLAIN below with ANALYZE so actual 
> timings are returned. Also try:
>  >
>  > SELECT
>  >     *
>  > FROM
>  >     underlying_view AS b
>  > JOIN
>  >     contractor_access AS b
>  > ON
>  >     a.machine_key = b.machine_key
>  > WHERE
>  >     user_id = 'name@email.address'
>  >>
> 
> Sorry again, I didn't mention. This I have tried this since this is what 
> the top level view do.
> So first step when trying to understand this was (of course) to apply 
> the where-clause directly to the query
> 
> 
> 
> So, let's do that also on the underlying query (view) .
> 
> 
> Here I have 2 queries, where I apply the where clause directly to the 
> query in the underlying view
> 
> (joining the contractor_access table directly on that query).
> 
> The first takes 30-40 ms ms and returns the same 250 rows as the second.
> In the first I use the machine_key in the where clause.
> In the second query that takes about 16 seconds to return the same 250 
> rows I use the user_id in the contractor_access table.
> 
> I have also cleaned up the contractor_access table. So there is only 1 
> row now, with my email as user_id and the same machine_key as used in 
> the first query.
> 

I have not had chance to fully go through all of below. Some 
questions/suggestions:

1) Thanks for the formatted queries. If I could make a suggestion, when 
aliasing could you include AS. It would make finding what l.* refers to 
easier for those of us with old eyes:)

2) t4e_contractor_id is in the shiny_adm.contractor_access table?
If not where?

3) What is the schema for shiny_adm.contractor_access?
In particular what indexes are on it?

> 
> 
> Query 1:
> 
> 
> EXPLAIN ANALYZE
> SELECT
>      l.machine_key,
>      o.object_name,
>      o.sub_object_name,
>      o.object_user_id,
>      o.sub_object_user_id,
>      o.start_date AS object_start_date,
>      s.species_group_name,
>      p.product_group_name,
>      l.m3_sub AS volume_m3sub,
>      l.number_of_logs,
>      mi.basemachine_manufacturer,
>      mi.basemachine_model
>     FROM ( SELECT hl.contractor_id,
>              hl.machine_key,
>              hl.operator_key,
>              hl.object_key,
>              hl.sub_object_key,
>              date(hl.harvest_date) AS harvest_date,
>              hl.species_group_key,
>              hl.product_key,
>              sum(hl.m3_sub) AS m3_sub,
>              count(*) AS number_of_logs
>             FROM version_union_tables_r02.harvester_logs hl
>            GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key, 
> hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), 
> hl.species_group_key, hl.product_key) l
>       LEFT JOIN version_union_tables_r02.machine_info mi ON 
> l.machine_key::text = mi.machine_key::text
>       LEFT JOIN version_union_tables_r02.objects o ON 
> l.machine_key::text = o.machine_key::text AND l.object_key = 
> o.object_key AND l.sub_object_key = o.sub_object_key
>       LEFT JOIN version_union_tables_r02.products p ON 
> l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
>       LEFT JOIN version_union_tables_r02.species s ON 
> l.machine_key::text = s.machine_key::text AND l.species_group_key = 
> s.species_group_key
> join shiny_adm.contractor_access ci on l.machine_key=ci.machine_key
> where l.machine_key = '887655635442600'
> ;
> 
> which results in this query plan
> 
> Nested Loop  (cost=61865.25..65302.20 rows=22624 width=122) (actual 
> time=25.804..27.134 rows=250 loops=1)
>    ->  Seq Scan on contractor_access ci  (cost=0.00..1.01 rows=1 
> width=19) (actual time=0.009..0.010 rows=1 loops=1)
>          Filter: (machine_key = '887655635442600'::text)
>    ->  Hash Left Join  (cost=61865.25..65074.95 rows=22624 width=122) 
> (actual time=25.793..26.959 rows=250 loops=1)
>          Hash Cond: (((hl.machine_key)::text = (s.machine_key)::text) 
> AND (hl.species_group_key = s.species_group_key))
>          ->  Hash Left Join  (cost=61854.55..64263.92 rows=22624 
> width=120) (actual time=25.755..26.763 rows=250 loops=1)
>                Hash Cond: (((hl.machine_key)::text = 
> (p.machine_key)::text) AND (hl.product_key = p.product_key))
>                ->  Hash Left Join  (cost=61815.97..63145.14 rows=22624 
> width=118) (actual time=25.706..26.543 rows=250 loops=1)
>                      Hash Cond: (((hl.machine_key)::text = 
> (o.machine_key)::text) AND (hl.object_key = o.object_key) AND 
> (hl.sub_object_key = o.sub_object_key))
>                      ->  Hash Left Join  (cost=61799.78..62619.90 
> rows=22624 width=65) (actual time=25.668..26.327 rows=250 loops=1)
>                            Hash Cond: ((hl.machine_key)::text = 
> (mi.machine_key)::text)
>                            ->  HashAggregate (cost=61796.99..62079.79 
> rows=22624 width=69) (actual time=25.627..26.132 rows=250 loops=1)
>                                  Group Key: hl.machine_key, 
> hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, 
> date(hl.harvest_date), hl.species_group_key, hl.product_key
>                                  ->  Bitmap Heap Scan on harvester_logs 
> hl  (cost=570.14..61224.14 rows=22914 width=61) (actual 
> time=0.909..11.573 rows=24151 loops=1)
>                                        Recheck Cond: 
> ((machine_key)::text = '887655635442600'::text)
>                                        Heap Blocks: exact=538
>                                        ->  Bitmap Index Scan on 
> version_union_tables_r02_harvester_logs_machine_key (cost=0.00..564.41 
> rows=22914 width=0) (actual time=0.870..0.870 rows=24151 loops=1)
>                                              Index Cond: 
> ((machine_key)::text = '887655635442600'::text)
>                            ->  Hash  (cost=2.77..2.77 rows=1 width=38)
> (actual time=0.023..0.023 rows=1 loops=1)
>                                  Buckets: 1024  Batches: 1  Memory 
> Usage: 9kB
>                                  ->  Seq Scan on machine_info mi 
> (cost=0.00..2.77 rows=1 width=38) (actual time=0.018..0.019 rows=1 loops=1)
>                                        Filter: ((machine_key)::text = 
> '887655635442600'::text)
>                                        Rows Removed by Filter: 61
>                      ->  Hash  (cost=16.12..16.12 rows=4 width=84) 
> (actual time=0.025..0.026 rows=3 loops=1)
>                            Buckets: 1024  Batches: 1  Memory Usage: 9kB
>                            ->  Bitmap Heap Scan on objects o 
> (cost=4.31..16.12 rows=4 width=84) (actual time=0.020..0.021 rows=3 
> loops=1)
>                                  Recheck Cond: ((machine_key)::text = 
> '887655635442600'::text)
>                                  Heap Blocks: exact=1
>                                  ->  Bitmap Index Scan on 
> version_union_tables_r02_objects_machine_key  (cost=0.00..4.31 rows=4 
> width=0) (actual time=0.015..0.015 rows=3 loops=1)
>                                        Index Cond: ((machine_key)::text 
> = '887655635442600'::text)
>                ->  Hash  (cost=38.19..38.19 rows=26 width=35) (actual 
> time=0.037..0.037 rows=26 loops=1)
>                      Buckets: 1024  Batches: 1  Memory Usage: 10kB
>                      ->  Bitmap Heap Scan on products p 
> (cost=4.48..38.19 rows=26 width=35) (actual time=0.015..0.026 rows=26 
> loops=1)
>                            Recheck Cond: ((machine_key)::text = 
> '887655635442600'::text)
>                            Heap Blocks: exact=1
>                            ->  Bitmap Index Scan on 
> version_union_tables_r02_products_machine_key  (cost=0.00..4.47 rows=26 
> width=0) (actual time=0.009..0.009 rows=26 loops=1)
>                                  Index Cond: ((machine_key)::text = 
> '887655635442600'::text)
>          ->  Hash  (cost=10.52..10.52 rows=12 width=37) (actual 
> time=0.028..0.029 rows=12 loops=1)
>                Buckets: 1024  Batches: 1  Memory Usage: 9kB
>                ->  Bitmap Heap Scan on species s (cost=4.37..10.52 
> rows=12 width=37) (actual time=0.016..0.021 rows=12 loops=1)
>                      Recheck Cond: ((machine_key)::text = 
> '887655635442600'::text)
>                      Heap Blocks: exact=1
>                      ->  Bitmap Index Scan on 
> version_union_tables_r02_species_machine_key  (cost=0.00..4.36 rows=12 
> width=0) (actual time=0.008..0.008 rows=12 loops=1)
>                            Index Cond: ((machine_key)::text = 
> '887655635442600'::text)
> Planning time: 0.434 ms
> Execution time: 27.370 ms
> 
> 
> 
> Next query, the slow one that calculates the whole dataset:
> 
> 
> EXPLAIN ANALYZE
> SELECT
>      l.machine_key,
>      o.object_name,
>      o.sub_object_name,
>      o.object_user_id,
>      o.sub_object_user_id,
>      o.start_date AS object_start_date,
>      s.species_group_name,
>      p.product_group_name,
>      l.m3_sub AS volume_m3sub,
>      l.number_of_logs,
>      mi.basemachine_manufacturer,
>      mi.basemachine_model
>     FROM ( SELECT hl.contractor_id,
>              hl.machine_key,
>              hl.operator_key,
>              hl.object_key,
>              hl.sub_object_key,
>              date(hl.harvest_date) AS harvest_date,
>              hl.species_group_key,
>              hl.product_key,
>              sum(hl.m3_sub) AS m3_sub,
>              count(*) AS number_of_logs
>             FROM version_union_tables_r02.harvester_logs hl
>            GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key, 
> hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), 
> hl.species_group_key, hl.product_key) l
>       LEFT JOIN version_union_tables_r02.machine_info mi ON 
> l.machine_key::text = mi.machine_key::text
>       LEFT JOIN version_union_tables_r02.objects o ON 
> l.machine_key::text = o.machine_key::text AND l.object_key = 
> o.object_key AND l.sub_object_key = o.sub_object_key
>       LEFT JOIN version_union_tables_r02.products p ON 
> l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
>       LEFT JOIN version_union_tables_r02.species s ON 
> l.machine_key::text = s.machine_key::text AND l.species_group_key = 
> s.species_group_key
> join shiny_adm.contractor_access ci on l.machine_key=ci.machine_key
> where t4e_contractor_id = 'nicklas.aven@jordogskog.no';
> 
> 
> results in this query plan:
> 
> Hash Left Join  (cost=1780026.09..2023556.15 rows=4044 width=122) 
> (actual time=15860.900..15888.766 rows=250 loops=1)
>    Hash Cond: (((hl.machine_key)::text = (o.machine_key)::text) AND 
> (hl.object_key = o.object_key) AND (hl.sub_object_key = o.sub_object_key))
>    ->  Merge Left Join  (cost=1779946.65..2023340.22 rows=4044 width=69) 
> (actual time=15859.604..15887.287 rows=250 loops=1)
>          Merge Cond: ((hl.machine_key)::text = (s.machine_key)::text)
>          Join Filter: (hl.species_group_key = s.species_group_key)
>          Rows Removed by Join Filter: 2750
>          ->  Merge Left Join  (cost=1779915.71..2023136.40 rows=4044 
> width=67) (actual time=15859.072..15884.912 rows=250 loops=1)
>                Merge Cond: ((hl.machine_key)::text = (p.machine_key)::text)
>                Join Filter: (hl.product_key = p.product_key)
>                Rows Removed by Join Filter: 6250
>                ->  Merge Left Join  (cost=1779788.20..2022471.20 
> rows=4044 width=65) (actual time=15857.473..15879.504 rows=250 loops=1)
>                      Merge Cond: ((hl.machine_key)::text = 
> (mi.machine_key)::text)
>                      ->  Merge Join  (cost=1779783.74..2022437.81 
> rows=4044 width=48) (actual time=15857.359..15879.102 rows=250 loops=1)
>                            Merge Cond: ((hl.machine_key)::text = 
> ci.machine_key)
>                            ->  GroupAggregate 
> (cost=1779782.72..2012287.44 rows=808712 width=69) (actual 
> time=15088.353..15878.172 rows=2683 loops=1)
>                                  Group Key: hl.machine_key, 
> hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, 
> (date(hl.harvest_date)), hl.species_group_key, hl.product_key
>                                  ->  Sort (cost=1779782.72..1800000.52 
> rows=8087121 width=61) (actual time=15088.336..15488.144 rows=942552 
> loops=1)
>                                        Sort Key: hl.machine_key, 
> hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, 
> (date(hl.harvest_date)), hl.species_group_key, hl.product_key
>                                        Sort Method: external merge Disk: 
> 543456kB
>                                        ->  Seq Scan on harvester_logs 
> hl  (cost=0.00..243781.01 rows=8087121 width=61) (actual 
> time=0.007..3169.984 rows=8084464 loops=1)
>                            ->  Sort  (cost=1.02..1.03 rows=1 width=19)
> (actual time=0.019..0.020 rows=1 loops=1)
>                                  Sort Key: ci.machine_key
>                                  Sort Method: quicksort  Memory: 25kB
>                                  ->  Seq Scan on contractor_access ci 
> (cost=0.00..1.01 rows=1 width=19) (actual time=0.008..0.009 rows=1 loops=1)
>                                        Filter: (t4e_contractor_id = 
> 'nicklas.aven@jordogskog.no'::text)
>                      ->  Sort  (cost=4.47..4.62 rows=62 width=38) 
> (actual time=0.102..0.173 rows=266 loops=1)
>                            Sort Key: mi.machine_key
>                            Sort Method: quicksort  Memory: 30kB
>                            ->  Seq Scan on machine_info mi 
> (cost=0.00..2.62 rows=62 width=38) (actual time=0.008..0.030 rows=62 
> loops=1)
>                ->  Sort  (cost=127.50..131.23 rows=1491 width=35) 
> (actual time=1.205..3.071 rows=7204 loops=1)
>                      Sort Key: p.machine_key
>                      Sort Method: quicksort  Memory: 175kB
>                      ->  Seq Scan on products p  (cost=0.00..48.91 
> rows=1491 width=35) (actual time=0.004..0.497 rows=1491 loops=1)
>          ->  Sort  (cost=30.94..32.09 rows=460 width=37) (actual 
> time=0.385..1.233 rows=3259 loops=1)
>                Sort Key: s.machine_key
>                Sort Method: quicksort  Memory: 65kB
>                ->  Seq Scan on species s  (cost=0.00..10.60 rows=460 
> width=37) (actual time=0.004..0.146 rows=460 loops=1)
>    ->  Hash  (cost=49.25..49.25 rows=1725 width=84) (actual 
> time=1.286..1.287 rows=1690 loops=1)
>          Buckets: 2048  Batches: 1  Memory Usage: 190kB
>          ->  Seq Scan on objects o  (cost=0.00..49.25 rows=1725 
> width=84) (actual time=0.004..0.600 rows=1725 loops=1)
> Planning time: 0.527 ms
> Execution time: 15945.641 ms
> 
> 
> 
> 
> Thanks
> 
> Nicklas
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


В списке pgsql-general по дате отправления:

Предыдущее
От: Andrew Gierth
Дата:
Сообщение: Re: Size estimation of postgres core files
Следующее
От: Jeremy Finzel
Дата:
Сообщение: Re: Size estimation of postgres core files