LEFT JOINs not optimized away when not needed

Поиск
Список
Период
Сортировка
От Moshe Jacobson
Тема LEFT JOINs not optimized away when not needed
Дата
Msg-id CAJ4CxLkrWctbEhoshWvxnEDi9=PvoDbWzdf5OJqvfv840Hv0_Q@mail.gmail.com
обсуждение исходный текст
Ответы Re: LEFT JOINs not optimized away when not needed  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs

Dear PostgreSQL Gurus,

We currently have php code which, for the purposes of speed optimization, dynamically constructs queries based on the output columns and filter conditions requested by the application. This code is very complicated, so we are trying to simplify things by using views instead. We have in mind to define the view entirely in terms of left joins, so that when particular tables in the view are not requested on the output or used in filtering, Postgres will not join the table, and the query will still be fast. This would drastically simplify our middleware code.

However, it turns out that Postgres is not optimizing away the left joins as I would expect. See the following query:

     SELECT r.reset      FROM tb_reset r LEFT JOIN tb_project pj ON pj.project = r.project LEFT JOIN tb_location l ON l.location = r.location LEFT JOIN tb_program pg ON pg.program = r.program LEFT JOIN tb_reset_survey rsv ON rsv.reset = r.reset LEFT JOIN tb_program_location pl ON pl.program = r.program AND pl.location = r.location LEFT JOIN tb_program_type pgt ON pgt.program_type = pg.program_type LEFT JOIN tb_project_department pd ON pd.project = pj.project LEFT JOIN tb_department d ON d.department = pd.department LEFT JOIN tb_fiscal_calendar fc ON fc.day = r.execution_date LEFT JOIN tb_fiscal_month fm ON fm.fiscal_month = fc.month LEFT JOIN tb_rollout_schedule rs ON rs.fiscal_week = fc.week AND rs.fiscal_year = fc.year AND rs.program = r.program LEFT JOIN tb_reset_team rt ON rt.reset_team = r.reset_team LEFT JOIN tb_vendor v ON v.vendor = rt.vendor LEFT JOIN tb_reset_status rst ON rst.reset_status = r.reset_status LEFT JOIN tb_order_location ol ON ol.location = r.location LEFT JOIN tb_entity_reset er ON er.reset = r.reset LEFT JOIN tb_market m ON m.market = l.market LEFT JOIN tb_district dist ON dist.district = l.district LEFT JOIN tb_mregion mr ON mr.mregion = l.mregion LEFT JOIN tb_region rg ON rg.region = l.region LEFT JOIN tb_ogrp ogrp ON ogrp.ogrp = l.ogrp LEFT JOIN tb_buying_office byo ON byo.buying_office = l.buying_office LEFT JOIN tb_project_participant pp ON pp.project = r.project LEFT JOIN tb_project_status ps ON ps.project_status = pj.project_status     WHERE r.in_scope IS TRUE  GROUP BY r.reset  ORDER BY r.reset     LIMIT 100 OFFSET 0;

Seeing that there is only one output column, and that the results are grouped by this output column, it seems to me that the optimizer should not even look at the rest of the tables. However, in reality the following query plan is produced:

                                                                                            QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Limit  (cost=1.43..162.06 rows=100 width=4) (actual time=0.110..3.977 rows=100 loops=1)  ->  Group  (cost=1.43..3250.84 rows=2023 width=4) (actual time=0.109..3.941 rows=100 loops=1)        ->  Nested Loop Left Join  (cost=1.43..3239.44 rows=4560 width=4) (actual time=0.106..3.652 rows=1209 loops=1)              Join Filter: (pp.project = r.project)              Rows Removed by Join Filter: 1626              ->  Nested Loop Left Join  (cost=1.43..2964.79 rows=4560 width=8) (actual time=0.089..2.172 rows=629 loops=1)                    ->  Merge Left Join  (cost=1.15..2232.83 rows=2023 width=12) (actual time=0.075..1.094 rows=100 loops=1)                          Merge Cond: (r.reset = er.reset)                          ->  Nested Loop Left Join  (cost=1.00..2165.74 rows=2023 width=12) (actual time=0.072..1.051 rows=100 loops=1)                                Join Filter: ((rs.fiscal_week = fc.week) AND (rs.fiscal_year = fc.year))                                ->  Nested Loop Left Join  (cost=0.85..1318.06 rows=2023 width=24) (actual time=0.068..0.874 rows=100 loops=1)                                      ->  Nested Loop Left Join  (cost=0.56..523.87 rows=2023 width=20) (actual time=0.065..0.744 rows=100 loops=1)                                            Join Filter: (pd.project = pj.project)                                            Rows Removed by Join Filter: 200                                            ->  Merge Left Join  (cost=0.56..431.80 rows=2023 width=24) (actual time=0.043..0.526 rows=100 loops=1)                                                  Merge Cond: (r.reset = rsv.reset)                                                  ->  Nested Loop Left Join  (cost=0.42..409.82 rows=2023 width=24) (actual time=0.037..0.474 rows=100 loops=1)                                                        ->  Index Scan using tb_reset_pkey on tb_reset r  (cost=0.28..72.42 rows=2023 width=28) (actual time=0.023..0.111 rows=100 loops=1)                                                              Filter: (in_scope IS TRUE)                                                        ->  Index Scan using tb_project_pkey on tb_project pj  (cost=0.14..0.16 rows=1 width=8) (actual time=0.002..0.003 rows=1 loops=100)                                                              Index Cond: (project = r.project)                                                  ->  Index Only Scan using ix_reset_survey on tb_reset_survey rsv  (cost=0.14..15.29 rows=130 width=4) (actual time=0.004..0.004 rows=0 loops=1)                                                        Heap Fetches: 0                                            ->  Materialize  (cost=0.00..1.04 rows=3 width=8) (actual time=0.000..0.001 rows=3 loops=100)                                                  ->  Seq Scan on tb_project_department pd  (cost=0.00..1.03 rows=3 width=8) (actual time=0.006..0.007 rows=3 loops=1)                                      ->  Index Scan using tb_fiscal_calendar_day_key on tb_fiscal_calendar fc  (cost=0.29..0.38 rows=1 width=16) (actual time=0.000..0.000 rows=0 loops=100)                                            Index Cond: (day = r.execution_date)                                ->  Index Scan using tb_rollout_schedule_program_key on tb_rollout_schedule rs  (cost=0.15..0.30 rows=8 width=12) (actual time=0.001..0.001 rows=0 loops=100)                                      Index Cond: (program = r.program)                          ->  Index Only Scan using tb_entity_reset_reset_role_key on tb_entity_reset er  (cost=0.15..39.90 rows=1770 width=4) (actual time=0.001..0.001 rows=0 loops=1)                                Heap Fetches: 0                    ->  Index Only Scan using tb_order_location_location_key on tb_order_location ol  (cost=0.28..0.34 rows=2 width=4) (actual time=0.004..0.008 rows=6 loops=100)                          Index Cond: (location = r.location)                          Heap Fetches: 629              ->  Materialize  (cost=0.00..1.06 rows=4 width=4) (actual time=0.000..0.001 rows=4 loops=629)                    ->  Seq Scan on tb_project_participant pp  (cost=0.00..1.04 rows=4 width=4) (actual time=0.002..0.005 rows=4 loops=1)Total runtime: 4.421 ms
(37 rows)

When I last reported this issue, I was asked to produce a self-contained example that could be used to reproduce this behavior. This was not easy because of the presence of sensitive customer data, a complex schema, and the need to have data in the tables for this problem to manifest. However, we have created the requested example, and the sql dump of the schema is attached.

Please let me know why Postgres is behaving this way. We are very eager to implement this new solution, but we are stalled out right now.

Thank you.

Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc.
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339

“Quality is not an act, it is a habit.” — Aristotle

Вложения

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: BUG #10888: application is getting hanged in the poll() function of libpq.so.
Следующее
От: Tom Lane
Дата:
Сообщение: Re: LEFT JOINs not optimized away when not needed