Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause

Поиск
Список
Период
Сортировка
От Sven R. Kunze
Тема Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause
Дата
Msg-id 3f4693c1-f13c-ed0d-54a2-6f80f9ce49ee@mail.de
обсуждение исходный текст
Ответ на Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause  (Jeff Janes <jeff.janes@gmail.com>)
Список pgsql-performance
Now I found time to investigate all proposed queries side by side. Here
are the results (warmup + multiple executions). TL;DR - Jeff's proposed
answer performs significantly faster with our data than any other
solution (both planning and execution time).


I have no real idea how PostgreSQL does query rewriting but I guess the
following steps (and reverse ones) are necessary:

1) detect "DISTINCT+LEFT OUTER JOIN" and rewrite to "SUBQUERY"

2) detect "MUTUAL JOIN ON KEY + OR" and rewrite to "UNION"

3) detect "MUTUAL IN KEY+ OR" and rewrite to "UNION"

4) detect "UNION + MUTUAL JOIN ON KEY" and rewrite to "SUBQUERY + UNION"


Doing (1+2) or (3+4) would result in the optimal query. To (1+2) seems
easier to do, although a "common SELECT lift up"/"UNION push down" (if
that's even the correct name) would also be great to have (that's 4)).
Is this somehow correct?


Regarding cost estimation: it seems like PostgreSQL is clever enough
here. So, I tend to agree with Jeff that this is not an issue with cost
estimation.


---- DISTINCT + LEFT OUTER JOIN


explain analyze
SELECT distinct <columns of big_table>
FROM "big_table"
     LEFT OUTER JOIN "table_a" ON ("big_table"."id" =
"table_a"."big_table_id")
     LEFT OUTER JOIN "table_b" ON ("big_table"."id" =
"table_b"."big_table_id")
WHERE
     ("table_a"."item_id" IN (<handful of items>)
     OR
     "table_b"."item_id" IN (<handful of items>));



  HashAggregate  (cost=206268.67..206269.46 rows=79 width=185) (actual
time=904.859..904.860 rows=5 loops=1)
    Group Key: <columns of big_table>
    ->  Merge Left Join  (cost=1.26..206265.11 rows=79 width=185)
(actual time=904.835..904.846 rows=6 loops=1)
          Merge Cond: (big_table.id = table_a.big_table_id)
          Filter: (((table_a.item_id)::text = ANY ('<handful of
items>'::text[])) OR ((table_b.item_id)::text = ANY ('<handful of
items>'::text[])))
          Rows Removed by Filter: 901355
          ->  Merge Left Join  (cost=0.85..196703.22 rows=858293
width=243) (actual time=0.009..745.736 rows=858690 loops=1)
                Merge Cond: (big_table.id = table_b.big_table_id)
                ->  Index Scan using big_table_pkey on big_table
(cost=0.42..180776.64 rows=858293 width=185) (actual time=0.005..399.102
rows=858690 loops=1)
                ->  Index Scan using table_b_pkey on table_b
(cost=0.42..10343.86 rows=274959 width=62) (actual time=0.003..60.961
rows=274958 loops=1)
          ->  Index Scan using table_a_big_table_id on table_a
(cost=0.42..4445.35 rows=118836 width=57) (actual time=0.003..25.456
rows=118833 loops=1)
  Planning time: 0.934 ms
  Execution time: 904.936 ms




---- SUBQUERY

explain analyze
SELECT <columns of big_table>
FROM "big_table"
WHERE
     "big_table"."id" in (SELECT "table_a"."big_table_id" FROM "table_a"
WHERE "table_a"."item_id" in (<handful of items>))
     OR
     "big_table"."id" in (SELECT "table_b"."big_table_id" FROM "table_b"
WHERE "table_b"."item_id" IN (<handful of items>));



  Seq Scan on big_table  (cost=100.41..115110.80 rows=643720 width=185)
(actual time=229.819..229.825 rows=5 loops=1)
    Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
    Rows Removed by Filter: 858685
    SubPlan 1
      ->  Index Scan using table_a_item_id_211f18d89c25bc21_uniq on
table_a (cost=0.42..58.22 rows=9 width=4) (actual time=0.026..0.043
rows=5 loops=1)
            Index Cond: ((item_id)::text = ANY ('<handful of
items>'::text[]))
    SubPlan 2
      ->  Index Scan using table_b_item_id_611f9f519d835e89_uniq on
table_b (cost=0.42..42.15 rows=5 width=4) (actual time=0.007..0.040
rows=5 loops=1)
            Index Cond: ((item_id)::text = ANY ('<handful of
items>'::text[]))
  Planning time: 0.261 ms
  Execution time: 229.901 ms



---- UNION

explain analyze
SELECT <columns of big_table>
FROM "big_table"
     INNER JOIN "table_a" ON ("big_table"."id" = "table_a"."big_table_id")
WHERE
     "table_a"."item_id" IN (<handful of items>)
UNION
SELECT <columns of big_table>
FROM "big_table"
     INNER JOIN "table_b" ON ("big_table"."id" = "table_b"."big_table_id")
WHERE
     "table_b"."item_id" IN (<handful of items>);

  HashAggregate  (cost=216.84..216.98 rows=14 width=185) (actual
time=0.092..0.093 rows=5 loops=1)
    Group Key: <columns of big_table>
    ->  Append  (cost=22.59..216.21 rows=14 width=185) (actual
time=0.035..0.080 rows=10 loops=1)
          ->  Nested Loop  (cost=22.59..132.17 rows=9 width=185) (actual
time=0.034..0.044 rows=5 loops=1)
                ->  Bitmap Heap Scan on table_a (cost=22.16..56.10
rows=9 width=4) (actual time=0.029..0.029 rows=5 loops=1)
                      Recheck Cond: ((item_id)::text = ANY ('<handful of
items>'::text[]))
                      Heap Blocks: exact=1
                      ->  Bitmap Index Scan on
table_a_item_id_211f18d89c25bc21_uniq  (cost=0.00..22.16 rows=9 width=0)
(actual time=0.027..0.027 rows=5 loops=1)
                            Index Cond: ((item_id)::text = ANY
('<handful of items>'::text[]))
                ->  Index Scan using big_table_pkey on big_table
(cost=0.42..8.44 rows=1 width=185) (actual time=0.002..0.002 rows=1 loops=5)
                      Index Cond: (id = table_a.big_table_id)
          ->  Nested Loop  (cost=22.58..83.90 rows=5 width=185) (actual
time=0.029..0.035 rows=5 loops=1)
                ->  Bitmap Heap Scan on table_b (cost=22.15..41.64
rows=5 width=4) (actual time=0.026..0.026 rows=5 loops=1)
                      Recheck Cond: ((item_id)::text = ANY ('<handful of
items>'::text[]))
                      Heap Blocks: exact=1
                      ->  Bitmap Index Scan on
table_b_item_id_611f9f519d835e89_uniq  (cost=0.00..22.15 rows=5 width=0)
(actual time=0.025..0.025 rows=5 loops=1)
                            Index Cond: ((item_id)::text = ANY
('<handful of items>'::text[]))
                ->  Index Scan using big_table_pkey on big_table
big_table_1  (cost=0.42..8.44 rows=1 width=185) (actual
time=0.001..0.001 rows=1 loops=5)
                      Index Cond: (id = table_b.big_table_id)
  Planning time: 0.594 ms
  Execution time: 0.177 ms




---- SUBQUERY + UNION


On 29.09.2016 20:03, Jeff Janes wrote:
> SELECT * FROM big_table
> WHERE
>    id in (SELECT big_table_id FROM table_a WHERE "table_a"."item_id"
> IN (<handful of items>) union
>          SELECT big_table_id FROM table_a WHERE "table_b"."item_id" IN
> (<handful of items>)
>   );


  Nested Loop  (cost=98.34..216.53 rows=14 width=185) (actual
time=0.061..0.069 rows=5 loops=1)
    ->  HashAggregate  (cost=97.91..98.05 rows=14 width=4) (actual
time=0.057..0.058 rows=5 loops=1)
          Group Key: table_a.big_table_id
          ->  Append  (cost=22.16..97.88 rows=14 width=4) (actual
time=0.028..0.054 rows=10 loops=1)
                ->  Bitmap Heap Scan on table_a (cost=22.16..56.10
rows=9 width=4) (actual time=0.028..0.029 rows=5 loops=1)
                      Recheck Cond: ((item_id)::text = ANY ('<handful of
items>'::text[]))
                      Heap Blocks: exact=1
                      ->  Bitmap Index Scan on
table_a_item_id_211f18d89c25bc21_uniq  (cost=0.00..22.16 rows=9 width=0)
(actual time=0.026..0.026 rows=5 loops=1)
                            Index Cond: ((item_id)::text = ANY
('<handful of items>'::text[]))
                ->  Bitmap Heap Scan on table_b (cost=22.15..41.64
rows=5 width=4) (actual time=0.024..0.024 rows=5 loops=1)
                      Recheck Cond: ((item_id)::text = ANY ('<handful of
items>'::text[]))
                      Heap Blocks: exact=1
                      ->  Bitmap Index Scan on
table_b_item_id_611f9f519d835e89_uniq  (cost=0.00..22.15 rows=5 width=0)
(actual time=0.023..0.023 rows=5 loops=1)
                            Index Cond: ((item_id)::text = ANY
('<handful of items>'::text[]))
    ->  Index Scan using big_table_pkey on big_table (cost=0.42..8.44
rows=1 width=185) (actual time=0.001..0.001 rows=1 loops=5)
          Index Cond: (id = table_a.big_table_id)
  Planning time: 0.250 ms
  Execution time: 0.104 ms






Cheers,
Sven


NOTE: I added a file with the results for better readability.

Вложения

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Sven R. Kunze"
Дата:
Сообщение: Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause
Следующее
От: Joe Proietti
Дата:
Сообщение: MYSQL Stats