Parametrization and UNION in view

Поиск
Список
Период
Сортировка
От Ronan Dunklau
Тема Parametrization and UNION in view
Дата
Msg-id CAARsnT0Am0AVQpQNHidXqqEHtJOieG4P4u9YAzWvJJ17i6eNRQ@mail.gmail.com
обсуждение исходный текст
Список pgsql-hackers
Hello,

We've encountered a query which took forever on our database, and after investigating why I managed to reduce the test case to something simple.

The problem is that the optimizer seems to fail to consider pushing a predicate down a "unionized" view:

CREATE TABLE t1 AS SELECT i FROM generate_series(1, 100000) i;
CREATE INDEX ON t1 (i);
CREATE TABLE t2 AS SELECT i FROM generate_series(1, 100000) i;
CREATE INDEX ON t2 (i);
CREATE TABLE t3 AS SELECT i FROM generate_series(1, 2) i;
CREATE INDEX ON t3 (i);

CREATE VIEW v1 AS
  SELECT * FROM t1
  UNION
  SELECT * FROM t2;

explain SELECT * FROM t3 WHERE EXISTS (SELECT i FROM v1 WHERE v1.i = t3.i);
                                 QUERY PLAN                                
----------------------------------------------------------------------------
 Hash Join  (cost=6387.05..11157.05 rows=2000 width=4)
   Hash Cond: (t1.i = t3.i)
   ->  HashAggregate  (cost=6386.00..8386.00 rows=200000 width=4)
         Group Key: t1.i
         ->  Append  (cost=0.00..5886.00 rows=200000 width=4)
               ->  Seq Scan on t1  (cost=0.00..1443.00 rows=100000 width=4)
               ->  Seq Scan on t2  (cost=0.00..1443.00 rows=100000 width=4)
   ->  Hash  (cost=1.02..1.02 rows=2 width=4)
         ->  Seq Scan on t3  (cost=0.00..1.02 rows=2 width=4)
(9 rows)

We can see that it choses to perform a join between the full output of the view and the really small subset.

The optimizer can be forced to select a proper plan by introducing an offset clause:

explain SELECT * FROM t3 WHERE EXISTS (SELECT i FROM v1 WHERE v1.i = t3.i OFFSET 0);
                                          QUERY PLAN                                         
----------------------------------------------------------------------------------------------
 Seq Scan on t3  (cost=0.00..1966.07 rows=1 width=4)
   Filter: (SubPlan 1)
   SubPlan 1
     ->  HashAggregate  (cost=982.51..992.51 rows=1000 width=4)
           Group Key: t1.i
           ->  Append  (cost=12.17..980.01 rows=1000 width=4)
                 ->  Bitmap Heap Scan on t1  (cost=12.17..482.50 rows=500 width=4)
                       Recheck Cond: (i = t3.i)
                       ->  Bitmap Index Scan on t1_i_idx  (cost=0.00..12.04 rows=500 width=0)
                             Index Cond: (i = t3.i)
                 ->  Bitmap Heap Scan on t2  (cost=12.17..482.50 rows=500 width=4)
                       Recheck Cond: (i = t3.i)
                       ->  Bitmap Index Scan on t2_i_idx  (cost=0.00..12.04 rows=500 width=0)
                             Index Cond: (i = t3.i)

Notice how the cost is much lower than the one of the previous plan, which indicates to me that this plan is not even considered. If we raise the cost of various operations (set enable_hashjoin = off, set enable_material = off), we end up with a nonsensical nested loop:


explain SELECT * FROM t3 WHERE EXISTS (SELECT i FROM v1 WHERE v1.i = t3.i);
                                 QUERY PLAN                                
----------------------------------------------------------------------------
 Nested Loop  (cost=6386.00..25773.02 rows=2000 width=4)
   Join Filter: (t3.i = t1.i)
   ->  Seq Scan on t3  (cost=0.00..1.02 rows=2 width=4)
   ->  HashAggregate  (cost=6386.00..8386.00 rows=200000 width=4)
         Group Key: t1.i
         ->  Append  (cost=0.00..5886.00 rows=200000 width=4)
               ->  Seq Scan on t1  (cost=0.00..1443.00 rows=100000 width=4)
               ->  Seq Scan on t2  (cost=0.00..1443.00 rows=100000 width=4)

Should that be considered a bug ?

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Darafei "Komяpa" Praliaskouski
Дата:
Сообщение: Re: Changing SQL Inlining Behaviour (or...?)
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [PATCH] Memory leak in pg_config