Parametrization and UNION in view

Поиск

Список

Период

Сортировка

От	Ronan Dunklau
Тема	Parametrization and UNION in view
Дата	14 ноября 2018 г. 14:43:33
Msg-id	CAARsnT0Am0AVQpQNHidXqqEHtJOieG4P4u9YAzWvJJ17i6eNRQ@mail.gmail.com обсуждение исходный текст
Список	pgsql-hackers

Дерево обсуждения

Hello,

We've encountered a query which took forever on our database, and after investigating why I managed to reduce the test case to something simple.

The problem is that the optimizer seems to fail to consider pushing a predicate down a "unionized" view:

CREATE TABLE t1 AS SELECT i FROM generate_series(1, 100000) i;
CREATE INDEX ON t1 (i);
CREATE TABLE t2 AS SELECT i FROM generate_series(1, 100000) i;
CREATE INDEX ON t2 (i);
CREATE TABLE t3 AS SELECT i FROM generate_series(1, 2) i;
CREATE INDEX ON t3 (i);

CREATE VIEW v1 AS
SELECT * FROM t1
UNION
SELECT * FROM t2;

explain SELECT * FROM t3 WHERE EXISTS (SELECT i FROM v1 WHERE v1.i = t3.i);
                                 QUERY PLAN
----------------------------------------------------------------------------
Hash Join (cost=6387.05..11157.05 rows=2000 width=4)
   Hash Cond: (t1.i = t3.i)
   -> HashAggregate (cost=6386.00..8386.00 rows=200000 width=4)
         Group Key: t1.i
         -> Append (cost=0.00..5886.00 rows=200000 width=4)
               -> Seq Scan on t1 (cost=0.00..1443.00 rows=100000 width=4)
               -> Seq Scan on t2 (cost=0.00..1443.00 rows=100000 width=4)
   -> Hash (cost=1.02..1.02 rows=2 width=4)
         -> Seq Scan on t3 (cost=0.00..1.02 rows=2 width=4)
(9 rows)

We can see that it choses to perform a join between the full output of the view and the really small subset.

The optimizer can be forced to select a proper plan by introducing an offset clause:

explain SELECT * FROM t3 WHERE EXISTS (SELECT i FROM v1 WHERE v1.i = t3.i OFFSET 0);
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
Seq Scan on t3 (cost=0.00..1966.07 rows=1 width=4)
   Filter: (SubPlan 1)
   SubPlan 1
     -> HashAggregate (cost=982.51..992.51 rows=1000 width=4)
           Group Key: t1.i
           -> Append (cost=12.17..980.01 rows=1000 width=4)
                 -> Bitmap Heap Scan on t1 (cost=12.17..482.50 rows=500 width=4)
                       Recheck Cond: (i = t3.i)
                       -> Bitmap Index Scan on t1_i_idx (cost=0.00..12.04 rows=500 width=0)
                             Index Cond: (i = t3.i)
                 -> Bitmap Heap Scan on t2 (cost=12.17..482.50 rows=500 width=4)
                       Recheck Cond: (i = t3.i)
                       -> Bitmap Index Scan on t2_i_idx (cost=0.00..12.04 rows=500 width=0)
                             Index Cond: (i = t3.i)

Notice how the cost is much lower than the one of the previous plan, which indicates to me that this plan is not even considered. If we raise the cost of various operations (set enable_hashjoin = off, set enable_material = off), we end up with a nonsensical nested loop:

explain SELECT * FROM t3 WHERE EXISTS (SELECT i FROM v1 WHERE v1.i = t3.i);
                                 QUERY PLAN
----------------------------------------------------------------------------
Nested Loop (cost=6386.00..25773.02 rows=2000 width=4)
   Join Filter: (t3.i = t1.i)
   -> Seq Scan on t3 (cost=0.00..1.02 rows=2 width=4)
   -> HashAggregate (cost=6386.00..8386.00 rows=200000 width=4)
         Group Key: t1.i
         -> Append (cost=0.00..5886.00 rows=200000 width=4)
               -> Seq Scan on t1 (cost=0.00..1443.00 rows=100000 width=4)
               -> Seq Scan on t2 (cost=0.00..1443.00 rows=100000 width=4)

Should that be considered a bug ?

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Parametrization and UNION in view