Re: [HACKERS] Secondary index access optimizations

From: Konstantin Knizhnik
Subject: Re: [HACKERS] Secondary index access optimizations
Date:
Msg-id: 73cfa456-c313-6dc0-825f-093b52b85d29@postgrespro.ru
In response to: [HACKERS] Secondary index access optimizations  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses: Re: [HACKERS] Secondary index access optimizations  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List: pgsql-hackers

On 14.08.2017 12:37, Konstantin Knizhnik wrote:
> Hi hackers,
>
> I am trying to compare different ways of optimizing work with huge
> append-only tables in PostgreSQL, where the primary key is something
> like a timestamp and queries usually access the most recent data
> through some secondary keys. The size of the secondary index is one of
> the most critical factors limiting insert/search performance. Since
> data is inserted in ascending timestamp order, access to the primary
> key is well localized and the accessed pages stay in memory. But if we
> create a secondary index over the whole table, then access to it
> requires random reads from disk and significantly decreases
> performance.
>
> There are two well-known solutions to this problem:
> 1. Table partitioning
> 2. Partial indexes
>
> These are the approaches I want to compare. First of all I want to
> check whether the optimizer is able to generate an efficient query
> execution plan covering different time intervals.
> Unfortunately, in both cases the generated plan is not optimal.
>
> 1. Table partitioning:
>
> create table base (k integer primary key, v integer);
> create table part1 (check (k between 1 and 10000)) inherits (base);
> create table part2 (check (k between 10001 and 20000)) inherits (base);
> create index pi1 on part1(v);
> create index pi2 on part2(v);
> insert into part1 values (generate_series(1,10000), random());
> insert into part2 values (generate_series(10001,20000), random());
> explain select * from base where k between 1 and 20000 and v = 100;
>                               QUERY PLAN
> -----------------------------------------------------------------------
>  Append  (cost=0.00..15.65 rows=3 width=8)
>    ->  Seq Scan on base  (cost=0.00..0.00 rows=1 width=8)
>          Filter: ((k >= 1) AND (k <= 20000) AND (v = 100))
>    ->  Index Scan using pi1 on part1  (cost=0.29..8.31 rows=1 width=8)
>          Index Cond: (v = 100)
>          Filter: ((k >= 1) AND (k <= 20000))
>    ->  Index Scan using pi2 on part2  (cost=0.29..7.34 rows=1 width=8)
>          Index Cond: (v = 100)
>          Filter: ((k >= 1) AND (k <= 20000))
>
> Questions:
> - Is there some way to avoid the sequential scan of the parent table?
> Yes, it is empty, so the sequential scan will not take much time, but
> ... it still requires some additional work and so increases query
> execution time.
> - Why do the index scans of the partition indexes include the filter
> condition when the check constraint already guarantees that all
> records of the partition match the search predicate? (A way to
> reproduce this is sketched just below.)
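
A minimal reproduction of the redundancy in the second question (not from
the original mail; it assumes the tables created above and the default
setting constraint_exclusion = partition) is a query whose range coincides
exactly with one partition's check constraint:

set constraint_exclusion = partition;  -- the default; shown only to keep the sketch self-contained
explain select * from base where k between 1 and 10000 and v = 100;
-- part2 should be excluded outright by its check constraint, yet the
-- surviving index scan on part1 is still expected to carry
--   Filter: ((k >= 1) AND (k <= 10000))
-- even though part1's CHECK (k between 1 and 10000) already guarantees it.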
>
>
> 2. Partial indexes:
>
> create table t (k integer primary key, v integer);
> insert into t values (generate_series(1,20000),random());
> create index i1 on t(v) where k between 1 and 10000;
> create index i2 on t(v) where k between 10001 and 20000;
> postgres=# explain select * from t where k between 1 and 10000 and v = 
> 100;
>                          QUERY PLAN
> ------------------------------------------------------------
>  Index Scan using i1 on t  (cost=0.29..7.28 rows=1 width=8)
>    Index Cond: (v = 100)
> (2 rows)
>
>
> Here we get the perfect plan. Let's try to extend the search interval:
>
>
> postgres=# explain select * from t where k between 1 and 20000 and v = 
> 100;
>                             QUERY PLAN
> ------------------------------------------------------------------
>  Index Scan using t_pkey on t  (cost=0.29..760.43 rows=1 width=8)
>    Index Cond: ((k >= 1) AND (k <= 20000))
>    Filter: (v = 100)
> (3 rows)
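
A side note (not from the original mail): a partial index is only
considered when the query's WHERE clause implies that index's own
predicate, and each index is checked in isolation, so neither i1 nor i2
qualifies for the whole 1..20000 range. A query confined to one half is
expected to pick its partial index up again:

explain select * from t where k between 10001 and 20000 and v = 100;
-- expected, mirroring the i1 plan above:
--   Index Scan using i2 on t  (cost=... rows=1 width=8)
--     Index Cond: (v = 100)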
>
> Unfortunately, in this case Postgres is not able to apply the partial indexes.
> And this is what I expected to get:
>
> postgres=# explain select * from t where k between 1 and 10000 and v = 
> 100 union all select * from t where k between 10001 and 20000 and v = 
> 100;
>                               QUERY PLAN
> ----------------------------------------------------------------------
>  Append  (cost=0.29..14.58 rows=2 width=8)
>    ->  Index Scan using i1 on t  (cost=0.29..7.28 rows=1 width=8)
>          Index Cond: (v = 100)
>    ->  Index Scan using i2 on t t_1  (cost=0.29..7.28 rows=1 width=8)
>          Index Cond: (v = 100)
>
>
> I wonder if there are any fundamental problems with supporting these
> two things in the optimizer:
> 1. Remove the search condition on the primary key if it is fully
> satisfied by the derived table's check constraint.
> 2. Append index scans of several partial indexes if the specified
> interval is covered by their conditions.
>
> I wonder if someone is familiar with this part of the optimizer and
> can easily fix it.
> Otherwise I am going to spend some time on solving these problems (if
> the community thinks that such optimizations would be useful).
>

Replying to myself: the following small patch removes redundant checks 
from index scans for derived tables:


diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 939045d..1f7c9cf 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1441,6 +1441,20 @@ relation_excluded_by_constraints(PlannerInfo *root,
     if (predicate_refuted_by(safe_constraints,
                              rel->baserestrictinfo, false))
         return true;

+    /*
+     * Remove from restriction list items implied by table constraints
+     */
+    safe_restrictions = NULL;
+    foreach(lc, rel->baserestrictinfo)
+    {
+        RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+        if (!predicate_implied_by(list_make1(rinfo->clause), safe_constraints, false)) {
+            safe_restrictions = lappend(safe_restrictions, rinfo);
+        }
+    }
+    rel->baserestrictinfo = safe_restrictions;
+
     return false;
 }

=================================================

I am not sure whether this is the right place to adjust the restriction
list, or whether such a transformation is always correct (a small sanity
check is sketched after the plans below).
But for the queries I have tested, the produced plans are correct:


postgres=# explain select * from base where k between 1 and 20000 and v = 100;
                              QUERY PLAN
-----------------------------------------------------------------------
 Append  (cost=0.00..15.64 rows=3 width=8)
   ->  Seq Scan on base  (cost=0.00..0.00 rows=1 width=8)
         Filter: ((k >= 1) AND (k <= 20000) AND (v = 100))
   ->  Index Scan using pi1 on part1  (cost=0.29..8.30 rows=1 width=8)
         Index Cond: (v = 100)
   ->  Index Scan using pi2 on part2  (cost=0.29..7.34 rows=1 width=8)
         Index Cond: (v = 100)
(7 rows)

postgres=# explain select * from base where k between 1 and 15000 and v = 100;
                              QUERY PLAN
-----------------------------------------------------------------------
 Append  (cost=0.00..15.64 rows=3 width=8)
   ->  Seq Scan on base  (cost=0.00..0.00 rows=1 width=8)
         Filter: ((k >= 1) AND (k <= 15000) AND (v = 100))
   ->  Index Scan using pi1 on part1  (cost=0.29..8.30 rows=1 width=8)
         Index Cond: (v = 100)
   ->  Index Scan using pi2 on part2  (cost=0.29..7.34 rows=1 width=8)
         Index Cond: (v = 100)
         Filter: (k <= 15000)
(8 rows)
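
One cheap way to convince oneself that the transformation is
result-preserving for these plans (a sketch, not from the original mail):
the patch only drops quals that a partition's CHECK constraint already
implies, so no row reachable through that scan can violate them. For the
second plan, part1's dropped Filter can be verified directly:

select count(*) from part1 where k < 1 or k > 15000;
-- expected: 0, because CHECK (k between 1 and 10000) implies 1 <= k <= 15000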

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



