[HACKERS] Secondary index access optimizations

Поиск
Список
Период
Сортировка
От Konstantin Knizhnik
Тема [HACKERS] Secondary index access optimizations
Дата
Msg-id 27516421-5afa-203c-e22a-8407e9187327@postgrespro.ru
обсуждение исходный текст
Ответы Re: [HACKERS] Secondary index access optimizations  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Список pgsql-hackers
Hi hackers,

I am trying to compare different ways of optimizing work with huge 
append-only tables in PostgreSQL where primary key is something like 
timestamp and queries are usually accessing most recent data using some 
secondary keys. Size of secondary index is one of the most critical 
factors limiting  insert/search performance. As far as data is inserted 
in timestamp ascending order, access to primary key is well localized 
and accessed tables are present in memory. But if we create secondary 
key for the whole table, then access to it will require random reads 
from the disk and significantly decrease performance.

There are two well known solutions of the problem:
1. Table partitioning
2. Partial indexes

This approaches I want to compare. First of all I want to check if 
optimizer is able to generate efficient query execution plan covering 
different time intervals.
Unfortunately in both cases generated plan is not optimal.

1. Table partitioning:

create table base (k integer primary key, v integer);
create table part1 (check (k between 1 and 10000)) inherits (base);
create table part2 (check (k between 10001 and 20000)) inherits (base);
create index pi1 on part1(v);
create index pi2 on part2(v);
insert int part1 values (generate series(1,10000), random());
insert into part2 values (generate_series(10001,20000), random());
explain select * from base where k between 1 and 20000 and v = 100;                              QUERY PLAN
----------------------------------------------------------------------- Append  (cost=0.00..15.65 rows=3 width=8)   ->
SeqScan on base  (cost=0.00..0.00 rows=1 width=8)         Filter: ((k >= 1) AND (k <= 20000) AND (v = 100))   ->  Index
Scanusing pi1 on part1  (cost=0.29..8.31 rows=1 width=8)         Index Cond: (v = 100)         Filter: ((k >= 1) AND (k
<=20000))   ->  Index Scan using pi2 on part2  (cost=0.29..7.34 rows=1 width=8)         Index Cond: (v = 100)
Filter:((k >= 1) AND (k <= 20000))
 

Questions:
- Is there some way to avoid sequential scan of parent table? Yes, it is 
empty and so sequential scan will not take much time, but ... it still 
requires some additional actions and so increasing query execution time.
- Why index scan of partition indexes includes filter condition if it is 
guaranteed by check constraint that all records of this partition match 
search predicate?


2. Partial indexes:

create table t (k integer primary key, v integer);
insert into t values (generate_series(1,20000),random());
create index i1 on t(v) where k between 1 and 10000;
create index i2 on t(v) where k between 10001 and 20000;
postgres=# explain select * from t where k between 1 and 10000 and v = 100;                         QUERY PLAN
------------------------------------------------------------ Index Scan using i1 on t  (cost=0.29..7.28 rows=1 width=8)
 Index Cond: (v = 100)
 
(2 rows)


Here we get perfect plan. Let's try to extend search interval:


postgres=# explain select * from t where k between 1 and 20000 and v = 100;                            QUERY PLAN
------------------------------------------------------------------ Index Scan using t_pkey on t  (cost=0.29..760.43
rows=1width=8)   Index Cond: ((k >= 1) AND (k <= 20000))   Filter: (v = 100)
 
(3 rows)

Unfortunately in this case Postgres is not able to apply partial indexes.
And this is what I expected to get:

postgres=# explain select * from t where k between 1 and 10000 and v = 
100 union all select * from t where k between 10001 and 20000 and v = 100;                              QUERY PLAN
---------------------------------------------------------------------- Append  (cost=0.29..14.58 rows=2 width=8)   ->
IndexScan using i1 on t  (cost=0.29..7.28 rows=1 width=8)         Index Cond: (v = 100)   ->  Index Scan using i2 on t
t_1 (cost=0.29..7.28 rows=1 width=8)         Index Cond: (v = 100)
 


I wonder if there are some principle problems in supporting this two 
things in optimizer:
1. Remove search condition for primary key if it is fully satisfied by 
derived table check constraint.
2. Append index scans of several partial indexes if specified interval 
is covered by their conditions.

I wonder if someone is familiar with this part of optimizer ad can 
easily fix it.
Otherwise I am going to spend some time on solving this problems (if 
community think that such optimizations will be useful).

-- 

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization
Следующее
От: Sandeep Thakkar
Дата:
Сообщение: Re: [HACKERS] pl/perl extension fails on Windows