Re: Get more from indices.

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: Get more from indices.
Дата
Msg-id 20140108.132257.201466486.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: Get more from indices.  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Get more from indices.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Список pgsql-hackers
Hello,

tgl> I'm pretty sure that IsA test prevents the optimization from ever firing.

Thank you.

tgl> But aside from hasty typos,

Oops! I've picked up wrong node. It always denies pathkeys extension.

| !IsA(member, Var))

is a mistake of the following.

| !IsA(member->em_expr, Var))

tgl> is there enough left of this optimization to
tgl> be worth the complication?

Certainly.

This patch is intended to be with my another UNION-ALL
optimization pathce at first. It does very much with
UNION-ORDERBY-LIMIT and also with UNION_ALL(Partitioned
tables)-DISTINCT-ORDERBY-LIMIT.


===== test tables
create table pu1 (a int not null, b int not null, c int, d text);
create unique index i_pu1_ab on pu1 (a, b);
create table cu11 (like pu1 including all) inherits (pu1);
create table cu12 (like pu1 including all) inherits (pu1);
create table pu2 (like pu1 including all);
create table cu21 (like pu2 including all) inherits (pu2);
create table cu22 (like pu2 including all) inherits (pu2);
insert into cu11 (select a / 5, 4 - (a % 5), a, 'cu11' from generate_series(000000, 099999) a);
insert into cu12 (select a / 5, 4 - (a % 5), a, 'cu12' from generate_series(100000, 199999) a);
insert into cu21 (select a / 5, 4 - (a % 5), a, 'cu21' from generate_series(200000, 299999) a);
insert into cu22 (select a / 5, 4 - (a % 5), a, 'cu22' from generate_series(300000, 399999) a);
====

Following example is an implicit union derived from partitioned
tables created as above. Limit is added to highlight the effect.

9.4dev with no patch, 

| =# explain (analyze on, costs off) select distinct * from pu1 order by a, b limit 100;
|                                     QUERY PLAN  
| ----------------------------------------------------------------------------
|  Limit (actual time=246.653..246.730 rows=100 loops=1)
|  ->  Unique (actual time=246.646..246.713 rows=100 loops=1)
|   ->  Sort (actual time=246.645..246.666 rows=100 loops=1)
|        Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
|        Sort Method: external sort  Disk: 5280kB
|        ->  Append (actual time=0.017..52.608 rows=200000 loops=1)
|         ->  Seq Scan on pu1 (actual time=0.001..0.001 rows=0 loops=1)
|         ->  Seq Scan on cu11 (actual time=0.015..17.047 rows=100000 loops=1)
|         ->  Seq Scan on cu12 (actual time=0.007..15.474 rows=100000 loops=1)
|  Total runtime: 247.854 ms

Fairly common query. It will get following plan with this patch.


| =# explain (analyze on, costs off) select distinct * from pu1 order by a, b limit 100;
|                                       QUERY PLAN
| -----------------------------------------------------------------------------
|  Limit (actual time=0.108..0.350 rows=100 loops=1)
|  ->  Unique (actual time=0.104..0.329 rows=100 loops=1)
|   ->  Merge Append (actual time=0.103..0.214 rows=100 loops=1)
|         Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
|         ->  Index Scan .. on pu1 (actual time=0.003..0.003 rows=0 loops=1)
|         ->  Index Scan .. on cu11 (actual time=0.049..0.110 rows=100 loops=1)
|         ->  Index Scan .. on cu12 (actual time=0.044..0.044 rows=1 loops=1)
|  Total runtime: 0.666 ms


The same could be said on union with my another union-pullup
patch.  We will get the following result with only the
union-pullup patch.

|=# explain (analyze on, costs off) select * from cu11 union select * from cu12 union select * from cu21 union select *
fromcu22 order by a limit 10000;
 
|                                   QUERY PLAN
|---------------------------------------------------------------------------
| Limit (actual time=474.825..482.326 rows=10000 loops=1)
| ->  Unique (actual time=474.824..481.357 rows=10000 loops=1)
|  ->  Sort (actual time=474.823..477.101 rows=10000 loops=1)
|       Sort Key: cu11.a, cu11.b, cu11.c, cu11.d
|       Sort Method: external sort  Disk: 10568kB
|       ->  Append (actual time=0.018..99.310 rows=400000 loops=1)
|        ->  Seq Scan on cu11 (actual time=0.017..16.263 rows=100000 loops=1)
|        ->  Seq Scan on cu12 (actual time=0.006..14.831 rows=100000 loops=1)
|        ->  Seq Scan on cu21 (actual time=0.006..14.766 rows=100000 loops=1)
|        ->  Seq Scan on cu22 (actual time=0.006..14.721 rows=100000 loops=1)
| Total runtime: 484.555 ms

This is also a quite common operation but implicit DISTINCT
prevents planner from selecting index scans. (ORDER BY is not
essential but needed because planner does not consult
distinct_pathkeys on planning scans and I've left it as it is.)

The planner gives the following plan with this patch.

| =# explain (analyze on, costs off) select * from cu11 union select * from cu12 union select * from cu21 union select
*from cu22 order by a limit 10000;
 
|                                    QUERY PLAN               
| -------------------------------------------------------------------------
|  Limit (actual time=0.197..14.739 rows=10000 loops=1)
|  ->  Unique (actual time=0.191..13.527 rows=10000 loops=1)
|   ->  Merge Append (actual time=0.191..7.894 rows=10000 loops=1)
|         Sort Key: cu11.a, cu11.b, cu11.c, cu11.d
|         ->  Index Scan .. on cu11 (actual time=0.054..4.676 rows=10000 loops=1)
|         ->  Index Scan .. on cu12 (actual time=0.045..0.045 rows=1 loops=1)
|         ->  Index Scan .. on cu21 (actual time=0.044..0.044 rows=1 loops=1)
|         ->  Index Scan .. on cu22 (actual time=0.043..0.043 rows=1 loops=1)
|  Total runtime: 15.872 ms

This seems for me worth enough to do.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dilip kumar
Дата:
Сообщение: TODO: machine-readable pg_controldata
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE