[HACKERS] Parallel Index-only scan

Поиск
Список
Период
Сортировка
От Rafia Sabih
Тема [HACKERS] Parallel Index-only scan
Дата
Msg-id CAOGQiiPEAs4C=TBp0XShxBvnWXuzGL2u++Hm1=qnCpd6_Mf8Fw@mail.gmail.com
обсуждение исходный текст
Ответы Re: [HACKERS] Parallel Index-only scan  ("rafia.sabih" <rafia.sabih@enterprisedb.com>)
Список pgsql-hackers
Hello all,
This is to propose a patch for enabling parallel index-only scans. With the patch of parallel index scan around [1], adding the mechanism for parallelising index-only scans makes sense. Without this mechanism for the queries preferring index-only scans, it is likely that at higher scale parallel index or parallel seq scan serve as cheaper alternative to index-only scans and then query performance might suffer because of the added processing of index or seq scans. 

Performance
-----------------
Consider the performance of a simple query on TPC-H schema, 

explain analyse select count(*) from lineitem where l_shipdate < date '1995-01-03';

Without parallel index-only scan, parallel seq scan got picked and it took around 23 seconds for the query to execute,

Finalize Aggregate  (cost=651586.63..651586.64 rows=1 width=8) (actual time=22853.872..22853.873 rows=1 loops=1)
   ->  Gather  (cost=651586.21..651586.62 rows=4 width=8) (actual time=22853.684..22853.864 rows=5 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Partial Aggregate  (cost=650586.21..650586.22 rows=1 width=8) (actual time=22850.489..22850.489 rows=1 loops=5)
               ->  Parallel Seq Scan on lineitem  (cost=0.00..618021.73 rows=13025795 width=0) (actual time=0.035..20553.495 rows=10342437 loops=5)
                     Filter: (l_shipdate < '1995-01-03'::date)
                     Rows Removed by Filter: 13656485
 Planning time: 0.225 ms
 Execution time: 22855.196 ms

However, with parallel index-only scan, it took only 8.5 seconds,

Finalize Aggregate  (cost=568883.69..568883.70 rows=1 width=8) (actual time=8548.993..8548.993 rows=1 loops=1)
   ->  Gather  (cost=568883.27..568883.68 rows=4 width=8) (actual time=8548.789..8548.976 rows=5 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Partial Aggregate  (cost=567883.27..567883.28 rows=1 width=8) (actual time=8541.929..8541.929 rows=1 loops=5)
               ->  Parallel Index Only Scan using idx_l_shipdate on lineitem  (cost=0.57..535318.78 rows=13025795 width=0) (actual time=0.113..5866.729 rows=10342437 loops=5)
                     Index Cond: (l_shipdate < '1995-01-03'::date)
                     Heap Fetches: 0
 Planning time: 0.266 ms
 Execution time: 8569.735 ms

The effect of parallel index-only scan can be seen more in some more complex queries where parallelism is enabled till top of the tree, e.g, following query takes 118 ms on head,

explain analyse select sum(l_extendedprice * l_discount) as revenue, avg(o_orderkey) from orders, lineitem where o_orderkey < 10000 and o_orderkey = l_orderkey group by l_shipmode order by revenue

 Sort  (cost=24396.44..24396.45 rows=7 width=75) (actual time=118.823..118.825 rows=7 loops=1)
   Sort Key: (sum((lineitem.l_extendedprice * lineitem.l_discount)))
   Sort Method: quicksort  Memory: 25kB
   ->  HashAggregate  (cost=24396.23..24396.34 rows=7 width=75) (actual time=118.749..118.786 rows=7 loops=1)
         Group Key: lineitem.l_shipmode
         ->  Nested Loop  (cost=1.13..24293.11 rows=10312 width=27) (actual time=0.096..73.198 rows=9965 loops=1)
               ->  Index Only Scan using orders_pkey on orders  (cost=0.56..46.48 rows=2578 width=4) (actual time=0.072..1.663 rows=2503 loops=1)
                     Index Cond: (o_orderkey < 10000)
                     Heap Fetches: 0
               ->  Index Scan using idx_lineitem_orderkey on lineitem  (cost=0.57..6.45 rows=296 width=31) (actual time=0.018..0.023 rows=4 loops=2503)
                     Index Cond: (l_orderkey = orders.o_orderkey)
 Planning time: 1.062 ms
 Execution time: 118.977 ms

With parallel index-only scan, the performance improves to 40 ms,

Sort  (cost=7191.33..7191.35 rows=7 width=75) (actual time=40.475..40.476 rows=7 loops=1)
   Sort Key: (sum((lineitem.l_extendedprice * lineitem.l_discount)))
   Sort Method: quicksort  Memory: 25kB
   ->  Finalize GroupAggregate  (cost=7190.78..7191.23 rows=7 width=75) (actual time=40.168..40.451 rows=7 loops=1)
         Group Key: lineitem.l_shipmode
         ->  Sort  (cost=7190.78..7190.85 rows=28 width=75) (actual time=40.105..40.127 rows=35 loops=1)
               Sort Key: lineitem.l_shipmode
               Sort Method: quicksort  Memory: 29kB
               ->  Gather  (cost=7187.22..7190.11 rows=28 width=75) (actual time=39.344..39.983 rows=35 loops=1)
                     Workers Planned: 4
                     Workers Launched: 4
                     ->  Partial HashAggregate  (cost=6187.22..6187.31 rows=7 width=75) (actual time=25.981..26.011 rows=7 loops=5)
                           Group Key: lineitem.l_shipmode
                           ->  Nested Loop  (cost=1.13..6084.10 rows=10312 width=27) (actual time=0.139..16.352 rows=1993 loops=5)
                                 ->  Parallel Index Only Scan using orders_pkey on orders  (cost=0.56..27.14 rows=644 width=4) (actual time=0.082..0.366 rows=501 loops=5)
                                       Index Cond: (o_orderkey < 10000)
                                       Heap Fetches: 0
                                 ->  Index Scan using idx_lineitem_orderkey on lineitem  (cost=0.57..6.45 rows=296 width=31) (actual time=0.020..0.025 rows=4 loops=2503)
                                       Index Cond: (l_orderkey = orders.o_orderkey)
 Planning time: 1.170 ms
 Execution time: 40.898 ms 

Test configuration and machine detail:
--------------------------------------------------
TPC-H: scale facto                         : 20
work_mem                                      : 64MB
shared buffer                                   : 1GB
max_parallel_workers_per_gather  : 4
Machine                                           : IBM POWER, 4 socket machine with 512 GB RAM.
random_page_cost = seq_page_cost = 0.1
We kept random and seq page cost same since warm cache environment was ensured, thus, keeping a value of random_page_cost that is higher than seq_page_cost doesn't makes much sense in this setup.
Also, some additional indexes were build, for lineitem table l_shipdate, l_shipmode, and l_returnflag, on orders table index is added for o_comment and o_orderdate individually.

The point to note here is that with parallel index-only scan, not only the heap fetches are saved but also the aggregates/sort can be pushed down to workers and thus the total time of query can be improved. Clearly, the performance of queries improved significantly with this new operator and considering the changes required after parallel index scan patches is less if evaluated against the improvement in performance it offers.

Attached file:
------------------
1. parallel_index_only_v1.patch

This patch is to be applied over parallel index scan [1].

* Thanks to my colleagues Dilip Kumar and Amit Kapila for their support and feedback.

--
Regards,
Rafia Sabih
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: Re: [HACKERS] Declarative partitioning - another take
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: [HACKERS] jacana hung after failing to acquire random number