Re: Parallel Seq Scan
От | Thom Brown |
---|---|
Тема | Re: Parallel Seq Scan |
Дата | |
Msg-id | CAA-aLv6Xf-995_c54XqzTT4tKFmSipkgqN5tkywUOfd8jhNU7Q@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Parallel Seq Scan (Amit Kapila <amit.kapila16@gmail.com>) |
Ответы |
Re: Parallel Seq Scan
(Amit Kapila <amit.kapila16@gmail.com>)
|
Список | pgsql-hackers |
On 20 January 2015 at 14:29, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jan 15, 2015 at 6:57 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Jan 12, 2015 at 3:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > Yeah, you need two separate global variables pointing to shm_mq
> > objects, one of which gets used by pqmq.c for errors and the other of
> > which gets used by printtup.c for tuples.
> >
>
> Okay, I will try to change the way as suggested without doing
> switching, but this way we need to do it separately for 'T', 'D', and
> 'C' messages.
>
I have taken care of integrating the parallel sequence scan with thelatest patch posted (parallel-mode-v1.patch) by Robert at belowlocation:Changes in this version-----------------------------------------------1. As mentioned previously, I have exposed one parameterParallelWorkerNumber as used in parallel-mode patch.2. Enabled tuple queue to be used for passing tuples fromworker backend to master backend along with error queueas per suggestion by Robert in the mail above.3. Involved master backend to scan the heap directly whentuples are not available in any shared memory tuple queue.4. Introduced 3 new parameters (cpu_tuple_comm_cost,parallel_setup_cost, parallel_startup_cost) for deciding the costof parallel plan. Currently, I have kept the default values forparallel_setup_cost and parallel_startup_cost as 0.0, as thoserequire some experiments.5. Fixed some issues (related to memory increase as reportedupthread by Thom Brown and general feature issues found duringtest)Note - I have yet to handle the new node types introduced at someof the places and need to verify prepared queries and some otherthings, however I think it will be good if I can get some feedbackat current stage.
I'm getting an issue:
➤ psql://thom@[local]:5488/pgbench
# set parallel_seqscan_degree = 8;
SET
Time: 0.248 ms
➤ psql://thom@[local]:5488/pgbench
# explain select c1 from t1;
QUERY PLAN
--------------------------------------------------------------
Parallel Seq Scan on t1 (cost=0.00..21.22 rows=100 width=4)
Number of Workers: 8
Number of Blocks Per Worker: 11
(3 rows)
Time: 0.322 ms
# explain analyse select c1 from t1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Parallel Seq Scan on t1 (cost=0.00..21.22 rows=100 width=4) (actual time=0.024..13.468 rows=100 loops=1)
Number of Workers: 8
Number of Blocks Per Worker: 11
Planning time: 0.040 ms
Execution time: 13.862 ms
(5 rows)
Time: 14.188 ms
➤ psql://thom@[local]:5488/pgbench
# set parallel_seqscan_degree = 10;
SET
Time: 0.219 ms
➤ psql://thom@[local]:5488/pgbench
# explain select c1 from t1;
QUERY PLAN
--------------------------------------------------------------
Parallel Seq Scan on t1 (cost=0.00..19.18 rows=100 width=4)
Number of Workers: 10
Number of Blocks Per Worker: 9
(3 rows)
Time: 0.375 ms
➤ psql://thom@[local]:5488/pgbench
# explain analyse select c1 from t1;
➤ psql://thom@[local]:5488/pgbench
# set parallel_seqscan_degree = 8;
SET
Time: 0.248 ms
➤ psql://thom@[local]:5488/pgbench
# explain select c1 from t1;
QUERY PLAN
--------------------------------------------------------------
Parallel Seq Scan on t1 (cost=0.00..21.22 rows=100 width=4)
Number of Workers: 8
Number of Blocks Per Worker: 11
(3 rows)
Time: 0.322 ms
# explain analyse select c1 from t1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Parallel Seq Scan on t1 (cost=0.00..21.22 rows=100 width=4) (actual time=0.024..13.468 rows=100 loops=1)
Number of Workers: 8
Number of Blocks Per Worker: 11
Planning time: 0.040 ms
Execution time: 13.862 ms
(5 rows)
Time: 14.188 ms
➤ psql://thom@[local]:5488/pgbench
# set parallel_seqscan_degree = 10;
SET
Time: 0.219 ms
➤ psql://thom@[local]:5488/pgbench
# explain select c1 from t1;
QUERY PLAN
--------------------------------------------------------------
Parallel Seq Scan on t1 (cost=0.00..19.18 rows=100 width=4)
Number of Workers: 10
Number of Blocks Per Worker: 9
(3 rows)
Time: 0.375 ms
➤ psql://thom@[local]:5488/pgbench
# explain analyse select c1 from t1;
So setting parallel_seqscan_degree above max_worker_processes causes the CPU to max out, and the query never returns, or at least not after waiting 2 minutes. Shouldn't it have a ceiling of max_worker_processes?
The original test I performed where I was getting OOM errors now appears to be fine:
# explain (analyse, buffers, timing) select distinct bid from pgbench_accounts;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1400411.11..1400412.11 rows=100 width=4) (actual time=8504.333..8504.335 rows=13 loops=1)
Group Key: bid
Buffers: shared hit=32 read=18183
-> Parallel Seq Scan on pgbench_accounts (cost=0.00..1375411.11 rows=10000000 width=4) (actual time=0.054..7183.494 rows=10000000 loops=1)
Number of Workers: 8
Number of Blocks Per Worker: 18215
Buffers: shared hit=32 read=18183
Planning time: 0.058 ms
Execution time: 8876.967 ms
(9 rows)
Time: 8877.366 ms
# explain (analyse, buffers, timing) select distinct bid from pgbench_accounts;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1400411.11..1400412.11 rows=100 width=4) (actual time=8504.333..8504.335 rows=13 loops=1)
Group Key: bid
Buffers: shared hit=32 read=18183
-> Parallel Seq Scan on pgbench_accounts (cost=0.00..1375411.11 rows=10000000 width=4) (actual time=0.054..7183.494 rows=10000000 loops=1)
Number of Workers: 8
Number of Blocks Per Worker: 18215
Buffers: shared hit=32 read=18183
Planning time: 0.058 ms
Execution time: 8876.967 ms
(9 rows)
Time: 8877.366 ms
Note that I increased seq_page_cost to force a parallel scan in this case.
Thom
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Pavel StehuleДата:
Сообщение: Re: proposal: searching in array function - array_position