Re: Actuall row count of Parallel Seq Scan in EXPLAIN ANALYZE .

Поиск
Список
Период
Сортировка
От Satoshi Nagayasu
Тема Re: Actuall row count of Parallel Seq Scan in EXPLAIN ANALYZE .
Дата
Msg-id CAA8sozfo+wveFciCa3ZJ2X7pG9feDZQ8s+eqw1_NjbWFXsRXEg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Actuall row count of Parallel Seq Scan in EXPLAIN ANALYZE .  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: Actuall row count of Parallel Seq Scan in EXPLAIN ANALYZE .
Список pgsql-hackers
Hi,

2016-06-20 15:42 GMT+09:00 Amit Kapila <amit.kapila16@gmail.com>:
> On Mon, Jun 20, 2016 at 11:48 AM, Masahiko Sawada <sawada.mshk@gmail.com>
> wrote:
>>
>> Hi all,
>>
>> My colleague noticed that the output of EXPLAIN ANALYZE doesn't work
>> fine for parallel seq scan.
>>
>> postgres(1)=# explain analyze verbose select count(*) from
>> pgbench_accounts ;
>>                                                     QUERY PLAN
>>
>>
-----------------------------------------------------------------------------------------------------------------------------------------------------
>>  Finalize Aggregate  (cost=217018.55..217018.56 rows=1 width=8)
>> (actual time=2640.015..2640.015 rows=1 loops=1)
>>    Output: count(*)
>>    ->  Gather  (cost=217018.33..217018.54 rows=2 width=8) (actual
>> time=2639.064..2640.002 rows=3 loops=1)
>>          Output: (PARTIAL count(*))
>>          Workers Planned: 2
>>          Workers Launched: 2
>>          ->  Partial Aggregate  (cost=216018.33..216018.34 rows=1
>> width=8) (actual time=2632.714..2632.715 rows=1 loops=3)
>>                Output: PARTIAL count(*)
>>                Worker 0: actual time=2632.583..2632.584 rows=1 loops=1
>>                Worker 1: actual time=2627.517..2627.517 rows=1 loops=1
>>                ->  Parallel Seq Scan on public.pgbench_accounts
>> (cost=0.00..205601.67 rows=4166667 width=0) (actual
>> time=0.042..1685.542 rows=3333333 loops=3)
>>                      Worker 0: actual time=0.033..1657.486 rows=3457968
>> loops=1
>>                      Worker 1: actual time=0.039..1702.979 rows=3741069
>> loops=1
>>  Planning time: 1.026 ms
>>  Execution time: 2640.225 ms
>> (15 rows)
>>
>> For example, the above result shows,
>> Parallel Seq Scan : actual rows = 3333333
>> worker 0               : actual rows = 3457968
>> worker 1               : actual rows = 3741069
>> Summation of these is 10532370, but actual total rows is 10000000.
>> I think that Parallel Seq Scan should show actual rows =
>> 10000000(total rows) or actual rows = 2800963(rows collected by
>> itself). (10000000 maybe better)
>>
>
> You have to read the rows at Parallel Seq Scan nodes as total count of rows,
> but you have to consider the loops parameter as well.

IMHO, "actual rows" of "Parallel Seq Scan" should not be divided by the loops,
because the total rows processed here is 10000000, not 3333333 * 3.
I think the actual row value shown here "3333333 " is a bit confusing and tricky
for users.

>> After spent time to investigate this behaviour, ISTM that the problem
>> is nloops of Parallel Seq Scan.
>> Parallel Seq Scan is done only once, but nloops is incremented to 3.
>
> nloops here indicates, that it is done for 2 workers and a master backend.

Right, but I'm not sure "loops" is an appropriate word for this context.

>> So its "actual rows" is calculated 3333333(10000000 / 3) at
>> explain.c:L1223.
>>
>
> Thats how it should be considered.  You might want to compare the behaviour
> with other cases where value of nloops is used.
>
>> Is it a bug?
>>
>
> I don't think so.

I would like to propose a few modification for parallel queries to
make it more clear.

>>                ->  Parallel Seq Scan on public.pgbench_accounts (cost=0.00..205601.67 rows=4166667 width=0) (actual
time=0.042..1685.542rows=10000000 workers=2)
 
>>                      Parent: actual time=0.033..1657.486 rows=2800963loops=1
>>                      Worker 0: actual time=0.033..1657.486 rows=3457968 loops=1
>>                      Worker 1: actual time=0.039..1702.979 rows=3741069 loops=1

How about this?

Regards,
-- 
Satoshi Nagayasu <snaga@uptime.jp>



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Re: Actuall row count of Parallel Seq Scan in EXPLAIN ANALYZE .
Следующее
От: Amit Langote
Дата:
Сообщение: Re: Actuall row count of Parallel Seq Scan in EXPLAIN ANALYZE .