Buffers from parallel workers not accumulated to upper nodes with gather merge

Поиск
Список
Период
Сортировка
От Jehan-Guillaume de Rorthais
Тема Buffers from parallel workers not accumulated to upper nodes with gather merge
Дата
Msg-id 20200718160206.584532a2@firost
обсуждение исходный текст
Ответы Re: Buffers from parallel workers not accumulated to upper nodes with gather merge  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-bugs
Hi,

A bug report[1] on pev2 project exposes another bug from explain analyze
buffers, similar to the one fixed 2 years ago[2].

In this report, buffers from the workers are not propagated to upper nodes
on top of the Gather Merge node. On the opposite, buffers from workers ARE
propagated to upper nodes on top of a Gather node.

Here is some SQL to reproduce:

  CREATE TABLE contacts (contactSourceName text, filler text)
      WITH ( autovacuum_enabled = off);
  INSERT INTO contacts
      SELECT rpad(chr(64+i)::text,10,'0'),
          rpad(j::text,500,'0')
      FROM generate_series(1,26) i
      CROSS JOIN  generate_series(1,3200) j;
  
  SET work_mem TO '4MB';
  SET random_page_cost to 4;
  SET jit TO off ;
  SET parallel_setup_cost TO 0;
  SET parallel_tuple_cost TO 0;
  
  EXPLAIN (ANALYZE, BUFFERS, VERBOSE, COSTS OFF, TIMING OFF)
  SELECT contactsourceName, count(*)
  FROM contacts
  GROUP BY contactSourceName;

  Finalize GroupAggregate (actual rows=26 loops=1)
    Buffers: shared hit=2087 read=1463 written=32
    ->  Gather Merge (actual rows=70 loops=1)
          Workers Planned: 2
          Workers Launched: 2
          Buffers: shared hit=3910 read=2049 written=96
          ->  Sort (actual rows=23 loops=3)
                Buffers: shared hit=3910 read=2049 written=96
                Worker 0:
                  Buffers: shared hit=914 read=287 written=32
                Worker 1:
                  Buffers: shared hit=909 read=299 written=32
  [...]

Gather Merge exposes shared hit=3910.
Finalize GroupAggregate exposes shared hit=2087. 
The 909 and 914 buffers from workers have not been propagated.

Comparing with a Gather node:

  EXPLAIN (ANALYZE, BUFFERS, VERBOSE, COSTS OFF, TIMING OFF) SELECT count(*) AS
  c FROM contacts;

  Finalize Aggregate (actual rows=1 loops=1)
    Output: count(*)
    Buffers: shared hit=3894 read=2049 written=96
    ->  Gather (actual rows=3 loops=1)
          Workers Planned: 2
          Workers Launched: 2
          Buffers: shared hit=3894 read=2049 written=96
          ->  Partial Aggregate (actual rows=1 loops=3)
                Buffers: shared hit=3894 read=2049 written=96
                Worker 0:
                  Buffers: shared hit=870 read=299 written=32
                Worker 1:
                  Buffers: shared hit=887 read=301 written=32
  [...]

Buffers are correctly propagated to the Finalize Aggregate node.

I'm not perfectly clear with the code, but here is my analysis of the issue
and a fix proposal.

The Merge node works correctly because it calls ExecShutdownGatherWorkers as
soon as the workers are exhausted from gather_readnext. Because of this,
buffers from workers are already accounted and propagated to upper nodes
before the recursive call of ExecShutdownNode on each nodes. There's no similar
call to ExecShutdownGatherMergeWorkers for Gather Merge. Adding a call to
ExecShutdownGatherMergeWorkers in gather_merge_getnext when workers are
exhausted seems to fix the issue, but I feel like this is the wrong place to
fix this issue.

During the recursive call on ExecShutdownNode, the upper nodes call
InstrStartNode *after* the workers buffers have been propagated into
pgBufferUsage. Because of this, pgBufferUsage doesn't move anymore before the
call to InstrStopNode and buffers from parallel workers are only accounted up
to the Gather node, but not to upper node.

Note that this bug might similarly affect WAL counters as well but I did not try
to reproduce or analyse it.

Moving the call of InstrStartNode before the recurring call seems to fix the
issue for buffers, and might impact WALs counters as well, by setting
bufusage_start before buffers from workers are accumulated in pgBufferUsage.

You'll find in attachment a patch implementing this fix. Note that I asked
"stephanesql" on github's issue if he wants to be credited as reporter and
under what name he would like to appear there.

Regards,

[1] https://github.com/dalibo/pev2/issues/240
[2] see 85c9d3475e4f680dbca7c04fe096af018f3b8760 and
https://www.postgresql.org/message-id/flat/86137f17-1dfb-42f9-7421-82fd786b04a1%40anayrat.info

Вложения

В списке pgsql-bugs по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: Reported type mismatch improperly
Следующее
От: Andy Fan
Дата:
Сообщение: Re: Reported type mismatch improperly