EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation

From: Maciek Sakrejda
Subject: EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation
Msg-id: CAOtHd0Bw6hsKV-TowuqYeTFhouER+Nds0i-oEd=AFzrKifGNiA@mail.gmail.com
Replies: Re: EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation  (Amit Kapila <amit.kapila16@gmail.com>)
         Re: EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation  (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
Hello,

I had some questions about how instrumentation is accounted for in parallel
EXPLAIN plans. Take the following plan:

```
Gather  (cost=1000.43..750173.74 rows=2 width=235) (actual time=1665.122..1665.122 rows=0 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=27683 read=239573
  I/O Timings: read=687.358
  ->  Nested Loop  (cost=0.43..749173.54 rows=1 width=235) (actual time=1660.095..1660.095 rows=0 loops=3)
        Inner Unique: true
        Buffers: shared hit=77579 read=657847
        I/O Timings: read=2090.189
        Worker 0: actual time=1657.443..1657.443 rows=0 loops=1
          Buffers: shared hit=23645 read=208365
          I/O Timings: read=702.270
        Worker 1: actual time=1658.277..1658.277 rows=0 loops=1
          Buffers: shared hit=26251 read=209909
          I/O Timings: read=700.560
        ->  Parallel Seq Scan on public.schema_indices  (cost=0.00..748877.88 rows=35 width=235) (actual time=136.744..1659.629 rows=32 loops=3)
              Filter: ((schema_indices.invalidated_at_snapshot_id IS NULL) AND (NOT schema_indices.is_valid))
              Rows Removed by Filter: 703421
              Buffers: shared hit=77193 read=657847
              I/O Timings: read=2090.189
              Worker 0: actual time=69.248..1656.950 rows=32 loops=1
                Buffers: shared hit=23516 read=208365
                I/O Timings: read=702.270
              Worker 1: actual time=260.188..1657.875 rows=27 loops=1
                Buffers: shared hit=26140 read=209909
                I/O Timings: read=700.560
        ->  Index Scan using schema_tables_pkey on public.schema_tables  (cost=0.43..8.45 rows=1 width=8) (actual time=0.011..0.011 rows=0 loops=95)
              Index Cond: (schema_tables.id = schema_indices.table_id)
              Filter: (schema_tables.database_id = 123)
              Rows Removed by Filter: 1
              Buffers: shared hit=386
              Worker 0: actual time=0.011..0.011 rows=0 loops=32
                Buffers: shared hit=129
              Worker 1: actual time=0.011..0.011 rows=0 loops=27
                Buffers: shared hit=111
Planning Time: 0.429 ms
Execution Time: 1667.373 ms
```

The Nested Loop here aggregates metrics like `buffers read` across its
workers, so to calculate the parallel leader's own contribution to a
metric like `buffers read`, we can subtract the values recorded for each
individual worker. The same is true of the Seq Scan and Index Scan
children. However, the Gather node appears to include only the values
from its direct parallel-leader child (excluding that child's workers).
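
For concreteness, here is a quick sanity check (a Python sketch, with the
numbers copied by hand from the plan above) showing that the Gather's
Buffers line matches exactly the leader-only share of its Nested Loop child:

```python
# Buffer counts reported on the Nested Loop node (leader + workers combined).
nested_loop_total = {"shared_hit": 77579, "shared_read": 657847}

# Per-worker buffer counts reported under the Nested Loop.
workers = [
    {"shared_hit": 23645, "shared_read": 208365},  # Worker 0
    {"shared_hit": 26251, "shared_read": 209909},  # Worker 1
]

# The leader's own share is the node total minus every worker's contribution.
leader_only = {
    key: nested_loop_total[key] - sum(w[key] for w in workers)
    for key in nested_loop_total
}
print(leader_only)  # {'shared_hit': 27683, 'shared_read': 239573}
```

Those are exactly the values shown on the Gather node (`Buffers: shared
hit=27683 read=239573`), which is what makes it look like the Gather
carries only the leader's counters rather than the cumulative ones.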

This leads to the odd situation where the Gather reports lower values for
some of these metrics than its child (because the child's reporting
includes the worker metrics), even though the values are supposed to be
cumulative. It is even more surprising for something like I/O timing,
where the Gather shows a lower `read` value than a single Nested Loop
worker, which doesn't make sense in terms of wall-clock time.
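
The same subtraction applied to the I/O timings (again just a sketch using
values taken from the plan) shows where that apparent contradiction comes from:

```python
# I/O read timings (ms) from the plan above.
nested_loop_read_ms = 2090.189       # Nested Loop total (leader + workers)
worker_read_ms = [702.270, 700.560]  # Worker 0, Worker 1

# Leader-only share of the I/O read time.
leader_read_ms = nested_loop_read_ms - sum(worker_read_ms)
print(round(leader_read_ms, 3))      # 687.359, i.e. the Gather's read=687.358 up to rounding
```

So the ~687 ms on the Gather looks like the leader's own I/O time, which is
why it can come out lower than the 702 ms attributed to a single worker even
though the Gather sits above all of them.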

Is this behavior intentional? If so, is there an explanation of the
reasoning or the trade-offs involved? Would it not make sense to
propagate those cumulative parallel costs up the tree all the way to
the root, instead of only using the parallel leader metrics under
Gather?

Thanks,
Maciek


