Обсуждение: Parallel stats in execution plans
Hi folks -
I'm having trouble understanding what some of the stats mean in the execution plan output when parallel workers are used. I've tried to read up about it, but I haven't been able to find anything that explains what I'm seeing. Apologies in advance if there's documentation I've been too stupid to find.
I've run the following query. The "towns" table is a massive table that I created in order to get some big numbers on a parallel query - don't worry, this isn't a real query I want to make faster, just a silly example I'd like to understand.
EXPLAIN (ANALYZE, FORMAT JSON, BUFFERS, VERBOSE) SELECT name, code, article FROM towns ORDER BY name ASC,code DESC;
The output looks like this:
[ { "Plan": { "Node Type": "Gather Merge", "Parallel Aware": false, "Startup Cost": 1013948.54, "Total Cost": 1986244.55, "Plan Rows": 8333384, "Plan Width": 77, "Actual Startup Time": 42978.838, "Actual Total Time": 60628.982, "Actual Rows": 10000010, "Actual Loops": 1, "Output": ["name", "code", "article"], "Workers Planned": 2, "Workers Launched": 2, "Shared Hit Blocks": 29, "Shared Read Blocks": 47641, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks": 91342, "Temp Written Blocks": 91479, "Plans": [ { "Node Type": "Sort", "Parent Relationship": "Outer", "Parallel Aware": false, "Startup Cost": 1012948.52, "Total Cost": 1023365.25, "Plan Rows": 4166692, "Plan Width": 77, "Actual Startup Time": 42765.496, "Actual Total Time": 48526.168, "Actual Rows": 3333337, "Actual Loops": 3, "Output": ["name", "code", "article"], "Sort Key": ["towns.name", "towns.code DESC"], "Sort Method": "external merge", "Sort Space Used": 283856, "Sort Space Type": "Disk", "Shared Hit Blocks": 170, "Shared Read Blocks": 142762, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks": 273289, "Temp Written Blocks": 273700, "Workers": [ { "Worker Number": 0, "Actual Startup Time": 42588.662, "Actual Total Time": 48456.662, "Actual Rows": 3277980, "Actual Loops": 1, "Shared Hit Blocks": 72, "Shared Read Blocks": 46794, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks": 89067, "Temp Written Blocks": 89202 }, { "Worker Number": 1, "Actual Startup Time": 42946.705, "Actual Total Time": 48799.414, "Actual Rows": 3385130, "Actual Loops": 1, "Shared Hit Blocks": 69, "Shared Read Blocks": 48327, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks": 92880, "Temp Written Blocks": 93019 } ], "Plans": [ { "Node Type": "Seq Scan", "Parent Relationship": "Outer", "Parallel Aware": true, "Relation Name": "towns", "Schema": "public", "Alias": "towns", "Startup Cost": 0.00, "Total Cost": 184524.92, "Plan Rows": 4166692, "Plan Width": 77, "Actual Startup Time": 0.322, "Actual Total Time": 8305.886, "Actual Rows": 3333337, "Actual Loops": 3, "Output": ["name", "code", "article"], "Shared Hit Blocks": 96, "Shared Read Blocks": 142762, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks": 0, "Temp Written Blocks": 0, "Workers": [ { "Worker Number": 0, "Actual Startup Time": 0.105, "Actual Total Time": 8394.629, "Actual Rows": 3277980, "Actual Loops": 1, "Shared Hit Blocks": 35, "Shared Read Blocks": 46794, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks": 0, "Temp Written Blocks": 0 }, { "Worker Number": 1, "Actual Startup Time": 0.113, "Actual Total Time": 8139.382, "Actual Rows": 3385130, "Actual Loops": 1, "Shared Hit Blocks": 32, "Shared Read Blocks": 48327, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks": 0, "Temp Written Blocks": 0 } ] } ] } ] }, "Planning Time": 22.898, "Triggers": [ ], "Execution Time": 61133.161 } ]
Or a more slimmed-down version, with just the confusing fields:
[ { "Plan": { "Node Type": "Gather Merge", "Parallel Aware": false, "Actual Total Time": 60628.982, "Actual Rows": 10000010, "Actual Loops": 1, "Workers Planned": 2, "Workers Launched": 2, "Plans": [ { "Node Type": "Sort", "Parent Relationship": "Outer", "Parallel Aware": false, "Actual Total Time": 48526.168, "Actual Rows": 3333337, "Actual Loops": 3, "Workers": [ { "Worker Number": 0, "Actual Total Time": 48456.662, "Actual Rows": 3277980, "Actual Loops": 1 }, { "Worker Number": 1, "Actual Total Time": 48799.414, "Actual Rows": 3385130, "Actual Loops": 1 } ], "Plans": [ { "Node Type": "Seq Scan", "Parallel Aware": true, "Actual Total Time": 8305.886, "Actual Rows": 3333337, "Actual Loops": 3, "Workers": [ { "Worker Number": 0, "Actual Total Time": 8394.629, "Actual Rows": 3277980, "Actual Loops": 1 }, { "Worker Number": 1, "Actual Total Time": 8139.382, "Actual Rows": 3385130, "Actual Loops": 1 } ] } ] } ] }, "Execution Time": 61133.161 } ]
The things I'm struggling to understand are:
- How the time values combine with parallelism. For example, each execution of the sort node takes an average of 48.5s, over three loops. This makes a total running time of 145.5s. Even if this was perfectly distributed between the two workers, I would expect this to take 72.75s, which is more than the total execution time, so it can't take this long.
- How the row numbers combine with those in the "Workers" subkey. For example, in the Sort node, worker #0 produces 3,277,980 rows, while worker #1 produces 3,385,130 rows. The Sort node as a whole produces 3,333,337 rows per loop, for a total of 10,000,010 (the value in the gather merge node). I would have expected the number of rows produced by the two workers to sum to the number produced by the Sort node as a whole, either per loop or in total.
- How the "Actual Loops" values combine with those in the "Workers" subkey. For example, the "Sort" node has 3 loops, but each of the workers inside it have 1 loop. I would have expected either:
- each of the workers to have done 3 loops (since the sort is executed 3 times), or
- the number of loops in the two workers to sum to three (if the three executions of the sort are distributed across the two workers)
- each of the workers to have done 3 loops (since the sort is executed 3 times), or
Other info about my setup:
- Postgres version: "PostgreSQL 10.4, compiled by Visual C++ build 1800, 64-bit"
- I installed Postgres using the EnterpriseDB one-click installer
- OS: Windows 10, v1803
- I'm using Jetbrains Datagrip to connect to Postgres
- No errors are logged.
- Altered config settings:
application_name PostgreSQL JDBC Driver session client_encoding UTF8 client DateStyle ISO, MDY client default_text_search_config pg_catalog.english configuration file dynamic_shared_memory_type windows configuration file extra_float_digits 3 session lc_messages English_United States.1252 configuration file lc_monetary English_United States.1252 configuration file lc_numeric English_United States.1252 configuration file lc_time English_United States.1252 configuration file listen_addresses * configuration file log_destination stderr configuration file logging_collector on configuration file max_connections 100 configuration file max_stack_depth 2MB environment variable port 5432 configuration file shared_buffers 128MB configuration file TimeZone UTC client
Thanks in advance, I'm sure I've done something silly or misunderstood something obvious but I can't work out what it is for the life of me.
Dave
It seems like no-one has any ideas on this - does anyone know anywhere else I can try to look/ask to find out more?
Is it possible that this is a bug?
Thanks
Dave
Hi folks -
I'm having trouble understanding what some of the stats mean in the execution plan output when parallel workers are used. I've tried to read up about it, but I haven't been able to find anything that explains what I'm seeing. Apologies in advance if there's documentation I've been too stupid to find.
I've run the following query. The "towns" table is a massive table that I created in order to get some big numbers on a parallel query - don't worry, this isn't a real query I want to make faster, just a silly example I'd like to understand.
EXPLAIN (ANALYZE, FORMAT JSON, BUFFERS, VERBOSE) SELECT name, code, article FROM towns ORDER BY name ASC,code DESC;The output looks like this:
[ { "Plan": { "Node Type": "Gather Merge", "Parallel Aware": false, "Startup Cost": 1013948.54, "Total Cost": 1986244.55, "Plan Rows": 8333384, "Plan Width": 77, "Actual Startup Time": 42978.838, "Actual Total Time": 60628.982, "Actual Rows": 10000010, "Actual Loops": 1, "Output": ["name", "code", "article"], "Workers Planned": 2, "Workers Launched": 2, "Shared Hit Blocks": 29, "Shared Read Blocks": 47641, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks": 91342, "Temp Written Blocks": 91479, "Plans": [ { "Node Type": "Sort", "Parent Relationship": "Outer", "Parallel Aware": false, "Startup Cost": 1012948.52, "Total Cost": 1023365.25, "Plan Rows": 4166692, "Plan Width": 77, "Actual Startup Time": 42765.496, "Actual Total Time": 48526.168, "Actual Rows": 3333337, "Actual Loops": 3, "Output": ["name", "code", "article"], "Sort Key": ["towns.name", "towns.code DESC"], "Sort Method": "external merge", "Sort Space Used": 283856, "Sort Space Type": "Disk", "Shared Hit Blocks": 170, "Shared Read Blocks": 142762, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks": 273289, "Temp Written Blocks": 273700, "Workers": [ { "Worker Number": 0, "Actual Startup Time": 42588.662, "Actual Total Time": 48456.662, "Actual Rows": 3277980, "Actual Loops": 1, "Shared Hit Blocks": 72, "Shared Read Blocks": 46794, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks": 89067, "Temp Written Blocks": 89202 }, { "Worker Number": 1, "Actual Startup Time": 42946.705, "Actual Total Time": 48799.414, "Actual Rows": 3385130, "Actual Loops": 1, "Shared Hit Blocks": 69, "Shared Read Blocks": 48327, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks": 92880, "Temp Written Blocks": 93019 } ], "Plans": [ { "Node Type": "Seq Scan", "Parent Relationship": "Outer", "Parallel Aware": true, "Relation Name": "towns", "Schema": "public", "Alias": "towns", "Startup Cost": 0.00, "Total Cost": 184524.92, "Plan Rows": 4166692, "Plan Width": 77, "Actual Startup Time": 0.322, "Actual Total Time": 8305.886, "Actual Rows": 3333337, "Actual Loops": 3, "Output": ["name", "code", "article"], "Shared Hit Blocks": 96, "Shared Read Blocks": 142762, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks": 0, "Temp Written Blocks": 0, "Workers": [ { "Worker Number": 0, "Actual Startup Time": 0.105, "Actual Total Time": 8394.629, "Actual Rows": 3277980, "Actual Loops": 1, "Shared Hit Blocks": 35, "Shared Read Blocks": 46794, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks": 0, "Temp Written Blocks": 0 }, { "Worker Number": 1, "Actual Startup Time": 0.113, "Actual Total Time": 8139.382, "Actual Rows": 3385130, "Actual Loops": 1, "Shared Hit Blocks": 32, "Shared Read Blocks": 48327, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks": 0, "Temp Written Blocks": 0 } ] } ] } ] }, "Planning Time": 22.898, "Triggers": [ ], "Execution Time": 61133.161 } ]Or a more slimmed-down version, with just the confusing fields:
[ { "Plan": { "Node Type": "Gather Merge", "Parallel Aware": false, "Actual Total Time": 60628.982, "Actual Rows": 10000010, "Actual Loops": 1, "Workers Planned": 2, "Workers Launched": 2, "Plans": [ { "Node Type": "Sort", "Parent Relationship": "Outer", "Parallel Aware": false, "Actual Total Time": 48526.168, "Actual Rows": 3333337, "Actual Loops": 3, "Workers": [ { "Worker Number": 0, "Actual Total Time": 48456.662, "Actual Rows": 3277980, "Actual Loops": 1 }, { "Worker Number": 1, "Actual Total Time": 48799.414, "Actual Rows": 3385130, "Actual Loops": 1 } ], "Plans": [ { "Node Type": "Seq Scan", "Parallel Aware": true, "Actual Total Time": 8305.886, "Actual Rows": 3333337, "Actual Loops": 3, "Workers": [ { "Worker Number": 0, "Actual Total Time": 8394.629, "Actual Rows": 3277980, "Actual Loops": 1 }, { "Worker Number": 1, "Actual Total Time": 8139.382, "Actual Rows": 3385130, "Actual Loops": 1 } ] } ] } ] }, "Execution Time": 61133.161 } ]The things I'm struggling to understand are:
- How the time values combine with parallelism. For example, each execution of the sort node takes an average of 48.5s, over three loops. This makes a total running time of 145.5s. Even if this was perfectly distributed between the two workers, I would expect this to take 72.75s, which is more than the total execution time, so it can't take this long.
- How the row numbers combine with those in the "Workers" subkey. For example, in the Sort node, worker #0 produces 3,277,980 rows, while worker #1 produces 3,385,130 rows. The Sort node as a whole produces 3,333,337 rows per loop, for a total of 10,000,010 (the value in the gather merge node). I would have expected the number of rows produced by the two workers to sum to the number produced by the Sort node as a whole, either per loop or in total.
- How the "Actual Loops" values combine with those in the "Workers" subkey. For example, the "Sort" node has 3 loops, but each of the workers inside it have 1 loop. I would have expected either:
- each of the workers to have done 3 loops (since the sort is executed 3 times), or
- the number of loops in the two workers to sum to three (if the three executions of the sort are distributed across the two workers)
Other info about my setup:
- Postgres version: "PostgreSQL 10.4, compiled by Visual C++ build 1800, 64-bit"
- I installed Postgres using the EnterpriseDB one-click installer
- OS: Windows 10, v1803
- I'm using Jetbrains Datagrip to connect to Postgres
- No errors are logged.
- Altered config settings:
application_name PostgreSQL JDBC Driver session client_encoding UTF8 client DateStyle ISO, MDY client default_text_search_config pg_catalog.english configuration file dynamic_shared_memory_type windows configuration file extra_float_digits 3 session lc_messages English_United States.1252 configuration file lc_monetary English_United States.1252 configuration file lc_numeric English_United States.1252 configuration file lc_time English_United States.1252 configuration file listen_addresses * configuration file log_destination stderr configuration file logging_collector on configuration file max_connections 100 configuration file max_stack_depth 2MB environment variable port 5432 configuration file shared_buffers 128MB configuration file TimeZone UTC client Thanks in advance, I'm sure I've done something silly or misunderstood something obvious but I can't work out what it is for the life of me.
Dave
On Thu, 17 Jan 2019 at 00:31, David Conlin <dc345@cantab.net> wrote: > How the time values combine with parallelism. For example, each execution of the sort node takes an average of 48.5s, overthree loops. This makes a total running time of 145.5s. Even if this was perfectly distributed between the two workers,I would expect this to take 72.75s, which is more than the total execution time, so it can't take this long. > How the row numbers combine with those in the "Workers" subkey. For example, in the Sort node, worker #0 produces 3,277,980rows, while worker #1 produces 3,385,130 rows. The Sort node as a whole produces 3,333,337 rows per loop, for atotal of 10,000,010 (the value in the gather merge node). I would have expected the number of rows produced by the two workersto sum to the number produced by the Sort node as a whole, either per loop or in total. > How the "Actual Loops" values combine with those in the "Workers" subkey. For example, the "Sort" node has 3 loops, buteach of the workers inside it have 1 loop. I would have expected either: > > each of the workers to have done 3 loops (since the sort is executed 3 times), or > the number of loops in the two workers to sum to three (if the three executions of the sort are distributed across thetwo workers) It's important to know that all of the actual row counts and actual time are divided by the number of loops, which in this case is 3, one per process working on that part of the plan. There are two workers, but also the main process helps out too. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services