Обсуждение: Explain: Duplicate key "Workers" in JSON format

Поиск
Список
Период
Сортировка

Explain: Duplicate key "Workers" in JSON format

От
Pierre Giraud
Дата:
Hi,

I'm currently working on a tool to visualize an execution plan [1]. For
those who know PEV, it's actually a fork of this great tool since it
hasn't been active for more than 2 years.

Among other things, I'd like to show information for the parallel
queries. First of which is information about the workers.

I'm facing a problem when I am trying to parse a plan in the JSON
format. The "Workers" key may be duplicated.

While it's not invalid to have keys with the same name several times at
the same level in a JSON object, it makes it almost impossible to get
the full info when parsed. Indeed when parsing such a JSON string only
the last key is kept. Part of the information is lost.

JSON validators warn us with the following message : "Duplicate key,
names should be unique."

Here's an example of a plan in VERBOSE mode.

[
  {
    "Plan": {
      "Node Type": "Gather Merge",
      "Parallel Aware": false,
      "Actual Startup Time": 1720.052,
      "Actual Total Time": 4252.290,
      "Actual Rows": 10000000,
      "Actual Loops": 1,
      "Output": ["c1", "c2"],
      "Workers Planned": 2,
      "Workers Launched": 2,
      "Plans": [
        {
          "Node Type": "Sort",
          "Parent Relationship": "Outer",
          "Parallel Aware": false,
          "Actual Startup Time": 1558.638,
          "Actual Total Time": 2127.522,
          "Actual Rows": 3333333,
          "Actual Loops": 3,
          "Output": ["c1", "c2"],
          "Sort Key": ["t1.c1"],
          "Sort Method": "external merge",
          "Sort Space Used": 126152,
          "Sort Space Type": "Disk",
          "Workers": [
            {
              "Worker Number": 0,
              "Sort Method": "external merge",
              "Sort Space Used": 73552,
              "Sort Space Type": "Disk"
            },
            {
              "Worker Number": 1,
              "Sort Method": "external merge",
              "Sort Space Used": 73320,
              "Sort Space Type": "Disk"
            }
          ],
          "Workers": [
            {
              "Worker Number": 0,
              "Actual Startup Time": 1487.846,
              "Actual Total Time": 1996.879,
              "Actual Rows": 2692973,
              "Actual Loops": 1
            },
            {
              "Worker Number": 1,
              "Actual Startup Time": 1468.256,
              "Actual Total Time": 2012.744,
              "Actual Rows": 2684443,
              "Actual Loops": 1
            }
          ],
          "Plans": [
            {
              "Node Type": "Seq Scan",
              "Parent Relationship": "Outer",
              "Parallel Aware": true,
              "Relation Name": "t1",
              "Schema": "public",
              "Alias": "t1",
              "Actual Startup Time": 0.211,
              "Actual Total Time": 372.858,
              "Actual Rows": 3333333,
              "Actual Loops": 3,
              "Output": ["c1", "c2"],
              "Workers": [
                {
                  "Worker Number": 0,
                  "Actual Startup Time": 0.029,
                  "Actual Total Time": 368.356,
                  "Actual Rows": 2692973,
                  "Actual Loops": 1
                },
                {
                  "Worker Number": 1,
                  "Actual Startup Time": 0.033,
                  "Actual Total Time": 368.874,
                  "Actual Rows": 2684443,
                  "Actual Loops": 1
                }
              ]
            }
          ]
        }
      ]
    },
    "Planning Time": 0.170,
    "Triggers": [
    ],
    "Execution Time": 4695.141
  }
]

As you can see, the "Workers" key is duplicated in the Sort node.

Here's the equivalent in TEXT format:

---------------------------------
 Gather Merge  (cost=735306.27..1707599.95 rows=8333364 width=17)
(actual time=1560.468..3749.583 rows=10000000 loops=1)
   Output: c1, c2
   Workers Planned: 2
   Workers Launched: 2
   ->  Sort  (cost=734306.25..744722.95 rows=4166682 width=17) (actual
time=1474.182..1967.788 rows=3333333 loops=3)
         Output: c1, c2
         Sort Key: t1.c1
         Sort Method: external merge  Disk: 125168kB
         Worker 0:  Sort Method: external merge  Disk: 73768kB
         Worker 1:  Sort Method: external merge  Disk: 74088kB
         Worker 0: actual time=1431.136..1883.370 rows=2700666 loops=1
         Worker 1: actual time=1431.175..1891.630 rows=2712505 loops=1
         ->  Parallel Seq Scan on public.t1  (cost=0.00..105264.82
rows=4166682 width=17) (actual time=0.214..386.014 rows=3333333 loops=3)
               Output: c1, c2
               Worker 0: actual time=0.027..382.325 rows=2700666 loops=1
               Worker 1: actual time=0.038..384.951 rows=2712505 loops=1
 Planning Time: 0.180 ms
 Execution Time: 4166.867 ms
(18 rows)
---------------------------------

I think that the text format should stay as is.

For the JSON format however it would be better in my opinion if
"Workers" data is merged. Parsing should not imply anything else than
"var myObj = JSON.parse(theJsonString);".

What do you think?

Thanks.

[1] https://dalibo.github.io/pev2/



Re: Explain: Duplicate key "Workers" in JSON format

От
Andrew Dunstan
Дата:
On 8/23/19 8:47 AM, Pierre Giraud wrote:
>       "Plans": [
>         {
>           "Node Type": "Sort",
>           "Parent Relationship": "Outer",
>           "Parallel Aware": false,
>           "Actual Startup Time": 1558.638,
>           "Actual Total Time": 2127.522,
>           "Actual Rows": 3333333,
>           "Actual Loops": 3,
>           "Output": ["c1", "c2"],
>           "Sort Key": ["t1.c1"],
>           "Sort Method": "external merge",
>           "Sort Space Used": 126152,
>           "Sort Space Type": "Disk",
>           "Workers": [
>             {
>               "Worker Number": 0,
>               "Sort Method": "external merge",
>               "Sort Space Used": 73552,
>               "Sort Space Type": "Disk"
>             },
>             {
>               "Worker Number": 1,
>               "Sort Method": "external merge",
>               "Sort Space Used": 73320,
>               "Sort Space Type": "Disk"
>             }
>           ],
>           "Workers": [
>             {
>               "Worker Number": 0,
>               "Actual Startup Time": 1487.846,
>               "Actual Total Time": 1996.879,
>               "Actual Rows": 2692973,
>               "Actual Loops": 1
>             },
>             {
>               "Worker Number": 1,
>               "Actual Startup Time": 1468.256,
>               "Actual Total Time": 2012.744,
>               "Actual Rows": 2684443,
>               "Actual Loops": 1
>             }
>           ],


Yes, that's really awful.


...


> I think that the text format should stay as is.
>
> For the JSON format however it would be better in my opinion if
> "Workers" data is merged. Parsing should not imply anything else than
> "var myObj = JSON.parse(theJsonString);".
>
> What do you think?
>

Maybe a simpler fix would be to rename one set of nodes to
"Sort-Workers" or some such.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Explain: Duplicate key "Workers" in JSON format

От
Jehan-Guillaume de Rorthais
Дата:
Hi hackers,

On Fri, 23 Aug 2019 09:30:59 -0400
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:

> On 8/23/19 8:47 AM, Pierre Giraud wrote:
> >       "Plans": [
> >         {
> >           "Node Type": "Sort",
> >           "Parent Relationship": "Outer",
> >           "Parallel Aware": false,
> >           "Actual Startup Time": 1558.638,
> >           "Actual Total Time": 2127.522,
> >           "Actual Rows": 3333333,
> >           "Actual Loops": 3,
> >           "Output": ["c1", "c2"],
> >           "Sort Key": ["t1.c1"],
> >           "Sort Method": "external merge",
> >           "Sort Space Used": 126152,
> >           "Sort Space Type": "Disk",
> >           "Workers": [
> >             {
> >               "Worker Number": 0,
> >               "Sort Method": "external merge",
> >               "Sort Space Used": 73552,
> >               "Sort Space Type": "Disk"
> >             },
> >             {
> >               "Worker Number": 1,
> >               "Sort Method": "external merge",
> >               "Sort Space Used": 73320,
> >               "Sort Space Type": "Disk"
> >             }
> >           ],
> >           "Workers": [
> >             {
> >               "Worker Number": 0,
> >               "Actual Startup Time": 1487.846,
> >               "Actual Total Time": 1996.879,
> >               "Actual Rows": 2692973,
> >               "Actual Loops": 1
> >             },
> >             {
> >               "Worker Number": 1,
> >               "Actual Startup Time": 1468.256,
> >               "Actual Total Time": 2012.744,
> >               "Actual Rows": 2684443,
> >               "Actual Loops": 1
> >             }
> >           ],  
> 
> Yes, that's really awful.
> 
> 
> ...
> 
> 
> > I think that the text format should stay as is.
> >
> > For the JSON format however it would be better in my opinion if
> > "Workers" data is merged. Parsing should not imply anything else than
> > "var myObj = JSON.parse(theJsonString);".
> >
> > What do you think?

Please, find in attachment a patch that merge parallel data together.

Note that duplicated keys are discouraged in json specs, but they are forbidden
in yaml ones. See link in the commit message.

Regards,

Вложения

Re: Explain: Duplicate key "Workers" in JSON format

От
Alvaro Herrera
Дата:
We have not fixed this, have we?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Explain: Duplicate key "Workers" in JSON format

От
Andres Freund
Дата:
On 2020-03-31 14:06:50 -0300, Alvaro Herrera wrote:
> We have not fixed this, have we?

I thought we did:

commit 10013684970453a0ddc86050bba813c611114321
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   2020-01-25 18:16:42 -0500

    Clean up EXPLAIN's handling of per-worker details.

    Previously, it was possible for EXPLAIN ANALYZE of a parallel query
    to produce several different "Workers" fields for a single plan node,
    because different portions of explain.c independently generated
    per-worker data and wrapped that output in separate fields.  This
    is pretty bogus, especially for the structured output formats: even
    if it's not technically illegal, most programs would have a hard time
    dealing with such data.



Re: Explain: Duplicate key "Workers" in JSON format

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> We have not fixed this, have we?

Isn't this 10013684970453a0ddc86050bba813c611114321 ?

            regards, tom lane



Re: Explain: Duplicate key "Workers" in JSON format

От
Alvaro Herrera
Дата:
On 2020-Mar-31, Tom Lane wrote:

> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > We have not fixed this, have we?
> 
> Isn't this 10013684970453a0ddc86050bba813c611114321 ?

Ah, right, another thread reporting the same thing, two months after
this one.

Thanks both :-)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services