Обсуждение: Open Item: Should non-text EXPLAIN always show properties?

Поиск
Список
Период
Сортировка

Open Item: Should non-text EXPLAIN always show properties?

От
David Rowley
Дата:
Over on [1] Justin mentions that the non-text EXPLAIN ANALYZE should
always show the "Disk Usage" and "HashAgg Batches" properties.  I
agree with this. show_wal_usage() is a good example of how we normally
do things.  We try to keep the text format as humanly readable as
possible but don't really expect humans to be commonly reading the
other supported formats, so we care less about including additional
details there.

There's also an open item regarding this for Incremental Sort, so I've
CC'd James and Tomas here. This seems like a good place to discuss
both.

I've attached a small patch that changes the Hash Aggregate behaviour
to always show these properties for non-text formats.

Does anyone object to this?

David

[1] https://www.postgresql.org/message-id/20200619040624.GA17995%40telsasoft.com

Вложения

Re: Open Item: Should non-text EXPLAIN always show properties?

От
James Coleman
Дата:
On Thu, Jun 25, 2020 at 5:15 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> Over on [1] Justin mentions that the non-text EXPLAIN ANALYZE should
> always show the "Disk Usage" and "HashAgg Batches" properties.  I
> agree with this. show_wal_usage() is a good example of how we normally
> do things.  We try to keep the text format as humanly readable as
> possible but don't really expect humans to be commonly reading the
> other supported formats, so we care less about including additional
> details there.
>
> There's also an open item regarding this for Incremental Sort, so I've
> CC'd James and Tomas here. This seems like a good place to discuss
> both.

Yesterday I'd replied [1] to Justin's proposal for this WRT
incremental sort and expressed my opinion that including both
unnecessarily (i.e., including disk when an in-memory sort was used)
is undesirable and confusing and leads to shortcuts I believe to be
bad habits when using the data programmatically.

On a somewhat related note, memory can be 0 but that doesn't mean no
memory was used: it's a result of how tuplesort.c doesn't properly
track memory usage when it switches to disk sort. The same isn't true
in reverse (we don't have 0 disk when disk was used), but I do think
it does show the idea that showing "empty" data isn't an inherent
good.

If there's a clear established pattern and/or most others seem to
prefer Justin's proposed approach, then I'm not going to fight it
hard. I just don't think it's the best approach.

James

[1] https://www.postgresql.org/message-id/CAAaqYe-LswZFUL4k5Dr6%3DEN6MJG1HurggcH4QzUs6UFqBbnQzQ%40mail.gmail.com



Re: Open Item: Should non-text EXPLAIN always show properties?

От
Robert Haas
Дата:
On Thu, Jun 25, 2020 at 8:42 AM James Coleman <jtc331@gmail.com> wrote:
> Yesterday I'd replied [1] to Justin's proposal for this WRT
> incremental sort and expressed my opinion that including both
> unnecessarily (i.e., including disk when an in-memory sort was used)
> is undesirable and confusing and leads to shortcuts I believe to be
> bad habits when using the data programmatically.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Open Item: Should non-text EXPLAIN always show properties?

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jun 25, 2020 at 8:42 AM James Coleman <jtc331@gmail.com> wrote:
>> Yesterday I'd replied [1] to Justin's proposal for this WRT
>> incremental sort and expressed my opinion that including both
>> unnecessarily (i.e., including disk when an in-memory sort was used)
>> is undesirable and confusing and leads to shortcuts I believe to be
>> bad habits when using the data programmatically.

> +1.

I think the policy about non-text output formats is "all applicable
fields should be included automatically".  But the key word there is
"applicable".  Are disk-sort numbers applicable when no disk sort
happened?

I think the right way to think about this is that we are building
an output data structure according to a schema that should be fixed
for any particular plan shape.  If event X happened zero times in
a given execution, but it could have happened in a different execution
of the same plan, then we should print X with a zero count.  If X
could not happen period in this plan, we should omit X's entry.

So the real question here is whether the disk vs memory decision is
plan time vs run time.  AFAIK it's run time, which leads me to think
we ought to print the zeroes.

            regards, tom lane



Re: Open Item: Should non-text EXPLAIN always show properties?

От
James Coleman
Дата:
On Thu, Jun 25, 2020 at 12:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Thu, Jun 25, 2020 at 8:42 AM James Coleman <jtc331@gmail.com> wrote:
> >> Yesterday I'd replied [1] to Justin's proposal for this WRT
> >> incremental sort and expressed my opinion that including both
> >> unnecessarily (i.e., including disk when an in-memory sort was used)
> >> is undesirable and confusing and leads to shortcuts I believe to be
> >> bad habits when using the data programmatically.
>
> > +1.
>
> I think the policy about non-text output formats is "all applicable
> fields should be included automatically".  But the key word there is
> "applicable".  Are disk-sort numbers applicable when no disk sort
> happened?
>
> I think the right way to think about this is that we are building
> an output data structure according to a schema that should be fixed
> for any particular plan shape.  If event X happened zero times in
> a given execution, but it could have happened in a different execution
> of the same plan, then we should print X with a zero count.  If X
> could not happen period in this plan, we should omit X's entry.
>
> So the real question here is whether the disk vs memory decision is
> plan time vs run time.  AFAIK it's run time, which leads me to think
> we ought to print the zeroes.

Do we print zeroes for memory usage when all sorts ended up spilling
to disk then? That might be the current behavior; I'd have to check.
Because that's a lie, but we don't have any better information
currently (which is unfortunate, but hardly in scope for fixing here.)

James



Re: Open Item: Should non-text EXPLAIN always show properties?

От
Tom Lane
Дата:
James Coleman <jtc331@gmail.com> writes:
> On Thu, Jun 25, 2020 at 12:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think the right way to think about this is that we are building
>> an output data structure according to a schema that should be fixed
>> for any particular plan shape.  If event X happened zero times in
>> a given execution, but it could have happened in a different execution
>> of the same plan, then we should print X with a zero count.  If X
>> could not happen period in this plan, we should omit X's entry.

> Do we print zeroes for memory usage when all sorts ended up spilling
> to disk then?

I did not claim that the pre-existing code adheres to this model
completely faithfully ;-).  But we ought to have a clear mental
picture of what it is we're trying to achieve.  If you don't like
the above design, propose a different one.

            regards, tom lane



Re: Open Item: Should non-text EXPLAIN always show properties?

От
David Rowley
Дата:
On Fri, 26 Jun 2020 at 04:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Thu, Jun 25, 2020 at 8:42 AM James Coleman <jtc331@gmail.com> wrote:
> >> Yesterday I'd replied [1] to Justin's proposal for this WRT
> >> incremental sort and expressed my opinion that including both
> >> unnecessarily (i.e., including disk when an in-memory sort was used)
> >> is undesirable and confusing and leads to shortcuts I believe to be
> >> bad habits when using the data programmatically.
>
> > +1.
>
> I think the policy about non-text output formats is "all applicable
> fields should be included automatically".  But the key word there is
> "applicable".  Are disk-sort numbers applicable when no disk sort
> happened?
>
> I think the right way to think about this is that we are building
> an output data structure according to a schema that should be fixed
> for any particular plan shape.  If event X happened zero times in
> a given execution, but it could have happened in a different execution
> of the same plan, then we should print X with a zero count.  If X
> could not happen period in this plan, we should omit X's entry.
>
> So the real question here is whether the disk vs memory decision is
> plan time vs run time.  AFAIK it's run time, which leads me to think
> we ought to print the zeroes.

I think that's a pretty good way of thinking about it.

For the HashAgg case, the plan could end up spilling, so based on what
you've said, we should be printing those zeros as some other execution
of the same plan could spill.

If nobody objects to that very soon, then I'll go ahead and push the
changes for HashAgg's non-text EXPLAIN ANALYZE

David



Re: Open Item: Should non-text EXPLAIN always show properties?

От
David Rowley
Дата:
On Thu, 25 Jun 2020 at 21:15, David Rowley <dgrowleyml@gmail.com> wrote:
> I've attached a small patch that changes the Hash Aggregate behaviour
> to always show these properties for non-text formats.

I've pushed this change for HashAgg only and marked the open item as
completed for hash agg.  I'll leave it up to Justin, Tomas and James
to decide what to do with the incremental sort EXPLAIN open item.

David



Re: Open Item: Should non-text EXPLAIN always show properties?

От
Justin Pryzby
Дата:
On Thu, Jun 25, 2020 at 08:41:43AM -0400, James Coleman wrote:
> On Thu, Jun 25, 2020 at 5:15 AM David Rowley <dgrowleyml@gmail.com> wrote:
> > Over on [1] Justin mentions that the non-text EXPLAIN ANALYZE should
> > always show the "Disk Usage" and "HashAgg Batches" properties.  I
> > agree with this. show_wal_usage() is a good example of how we normally
> > do things.  We try to keep the text format as humanly readable as
> > possible but don't really expect humans to be commonly reading the
> > other supported formats, so we care less about including additional
> > details there.
> >
> > There's also an open item regarding this for Incremental Sort, so I've
> > CC'd James and Tomas here. This seems like a good place to discuss
> > both.
> 
> Yesterday I'd replied [1] to Justin's proposal for this WRT
> incremental sort and expressed my opinion that including both
> unnecessarily (i.e., including disk when an in-memory sort was used)
> is undesirable and confusing and leads to shortcuts I believe to be
> bad habits when using the data programmatically.

I have gone back and forth about this.

The current non-text output for Incremental Sort is like:

       Sort Methods Used:            +
         - "quicksort"               +
       Sort Space Memory:            +
         Average Sort Space Used: 26 +
         Peak Sort Space Used: 26    +

explain.c determines whether to output in non-text mode by checking:
| if (groupInfo->maxDiskSpaceUsed > 0)

Which I think is per se wrong.  Either it should use a test like:
| if (groupInfo->sortMethods & SORT_TYPE_QUICKSORT != 0)
or it should output the "Sort Space" unconditionally.

It does seem wrong if Incr Sort says "Sort Space Disk / Average: 0, Peak: 0"
when there was no disk sort at all, and it wasn't listed as a "Sort Method".

On the other hand, that's determined during execution, right?  (Based on things
like table content and table order and tuple width) ?  David's argument in
making the HashAgg's explain output unconditionally show Disk/Batch was that
this is not known until execution time (based on table content).

HashAgg now shows:

SET work_mem='64 MB'; explain(format yaml, analyze) SELECT a ,COUNT(1) FROM generate_series(1,99999)a GROUP BY 1;
...
     Disk Usage: 0                       +
     HashAgg Batches: 0                  +

So I think I still think incr sort should do the same, showing Disk:0.

Otherwise, I think it should at least use a test like this, rather than (DiskSpaceUsed > 0):
| if (groupInfo->sortMethods & SORT_TYPE_QUICKSORT != 0)

-- 
Justin