Обсуждение: Open Item: Should non-text EXPLAIN always show properties?
Over on [1] Justin mentions that the non-text EXPLAIN ANALYZE should always show the "Disk Usage" and "HashAgg Batches" properties. I agree with this. show_wal_usage() is a good example of how we normally do things. We try to keep the text format as humanly readable as possible but don't really expect humans to be commonly reading the other supported formats, so we care less about including additional details there. There's also an open item regarding this for Incremental Sort, so I've CC'd James and Tomas here. This seems like a good place to discuss both. I've attached a small patch that changes the Hash Aggregate behaviour to always show these properties for non-text formats. Does anyone object to this? David [1] https://www.postgresql.org/message-id/20200619040624.GA17995%40telsasoft.com
Вложения
On Thu, Jun 25, 2020 at 5:15 AM David Rowley <dgrowleyml@gmail.com> wrote: > > Over on [1] Justin mentions that the non-text EXPLAIN ANALYZE should > always show the "Disk Usage" and "HashAgg Batches" properties. I > agree with this. show_wal_usage() is a good example of how we normally > do things. We try to keep the text format as humanly readable as > possible but don't really expect humans to be commonly reading the > other supported formats, so we care less about including additional > details there. > > There's also an open item regarding this for Incremental Sort, so I've > CC'd James and Tomas here. This seems like a good place to discuss > both. Yesterday I'd replied [1] to Justin's proposal for this WRT incremental sort and expressed my opinion that including both unnecessarily (i.e., including disk when an in-memory sort was used) is undesirable and confusing and leads to shortcuts I believe to be bad habits when using the data programmatically. On a somewhat related note, memory can be 0 but that doesn't mean no memory was used: it's a result of how tuplesort.c doesn't properly track memory usage when it switches to disk sort. The same isn't true in reverse (we don't have 0 disk when disk was used), but I do think it does show the idea that showing "empty" data isn't an inherent good. If there's a clear established pattern and/or most others seem to prefer Justin's proposed approach, then I'm not going to fight it hard. I just don't think it's the best approach. James [1] https://www.postgresql.org/message-id/CAAaqYe-LswZFUL4k5Dr6%3DEN6MJG1HurggcH4QzUs6UFqBbnQzQ%40mail.gmail.com
On Thu, Jun 25, 2020 at 8:42 AM James Coleman <jtc331@gmail.com> wrote: > Yesterday I'd replied [1] to Justin's proposal for this WRT > incremental sort and expressed my opinion that including both > unnecessarily (i.e., including disk when an in-memory sort was used) > is undesirable and confusing and leads to shortcuts I believe to be > bad habits when using the data programmatically. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Jun 25, 2020 at 8:42 AM James Coleman <jtc331@gmail.com> wrote: >> Yesterday I'd replied [1] to Justin's proposal for this WRT >> incremental sort and expressed my opinion that including both >> unnecessarily (i.e., including disk when an in-memory sort was used) >> is undesirable and confusing and leads to shortcuts I believe to be >> bad habits when using the data programmatically. > +1. I think the policy about non-text output formats is "all applicable fields should be included automatically". But the key word there is "applicable". Are disk-sort numbers applicable when no disk sort happened? I think the right way to think about this is that we are building an output data structure according to a schema that should be fixed for any particular plan shape. If event X happened zero times in a given execution, but it could have happened in a different execution of the same plan, then we should print X with a zero count. If X could not happen period in this plan, we should omit X's entry. So the real question here is whether the disk vs memory decision is plan time vs run time. AFAIK it's run time, which leads me to think we ought to print the zeroes. regards, tom lane
On Thu, Jun 25, 2020 at 12:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > > On Thu, Jun 25, 2020 at 8:42 AM James Coleman <jtc331@gmail.com> wrote: > >> Yesterday I'd replied [1] to Justin's proposal for this WRT > >> incremental sort and expressed my opinion that including both > >> unnecessarily (i.e., including disk when an in-memory sort was used) > >> is undesirable and confusing and leads to shortcuts I believe to be > >> bad habits when using the data programmatically. > > > +1. > > I think the policy about non-text output formats is "all applicable > fields should be included automatically". But the key word there is > "applicable". Are disk-sort numbers applicable when no disk sort > happened? > > I think the right way to think about this is that we are building > an output data structure according to a schema that should be fixed > for any particular plan shape. If event X happened zero times in > a given execution, but it could have happened in a different execution > of the same plan, then we should print X with a zero count. If X > could not happen period in this plan, we should omit X's entry. > > So the real question here is whether the disk vs memory decision is > plan time vs run time. AFAIK it's run time, which leads me to think > we ought to print the zeroes. Do we print zeroes for memory usage when all sorts ended up spilling to disk then? That might be the current behavior; I'd have to check. Because that's a lie, but we don't have any better information currently (which is unfortunate, but hardly in scope for fixing here.) James
James Coleman <jtc331@gmail.com> writes: > On Thu, Jun 25, 2020 at 12:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think the right way to think about this is that we are building >> an output data structure according to a schema that should be fixed >> for any particular plan shape. If event X happened zero times in >> a given execution, but it could have happened in a different execution >> of the same plan, then we should print X with a zero count. If X >> could not happen period in this plan, we should omit X's entry. > Do we print zeroes for memory usage when all sorts ended up spilling > to disk then? I did not claim that the pre-existing code adheres to this model completely faithfully ;-). But we ought to have a clear mental picture of what it is we're trying to achieve. If you don't like the above design, propose a different one. regards, tom lane
On Fri, 26 Jun 2020 at 04:33, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > > On Thu, Jun 25, 2020 at 8:42 AM James Coleman <jtc331@gmail.com> wrote: > >> Yesterday I'd replied [1] to Justin's proposal for this WRT > >> incremental sort and expressed my opinion that including both > >> unnecessarily (i.e., including disk when an in-memory sort was used) > >> is undesirable and confusing and leads to shortcuts I believe to be > >> bad habits when using the data programmatically. > > > +1. > > I think the policy about non-text output formats is "all applicable > fields should be included automatically". But the key word there is > "applicable". Are disk-sort numbers applicable when no disk sort > happened? > > I think the right way to think about this is that we are building > an output data structure according to a schema that should be fixed > for any particular plan shape. If event X happened zero times in > a given execution, but it could have happened in a different execution > of the same plan, then we should print X with a zero count. If X > could not happen period in this plan, we should omit X's entry. > > So the real question here is whether the disk vs memory decision is > plan time vs run time. AFAIK it's run time, which leads me to think > we ought to print the zeroes. I think that's a pretty good way of thinking about it. For the HashAgg case, the plan could end up spilling, so based on what you've said, we should be printing those zeros as some other execution of the same plan could spill. If nobody objects to that very soon, then I'll go ahead and push the changes for HashAgg's non-text EXPLAIN ANALYZE David
On Thu, 25 Jun 2020 at 21:15, David Rowley <dgrowleyml@gmail.com> wrote: > I've attached a small patch that changes the Hash Aggregate behaviour > to always show these properties for non-text formats. I've pushed this change for HashAgg only and marked the open item as completed for hash agg. I'll leave it up to Justin, Tomas and James to decide what to do with the incremental sort EXPLAIN open item. David
On Thu, Jun 25, 2020 at 08:41:43AM -0400, James Coleman wrote: > On Thu, Jun 25, 2020 at 5:15 AM David Rowley <dgrowleyml@gmail.com> wrote: > > Over on [1] Justin mentions that the non-text EXPLAIN ANALYZE should > > always show the "Disk Usage" and "HashAgg Batches" properties. I > > agree with this. show_wal_usage() is a good example of how we normally > > do things. We try to keep the text format as humanly readable as > > possible but don't really expect humans to be commonly reading the > > other supported formats, so we care less about including additional > > details there. > > > > There's also an open item regarding this for Incremental Sort, so I've > > CC'd James and Tomas here. This seems like a good place to discuss > > both. > > Yesterday I'd replied [1] to Justin's proposal for this WRT > incremental sort and expressed my opinion that including both > unnecessarily (i.e., including disk when an in-memory sort was used) > is undesirable and confusing and leads to shortcuts I believe to be > bad habits when using the data programmatically. I have gone back and forth about this. The current non-text output for Incremental Sort is like: Sort Methods Used: + - "quicksort" + Sort Space Memory: + Average Sort Space Used: 26 + Peak Sort Space Used: 26 + explain.c determines whether to output in non-text mode by checking: | if (groupInfo->maxDiskSpaceUsed > 0) Which I think is per se wrong. Either it should use a test like: | if (groupInfo->sortMethods & SORT_TYPE_QUICKSORT != 0) or it should output the "Sort Space" unconditionally. It does seem wrong if Incr Sort says "Sort Space Disk / Average: 0, Peak: 0" when there was no disk sort at all, and it wasn't listed as a "Sort Method". On the other hand, that's determined during execution, right? (Based on things like table content and table order and tuple width) ? David's argument in making the HashAgg's explain output unconditionally show Disk/Batch was that this is not known until execution time (based on table content). HashAgg now shows: SET work_mem='64 MB'; explain(format yaml, analyze) SELECT a ,COUNT(1) FROM generate_series(1,99999)a GROUP BY 1; ... Disk Usage: 0 + HashAgg Batches: 0 + So I think I still think incr sort should do the same, showing Disk:0. Otherwise, I think it should at least use a test like this, rather than (DiskSpaceUsed > 0): | if (groupInfo->sortMethods & SORT_TYPE_QUICKSORT != 0) -- Justin