Re: explain_regress, explain(MACHINE), and default to explain(BUFFERS) (was: BUFFERS enabled by default in EXPLAIN (ANALYZE))

Поиск
Список
Период
Сортировка
От David G. Johnston
Тема Re: explain_regress, explain(MACHINE), and default to explain(BUFFERS) (was: BUFFERS enabled by default in EXPLAIN (ANALYZE))
Дата
Msg-id CAKFQuwbnT_B9i4NFWXX0V=xdcsTi0XTHvm6NVcr8XHi39O_A7w@mail.gmail.com
обсуждение исходный текст
Ответ на explain_regress, explain(MACHINE), and default to explain(BUFFERS) (was: BUFFERS enabled by default in EXPLAIN (ANALYZE))  (Justin Pryzby <pryzby@telsasoft.com>)
Ответы Re: explain_regress, explain(MACHINE), and default to explain(BUFFERS) (was: BUFFERS enabled by default in EXPLAIN (ANALYZE))  (Justin Pryzby <pryzby@telsasoft.com>)
Список pgsql-hackers
On Mon, Jan 24, 2022 at 9:54 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
I'm renaming this thread for better visibility, since buffers is a small,
optional part of the patches I sent.

I made a CF entry here.
https://commitfest.postgresql.org/36/3409/

On Wed, Dec 01, 2021 at 06:58:20PM -0600, Justin Pryzby wrote:
> On Mon, Nov 15, 2021 at 01:09:54PM -0600, Justin Pryzby wrote:
> > Some time ago, I had a few relevant patches:
> > 1) add explain(REGRESS) which is shorthand for (BUFFERS OFF, TIMING OFF, COSTS OFF, SUMMARY OFF)
> > 2) add explain(MACHINE) which elides machine-specific output from explain;
> >    for example, Heap Fetches, sort spaceUsed, hash nbuckets, and tidbitmap stuff.
> >
> > https://www.postgresql.org/message-id/flat/20200306213310.GM684@telsasoft.com
>
> The attached patch series now looks like this (some minor patches are not
> included in this list):
>
>  1. add GUC explain_regress, which disables TIMING, SUMMARY, COSTS;
 
>  2. enable explain(BUFFERS) by default (but disabled by explain_regress);

+1
 
>  3. Add explain(MACHINE) - which is disabled by explain_regress.
>     This elides various machine-specific output like Memory and Disk use.
>     Maybe it should be called something else like "QUIET" or "VERBOSE_MINUS_1"
>     or ??

INCIDENTALS ?

Aside from being 4 syllables and, for me at least, a pain to remember how to spell, it seems to fit.

RUNTIME is probably easier, and we just need to point out the difficulty in naming things since actual rows is still shown, and timings have their own independent option.

>
> The regression tests now automatically run with explain_regress=on, which is
> shorthand for TIMING OFF, SUMMARY OFF, COSTS OFF, and then BUFFERS OFF.
>
> There's a further option called explain(MACHINE) which can be disabled to hide
> portions of the output that are unstable, like Memory/Disk/Batches/
> Heap Fetches/Heap Blocks.  This allows "explain analyze" to be used more easily
> in regression tests, and simplifies some existing tests that currently use
> plpgsql functions to filter the output.  But it doesn't handle all the
> variations from parallel workers.
>
> (3) is optional, but simplifies some regression tests.  The patch series could
> be rephrased with (3) first.

I like the idea of encapsulating the logic about what to show into the generation code instead of a post-processing routine. 

I don't see any particular ordering of the three changes being most clear.

>
> Unfortunately, "COSTS OFF" breaks postgres_fdw remote_estimate.  If specifying
> "COSTS ON" in postgres_fdw.c is considered to be a poor fix

Given that we have version tests scattered throughout the postgres_fdw.c code I'd say protect it with version >= 9.0 and call it good.

But I'm assuming that we are somehow also sending over the GUC to the remote server (which will be v16+) but I am unsure why that would be the case.  I'm done code reading for now so I'm leaving this an open request/question.

, then I suppose
> this patch series could do as suggested and enable buffers by default only when
> ANALYZE is specified.  Then postgres_fdw is not affected, and the
> explain_regress GUC is optional: instead, we'd need to specify BUFFERS OFF in
> ~100 regression tests which use EXPLAIN ANALYZE.
I'm not following the transition from the prior sentences about COSTS to this one regarding BUFFERS.

  (3) still seems useful on its
> own.

It is an orthogonal feature with a different section of our user base being catered to, so I'd agree by default.

Patch Review Comments:


The following change in the machine commit seems out-of-place; are we fixing a bug here?

explain.c:
   /* Show buffer usage in planning */
- if (bufusage)
+ if (bufusage && es->buffers)



Typo (trailing comment // without comment):

ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); //


I did a pretty thorough look-through of the locations of the various changes and everything seems consistent with the established code in this area (took me a bit to get to the level of comprehension though).  Whether there may be something missing I'm less able to judge.

From reading the other main thread I'm not finding this particular choice of GUC usage to be a problem anyone has raised and it does seem the cleanest solution, both for us and for anyone out there with a similar testing framework.

The machine output option covers an obvious need since we've already written ad-hoc parsing code to deal with the problem.

And, as someone who sees first-hand the kinds of conversations that occur surrounding beginner's questions, and getting more into performance questions specifically, I'm sympathetic with the desire to have BUFFERS something that is output by default.  Given existing buy-in for that idea I'd hope that at minimum the "update 100 .sql and expected output files, then change the default" commit happens even if the rest take some further discussion.  Though with even just a bit of attention I don't see any real problems getting the whole thing resolved one way or another.

David J.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Следующее
От: David Rowley
Дата:
Сообщение: Re: Add proper planner support for ORDER BY / DISTINCT aggregates