Re: explain HashAggregate to report bucket and memory stats

From: Justin Pryzby
Subject: Re: explain HashAggregate to report bucket and memory stats
Date:
Msg-id: 20200306213310.GM684@telsasoft.com
In reply to: Re: explain HashAggregate to report bucket and memory stats  (Andres Freund <andres@anarazel.de>)
Responses: Re: explain HashAggregate to report bucket and memory stats  (Andres Freund <andres@anarazel.de>)
Re: explain HashAggregate to report bucket and memory stats  (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-hackers
On Fri, Mar 06, 2020 at 09:58:59AM -0800, Andres Freund wrote:
> > +            ExplainIndentText(es);
> > +            appendStringInfo(es->str,
> > +                        "Buckets: %ld (originally %ld)",
> > +                        inst->nbuckets,
> > +                        inst->nbuckets_original);
> 
> I'm not sure I like the alternative output formats here. All the other
> fields are separated with a comma, but the original size is in
> parens. I'd probably just format it as "Buckets: %lld " and then add
> ", Original Buckets: %lld" when differing.

It's done that way for consistency with hashJoin in show_hash_info().
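For illustration, the two styles under discussion would render roughly like this (hypothetical bucket counts):

```
-- parenthesized style, matching show_hash_info() for Hash Join:
Buckets: 1024 (originally 256)

-- comma-separated style Andres suggests, shown only when differing:
Buckets: 1024, Original Buckets: 256
```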

> > +/* Update instrumentation stats */
> > +void
> > +UpdateTupleHashTableStats(TupleHashTable hashtable, bool initial)
> > +{
> > +    hashtable->instrument.nbuckets = hashtable->hashtab->size;
> > +    if (initial)
> > +    {
> > +        hashtable->instrument.nbuckets_original = hashtable->hashtab->size;
> > +        hashtable->instrument.space_peak_hash = hashtable->hashtab->size *
> > +            sizeof(TupleHashEntryData);
> > +        hashtable->instrument.space_peak_tuples = 0;
> > +    }
> > +    else
> > +    {
> > +#define maxself(a,b) a=Max(a,b)
> > +        /* hashtable->entrysize includes additionalsize */
> > +        maxself(hashtable->instrument.space_peak_hash,
> > +                hashtable->hashtab->size * sizeof(TupleHashEntryData));
> > +        maxself(hashtable->instrument.space_peak_tuples,
> > +                hashtable->hashtab->members * hashtable->entrysize);
> > +#undef maxself
> > +    }
> > +}
> 
> Not a fan of this macro.
> 
> I'm also not sure I understand what you're trying to do here?

I have to call UpdateTupleHashTableStats() from the callers at deliberate
locations.  If the caller fills the hashtable all at once, I can populate the
stats immediately after that; but if it's populated incrementally, then I need
to update the stats right before the table is destroyed or reset.  Otherwise we
might report the tuple size of the hashtable since its most recent reset,
rather than that of a larger, previous incarnation.

> > diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
> > index f457b5b150..b173b32cab 100644
> > --- a/src/test/regress/expected/aggregates.out
> > +++ b/src/test/regress/expected/aggregates.out
> > @@ -517,10 +517,11 @@ order by 1, 2;
> >           ->  HashAggregate
> >                 Output: s2.s2, sum((s1.s1 + s2.s2))
> >                 Group Key: s2.s2
> > +               Buckets: 4
> >                 ->  Function Scan on pg_catalog.generate_series s2
> >                       Output: s2.s2
> >                       Function Call: generate_series(1, 3)
> > -(14 rows)
> > +(15 rows)
> 
> These tests probably won't be portable. The number of hash buckets
> calculated will e.g. depend on the size of the contained elements, and
> that will e.g. depend on whether pointers are 4 or 8 bytes.

I was aware and afraid of that.  Previously, I added this output only to
"explain analyze", and (as a quick, interim implementation) changed various
tests to use analyze, with memory shown only in "verbose" mode.  But as Tomas
pointed out, that's consistent with what's done elsewhere.

So is the solution to show stats only during explain ANALYZE ?

Or ... I have a patch to create a new explain(MACHINE) option to allow more
stable output by avoiding Memory/Disk.  That doesn't attempt to make all
"explain analyze" output stable - there are other issues, I think mostly
related to parallel workers (see 4ea03f3f, 13e8b2ee).  But it does allow
retiring explain_sq_limit and explain_parallel_sort_stats.  I'm including my
patch to show what I mean, but I didn't enable it for hashtable "Buckets:".  I
guess in either case, the tests shouldn't be included.

-- 
Justin

Attachments
