Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

Поиск

Список

Период

Сортировка

От	David Rowley
Тема	Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
Дата	20 марта 13:37:35
Msg-id	CAApHDvpq2gOt3gnzkd4Ud=wrT7YRGrF3zb29jgWzDsYEhETrOA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment (Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>)
Ответы	Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
Список	pgsql-hackers

Дерево обсуждения

On Thu, 20 Mar 2025 at 21:48, Ilia Evdokimov
<ilya.evdokimov@tantorlabs.com> wrote:
>          ->  Memoize  (cost=0.30..0.41 rows=1 width=4)
>                Cache Key: t2.thousand
>                Cache Mode: logical
>                Cache Estimated Entries: 655
>                Cache Estimated NDistinct: 721
>                ->  Index Only Scan using tenk1_unique1 on tenk1 t1  (cost=0.29..0.40 rows=1 width=4)
>                      Index Cond: (unique1 = t2.thousand)
> (11 rows)
>
> Additionally, since this information would only be shown in EXPLAIN when costs are enabled, it should not cause any
performanceregression in normal execution. However, reviewers should be especially careful when verifying test outputs,
asthis change could affect plan details in regression tests. 
>
> Any thoughts?

> + ExplainIndentText(es);
> + appendStringInfo(es->str, "Cache Estimated Entries: %d\n", ((Memoize *) plan)->est_entries);
> + ExplainIndentText(es);
> + appendStringInfo(es->str, "Cache Estimated NDistinct: %0.0f\n", ((Memoize *) plan)->ndistinct);

est_entries is a uint32, so %u is the correct format character for that type.

I don't think you need to prefix all these properties with "Cache"
just because the other two properties have that prefix. I also don't
think the names you've chosen really reflect the meaning. How about
something like: "Estimated Distinct Lookup Keys: 721  Estimated
Capacity: 655", in that order. I think maybe having that as one line
for format=text is better than 2 lines. The EXPLAIN output is already
often taking up more lines in v18 than in v17, would be good to not
make that even worse unnecessarily.

I see the existing code there could use ExplainPropertyText rather
than have a special case for text and non-text formats. That's likely
my fault. If we're touching this code, then we should probably tidy
that up. Do you want to create a precursor fixup patch for that?

+ double ndistinct; /* Estimated number of distinct memoization keys,
+ * used for cache size evaluation. Kept for EXPLAIN */

Maybe this field in MemoizePath needs a better name. How about
"est_unique_keys"? and also do the rename in struct Memoize.

I'm also slightly concerned about making struct Memoize bigger. I had
issues with a performance regression [1] for 908a96861 when increasing
the WindowAgg struct size last year and the only way I found to make
it go away was to shuffle the fields around so that the struct size
didn't increase. I think we'll need to see a benchmark of a query that
hits Memoize quite hard with a small cache size to see if the
performance decreases as a result of adding the ndistinct field. It's
unfortunate that we'll not have the luxury of squeezing this double
into padding if we do see a slowdown.

David

[1] https://postgr.es/m/flat/CAHoyFK9n-QCXKTUWT_xxtXninSMEv%2BgbJN66-y6prM3f4WkEHw%40mail.gmail.com

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment