Обсуждение: track generic and custom plans in pg_stat_statements

Поиск
Список
Период
Сортировка

track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
Hi,

Currently, there is little visibility for queries that are being executed
using generic or custom plans. There is pg_prepared_statements which
show generic_plans and custom_plans as of d05b172a760, but this
information is backend local and not very useful to a DBA that wishes
to track this information cumulatively and over time. [0] had intentions
to add these counters to pg_stat_statements, but was withdrawn due
to timing with the commitfest at the time [0] and never picked back up again.

I think it's useful to add these counters.

Therefore, the attached patch adds two new columns to pg_stat_statements:
"generic_plan_calls" and "custom_plan_calls". These columns track the
number of executions performed using a generic or custom plan.

Only non-utility statements are counted, as utility statements with an
optimizable parameterized query (i.e. CTAS) cannot be called with PREPARE.
Additionally, when such statements are repeatedly executed using an extended
protocol prepared statement, pg_stat_statements may not handle them properly,
since queryId is set to 0 during pgss_ProcessUtility.

To avoid introducing two additional counters in CachedPlan, the existing
boolean is_reusable—which determines whether a generic plan is reused to
manage lock requirements—has been repurposed as an enum. This enum now
tracks different plan states, including "generic reused", "generic first time"
and "custom". pg_stat_statements uses these states to differentiate between
generic and custom plans for tracking purposes. ( I felt this was better than
having to carry 2 extra booleans in CachedPlan for this purpose, but that will
be the alternative approach ).

Not included in this patch and maybe for follow-up work, is this information
can be added to EXPLAIN output and perhaps pg_stat_database. Maybe that's
a good idea also?

This patch bumps the version pf pg_stat_statements.

--
Sami Imseih
Amazon Web Services (AWS)


[0] https://www.postgresql.org/message-id/add1e591fbe8874107e75d04328859ec%40oss.nttdata.com

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Greg Sabino Mullane
Дата:
Overall I like the idea; adds some nice visibility to something that has been very ephemeral in the past.

Not included in this patch and maybe for follow-up work, is this information a good idea also?
can be added to EXPLAIN output and perhaps pg_stat_database.

I could see EXPLAIN being somewhat useful (especially for non-interactive things like auto_explain), so a weak +1 on that.

Definitely not useful for pg_stat_database IMHO.

Some quick comments on the patch:

FILE: contrib/pg_stat_statements/expected/plan_cache.out

These tests seem very verbose. Why not just prepare a simple query:

prepare foo as select $1 > 0;
execute foo(1);
...then tweak things via plan_cache_mode to ensure we test the right things?

Also would be nice to constrain the pg_stat_statement SELECTs with some careful WHERE clauses. Would also allow you to remove the ORDER BY and the COLLATE.


FILE: contrib/pg_stat_statements/meson.build

oldextversions should still have a trailing comma


FILE: contrib/pg_stat_statements/pg_stat_statements.c

+ if (cplan && cplan->status > PLAN_CACHE_STATUS_CUSTOM_PLAN)

Not really comfortable with using the enum like this. Better would be explicitly listing the two states that lead to the increment. Not as good but still better:
  cplan->status >= PLAN_CACHE_STATUS_GENERIC_PLAN_BUILD

+ PlanCacheStatus status; /* is it a reused generic plan? */

The comment should be updated


FILE: contrib/pg_stat_statements/sql/plan_cache.sql

Missing a newline at the end


FILE: doc/src/sgml/pgstatstatements.sgml

+ Total number of non-utility statements executed using a generic plan

I'm not sure we need to specify non-utility here.


Cheers,
Greg

--
Enterprise Postgres Software Products & Tech Support

Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
Thanks for the review!

> I could see EXPLAIN being somewhat useful (especially for non-interactive things like auto_explain), so a weak +1 on
that.

I'll make this a follow-up to this patch.

> Definitely not useful for pg_stat_database IMHO.

agree as well. I did not have strong feelings about this.

> FILE: contrib/pg_stat_statements/expected/plan_cache.out
>
> These tests seem very verbose. Why not just prepare a simple query:
>
> prepare foo as select $1 > 0;
> execute foo(1);
> ...then tweak things via plan_cache_mode to ensure we test the right things?

Yeah, I went overboard to see what others think.
I toned it down drastically for v2 following your advice.


> oldextversions should still have a trailing comma

done


> FILE: contrib/pg_stat_statements/pg_stat_statements.c
>
> + if (cplan && cplan->status > PLAN_CACHE_STATUS_CUSTOM_PLAN)
>
> Not really comfortable with using the enum like this. Better would be explicitly listing the two states that lead to
theincrement. Not as good but still better:
 
>   cplan->status >= PLAN_CACHE_STATUS_GENERIC_PLAN_BUILD

fair, I use explicit values for each plan type.

> + PlanCacheStatus status; /* is it a reused generic plan? */
>
> The comment should be updated

done

>
> FILE: contrib/pg_stat_statements/sql/plan_cache.sql
>
> Missing a newline at the end

done

> FILE: doc/src/sgml/pgstatstatements.sgml
>
> + Total number of non-utility statements executed using a generic plan
>
> I'm not sure we need to specify non-utility here.
>

fair, I did not have strong feeling about this either.

Please see v2

Thanks!

--
Sami Imseih
Amazon Web Services (AWS)

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
>
> Please see v2
>

oops, had to fix a typo in meson.build. Please see v3.


--
Sami

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Ilia Evdokimov
Дата:
Hi,

Thank you for your patch. It is really useful for tracking the history 
of generic and custom plan usage.

At first glance, I have the following suggestions for improvement:

1. Is there any reason for the double check of cplan != NULL? It seems 
unnecessary, and we could simplify it to:

-if (cplan && cplan->status == PLAN_CACHE_STATUS_CUSTOM_PLAN)
+if (cplan->status == PLAN_CACHE_STATUS_CUSTOM_PLAN)

2. Should we add Assert(kind == PGSS_EXEC) at this place  to ensure that 
generic_plan_calls and custom_plan_calls are only incremented when 
appropriate?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.




Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> Thank you for your patch. It is really useful for tracking the history
> of generic and custom plan usage.

Thanks for the review!

> 1. Is there any reason for the double check of cplan != NULL? It seems
> unnecessary, and we could simplify it to:
>
> -if (cplan && cplan->status == PLAN_CACHE_STATUS_CUSTOM_PLAN)
> +if (cplan->status == PLAN_CACHE_STATUS_CUSTOM_PLAN)

No, it's not necessary and an oversight. removed.

> 2. Should we add Assert(kind == PGSS_EXEC) at this place  to ensure that
> generic_plan_calls and custom_plan_calls are only incremented when
> appropriate?
>

I don't think an assert is needed here. There is an assert at the start of
the block for PGSS_EXEC and PGSS_PLAN, but cplan is only available
in the executor.

v4 attached

--
Sami

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Ilia Evdokimov
Дата:


On 06.03.2025 18:04, Sami Imseih wrote:
2. Should we add Assert(kind == PGSS_EXEC) at this place  to ensure that
generic_plan_calls and custom_plan_calls are only incremented when
appropriate?


I don't think an assert is needed here. There is an assert at the start of
the block for PGSS_EXEC and PGSS_PLAN, but cplan is only available
in the executor.


You're right! Moreover, I didn't account for the fact that we pass NULL to pgss_ProcessUtility. In that case, Assert shouldn't be here.

I don't quite understand why do we need to differentiate between PLAN_CACHE_STATUS_GENERIC_PLAN_BUILD and PLAN_CACHE_STATUS_GENERIC_PLAN_REUSE? We could simply keep PLAN_CACHE_STATUS_GENERIC_PLAN_REUSE. I don't think users would see much of a difference in either pg_stat_statements or EXPLAIN.

As for EXPLAIN, maybe we should include this in VERBOSE mode?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> I don't quite understand why do we need to differentiate between
> PLAN_CACHE_STATUS_GENERIC_PLAN_BUILD and
> PLAN_CACHE_STATUS_GENERIC_PLAN_REUSE?
> We could simply keep PLAN_CACHE_STATUS_GENERIC_PLAN_REUSE.
> I don't think users would see much of a difference in either pg_stat_statements or EXPLAIN.

If we removed GENERIC_PLAN_BUILD, I suppose we can simply rely on
CPlan != NULL && cplan->status != PLAN_CACHE_STATUS_CUSTOM_PLAN
to determine that we have a generic plan. However, I rather keep the status(s)
defined this way for clarity.

> As for EXPLAIN, maybe we should include this in VERBOSE mode?

This could be a fast follow-up patch as there appears to be support
for this idea.

--
Sami



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
rebased in the attached v5.

-- 
Sami Imseih
Amazon Web Services (AWS)

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Ilia Evdokimov
Дата:
After the introduction of pg_overexplain extension and Robert's comment 
[0], I'm starting to have doubts about whether it's still appropriate to 
add this information to EXPLAIN itself. If there are compelling reasons 
that showing the plan type would be broadly useful to users in EXPLAIN, 
I’m happy to proceed. Otherwise, perhaps this is something better suited 
for extension.

[0]: 
https://www.postgresql.org/message-id/CA%2BTgmoZ8qXiZmmn4P9Mk1cf2mjMMLFPOjSasCjuKSiHFcm-ncw%40mail.gmail.com

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.




Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:

After the introduction of pg_overexplain extension and Robert's comment
[0], I'm starting to have doubts about whether it's still appropriate to
add this information to EXPLAIN itself. If there are compelling reasons
that showing the plan type would be broadly useful to users in EXPLAIN,
I’m happy to proceed. Otherwise, perhaps this is something better suited
for extension.

[0]:
https://www.postgresql.org/message-id/CA%2BTgmoZ8qXiZmmn4P9Mk1cf2mjMMLFPOjSasCjuKSiHFcm-ncw%40mail.gmail.com

Yes, pg_overexplain can be used for the explain, and we
may not even need it at all since generic plans
are already displayed with $1, $2 parameters.

Let me know if you have other comments for v5, the 
pg_stat_statements enhancements.


Sami Imseih
Amazon Web Services (AWS)

Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> It turns out that 1722d5eb05d8e reverted 525392d5727f, which
> made CachedPlan available in QueryDesc and thus
> available to pgss_ExecutorEnd.

I also forgot to mention, the revert also removes the generic plan
is_reused logic:

```
bool            is_reused;              /* is it a reused generic plan? */
```
which we had to account for up until v5. So this simplifies the
tracking a bit more as the only states to track are "generic plan"
or "custom plan" only.

--
Sami Imseih
Amazon Web Services (AWS)



Re: track generic and custom plans in pg_stat_statements

От
Nikolay Samokhvalov
Дата:
On Thu, May 29, 2025 at 11:56 AM Sami Imseih <samimseih@gmail.com> wrote:
It turns out that 1722d5eb05d8e reverted 525392d5727f, which
made CachedPlan available in QueryDesc and thus
available to pgss_ExecutorEnd.

So now we have to make CachedPlan available to QueryDesc as
part of this change. The reason the patch was reverted is related
to a memory leak [0] in the BuildCachedPlan code and is not related
to the part that made CachedPlan available to QueryDesc.

See v6 for the rebase of the patch and addition of testing for EXPLAIN
and EXPLAIN ANALYZE which was missing from v5.

I reviewed v6:

- applies to master cleanly, builds, tests pass and all works as expected
- overall, the patch looks great and I found no major issues
- tests and docs look good overall
- in docs, one minor comment:
    > "Total number of statements executed using a generic plan" vs. what we already have for `calls`
    here, in "Number of times the statement was executed", I see some inconsistencies:
        - the word "total" should be removed, I think
        - and maybe we should make wording consistent with the existing text – "number of times the statement ...")
- Also very minor, the test queries have duplicate `calls` columns:
    > SELECT calls, generic_plan_calls, custom_plan_calls, toplevel, calls, ...
-  plan->status is set in GetCachedPlan() but I don't see explicit initialization in plan creation paths; maybe it's worth having a defensive initialization for possible edge cases:
    > plan->status = PLAN_CACHE_STATUS_UNKNOWN;
    (here I'm not 100% sure, as palloc0 in CreateCachedPlan should zero-initialize it to PLAN_CACHE_STATUS_UNKNOWN anyway)

that's all I could find – and overall it's a great addition,

thank you, looking forward to having these two columns in prod.

Nik

Re: track generic and custom plans in pg_stat_statements

От
Nikolay Samokhvalov
Дата:




On Sat, May 31, 2025 at 5:06 PM Nikolay Samokhvalov <nik@postgres.ai> wrote:
On Thu, May 29, 2025 at 11:56 AM Sami Imseih <samimseih@gmail.com> wrote:
It turns out that 1722d5eb05d8e reverted 525392d5727f, which
made CachedPlan available in QueryDesc and thus
available to pgss_ExecutorEnd.

So now we have to make CachedPlan available to QueryDesc as
part of this change. The reason the patch was reverted is related
to a memory leak [0] in the BuildCachedPlan code and is not related
to the part that made CachedPlan available to QueryDesc.

See v6 for the rebase of the patch and addition of testing for EXPLAIN
and EXPLAIN ANALYZE which was missing from v5.

I reviewed v6:

- applies to master cleanly, builds, tests pass and all works as expected
- overall, the patch looks great and I found no major issues
- tests and docs look good overall
- in docs, one minor comment:
    > "Total number of statements executed using a generic plan" vs. what we already have for `calls`
    here, in "Number of times the statement was executed", I see some inconsistencies:
        - the word "total" should be removed, I think
        - and maybe we should make wording consistent with the existing text – "number of times the statement ...")
- Also very minor, the test queries have duplicate `calls` columns:
    > SELECT calls, generic_plan_calls, custom_plan_calls, toplevel, calls, ...
-  plan->status is set in GetCachedPlan() but I don't see explicit initialization in plan creation paths; maybe it's worth having a defensive initialization for possible edge cases:
    > plan->status = PLAN_CACHE_STATUS_UNKNOWN;
    (here I'm not 100% sure, as palloc0 in CreateCachedPlan should zero-initialize it to PLAN_CACHE_STATUS_UNKNOWN anyway)

that's all I could find – and overall it's a great addition,

thank you, looking forward to having these two columns in prod.

Ah, one more thing: the subject here and in CommitFest entry, "track generic and custom plans" slightly confused me at first, I think it's worth including words "calls" or "execution" there, and in the commit message, for clarity. Or just including the both column names as is.

Nik

Re: track generic and custom plans in pg_stat_statements

От
Ilia Evdokimov
Дата:


On 03.06.2025 06:31, Sami Imseih wrote:
See v8 with the field names reorganized.


Thanks for your work on this.

Since we've changed the placement of these parameters between parallel_workers and stats_since, we should also update the documentation to reflect their new location.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Re: track generic and custom plans in pg_stat_statements

От
Ilia Evdokimov
Дата:


On 03.06.2025 06:31, Sami Imseih wrote:
See v8 with the field names reorganized.


Apologies if you received two identical emails.

Since we've changed the placement of these parameters between parallel_workers and stats_since, we should also update the documentation to reflect their new location.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> Thanks for your work on this.
>
> Since we've changed the placement of these parameters
> between parallel_workers and stats_since, we should also update
> the documentation to reflect their new location.

Absolutely. My miss. Here is v9 with the doc updated.

Thanks!

--
Sami

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Michael Paquier
Дата:
On Mon, Jun 30, 2025 at 02:45:49PM +0300, Sami Imseih wrote:
> Only changes to the tests due to the revert of nested query
> tracking in f85f6ab051b

@@ -35,6 +36,7 @@ typedef struct QueryDesc
     /* These fields are provided by CreateQueryDesc */
     CmdType        operation;        /* CMD_SELECT, CMD_UPDATE, etc. */
     PlannedStmt *plannedstmt;    /* planner's output (could be utility, too) */
+    CachedPlan *cplan;            /* CachedPlan that supplies the plannedstmt */

Ugh.  This is plugging into an executor-related structure a completely
different layer, so that looks like an invasive layer violation to
me..  This is passed through ProcessQuery() from a Portal, changing
while on it ExplainOnePlan.  If we want to get access from a cached
plan, wouldn't it be simpler to check if we have an active portal in
one of the executor hooks of PGSS and retrieve the status of the plan
from it?  Okay, that's perhaps a bit hack-ish, but it is less invasive
and it removes the dependency to the plan cache facilities from
QueryDesc.

Note: the size of the change in pg_stat_statements--1.12--1.13.sql
points that we should seriously consider splitting these attributes
into multiple sub-functions.  That would make future upgrades simpler,
and the main function simpler even if we are a bit more lossy when
doing scans of PGSS entries across multiple functions with joins based
on the query ID, database ID and toplevel at least, because that's
what I guess we would use as common point for all the sub-functions
when joining them.
--
Michael

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Andrei Lepikhov
Дата:
On 6/30/25 13:45, Sami Imseih wrote:
> rebased patch.
> 
> Only changes to the tests due to the revert of nested query
> tracking in f85f6ab051b
Thank you for your efforts.

I would like to share a few thoughts about this patch. First, I believe 
the 'status' field could be renamed to 'mode,' as it aligns better with 
the plan_cache_mode GUC name. Additionally, why introduce an unknown 
status? I can't foresee a scenario where it would be necessary, as we 
typically encounter either a custom or a generic plan during execution.

On a more general note, currently, CachedPlanSource does not refer to a 
custom version of the plan. However, it seems reasonable to implement 
this for better tracking. By adding a CachedPlanSource::cplan link, we 
can establish a connection to the entire PlanCache entry instead of only 
CachedPlan within a queryDesc and, consequently, make it accessible from 
the executor. This would give an access to statistics on costs and the 
number of replannings.

Furthermore, this could lead to interesting opportunities for 
extensions, such as resetting auto-mode decisions and planning 
statistics based on actual execution statistics, and manipulating 
generic/custom plan switching logic.

-- 
regards, Andrei Lepikhov



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> Ugh.  This is plugging into an executor-related structure a completely
> different layer, so that looks like an invasive layer violation to
> me..  This is passed through ProcessQuery() from a Portal, changing
> while on it ExplainOnePlan.  If we want to get access from a cached
> plan, wouldn't it be simpler to check if we have an active portal in
> one of the executor hooks of PGSS and retrieve the status of the plan
> from it?  Okay, that's perhaps a bit hack-ish, but it is less invasive
> and it removes the dependency to the plan cache facilities from
> QueryDesc.

I found that ActivePortal is to always "active" in ExecutorEnd for all cases.
Also, ActivePortal->cplan may not always be available at ExecutorStart.

I think we can rely on ActivePortal if we add a new field to portal which
tracks the cached plan status; i.e. we set ActivePortal->cache_plan_status
inside GetCachedPlan. Then in ExecutorStart, we read back this value and
store it in a new field in QueryDesc->estate. This will make the value
available to ExecutorEnd. I really don't want us making an extra pgss_store
call in ExecutorStart since it will add significant overhead.

What do you think about adding these couple of fields?

--
Sami



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> this for better tracking. By adding a CachedPlanSource::cplan link, we
> can establish a connection to the entire PlanCache entry instead of only
> CachedPlan within a queryDesc and, consequently, make it accessible from
> the executor. This would give an access to statistics on costs and the
> number of replannings.

This maybe out of scope for this patch, but can you elaborate on what you mean
by "CachedPlanSource::cplan link" ?

--
Sami



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:

> Ugh.  This is plugging into an executor-related structure a completely
> different layer, so that looks like an invasive layer violation to
> me..  This is passed through ProcessQuery() from a Portal, changing
> while on it ExplainOnePlan.  If we want to get access from a cached
> plan, wouldn't it be simpler to check if we have an active portal in
> one of the executor hooks of PGSS and retrieve the status of the plan
> from it?  Okay, that's perhaps a bit hack-ish, but it is less invasive
> and it removes the dependency to the plan cache facilities from
> QueryDesc.

I found that ActivePortal is to always "active" in ExecutorEnd for all cases.
Also, ActivePortal->cplan may not always be available at ExecutorStart.

I think we can rely on ActivePortal if we add a new field to portal which
tracks the cached plan status; i.e. we set ActivePortal->cache_plan_status
inside GetCachedPlan. Then in ExecutorStart, we read back this value and
store it in a new field in QueryDesc->estate. This will make the value
available to ExecutorEnd. I really don't want us making an extra pgss_store
call in ExecutorStart since it will add significant overhead.

What do you think about adding these couple of fields?

--
Sami

But I also have doubts about calling ActivePortal 
Inside GetCachedPlan. It should only be used in the Executor 
So, I’m not sure ActivePortal could 
be very helpful here they way I describe it above. 

--
Sami



Re: track generic and custom plans in pg_stat_statements

От
Andrei Lepikhov
Дата:
On 17/7/2025 00:58, Sami Imseih wrote:
>> this for better tracking. By adding a CachedPlanSource::cplan link, we
>> can establish a connection to the entire PlanCache entry instead of only
>> CachedPlan within a queryDesc and, consequently, make it accessible from
>> the executor. This would give an access to statistics on costs and the
>> number of replannings.
> 
> This maybe out of scope for this patch, but can you elaborate on what you mean
> by "CachedPlanSource::cplan link" ?
You need to introduce a 'status' field, right? - to allow someone to 
identify the plan's type, which was previously somewhat complicated. 
However, it may be implemented in a slightly different way, by adding 
CachedPlanSource::cplan (meaning 'Current Plan') and a trivial 
convention: 'cplan' references the gplan field or it refers a custom 
plan. Instead of the CachedPlan, we may provide the executor with a link 
to more stable and informative CachedPlanSource entry. That's the main idea.
As I see it, CachedPlan doesn't make sense without plancache and a 
CachedPlanSource entry. So, it is at least a valid solution.

With that link, you can access various statistics: num_custom_plans, 
num_generic_plans, total_custom_cost, and generic_cost. It would also be 
possible to clear the *_cost fields and allow Postgres to make a new 
attempt at choosing the plan type - who knows, maybe the previous 
decision is already outdated?

My point is that we can address one of the common issues with generic 
plans in a more extensible way, enabling modules to access the 
CachedPlanSource data at the time they have access to the execution 
instrumentation.

It seems impractical to me to invent one more patch: since you've 
already modified the CreateQueryDesc interface and introduced a plan 
type identification logic, why do it twice?

-- 
regards, Andrei Lepikhov



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> >> this for better tracking. By adding a CachedPlanSource::cplan link, we
> >> can establish a connection to the entire PlanCache entry instead of only
> >> CachedPlan within a queryDesc and, consequently, make it accessible from
> >> the executor. This would give an access to statistics on costs and the
> >> number of replannings.
> >
> > This maybe out of scope for this patch, but can you elaborate on what you mean
> > by "CachedPlanSource::cplan link" ?
> You need to introduce a 'status' field, right? - to allow someone to
> identify the plan's type, which was previously somewhat complicated.
> However, it may be implemented in a slightly different way, by adding
> CachedPlanSource::cplan (meaning 'Current Plan') and a trivial
> convention: 'cplan' references the gplan field or it refers a custom
> plan. Instead of the CachedPlan, we may provide the executor with a link
> to more stable and informative CachedPlanSource entry. That's the main idea.
> As I see it, CachedPlan doesn't make sense without plancache and a
> CachedPlanSource entry. So, it is at least a valid solution.
>
> With that link, you can access various statistics: num_custom_plans,
> num_generic_plans, total_custom_cost, and generic_cost. It would also be
> possible to clear the *_cost fields and allow Postgres to make a new
> attempt at choosing the plan type - who knows, maybe the previous
> decision is already outdated?
>
> My point is that we can address one of the common issues with generic
> plans in a more extensible way, enabling modules to access the
> CachedPlanSource data at the time they have access to the execution
> instrumentation.
>
> It seems impractical to me to invent one more patch: since you've
> already modified the CreateQueryDesc interface and introduced a plan
> type identification logic, why do it twice?

Thanks for the clarification. I see what you're getting at now.

You're suggesting adding CachedPlanSource to QueryDesc instead of
CachedPlan. This would allow extensions to access the statistics and cost
information from the CachedPlanSource, which would help tools like
pg_stat_statements track planning data, as we are trying to do with this
patch. It could also support other use cases, such as allowing extensions to
modify the costs in order to force a generic or custom plan. I had not
considered that second use case, but if there is a good case for it, I am not
opposed.

Adding CachedPlanSource to QueryDesc seems doable. However, Michael
previously objected to adding CachedPlan to QueryDesc. Is there any
similar hesitation about including CachedPlanSource?

The best way to support the functionality needed by pg_stat_statements is
to expose either CachedPlan or CachedPlanSource via QueryDesc. I
support using CachedPlanSource, since it also enables the additional use
cases that Andrei has mentioned.

Andrei, do we actually need access to CachedPlanSource::cplan? For
tracking the plan cache mode in pg_stat_statements, it be sufficient
to add a new boolean field such as is_last_plan_generic to
CachedPlanSource. Do you have another use case you have in mind
that would require a cplan field that references either the generic or
custom plan?

--
Sami



Re: track generic and custom plans in pg_stat_statements

От
Andrei Lepikhov
Дата:
On 17/7/2025 20:19, Sami Imseih wrote:
> Thanks for the clarification. I see what you're getting at now.
Thanks for reading this!
> 
> You're suggesting adding CachedPlanSource to QueryDesc instead of
> CachedPlan. This would allow extensions to access the statistics and cost
> information from the CachedPlanSource, which would help tools like
> pg_stat_statements track planning data, as we are trying to do with this
> patch. It could also support other use cases, such as allowing extensions to
> modify the costs in order to force a generic or custom plan. I had not
> considered that second use case, but if there is a good case for it, I am not
> opposed.
Hmm, I don't propose modifying costs. The focus is on resetting the plan 
cache decision that PostgreSQL has made in automatic mode. During the 
DBMS operation, various factors may cause a generic plan to be 
suboptimal or make it more desirable as well. Discussions from 2010 to 
2013 indicate that the community recognised the problem and discovered 
an approach based on execution time and real efforts rather than a 
cost-based method. While I doubt it could be ideal as a core solution, 
an extension may potentially do it for the sake of TPS maximisation. 
What we need is a way to access the plan cache entry.
> 
> Adding CachedPlanSource to QueryDesc seems doable. However, Michael
> previously objected to adding CachedPlan to QueryDesc. Is there any
> similar hesitation about including CachedPlanSource?
I agree that we should investigate further to find the most optimal 
solution. Personally, I'm open to including an internal reference to a 
plan cache entry within the QueryDesc, as long as the plan has its roots 
there.

> Andrei, do we actually need access to CachedPlanSource::cplan? For
> tracking the plan cache mode in pg_stat_statements, it be sufficient
> to add a new boolean field such as is_last_plan_generic to
> CachedPlanSource. Do you have another use case you have in mind
> that would require a cplan field that references either the generic or
> custom plan?
I'm not entirely sure. I followed your idea of referencing the entire 
list of planned statements during the execution of a single statement. 
The is_last_plan_generic field may be sufficient at first glance.

-- 
regards, Andrei Lepikhov



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> Hmm, I don't propose modifying costs. The focus is on resetting the plan
> cache decision that PostgreSQL has made in automatic mode. During the
> DBMS operation, various factors may cause a generic plan to be
> suboptimal or make it more desirable as well. Discussions from 2010 to
> 2013 indicate that the community recognised the problem and discovered
> an approach based on execution time and real efforts rather than a
> cost-based method. While I doubt it could be ideal as a core solution,
> an extension may potentially do it for the sake of TPS maximisation.
> What we need is a way to access the plan cache entry.

Thanks for clearing up my understanding. Essentially, override the
current cost-based
method of determining custom vs. generic by using something like execution time,
which is somehow tracked by the extension. That is how I understand this.
Now, I wonder if it may be a good idea to add some hooks in GetCachedPlan
to make this work?

> > Adding CachedPlanSource to QueryDesc seems doable. However, Michael
> > previously objected to adding CachedPlan to QueryDesc. Is there any
> > similar hesitation about including CachedPlanSource?
> I agree that we should investigate further to find the most optimal
> solution. Personally, I'm open to including an internal reference to a
> plan cache entry within the QueryDesc, as long as the plan has its roots
> there.

For the sake of this feature, I suspect making CachedPlanSource available
in QueryDesc will be a non-starter, but I would like to hear other opinions.
To accomplish the goals of this patch, we definitely need the current
execution’s
"plan cache mode" to be accessible in ExecutorEnd somehow.

Since carrying over the plan cache structures to QueryDesc does not seem
suitable, I wonder if the solution used in v11 is better. In v11, the plan cache
mode is tracked in both CachedPlan and QueryDesc, and CachedPlan is passed
to CreateQueryDesc. At this point, the value is set and made available to the
executor hooks.

I also want to note that we need to track the possible values in an enum with
three states: two for custom or generic, and one for an unset mode.
The unset mode
is necessary to allow an extension to determine whether a plan cache
was actually
used or not.

For pg_stat_statements, only pg_stat_statements.c was modified, and I
added tests
for extended query protocol.

--
Sami

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Andrei Lepikhov
Дата:
On 18/7/2025 21:37, Sami Imseih wrote:
> Thanks for clearing up my understanding. Essentially, override the
> current cost-based
> method of determining custom vs. generic by using something like execution time,
> which is somehow tracked by the extension. That is how I understand this.
> Now, I wonder if it may be a good idea to add some hooks in GetCachedPlan
> to make this work?
Yes, it may work. But inside the GetCachedPlan, we still don't have 
access to the executed statement yet.
> 
>>> Adding CachedPlanSource to QueryDesc seems doable. However, Michael
>>> previously objected to adding CachedPlan to QueryDesc. Is there any
>>> similar hesitation about including CachedPlanSource?
>> I agree that we should investigate further to find the most optimal
>> solution. Personally, I'm open to including an internal reference to a
>> plan cache entry within the QueryDesc, as long as the plan has its roots
>> there.
> 
> For the sake of this feature, I suspect making CachedPlanSource available
> in QueryDesc will be a non-starter, but I would like to hear other opinions.
> To accomplish the goals of this patch, we definitely need the current
> execution’s
> "plan cache mode" to be accessible in ExecutorEnd somehow.
I agree with this - adding references to CachedPlan into the QueryDesc 
looks kludge.
The most boring aspect of pg_stat_statements for me is the multiple 
statements case: a single incoming query (the only case in the cache of 
a generic plan) may be rewritten as various statements with the same 
query ID. So, the execution time of the initial statement should be the 
sum of the executions of all rewritten parts.
I wonder if the pg_stat_statements processes it correctly at the moment. 
Because in this case, we'd need a hook inside the Portal execution code. 
However, I don't recall any threads on the mailing list discussing that.

If the multiple-statements case didn't exist for a cached statement, 
Michael's idea with an active portal would be much better, and it would 
make sense to add PlanCacheSource::cplan and replace Portal::CachedPlan 
with Portal::PlanCacheSource reference.

-- 
regards, Andrei Lepikhov



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> > "plan cache mode" to be accessible in ExecutorEnd somehow.
> I agree with this - adding references to CachedPlan into the QueryDesc
> looks kludge.
> The most boring aspect of pg_stat_statements for me is the multiple
> statements case: a single incoming query (the only case in the cache of
> a generic plan) may be rewritten as various statements with the same
> query ID. So, the execution time of the initial statement should be the
> sum of the executions of all rewritten parts.

AFAICT, It does sum up the stats of rewritten statements of a single statement,
since each one of those statements does call ExecutorEnd ( under the same
queryId ).

> If the multiple-statements case didn't exist for a cached statement,
> Michael's idea with an active portal would be much better, and it would
> make sense to add PlanCacheSource::cplan and replace Portal::CachedPlan
> with Portal::PlanCacheSource reference.

I don't like ActivePortal because cplan itself does not live by ExecutorEnd [0]
and in some cases it's not available in ExecutorStart ( think of
Explain/Explain Analyze),
so you still need to add some handling for those cases, which I don't
think we can
ignore. Also, if we say we can forgo tracking the Explain/Explain Analyze case,
we surely don't want to call pgss_store at ExecutorStart to set the plan cache
mode stats and ExecutorEnd to set everything else for a particular execution.
In short, we still need some fields to be added to QueryDesc to track the
plan cache mode info.

Last week I published a v11 that adds a field to QueryDesc, but as I thought
about this more, how about we just add 2 bool fields in QueryDesc->estate
( es_cached_plan and es_is_generic_plan ), a field in CachedPlan (
is_generic_plan )
and after ExecutorStart, and if we have a cplan, we set the
appropriate plan cache
mode value. I think it's better to confine these new fields in Estate
rather than
at a higher level like QueryDesc. See v12.

[0] https://www.postgresql.org/message-id/CAA5RZ0t4kmbBM%2BetEQVb1c8bMSWjKOY8zTEA43X2UhSu0A8dCA%40mail.gmail.com

--

Sami

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Michael Paquier
Дата:
On Mon, Jul 21, 2025 at 04:47:31PM -0500, Sami Imseih wrote:
> Last week I published a v11 that adds a field to QueryDesc, but as I thought
> about this more, how about we just add 2 bool fields in QueryDesc->estate
> ( es_cached_plan and es_is_generic_plan ), a field in CachedPlan (
> is_generic_plan )
> and after ExecutorStart, and if we have a cplan, we set the
> appropriate plan cache
> mode value. I think it's better to confine these new fields in Estate
> rather than
> at a higher level like QueryDesc. See v12.

Yes, I think that this is a much better idea to isolate the whole
concept and let pgss grab these values.  We have lived with such
additions for monitoring in EState a few times already, see for
example de3a2ea3b264 and 1d477a907e63 that are tainted with my
fingerprints.
--
Michael

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> Yes, I think that this is a much better idea to isolate the whole
> concept and let pgss grab these values.  We have lived with such
> additions for monitoring in EState a few times already, see for
> example de3a2ea3b264 and 1d477a907e63 that are tainted with my
> fingerprints.

correct, there is precedence for this.

This seems like the best way to proceed.

--

Sami



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> Note: the size of the change in pg_stat_statements--1.12--1.13.sql
> points that we should seriously consider splitting these attributes
> into multiple sub-functions.

So we don't lose track of this. This should be a follow-up thread. I do
agree something has to be done about the exploding list of attributes
in pg_s_s.

--
Sami



Re: track generic and custom plans in pg_stat_statements

От
Andrei Lepikhov
Дата:
On 22/7/2025 01:22, Sami Imseih wrote:
>> Note: the size of the change in pg_stat_statements--1.12--1.13.sql
>> points that we should seriously consider splitting these attributes
>> into multiple sub-functions.
> 
> So we don't lose track of this. This should be a follow-up thread. I do
> agree something has to be done about the exploding list of attributes
> in pg_s_s.
+1

Not once I encountered people who want to track only a specific number 
of parameters and do not have much fun burdening themselves with all the 
data set, querying a whole huge stat view to analyse performance profiles.
In another scenario, an extension needs to track a limited number of 
parameters - let's say, blocks hit and blocks read. Another dimension - 
sometimes we are only interested in queries that involve complex join 
trees or partitioned tables and would be happy to avoid tracking all 
other queries.
It seems that a callback-based or subscription-based model could be 
worth exploring.

-- 
regards, Andrei Lepikhov



Re: track generic and custom plans in pg_stat_statements

От
Andrei Lepikhov
Дата:
On 22/7/2025 01:07, Michael Paquier wrote:
> On Mon, Jul 21, 2025 at 04:47:31PM -0500, Sami Imseih wrote:
>> Last week I published a v11 that adds a field to QueryDesc, but as I thought
>> about this more, how about we just add 2 bool fields in QueryDesc->estate
>> ( es_cached_plan and es_is_generic_plan ), a field in CachedPlan (
>> is_generic_plan )
>> and after ExecutorStart, and if we have a cplan, we set the
>> appropriate plan cache
>> mode value. I think it's better to confine these new fields in Estate
>> rather than
>> at a higher level like QueryDesc. See v12.
> 
> Yes, I think that this is a much better idea to isolate the whole
> concept and let pgss grab these values.  We have lived with such
> additions for monitoring in EState a few times already, see for
> example de3a2ea3b264 and 1d477a907e63 that are tainted with my
> fingerprints.
I would like to oppose to the current version a little.

Commits de3a2ea3b264 and 1d477a907e63 introduced elements that are more 
closely related to the execution phase. While the parameter 
es_parallel_workers_to_launch could be considered a planning parameter, 
es_parallel_workers_launched and es_total_processed should not.
Furthermore, any new code that utilises ExecutorStart/Run/End must 
ensure that the fields es_cached_plan and es_is_generic_plan are set 
correctly.

IMO, the concept of a generic or custom plan is a fundamental property 
of the plan and should be stored within it - essentially in the 
PlannedStmt structure.
Additionally, it is unclear why we incur significant costs to alter the 
interface of ExplainOnePlan solely to track these parameters.

It may be more efficient to set the is_generic_plan option at the top 
plan node (PlannedStmt) and reference it wherever necessary. To identify 
a cached plan, we may consider pushing the CachedPlan/CachedPlanSource 
pointer down throughout pg_plan_query and maintaining a reference to the 
plan (or simply setting a boolean flag) at the same location — inside 
the PlannedStmt.

Such knowledge may provide an extra profit for planning - a generic 
plan, stored in the plan cache, may be reused later, and it may be 
profitable for an analytics-related extension to spend extra cycles and 
apply optimisation tricks more boldly.

-- 
regards, Andrei Lepikhov



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> I would like to oppose to the current version a little.
>
> Commits de3a2ea3b264 and 1d477a907e63 introduced elements that are more
> closely related to the execution phase. While the parameter
> es_parallel_workers_to_launch could be considered a planning parameter,
> es_parallel_workers_launched and es_total_processed should not.

Sure, I saw these new fields more in the realm of
"es_parallel_workers_to_launch",
which are planning related but are tracked in Estate for statistical purposes.

> Furthermore, any new code that utilises ExecutorStart/Run/End must
> ensure that the fields es_cached_plan and es_is_generic_plan are set
> correctly.

We really need to make sure ExecutorStart sets this correctly only. Estate
is carried over.

> IMO, the concept of a generic or custom plan is a fundamental property
> of the plan and should be stored within it - essentially in the
> PlannedStmt structure.

I do agree with this statement, and this idea did cross my mind; but I felt
that Estate is more ideal for statistics related fields, since we have a
precedent for this. But, I am open to the PlannedStmt idea as well.

> Additionally, it is unclear why we incur significant costs to alter the
> interface of ExplainOnePlan solely to track these parameters.

If we want to get this info to Estate, we have to.... if we go with PlannedStmt,
we do not.

> It may be more efficient to set the is_generic_plan option at the top
> plan node (PlannedStmt) and reference it wherever necessary. To identify
> a cached plan, we may consider pushing the CachedPlan/CachedPlanSource
> pointer down throughout pg_plan_query and maintaining a reference to the
> plan (or simply setting a boolean flag) at the same location — inside
> the PlannedStmt.

We will need a field to store an enum. let's call it CachedPlanType
with the types of cached plan. We need to be able to differentiate
when cached plans are not used, so a simple boolean is not
sufficient.
```
typedef enum CachedPlanType
{
PLAN_CACHE_NOT_SET, /* Not a cached plan */
PLAN_CACHE_GENERIC, /* Generic cached plan */
PLAN_CACHE_CUSTOM, /* Custom cached plan */
} CachedPlanType;
```
We can track a field for this enum in PlannedStmt and initialize
it to PLAN_CACHE_NOT_SET whenever we makeNode(PlannedStmt).
In GetCachedPlan, we iterate through plan->stmt_list to set the
value for the PlannedStmt.

CachedPlanType will have to be defined in nodes.h for this to work.

We can avoid the struct altogether and define the new PlannedStmt
field as an "int" and bit-pack the types.

I prefer the CachedPlanType struct to do this, as it's cleaner and
self-documenting.

> Such knowledge may provide an extra profit for planning - a generic
> plan, stored in the plan cache, may be reused later, and it may be
> profitable for an analytics-related extension to spend extra cycles and
> apply optimisation tricks more boldly.

Sure, either solution will provide this benefit.

--
Sami



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> > It may be more efficient to set the is_generic_plan option at the top
> > plan node (PlannedStmt) and reference it wherever necessary. To identify
> > a cached plan, we may consider pushing the CachedPlan/CachedPlanSource
> > pointer down throughout pg_plan_query and maintaining a reference to the
> > plan (or simply setting a boolean flag) at the same location — inside
> > the PlannedStmt.
>
> We will need a field to store an enum. let's call it CachedPlanType
> with the types of cached plan. We need to be able to differentiate
> when cached plans are not used, so a simple boolean is not
> sufficient.
> ```
> typedef enum CachedPlanType
> {
> PLAN_CACHE_NOT_SET, /* Not a cached plan */
> PLAN_CACHE_GENERIC, /* Generic cached plan */
> PLAN_CACHE_CUSTOM, /* Custom cached plan */
> } CachedPlanType;
> ```
> We can track a field for this enum in PlannedStmt and initialize
> it to PLAN_CACHE_NOT_SET whenever we makeNode(PlannedStmt).
> In GetCachedPlan, we iterate through plan->stmt_list to set the
> value for the PlannedStmt.
>
> CachedPlanType will have to be defined in nodes.h for this to work.
>
> We can avoid the struct altogether and define the new PlannedStmt
> field as an "int" and bit-pack the types.
>
> I prefer the CachedPlanType struct to do this, as it's cleaner and
> self-documenting.

v13 is the implementation using PlannedStmt as described above.

--
Sami

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Andrei Lepikhov
Дата:
On 7/22/25 19:13, Sami Imseih wrote:
>> It may be more efficient to set the is_generic_plan option at the top
>> plan node (PlannedStmt) and reference it wherever necessary. To identify
>> a cached plan, we may consider pushing the CachedPlan/CachedPlanSource
>> pointer down throughout pg_plan_query and maintaining a reference to the
>> plan (or simply setting a boolean flag) at the same location — inside
>> the PlannedStmt.
> 
> We will need a field to store an enum. let's call it CachedPlanType
> with the types of cached plan. We need to be able to differentiate
> when cached plans are not used, so a simple boolean is not
> sufficient.
Sure. But I modestly hope you would add a CachedPlanSource pointer 
solely to the PlannedStmt and restructure it a little as we discussed 
above. And no new structures are needed. Am I wrong?

-- 
regards, Andrei Lepikhov



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:

> with the types of cached plan. We need to be able to differentiate
> when cached plans are not used, so a simple boolean is not
> sufficient.
Sure. But I modestly hope you would add a CachedPlanSource pointer
solely to the PlannedStmt and restructure it a little as we discussed
above. And no new structures are needed. Am I wrong?

That was my initial intention somehow to get CachedPlan available 
to Executor hooks. But, as you pointed out there is more value in 
CachedPlanSource. 

I know Michael opposed the idea of  carrying these structures,
at least CachedPlan, to Executor hooks ( or maybe just not QueryDesc?? ). 
It will be good to see what he think, or if others an opinion about this, about 
adding a pointer to CachedPlanSource in PlannedStmt vs setting a flag in 
PlannedStmt to track plan cache type for the current execution? The former
does provide more capability for extensions, as Andrei has pointed out 
earlier.


--
Sami 

Re: track generic and custom plans in pg_stat_statements

От
Michael Paquier
Дата:
On Tue, Jul 22, 2025 at 03:28:24PM -0500, Sami Imseih wrote:
> I know Michael opposed the idea of  carrying these structures,
> at least CachedPlan, to Executor hooks ( or maybe just not QueryDesc?? ).
> It will be good to see what he think, or if others an opinion about this,
> about
> adding a pointer to CachedPlanSource in PlannedStmt vs setting a flag in
> PlannedStmt to track plan cache type for the current execution? The former
> does provide more capability for extensions, as Andrei has pointed out
> earlier.

My line is pretty simple here: we should never include in
executor-specific or plan-specific areas structures that are handled
by entirely different memory contexts or portions of the code, all
parts (executor, query description and or [cached] plan) relying on
entirely different assumptions and contexts, because I suspect that it
would bite us back hard in the shape of corrupted stacks depending on
the timing where these resources are freed.  One of the upthread
proposals where a reference of the cached plan pointer was added to an
executor state structure crossed this line.

Simple fields are no-brainers, they are just carried in the same
context as the caller, independently of other states.  When it comes
to stats and monitoring, we don't have to be exact, we can sometimes
even be a bit lossy in the reports.

A simple "plan type" field in a PlannedStmt should be more than OK, as
it would be carried across the contexts where we want to know about it
and handle it, aka PGSS entry storing.
--
Michael

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:

> I know Michael opposed the idea of  carrying these structures,
> at least CachedPlan, to Executor hooks ( or maybe just not QueryDesc?? ).
> It will be good to see what he think, or if others an opinion about this,
> about
> adding a pointer to CachedPlanSource in PlannedStmt vs setting a flag in
> PlannedStmt to track plan cache type for the current execution? The former
> does provide more capability for extensions, as Andrei has pointed out
> earlier.

My line is pretty simple here: we should never include in
executor-specific or plan-specific areas structures that are handled
by entirely different memory contexts or portions of the code, all
parts (executor, query description and or [cached] plan) relying on
entirely different assumptions and contexts, because I suspect that it
would bite us back hard in the shape of corrupted stacks depending on
the timing where these resources are freed.  One of the upthread
proposals where a reference of the cached plan pointer was added to an
executor state structure crossed this line.

Simple fields are no-brainers, they are just carried in the same
context as the caller, independently of other states.  When it comes
to stats and monitoring, we don't have to be exact, we can sometimes
even be a bit lossy in the reports.

A simple "plan type" field in a PlannedStmt should be more than OK, as
it would be carried across the contexts where we want to know about it
and handle it, aka PGSS entry storing.

Thanks!

You make valid point about memory contexts and opening ourselves to 
bugs/crashes. 

[0] implements a single field “plan type” in PlannedStmt. 


--

Sami 


Re: track generic and custom plans in pg_stat_statements

От
Michael Paquier
Дата:
On Tue, Jul 22, 2025 at 02:52:49PM -0500, Sami Imseih wrote:
>> We will need a field to store an enum. let's call it CachedPlanType
>> with the types of cached plan. We need to be able to differentiate
>> when cached plans are not used, so a simple boolean is not
>> sufficient.

Guess so.

>> We can track a field for this enum in PlannedStmt and initialize
>> it to PLAN_CACHE_NOT_SET whenever we makeNode(PlannedStmt).
>> In GetCachedPlan, we iterate through plan->stmt_list to set the
>> value for the PlannedStmt.
>>
>> CachedPlanType will have to be defined in nodes.h for this to work.
>>
>> We can avoid the struct altogether and define the new PlannedStmt
>> field as an "int" and bit-pack the types.
>>
>> I prefer the CachedPlanType struct to do this, as it's cleaner and
>> self-documenting.
>
> v13 is the implementation using PlannedStmt as described above.

+   PLAN_CACHE_NOT_SET,         /* Not a cached plan */

This should be set to 0 by default, but we could enforce the
definition as well with a PLAN_CACHE_NOT_SET = 0?  Nit: perhaps name
it PLAN_CACHE_NONE to outline the fact that it is just not a cached
plan?

--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -131,6 +131,9 @@ typedef struct PlannedStmt
     /* non-null if this is utility stmt */
     Node       *utilityStmt;

+    /* type of cached plan, if it is one */
+    CachedPlanType cached_plan_type;
[...]
+    foreach(lc, plan->stmt_list)
+    {
+        PlannedStmt *pstmt = (PlannedStmt *) lfirst(lc);
+
+        pstmt->cached_plan_type = customplan ? PLAN_CACHE_CUSTOM : PLAN_CACHE_GENERIC;
+    }

Yes, this implementation feels much more natural, with a state
set to the result that we get when directly retrieving it from the
cached plan layer, pushing the data in the executor end hook in PGSS.
No objections to this approach; I can live with that.

A small thing that would be cleaner is to split the patch into two
parts: one for the in-core backend addition and a second for PGSS.
Code paths are different, so it's simple to do.

Andrei, what do you think about all that?
--
Michael

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Michael Paquier
Дата:
On Wed, Jul 23, 2025 at 12:15:06PM +0900, Michael Paquier wrote:
> A small thing that would be cleaner is to split the patch into two
> parts: one for the in-core backend addition and a second for PGSS.
> Code paths are different, so it's simple to do.

I have been looking at the backend part of the change to add the
cached plan type to PlannedStmt, and found the concept clean.  I have
moved the definition of the new enum to plannodes.h, tweaked a couple
of comments and the result seemed OK, so applied this part.

Attached is the rest of the patch for PGSS, not really reviewed yet.
I have noticed while going through it that pgssCachedPlanMode was not
required.
--
Michael

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Andrei Lepikhov
Дата:
On 24/7/2025 09:03, Michael Paquier wrote:
> On Wed, Jul 23, 2025 at 12:15:06PM +0900, Michael Paquier wrote:
>> A small thing that would be cleaner is to split the patch into two
>> parts: one for the in-core backend addition and a second for PGSS.
>> Code paths are different, so it's simple to do.
>
> I have been looking at the backend part of the change to add the
> cached plan type to PlannedStmt, and found the concept clean.  I have
> moved the definition of the new enum to plannodes.h, tweaked a couple
> of comments and the result seemed OK, so applied this part.
I see you have chosen a variant with a new enum instead of a pointer to
a plan cache entry. I wonder if you could write the arguments
supporting this choice?

Currently, we are unable to track specific queries and analyse how planning
decisions affect their execution. IMO, this is a missed opportunity, as even
an extension-based feedback system could pave the way for developments
in self-tuning DBMS. Plan cache entries seem to be the most suitable target
for this purpose, as they usually refer to a long-lived statement and already
contain some valuable data.

--
regards, Andrei Lepikhov



Re: track generic and custom plans in pg_stat_statements

От
Tom Lane
Дата:
Andrei Lepikhov <lepihov@gmail.com> writes:
> I see you have chosen a variant with a new enum instead of a pointer to
> a plan cache entry. I wonder if you could write the arguments
> supporting this choice?

Pointing to a plan cache entry would often mean that the data
structure as a whole is circular (since a plan cache entry
will have a pointer to a plan).  That would in particular
make it unsafe for the plan to protect its pointer by incrementing
the cache entry's refcount --- the assemblage could never go away.
So I concur with Michael that what you propose is a bad idea.

That is not to say that I think 719dcf3c4 was a good idea: it looks
rather useless from here.  It seems to me that the right place to
accumulate these sorts of stats is in CachedPlanSources, and I don't
see how this helps.  What likely *would* help is some hooks in
plancache.c for pg_stat_statements to connect into so it can count
replanning events.

            regards, tom lane



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> Andrei Lepikhov <lepihov@gmail.com> writes:
> > I see you have chosen a variant with a new enum instead of a pointer to
> > a plan cache entry. I wonder if you could write the arguments
> > supporting this choice?
>
> Pointing to a plan cache entry would often mean that the data
> structure as a whole is circular (since a plan cache entry
> will have a pointer to a plan).  That would in particular
> make it unsafe for the plan to protect its pointer by incrementing
> the cache entry's refcount --- the assemblage could never go away.
> So I concur with Michael that what you propose is a bad idea.
>
> That is not to say that I think 719dcf3c4 was a good idea: it looks
> rather useless from here.  It seems to me that the right place to
> accumulate these sorts of stats is in CachedPlanSources, and I don't
> see how this helps.  What likely *would* help is some hooks in
> plancache.c for pg_stat_statements to connect into so it can count

One possible hook for accumulating custom and generic plans per
queryId would be inside GetCachedPlan. However, this would require
calling pgss_store an extra time, in addition to ExecutorEnd, every time
GetCachedPlan is executed, which could introduce non-negligible
overhead.

--
Sami



Re: track generic and custom plans in pg_stat_statements

От
Tom Lane
Дата:
Sami Imseih <samimseih@gmail.com> writes:
>> That is not to say that I think 719dcf3c4 was a good idea: it looks
>> rather useless from here.  It seems to me that the right place to
>> accumulate these sorts of stats is in CachedPlanSources, and I don't
>> see how this helps.  What likely *would* help is some hooks in
>> plancache.c for pg_stat_statements to connect into so it can count

> One possible hook for accumulating custom and generic plans per
> queryId would be inside GetCachedPlan. However, this would require
> calling pgss_store an extra time, in addition to ExecutorEnd, every time
> GetCachedPlan is executed, which could introduce non-negligible
> overhead.

Only if you insist that the way to handle this is to call pgss_store
at that time.  I'd be inclined to think about having some transient
process-local data structure that can remember the info until
ExecutorEnd.  But the plan tree is not the right place, because of
the circularity problem.

            regards, tom lane



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> Sami Imseih <samimseih@gmail.com> writes:
> >> That is not to say that I think 719dcf3c4 was a good idea: it looks
> >> rather useless from here.  It seems to me that the right place to
> >> accumulate these sorts of stats is in CachedPlanSources, and I don't
> >> see how this helps.  What likely *would* help is some hooks in
> >> plancache.c for pg_stat_statements to connect into so it can count
>
> > One possible hook for accumulating custom and generic plans per
> > queryId would be inside GetCachedPlan. However, this would require
> > calling pgss_store an extra time, in addition to ExecutorEnd, every time
> > GetCachedPlan is executed, which could introduce non-negligible
> > overhead.
>
> Only if you insist that the way to handle this is to call pgss_store
> at that time.  I'd be inclined to think about having some transient
> process-local data structure that can remember the info until
> ExecutorEnd.  But the plan tree is not the right place, because of
> the circularity problem.

One option might be to use a local hash table, keyed the same way as the
shared pgss hash (excluding dbid), to handle cases where a backend has
more than one active cached plan. Then at ExecutorEnd, the local entry could
be looked up and passed to pgss_store. Not sure if this is worth the effort vs
what has been committed.

--
Sami

Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> One option might be to use a local hash table, keyed the same way as the
> shared pgss hash (excluding dbid), to handle cases where a backend has
> more than one active cached plan. Then at ExecutorEnd, the local entry could
> be looked up and passed to pgss_store. Not sure if this is worth the effort vs
> what has been committed.

I should also add that making this information available in PlannedStmt allows
for EXPLAIN to also utilize this information. I am thinking we can add
this information
as part of core EXPLAIN or as an option in pg_overexplain.

--
Sami



Re: track generic and custom plans in pg_stat_statements

От
Michael Paquier
Дата:
On Thu, Jul 24, 2025 at 01:14:47PM -0500, Sami Imseih wrote:
>> Sami Imseih <samimseih@gmail.com> writes:
>>>> That is not to say that I think 719dcf3c4 was a good idea: it looks
>>>> rather useless from here.  It seems to me that the right place to
>>>> accumulate these sorts of stats is in CachedPlanSources, and I don't
>>>> see how this helps.  What likely *would* help is some hooks in
>>>> plancache.c for pg_stat_statements to connect into so it can count
>>
>>> One possible hook for accumulating custom and generic plans per
>>> queryId would be inside GetCachedPlan. However, this would require
>>> calling pgss_store an extra time, in addition to ExecutorEnd, every time
>>> GetCachedPlan is executed, which could introduce non-negligible
>>> overhead.

I would suspect more contention on the PGSS locks in the prepared
statement and/or extended query protocol case if we were to do that.
With the amount of information we want to track in PGSS, which is that
we want to know from where a plan is coming from (generic cache,
custom cache or something else), FWIW I'm still OK with this
information added in PlannedStmt itself to be able to link back to the
state retrieved at the end of the executor because it's also
cost-free.

I'm also OK to be proved wrong based on arguments that could justify
the addition of an extra hook, but I don't really see the use-cases
that could justify such additions yet.  Perhaps CachedPlanType is
misnamed, though, would it be more suited to name that as a sort of
"origin" or "source" field concept?  We want to know which which
source we have retrieved a plan that a PlannedStmt refers to.
--
Michael

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Andrei Lepikhov
Дата:
On 24/7/2025 17:05, Tom Lane wrote:
> Andrei Lepikhov <lepihov@gmail.com> writes:
>> I see you have chosen a variant with a new enum instead of a pointer to
>> a plan cache entry. I wonder if you could write the arguments
>> supporting this choice?
>
> Pointing to a plan cache entry would often mean that the data
> structure as a whole is circular (since a plan cache entry
> will have a pointer to a plan).  That would in particular
> make it unsafe for the plan to protect its pointer by incrementing
> the cache entry's refcount --- the assemblage could never go away.
> So I concur with Michael that what you propose is a bad idea.
I was expecting more substantial arguments. The PostgreSQL code does not
restrict back-linking to the source in principle - see
IndexClause::rinfo for an example. I'm not sure what the problem is with
refcount - in extensions, we currently have access to the plan cache
entry for prepared statements.

In this particular case, I suggested storing a pointer to the
CachedPlanSource instead of the CachedPlan. For a custom plan, we won't
encounter any circular references, and for a generic plan, the reference
will be indirect.

Of course, read and write operations should be disabled for such a
pointer, and the copy operation should only duplicate the pointer itself.


--
regards, Andrei Lepikhov



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> Perhaps CachedPlanType is
> misnamed, though, would it be more suited to name that as a sort of
> "origin" or "source" field concept?  We want to know which which
> source we have retrieved a plan that a PlannedStmt refers to.

Hmm, I’m not sure I see this as an improvement. In my opinion,
CachedPlanType is a clear name that describes its purpose.

--
Sami



Re: track generic and custom plans in pg_stat_statements

От
Tom Lane
Дата:
Sami Imseih <samimseih@gmail.com> writes:
>> Perhaps CachedPlanType is
>> misnamed, though, would it be more suited to name that as a sort of
>> "origin" or "source" field concept?  We want to know which which
>> source we have retrieved a plan that a PlannedStmt refers to.

> Hmm, I’m not sure I see this as an improvement. In my opinion,
> CachedPlanType is a clear name that describes its purpose.

I think Michael's got a point.  As of HEAD there are seven different
places that are setting this to PLAN_CACHE_NONE; who's to say that
pg_stat_statements or some other extension might not wish to
distinguish some of those sources?  At the very least, user-submitted
versus internally-generated queries might be an interesting
distinction.  I don't have a concrete proposal for a different
categorization than what we've got, but it seems worth considering
while we still have the flexibility to change it easily.

            regards, tom lane



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> Sami Imseih <samimseih@gmail.com> writes:
> >> Perhaps CachedPlanType is
> >> misnamed, though, would it be more suited to name that as a sort of
> >> "origin" or "source" field concept?  We want to know which which
> >> source we have retrieved a plan that a PlannedStmt refers to.
>
> > Hmm, I’m not sure I see this as an improvement. In my opinion,
> > CachedPlanType is a clear name that describes its purpose.
>
> I think Michael's got a point.  As of HEAD there are seven different
> places that are setting this to PLAN_CACHE_NONE; who's to say that
> pg_stat_statements or some other extension might not wish to
> distinguish some of those sources?  At the very least, user-submitted
> versus internally-generated queries might be an interesting
> distinction.  I don't have a concrete proposal for a different
> categorization than what we've got, but it seems worth considering
> while we still have the flexibility to change it easily.

Sure, I get it now, I think. An example is the cached plan from a sql
in a utility statement of a prepared statement, as an example. right?

I can see that being useful, If I understood that correctly.

--
Sami



Re: track generic and custom plans in pg_stat_statements

От
Michael Paquier
Дата:
On Fri, Jul 25, 2025 at 02:34:07PM -0500, Sami Imseih wrote:
> Sami Imseih <samimseih@gmail.com> writes:
>> I think Michael's got a point.  As of HEAD there are seven different
>> places that are setting this to PLAN_CACHE_NONE; who's to say that
>> pg_stat_statements or some other extension might not wish to
>> distinguish some of those sources?  At the very least, user-submitted
>> versus internally-generated queries might be an interesting
>> distinction.  I don't have a concrete proposal for a different
>> categorization than what we've got, but it seems worth considering
>> while we still have the flexibility to change it easily.
>
> Sure, I get it now, I think. An example is the cached plan from a sql
> in a utility statement of a prepared statement, as an example. right?

Attached is my counter-proposal, where I have settled down to four
categories of PlannedStmt:
- "standard" PlannedStmt, when going through the planner.
- internally-generated "fake" PlannedStmt.
- custom cache
- generic cache

We could decide if a few more of the internal "fake" ones are worth
subdividing, but I am feeling that this is kind of OK to start with.
If we want more granularity, I was wondering about some bits to be
able to mix one or more of these categories, but they are all
exclusive as far as I know.
--
Michael

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Andrei Lepikhov
Дата:
On 28/7/2025 08:18, Michael Paquier wrote:
> We could decide if a few more of the internal "fake" ones are worth
> subdividing, but I am feeling that this is kind of OK to start with.
> If we want more granularity, I was wondering about some bits to be
> able to mix one or more of these categories, but they are all
> exclusive as far as I know.
It looks good, but doesn't it seem too narrow?

For the sake of tracking queries, their parse tree and versions of the
plan, it seems worth adding an ext_list field to the Query and
PlannedStmt structures with a convention to add only Extensible nodes.
The core will be responsible only for copying this list from the
Query::ext_list to PlannedStmt::ext_list inside a correct memory
context.

This minor change allows an extension to track a specific query from
the parse tree up to the end of execution and carry as much data as
needed. The extension (pg_stat_statements as well) may add all the
necessary data in the parse hook, planner hook, or any of the
execution hooks. With a trivial naming convention, Extensible nodes of
different extensions will not interfere.
To identify the cached plan, the GetCachedPlan hook may be introduced.


--
regards, Andrei Lepikhov



Re: track generic and custom plans in pg_stat_statements

От
Michael Paquier
Дата:
On Mon, Jul 28, 2025 at 08:41:29AM +0200, Andrei Lepikhov wrote:
> It looks good, but doesn't it seem too narrow?

For the use case of the thread which is to count the number of custom
vs generic plans, it would be good enough.

> This minor change allows an extension to track a specific query from
> the parse tree up to the end of execution and carry as much data as
> needed. The extension (pg_stat_statements as well) may add all the
> necessary data in the parse hook, planner hook, or any of the
> execution hooks. With a trivial naming convention, Extensible nodes of
> different extensions will not interfere.
> To identify the cached plan, the GetCachedPlan hook may be introduced.

Without knowing the actual use cases where these additions can be
useful, introducing this extra amount of infrastructure may not be
justified.  Just my 2c and my impressions after studying the whole
thread.
--
Michael

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Andrei Lepikhov
Дата:
On 28/7/2025 08:46, Michael Paquier wrote:
> On Mon, Jul 28, 2025 at 08:41:29AM +0200, Andrei Lepikhov wrote:
>> It looks good, but doesn't it seem too narrow?
> 
> For the use case of the thread which is to count the number of custom
> vs generic plans, it would be good enough.
Sure, no objections.
> 
>> This minor change allows an extension to track a specific query from
>> the parse tree up to the end of execution and carry as much data as
>> needed. The extension (pg_stat_statements as well) may add all the
>> necessary data in the parse hook, planner hook, or any of the
>> execution hooks. With a trivial naming convention, Extensible nodes of
>> different extensions will not interfere.
>> To identify the cached plan, the GetCachedPlan hook may be introduced.
> 
> Without knowing the actual use cases where these additions can be
> useful, introducing this extra amount of infrastructure may not be
> justified.  Just my 2c and my impressions after studying the whole
> thread.
The case is as I already described: in Postgres, extensions have no way 
to identify how their path hooks really influenced specific query plan 
and its execution. During planning, they have only a pointer to the 
Query structure, but at the end of execution, only a PlannedStmt is 
available. I propose improving this 'blind' approach to planning.
The only argument I usually receive in response to such a proposal is to 
add a beneficial example. The current pg_stat_statements change may be a 
good example of the employment of such infrastructure, isn't it?

-- 
regards, Andrei Lepikhov



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> The current pg_stat_statements change may be a
> good example of the employment of such infrastructure, isn't it?

So, the current set of patches now will help move forward the specific
use-case of this thread. If we need something different that will be
useful for more use-cases, perhaps it's better to start a new thread as
a follow-up to this? I think there may be more debate on what that
looks like.

--
Sami



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> Attached is my counter-proposal, where I have settled down to four
> categories of PlannedStmt:
> - "standard" PlannedStmt, when going through the planner.
> - internally-generated "fake" PlannedStmt.
> - custom cache
> - generic cache

Thanks for the update! I plan on reviewing this tomorrow.

--
Sami



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> > Attached is my counter-proposal, where I have settled down to four
> > categories of PlannedStmt:
> > - "standard" PlannedStmt, when going through the planner.
> > - internally-generated "fake" PlannedStmt.
> > - custom cache
> > - generic cache
>
> Thanks for the update! I plan on reviewing this tomorrow.

The only comment I have is I think we need a NOT_SET
member, so it can simplify the life of extensions that have code
paths which may or may not have a PlannedStmt, such as
pgss_store. In pgss_store, I don't want to pass the entire PlannedStmt,
nor do I want to pass PLAN_STMT_INTERNAL in the call during
post_parse_analyze, in which case we don't have a plan. Using
PLAN_STMT_INTERNAL as a default value if odd.

v15 includes the change with the above as well as the pg_stat_statements
changes.


--
Sami

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Michael Paquier
Дата:
On Tue, Jul 29, 2025 at 05:08:09PM -0500, Sami Imseih wrote:
> The only comment I have is I think we need a NOT_SET
> member, so it can simplify the life of extensions that have code
> paths which may or may not have a PlannedStmt, such as
> pgss_store.

Okay by me for having a default that maps to something else than the
rest.

+   PLAN_STMT_NOT_SET = 0,      /* origin not yet set */

The term "NOT_SET" makes me itch a little bit, even if there is an
existing parallel with OverridingKind.  Perhaps your proposal is OK,
still how about "UNKNOWN" instead to use as term for the default?

> In pgss_store, I don't want to pass the entire PlannedStmt,

Neither do I.

> nor do I want to pass PLAN_STMT_INTERNAL in the call during
> post_parse_analyze, in which case we don't have a plan.

Okay.
--
Michael

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Andrei Lepikhov
Дата:
On 30/7/2025 09:20, Michael Paquier wrote:
> On Tue, Jul 29, 2025 at 05:08:09PM -0500, Sami Imseih wrote:
>> The only comment I have is I think we need a NOT_SET
>> member, so it can simplify the life of extensions that have code
>> paths which may or may not have a PlannedStmt, such as
>> pgss_store.
> 
> Okay by me for having a default that maps to something else than the
> rest.
> 
> +   PLAN_STMT_NOT_SET = 0,      /* origin not yet set */
> 
> The term "NOT_SET" makes me itch a little bit, even if there is an
> existing parallel with OverridingKind.  Perhaps your proposal is OK,
> still how about "UNKNOWN" instead to use as term for the default?
+1 to "UNKNOWN".

There may be various sources for query plans. For instance, in the plan 
freezing extension, I often bypass the standard planner completely by 
using a plan created in another backend and serialised in shared memory. 
Additionally, there is a concept of comparing hinted plans with those 
generated freely after an upgrade, which would serve as an extra source 
of plans.
One extra example - I sometimes build a 'referenced generic plan', using 
incoming constants as a source for clause selectivity calculations, 
building a generic plan, likewise SQL Server or Oracle implemented 
generics. It seems like a 'hybrid' type of plan ;).

But generally, classification in the PlannedStmtOrigin structure seems a 
little strange: a generic plan has a qualitative difference from any 
custom one. And any other plan also will be generic or custom, doesn't 
it? It is interesting information about the plan source, of course, but 
for the sake of performance analysis, it would be profitable to 
understand the type of plan. However, the last sentence may be a subject 
for another thread.

-- 
regards, Andrei Lepikhov



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> > The term "NOT_SET" makes me itch a little bit, even if there is an
> > existing parallel with OverridingKind.  Perhaps your proposal is OK,
> > still how about "UNKNOWN" instead to use as term for the default?
> +1 to "UNKNOWN".

We currently use both UNKNOWN and NOT_SET in different places.
However, I'm okay with using UNKNOWN, and I've updated it in v16.

> But generally, classification in the PlannedStmtOrigin structure seems a
> little strange: a generic plan has a qualitative difference from any
> custom one. And any other plan also will be generic or custom, doesn't
> it?

I am not sure I understand the reasoning here. Can you provide more details/
specific examples?

--
Sami

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Andrei Lepikhov
Дата:
On 7/30/25 21:05, Sami Imseih wrote:
>>> The term "NOT_SET" makes me itch a little bit, even if there is an
>>> existing parallel with OverridingKind.  Perhaps your proposal is OK,
>>> still how about "UNKNOWN" instead to use as term for the default?
>> +1 to "UNKNOWN".
> 
> We currently use both UNKNOWN and NOT_SET in different places.
> However, I'm okay with using UNKNOWN, and I've updated it in v16.
> 
>> But generally, classification in the PlannedStmtOrigin structure seems a
>> little strange: a generic plan has a qualitative difference from any
>> custom one. And any other plan also will be generic or custom, doesn't
>> it?
> 
> I am not sure I understand the reasoning here. Can you provide more details/
> specific examples?
Yep,
When building a generic plan, you don't apply any constant to the 
clause, such as 'x<$1'.
That means you can't use histograms or MCV statistics when building a 
custom plan. The optimiser should guess and frequently uses just a 
'magic constant', like 0.05 or 0.33 for selectivity estimation. It 
sometimes drastically reduces the quality of the plan. So, analysing 
pg_s_s data, it would be beneficial to determine if a generic plan is 
effective or not.

In practice, with this knowledge, we can access the CachedPlanSource of 
the corresponding PREPARED statement via an extension and override the 
decision made in 'auto' mode. Unfortunately, we cannot obtain a pointer 
to plan cache entries for plans prepared by the extended protocol, but 
this may be possible in the future.

So, I meant that the source of the plan is one important characteristic, 
and the type (custom or generic) is another, independent characteristic

-- 
regards, Andrei Lepikhov



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> So, analysing
> pg_s_s data, it would be beneficial to determine if a generic plan is
> effective or not.

Yes, this is the point of adding these statistics to pg_s_s.

> In practice, with this knowledge, we can access the CachedPlanSource of
> the corresponding PREPARED statement via an extension and override the
> decision made in 'auto' mode. Unfortunately, we cannot obtain a pointer
> to plan cache entries for plans prepared by the extended protocol, but
> this may be possible in the future.
>
> So, I meant that the source of the plan is one important characteristic,
> and the type (custom or generic) is another, independent characteristic

The concepts of custom and generic plan types are associated with plan caches,
so they cannot have a different source. right?

--
Sami



Re: track generic and custom plans in pg_stat_statements

От
Michael Paquier
Дата:
On Wed, Jul 30, 2025 at 03:09:08PM -0500, Sami Imseih wrote:
>> In practice, with this knowledge, we can access the CachedPlanSource of
>> the corresponding PREPARED statement via an extension and override the
>> decision made in 'auto' mode. Unfortunately, we cannot obtain a pointer
>> to plan cache entries for plans prepared by the extended protocol, but
>> this may be possible in the future.
>>
>> So, I meant that the source of the plan is one important characteristic,
>> and the type (custom or generic) is another, independent characteristic

It seems to me that we're going to need a bit more than a design
concept here, explaining how this can be useful for core to provide
this knowledge to extensions.  A lot of concepts can be defined as
"useful", but it's hard to evaluate what you'd want here and how much
it can be useful in terms of manipulations of the cached plans
depending on what you may aim for.  It's also unclear up to which
extent the current state of the code would be able to help in the
concepts you're describing.  Anyway, I am not inclined to have pointer
references in structures pointing to other parts of the system, just
because these can be "useful".
--
Michael

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Michael Paquier
Дата:
On Wed, Jul 30, 2025 at 02:05:09PM -0500, Sami Imseih wrote:
> > > The term "NOT_SET" makes me itch a little bit, even if there is an
> > > existing parallel with OverridingKind.  Perhaps your proposal is OK,
> > > still how about "UNKNOWN" instead to use as term for the default?
> > +1 to "UNKNOWN".
>
> We currently use both UNKNOWN and NOT_SET in different places.
> However, I'm okay with using UNKNOWN, and I've updated it in v16.

Okay, applied both patches to get all that done, then.

Patch 0002 was not without turbulences:
- Order of the tests in meson.build and Makefile, where plancache
should be placed before the cleanup.  The squashing tests get that
wrong, actually..
- Some tweaks to the style of the SQL queries in the tests, two tweaks
in the wording of the docs.
- Lack of coverage for oldextversions for the transfer from 1.12 to
1.13.
- PGSS_FILE_HEADER should be bumped.  We have been very bad at that
since 2022, myself included, better later than never.
- I have decided to remove the cases with plan_cache_mode = auto, due
to a lack of predictibility depending on what the backend may decide
to use.
- For the last case with EXPLAIN + procedures, I have added an extra
"toplevel" in the ORDER BY clause, to ensure the order of the results.
"toplevel" was not required for the first three cases, where we don't
track the non-top-level queries.
- Added one round of code indentation.
--
Michael

Вложения

Re: track generic and custom plans in pg_stat_statements

От
Andrei Lepikhov
Дата:
On 30/7/2025 22:09, Sami Imseih wrote:
> The concepts of custom and generic plan types are associated with plan caches,
> so they cannot have a different source. right?
That is exactly what confused me: what does the 'origin' of the plan 
mean? See the comment:

 > PlannedStmtOrigin identifies from where a PlannedStmt comes from.

The first thing that comes to mind after reading this comment is a 
subsystem, such as the SPI, simple or extended protocol, or an 
extension. Another meaning is a type of plan, such as 'custom', 
'generic', or 'referenced'. As I see, here is a bit different 
classification used, not so obvious, at least for me.

-- 
regards, Andrei Lepikhov



Re: track generic and custom plans in pg_stat_statements

От
Andrei Lepikhov
Дата:
On 31/7/2025 02:09, Michael Paquier wrote:
> On Wed, Jul 30, 2025 at 03:09:08PM -0500, Sami Imseih wrote:
>>> In practice, with this knowledge, we can access the CachedPlanSource of
>>> the corresponding PREPARED statement via an extension and override the
>>> decision made in 'auto' mode. Unfortunately, we cannot obtain a pointer
>>> to plan cache entries for plans prepared by the extended protocol, but
>>> this may be possible in the future.
>>>
>>> So, I meant that the source of the plan is one important characteristic,
>>> and the type (custom or generic) is another, independent characteristic
> 
> It seems to me that we're going to need a bit more than a design
> concept here, explaining how this can be useful for core to provide
> this knowledge to extensions.  A lot of concepts can be defined as
> "useful", but it's hard to evaluate what you'd want here and how much
> it can be useful in terms of manipulations of the cached plans
> depending on what you may aim for.  It's also unclear up to which
> extent the current state of the code would be able to help in the
> concepts you're describing.  Anyway, I am not inclined to have pointer
> references in structures pointing to other parts of the system, just
> because these can be "useful".
Don't mind, I also think it wasn't a great idea to save a pointer.
After the discussion, I came up with the idea that an 'extended list' in 
some key nodes is a much more helpful and flexible way, serving multiple 
purposes.
Anyway, the extensibility model has never been an easy part of the 
system to design. So, let's think about that as a first draft approach.

-- 
regards, Andrei Lepikhov



Re: track generic and custom plans in pg_stat_statements

От
Sami Imseih
Дата:
> Patch 0002 was not without turbulences:

Thanks! And for sorting out the misses in the patch.

--
Sami