Обсуждение: Re: [FEATURE PATCH] pg_stat_statements with plans (v02)

Поиск
Список
Период
Сортировка

Re: [FEATURE PATCH] pg_stat_statements with plans (v02)

От
Julian Markwort
Дата:
On Thu, 2018-03-22 at 11:16 -0700, legrand legrand wrote:
> Reading other pg_stat_statements threads on this forum, there are
> also activ
> developments to add:
> - planing duration,
> - first date,
> - last_update date,

As I see it, planning duration, first date, and last update date would
be columns added to the pg_stat_statements view, i.e. they are tracked
for each kind of a (jumbled) query -- just as the good and bad plans,
their associated execution times and timestamps are.

> - parameters for normalized queries,

I've reviewed Vik Fearing's patch for this and have not heard back from
him. Also, as you have already explained in your summary post, these
parameters only aid in the examination of current plans and offers no
information regarding plans used in the past.

> I was wondering about how would your dev behave with all those new
> features.
> It seems to me that bad and good plans will not have any of thoses
> informations.

A patch that adds planning durations and timestamps associated with queries wouldn't interefere with my plans patch.
However, we could think about capturing the planning durations for the plans recorded by my patch.

The patch for tracking first-time parameters for normalized queries has
a different use case, compared to this patch. It shouldn't interfere
with my patch, anyway.

> Last question, didn't you think about a model to store all the
> different
> plans using a planid like
>
> queryid, planid, query, ...
> aaa plan1 ...
> aaa plan2 ...
> aaa plan3 ...
> ...
>
> I can not imagine that there would be so many of them ;o)

This wasn't obvious to me during development, as each entry (with a certain queryid) is directly connected to two
plans.
But with future development in mind it probably makes sense to separate the plans from the rest of pg_stat_statements.

This would also allow us to keep old plans, while only storing new ones that are not equivalent, essentially providing
ahistory of the plans used. 
Keep in mind that this check for equivalence would require further development and we'd have to make sure we're not
consumingtoo much memory (however much that is) when storing possibly infinite amounts of plans. 

I've created a draft patch that provides access to plans in a view called pg_stat_statements_plans.
There is no column that indicates whether the plan is "good" or "bad", because that is evident from the execution time
ofboth plans and because that would require some kind of plan_type for all plans that might be stored in future
versions.

Please take it for a spin and tell me, whether the layout and handling of the view make sense to you.

Julian

Вложения

Re: [FEATURE PATCH] pg_stat_statements with plans (v02)

От
legrand legrand
Дата:
> I've created a draft patch that provides access to plans in a view 
>  called pg_stat_statements_plans.

++ I like it !

> There is no column that indicates whether the plan is "good" or "bad", 
>  because that is evident from the execution time of both plans and because 
>  that would require some kind of plan_type for all plans that might 
>  be stored in future versions.

At startup time there are 2 identical plans found in the view
I thought it should have be only one: the "initial" one 
(as long as there is no "good" or "bad" one).
maybe 3 "plan_types" like "init", "good" and "bad" should help.

Will a new line be created for good or bad plan if the plan is the same ?
shouldn't we capture "constants" and "parameters" inspite ?

Is query text needed in plan? 
it can be huge and is already available (in normalized format) 
in pg_stat_statements. If realy needed in raw format 
(with constants) we could force pgss to store it (in replacement of
normalize one)
using a guc pg_stat_statements.normalized = false (I have a patch to do
that).


In Thread:
http://www.postgresql-archive.org/Poc-pg-stat-statements-with-planid-td6014027.html
pg_stat_statements view has a new key with  dbid,userid,queryid + planid
and 2 columns added: "created" and "last_updated".

May be your view pg_stat_statements_plans could be transformed :
same key as pgss: dbid,userid,queryid,planid 
and THE plan column to store explain result (no more need for plan_time nor
tz that 
are already available in modified pgss).

Thoses changes to pgss are far from being accepted by community:
- planid calculation has to be discussed (I reused the one from
pg_stat_plans, 
  but I would have prefered explain command to provide it ...),
- "created" and "last_updated" columns also,
- ...

So maybe your stategy to keep pgss unchanged, and add extensions view is
better.
There is a risk of duplicated informations (like execution_time, calls, ...)

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: [FEATURE PATCH] pg_stat_statements with plans (v02)

От
Julian Markwort
Дата:
On Fri, 2018-04-06 at 13:57 -0700, legrand legrand wrote:
> At startup time there are 2 identical plans found in the view
> I thought it should have be only one: the "initial" one
> (as long as there is no "good" or "bad" one).

Yes, those are 'remnants' from the time where I had two columns, one
for good_plan and one for bad_plan.
Whenever only one plan existed (either because the statement hadn't
been executed more than once or because the deviation from the previous
plan's time wasn't significant enough), the patch would display the
same plan string for both columns.

I'll have to be a little creative, but I think I'll be able to print
only one plan if both the good and bad plan are essentially the
"inital" plan.

You can be assured that the plan string isn't saved twice for those
cases, the good_plan and bad_plan structs just used the same  offset
for the qtexts file.

> maybe 3 "plan_types" like "init", "good" and "bad" should help.

I think, the categorization into good and bad plans can be made based
on the execution time of said plan.
I'd suggest using something like
SELECT * FROM pg_stat_statements_plans GROUP BY queryid ORDER BY
plan_time;
This way, if there is only one row for any queryid, this plan is
essentially the "initial" one. If there are two plans, their goodness
can be infered from the exection times...

> Will a new line be created for good or bad plan if the plan is the
> same ?
> shouldn't we capture "constants" and "parameters" inspite ?

Capturing constants and parameters would require us to normalize the
plan, which isn't done currently. (Primarily. because it would involve
a lot of logic; Secondly, because for my use case, I'm interested in
the specific constants and parameters that led to a good or bad plan)

Do you have a use case in mind which would profit from normalized
plans?

> Is query text needed in plan?
> it can be huge and is already available (in normalized format)
> in pg_stat_statements. If realy needed in raw format
> (with constants) we could force pgss to store it (in replacement of
> normalize one)
> using a guc pg_stat_statements.normalized = false (I have a patch to
> do
> that).

The query is part of what's returned by an explain statement, so I've
thought to keep the 'plan' intact.
Since all long strings (queries and plans alike) are stored in a file
on disk, I'm not too much worried about the space these strings take
up.
I'd think that a hugely long query would lead to an even longer plan,
in which case the problem to focus on might not be to reduce the length
of the string stored by a statistical plugin...

>
> In Thread:
> http://www.postgresql-archive.org/Poc-pg-stat-statements-with-planid-
> td6014027.html
> pg_stat_statements view has a new key with  dbid,userid,queryid +
> planid
> and 2 columns added: "created" and "last_updated".

I've taken a look at your patch. I agree that having a plan identifier
would be great, but I'm not a big fan of the jumbling. That's a lot of
hashing that needs to be done to decide wether two plans are
essentially equivalent or not.

Maybe, in the future, in a different patch, we could add functionality
that would allow us to compare two plan trees. There is even a remark
in the header of src/backend/nodes/equalfuncs.c regarding this:
 * Currently, in fact, equal() doesn't know how to compare Plan trees
 * either.  This might need to be fixed someday.

> May be your view pg_stat_statements_plans could be transformed :
> same key as pgss: dbid,userid,queryid,planid
> and THE plan column to store explain result (no more need for
> plan_time nor
> tz that
> are already available in modified pgss).

The documentation [of pg_stat_statements] gives no clear indication
which fields qualify as primary keys, mainly because the hashing that
generates the queryid isn't guaranteed to produce unique results...
So I'm not sure which columns should be used to create associations
between pg_stat_statements and pg_stat_statements_plans.


> Thoses changes to pgss are far from being accepted by community:
> - planid calculation has to be discussed (I reused the one from
> pg_stat_plans,
>   but I would have prefered explain command to provide it ...),
> - "created" and "last_updated" columns also,
> - ...
>
> So maybe your stategy to keep pgss unchanged, and add extensions view
> is
> better.
> There is a risk of duplicated informations (like execution_time,
> calls, ...)

It might be possible to completely seperate these two views into two
extensions. (pg_stat_statements_plans could use it's own hooks to
monitor query execution)
But that would probably result in a lot of duplicated code (for the
querytexts file) and would be a lot more elaborate to create, as I'd
need to recreate most of the hashmaps and locks.

Piggybacking onto the existing logic of pg_stat_statements to store a
plan every once in a while seems simpler to me.


Regards
Julian
Вложения

Re: [FEATURE PATCH] pg_stat_statements with plans (v02)

От
legrand legrand
Дата:
[...]
> I've taken a look at your patch. I agree that having a plan identifier
> would be great, but I'm not a big fan of the jumbling. That's a lot of
> hashing that needs to be done to decide wether two plans are
> essentially equivalent or not.

As there is no planid available yet in core, I reused jumbling from 
pg_stat_plans, and I agree : it seems quite complicated (its not even 
sure that it is compatible with all the new partitioning stuff) and 
I won't be able to maintain it.

> Maybe, in the future, in a different patch, we could add functionality
> that would allow us to compare two plan trees. There is even a remark
> in the header of src/backend/nodes/equalfuncs.c regarding this:
>  * Currently, in fact, equal() doesn't know how to compare Plan trees
>  * either.  This might need to be fixed someday. 

It would be so helpfull if a planid was included in core Query / plan tree, 
it could be reused here in equal(), in explain commands, 
in pg_stat_activity, in pg_stat_statements ...


>> May be your view pg_stat_statements_plans could be transformed :
>> same key as pgss: dbid,userid,queryid,planid 
>> and THE plan column to store explain result (no more need for
>> plan_time nor
>> tz that 
>> are already available in modified pgss).

> The documentation [of pg_stat_statements] gives no clear indication
> which fields qualify as primary keys, mainly because the hashing that
> generates the queryid isn't guaranteed to produce unique results...
> So I'm not sure which columns should be used to create associations
> between pg_stat_statements and pg_stat_statements_plans.


Yes I understand, to choose between
  dbid,userid,queryid,planid 
and
  dbid,userid,planid 

(planid alone seems not acceptable regarding privacy policies).

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: [FEATURE PATCH] pg_stat_statements with plans (v02)

От
Robert Haas
Дата:
On Fri, Apr 20, 2018 at 6:34 PM, legrand legrand
<legrand_legrand@hotmail.com> wrote:
> It would be so helpfull if a planid was included in core Query / plan tree,
> it could be reused here in equal(), in explain commands,
> in pg_stat_activity, in pg_stat_statements ...

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [FEATURE PATCH] pg_stat_statements with plans (v02)

От
Sergei Kornilov
Дата:
Hello
I notice latest patch version (v06) not applied after da6f3e45ddb68ab3161076e120e7c32cfd46d1db commit in april. We have
someconsensus with design of this feature?
 

I will mark CF entry as waiting on author

regards, Sergei


Re: [FEATURE PATCH] pg_stat_statements with plans (v02)

От
Thomas Munro
Дата:
On Sat, Apr 7, 2018 at 1:03 AM, Julian Markwort
<julian.markwort@uni-muenster.de> wrote:
> I've created a draft patch that provides access to plans in a view called pg_stat_statements_plans.
> There is no column that indicates whether the plan is "good" or "bad", because that is evident from the execution
timeof both plans and because that would require some kind of plan_type for all plans that might be stored in future
versions.
>
> Please take it for a spin and tell me, whether the layout and handling of the view make sense to you.

I realise this is probably WIP, but for what it's worth here's some
feedback from my patch testing robot:

+ +INFINITY,

That ^ is from C99 math.h, and so we don't currently expect compilers
to have it (until such time as we decide to pull the trigger and
switch to C99, as discussed in a nearby thread):

  contrib/pg_stat_statements/pg_stat_statements.c(482): error C2065:
'INFINITY' : undeclared identifier
[C:\projects\postgresql\pg_stat_statements.vcxproj]
5826

Maybe get_float8_infinity()?

+ for(int j = 0; j < numPlans; j++)

Can't declare a new variable here in C89.

+ pgssPlan *planArray[numPlans];

Can't use variable length arrays in C89.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: [FEATURE PATCH] pg_stat_statements with plans (v02)

От
Tom Lane
Дата:
[ off topic for this patch, but as long as you mentioned switching
to C99 ]

Thomas Munro <thomas.munro@enterprisedb.com> writes:
> + for(int j = 0; j < numPlans; j++)
> Can't declare a new variable here in C89.

As previously noted, that seems like a nice thing to allow ...

> + pgssPlan *planArray[numPlans];
> Can't use variable length arrays in C89.

... but I'm less excited about this one.  Seems like a great opportunity
for unexpected stack overflows, and thence at least the chance for
DOS-causing security attacks.  Can we prevent that from being allowed,
if we start using -std=c99?

            regards, tom lane


Re: [FEATURE PATCH] pg_stat_statements with plans (v02)

От
Thomas Munro
Дата:
On Mon, Aug 20, 2018 at 2:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
> As previously noted, that seems like a nice thing to allow ...
>
>> + pgssPlan *planArray[numPlans];
>> Can't use variable length arrays in C89.
>
> ... but I'm less excited about this one.  Seems like a great opportunity
> for unexpected stack overflows, and thence at least the chance for
> DOS-causing security attacks.  Can we prevent that from being allowed,
> if we start using -std=c99?

-Werror=vla in GCC, apparently.

Another problem with VLAs is that they aren't in C++ and last I heard
they aren't ever likely to be (at least not with that syntax).  I'd rather not
introduce obstacles on that path.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: [FEATURE PATCH] pg_stat_statements with plans (v02)

От
Tom Lane
Дата:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Mon, Aug 20, 2018 at 2:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Can we prevent that from being allowed,
>> if we start using -std=c99?

> -Werror=vla in GCC, apparently.

Ah, cool.

> Another problem with VLAs is that they aren't in C++ and last I heard
> they aren't ever likely to be (at least not with that syntax).  I'd rather not
> introduce obstacles on that path.

While I'm not that excited about the prospect of ever moving to C++,
I'm on board with the idea of sticking to something that's a subset
of both C99 and C++.  Are there any other language features that we'd
need to avoid?  Is there a direct way of testing for that, rather
than finding per-feature warning methods?

            regards, tom lane


Re: [FEATURE PATCH] pg_stat_statements with plans (v02)

От
Andres Freund
Дата:
Hi,

On 2018-08-20 15:21:07 +1200, Thomas Munro wrote:
> On Mon, Aug 20, 2018 at 2:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Thomas Munro <thomas.munro@enterprisedb.com> writes:
> > ... but I'm less excited about this one.  Seems like a great opportunity
> > for unexpected stack overflows, and thence at least the chance for
> > DOS-causing security attacks.  Can we prevent that from being allowed,
> > if we start using -std=c99?
> 
> -Werror=vla in GCC, apparently.

How about detecting support for that in our configure script and
automatically using it?  If we're uncomfortable with raising an error,
let's at least raise a warning?

- Andres


Re: [FEATURE PATCH] pg_stat_statements with plans (v02)

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2018-08-20 15:21:07 +1200, Thomas Munro wrote:
>> -Werror=vla in GCC, apparently.

> How about detecting support for that in our configure script and
> automatically using it?  If we're uncomfortable with raising an error,
> let's at least raise a warning?

Yeah, we'd certainly auto-turn-on whatever options we think are needed.

            regards, tom lane


Re: [FEATURE PATCH] pg_stat_statements with plans (v02)

От
Michael Paquier
Дата:
On Fri, Apr 06, 2018 at 03:03:30PM +0200, Julian Markwort wrote:
> As I see it, planning duration, first date, and last update date would
> be columns added to the pg_stat_statements view, i.e. they are tracked
> for each kind of a (jumbled) query -- just as the good and bad plans,
> their associated execution times and timestamps are.

The latest patch set does not apply and has been waiting on input from
author for more than a couple of weeks, so please note that I have
marked it as returned with feedback in commit fest 2018-09.
--
Michael

Вложения