Обсуждение: pg_stat_statements and planning time
Hi, pg_stat_statements is basically very helpful to find out slow queries. But since it doesn't report the time spent in the planner, we cannot find out slow queries which take most time to do query planning, from pg_stat_statements. Is there any reason why pg_stat_statements doesn't collect the planning time? Attached patch extends pg_stat_statements so that it reports the planning time. Thought? In the patch, I didn't change the column name "total_time" meaning the time spent in the executor because of the backward compatibility. But once new column "plan_time" is added, "total_time" is confusing and ISTM it should be renamed... Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Вложения
On Wed, Mar 7, 2012 at 11:45 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > Attached patch extends pg_stat_statements so that it reports the > planning time. Thought? If we successfully aggregate SQL in the current patch then this might be useful as well. Until we do that it's not much use. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Mar 7, 2012 at 9:00 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, Mar 7, 2012 at 11:45 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > >> Attached patch extends pg_stat_statements so that it reports the >> planning time. Thought? > > If we successfully aggregate SQL in the current patch then this might > be useful as well. Until we do that it's not much use. You mean pg_stat_statements normalization patch? Yes, if it will be applied, this patch would be more useful. But without the normalization patch, this patch would be useful for a user who use the extended protocol but not plan cache mechanism, I think. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Mar 7, 2012 at 6:45 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > pg_stat_statements is basically very helpful to find out slow queries. > But since it doesn't report the time spent in the planner, we cannot > find out slow queries which take most time to do query planning, from > pg_stat_statements. Is there any reason why pg_stat_statements doesn't > collect the planning time? > > Attached patch extends pg_stat_statements so that it reports the > planning time. Thought? I think this is an interesting idea, but I think it's too late for 9.2. I'd like to have the planning time in a number of other places as well, such as EXPLAIN, and maybe statistics views. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Fujii Masao <masao.fujii@gmail.com> writes: > In the patch, I didn't change the column name "total_time" meaning > the time spent in the executor because of the backward compatibility. > But once new column "plan_time" is added, "total_time" is confusing and > ISTM it should be renamed... Well, if we were tracking planning time, what I would expect "total_time" to mean is plan time plus execution time. Should it be redefined that way, instead of renaming it? Another point here is that because of plan caching, the number of planner invocations could be quite different from the number of executor runs. It's not clear to me whether this will confuse matters for pg_stat_statements, but it's something to think about. Will it be possible to tell whether a particular statement is hugely expensive to plan but we don't do that often, versus cheap to plan but we do that a lot? IOW I am wondering if we need to track the number of invocations as well as total time. regards, tom lane
On Wed, Mar 7, 2012 at 8:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Fujii Masao <masao.fujii@gmail.com> writes: >> In the patch, I didn't change the column name "total_time" meaning >> the time spent in the executor because of the backward compatibility. >> But once new column "plan_time" is added, "total_time" is confusing and >> ISTM it should be renamed... > > Well, if we were tracking planning time, what I would expect > "total_time" to mean is plan time plus execution time. Should it be > redefined that way, instead of renaming it? I think you are right. On first glance, a user would only be interested in one number, the total time. Most people are hunting queries that happen (surprisingly) frequently and don't have good indexes in place to serve them, and in those cases planning time is negligible. However, should planning time be the bottleneck then one has to investigate fewer, smaller queries, a very different solution space. > Another point here is that because of plan caching, the number of > planner invocations could be quite different from the number of executor > runs. It's not clear to me whether this will confuse matters for > pg_stat_statements, but it's something to think about. Will it be > possible to tell whether a particular statement is hugely expensive to > plan but we don't do that often, versus cheap to plan but we do that a > lot? IOW I am wondering if we need to track the number of invocations > as well as total time. I don't think tracking a few more words will do much harm, and could also do a lot of good....perhaps count, sum, sum(x**2) of planning and execution? (Few things to me are more annoying than an unadorned average when it would have been feasible to take a square of a number). Thoughts? -- fdr
On Thu, Mar 8, 2012 at 12:39 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Mar 7, 2012 at 6:45 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> pg_stat_statements is basically very helpful to find out slow queries. >> But since it doesn't report the time spent in the planner, we cannot >> find out slow queries which take most time to do query planning, from >> pg_stat_statements. Is there any reason why pg_stat_statements doesn't >> collect the planning time? >> >> Attached patch extends pg_stat_statements so that it reports the >> planning time. Thought? > > I think this is an interesting idea, but I think it's too late for 9.2. Yes. I will add this to the next commitfest. And, in the patch, I changed pg_stat_statements--1.1.sql, but, for 9.3 I will have to create 1.2.sql instead. > I'd like to have the planning time in a number of other places as > well, such as EXPLAIN, and maybe statistics views. +1 for EXPLAIN. But which statistics views are in your mind? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, Mar 8, 2012 at 1:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Fujii Masao <masao.fujii@gmail.com> writes: >> In the patch, I didn't change the column name "total_time" meaning >> the time spent in the executor because of the backward compatibility. >> But once new column "plan_time" is added, "total_time" is confusing and >> ISTM it should be renamed... > > Well, if we were tracking planning time, what I would expect > "total_time" to mean is plan time plus execution time. Should it be > redefined that way, instead of renaming it? Agreed, it's more intuitive for a user. Along with "total_time" and "plan_time", should we also define "exec_time" reporting only the execution time for improvement of usability though it can be calculated from "total_time" and "plan_time"? > Another point here is that because of plan caching, the number of > planner invocations could be quite different from the number of executor > runs. It's not clear to me whether this will confuse matters for > pg_stat_statements, but it's something to think about. Will it be > possible to tell whether a particular statement is hugely expensive to > plan but we don't do that often, versus cheap to plan but we do that a > lot? IOW I am wondering if we need to track the number of invocations > as well as total time. Agreed to add something like "plan_count" column. This also would be helpful for e.g., tuning the prepareThreshold parameter in JDBC. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Mar 7, 2012 at 9:59 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> I'd like to have the planning time in a number of other places as >> well, such as EXPLAIN, and maybe statistics views. > > +1 for EXPLAIN. But which statistics views are in your mind? I don't know. I'm not sure if it's interesting to be able to count planning time vs. execution time on a server-wide basis, or even a per-database basis, but it seems like it might be, if we can do it cheaply. Then again, considering that gettimeofday is kinda expensive, I suppose that would have to be optional if we were to have it at all. Just thinking out loud, mostly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 8 March 2012 13:09, Robert Haas <robertmhaas@gmail.com> wrote: > Then again, considering that gettimeofday is kinda > expensive, I suppose that would have to be optional if we were to have > it at all. +1. I'm not opposed to having such a mechanism, but it really ought to impose exactly no overhead on the common case where we don't particularly care about plan time. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Peter Geoghegan <peter@2ndquadrant.com> writes: > On 8 March 2012 13:09, Robert Haas <robertmhaas@gmail.com> wrote: >> �Then again, considering that gettimeofday is kinda >> expensive, I suppose that would have to be optional if we were to have >> it at all. > +1. I'm not opposed to having such a mechanism, but it really ought to > impose exactly no overhead on the common case where we don't > particularly care about plan time. I thought the proposal was to add it to (1) pg_stat_statement and (2) EXPLAIN, both of which are not in the normal code execution path. pg_stat_statement is already a drag on a machine with slow gettimeofday, but it's not clear why users of it would think that two gettimeofday's per query are acceptable and four are not. regards, tom lane
On 8 March 2012 14:44, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I thought the proposal was to add it to (1) pg_stat_statement and (2) > EXPLAIN, both of which are not in the normal code execution path. > pg_stat_statement is already a drag on a machine with slow gettimeofday, > but it's not clear why users of it would think that two gettimeofday's > per query are acceptable and four are not. To be clear, I don't see any reasons to not just have it within EXPLAIN output under all circumstances. pg_stat_statements will slow down query execution, but I see no reason to force users to pay for something that they may well not want by not including an 'off' switch for this additional instrumentation, given that it doubles the number of gettimeofdays. I'm not particularly concerned about platforms with slow gettimeofdays. I'm concerned with keeping the overhead of running pg_stat_statements as low as possible generally. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services