Обсуждение: performance statistics monitoring without spamming logs
I'm looking for a way of gathering performance stats in a more usable way than turning on `log_statement_stats` (or other related modules). The problem I have with the log_*_stats family of modules is that they log every single query, which makes them unusable in production. Aside from consuming space, there's also the problem that the log system wouldn't be able to keep up with the rate.
There are a couple ideas that pop into mind that would make these stats more usable:
1. Only log when the statement would otherwise already be logged. Such as due to the `log_statement` or `log_min_duration_statement` settings.
2. Make stats available in `pg_stat_statements` (or alternate view that could be joined on). The block stats are already available here, but others like CPU usage, page faults, and context switches are not.
To answer why I want this data: I want to be able to identify queries which are consuming large amounts of CPU time so that I can either optimize the query or optimize the application making the query, and free up CPU resources on the database. The `pg_stat_statements` view provides the `total_time` metric, but many things can contribute to query time other than CPU usage, and CPU usage is my primary concern at the moment.
Do these seem like reasonable requests? And if so, what's the procedure for getting them implemented?
Any thoughts on whether they would be hard to implement? I'm unfamiliar with the PostgresQL code base, but might be willing to attempt an implementation if it wouldn't be terribly difficult.
-Patrick
There are a couple ideas that pop into mind that would make these stats more usable:
1. Only log when the statement would otherwise already be logged. Such as due to the `log_statement` or `log_min_duration_statement` settings.
2. Make stats available in `pg_stat_statements` (or alternate view that could be joined on). The block stats are already available here, but others like CPU usage, page faults, and context switches are not.
To answer why I want this data: I want to be able to identify queries which are consuming large amounts of CPU time so that I can either optimize the query or optimize the application making the query, and free up CPU resources on the database. The `pg_stat_statements` view provides the `total_time` metric, but many things can contribute to query time other than CPU usage, and CPU usage is my primary concern at the moment.
Do these seem like reasonable requests? And if so, what's the procedure for getting them implemented?
Any thoughts on whether they would be hard to implement? I'm unfamiliar with the PostgresQL code base, but might be willing to attempt an implementation if it wouldn't be terribly difficult.
-Patrick
On Tue, Jul 10, 2018 at 01:54:12PM -0400, Patrick Hemmer wrote: > I'm looking for a way of gathering performance stats in a more usable > way than turning on `log_statement_stats` (or other related modules). > The problem I have with the log_*_stats family of modules is that they > log every single query, which makes them unusable in production. Aside > from consuming space, there's also the problem that the log system > wouldn't be able to keep up with the rate. > > There are a couple ideas that pop into mind that would make these stats > more usable: > 1. Only log when the statement would otherwise already be logged. Such > as due to the `log_statement` or `log_min_duration_statement` settings. Did you see: (Added Adrien to Cc); https://commitfest.postgresql.org/18/1691/ I don't think the existing patch does what you want, but perhaps all that's needed is this: if (save_log_statement_stats) + if (log_sample_rate==1 || was_logged) ShowUsage("EXECUTE MESSAGE STATISTICS"); In any case, I'm thinking that your request could/should be considered by whatever future patch implements sampling (if not implemented/included in the patch itself). If that doesn't do what's needed, that patch might still be a good crash course in how to start implementing what you need (perhaps on top of that patch). > 2. Make stats available in `pg_stat_statements` (or alternate view that > could be joined on). The block stats are already available here, but > others like CPU usage, page faults, and context switches are not. pg_stat_statements is ./contrib/pg_stat_statements/pg_stat_statements.c which is 3k LOC. getrusage stuff and log_*_stat stuff is in src/backend/tcop/postgres.c Justin
On Tue, Jul 10, 2018 at 11:38 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:
> 2. Make stats available in `pg_stat_statements` (or alternate view that
> could be joined on). The block stats are already available here, but
> others like CPU usage, page faults, and context switches are not.
pg_stat_statements is ./contrib/pg_stat_statements/pg_stat_statements.c which is 3k LOC.
getrusage stuff and log_*_stat stuff is in src/backend/tcop/postgres.c
Before you start implementing something here, take a look at pg_stat_kcache [0]
Which already aims to collect a few more system statistics than what pg_stat_statements provides today, and might be a good basis to extend from.
It might also be worth to look at pg_stat_activity wait event sampling to determine where a system spends time, see e.g. pg_wait_sampling [1] for one approach to this.
Best,
Lukas
Lukas Fittl
On 07/13/2018 12:25 AM, Lukas Fittl wrote: > On Tue, Jul 10, 2018 at 11:38 AM, Justin Pryzby <pryzby@telsasoft.com > <mailto:pryzby@telsasoft.com>> wrote: > > > 2. Make stats available in `pg_stat_statements` (or alternate view that > > could be joined on). The block stats are already available here, but > > others like CPU usage, page faults, and context switches are not. > > pg_stat_statements is > ./contrib/pg_stat_statements/pg_stat_statements.c which is 3k LOC. > > getrusage stuff and log_*_stat stuff is in src/backend/tcop/postgres.c > > > Before you start implementing something here, take a look at > pg_stat_kcache [0] > > Which already aims to collect a few more system statistics than what > pg_stat_statements provides today, and might be a good basis to extend from. > > It might also be worth to look at pg_stat_activity wait event sampling > to determine where a system spends time, see e.g. pg_wait_sampling > [1] for one approach to this. > Hi, You should look Powa stack : https://github.com/powa-team/powa Powa can aggregate metrics from different extensions such as pg_stat_statements, pg_stat_kcache and pg_wait_sampling recently : https://rjuju.github.io/postgresql/2018/07/09/wait-events-support-for-powa.html Regards, > [0]: https://github.com/powa-team/pg_stat_kcache > <https://github.com/powa-team/pg_stat_kcache> > [1]: https://github.com/postgrespro/pg_wait_sampling > <https://github.com/postgrespro/pg_wait_sampling> > > Best, > Lukas > > -- > Lukas Fittl
On 07/10/2018 08:38 PM, Justin Pryzby wrote: > On Tue, Jul 10, 2018 at 01:54:12PM -0400, Patrick Hemmer wrote: >> I'm looking for a way of gathering performance stats in a more usable >> way than turning on `log_statement_stats` (or other related modules). >> The problem I have with the log_*_stats family of modules is that they >> log every single query, which makes them unusable in production. Aside >> from consuming space, there's also the problem that the log system >> wouldn't be able to keep up with the rate. >> >> There are a couple ideas that pop into mind that would make these stats >> more usable: >> 1. Only log when the statement would otherwise already be logged. Such >> as due to the `log_statement` or `log_min_duration_statement` settings. > > Did you see: (Added Adrien to Cc); > https://commitfest.postgresql.org/18/1691/ > > I don't think the existing patch does what you want, but perhaps all that's > needed is this: > > if (save_log_statement_stats) > + if (log_sample_rate==1 || was_logged) > ShowUsage("EXECUTE MESSAGE STATISTICS"); > > In any case, I'm thinking that your request could/should be considered by > whatever future patch implements sampling (if not implemented/included in the > patch itself). Hi, Thanks for Cc, it seems a good idea. Will think about it ;) > > If that doesn't do what's needed, that patch might still be a good crash course > in how to start implementing what you need (perhaps on top of that patch). > >> 2. Make stats available in `pg_stat_statements` (or alternate view that >> could be joined on). The block stats are already available here, but >> others like CPU usage, page faults, and context switches are not. > > pg_stat_statements is ./contrib/pg_stat_statements/pg_stat_statements.c which is 3k LOC. > > getrusage stuff and log_*_stat stuff is in src/backend/tcop/postgres.c > > Justin >
On Fri, Jul 13, 2018 at 9:23 AM, Adrien NAYRAT <adrien.nayrat@anayrat.info> wrote: > On 07/13/2018 12:25 AM, Lukas Fittl wrote: >> >> On Tue, Jul 10, 2018 at 11:38 AM, Justin Pryzby <pryzby@telsasoft.com >> <mailto:pryzby@telsasoft.com>> wrote: >> >> > 2. Make stats available in `pg_stat_statements` (or alternate view >> that >> > could be joined on). The block stats are already available here, but >> > others like CPU usage, page faults, and context switches are not. >> >> pg_stat_statements is >> ./contrib/pg_stat_statements/pg_stat_statements.c which is 3k LOC. >> >> getrusage stuff and log_*_stat stuff is in src/backend/tcop/postgres.c >> >> >> Before you start implementing something here, take a look at >> pg_stat_kcache [0] >> >> Which already aims to collect a few more system statistics than what >> pg_stat_statements provides today, and might be a good basis to extend from. >> Also no one asked for it before, but we can definitely add all the other fields returned by get_rusage(2) in pg_stat_kcache. You can also look at https://github.com/markwkm/pg_proctab.