Обсуждение: IDEA: pg_stat_statements tracking utility statements by tag?
Hoi hackers,
We've been using the pg_stat_statements extension to get an idea of the queries used in the database, but the table is being filled with entries like:
SAVEPOINT sa_savepoint_NNN;
RELEASE SAVEPOINT sa_savepoint_NNN;
DECLARE "c_7f9451c4dcc0_5" CURSOR WITHOUT HOLD ...
FETCH FORWARD 250 FROM "c_7f9451b03908_5"
Since the unique id is different for each query, the aggregation does nothing and there are quite a lot of these drowning out the normal queries (yes, I'm aware this is an issue of itself). The only way to deal with this is "pg_stat_statements.track_utility=off". However, it occurs to me that if you just tracked the tags rather than the full query text you could at least track the number of such queries and how much time they take. So the above queries would be tracked under SAVEPOINT, RELEASE, DECLARE CURSOR and (I guess) FETCH respectively. But it would also catch DDL.
Does this sound like something for which a patch would be accepted?
Have a nice day,
--
Martijn van Oosterhout <kleptog@gmail.com> http://svana.org/kleptog/
On 2020/07/29 18:24, Martijn van Oosterhout wrote: > Hoi hackers, > > We've been using the pg_stat_statements extension to get an idea of the queries used in the database, but the table isbeing filled with entries like: > > SAVEPOINT sa_savepoint_NNN; > RELEASE SAVEPOINT sa_savepoint_NNN; > DECLARE "c_7f9451c4dcc0_5" CURSOR WITHOUT HOLD ... > FETCH FORWARD 250 FROM "c_7f9451b03908_5" > > Since the unique id is different for each query, the aggregation does nothing and there are quite a lot of these drowningout the normal queries (yes, I'm aware this is an issue of itself). The only way to deal with this is "pg_stat_statements.track_utility=off".However, it occurs to me that if you just tracked the tags rather than the full querytext you could at least track the number of such queries and how much time they take. So the above queries would betracked under SAVEPOINT, RELEASE, DECLARE CURSOR and (I guess) FETCH respectively. But it would also catch DDL. > > Does this sound like something for which a patch would be accepted? Or, we should extend the existing query normalization to handle also DDL? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Wed, Jul 29, 2020 at 2:42 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > On 2020/07/29 18:24, Martijn van Oosterhout wrote: > > Hoi hackers, > > > > We've been using the pg_stat_statements extension to get an idea of the queries used in the database, but the table isbeing filled with entries like: > > > > SAVEPOINT sa_savepoint_NNN; > > RELEASE SAVEPOINT sa_savepoint_NNN; > > DECLARE "c_7f9451c4dcc0_5" CURSOR WITHOUT HOLD ... > > FETCH FORWARD 250 FROM "c_7f9451b03908_5" > > > > Since the unique id is different for each query, the aggregation does nothing and there are quite a lot of these drowningout the normal queries (yes, I'm aware this is an issue of itself). The only way to deal with this is "pg_stat_statements.track_utility=off".However, it occurs to me that if you just tracked the tags rather than the full querytext you could at least track the number of such queries and how much time they take. So the above queries would betracked under SAVEPOINT, RELEASE, DECLARE CURSOR and (I guess) FETCH respectively. But it would also catch DDL. > > > > Does this sound like something for which a patch would be accepted? > > Or, we should extend the existing query normalization to handle also DDL? +1, introducing DDL normalization seems like a better way to go in the long run. Defining what should and shouldn't be normalized can be tricky though.
On Wed, 29 Jul 2020 at 15:40, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Wed, Jul 29, 2020 at 2:42 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
> Or, we should extend the existing query normalization to handle also DDL?
+1, introducing DDL normalization seems like a better way to go in the
long run. Defining what should and shouldn't be normalized can be
tricky though.
In principle, the only thing that really needs to be normalised is SAVEPOINT/CURSOR names which are essentially random strings which have no effect on the result. Most other stuff is material to the query.
That said, I think "aggregate by tag" has value all by itself. Being able to collapse all CREATE TABLES into a single line can be useful in some situations.
Ideally the results of FETCH "cursor" should be combined with the DECLARE, but I really don't know how to go about that.
Have a nice day,
--
Martijn van Oosterhout <kleptog@gmail.com> http://svana.org/kleptog/
On Wed, Jul 29, 2020 at 5:29 PM Martijn van Oosterhout <kleptog@gmail.com> wrote: > > > On Wed, 29 Jul 2020 at 15:40, Julien Rouhaud <rjuju123@gmail.com> wrote: >> >> On Wed, Jul 29, 2020 at 2:42 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> > >> > >> > Or, we should extend the existing query normalization to handle also DDL? >> >> +1, introducing DDL normalization seems like a better way to go in the >> long run. Defining what should and shouldn't be normalized can be >> tricky though. > > > In principle, the only thing that really needs to be normalised is SAVEPOINT/CURSOR names which are essentially randomstrings which have no effect on the result. Most other stuff is material to the query. > > That said, I think "aggregate by tag" has value all by itself. Being able to collapse all CREATE TABLES into a single linecan be useful in some situations. There's at least PREPARE TRANSACTION / COMMIT PREPARED / ROLLBACK PREPARED that should be normalized too. I also don't think that we really want to have different entries for begin / Begin / BEGIN / bEgin and similar for many other commands, as the hash is computed based on the query text.
On Wed, Jul 29, 2020 at 06:35:41PM +0200, Julien Rouhaud wrote: > There's at least PREPARE TRANSACTION / COMMIT PREPARED / ROLLBACK > PREPARED that should be normalized too. I also don't think that we > really want to have different entries for begin / Begin / BEGIN / > bEgin and similar for many other commands, as the hash is computed > based on the query text. Hmm. Do we really want to those commands fully normalized all the time? There may be applications that care about the stats of some commands that are for example prefixed the same way and would prefer group those things together. By fully normalizing those commands all the time, we would lose this option. An example. The ODBC driver uses its own grammar for internal savepoint names, aka _EXEC_SVP_%p. If you mix that with a second application that has its own naming policy for savepoints it would not be possible anymore to make the difference in the stats between what one or the other do. -- Michael
Вложения
On Thu, Jul 30, 2020 at 3:54 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Jul 29, 2020 at 06:35:41PM +0200, Julien Rouhaud wrote: > > There's at least PREPARE TRANSACTION / COMMIT PREPARED / ROLLBACK > > PREPARED that should be normalized too. I also don't think that we > > really want to have different entries for begin / Begin / BEGIN / > > bEgin and similar for many other commands, as the hash is computed > > based on the query text. > > Hmm. Do we really want to those commands fully normalized all the > time? There may be applications that care about the stats of some > commands that are for example prefixed the same way and would prefer > group those things together. By fully normalizing those commands all > the time, we would lose this option. > > An example. The ODBC driver uses its own grammar for internal > savepoint names, aka _EXEC_SVP_%p. If you mix that with a second > application that has its own naming policy for savepoints it would not > be possible anymore to make the difference in the stats between what > one or the other do. But if you have an OLTP application that uses ODBC, won't you already have 80+% of pgss entries being savepoint orders, which is really not helpful at all? We'd technically lose the ability to group such commands together, but in most cases the current behavior is quite harmful.