Normalization of utility queries in pg_stat_statements
От | Michael Paquier |
---|---|
Тема | Normalization of utility queries in pg_stat_statements |
Дата | |
Msg-id | Y+MRdEq9W9XVa2AB@paquier.xyz обсуждение исходный текст |
Ответы |
Re: Normalization of utility queries in pg_stat_statements
(Michael Paquier <michael@paquier.xyz>)
|
Список | pgsql-hackers |
Hi all, (Adding Bertrand in CC.) $subject is a follow-up of the automation of query jumbling for utilities and DDLs, and attached is a set of patches that apply normalization to DDL queries across the board, for all utilities. This relies on tracking the location of A_Const nodes while removing from the query jumbling computation the values attached to the node, as as utility queries can show be stored as normalized in pg_stat_statements with some $N parameters. The main case behind doing that is of course monitoring, where we have seen some user instances willing to get more information but see pg_stat_statements as a bottleneck because the query ID of utility queries are based on the computation of their string, and is value-sensitive. That's the case mentioned by Bertrand Drouvot for CALL and SET where workloads full of these easily bloat pg_stat_statements, where we concluded about more automation in this area (so here it is): https://www.postgresql.org/message-id/36e5bffe-e989-194f-85c8-06e7bc88e6f7%40amazon.com For example, this makes possible the following grouping: - CALL func(1,2); CALL func(1,3); => CALL func($1,$2) - EXPLAIN SELECT 1; EXPLAIN SELECT 1; => EXPLAIN SELECT $1; - CREATE MATERIALIZED VIEW aam AS SELECT 1; becomes "CREATE MATERIALIZED VIEW aam AS SELECT $1". Query jumbling for DDLs and utilities happens now automatically, still are not represented correctly in pg_stat_statements (one bit of documentation I missed previously refers to the fact that these depend on their query strings, which is not the case yet). By the way, while looking at all that, I have really underestimated the use of Const nodes in utilities, as some queries can finish with the same query ID even if different values are stored in a query, still don't show up as normalized in pg_stat_statements, so the current state of HEAD is not good, though you would need to use the same object name to a conflict for most of them. So that's my mistake here with 3db72eb. If folks think that we'd better have a revert of this automated query jumbling facility based on this argument, that would be fine for me, as well. The main case I have noticed in this area is EXPLAIN, by the way. Note that it is actually easy to move to the ~15 approach of having a query ID depending on the Const node values for DDLs, by having a custom implementation in queryjumblefuncs.c for Const nodes, where we apply the constant value and don't store a location for normalization if a query has a utility once this information is stored in a JumbleState. This rule influences various DDLs, as well, once it gets applied across the board, and it's been some work to identify all of them, but I think that I have caught them all as the regression database offers all the possible patterns: - CREATE VIEW, CTAS, CREATE MATERIALIZED VIEW which have Const nodes depending on their attached queries, for various clauses. - ALTER TABLE/INDEX/FOREIGN with DEFAULT, SET components. - CREATE TABLE with partition bounds. - BEGIN and ABORT, with transaction commands getting grouped together. The attached patch set includes as a set of regression tests for pg_stat_statements for *all* the utility queries that have either Const or A_Const nodes, so as one can see the effect that all this stuff has. This is based on a diff of the contents of pg_stat_statements on the regression database once all these normalization rules are applied. Compilation of a Const can also be made depending on the type node. However, all that makes no sense if applying the same normalization rules to all the queries across the board, because all the queries would follow the same rules. That's the critical bit IMO. From what I get, the bloat of pg_stat_statements for all utilities is something that would be helpful for all such queries, still different things could be done on a per-node basis. Perhaps this is too aggressive as it is and people don't like it, though, so feedback is welcome. I'd like to think that maximizing grouping is nice though, because it leads to no actual loss of information on the workload pattern for the queries involved, AFAIU. This sentence may be overoptimistic. So, attached is a patch set, that does the following: - 0001 is a refactoring of the regression tests of pg_stat_statements by splitting a bit the tests. I bumped into that while getting confused at how the tests are now when it comes to the handling of utilities and track_planning, where these tests silently rely on other parts of the same file with different GUC settings. This refactoring is useful on its own, IMO, and the tests show the same output as previously. - 0002 is the addition of tests in pg_stat_statements for all the DDL and utility patterns that make use of Const and A_Const nodes. Even if query jumbling of utilities is done through their text string or their nodes, this is also useful. - 0003 is the code of the feature, that switches pg_stat_statements to properly normalize utility queries, with a modification to A_Const so as normalization can be applied to it. With the generation of the code for query jumbling being automated based on the node definitions, this is straight-forward as a code change, but the changes are basically impossible to track without all the patterns tracked by 0002. Thoughts and comments are welcome. 0001 and 0002 are useful on their own to keep track of utilities that use Const and A_Const after going through the query jumbling, even if an approach based on query string or the automated query jumbling for utilities is used (the query string approach a bit its value). I'll add that to the next commit fest. Thanks, -- Michael
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: "wangw.fnst@fujitsu.com"Дата:
Сообщение: RE: Perform streaming logical transactions by background workers and parallel apply
Следующее
От: Thomas MunroДата:
Сообщение: Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED