Обсуждение: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements
[PROPOSAL] extend the object names to the qualified names inpg_stat_statements
От
Sergei Agalakov
Дата:
Hi, It would help to analyze performance issues if pg_stat_statements would extend the object names to the qualified names. Currently if we have two schemas ( say s1 and s2) with the objects with the same name ( say tables t1) then after the next executions: set schema 's1'; select count(*) from t1; // returns 10 set schema 's2'; select count(*) from t1; // returns 1000000000 we see in select queryid, query from pg_stat_statements where query like '%from t1%' something like this 3004391594 select count(*) from t1 1336375111 select count(*) from t1 We do see that the queries are different but we can't see why they are so much different in the execution time. If the pg_stat_statements module would extend the object name to the qualified names like s1.t1 and s2.t2 in the new column query_qn then we would see the report as select queryid, query_qn from pg_stat_statements where query like '%from t1%' 3004391594 select count(*) from s1.t1 1336375111 select count(*) from s2.t1 with an immediate understanding of what's going on. This problem isn't only about table names. The SQL statement may refer to the views, functions, operands etc. from the 'wrong' schema. Obviously it would be even bigger help in the situations with the more complex queries where it will be mush more difficult to find that the query was executed with the incorrect search_path settings. This change doesn't brake any existing functionality and will be easily utilized in the monitoring scripts and tools. Thank you, Sergei Agalakov It was discussed in pgsql-general, and now it seems to be ready as a proposal to pgsql-hackers. https://www.postgresql.org/message-id/372d75cc-053b-1a07-948f-089408d3cd3a@gmail.com
Sergei Agalakov <sergei.agalakov@gmail.com> writes: > It would help to analyze performance issues if pg_stat_statements would > extend the object names to the qualified names. What pg_stat_statements puts out is the original query text. As was already pointed out to you, changing that text is likely to break use-cases in which people are trying to match entries to actual queries or log entries. This would also entail rather significant overhead to find out schema names and interpolate them into the text. regards, tom lane
Re: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements
От
Alvaro Herrera
Дата:
On 2018-Nov-28, Tom Lane wrote: > Sergei Agalakov <sergei.agalakov@gmail.com> writes: > > It would help to analyze performance issues if pg_stat_statements would > > extend the object names to the qualified names. > > What pg_stat_statements puts out is the original query text. As was > already pointed out to you, changing that text is likely to break > use-cases in which people are trying to match entries to actual > queries or log entries. It's not immediately obvious, but he is proposing a _new_ column query_qn that has qualified names, leaving the current query column unchanged. > This would also entail rather significant overhead to find out schema > names and interpolate them into the text. True. I was thinking that the qualified-names version of the query would be obtained via ruleutils or some similar mechanism to deparse from the parsed query tree (not from the original query text), where only pg_catalog is considered visible. This would be enabled using a GUC that defaults to off. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements
От
Tomas Vondra
Дата:
On 11/28/18 10:46 PM, Alvaro Herrera wrote: > On 2018-Nov-28, Tom Lane wrote: > >> Sergei Agalakov <sergei.agalakov@gmail.com> writes: >>> It would help to analyze performance issues if pg_stat_statements would >>> extend the object names to the qualified names. >> >> What pg_stat_statements puts out is the original query text. As was >> already pointed out to you, changing that text is likely to break >> use-cases in which people are trying to match entries to actual >> queries or log entries. > > It's not immediately obvious, but he is proposing a _new_ column > query_qn that has qualified names, leaving the current query column > unchanged. > >> This would also entail rather significant overhead to find out schema >> names and interpolate them into the text. > > True. I was thinking that the qualified-names version of the query > would be obtained via ruleutils or some similar mechanism to deparse > from the parsed query tree (not from the original query text), where > only pg_catalog is considered visible. This would be enabled using a > GUC that defaults to off. > Wouldn't it be sufficient / better to just store the search_path used when executing the query? That should be enough to resolve the object names correctly, and the overhead would be much lower (both in terms extra space and CPU overhead). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2018-Nov-28, Tom Lane wrote: >> This would also entail rather significant overhead to find out schema >> names and interpolate them into the text. > True. I was thinking that the qualified-names version of the query > would be obtained via ruleutils or some similar mechanism to deparse > from the parsed query tree (not from the original query text), where > only pg_catalog is considered visible. This would be enabled using a > GUC that defaults to off. Color me skeptical --- ruleutils has never especially been designed to be fast, and I can't see that the overhead of this is going to be acceptable to anybody who needs pg_stat_statements in production. (Some admittedly rough experiments suggest that we might be talking about an order-of-magnitude slowdown for simple queries.) regards, tom lane
Re: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements
От
Sergei Agalakov
Дата:
It's a real problem. I saw this pattern more than once already. The people have several schemas with identical data structures as a preparation to eventual migration of the schema to its own server in the cloud. So you have ten schemas, one generic user from connection pool and pg_stat_statements is useless to identify a performance problem for a schema. You see ten different records with the different queryid and identical query text, you see that one record indicates a performance issue for this query in one particular schema, and you can't say what schema it is. A report that groups queries by the text may hide this problem altogether because in average the statistics are OK. I hoped that it would be possible to reassemble the query text after parsing where the names were already uniquely resolved. Thank you, Sergei On 11/28/2018 2:59 PM, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> On 2018-Nov-28, Tom Lane wrote: >>> This would also entail rather significant overhead to find out schema >>> names and interpolate them into the text. >> True. I was thinking that the qualified-names version of the query >> would be obtained via ruleutils or some similar mechanism to deparse >> from the parsed query tree (not from the original query text), where >> only pg_catalog is considered visible. This would be enabled using a >> GUC that defaults to off. > Color me skeptical --- ruleutils has never especially been designed > to be fast, and I can't see that the overhead of this is going to be > acceptable to anybody who needs pg_stat_statements in production. > (Some admittedly rough experiments suggest that we might be > talking about an order-of-magnitude slowdown for simple queries.) > > regards, tom lane
Re: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements
От
legrand legrand
Дата:
Alvaro Herrera-9 wrote > On 2018-Nov-28, Tom Lane wrote: > >> This would also entail rather significant overhead to find out schema >> names and interpolate them into the text. > > True. I was thinking that the qualified-names version of the query > would be obtained via ruleutils or some similar mechanism to deparse > from the parsed query tree (not from the original query text), where > only pg_catalog is considered visible. This would be enabled using a > GUC that defaults to off. maybe that explain on "table with many columns" suffers from the same problem, see: https://www.postgresql.org/message-id/1537816948278-0.post%40n3.nabble.com https://www.postgresql.org/message-id/20180924195048.7dhnjvq5ggwjtiwi@alap3.anarazel.de regards PAscal -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements
От
Christoph Berg
Дата:
Re: Tomas Vondra 2018-11-28 <1b4e4c5e-7007-cd61-aae5-4a1c248e385c@2ndquadrant.com> > Wouldn't it be sufficient / better to just store the search_path used > when executing the query? That should be enough to resolve the object > names correctly, and the overhead would be much lower (both in terms > extra space and CPU overhead). That would also work better for sub-queries from functions, if you wanted to re-run the query in question. It's closer to reality. Christoph
Re: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements
От
Alvaro Herrera
Дата:
On 2018-Nov-28, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > On 2018-Nov-28, Tom Lane wrote: > >> This would also entail rather significant overhead to find out schema > >> names and interpolate them into the text. > > > True. I was thinking that the qualified-names version of the query > > would be obtained via ruleutils or some similar mechanism to deparse > > from the parsed query tree (not from the original query text), where > > only pg_catalog is considered visible. This would be enabled using a > > GUC that defaults to off. > > Color me skeptical --- ruleutils has never especially been designed > to be fast, and I can't see that the overhead of this is going to be > acceptable to anybody who needs pg_stat_statements in production. > (Some admittedly rough experiments suggest that we might be > talking about an order-of-magnitude slowdown for simple queries.) Good point. Maybe we can save the OID array of schemas that are in search_path when the query is first entered into the statement pool, and produce the query_qn column only at the time the entry is interpreted (that is, when pg_stat_statements is query). ... oh, but that requires saving the plan tree too, which doesn't sound very convenient. Maybe just storing the search_path schemas (as Tomas already suggested) is sufficient for Sergei's use case? Do away with query_qn as such, and just have the user interpret the names according to the stored search_path. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements
От
Stephen Frost
Дата:
Greetings, * Alvaro Herrera (alvherre@2ndquadrant.com) wrote: > On 2018-Nov-28, Tom Lane wrote: > > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > > On 2018-Nov-28, Tom Lane wrote: > > >> This would also entail rather significant overhead to find out schema > > >> names and interpolate them into the text. > > > > > True. I was thinking that the qualified-names version of the query > > > would be obtained via ruleutils or some similar mechanism to deparse > > > from the parsed query tree (not from the original query text), where > > > only pg_catalog is considered visible. This would be enabled using a > > > GUC that defaults to off. > > > > Color me skeptical --- ruleutils has never especially been designed > > to be fast, and I can't see that the overhead of this is going to be > > acceptable to anybody who needs pg_stat_statements in production. > > (Some admittedly rough experiments suggest that we might be > > talking about an order-of-magnitude slowdown for simple queries.) > > Good point. > > Maybe we can save the OID array of schemas that are in search_path when > the query is first entered into the statement pool, and produce the > query_qn column only at the time the entry is interpreted (that is, when > pg_stat_statements is query). ... oh, but that requires saving the plan > tree too, which doesn't sound very convenient. > > Maybe just storing the search_path schemas (as Tomas already suggested) > is sufficient for Sergei's use case? Do away with query_qn as such, and > just have the user interpret the names according to the stored > search_path. Seems like what you'd really want is to store all the environment, not just the search_path (consider the $user case...). Maybe saving just the OIDs of the search_path and then using them later would also work but it seems like we're just building up to tracking everything and doing it piecemeal with an extra column added in this release, another in the next release, etc.. Thanks! Stephen
Вложения
Re: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements
От
Sergei Agalakov
Дата:
On 11/29/2018 10:47 AM, Alvaro Herrera wrote: > On 2018-Nov-28, Tom Lane wrote: > >> Alvaro Herrera <alvherre@2ndquadrant.com> writes: >>> On 2018-Nov-28, Tom Lane wrote: >>>> This would also entail rather significant overhead to find out schema >>>> names and interpolate them into the text. >>> True. I was thinking that the qualified-names version of the query >>> would be obtained via ruleutils or some similar mechanism to deparse >>> from the parsed query tree (not from the original query text), where >>> only pg_catalog is considered visible. This would be enabled using a >>> GUC that defaults to off. >> Color me skeptical --- ruleutils has never especially been designed >> to be fast, and I can't see that the overhead of this is going to be >> acceptable to anybody who needs pg_stat_statements in production. >> (Some admittedly rough experiments suggest that we might be >> talking about an order-of-magnitude slowdown for simple queries.) > Good point. > > Maybe we can save the OID array of schemas that are in search_path when > the query is first entered into the statement pool, and produce the > query_qn column only at the time the entry is interpreted (that is, when > pg_stat_statements is query). ... oh, but that requires saving the plan > tree too, which doesn't sound very convenient. > > Maybe just storing the search_path schemas (as Tomas already suggested) > is sufficient for Sergei's use case? Do away with query_qn as such, and > just have the user interpret the names according to the stored > search_path. > I thought about just saving the search_path. It has all the necessary information for a DBA or a developer, but creates problems for reporting tools. If we have the new query_qn column then we can group statistics by query_qn and display it on the charts and graphs. If instead we use a combination of query and search_path then a reporting tools has to figure out the way to show both values to distinguish between different versions of query. If it is easier/faster to add search_path then let's add search_path instead of query_qn. It is already documented that query text isn't unique by itself, and reporting tools have to use query+queryid for uniqueness, and search_path will provide the currently unavailable information to the DBAs/developers. Good.
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2018-Nov-28, Tom Lane wrote: >> Color me skeptical --- ruleutils has never especially been designed >> to be fast, and I can't see that the overhead of this is going to be >> acceptable to anybody who needs pg_stat_statements in production. > Good point. > Maybe we can save the OID array of schemas that are in search_path when > the query is first entered into the statement pool, and produce the > query_qn column only at the time the entry is interpreted (that is, when > pg_stat_statements is query). ... oh, but that requires saving the plan > tree too, which doesn't sound very convenient. Yeah, and any subsequent DDL on relevant tables would break it. > Maybe just storing the search_path schemas (as Tomas already suggested) > is sufficient for Sergei's use case? Do away with query_qn as such, and > just have the user interpret the names according to the stored > search_path. This'd be OK by me, though I'm not sure that it represents a convenient solution for the original problem. regards, tom lane
Re: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements
От
Sergei Agalakov
Дата:
On 11/29/2018 10:59 AM, Stephen Frost wrote: > Greetings, > > * Alvaro Herrera (alvherre@2ndquadrant.com) wrote: >> On 2018-Nov-28, Tom Lane wrote: >>> Alvaro Herrera <alvherre@2ndquadrant.com> writes: >>>> On 2018-Nov-28, Tom Lane wrote: >>>>> This would also entail rather significant overhead to find out schema >>>>> names and interpolate them into the text. >>>> True. I was thinking that the qualified-names version of the query >>>> would be obtained via ruleutils or some similar mechanism to deparse >>>> from the parsed query tree (not from the original query text), where >>>> only pg_catalog is considered visible. This would be enabled using a >>>> GUC that defaults to off. >>> Color me skeptical --- ruleutils has never especially been designed >>> to be fast, and I can't see that the overhead of this is going to be >>> acceptable to anybody who needs pg_stat_statements in production. >>> (Some admittedly rough experiments suggest that we might be >>> talking about an order-of-magnitude slowdown for simple queries.) >> Good point. >> >> Maybe we can save the OID array of schemas that are in search_path when >> the query is first entered into the statement pool, and produce the >> query_qn column only at the time the entry is interpreted (that is, when >> pg_stat_statements is query). ... oh, but that requires saving the plan >> tree too, which doesn't sound very convenient. >> >> Maybe just storing the search_path schemas (as Tomas already suggested) >> is sufficient for Sergei's use case? Do away with query_qn as such, and >> just have the user interpret the names according to the stored >> search_path. > Seems like what you'd really want is to store all the environment, not > just the search_path (consider the $user case...). Maybe saving just > the OIDs of the search_path and then using them later would also work > but it seems like we're just building up to tracking everything and > doing it piecemeal with an extra column added in this release, another > in the next release, etc.. > > Thanks! > > Stephen It's a valid concern. Instead of the adding just search_path column we can add a column session_info jsonb. Its content can be defined by the new configuration parameter pg_stat_statements.session_info ('current_schemas, current_user, session_user') // a subset of the data from the system information functions and it will have data like { "current_schemas" : ["pg_catalog", "s1", "s2", "public"], "current_user" : "user1", "session_user" : "user1" } It will allow the DBA/developer to reproduce a performance issue, and will allow the deeper level of granularity for the reporting tools. It's more complex than I have anticipated but doesn't break backward compatibility and extensible. Thank you, Sergei
Re: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements
От
Sergei Agalakov
Дата:
On 11/29/2018 12:46 PM, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> On 2018-Nov-28, Tom Lane wrote: >>> Color me skeptical --- ruleutils has never especially been designed >>> to be fast, and I can't see that the overhead of this is going to be >>> acceptable to anybody who needs pg_stat_statements in production. >> Good point. >> Maybe we can save the OID array of schemas that are in search_path when >> the query is first entered into the statement pool, and produce the >> query_qn column only at the time the entry is interpreted (that is, when >> pg_stat_statements is query). ... oh, but that requires saving the plan >> tree too, which doesn't sound very convenient. > Yeah, and any subsequent DDL on relevant tables would break it. > >> Maybe just storing the search_path schemas (as Tomas already suggested) >> is sufficient for Sergei's use case? Do away with query_qn as such, and >> just have the user interpret the names according to the stored >> search_path. > This'd be OK by me, though I'm not sure that it represents a convenient > solution for the original problem. > > regards, tom lane It's less convenient, but it presents a solution. Either that or even better a new session_info column that may capture together with search_path also current_user, session_user etc. that would allow to distinct between the different versions of the query based on the real context of execution. Thank you, Sergei