Обсуждение: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements

Поиск
Список
Период
Сортировка

[PROPOSAL] extend the object names to the qualified names inpg_stat_statements

От
Sergei Agalakov
Дата:
Hi,

It would help to analyze performance issues if pg_stat_statements would 
extend the object names to the qualified names.
Currently if we have two schemas ( say s1 and s2) with the objects with 
the same name ( say tables t1) then after the next executions:

set schema 's1';
select count(*) from t1; // returns 10
set schema 's2';
select count(*) from t1; // returns 1000000000

we see in
select queryid, query from pg_stat_statements where query like '%from t1%'
   something like this
3004391594 select count(*) from t1
1336375111 select count(*) from t1

We do see that the queries are different but we can't see why they are 
so much different in the execution time.
If the pg_stat_statements module would extend the object name to the 
qualified names like s1.t1 and s2.t2 in the new column query_qn then we 
would see the report as
select queryid, query_qn from pg_stat_statements where query like '%from 
t1%'
3004391594 select count(*) from s1.t1
1336375111 select count(*) from s2.t1
with an immediate understanding of what's going on.

This problem isn't only about table names. The SQL statement may refer 
to the views, functions, operands etc. from the 'wrong' schema.
Obviously it would be even bigger help in the situations with the more 
complex queries where it will be mush more difficult to find
that the query was executed with the incorrect search_path settings.

This change doesn't brake any existing functionality and will be easily 
utilized in the monitoring scripts and tools.

Thank you,

Sergei Agalakov

It was discussed in pgsql-general, and now it seems to be ready as a 
proposal to pgsql-hackers.
https://www.postgresql.org/message-id/372d75cc-053b-1a07-948f-089408d3cd3a@gmail.com



Re: [PROPOSAL] extend the object names to the qualified names in pg_stat_statements

От
Tom Lane
Дата:
Sergei Agalakov <sergei.agalakov@gmail.com> writes:
> It would help to analyze performance issues if pg_stat_statements would 
> extend the object names to the qualified names.

What pg_stat_statements puts out is the original query text.  As was
already pointed out to you, changing that text is likely to break
use-cases in which people are trying to match entries to actual
queries or log entries.  This would also entail rather significant
overhead to find out schema names and interpolate them into the text.

            regards, tom lane


Re: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements

От
Alvaro Herrera
Дата:
On 2018-Nov-28, Tom Lane wrote:

> Sergei Agalakov <sergei.agalakov@gmail.com> writes:
> > It would help to analyze performance issues if pg_stat_statements would 
> > extend the object names to the qualified names.
> 
> What pg_stat_statements puts out is the original query text.  As was
> already pointed out to you, changing that text is likely to break
> use-cases in which people are trying to match entries to actual
> queries or log entries.

It's not immediately obvious, but he is proposing a _new_ column
query_qn that has qualified names, leaving the current query column
unchanged.

> This would also entail rather significant overhead to find out schema
> names and interpolate them into the text.

True.  I was thinking that the qualified-names version of the query
would be obtained via ruleutils or some similar mechanism to deparse
from the parsed query tree (not from the original query text), where
only pg_catalog is considered visible.  This would be enabled using a
GUC that defaults to off.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements

От
Tomas Vondra
Дата:

On 11/28/18 10:46 PM, Alvaro Herrera wrote:
> On 2018-Nov-28, Tom Lane wrote:
> 
>> Sergei Agalakov <sergei.agalakov@gmail.com> writes:
>>> It would help to analyze performance issues if pg_stat_statements would 
>>> extend the object names to the qualified names.
>>
>> What pg_stat_statements puts out is the original query text.  As was
>> already pointed out to you, changing that text is likely to break
>> use-cases in which people are trying to match entries to actual
>> queries or log entries.
> 
> It's not immediately obvious, but he is proposing a _new_ column
> query_qn that has qualified names, leaving the current query column
> unchanged.
> 
>> This would also entail rather significant overhead to find out schema
>> names and interpolate them into the text.
> 
> True.  I was thinking that the qualified-names version of the query
> would be obtained via ruleutils or some similar mechanism to deparse
> from the parsed query tree (not from the original query text), where
> only pg_catalog is considered visible.  This would be enabled using a
> GUC that defaults to off.
> 

Wouldn't it be sufficient / better to just store the search_path used
when executing the query? That should be enough to resolve the object
names correctly, and the overhead would be much lower (both in terms
extra space and CPU overhead).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [PROPOSAL] extend the object names to the qualified names in pg_stat_statements

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2018-Nov-28, Tom Lane wrote:
>> This would also entail rather significant overhead to find out schema
>> names and interpolate them into the text.

> True.  I was thinking that the qualified-names version of the query
> would be obtained via ruleutils or some similar mechanism to deparse
> from the parsed query tree (not from the original query text), where
> only pg_catalog is considered visible.  This would be enabled using a
> GUC that defaults to off.

Color me skeptical --- ruleutils has never especially been designed
to be fast, and I can't see that the overhead of this is going to be
acceptable to anybody who needs pg_stat_statements in production.
(Some admittedly rough experiments suggest that we might be
talking about an order-of-magnitude slowdown for simple queries.)

            regards, tom lane


Re: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements

От
Sergei Agalakov
Дата:
It's a real problem. I saw this pattern more than once already.
The people have several schemas with identical data structures as a 
preparation to eventual migration of the schema to its own server in the 
cloud.
So you have ten schemas, one generic user from connection pool and 
pg_stat_statements is useless to identify a performance problem for a 
schema.
You see ten different records with the different queryid and identical 
query text, you see that one record indicates a performance issue
for this query in one particular schema, and you can't say what schema 
it is. A report that groups queries by the text may hide this problem 
altogether
because in average the statistics are OK.
I hoped that it would be possible to reassemble the query text after 
parsing where the names were already uniquely resolved.

Thank you,

Sergei

On 11/28/2018 2:59 PM, Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> On 2018-Nov-28, Tom Lane wrote:
>>> This would also entail rather significant overhead to find out schema
>>> names and interpolate them into the text.
>> True.  I was thinking that the qualified-names version of the query
>> would be obtained via ruleutils or some similar mechanism to deparse
>> from the parsed query tree (not from the original query text), where
>> only pg_catalog is considered visible.  This would be enabled using a
>> GUC that defaults to off.
> Color me skeptical --- ruleutils has never especially been designed
> to be fast, and I can't see that the overhead of this is going to be
> acceptable to anybody who needs pg_stat_statements in production.
> (Some admittedly rough experiments suggest that we might be
> talking about an order-of-magnitude slowdown for simple queries.)
>
>             regards, tom lane



Re: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements

От
legrand legrand
Дата:
Alvaro Herrera-9 wrote
> On 2018-Nov-28, Tom Lane wrote:
> 
>> This would also entail rather significant overhead to find out schema
>> names and interpolate them into the text.
> 
> True.  I was thinking that the qualified-names version of the query
> would be obtained via ruleutils or some similar mechanism to deparse
> from the parsed query tree (not from the original query text), where
> only pg_catalog is considered visible.  This would be enabled using a
> GUC that defaults to off.

maybe that explain on "table with many columns" suffers from the same 
problem, see:

https://www.postgresql.org/message-id/1537816948278-0.post%40n3.nabble.com

https://www.postgresql.org/message-id/20180924195048.7dhnjvq5ggwjtiwi@alap3.anarazel.de

regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements

От
Christoph Berg
Дата:
Re: Tomas Vondra 2018-11-28 <1b4e4c5e-7007-cd61-aae5-4a1c248e385c@2ndquadrant.com>
> Wouldn't it be sufficient / better to just store the search_path used
> when executing the query? That should be enough to resolve the object
> names correctly, and the overhead would be much lower (both in terms
> extra space and CPU overhead).

That would also work better for sub-queries from functions, if you
wanted to re-run the query in question. It's closer to reality.

Christoph


Re: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements

От
Alvaro Herrera
Дата:
On 2018-Nov-28, Tom Lane wrote:

> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > On 2018-Nov-28, Tom Lane wrote:
> >> This would also entail rather significant overhead to find out schema
> >> names and interpolate them into the text.
> 
> > True.  I was thinking that the qualified-names version of the query
> > would be obtained via ruleutils or some similar mechanism to deparse
> > from the parsed query tree (not from the original query text), where
> > only pg_catalog is considered visible.  This would be enabled using a
> > GUC that defaults to off.
> 
> Color me skeptical --- ruleutils has never especially been designed
> to be fast, and I can't see that the overhead of this is going to be
> acceptable to anybody who needs pg_stat_statements in production.
> (Some admittedly rough experiments suggest that we might be
> talking about an order-of-magnitude slowdown for simple queries.)

Good point.

Maybe we can save the OID array of schemas that are in search_path when
the query is first entered into the statement pool, and produce the
query_qn column only at the time the entry is interpreted (that is, when
pg_stat_statements is query).  ... oh, but that requires saving the plan
tree too, which doesn't sound very convenient.

Maybe just storing the search_path schemas (as Tomas already suggested)
is sufficient for Sergei's use case?  Do away with query_qn as such, and
just have the user interpret the names according to the stored
search_path.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements

От
Stephen Frost
Дата:
Greetings,

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> On 2018-Nov-28, Tom Lane wrote:
> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > > On 2018-Nov-28, Tom Lane wrote:
> > >> This would also entail rather significant overhead to find out schema
> > >> names and interpolate them into the text.
> >
> > > True.  I was thinking that the qualified-names version of the query
> > > would be obtained via ruleutils or some similar mechanism to deparse
> > > from the parsed query tree (not from the original query text), where
> > > only pg_catalog is considered visible.  This would be enabled using a
> > > GUC that defaults to off.
> >
> > Color me skeptical --- ruleutils has never especially been designed
> > to be fast, and I can't see that the overhead of this is going to be
> > acceptable to anybody who needs pg_stat_statements in production.
> > (Some admittedly rough experiments suggest that we might be
> > talking about an order-of-magnitude slowdown for simple queries.)
>
> Good point.
>
> Maybe we can save the OID array of schemas that are in search_path when
> the query is first entered into the statement pool, and produce the
> query_qn column only at the time the entry is interpreted (that is, when
> pg_stat_statements is query).  ... oh, but that requires saving the plan
> tree too, which doesn't sound very convenient.
>
> Maybe just storing the search_path schemas (as Tomas already suggested)
> is sufficient for Sergei's use case?  Do away with query_qn as such, and
> just have the user interpret the names according to the stored
> search_path.

Seems like what you'd really want is to store all the environment, not
just the search_path (consider the $user case...).  Maybe saving just
the OIDs of the search_path and then using them later would also work
but it seems like we're just building up to tracking everything and
doing it piecemeal with an extra column added in this release, another
in the next release, etc..

Thanks!

Stephen

Вложения

Re: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements

От
Sergei Agalakov
Дата:
On 11/29/2018 10:47 AM, Alvaro Herrera wrote:
> On 2018-Nov-28, Tom Lane wrote:
>
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> On 2018-Nov-28, Tom Lane wrote:
>>>> This would also entail rather significant overhead to find out schema
>>>> names and interpolate them into the text.
>>> True.  I was thinking that the qualified-names version of the query
>>> would be obtained via ruleutils or some similar mechanism to deparse
>>> from the parsed query tree (not from the original query text), where
>>> only pg_catalog is considered visible.  This would be enabled using a
>>> GUC that defaults to off.
>> Color me skeptical --- ruleutils has never especially been designed
>> to be fast, and I can't see that the overhead of this is going to be
>> acceptable to anybody who needs pg_stat_statements in production.
>> (Some admittedly rough experiments suggest that we might be
>> talking about an order-of-magnitude slowdown for simple queries.)
> Good point.
>
> Maybe we can save the OID array of schemas that are in search_path when
> the query is first entered into the statement pool, and produce the
> query_qn column only at the time the entry is interpreted (that is, when
> pg_stat_statements is query).  ... oh, but that requires saving the plan
> tree too, which doesn't sound very convenient.
>
> Maybe just storing the search_path schemas (as Tomas already suggested)
> is sufficient for Sergei's use case?  Do away with query_qn as such, and
> just have the user interpret the names according to the stored
> search_path.
>
I thought about just saving the search_path. It has all the necessary 
information for a DBA or a developer, but creates problems for reporting 
tools.
If we have the new query_qn column then we can group statistics by 
query_qn and display it on the charts and graphs.
If instead we use a combination of query and search_path then a 
reporting tools has to figure out the way to show both values to 
distinguish between
different versions of query.

If it is easier/faster to add search_path then let's add search_path 
instead of query_qn. It is already documented that query text isn't 
unique by itself,
and reporting tools have to use query+queryid for uniqueness, and 
search_path will provide the currently unavailable information to the 
DBAs/developers.
Good.



Re: [PROPOSAL] extend the object names to the qualified names in pg_stat_statements

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2018-Nov-28, Tom Lane wrote:
>> Color me skeptical --- ruleutils has never especially been designed
>> to be fast, and I can't see that the overhead of this is going to be
>> acceptable to anybody who needs pg_stat_statements in production.

> Good point.

> Maybe we can save the OID array of schemas that are in search_path when
> the query is first entered into the statement pool, and produce the
> query_qn column only at the time the entry is interpreted (that is, when
> pg_stat_statements is query).  ... oh, but that requires saving the plan
> tree too, which doesn't sound very convenient.

Yeah, and any subsequent DDL on relevant tables would break it.

> Maybe just storing the search_path schemas (as Tomas already suggested)
> is sufficient for Sergei's use case?  Do away with query_qn as such, and
> just have the user interpret the names according to the stored
> search_path.

This'd be OK by me, though I'm not sure that it represents a convenient
solution for the original problem.

            regards, tom lane


Re: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements

От
Sergei Agalakov
Дата:
On 11/29/2018 10:59 AM, Stephen Frost wrote:
> Greetings,
>
> * Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
>> On 2018-Nov-28, Tom Lane wrote:
>>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>>> On 2018-Nov-28, Tom Lane wrote:
>>>>> This would also entail rather significant overhead to find out schema
>>>>> names and interpolate them into the text.
>>>> True.  I was thinking that the qualified-names version of the query
>>>> would be obtained via ruleutils or some similar mechanism to deparse
>>>> from the parsed query tree (not from the original query text), where
>>>> only pg_catalog is considered visible.  This would be enabled using a
>>>> GUC that defaults to off.
>>> Color me skeptical --- ruleutils has never especially been designed
>>> to be fast, and I can't see that the overhead of this is going to be
>>> acceptable to anybody who needs pg_stat_statements in production.
>>> (Some admittedly rough experiments suggest that we might be
>>> talking about an order-of-magnitude slowdown for simple queries.)
>> Good point.
>>
>> Maybe we can save the OID array of schemas that are in search_path when
>> the query is first entered into the statement pool, and produce the
>> query_qn column only at the time the entry is interpreted (that is, when
>> pg_stat_statements is query).  ... oh, but that requires saving the plan
>> tree too, which doesn't sound very convenient.
>>
>> Maybe just storing the search_path schemas (as Tomas already suggested)
>> is sufficient for Sergei's use case?  Do away with query_qn as such, and
>> just have the user interpret the names according to the stored
>> search_path.
> Seems like what you'd really want is to store all the environment, not
> just the search_path (consider the $user case...).  Maybe saving just
> the OIDs of the search_path and then using them later would also work
> but it seems like we're just building up to tracking everything and
> doing it piecemeal with an extra column added in this release, another
> in the next release, etc..
>
> Thanks!
>
> Stephen
It's a valid concern. Instead of the adding just search_path column we 
can add a column session_info jsonb.
Its content can be defined by the new configuration parameter

pg_stat_statements.session_info ('current_schemas, current_user, 
session_user') // a subset of the data from the system information functions

and it will have data like
{
     "current_schemas" : ["pg_catalog", "s1", "s2", "public"],
     "current_user" : "user1",
     "session_user" : "user1"
}

It will allow the DBA/developer to reproduce a performance issue, and 
will allow the deeper level of granularity for the reporting tools.
It's more complex than I have anticipated but doesn't break backward 
compatibility and extensible.

Thank you,

Sergei



Re: [PROPOSAL] extend the object names to the qualified names inpg_stat_statements

От
Sergei Agalakov
Дата:
On 11/29/2018 12:46 PM, Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> On 2018-Nov-28, Tom Lane wrote:
>>> Color me skeptical --- ruleutils has never especially been designed
>>> to be fast, and I can't see that the overhead of this is going to be
>>> acceptable to anybody who needs pg_stat_statements in production.
>> Good point.
>> Maybe we can save the OID array of schemas that are in search_path when
>> the query is first entered into the statement pool, and produce the
>> query_qn column only at the time the entry is interpreted (that is, when
>> pg_stat_statements is query).  ... oh, but that requires saving the plan
>> tree too, which doesn't sound very convenient.
> Yeah, and any subsequent DDL on relevant tables would break it.
>
>> Maybe just storing the search_path schemas (as Tomas already suggested)
>> is sufficient for Sergei's use case?  Do away with query_qn as such, and
>> just have the user interpret the names according to the stored
>> search_path.
> This'd be OK by me, though I'm not sure that it represents a convenient
> solution for the original problem.
>
>             regards, tom lane
It's less convenient, but it presents a solution. Either that or even 
better a new session_info column that
may capture together with search_path also  current_user, session_user 
etc. that would allow to distinct
between the different versions of the query based on the real context of 
execution.

Thank you,

Sergei