Обсуждение: Generate call graphs in run-time

Поиск
Список
Период
Сортировка

Generate call graphs in run-time

От
Joel Jacobson
Дата:
Hi hackers,

BACKGROUND
I have a database centric system consisting of thousands of stored procedures,
where each call to any top-level function makes use of a lot of subfunctions in many layers.

I was about to manually produce call graphs covering all important parts of the system.
I realized this was a huge and boring task. Automation to the rescue.

Simply parsing the functions source code would not be sufficient as I had some functions using EXECUTE.
I also didn't know exactly which functions where top-level functions, and didn't know exactly
which functions were actually being used in production.

Because of this I decided to sample data in run-time to get a real-life picture of the system.
Any functions not being called in productions are not that important to include in the documentation anyway.

DESCRIPTION
Generates call graphs of function calls within a transaction in run-time.
The call graphs are written the first time they are seen in a session to the log.
The format of the log entry is a digraph in the GraphViz DOT language of the oids.

IMPLEMENTATION
New structures added to pgstat:
    static List *call_stack
        all parent functions at the current depth in the current transaction
    static HTAB *call_graph
        all unique pair of function calls so far in the current transaction
    static HTAB *seen_call_graphs
        all unique call graphs seen in the current session to only log once per call graph
    typedef struct PgStat_CallGraphEdge
        contains caller_oid and called_oid

In pgstat_init_function_usage(), each call to a function will add the caller->called oid pair unless already set in the hash.
When pgstat_end_function_usage() is called, the current function oid is removed from the List of parent oids.
In AtEOXact_PgStat(), called upon commit/rollback, the call graph is sorted and written to the log unless already seen in the session.
The variables are resetted in pgstat_clear_snapshot().

This functionality is probably something one would like to enable only temporarily in the production environment.
A new configuration parameter would therefore be good, just like track_functions. Perhaps track_callgraphs?

Instead of writing the call graphs to the postgres log, it would ne more useful to let the statistics collector keep track of the call graphs, to allow easier access than having to parse through the log file.
Perhaps the call graph would be represented as an oid[] array, and be the primary key (or unique) in some new statistics table,
with any interesting columns, perhaps number of calls or any of the data provided by pg_stat_user_functions and pg_stat_user_tables.


EXAMPLE
example_functions.sql contains the functions a(), ab(), ac(), aca(), acb().
I executed the functions a() and ac() and got this in my log file:
2012-01-09 19:08:30.983 CET,"joel","test",28461,"[local]",4f0b2d18.6f2d,1,"SELECT",2012-01-09 19:08:24 CET,3/0,0,LOG,00000,"digraph {37615->37616;37615->37617;37617->37618;37617->37619}",,,,,,"select a();",,,"psql"
2012-01-09 19:08:39.837 CET,"joel","test",28461,"[local]",4f0b2d18.6f2d,2,"SELECT",2012-01-09 19:08:24 CET,3/0,0,LOG,00000,"digraph {37617->37618;37617->37619}",,,,,,"select ac();",,,"psql"

The perl script pg_callgraph.pl replaces the oids with actual function names before generating the call graphs using GraphVIz:
digraph {a->ab;a->ac;ac->aca;ac->acb}
digraph {ac->aca;ac->acb}

The GraphVIz dot command is then used to generate nice call graphs in PNG.

I've also added two real-life examples of quite complex call graphs:
    6edd2b9b520ff2428c461fad43f29131a6c6604f.png
    607e272a35cf2ac8579173f97147a32de9784877.png



--
Best regards,

Joel Jacobson

Вложения

Re: Generate call graphs in run-time

От
Martin Pihlak
Дата:
On 01/09/2012 10:00 PM, Joel Jacobson wrote:
> Because of this I decided to sample data in run-time to get a real-life
> picture of the system.
> Any functions not being called in productions are not that important to
> include in the documentation anyway.

FWIW I have a similar problem - with a similar solution. I'll try to
describe it, in case there is some synergy to be leveraged :)

> In pgstat_init_function_usage(), each call to a function will add the
> caller->called oid pair unless already set in the hash.
> When pgstat_end_function_usage() is called, the current function oid is
> removed from the List of parent oids.
> In AtEOXact_PgStat(), called upon commit/rollback, the call graph is
> sorted and written to the log unless already seen in the session.
> The variables are resetted in pgstat_clear_snapshot().
> 

My approach was to add parent oid to the per-backend function stats
structure - PgStat_BackendFunctionEntry. Also, I changed the hash key
for that structure to (oid, parent) pair. This means that within the
backend the function usage is always tracked with the context of
calling function. This has the nice property that you get the per-parent
usage stats as well. Also the additional lists for parent tracking are
avoided.

During pgstat_report_stat() the call graph (with stats) is output
to logs and the statistics uploaded to collector -- with the parent oid
removed.

> This functionality is probably something one would like to enable only
> temporarily in the production environment.
> A new configuration parameter would therefore be good, just like
> track_functions. Perhaps track_callgraphs?
> 

I opted to always track the function parents -- this is very cheap. The
logging on the other hand is quite heavy and needs to be explicitly
configured (I used log_usage_stats GUC).

There is a patch for this and we do use it in production for occasional
troubleshooting and dependency analysis. Can't attach immediately
though -- it has some extra cruft in it that needs to be cleaned up.

> Instead of writing the call graphs to the postgres log, it would ne more
> useful to let the statistics collector keep track of the call graphs, to
> allow easier access than having to parse through the log file.

Indeed. Something like a pg_stat_user_function_details view would be
very useful. Something along the lines of:
  Column     |  Type  |
--------------+--------+funcid       | oid    |parentfuncid | oid    | <-- newschemaname   | name   |funcname     |
name  |calls        | bigint |total_time   | bigint |self_time    | bigint |
 

And then rewrite pg_stat_user_functions by aggregating the detailed
view. That'd make the individual pg_stat_get_function* functions a
bit slower, but that is probably a non-issue - at least not if the
pg_stat_user_functions view is rewritten to use a SRF.

regards,
Martin


Re: Generate call graphs in run-time

От
Joel Jacobson
Дата:
On Mon, Jan 16, 2012 at 2:23 PM, Martin Pihlak <martin.pihlak@gmail.com> wrote:
> My approach was to add parent oid to the per-backend function stats
> structure - PgStat_BackendFunctionEntry. Also, I changed the hash key
> for that structure to (oid, parent) pair. This means that within the
> backend the function usage is always tracked with the context of
> calling function. This has the nice property that you get the per-parent
> usage stats as well. Also the additional lists for parent tracking are
> avoided.
>
> During pgstat_report_stat() the call graph (with stats) is output
> to logs and the statistics uploaded to collector -- with the parent oid
> removed.

Since you only care about the parentfuncid in one level, it looks like
you will only be able to get a total call graph of all possible
function calls, and not each unique call graph per transaction.
If you have two separate transactions:
T1: a->b, b->c
T2: b->d
You would have two unique call graphs {a->b, b->c} and {b->d}.
The global call graph, where you only keep track of all unique
parentfuncid->funcid pairs, would be {a->b, b->c, b->d}, which lacks
the information on what different call graphs are actually being
executed per transaction.

Also, why remove the parent oid when uploading the statistics to the collector?
It would be nice to have the statistics for each function per parent,
to see where you have a bottleneck which might only be occurring in a
function when called from a specific parent.
Even more fine-grained would be to have the statistics per unique
call-graph, i.e. the entire tree of functions called in the
transactions.

> There is a patch for this and we do use it in production for occasional
> troubleshooting and dependency analysis. Can't attach immediately
> though -- it has some extra cruft in it that needs to be cleaned up.

I would highly appreciate a patch, don't worry about cleaning up, I
can do that, unless it's some code you can't share for other reasons.

> Indeed. Something like a pg_stat_user_function_details view would be
> very useful. Something along the lines of:
>
>   Column     |  Type  |
> --------------+--------+
>  funcid       | oid    |
>  parentfuncid | oid    | <-- new
>  schemaname   | name   |
>  funcname     | name   |
>  calls        | bigint |
>  total_time   | bigint |
>  self_time    | bigint |

funcid->parentfuncid might be sufficient for performance
optimizations, but to automatically generate directional graphs of all
unique call graphs in run-time, you would need all the unique pairs of
funcid->parentfuncid as a singel column, probably a sorted array of
oids[][], example: [[1,2],[1,3],[2,4],[2,5]] if the call craph would
be {1->2, 1->3, 2->4, 2->5}.

>
> And then rewrite pg_stat_user_functions by aggregating the detailed
> view. That'd make the individual pg_stat_get_function* functions a
> bit slower, but that is probably a non-issue - at least not if the
> pg_stat_user_functions view is rewritten to use a SRF.
>
> regards,
> Martin


Re: Generate call graphs in run-time

От
Martin Pihlak
Дата:
On 01/17/2012 11:13 AM, Joel Jacobson wrote:
> Since you only care about the parentfuncid in one level, it looks like
> you will only be able to get a total call graph of all possible
> function calls, and not each unique call graph per transaction.

True, for my purposes (function dependencies and performance analysis)
the global graph is sufficient.

> Also, why remove the parent oid when uploading the statistics to the collector?
> It would be nice to have the statistics for each function per parent,
> to see where you have a bottleneck which might only be occurring in a
> function when called from a specific parent.

I guess I was just lazy at the time I wrote it. But it shouldn't be too
much of an effort to store the global call graph in statistics
collector. Unique call graphs would be somewhat more complicated I
guess.

>> There is a patch for this and we do use it in production for occasional
>> troubleshooting and dependency analysis. Can't attach immediately
>> though -- it has some extra cruft in it that needs to be cleaned up.
>
> I would highly appreciate a patch, don't worry about cleaning up, I
> can do that, unless it's some code you can't share for other reasons.
>

Patch attached. It was developed against 9.1, but also applies to
HEAD but gives some fuzz and offsets.

It adds 2 GUC variables: log_function_calls and log_usage_stats. The
first just output function statistics to log (with no parent info).
With log_usage_stats enabled, it also outputs detailed function usage
plus relation usage. So you basically get output such as:

# select * from pgq.get_consumer_info();
LOG:  duration: 11.153 ms  statement: select * from pgq.get_consumer_info();
LOG:  function call: pgq.get_consumer_info(0) calls=1 time=9726
self_time=536
LOG:  USAGE STATISTICS
DETAIL:  ! object access stats:
! F 1892464226 0 pgq.get_consumer_info(0) calls=1 time=9726 self_time=536
! F 1892464228 1892464226 pgq.get_consumer_info(2) calls=1 time=9190
self_time=9190
! r 167558000 pgq.queue: blks_read=28 blks_hit=28
! r 167557988 pgq.consumer: blks_read=56 blks_hit=56
! r 167558021 pgq.subscription: blks_read=54 blks_hit=50
! r 167558050 pgq.tick: blks_read=103 blks_hit=102
! i 1892464189 pgq.queue_pkey: scans=1 tup_ret=37 tup_fetch=37
blks_read=2 blks_hit=2
! i 167557995 pgq.consumer_pkey: scans=56 tup_ret=56 tup_fetch=56
blks_read=57 blks_hit=56
! i 1892464195 pgq.subscription_pkey: scans=37 tup_ret=156 tup_fetch=56
blks_read=127 blks_hit=123
! i 167558058 pgq.tick_pkey: scans=112 tup_ret=103 tup_fetch=103
blks_read=367 blks_hit=366

> funcid->parentfuncid might be sufficient for performance
> optimizations, but to automatically generate directional graphs of all
> unique call graphs in run-time, you would need all the unique pairs of
> funcid->parentfuncid as a singel column, probably a sorted array of
> oids[][], example: [[1,2],[1,3],[2,4],[2,5]] if the call craph would
> be {1->2, 1->3, 2->4, 2->5}.
>

Hmm, array would probably work, although I wonder if there is a
more elegant solution ...

regards,
Martin

Вложения