On Wed, Dec 19, 2012 at 10:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> valgrind comes with a tool called cachegrind which can emulate the
>> cache algorithm on some variants of various cpus and produce reports.
>> Can it be made to produce a report for a specific block of memory?
>
> I believe that oprofile can be persuaded to produce statistics about
> where in one's code are the most cache misses, not just the most
> wall-clock ticks; which would shed a lot of light on this question.
> However, my oprofile-fu doesn't quite extend to actually persuading it.
perf can certainly do this.
$ perf record -a -e cache-misses pgbench -n -S -T 30
...output elided...
$ perf report -d postgres | grep -v '^#' | head
     8.88%  postgres  base_yyparse
     7.05%  swapper   0x807c
     4.67%  postgres  SearchCatCache
     3.77%  pgbench   0x172dd58
     3.47%  postgres  hash_search_with_hash_value
     3.23%  postgres  AllocSetAlloc
     2.58%  postgres  core_yylex
     1.87%  postgres  LWLockAcquire
     1.83%  postgres  fmgr_info_cxt_security
     1.75%  postgres  0x4d1054
For comparison:
$ perf record -a -e cycles -d postgres pgbench -n -S -T 30
$ perf report -d postgres | grep -v '^#' | head
     6.54%  postgres  AllocSetAlloc
     4.08%  swapper   0x4ce754
     3.60%  postgres  hash_search_with_hash_value
     2.74%  postgres  base_yyparse
     2.71%  postgres  MemoryContextAllocZeroAligned
     2.57%  postgres  MemoryContextAlloc
     2.36%  postgres  SearchCatCache
     2.10%  postgres  _bt_compare
     1.70%  postgres  LWLockAcquire
     1.54%  postgres  FunctionCall2Coll
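
One way to read the two listings side by side is to divide each symbol's share of cache misses by its share of cycles. The sketch below does that in Python for the postgres symbols that appear in both top-ten lists. The dictionaries are copied by hand from the output above, the function name miss_to_cycle_ratio is made up for illustration, and the ratio is only a rough hint: it compares relative shares from two separate 30-second runs, not absolute miss rates, and symbols that appear in only one listing are skipped.

```python
# Percentages copied from the two perf listings above (postgres symbols only).
cache_miss_share = {
    "base_yyparse": 8.88,
    "SearchCatCache": 4.67,
    "hash_search_with_hash_value": 3.47,
    "AllocSetAlloc": 3.23,
    "core_yylex": 2.58,
    "LWLockAcquire": 1.87,
    "fmgr_info_cxt_security": 1.83,
}
cycle_share = {
    "AllocSetAlloc": 6.54,
    "hash_search_with_hash_value": 3.60,
    "base_yyparse": 2.74,
    "MemoryContextAllocZeroAligned": 2.71,
    "MemoryContextAlloc": 2.57,
    "SearchCatCache": 2.36,
    "_bt_compare": 2.10,
    "LWLockAcquire": 1.70,
    "FunctionCall2Coll": 1.54,
}


def miss_to_cycle_ratio(misses, cycles):
    """Return {symbol: miss share / cycle share} for symbols in both profiles."""
    common = misses.keys() & cycles.keys()
    return {sym: misses[sym] / cycles[sym] for sym in common}


ratios = miss_to_cycle_ratio(cache_miss_share, cycle_share)

# Print the symbols with the highest miss-to-cycle ratio first.
for sym, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{sym:30s} {cache_miss_share[sym]:5.2f}% misses / "
          f"{cycle_share[sym]:5.2f}% cycles = {ratio:.2f}")
```

A ratio well above 1 suggests a function accounts for a larger share of cache misses than of cycles; a ratio well below 1 suggests the opposite. With these numbers, base_yyparse lands at the top of the sorted output and AllocSetAlloc at the bottom.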
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company