Add a --branch-history option to perf report that changes all the
settings necessary for using the branches in callstacks.
This is just a short cut to make this nicer to use, it does not enable
any functionality by itself.
v2: Change sort order. Rename option to --branch-history to
be less confusing.
v3: Updates
v4: Fix conflict with newer perf base
v5: Port to latest tip
v6: Add more comments. Remove CCKEY_ADDRESS setting. Remove
unnecessary branch_mode setting. Use a boolean.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-5-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently branch stacks can be only shown as edge histograms for
individual branches. I never found this display particularly useful.
This implements an alternative mode that creates histograms over
complete branch traces, instead of individual branches, similar to how
normal callgraphs are handled. This is done by putting it in front of
the normal callgraph and then using the normal callgraph histogram
infrastructure to unify them.
This way in complex functions we can understand the control flow that
lead to a particular sample, and may even see some control flow in the
caller for short functions.
Example (simplified, of course for such simple code this is usually not
needed), please run this after the whole patchkit is in, as at this
point in the patch order there is no --branch-history, that will be
added in a patch after this one:
tcall.c:
volatile a = 10000, b = 100000, c;
__attribute__((noinline)) f2()
{
c = a / b;
}
__attribute__((noinline)) f1()
{
f2();
f2();
}
main()
{
int i;
for (i = 0; i < 1000000; i++)
f1();
}
% perf record -b -g ./tsrc/tcall
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
% perf report --no-children --branch-history
...
54.91% tcall.c:6 [.] f2 tcall
|
|--65.53%-- f2 tcall.c:5
| |
| |--70.83%-- f1 tcall.c:11
| | f1 tcall.c:10
| | main tcall.c:18
| | main tcall.c:18
| | main tcall.c:17
| | main tcall.c:17
| | f1 tcall.c:13
| | f1 tcall.c:13
| | f2 tcall.c:7
| | f2 tcall.c:5
| | f1 tcall.c:12
| | f1 tcall.c:12
| | f2 tcall.c:7
| | f2 tcall.c:5
| | f1 tcall.c:11
| |
| --29.17%-- f1 tcall.c:12
| f1 tcall.c:12
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:11
| f1 tcall.c:10
| main tcall.c:18
| main tcall.c:18
| main tcall.c:17
| main tcall.c:17
| f1 tcall.c:13
| f1 tcall.c:13
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:12
The default output is unchanged.
This is only implemented in perf report, no change to record or anywhere
else.
This adds the basic code to report:
- add a new "branch" option to the -g option parser to enable this mode
- when the flag is set include the LBR into the callstack in machine.c.
The rest of the history code is unchanged and doesn't know the
difference between LBR entry and normal call entry.
- detect overlaps with the callchain
- remove small loop duplicates in the LBR
Current limitations:
- The LBR flags (mispredict etc.) are not shown in the history
and LBR entries have no special marker.
- It would be nice if annotate marked the LBR entries somehow
(e.g. with arrows)
v2: Various fixes.
v3: Merge further patches into this one. Fix white space.
v4: Improve manpage. Address review feedback.
v5: Rename functions. Better error message without -g. Fix crash without
-b.
v6: Rebase
v7: Rebase. Use NO_ENTRY in memset.
v8: Port to latest tip. Move add_callchain_ip to separate
patch. Skip initial entries in callchain. Minor cleanups.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When debugging the tui browser I find it useful to redirect the debug
log into a file. Currently it's always forced to the message line.
Add an option to force it to stderr. Then it can be easily redirected.
Example:
[root@zoo ~]# perf --debug stderr report -vv 2> /tmp/debug
[root@zoo ~]# tail /tmp/debug
dso open failed, mmap: No such file or directory
dso open failed, mmap: No such file or directory
dso open failed, mmap: No such file or directory
dso open failed, mmap: No such file or directory
dso open failed, mmap: No such file or directory
Using /root/.debug/.build-id/4e/841948927029fb650132253642d5dbb2c1fb93 for symbols
Failed to open /tmp/perf-8831.map, continuing without symbols
Failed to open /tmp/perf-12721.map, continuing without symbols
Failed to open /tmp/perf-6966.map, continuing without symbols
Failed to open /tmp/perf-8802.map, continuing without symbols
[root@zoo ~]#
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1416605880-25055-2-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We need to define bfd_demangle() to either a wrapper for
cplus_demangle() or to a stub when NO_DEMANGLE is defined.
That is at odds with using bfd.h for some other reason, as it defines
bfd_demangle() and then if code that wants to use symbol.h, where the
above stubbing/wrapping is done, and bfd.h for other reasons, we end up
with a build error where bfd_demangle() is found to be redefined.
Avoid that by moving the stubbing/wrapping to symbol-elf.c, that is the
only user of such function. If we ever get to a point where there are
more valid users, we can then introduce a header for that.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-6wzjpe2fy9xtgchshulixlzw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For lbr-as-callgraph we need to see the line number in the history,
because many LBR entries can be in a single function, and just
showing the same function name many times is not useful.
When the history code is configured to sort by address, also try to
resolve the address to a file:srcline and display this in the browser.
If that doesn't work still display the address.
This can be also useful without LBRs for understanding which call in a large
function (or in which inlined function) called something else.
Contains fixes from Namhyung Kim
v2: Refactor code into common function
v3: Fix GTK build
v4: Rebase
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-7-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently perf report on TUI doesn't print percent for first-level
callchain entry.
I guess it (wrongly) assumes that there's only a single callchain in the
first level.
This patch fixes it by handling the first level callchains same as
others - if it's not 100% it should print the percent value.
Also it'll affect other callchains in the other way around - if it's
100% (single callchain) it should not print the percentage.
Before:
- 30.95% 6.84% abc2 abc2 [.] a
- a
- 70.00% c
- 100.00% apic_timer_interrupt
smp_apic_timer_interrupt
local_apic_timer_interrupt
hrtimer_interrupt
...
+ 30.00% b
+ __libc_start_main
After:
- 30.95% 6.84% abc2 abc2 [.] a
- 77.90% a
- 70.00% c
- apic_timer_interrupt
smp_apic_timer_interrupt
local_apic_timer_interrupt
hrtimer_interrupt
...
+ 30.00% b
+ 22.10% __libc_start_main
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1416816807-6495-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With srcline key/sort'ing it's useful to have line numbers in the
annotate window. This patch implements this.
Use objdump -l to request the line numbers and save them in the line
structure. Then the browser displays them for source lines.
The line numbers are not displayed by default, but can be toggled on
with 'k'
There is one unfortunate problem with this setup. For lines not
containing source and which are outside functions objdump -l reports
line numbers off by a few: it always reports the first line number in
the next function even for lines that are outside the function.
I haven't found a nice way to detect/correct this. Probably objdump has
to be fixed.
See https://sourceware.org/bugzilla/show_bug.cgi?id=16433
The line numbers are still useful even with these problems, as most are
correct and the ones which are not are nearby.
v2: Fix help text. Handle (discriminator...) output in objdump.
Left align the line numbers.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-9-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There's a problem on finding correct kernel symbols when perf report
runs on a different kernel. Although a part of the problem was solved
by the prior commit 0a7e6d1b68 ("perf tools: Check recorded kernel
version when finding vmlinux"), there's a remaining problem still.
When perf records samples, it synthesizes the kernel map using
machine__mmap_name() and ref_reloc_sym like "[kernel.kallsyms]_text".
You can easily see it using 'perf report -D' command.
After finishing record, it goes through the recorded events to find
maps/dsos actually used. And then record build-id info of them.
During this process, it needs to load symbols in a dso and it'd call
dso__load_vmlinux_path() since the default value of the symbol_conf.
try_vmlinux_path is true. However it changes dso->long_name to a real
path of the vmlinux file (e.g. /lib/modules/3.16.4/build/vmlinux) if one
is running on a custom kernel.
It resulted in that perf report reads the build-id of the vmlinux, but
cannot use it since it only knows about the [kernel.kallsyms] map. It
then falls back to possible vmlinux paths by using the recorded kernel
version (in case of a recent version) or a running kernel silently.
Even with the recent tools, this still has a possibility of breaking
the result. As the build directory is a symbolic link, if one built a
new kernel in the same directory with different source/config, the old
link to vmlinux will point the new file. So it's absolutely needed to
use build-id when finding a kernel image.
In this patch, it's now changed to try to search a kernel dso in the
existing dso list which was constructed during build-id table parsing
so it'll always have a build-id. If not found, search "[kernel.kallsyms]".
Before:
$ perf report
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................. ...............................
#
72.15% 0.00% swapper [kernel.kallsyms] [k] set_curr_task_rt
72.15% 0.00% swapper [kernel.kallsyms] [k] native_calibrate_tsc
72.15% 0.00% swapper [kernel.kallsyms] [k] tsc_refine_calibration_work
71.87% 71.87% swapper [kernel.kallsyms] [k] module_finalize
...
After (for the same perf.data):
72.15% 0.00% swapper vmlinux [k] cpu_startup_entry
72.15% 0.00% swapper vmlinux [k] arch_cpu_idle
72.15% 0.00% swapper vmlinux [k] default_idle
71.87% 71.87% swapper vmlinux [k] native_safe_halt
...
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/20140924073356.GB1962@gmail.com
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1415063674-17206-8-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When perf record finishes a session, it pre-processes samples in order
to write build-id info from DSOs that had samples.
During this process it'll call map__load() for the kernel map, and it
ends up calling dso__load_vmlinux_path() which replaces dso->long_name.
But this function checks kernel's build-id before searching vmlinux path
so it'll end up with a cryptic name, the pathname for the entry in the
~/.debug cache, which can be confusing to users.
This patch adds a flag to skip the build-id check during record, so
that it'll have the original vmlinux path for the kernel dso->long_name,
not the entry in the ~/.debug cache.
Before:
# perf record -va sleep 3
mmap size 528384B
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.196 MB perf.data (~8545 samples) ]
Looking at the vmlinux_path (7 entries long)
Using /home/namhyung/.debug/.build-id/f0/6e17aa50adf4d00b88925e03775de107611551 for symbols
After:
# perf record -va sleep 3
mmap size 528384B
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.193 MB perf.data (~8432 samples) ]
Looking at the vmlinux_path (7 entries long)
Using /lib/modules/3.16.4-1-ARCH/build/vmlinux for symbols
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1415063674-17206-7-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>