Linux Performance Analysis Tools

syden1981 2013-11-25

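Linux has plenty of performance analysis tools, but it is hard to find a complete list that compares their strengths and weaknesses, so I am recording some notes here for reference.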

1. Intel VTune http://software.intel.com/en-us/intel-vtune-amplifier-xe/

A well-known profiler. It can launch a program and profile it directly, for example:

$vtuneHome/amplxe-cl -collect hotspots -duration 600 -r /apsara/save_result ./myapp para1 para2

It can also attach to a process that is already running:

$vtuneHome/amplxe-cl -collect hotspots -duration 600 -r /apsara/save_result -target-pid $myapp_pid

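When the run finishes, you can open the results in the GUI, which is intuitive and very convenient.

Besides hotspot functions (hotspots), VTune can analyze concurrency, lock waits, different types of CPU memory access (on Intel's own CPUs only), read/write bandwidth, and more. It is a powerful tool.

VTune's only real drawback is that it is commercial: the free trial expires after one month, and you then have to apply again, which is a hassle.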

2. OProfile http://en./wiki/OProfile

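OProfile's biggest advantage is convenience: most distributions ship it, so you can use it without any preparation.

The tool was designed to sample events such as CPU clock cycles and L2 cache misses. It treats the whole system as a single unit, which is useful for analyzing kernel or system-level problems. For profiling your own application it falls short, mainly because its call graph is unclear.

For example, suppose OProfile tells you that calls to std::find account for 30% of the time. If your program is only a few hundred lines long, you can quickly locate the places that use std::find. If the program has tens of thousands of lines, OProfile's call graph is of little help, and finding out who calls std::find becomes very difficult.

In one sentence: for programs under a thousand lines, or programs that rarely use the STL, OProfile does the job; for programs of ten thousand lines or more, or programs that use the STL heavily, it is much less effective.

Basic usage: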

opcontrol --no-vmlinux : tell oprofile not to record statistics for kernel modules and kernel code
opcontrol --init : load the oprofile module and driver
opcontrol --start : start profiling
(launch your application and let it run)
opcontrol --dump : write the data collected so far to file
opcontrol --stop : stop profiling

opreport -D smart -l > /tmp/report : write the analysis results without the call graph; if you need the call graph, use
opreport -c -D smart -l > /tmp/report : this step is very slow

OProfile can also annotate source code, marking each line with its share of CPU time:
opannotate -s /lib64/libc-2.4.so : show the results for libc-2.4.so at the source level

There are a few more commands; see the man page for details:
opcontrol --reset : clear previously collected data
opcontrol -h : shut down the oprofile daemon (short form of --shutdown)

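Sometimes opreport reports that samples were lost because the buffer was too small; in that case, adjust the buffer size. Buffer sizes are counted not in bytes but in the number of samples they can hold. There are two levels: the overall buffer size and the per-CPU buffer size. Once the buffer watershed is exceeded, the data is flushed to files on disk. After the change, the settings look like this: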

$sudo opcontrol --status
Daemon not running
Session-dir: /var/lib/oprofile
Separate options: library
vmlinux file: none
Image filter: none
Call-graph depth: 25
Buffer size: 1000000
CPU buffer watershed: 256000
CPU buffer size: 32000
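
The three values in the status output above map to opcontrol's --buffer-size, --buffer-watershed and --cpu-buffer-size options. A minimal sketch of setting them, assuming the daemon is not running (as in the status output) so that the new sizes are picked up on the next start; the helper name set_oprofile_buffers is only illustrative:

```shell
# Set the oprofile sample buffers. All sizes are counted in samples, not bytes.
# Usage: set_oprofile_buffers TOTAL_SIZE WATERSHED PER_CPU_SIZE
set_oprofile_buffers() {
    sudo opcontrol --buffer-size="$1" \
                   --buffer-watershed="$2" \
                   --cpu-buffer-size="$3"
}

# Example, using the values from the status output:
#   set_oprofile_buffers 1000000 256000 32000
#   sudo opcontrol --status    # confirm the new values
```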

You may also run into various problems while installing OProfile. They are summarized below:

install libiberty.h

checking for libiberty.h... no

checking for cplus_demangle in -liberty... no

configure: error: liberty library not found

Install binutils-devel; run uname -a to confirm whether the system is x86_64 or i386.

If you see something like op_cpu_type.c:259:39: error: 'AT_BASE_PLATFORM' undeclared (first use in this function) and you are on x86_64, apply the patch that disables the PPC-specific code:

http:///p/oprofile/bugs/245/

Warning: QT version 3 was requested but not found. No GUI will be built.
Warning: You requested to build with the '--with-kernel' option, but your kernel
headers were not accessible at the given location. Be sure you have run the following
command from within your kernel source tree:
make headers_install INSTALL_HDR_PATH=<kernel-hdrs-install-dir>
Then pass <kernel-hdrs-install-dir> to oprofile's '--with-kernel' configure option.

If you run 'make' now, only the legacy ocontrol-based profiler will be built.

This happens because perf events are not supported in this setup; in that case you cannot restrict sampling to a single process.

3. GProf http://en./wiki/Gprof

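GProf is cumbersome to use: you have to compile with the -pg option, and by default it only works for single-threaded programs (this patch adds multithreading support: http://sam./writings/programming/gprof.html). In short, it is very inconvenient.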
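
The -pg workflow can be sketched as follows. The program name myapp and the helper gprof_report are illustrative, not from the original post; gmon.out is the file an instrumented binary writes to the current directory when it exits normally.

```shell
# Build once with instrumentation (the flag is needed when compiling and linking):
#   g++ -pg -g myapp.cpp -o myapp

# Run an instrumented binary, then print its profile.
# Usage: gprof_report ./myapp [ARGS...]
gprof_report() {
    bin=$1
    shift
    # gmon.out is only written on a normal exit, so stop if the run fails.
    "$bin" "$@" || return 1
    gprof "$bin" gmon.out
}
```

The need to rebuild with -pg and wait for a clean exit is part of what makes GProf awkward for long-running services.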

4. Google Perf Tools https://code.google.com/p/gperftools/?redir=1

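Here comes the powerful, free tool. gperftools includes a memory allocator, a memory leak checker, and a CPU profiler. Its call graph is fairly accurate, and it is simple to use: link against the library and set one environment variable. You can also control profiling precisely from your own code. For details, see the articles listed on the left side of the linked page.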
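
A minimal sketch of the "link the library and set an environment variable" workflow for the CPU profiler. It assumes the program is linked with -lprofiler; the helper name profile_cpu, the program name myapp and the output path are illustrative. On some distributions the pprof script is installed as google-pprof.

```shell
# Build once, linking the gperftools CPU profiler:
#   g++ -O2 -g myapp.cpp -o myapp -lprofiler

# Run a command with CPU profiling enabled via the CPUPROFILE variable.
# Usage: profile_cpu OUTPUT_FILE COMMAND [ARGS...]
profile_cpu() {
    out=$1
    shift
    # CPUPROFILE is set only for this one command, not for the calling shell.
    CPUPROFILE="$out" "$@"
}

# Collect:  profile_cpu /tmp/myapp.prof ./myapp para1 para2
# Report:   pprof --text ./myapp /tmp/myapp.prof
```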

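There are other methods and tools as well. For example, running pstack several times in a row to see what most threads are doing can give a first indication of where the problem lies.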
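
The repeated-pstack approach can be sketched as a small script that takes several dumps and counts which function sits at the top of each thread's stack. The helper names top_frames and sample_stacks are illustrative, and the parsing assumes gdb-style frame lines such as "#0  0x... in func () from lib"; adjust it if your pstack prints a different format.

```shell
# Count which function is at the top of the stack (frame #0) across
# all stack dumps read from stdin, most frequent first.
top_frames() {
    awk '$1 == "#0" {
             # Frame lines look like "#0  0xADDR in func (...)" or "#0  func (...)"
             name = ($3 == "in") ? $4 : $2
             count[name]++
         }
         END { for (n in count) print count[n], n }' | sort -k1,1nr -k2,2
}

# Take N dumps of a process, one second apart, and summarize them.
# Usage: sample_stacks PID [N]
sample_stacks() {
    pid=$1
    n=${2:-10}
    i=0
    while [ "$i" -lt "$n" ]; do
        pstack "$pid"
        sleep 1
        i=$((i + 1))
    done | top_frames
}
```

If most samples land in a lock or wait function, the threads are probably blocked rather than busy, which points toward contention instead of a CPU hotspot.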

To sum up: for a quick analysis, OProfile is good enough; for careful, serious performance tuning, Google Perf Tools is the better choice.