
Compiling R with icc/ifort/MKL on Ubuntu, with benchmark results

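1 Environment
Ubuntu 11.04 32-bit
R 32-bit 2.13.0
Intel Composer XE 2011.3.174 (includes icc/ifort/MKL)
MKL (Intel Math Kernel Library) is advertised as something that "provides extremely well-tuned BLAS and LAPACK implementations that deliver significant performance leadership over alternative math libraries". In certain kinds of computation it does bring a performance gain. I recall reading somewhere that Revolution R and MATLAB both use MKL. Here I try building R with Intel's C and Fortran compilers together with the MKL library, hoping to get as much as possible out of the existing hardware. Intel Composer XE ships with MKL, and installing icc/ifort is essentially foolproof, so installation is not covered here.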


2 Build process
Assume icc/ifort/MKL are installed in the default directory, /opt/intel/composerxe-2011.3.174/.


Normally we would build and install R simply like this:


wget http://ftp.ctex.org/mirrors/CRAN/src/base/R-2/R-2.13.0.tar.gz
tar -xf R-2.13.0.tar.gz
cd R-2.13.0
./configure
make
sudo make install


This time we only need to spell out a few parameters explicitly.



Remove the existing R:


sudo apt-get remove r-base
sudo apt-get autoremove


Save the following as foo.sh:


source /opt/intel/composerxe-2011.3.174/bin/iccvars.sh ia32
source /opt/intel/composerxe-2011.3.174/bin/ifortvars.sh ia32
source /opt/intel/composerxe-2011.3.174/mkl/bin/mklvars.sh ia32


export CC=icc
export CFLAGS="-g -O2 -wd188 -ip -std=c99"
export F77=ifort
export FFLAGS="-g -O3"
export CXX=icpc
export CXXFLAGS="-g -O3"
export FC=ifort
export FCFLAGS="-g -O3"
export ICC_LIBS=/opt/intel/composerxe-2011.3.174/compiler/lib/ia32
export IFC_LIBS=/opt/intel/composerxe-2011.3.174/compiler/lib/ia32
export SHLIB_CXXLD=icpc
export SHLIB_CXXLDFLAGS=-shared


MKL_LIB_PATH=/opt/intel/composerxe-2011.3.174/mkl/lib/ia32
export LD_LIBRARY_PATH=$MKL_LIB_PATH


OMP_NUM_THREADS=2


export LDFLAGS="-L${MKL_LIB_PATH},-Bdirect,--hash-style=both,-Wl,-O1 -L$ICC_LIBS -L$IFC_LIBS -L/usr/local/lib"


export SHLIB_LDFLAGS="-lpthread"
export MAIN_LDFLAGS="-lpthread"


MKL="-L${MKL_LIB_PATH} -lmkl_blas95 -lmkl_lapack95  -Wl,--start-group -lmkl_intel -lmkl_intel_thread -lmkl_core -Wl,--end-group -openmp -lpthread"
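
Before running configure it can help to confirm that the MKL libraries named in the link line really exist under MKL_LIB_PATH. The following is only a sketch: `missing_mkl_libs` is a hypothetical helper name, not part of MKL or R, and it assumes each library is present either as a shared object (.so) or as a static archive (.a).

```shell
#!/bin/sh
# Hypothetical helper: given a directory and a list of linker flags,
# print every -lmkl_* library that has neither lib<name>.so nor
# lib<name>.a in that directory. All other flags are ignored.
missing_mkl_libs() {
    dir=$1
    shift
    for flag in "$@"; do
        case $flag in
            -lmkl_*)
                name=${flag#-l}
                if [ ! -e "$dir/lib$name.so" ] && [ ! -e "$dir/lib$name.a" ]; then
                    echo "$name"
                fi
                ;;
        esac
    done
}

# Usage after sourcing foo.sh ($MKL is left unquoted on purpose so that
# the shell splits it into separate flags):
#   missing_mkl_libs "$MKL_LIB_PATH" $MKL
```

No output means every MKL library in the link line was found.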


Source foo.sh:


source foo.sh
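
Note that foo.sh assigns OMP_NUM_THREADS and MKL without `export`. Unless they were already exported beforehand, they exist only in the shell that sourced the file: configure still receives the MKL flags because the shell expands "$MKL" on the command line, but child processes such as make or R do not inherit these variables. A minimal sketch for checking this, where `is_exported` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the named variable is exported,
# i.e. visible to child processes. printenv runs as a child process,
# so it can only see exported variables.
is_exported() {
    printenv "$1" >/dev/null
}

# Usage after `source foo.sh`:
#   is_exported CC && echo "CC is exported"
#   is_exported OMP_NUM_THREADS || echo "OMP_NUM_THREADS is shell-local"
```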


Download, extract, and configure:


wget http://ftp.ctex.org/mirrors/CRAN/src/base/R-2/R-2.13.0.tar.gz
tar -xf R-2.13.0.tar.gz
cd R-2.13.0
./configure --enable-R-shlib --with-blas="$MKL"  --with-lapack


The final configure summary should look roughly like this:


R is now configured for i686-pc-linux-gnu


Source directory:          .
Installation directory:    /usr/local


C compiler:                icc  -g -O2 -wd188 -ip -std=c99
Fortran 77 compiler:       ifort  -g -O3


C++ compiler:              icpc  -g -O3
Fortran 90/95 compiler:    ifort -g -O3
Obj-C compiler:


Interfaces supported:      X11
External libraries:        readline, BLAS(generic), LAPACK(in blas)
Additional capabilities:   PNG, JPEG, NLS, cairo
Options enabled:           shared R library, R profiling, Java


Recommended packages:      yes


Then:


make
make check
sudo make install


Bingo.
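
One way to double-check that the installed R really picked up MKL is to look at the shared-library dependencies reported by `ldd`. This is a sketch under assumptions: `mkl_deps` is a hypothetical helper name, the path in the usage line assumes the default /usr/local prefix, and depending on configure options the MKL dependency may show up on a different file under the R library directory than libR.so.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print the distinct
# MKL shared libraries it mentions (file names starting with libmkl_).
# Prints nothing if no MKL library is referenced.
mkl_deps() {
    grep -o 'libmkl_[A-Za-z0-9_]*\.so' | sort -u
}

# Usage, assuming R was installed under /usr/local:
#   ldd /usr/local/lib/R/lib/libR.so | mkl_deps
```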


Three notes:


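1. Partway through make, /usr/include/asm/errno.h could not be found. Adding a symlink fixed it:
sudo ln -s /usr/include/asm-generic /usr/include/asm
2. Reference [1] passes -mieee-fp in CFLAGS and three other flag variables (four places in total), and make fails with it. With the flag removed make succeeds; the consequences are unknown.
3. make install fails with: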


icc -I. -I../../src/include -I../../src/include  -I/usr/local/include -DHAVE_CONFIG_H   -openmp -fpic  -g -O2 -wd188 -ip -std=c99 -DR_HOME='"/usr/local/lib/R"' -o Rscript \
./Rscript.c
/bin/bash: icc: command not found
make[2]: *** [install-Rscript] Error 127


Creating a symlink fixed it:
sudo ln -s /opt/intel/composerxe-2011.3.174/bin/ia32/icc /bin/icc
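
The icc failure most likely happens because sudo runs make install with a reset PATH that no longer contains the Intel bin directory. Both fixes above create symlinks in system directories, so it may be worth making that step safe to repeat. A minimal sketch, where `link_if_missing` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: create the symlink <link> -> <target> only if
# nothing exists at <link> yet, so that re-running the step never
# clobbers an existing file, directory, or link.
link_if_missing() {
    target=$1
    link=$2
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "exists, left untouched: $link"
    else
        ln -s "$target" "$link" && echo "created: $link -> $target"
    fi
}

# Usage (as root), mirroring the two fixes above:
#   link_if_missing /usr/include/asm-generic /usr/include/asm
#   link_if_missing /opt/intel/composerxe-2011.3.174/bin/ia32/icc /bin/icc
```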

3 Performance tests
I ran a simple performance test using R-benchmark-25.R and bench.R from the AT&T Research R Benchmark. The tests mainly cover common matrix computations.


3.1 Results for the default installation
= R-benchmark-25.R =


R Benchmark 2.5
===============
Number of times each test is run__________________________:  3


I. Matrix calculation
---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  1.11666666666666
2400x2400 normal distributed random matrix ^1000____ (sec):  1.86866666666666
Sorting of 7,000,000 random values__________________ (sec):  1.323
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  23.1193333333333
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  18.635
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated):  3.58487232845403


II. Matrix functions
--------------------
FFT over 2,400,000 random values____________________ (sec):  1.50533333333333
Eigenvalues of a 640x640 random matrix______________ (sec):  3.05366666666669
Determinant of a 2500x2500 random matrix____________ (sec):  12.1626666666667
Cholesky decomposition of a 3000x3000 matrix________ (sec):  11.2026666666667
Inverse of a 1600x1600 random matrix________________ (sec):  9.61066666666667
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated):  6.90185004025158


III. Programmation
------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  1.67933333333336
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  1.09566666666666
Grand common divisors of 400,000 pairs (recursion)__ (sec):  1.77066666666667
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  2.82166666666666
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  1.90700000000004
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated):  1.78323318573098


Total time for all 15 tests_________________________ (sec):  92.8720000000001
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  3.53358695795615
--- End of test ---


= bench.R =


[1] "hilbert n=500"
user  system  elapsed
0.604 0.016 0.657
user  system  elapsed
0.492 0.040 0.531
user  system  elapsed
0.508 0.024 0.533
[1] "hilbert n=1000"
user  system  elapsed
3.480 0.140 3.624
user  system  elapsed
2.768 0.108 2.879
user  system  elapsed
2.904 0.128 3.037
[1] "sort n=6"
user  system  elapsed
0.536 0.036 0.568
user  system  elapsed
0.548 0.020 0.569
user  system  elapsed
0.544 0.024 0.568
[1] "sort n=7"
user  system  elapsed
6.921 0.232 7.161
user  system  elapsed
6.840 0.184 7.034
user  system  elapsed
6.989 0.184 7.212
[1] "loess n=3"
user  system  elapsed
0.084 0.000 0.188
user  system  elapsed
0.080 0.000 0.081
user system elapsed
0.08 0.00 0.08
user  system  elapsed
0.080 0.000 0.079
user  system  elapsed
0.080 0.000 0.081
[1] "loess n=4"
user  system  elapsed
7.120 0.004 7.125
user  system  elapsed
7.113 0.000 7.119
user  system  elapsed
7.100 0.000 7.109
user  system  elapsed
7.109 0.000 7.116
user  system  elapsed
7.112 0.000 7.119

3.2 Results for the icc+ifort+MKL build
= R-benchmark-25.R =


R Benchmark 2.5
===============
Number of times each test is run__________________________:  3


I. Matrix calculation
---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  1.00633333333333
2400x2400 normal distributed random matrix ^1000____ (sec):  1.07333333333333
Sorting of 7,000,000 random values__________________ (sec):  1.22766666666667
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  2.14133333333334
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  1.248
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated):  1.180347511679


II. Matrix functions
--------------------
FFT over 2,400,000 random values____________________ (sec):  1.34566666666667
Eigenvalues of a 640x640 random matrix______________ (sec):  1.496
Determinant of a 2500x2500 random matrix____________ (sec):  1.53566666666667
Cholesky decomposition of a 3000x3000 matrix________ (sec):  1.47066666666667
Inverse of a 1600x1600 random matrix________________ (sec):  1.59333333333333
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated):  1.50054007982371


III. Programmation
------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  1.42833333333333
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  1.348
Grand common divisors of 400,000 pairs (recursion)__ (sec):  1.77033333333334
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  2.34466666666667
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  1.211
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated):  1.5049595832868


Total time for all 15 tests_________________________ (sec):  22.2403333333333
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  1.38652416123196
--- End of test ---


= bench.R =


[1] "hilbert n=500"
user  system  elapsed
0.280 0.032 0.289
user  system  elapsed
0.292 0.032 0.169
user  system  elapsed
0.296 0.036 0.169
[1] "hilbert n=1000"
user  system  elapsed
1.440 0.140 0.971
user  system  elapsed
1.356 0.124 0.842
user  system  elapsed
1.276 0.140 0.834
[1] "sort n=6"
user  system  elapsed
0.408 0.024 0.461
user  system  elapsed
0.416 0.016 0.431
user  system  elapsed
0.408 0.024 0.430
[1] "sort n=7"
user  system  elapsed
5.512 0.216 5.740
user  system  elapsed
5.437 0.260 5.704
user  system  elapsed
5.504 0.192 5.703
[1] "loess n=3"
user  system  elapsed
0.060 0.000 0.128
user  system  elapsed
0.052 0.000 0.051
user  system  elapsed
0.052 0.000 0.051
user  system  elapsed
0.048 0.000 0.051
user  system  elapsed
0.052 0.000 0.050
[1] "loess n=4"
user  system  elapsed
8.897 0.016 4.560
user  system  elapsed
9.184 0.004 4.607
user  system  elapsed
9.141 0.008 4.585
user  system  elapsed
9.204 0.008 4.621
user  system  elapsed
9.209 0.004 4.619
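
To compare the bench.R runs of the two builds it is convenient to average the third column of each group, which is the elapsed wall-clock time. The following is a small sketch, not part of the benchmark itself; `mean_elapsed` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read system.time() result rows on stdin (three
# numeric columns per row) and print the mean of the third column,
# the elapsed seconds. Header rows and label rows are skipped because
# their third field is not a plain number.
mean_elapsed() {
    awk 'NF == 3 && $3 ~ /^[0-9.]+$/ { sum += $3; n++ }
         END { if (n > 0) printf "%.3f\n", sum / n }'
}

# Usage with the three "hilbert n=1000" rows of the icc/ifort/MKL build:
#   printf '1.440 0.140 0.971\n1.356 0.124 0.842\n1.276 0.140 0.834\n' | mean_elapsed
```

For the "loess n=4" rows of this build the user time is roughly twice the elapsed time, which is what one would expect if the work is spread over two threads.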


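I drew a bar chart of the 15 R-benchmark-25.R results. In some cases the combination of Intel compilers and math library is actually slower than the default build (the 3000x3000 Hilbert matrix, for example, went from about 1.10 s to 1.35 s). On most items icc/ifort/MKL is somewhat faster than the default build, and on 5 items (cross-product, linear regression, determinant, Cholesky decomposition, matrix inverse) it far outperforms the default library, which is due to MKL. The total time for all 15 tests dropped from about 92.9 s to 22.2 s.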
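
The size of the gain can be put into numbers by dividing the default-build timing by the icc/ifort/MKL timing. A minimal sketch, where `speedup` is a hypothetical helper name and the timings in the usage lines are copied from the two R-benchmark-25.R runs above:

```shell
#!/bin/sh
# Hypothetical helper: print how many times faster the second timing is
# than the first (baseline seconds divided by new seconds), rounded to
# two decimals. A value below 1 means the new build is slower.
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}

# Usage:
#   speedup 92.8720000000001 22.2403333333333   # total time for all 15 tests
#   speedup 23.1193333333333 2.14133333333334   # 2800x2800 cross-product
#   speedup 1.09566666666666 1.348              # 3000x3000 Hilbert matrix
```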



4 Main references
1. Building R-2.8.0 with Intel Compiler Suite 11.0 (icc 11, ifort 11, MKL 10)


2. Compiling 64-bit R 2.10.1 with MKL in Linux: The rationale for compiling R using the Intel Math Kernel Library


3. R Installation and Administration, Appendix A.3.1.4


4. Intel MKL link parameter selection tool


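5 Closing remarks
Build your own R with care. I am a Linux novice and cannot guarantee that the settings above are correct or optimal, and mixing GNU and Intel tooling may well cause problems. If chasing a little extra speed ends up producing wrong results, you know what that means.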


6 Known issues
After q("no"), R stops responding and has to be killed.

Please credit the source when reposting: 服务器评测 » Compiling R with icc/ifort/MKL on Ubuntu, with benchmark results