八宝粥 posted on 2013-7-1 22:14:29

Raspberry Pi Performance Benchmarks

Last edited by 八宝粥 on 2013-7-1 22:29

CPU
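Linpack benchmark
A Linpack benchmark run has been completed on ARM. The code was compiled with gcc at -O3 (optimization level 3) and run with an array size of 200. The results include a software floating-point build.
Source / compile / run:
cc -O3 -o linpack linpack.c -lm
linpack.c: In function ‘main’:
linpack.c:69: warning: return type of ‘main’ is not ‘int’
./linpack
Enter array size (q to quit) : 200
Results:
Crippled (software floating point)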
Memory required:315K.

LINPACK benchmark, Double precision.
Machine precision:15 digits.
Array size 200 X 200.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
       2   0.53  92.45%   1.89%   5.66%   5493.333
       4   1.07  92.52%   2.80%   4.67%   5385.621
       8   2.12  92.45%   2.36%   5.19%   5466.003
      16   4.24  92.45%   2.83%   4.72%   5438.944
      32   8.49  92.11%   2.71%   5.18%   5459.213
      64  16.98  92.05%   2.89%   5.06%   5452.440

Hardware floating point (-mfloat-abi=softfp)
Memory required:315K.
LINPACK benchmark, Double precision.
Machine precision:15 digits.
Array size 200 X 200.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
       8   0.51  90.20%   3.92%   5.88%  22888.889
      16   1.02  89.22%   4.90%   5.88%  22888.889
      32   2.05  90.24%   3.41%   6.34%  22888.889
      64   4.08  91.42%   2.94%   5.64%  22829.437
     128   8.16  91.54%   2.94%   5.51%  22799.827
     256  16.31  91.35%   2.76%   5.89%  22903.800

Full hardware floating point on Raspbian (-mfloat-abi=hard -mfpu=vfp), clocked at arm_freq=700
Memory required:315K.
LINPACK benchmark, Double precision.
Machine precision:15 digits.
Array size 200 X 200.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
      16   0.58  89.66%   3.45%   6.90%  40691.358
      32   1.17  87.18%   4.27%   8.55%  41071.651
      64   2.32  88.36%   3.02%   8.62%  41459.119
     128   4.67  88.22%   3.43%   8.35%  41071.651
     256   9.33  88.85%   3.32%   7.82%  40880.620
     512  18.63  89.00%   2.95%   8.05%  41047.675

Full hardware floating point on Raspbian (-mfloat-abi=hard -mfpu=vfp), clocked at arm_freq=1000 core_freq=500
Memory required:315K.
LINPACK benchmark, Double precision.
Machine precision:15 digits.
Array size 200 X 200.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
      32   0.79  89.87%   0.00%  10.13%  61896.714
      64   1.58  89.24%   1.27%   9.49%  61463.869
     128   3.16  90.19%   1.90%   7.91%  60407.789
     256   6.32  88.13%   3.80%   8.07%  60511.761
     512  12.65  87.83%   3.56%   8.62%  60825.836

Full hardware floating point on Gentoo with compiler optimizations (gcc-4.6.3 -Ofast -fno-fast-math), default clock settings
Memory required:315K.
LINPACK benchmark, Double precision.
Machine precision:15 digits.
Array size 200 X 200.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
----------------------------------------------------
      16   0.56  89.29%   1.79%   8.93%  43084.967
      32   1.13  91.15%   4.42%   4.42%  40691.358
      64   2.25  89.78%   3.56%   6.67%  41853.968
     128   4.51  87.80%   4.21%   7.98%  42358.233
     256   9.01  88.68%   3.88%   7.44%  42155.076
     512  18.01  89.23%   2.78%   8.00%  42434.923

Whetstone/Dhrystone synthetic benchmarks
All code was compiled with gcc using -mfloat-abi=softfp -O3.
Source:
The test code is at http://www.rowley.co.uk/arm/whet_dhry.zip. Mirror: http://freespace.virgin.net/roy.longbottom/benchnt.zip
Compile/run: ?
Results:
Dhrystone
Microseconds for one run through Dhrystone: 1.2

Dhrystones per Second: 809061.5

Whetstone
Crippled
Loops: 1000, Iterations: 10, Duration: 24 sec.

C Converted Double Precision Whetstones: 41.7 MIPS

Recompiling Whetstone with 'gcc -mfpu -mfloat-abi=softfp' gives better results:
Loops: 1000, Iterations: 100, Duration: 106 sec.
C Converted Double Precision Whetstones: 94.3 MIPS

The run above was not compiled with -mfpu=vfp, so most of the run time was spent in the SQRT routine. With VFP enabled the result improves substantially:
Loops: 1000, Iterations: 100, Duration: 15 sec.
C Converted Double Precision Whetstones: 666.7 MIPS

OpenSSL benchmark
Source / compile / run:
openssl version;
openssl speed;
Results:
Assembly optimizations disabled:
OpenSSL 0.9.8o 01 Jun 2010
built on: Thu Aug 26 18:56:26 UTC 2010
options:bn(64,32) md2(int) rc4(ptr,int) des(idx,risc1,4,long) aes(partial) blowfish(idx)
compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -O2 -Wa,--noexecstack -g -Wall
available timing options: TIMES TIMEB HZ=100
timing function used: times
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes   64 bytes    256 bytes   1024 bytes   8192 bytes
md2                148.81k      372.18k      624.81k      769.95k      832.90k
mdc2               0.00         0.00         0.00         0.00         0.00
md4                615.30k   2468.76k   7612.19k    16707.01k    28104.86k
md5                380.13k   1501.12k   4800.77k    11312.81k    21682.77k
hmac(md5)         1022.28k   3480.23k   9587.80k    17492.25k    25441.78k
sha1               303.72k   1092.39k   3106.50k   6302.57k   9852.39k
rmd160             244.29k      849.04k   2414.53k   4747.26k   7513.00k
rc4            14658.70k    16836.49k    17462.03k    17628.21k    17522.08k
des cbc         2913.17k   3221.30k   3289.77k   3360.09k   3367.21k
des ede3          1149.87k   1188.59k   1198.46k   1206.00k   1208.25k
idea cbc             0.00         0.00         0.00         0.00         0.00
seed cbc             0.00         0.00         0.00         0.00         0.00
rc2 cbc         2812.71k   3012.02k   3054.19k   3077.82k   3076.12k
rc5-32/12 cbc      0.00         0.00         0.00         0.00         0.00
blowfish cbc      6091.32k   7007.89k   7250.62k   7288.21k   7163.88k
cast cbc          5068.25k   6020.03k   6345.71k   6367.64k   6260.44k
aes-128 cbc       3205.76k   3497.72k   3616.00k   3652.49k   3665.85k
aes-192 cbc       2730.65k   2981.88k   3073.20k   3102.38k   3111.86k
aes-256 cbc       2383.90k   2596.12k   2659.91k   2702.13k   2732.50k
camellia-128 cbc   0.00         0.00         0.00         0.00         0.00
camellia-192 cbc   0.00         0.00         0.00         0.00         0.00
camellia-256 cbc   0.00         0.00         0.00         0.00         0.00
sha256             679.98k   1629.47k   2905.43k   3708.32k   4175.45k
sha512            41.02k      163.83k      232.63k      318.20k      353.81k
aes-128 ige       3089.03k   3579.08k   3698.68k   3689.14k   3578.18k
aes-192 ige       2641.68k   3019.45k   3111.38k   3144.95k   3035.70k
aes-256 ige       2334.50k   2632.35k   2705.04k   2735.69k   2687.74k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.013747s 0.001193s     72.7    838.4
rsa 1024 bits 0.063481s 0.002742s     15.8    364.7
rsa 2048 bits 0.321250s 0.007378s      3.1    135.5
rsa 4096 bits 1.805000s 0.022528s      0.6     44.4
                  sign    verify    sign/s verify/s
dsa  512 bits 0.011690s 0.013597s     85.5     73.5
dsa 1024 bits 0.027233s 0.031683s     36.7     31.6
dsa 2048 bits 0.073897s 0.087304s     13.5     11.5
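
To put the assembly-disabled figures above next to the assembly-enabled run in the follow-up post, the ratio for a few rows can be computed directly. This is only a minimal sketch: the numbers are copied from the 8192-byte column of the two `openssl speed` tables in this thread, and the helper names `speedup` and `report` and the choice of rows are illustrative, not part of the original post. The two builds also differ in OpenSSL version (0.9.8o vs 1.0.1c) and in float ABI, so these ratios should not be read as the effect of assembly alone.

```python
# Throughput in the 8192-byte column, in thousands of bytes per second,
# copied from the two `openssl speed` tables in this thread.
NO_ASM = {
    "md5": 21682.77,
    "sha1": 9852.39,
    "sha256": 4175.45,
    "rc4": 17522.08,
    "aes-128 cbc": 3665.85,
}
WITH_ASM = {
    "md5": 53226.15,
    "sha1": 30121.35,
    "sha256": 19300.35,
    "rc4": 41570.04,
    "aes-128 cbc": 15529.06,
}


def speedup(name):
    """Ratio of assembly-enabled to assembly-disabled throughput (hypothetical helper)."""
    return WITH_ASM[name] / NO_ASM[name]


def report():
    """Map each selected algorithm to its ratio, rounded to two decimals."""
    return {name: round(speedup(name), 2) for name in NO_ASM}


if __name__ == "__main__":
    for name, ratio in report().items():
        print(f"{name:12s} {ratio:.2f}x")
```

For the rows chosen here the ratios fall roughly between 2x and 5x, with the hash and AES rows gaining more than RC4.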

八宝粥 posted on 2013-7-1 22:19:14

Last edited by 八宝粥 on 2013-7-1 22:31

Assembly optimizations enabled:
OpenSSL 1.0.1c 10 May 2012
built on: Sun Jul 29 00:43:16 CEST 2012
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
compiler: armv6j-hardfloat-linux-gnueabi-gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN \
-DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM \
-DSHA512_ASM -DAES_ASM -DGHASH_ASM -O2 -march=armv6j -mfpu=vfp -mfloat-abi=hard -fno-strict-aliasing -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes   64 bytes    256 bytes   1024 bytes   8192 bytes
md2                  0.00         0.00         0.00         0.00         0.00
mdc2               724.41k      933.06k   1024.68k   1063.59k   1075.88k
md4               2547.83k   9441.28k    27284.48k    51824.98k    69894.14k
md5               1954.05k   7217.96k    20805.95k    39365.29k    53226.15k
hmac(md5)         3075.61k    10241.88k    26669.65k    44729.00k    55386.11k
sha1            2115.34k   6823.83k    16264.45k    25053.18k    30121.35k
rmd160            1487.88k   4783.96k    10707.71k    15800.32k    19303.08k
rc4            34205.43k    39535.98k    41215.83k    41561.43k    41570.04k
des cbc         6251.12k   6605.08k   6686.81k   6713.01k   6707.54k
des ede3          2326.45k   2368.36k   2385.83k   2397.53k   2391.84k
idea cbc          8758.77k   9421.31k   9607.34k   9653.93k   9687.93k
seed cbc          8274.52k   9036.46k   9264.64k   9321.47k   9284.27k
rc2 cbc         6047.90k   6354.82k   6458.82k   6465.19k   6485.33k
rc5-32/12 cbc    16204.47k    18649.32k    19367.94k    19560.11k    19649.84k
blowfish cbc   11934.03k    13189.85k    13546.92k    13633.19k    13486.76k
cast cbc         10797.59k    11828.46k    12156.58k    12187.65k    12050.43k
aes-128 cbc      12978.72k    14708.69k    15387.40k    15472.93k    15529.06k
aes-192 cbc      11441.49k    12834.60k    13315.69k    13453.78k    13430.80k
aes-256 cbc      10267.01k    11409.83k    11744.41k    11812.86k    11859.64k
camellia-128 cbc   9312.98k    10278.89k    10572.46k    10646.19k    10657.82k
camellia-192 cbc   7541.38k   8140.71k   8325.63k   8370.18k   8361.30k
camellia-256 cbc   7513.97k   8138.65k   8297.98k   8351.40k   8347.65k
sha256            3598.03k   8377.26k    14605.57k    17979.39k    19300.35k
sha512            1080.74k   4322.82k   6151.85k   8416.32k   9418.07k
whirlpool          361.82k      729.24k   1186.42k   1425.38k   1512.79k
aes-128 ige      11702.57k    13853.45k    14429.53k    14671.38k    14057.47k
aes-192 ige      10468.67k    12165.24k    12628.24k    12743.72k    12331.69k
aes-256 ige       9505.78k    10831.25k    11205.36k    11333.43k    10982.74k
ghash            15681.70k    17279.32k    17770.84k    17894.06k    17940.48k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.002185s 0.000217s    457.6   4611.1
rsa 1024 bits 0.011325s 0.000640s     88.3   1563.5
rsa 2048 bits 0.074296s 0.002289s     13.5    436.8
rsa 4096 bits 0.544211s 0.008741s      1.8    114.4
                  sign    verify    sign/s verify/s
dsa  512 bits 0.002157s 0.002262s    463.5    442.0
dsa 1024 bits 0.006234s 0.007123s    160.4    140.4
dsa 2048 bits 0.022247s 0.025884s     44.9     38.6

GPU
The Raspberry Pi can play h264 1080p movies from USB to HDMI at a rate of at least 4 MB/s. According to forum administrator "JamesH": "Basically, 1080p30 HD is all >40Mb/s." h264 at 5 MB/s is also reported, along with VP8/WEBM at WVGA (480p30) or 720p20.

ioquake3 (a modified Quake 3)
Source:
https://github.com/raspberrypi/quake3
Compile/run:
- Download source, compile as delivered
- Start game
- Runs at display's native res, in my case 1280x1024
- Bitdepth stuck at 16bpp, not sure how to change, values in q3config.cfg seem to be ignored
- In-game console commands:
\timedemo 1
\demo four
Results:
armel "driver info": http://i.imgur.com/wtYhB.jpg
armel timedemo score: http://i.imgur.com/i2TkN.jpg 20.2fps
armhf "driver info": http://i.imgur.com/8nqa1.jpg
armhf timedemo score: http://i.imgur.com/dUu0g.jpg 28.5fps

IO
USB bus
[*]All I/O shares a single bus, so the combined I/O throughput cannot exceed the design limit of 60 MB/s;
[*]Tests with a high-speed USB drive show read speeds of about 30 MB/s:
root@raspberrypi:~# dd if=/dev/sda of=/dev/null bs=32M count=10 iflag=direct
10+0 records in
10+0 records out
335544320 bytes (336 MB) copied, 10.6428 s, 31.5 MB/s

SD card
This section has moved to RPi_SD_cards#Performance

NIC (network interface)
Compile/run:
On a machine on the LAN: iperf -s
On the Raspberry Pi: iperf -t 60 -c <SERVER_IP_ADDRESS> -d
Results:
Bandwidth (Mbit/s) | CPU usage (peak) | Distribution | Kernel | Notes
52.1 + 46.4 | 5.1%us, 66.2%sy, 28.7%si | Debian Squeeze "debian6-19-04-2012" | Linux raspberrypi 3.1.9+ #95 PREEMPT |
91.8 + 36.8 | 1.6%us, 60.8%sy, 37.5%si | Debian Wheezy "Raspbian" | Linux raspbian 3.1.9+ #101 PREEMPT |
65.1 + 48.8 | 1.3%us, 61.9%sy, 36.8%si | Arch Linux 2012-04-29 | Linux alarmpi 3.1.9-12+ #5 Sat Apr 28 04:49:38 UTC 2012 armv6l ARMv6-compatible processor rev 7 (v6l) BCM2708 GNU/Linux | Remote host connected at gigabit
69.5 + 29.1 | 0.6%us, 55.5%sy, 40.0%si | Debian Wheezy "Raspbian" | Linux rpi 3.1.9+ #168 PREEMPT | Remote connected at gigabit, values for si between 30 and 55 %
90.8 + 91.4 | 0.3%us, 62.2%sy, 37.5%si | Gentoo Linux ARM | Linux genpi 3.2.23-bootc #1 | Remote host connected at gigabit, vm.min_free_kbytes = 4096
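
The USB and network figures in this I/O section can be put on a common scale. The following is a rough sketch, assuming decimal units (1 MB = 10^6 bytes, 1 Mbit = 10^6 bits) and treating the two iperf -d directions as simply additive; the helper names are hypothetical. It recomputes the dd rate from the byte count and elapsed time, converts the best bidirectional iperf row to MB/s, and compares both against the 60 MB/s shared-bus limit quoted in the USB bus section.

```python
# Assumption: decimal units throughout (1 MB = 10**6 bytes, 1 Mbit = 10**6 bits).
BUS_LIMIT_MB_S = 60.0  # shared-bus design limit quoted in the USB bus section


def dd_rate_mb_s(bytes_copied, seconds):
    """Average transfer rate of a dd run, in MB/s."""
    return bytes_copied / seconds / 1e6


def iperf_total_mb_s(first_mbit, second_mbit):
    """Sum of the two iperf -d directions, converted from Mbit/s to MB/s."""
    return (first_mbit + second_mbit) / 8.0


def share_of_bus(rate_mb_s):
    """Fraction of the quoted bus limit that a given rate would occupy."""
    return rate_mb_s / BUS_LIMIT_MB_S


usb_read = dd_rate_mb_s(335544320, 10.6428)  # figures from the dd run in the USB section
best_net = iperf_total_mb_s(90.8, 91.4)      # Gentoo row of the NIC table

if __name__ == "__main__":
    print(f"USB read : {usb_read:.1f} MB/s ({share_of_bus(usb_read):.0%} of bus limit)")
    print(f"iperf -d : {best_net:.1f} MB/s ({share_of_bus(best_net):.0%} of bus limit)")
```

Under these assumptions the dd figure matches the 31.5 MB/s that dd itself printed, and even the best iperf row stays well under the quoted bus limit.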

Routing
Method
[*]A machine running OpenBSD 5.2 with two LAN ports, each in a different domain, was connected directly to the Raspberry Pi through both ports.
[*]One OpenBSD interface ran iperf -s and the other ran iperf -t 300 -c, with the Raspberry Pi routing between them.
[*]iperf v2.0.5.
[*]System load on the Raspberry Pi was recorded 150 seconds into the test.
Results
Bandwidth (Mbit/s) | CPU usage (peak) | Distribution | Kernel | Notes
50.2 Mbps | 0.0%us, 0.1%sy, 99.8%si | Debian Wheezy "Raspbian" | Linux raspberrypi 3.6.11+ #366 PREEMPT | Stock clock
85.5 Mbps | 0.0%us, 2.8%sy, 69.7%si | Debian Wheezy "Raspbian" | Linux raspberrypi 3.6.11+ #366 PREEMPT | Overclocked via raspi_config "turbo"
87.1 Mbps | 4.3%us, 4.3%sy, 52.1%si | Debian Wheezy "Raspbian" | Linux raspberrypi 3.6.11+ #366 PREEMPT | Overclocked arm_freq 1100 core_freq 500 sdram_freq 600 over_voltage 6
62.1 Mbps | 0.0%us, 0.1%sy, 99.2%si | Debian Wheezy "Raspbian" | Linux raspberrypi 3.6.11+ #366 PREEMPT | Overclocked arm_freq 1100 core_freq 500 sdram_freq 600 over_voltage 6 - SNAT enabled
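
Ratios make the routing results easier to compare. Here is a small sketch that uses only the numbers from the routing table; the short labels and function names are my own shorthand, not from the original post. It computes each run's throughput relative to the stock clock and the relative drop when SNAT is enabled at the same overclock. The stock and SNAT runs both show softirq (si) near 100%, which suggests those two are CPU-bound, though the table alone cannot prove it.

```python
# (label, throughput in Mbit/s, peak softirq CPU share in percent),
# copied from the routing results table. Labels are illustrative shorthand.
RUNS = [
    ("stock", 50.2, 99.8),
    ("turbo", 85.5, 69.7),
    ("1100MHz", 87.1, 52.1),
    ("1100MHz+SNAT", 62.1, 99.2),
]

RATES = {label: mbit for label, mbit, _si in RUNS}


def gain_over_stock(label):
    """Throughput of a run divided by the stock-clock throughput."""
    return RATES[label] / RATES["stock"]


def snat_penalty():
    """Fractional throughput lost when SNAT is enabled at the same overclock."""
    return 1.0 - RATES["1100MHz+SNAT"] / RATES["1100MHz"]


def softirq_bound(threshold=95.0):
    """Labels of runs whose peak softirq share is at or above the threshold."""
    return [label for label, _mbit, si in RUNS if si >= threshold]


if __name__ == "__main__":
    for label in RATES:
        print(f"{label:14s} {gain_over_stock(label):.2f}x stock")
    print(f"SNAT penalty: {snat_penalty():.1%}")
    print("softirq-bound runs:", ", ".join(softirq_bound()))
```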
Power
The table lists the current drawn from a 5 V supply while performing various tasks.
Task | Current draw (mA) | Notes
Booting (no peripherals) | 120-400 | Taken from Agilent lab power supply readings. No composite/keyboard/mouse/network connected. HDMI was enabled but the cable was disconnected.
Idle (HDMI on, network on) | 370 | Taken from Agilent lab power supply readings. No composite/keyboard/mouse connected.
Idle (HDMI on, network off) | 320 | Taken from Agilent lab power supply readings. No composite/keyboard/mouse/network connected.
Playing 1080p video | 750 | About 3h on 4 AA batteries
Editing text | - | Same as idling
Compiling C code (Quake III) | 364 | Measured with a Fluke 87V multimeter. No composite, keyboard, mouse, or network connected. HDMI was connected. Current peaked at 418mA.
Running a Python program | ? |
Playing Quake III | 461 | Measured with a Fluke 87V multimeter. No composite or network connected. HDMI was connected. Current peaks at 551mA.
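
The current readings can be turned into power figures with P = V x I. A minimal sketch, assuming the supply sits at exactly 5 V and ignoring regulator and cable losses; the dictionary keys and helper names are hypothetical. The last helper converts the "about 3h" note for 1080p playback into the charge drawn on the 5 V side, which is only a rough consistency check and says nothing about the actual battery pack used.

```python
SUPPLY_VOLTS = 5.0  # supply voltage stated in the power section (assumed exact)

# Current draw in mA, copied from the power table (numeric rows only).
DRAW_MA = {
    "idle, HDMI on, network on": 370,
    "idle, HDMI on, network off": 320,
    "playing 1080p video": 750,
    "compiling C code (Quake III)": 364,
    "playing Quake III": 461,
}


def power_watts(milliamps, volts=SUPPLY_VOLTS):
    """Electrical power in watts for a given current draw."""
    return milliamps / 1000.0 * volts


def charge_mah(milliamps, hours):
    """Charge drawn over a period, in mAh, at constant current."""
    return milliamps * hours


if __name__ == "__main__":
    for task, ma in DRAW_MA.items():
        print(f"{task:30s} {ma:4d} mA  {power_watts(ma):.2f} W")
    # "About 3h on 4 AA batteries" at 750 mA, measured on the 5 V side:
    print(f"1080p playback for 3 h draws about {charge_mah(750, 3)} mAh")
```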

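树老大 posted on 2013-7-1 23:02:30

Thanks for all the hard work!

MetalX posted on 2013-7-3 00:37:06

Ouch. Why does my pcDuino score lower than the numbers the OP posted here... Can a 1GHz A10 really not keep up with a 700MHz BCM...

fashoionxu posted on 2013-7-3 17:38:06

Very useful material.

unlucky posted on 2013-8-22 16:34:47

I can't really follow this.

Tomcat猫纸 posted on 2013-8-23 08:49:05

MetalX posted on 2013-7-3 00:37
Ouch. Why does my pcDuino score lower than the numbers the OP posted here... Can a 1GHz A10 really not keep up with a 700MHz BCM...

The architecture, instruction set, and so on are different.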