mkl Note: Performance tuning

Friday, February 5, 2010

Performance tuning

Hardware optimization
- CPU speed
- RAM speed
- Bus (AXI/AHB) burst access
- Ensure the L1/L2 caches are configured for the shortest latency (fastest mode)
Software optimization
- Compiler optimization (-O, -O3)
- Algorithm improvements
- Use cacheable buffers everywhere, unless an NCNB (non-cacheable, non-bufferable) buffer turns out to be faster, which is usually not the case.

Test setup: ARM11 MPCore at 600 MHz, DDR2 at 400 MHz, L1 and L2 caches enabled, 32-byte cache lines.
Times were measured with spectrum by toggling a GPIO around each operation. The NCNB buffer was allocated with dma_alloc_coherent and the cacheable buffer with kmalloc.
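The GPIO method brackets the operation under test with a pin toggle, measures the pulse width, and subtracts the width of a pulse with nothing inside it. A rough user-space analogue of that baseline subtraction is sketched below, assuming a POSIX system with clock_gettime(CLOCK_MONOTONIC). It is not the author's driver code: the helper names are hypothetical, and an ordinary user-space array stands in for the kmalloc buffer, with no NCNB counterpart.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/*
 * Hypothetical helper: average time per iteration for writing `words` 32-bit
 * values and reading each one back. With words == 0 only the loop overhead is
 * timed, which plays the role of the empty GPIO pulse (the baseline).
 */
static double write_then_read_ns(volatile uint32_t *buf, int words, long iterations)
{
    assert(buf != NULL && words >= 0 && iterations > 0);
    uint32_t sink = 0;
    double start = now_ns();
    for (long i = 0; i < iterations; i++) {
        for (int w = 0; w < words; w++) {
            buf[w] = (uint32_t)i; /* 32-bit write */
            sink += buf[w];       /* read the same word back */
        }
    }
    double elapsed = now_ns() - start;
    (void)sink;
    return elapsed / (double)iterations;
}

/* Net cost of the operation: raw time minus the baseline, never negative. */
static double net_ns(double raw_ns, double baseline_ns)
{
    return raw_ns > baseline_ns ? raw_ns - baseline_ns : 0.0;
}
```

Timing a 32-byte array with words set to 0 and then to 8, and passing both results to net_ns, gives a per-iteration estimate in the same spirit as the GPIO correction; the absolute numbers depend on the machine and still include clock and loop overhead.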

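Baseline (GPIO toggle only, no memory access): on 97 ns, off 97 ns
1 x 32-bit write-then-read: NCNB 493 ns; cacheable 190 ns + flush 335.1 ns. After subtracting the 97 ns baseline: NCNB 396 ns; cacheable 93 ns + flush 238 ns.
8 x 32-bit write-then-read: NCNB 2082 ns; cacheable 754 ns + flush 372 ns. After subtracting the 97 ns baseline: NCNB 1985 ns; cacheable 657 ns + flush 275 ns.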
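The baseline correction and the comparison of one versus eight NCNB accesses are simple arithmetic on the figures above. The sketch below only encodes those measured values so they can be recomputed; the struct and function names are illustrative, not from the original.

```c
#include <assert.h>

#define GPIO_BASELINE_NS 97.0 /* pulse width with no operation inside */

/* One row of the measurements: raw pulse widths in ns. */
struct measurement {
    int words;     /* number of 32-bit write-then-read pairs */
    double ncnb;   /* NCNB buffer (dma_alloc_coherent) */
    double cached; /* cacheable buffer (kmalloc) */
    double flush;  /* cache flush after the cacheable access */
};

static const struct measurement results[] = {
    { 1, 493.0, 190.0, 335.1 },
    { 8, 2082.0, 754.0, 372.0 },
};

/* Remove the GPIO toggle overhead from a raw pulse width. */
static double corrected(double raw_ns)
{
    assert(raw_ns >= GPIO_BASELINE_NS);
    return raw_ns - GPIO_BASELINE_NS;
}

/*
 * Measured multi-word NCNB cost divided by the cost predicted by scaling the
 * single-word cost linearly. 1.0 would mean perfectly linear scaling.
 */
static double ncnb_linearity(const struct measurement *one,
                             const struct measurement *many)
{
    assert(one->words == 1 && many->words > 1);
    return corrected(many->ncnb) / (many->words * corrected(one->ncnb));
}
```

For the two rows above, ncnb_linearity returns about 0.63 (1985 / 3168), so eight NCNB accesses cost roughly 63% of eight times a single access.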

A cache flush was expected to cost about the same as eight 32-bit memory writes (one 32-byte cache line), but the measurements show that it does not. Likewise, eight NCNB accesses do not take eight times as long as one NCNB access (1985 ns rather than 8 x 396 = 3168 ns).

Although these results may not be very accurate, they suggest that if a driver accesses a cache line more than twice, a cacheable buffer is worth using. In some extreme cases it may be worth measuring both options, but the cacheable buffer is still likely to be the better choice.
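One way to see where a threshold like this comes from is a simple break-even model. This is a hypothetical sketch, not the author's derivation: it assumes cost grows linearly with the number of 32-bit accesses to one cache line and that the cacheable buffer needs exactly one flush per batch. Since the NCNB figures above are not linear, the model is only indicative.

```c
#include <assert.h>

/*
 * Hypothetical linear cost model for n 32-bit accesses to one cache line:
 *   NCNB:      n * ncnb_per_access
 *   cacheable: n * cached_per_access + flush   (one flush per batch)
 * Returns the smallest n for which the cacheable buffer is strictly cheaper,
 * or 0 if the model says it never is.
 */
static int break_even_accesses(double ncnb_per_access,
                               double cached_per_access,
                               double flush)
{
    assert(ncnb_per_access >= 0.0 && cached_per_access >= 0.0 && flush >= 0.0);
    double saving = ncnb_per_access - cached_per_access; /* per access */
    if (saving <= 0.0)
        return 0;
    /* Need n * saving > flush, so take floor(flush / saving) + 1. */
    return (int)(flush / saving) + 1;
}
```

With the baseline-corrected per-access costs from the 8-access run (1985/8 ns NCNB, 657/8 ns cacheable, 275 ns flush), the model returns 2; with the single-access figures (396 ns, 93 ns, 238 ns) it returns 1. The "more than twice" rule of thumb is more conservative than either result.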
