Friday, February 5, 2010

Performance tuning

  • Hardware optimization
    • CPU Speed
    • RAM speed
    • Bus (AXI/AHB) burst access
    • Ensure the caches (L1/L2) run with the shortest latency / in their fastest mode
  • Software optimization
    • Compiler optimization (-O, -O3)
    • Algorithm improvements
    • Put all buffers in cacheable memory, unless an NCNB (non-cacheable, non-bufferable) buffer is measurably faster, which is usually not the case.

Test setup: ARM11 MPCore at 600 MHz, DDR2 at 400 MHz, L1 and L2 caches enabled, 32-byte cacheline.
Time was measured with a spectrum by toggling a GPIO around the code under test. The NCNB buffer was allocated with dma_alloc_coherent, the cacheable buffer with kmalloc.

No-op, just toggling the GPIO: on 97 ns, off 97 ns
32-bit write-then-read (NCNB / cacheable + flush): 493 / 190 + 335.1 ns; minus the 97 ns GPIO overhead: 396 / 93 + 238 ns
8 x 32-bit write-then-read (NCNB / cacheable + flush): 2082 / 754 + 372 ns; minus the 97 ns GPIO overhead: 1985 / 657 + 275 ns
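
The overhead correction above is simple arithmetic, and a short script makes it easy to re-check or to redo with new measurements. This is only a sketch: the names `GPIO_OVERHEAD_NS`, `raw_ns`, `net` and `net_ns` are made up for illustration, and it assumes, as the figures above do, that one 97 ns GPIO toggle is subtracted from each measured interval.

```python
# Raw GPIO-framed timings from the measurements above, in nanoseconds.
# Assumption: each interval contains exactly one 97 ns GPIO toggle.
GPIO_OVERHEAD_NS = 97.0

raw_ns = {
    "1 x 32-bit": {"ncnb": 493.0, "cacheable": 190.0, "flush": 335.1},
    "8 x 32-bit": {"ncnb": 2082.0, "cacheable": 754.0, "flush": 372.0},
}


def net(raw):
    """Subtract the GPIO toggle overhead from one raw interval."""
    return round(raw - GPIO_OVERHEAD_NS, 1)


# Overhead-corrected timings, same layout as raw_ns.
net_ns = {
    case: {kind: net(value) for kind, value in timings.items()}
    for case, timings in raw_ns.items()
}

for case, t in net_ns.items():
    # Total cost of the cacheable path is the access plus the flush.
    cached_total = round(t["cacheable"] + t["flush"], 1)
    print(f"{case}: ncnb {t['ncnb']} ns, cacheable+flush {cached_total} ns")
```

The printed totals put the NCNB cost next to the combined cacheable-plus-flush cost for each case, which is the comparison that matters when choosing the buffer type.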

A cache flush is supposed to be equivalent to eight 32-bit memory writes (one 32-byte cacheline), but the numbers do not bear that out. Likewise, eight NCNB accesses do not take eight times as long as one NCNB access.
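
To put numbers on that observation, the ratios can be computed directly from the overhead-corrected figures. This is a rough sketch with made-up variable names; note that the NCNB figures are for write-then-read, so comparing them against a flush (which only writes back) is an approximation, and the ratios say nothing about why the hardware behaves this way.

```python
# Overhead-corrected figures from the measurements, in nanoseconds.
ncnb_1, ncnb_8 = 396.0, 1985.0      # NCNB write-then-read, 1 and 8 words
cached_1, cached_8 = 93.0, 657.0    # cacheable write-then-read, 1 and 8 words
flush_1, flush_8 = 238.0, 275.0     # cache flush after 1 and 8 words

# Each scaling factor would be 8.0 if cost grew linearly with word count.
ncnb_scaling = ncnb_8 / ncnb_1
cached_scaling = cached_8 / cached_1

# How a full-cacheline flush compares with eight NCNB write-then-reads.
flush_vs_8_ncnb = flush_8 / ncnb_8

print(f"NCNB 8x/1x:      {ncnb_scaling:.2f}")
print(f"cacheable 8x/1x: {cached_scaling:.2f}")
print(f"flush / 8 NCNB:  {flush_vs_8_ncnb:.2f}")
```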

The results may not be very accurate, but if a driver accesses a cacheline more than twice, a cacheable buffer is worth using. In some extreme cases one might want to try an NCNB buffer, but the cacheable buffer will probably still be the better choice.
