Thursday, May 29, 2008

Cache coherency problems

Cache coherency problems can occur on SMP systems or whenever DMA is involved.

Cache thrashing
Cache sloshing
False cache line sharing


Performance Benchmarks Guide - Windows NT 4.0
http://www.microsoft.com/technet/archive/winntas/evaluate/perfben.mspx?mfr=true

SMP systems that provide separate caches for each processor introduce additional issues that affect application performance. Memory caches must maintain a consistent view of memory for all processors. This is accomplished by dividing up memory into small chunks (called cache lines) and tracking the state of each chunk that is present in one of the caches. In order to update a cache line, a processor must first gain exclusive access to it by invalidating all other copies in other processors' caches. Once the processor has exclusive access to the cache line, it may safely update it. If the same cache line is being continuously updated from many different processors, the cache line will bounce from one processor's cache to another. Since the processor cannot complete the write instruction until its cache acquires exclusive access to the cache line, it must stall. This behavior is called cache sloshing, since the cache line "sloshes" from one processor's cache to another.

False cache line sharing
http://docs.hp.com/en/B6056-96006/ch13s02.html

False cache line sharing occurs whenever two or more threads in a parallel program write to different data items that happen to live in the same cache line.
How to avoid false cache line sharing? Don't let it happen in the first place:
ensure there is no concurrent access (CPUs & device DMA) to the same cache line
  • allocate the buffer slightly bigger (+ L1_CACHE_BYTES) than needed, shift the start address up to the next cache line boundary (a shift of at most L1_CACHE_BYTES), and pad the end address out to the next cache line boundary
or simply disable caching for the region


DMA, small buffers, and cache incoherence
http://lwn.net/Articles/2265/

The only way to deal with this problem is to not let it happen in the first place. A number of possibilities are being considered. One way, suggested by Roland, is to create a __dma_buffer attribute which can be used in the declaration of small buffers; on non-cache-coherent systems, this attribute would force the size and alignment of the buffer such that it would not share cache lines with any other data. Another approach is to require that all DMA buffers be allocated separately; the kernel memory allocation primitives already ensure that even the smallest buffers are properly aligned and padded. Yet another approach could be to simply disable caching for the page(s) in question while the operation is in progress; most architectures support this in their page tables. This approach could create performance problems, however (if the page in question has heavily-used data), and it could be complex.

David Miller, who wrote much of the current DMA code, has a different approach. He thinks that this kind of subtle cache issue is a trap for driver writers that should be simply avoided altogether. Rather than come up with new ways of working around incoherent caches, it's better to just change the rules and tell driver writers to allocate their small DMA buffers using the "PCI pool" interface. This interface, which was added in 2.4.4, was designed for just this purpose: allocating small buffers for DMA. Rather than make driver writers deal with this sort of cache coherence issue - and watch some of them get it wrong, David would bury it in the PCI pool code. While no real resolution has been proclaimed, this last option appears to be the likely outcome.
PCI DMA to small buffers on cache-incoherent arch
http://lwn.net/Articles/2266/
[Roland Dreier] Re: PCI DMA to small buffers on cache-incoherent arch
http://lwn.net/Articles/2269/

[David S. Miller] Re: PCI DMA to small buffers on cache-incoherent arch
http://lwn.net/Articles/2270/




System architectures
http://docs.hp.com/en/B3909-90015/ch02s01.html

Cache thrashing occurs when two or more data items that are frequently needed by the program map to the same cache address. Each time one of the items is encached, it overwrites another needed item, causing cache misses and impairing data reuse.


False Cache Line Sharing vs "Cache Sloshing"
http://softwarecommunity.intel.com/isn/Community/en-US/forums/thread/30228078.aspx

Thoughts on Shared Caches
www.cs.wisc.edu/condor/CondorWeek2007/paradyn_presentations/jodom.ppt

