Wednesday, October 28, 2009

[ARM] memory sync: _for_cpu series functions won't flush/invalidate memory as requested

linux-2.6.31.1, ARM, without CONFIG_DMABOUNCE
(it seems the code was changed in 2.6.28)

I ported my driver from linux-2.6.27.4 to linux-2.6.31.1 and found that some functions no longer work. After some debugging, I found that only the *_for_device series functions actually invalidate/flush memory, while the *_for_cpu ones don't.
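To make the impact concrete, here is a minimal sketch of the streaming RX pattern my driver follows (the my_* helpers are made-up placeholders, not real kernel APIs). On 2.6.27 the *_for_cpu call below did the invalidate before the CPU read the buffer; on 2.6.31 without CONFIG_DMABOUNCE it turns out to do nothing, which is where I noticed the change.

#include <linux/dma-mapping.h>

/* hypothetical helpers standing in for the real driver/device code */
void my_start_rx(struct device *dev, dma_addr_t handle, size_t len);
void my_wait_rx_done(struct device *dev);
void my_consume(void *buf, size_t len);

static void my_rx_once(struct device *dev, void *buf, size_t len)
{
	dma_addr_t handle;

	/* on 2.6.31 the clean/invalidate for DMA_FROM_DEVICE is done here,
	 * at map time */
	handle = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);

	my_start_rx(dev, handle, len);		/* device DMAs into buf */
	my_wait_rx_done(dev);

	/* on 2.6.27 this invalidated the buffer; on 2.6.31 (without
	 * CONFIG_DMABOUNCE) no cache maintenance is done here any more */
	dma_sync_single_for_cpu(dev, handle, len, DMA_FROM_DEVICE);
	my_consume(buf, len);			/* CPU reads the received data */

	/* hand the buffer back for the next transfer: this is where the
	 * cache maintenance now happens */
	dma_sync_single_for_device(dev, handle, len, DMA_FROM_DEVICE);

	/* eventually: dma_unmap_single(dev, handle, len, DMA_FROM_DEVICE); */
}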

Tracing the call chains:

dma_sync_single
  -> dma_sync_single_for_cpu

pci_dma_sync_single_for_cpu
  -> dma_sync_single_for_cpu
    -> dma_sync_single_range_for_cpu
      -> dmabounce_sync_for_cpu
        ==> nothing done.

pci_dma_sync_single_for_device
  -> dma_sync_single_for_device
    -> dma_sync_single_range_for_device
      -> dmabounce_sync_for_device
      -> dma_cache_maint
        ==> the actual flush/invalidate happens here.

Any call to dmabounce_sync_for_* simply returns 1, and nothing else happens.

With CONFIG_DMABOUNCE not defined:

arch/arm/include/asm/dma-mapping.h
/**
 * dma_sync_single_range_for_cpu
 * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
 * @handle: DMA address of buffer
 * @offset: offset of region to start sync
 * @size: size of region to sync
 * @dir: DMA transfer direction (same as passed to dma_map_single)
 *
 * Make physical memory consistent for a single streaming mode DMA
 * translation after a transfer.
 *
 * If you perform a dma_map_single() but wish to interrogate the
 * buffer using the cpu, yet do not wish to teardown the PCI dma
 * mapping, you must call this function before doing so.  At the
 * next point you give the PCI dma address back to the card, you
 * must first the perform a dma_sync_for_device, and then the
 * device again owns the buffer.
 */
static inline void dma_sync_single_range_for_cpu(struct device *dev,
		dma_addr_t handle, unsigned long offset, size_t size,
		enum dma_data_direction dir)
{
	BUG_ON(!valid_dma_direction(dir));

	dmabounce_sync_for_cpu(dev, handle, offset, size, dir);
}
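
For contrast, the *_for_device counterpart in the same header does go on to do the cache maintenance once the dmabounce stub returns 1. Quoted from memory of the 2.6.31 source, so it may differ in detail from your tree, it looks roughly like this:

static inline void dma_sync_single_range_for_device(struct device *dev,
		dma_addr_t handle, unsigned long offset, size_t size,
		enum dma_data_direction dir)
{
	BUG_ON(!valid_dma_direction(dir));

	/* without CONFIG_DMABOUNCE the stub returns 1, so we fall through */
	if (!dmabounce_sync_for_device(dev, handle, offset, size, dir))
		return;

	if (!arch_is_coherent())
		dma_cache_maint(dma_to_virt(dev, handle) + offset, size, dir);
}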



arch/arm/include/asm/dma-mapping.h
#ifdef CONFIG_DMABOUNCE

(....................)

#else
static inline int dmabounce_sync_for_cpu(struct device *d, dma_addr_t addr,
		unsigned long offset, size_t size, enum dma_data_direction dir)
{
	return 1;
}

static inline int dmabounce_sync_for_device(struct device *d, dma_addr_t addr,
		unsigned long offset, size_t size, enum dma_data_direction dir)
{
	return 1;
}




[ARM] dma: don't touch cache on dma_*_for_cpu()
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.29.y.git;a=commitdiff;h=309dbbabee7b19e003e1ba4b98f43d28f390a84e
author Russell King <rmk@dyn-67.arm.linux.org.uk>
Mon, 29 Sep 2008 18:50:59 +0000 (19:50 +0100)
committer Russell King <rmk+kernel@arm.linux.org.uk>
Tue, 30 Sep 2008 10:01:36 +0000 (11:01 +0100)

As per the dma_unmap_* calls, we don't touch the cache when a DMA buffer transitions from device to CPU ownership. Presently, no problems have been identified with speculative cache prefetching which in itself is a new feature in later architectures. We may have to revisit the DMA API later for these architectures anyway.


[PATCH 16/25] [ARM] add highmem support to DMA mapping functions
http://lists.arm.linux.org.uk/lurker/message/20081011.035554.0e3d3b9b.en.html
talks about the reason for not touching the cache on dma_*_for_cpu(); it's a long thread.

http://lists.arm.linux.org.uk/lurker/message/20080926.034224.1a2bec3e.en.html
BTW what's the point of calling dma_cache_maint() in
dma_sync_single_range_for_cpu()? When the CPU regains ownership of the
buffer, the cache is always clean making this call useless.


http://lists.arm.linux.org.uk/lurker/message/20080929.164105.eb5f7e5a.en.html
Let's consider a DMA buffer which starts in the middle of a cache line
meant to receive data from a device.

Upon dma_map_single() the first cache line is first cleaned then the
whole buffer is invalidated.

Upon dma_unmap_single() nothing is done. However, if the first part of
the shared cache line gets a miss, the whole cache line could be
repopulated _before_ the device has stored its data in memory
corresponding to the second half of the same cache line, hence the driver
will obtain bad data from the DMA buffer.

If instead we use dma_sync_single_for_cpu(), which currently performs
another cleaning of the shared cache line and invalidation of the whole
buffer, then the issue above won't occur. Things can be even worse when
that first half cache line gets dirty, though. Upon the cleaning of that
first cache line, the device data stored in memory corresponding to the
second half cache line will be overwritten and lost. Not cleaning the
first cache line and simply invalidating the whole buffer would preserve
integrity of the device data but will lose the first half cache line
content which is not any better, and only if cache eviction doesn't
happen first.

So I don't see how the cache maintenance performed in
dma_sync_single_range_for_cpu() solves anything besides wasting cycles.
Sure, DMA buffers may span cache lines that overlap with other data,
but if that data is touched while the DMA buffer is owned by the device
then we're screwed anyway, and I don't see any solution for that besides
completely forbidding DMA mappings that don't start and end on cache line
boundaries as well as disabling cache prefetching.

Since this doesn't appear to be a significant issue in practice given
that luck is on our side and things just work anyway, we could remove
that false "protection" from dma_sync_single_range_for_cpu().
