mkl Note: 2009

Monday, December 28, 2009

Linux firmware

So udev must be up and running before you use any driver that loads its own firmware.

Documentation/firmware_class/README

request_firmware() hotplug interface:
------------------------------------

(..............)

High level behavior (mixed):
============================

kernel(driver): calls request_firmware(&fw_entry, $FIRMWARE, device)

userspace:
/sys/class/firmware/xxx/{loading,data} appear.
hotplug gets called with a firmware identifier in $FIRMWARE and the usual hotplug environment.
hotplug: echo 1 > /sys/class/firmware/xxx/loading

kernel: Discard any previous partial load.

userspace:
hotplug: cat appropriate_firmware_image > \
/sys/class/firmware/xxx/data

kernel: grows a buffer in PAGE_SIZE increments to hold the image as it
comes in.

userspace:
hotplug: echo 0 > /sys/class/firmware/xxx/loading

kernel: request_firmware() returns and the driver has the firmware
image in fw_entry->{data,size}. If something went wrong
request_firmware() returns non-zero and fw_entry is set to
NULL.

kernel(driver): Driver code calls release_firmware(fw_entry) releasing
the firmware image and any related resource.
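
The userspace half of the handshake above can be sketched in C instead of shell. This is only a minimal sketch: firmware_load() and its helpers are hypothetical names that do not come from the README, and the sysfs directory is passed in as a plain path so the same code can be tried against any directory. Writing -1 to "loading" to abort the request follows the sample hotplug script shipped alongside the README, as far as I know.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write a short string to <dir>/<name>. Returns 0 on success, -1 on error. */
static int write_attr(const char *dir, const char *name, const char *value)
{
    char path[512];
    FILE *f;
    int ok;

    snprintf(path, sizeof(path), "%s/%s", dir, name);
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Copy the firmware image into <dir>/data (the "cat image > data" step). */
static int copy_image(const char *dir, const char *image)
{
    char path[512], buf[4096];
    FILE *in, *out;
    size_t n;
    int ok = 1;

    snprintf(path, sizeof(path), "%s/data", dir);
    in = fopen(image, "rb");
    if (!in)
        return -1;
    out = fopen(path, "wb");
    if (!out) {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Hypothetical helper: echo 1 > loading, copy the image to data,
 * echo 0 > loading. On a copy failure, write -1 to abort the request.
 */
int firmware_load(const char *sysfs_dir, const char *image)
{
    assert(sysfs_dir && image);
    if (write_attr(sysfs_dir, "loading", "1\n") != 0)
        return -1;
    if (copy_image(sysfs_dir, image) != 0) {
        write_attr(sysfs_dir, "loading", "-1\n");
        return -1;
    }
    return write_attr(sysfs_dir, "loading", "0\n");
}
```

On a real system the first argument would be the /sys/class/firmware/xxx directory of the pending request, and the second the image selected by $FIRMWARE.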

Thursday, December 24, 2009

Linux kernel git sources

Check the MAINTAINERS first...

Russell King's git tree

http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm.git/

Catalin Marinas's git tree

git://linux-arm.org/linux-2.6-stable.git
git://linux-arm.org/linux-2.6.git

The kernel trees on linux-arm.org are all Catalin Marinas's (both master and devel branches).

Uwe Kleine-König

git://git.pengutronix.de/git/ukl/linux-2.6.git arm/booting

Anton Vorontsov

git://git.infradead.org/users/cbou/linux-cns3xxx.git master

Jeff Garzik (ata)
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git

Linux Scheduler: never mind the TV soap 娘家, CFS vs SD is where the real drama is!!

A first look at the CFS scheduler
http://zylix666.blogspot.com/2007/10/cfs.html

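But there was also a power struggle among kernel developers behind it. Put simply, Mingo (the CFS developer) brought down CK (the SD-scheduler developer), and then offered some insincere praise for the effort CK had contributed.

CK, unhappy about it, said a few words too many on LKML, which annoyed Linus, who then decided not to take the SD-scheduler work any further. In his grief and anger CK published his resignation statement, insincerely claiming that he had other plans for his life anyway and that it had nothing to do with this fight...

It is rare for Linux material to be as gripping as a TV drama.
It is really long, though; you need not just a stool to sit on but popcorn as well...
(and a translator... the Yahoo dictionary will do~~)

This is like the software IP that FOSS has long criticized: some vendors rely on a handful of IP claims to suppress Linux, which is quite unreasonable. But nothing should be pushed to extremes. I think this kind of person is a cockroach: stealing someone else's idea, implementing it himself, and then approving it himself. "Feeling good about yourself", is it? If CK had stopped working on it and mingo had picked up his idea, I would find that OK; or if mingo were not the reviewer and it had gone through a more open discussion, it would also look better.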

Linux: The Completely Fair Scheduler
April 18, 2007
http://kerneltrap.org/node/8059

Linux: CFS Updates, -v20
August 23, 2007
http://kerneltrap.org/Linux/CFS_Updates_v20

CFS Updates
September 25, 2007
http://kerneltrap.org/Linux/CFS_Updates

Some issues while running Linux SMP on ARM11MPCore

http://lists.infradead.org/pipermail/linux-arm-kernel/2009-December/006650.html

I'm using an ARM11 MPCore with 2 CPUs, Linux-2.6.31.1, SMP enabled, L1 enabled, L2 disabled.

Under the SMP environment, I have observed the following issues:

Case 1: Sometimes the console becomes extremely slow, printing 1 character every 1-2 seconds.
RVDS says that both CPUs are idling. The kernel seems fine, because its messages in response to inserting a USB flash drive are quick and correct. (fixed)
Case 2: Sometimes the Linux console halts and cannot accept any input.
RVDS says that both CPUs are idling. The kernel seems fine, because its messages in response to inserting a USB flash drive are quick and correct. (should be fixed together with case 1)
Case 3: Sometimes the test stops for no reason, or with a fault such as a segmentation fault, and returns to the console prompt or login prompt.
Case 4: Sometimes the test stops for no reason but does not return to the console prompt. The console accepts input, but there is no further response and no prompt.
RVDS says that one CPU is idling and the other is in IRQ context, at entry-armv.S(676) after __pabt_usr; it seems to keep getting prefetch aborts.

I can reproduce case 1 by repeatedly inserting a simple test module.

#include <linux/init.h>
#include <linux/module.h>


static int __init MYDRIVER_init(void)
{

printk("%s: \n",__func__);
return 0;
}

static void __exit MYDRIVER_exit(void)
{
printk("%s: \n",__func__);
}

MODULE_AUTHOR("Mac Lin");
MODULE_DESCRIPTION("MYDRIVER");
MODULE_LICENSE("GPL");

module_init(MYDRIVER_init);
module_exit(MYDRIVER_exit);

Keep inserting and removing the module like this:

module=mydriver;modprobe ${module};rmmod ${module}; (...repeat many times...i have it 30 times..) modprobe ${module};rmmod ${module};

and issue this command line, say, 10 times, without waiting for the previous one to complete.
At some point I then get case 1.

The following command does not trigger it; it just keeps running.

module=mydriver;while : ; do modprobe ${module};rmmod ${module};done;

After some tracing, I suspected that CONFIG_LOCAL_TIMERS behaves strangely. I disabled it and the situation changed: it is harder to get case 1, but there are still issues. For example, it crashes as follows, which is case 3:

[ 57.090000] MYDRIVER_exit:
[ 57.110000] MYDRIVER_init:
[ 57.150000] MYDRIVER_exit:
[ 57.180000] MYDRIVER_init:
[ 57.210000] MYDRIVER_exit:
[ 57.240000] MYDRIVER_init:
[ 57.270000] MYDRIVER_exit:
[ 57.300000] MYDRIVER_init:
[ 57.320000] sh: unhandled page fault (11) at 0x000b7dfc, code 0x017
[ 57.320000] pgd = c78b4000
[ 57.330000] [000b7dfc] *pgd=038f4031, *pte=00000000, *ppte=00000000
[ 57.350000]
[ 57.360000] Pid: 350, comm: sh
[ 57.370000] CPU: 1 Not tainted (2.6.31.1-XXXX1 #53)
[ 57.390000] PC is at 0x40058d04
[ 57.400000] LR is at 0xb7df8
[ 57.400000] pc : [<40058d04>] lr : [<000b7df8>] psr: 60000010
[ 57.400000] sp : bec8b6b8 ip : 0001d020 fp : 00000000
[ 57.440000] r10: 00000000 r9 : bec8b728 r8 : 00000002
[ 57.460000] r7 : 0009c038 r6 : 0001d028 r5 : 4009fe40 r4 : 400a02f8
[ 57.470000] r3 : 00000049 r2 : 0009add8 r1 : 0009add8 r0 : 00000049
[ 57.490000] Flags: nZCv IRQs on FIQs on Mode USER_32 ISA ARM Segment user
[ 57.520000] Control: 00c5787d Table: 078b400a DAC: 00000015
Segmentation fault

Without DCache and CONFIG_LOCAL_TIMERS, I could repeat the above procedure for 216 seconds before it halted as in case 4.

Case 1 also still occurs.

This means that disabling DCache and CONFIG_LOCAL_TIMERS does not avoid the issues; it only mitigates them a little.

BTW, I have done a quick port to linux-2.6.33-rc1, branch master, based on commit f2d9a06. With DCache and CONFIG_LOCAL_TIMERS, I have seen case 1, which means this issue is not fixed there yet.

Without SMP, I haven't seen any of these issues so far.

So currently all the clues point to SMP.

http://lists.infradead.org/pipermail/linux-arm-kernel/2010-January/006901.html

(......................)
(case 1 and 2) These two sounds like a problem with interrupts - userspace console IO is interrupt driven, whereas kernel console IO is not.
(......................)
It could be something to do with write allocate caches - we don't support these particularly well in the kernel, and I wouldn't be surprised if you've found some problem there.

The fact that it only happens in SMP mode rather points at that, because that's one of the few hardware configurations which does have write allocate caches. To confirm this, we need someone who can run your tests on a UP platform which has write allocate caches...

http://lists.infradead.org/pipermail/linux-arm-kernel/2010-January/006945.html
http://lists.infradead.org/pipermail/linux-arm-kernel/2010-January/006955.html
Neither a non-SMP kernel nor an SMP kernel booted with maxcpus=1 shows the same behavior.

Fix for case 1 and case 2
http://lists.infradead.org/pipermail/linux-arm-kernel/2010-January/007052.html
Thanks to Russell's advice, after some tracing I found that the IER (Interrupt Enable Register) of my serial port is 0 in case 1!!

Case 2 is actually the same as case 1. Case 1 comes first; if I stop typing and let it finish its slow printing, it then becomes case 2.

UART_BUG_THRE is detected and enabled on my platform, causing serial8250_backup_timeout to be used.

There are many places that do (get IER, clear IER, restore IER), such as serial8250_console_write (called by printk) and serial8250_backup_timeout. serial8250_backup_timeout is not protected by the spinlock, which causes a race condition and results in a wrong IER value.

The following patch fixes this issue. Case 3 and case 4 are still seen often, but case 1 and case 2 no longer occur.

diff --git a/kernels/linux-2.6.31.1-X/drivers/serial/8250.c b/kernels/linux-2.6.31.1-X/drivers/serial/8250.c
index 288a0e4..55602c3 100644
--- a/kernels/linux-2.6.31.1-cavm1/drivers/serial/8250.c
+++ b/kernels/linux-2.6.31.1-cavm1/drivers/serial/8250.c
@@ -1752,6 +1758,8 @@ static void serial8250_backup_timeout(unsigned long data)
  unsigned int iir, ier = 0, lsr;
  unsigned long flags;
 
+
+ spin_lock_irqsave(&up->port.lock, flags);
  /*
   * Must disable interrupts or else we risk racing with the interrupt
   * based handler.
@@ -1769,10 +1777,8 @@ static void serial8250_backup_timeout(unsigned long data)
   * the "Diva" UART used on the management processor on many HP
   * ia64 and parisc boxes.
   */
- spin_lock_irqsave(&up->port.lock, flags);
  lsr = serial_in(up, UART_LSR);
  up->lsr_saved_flags |= lsr & LSR_SAVE_FLAGS;
- spin_unlock_irqrestore(&up->port.lock, flags);
  if ((iir & UART_IIR_NO_INT) && (up->ier & UART_IER_THRI) &&
      (!uart_circ_empty(&up->port.info->xmit) || up->port.x_char) &&
      (lsr & UART_LSR_THRE)) {
@@ -1780,12 +1786,14 @@ static void serial8250_backup_timeout(unsigned long data)
   iir |= UART_IIR_THRI;
  }
 
- if (!(iir & UART_IIR_NO_INT))
-  serial8250_handle_port(up);
-
  if (is_real_interrupt(up->port.irq))
   serial_out(up, UART_IER, ier);
 
+ spin_unlock_irqrestore(&up->port.lock, flags);
+
+ if (!(iir & UART_IIR_NO_INT))
+  serial8250_handle_port(up);
+
  /* Standard timer interval plus 0.2s to keep the port running */
  mod_timer(&up->timer,
   jiffies + poll_timeout(up->port.timeout) + HZ / 5);

SMP issues with 8250.c
http://old.nabble.com/SMP-issues-with-8250.c%E2%80%8F-to27090634.html
http://www.spinics.net/lists/linux-serial/msg02106.html

Wednesday, December 23, 2009

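喲哪桑 Speaking (Jonathan Speaking): project work journal

The drill sergeant's basic training for engineers
http://jonathanspeaking.blogspot.com/2007/03/blog-post.html

Engineers! Do you consider yourself a PRO?
http://jonathanspeaking.blogspot.com/2007/02/pro.html

I'd like to order one and a half hamburgers, a cola and fries!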
http://jonathanspeaking.blogspot.com/2009/11/blog-post_20.html#more

Tuesday, December 22, 2009

Kernel Images: from building to booting

I made these pictures about 3 years ago, but they still look accurate and are quite useful!!!

Same picture, just in another layout.

bootpImage Layout

(KERNEL_PHYS and the 2nd step in the diagram are not in the vanilla kernel.)

arch/arm/boot/Makefile

ZRELADDR := $(zreladdr-y)
PARAMS_PHYS := $(params_phys-y)
INITRD_PHYS := $(initrd_phys-y)
KERNEL_PHYS := $(kernel_phys-y)

arch/arm/mach-ARCH/Makefile.boot

zreladdr-y := 0x00008000
params_phys-y := 0x00000100
initrd_phys-y := 0x00C00000
kernel_phys-y := 0x00600000

arch/arm/Makefile

textofs-y := 0x00008000
(...........)
TEXT_OFFSET := $(textofs-y)

Re: About TEXTADDR, ZTEXTADDR, PAGE_OFFSET etc...
http://lists.arm.linux.org.uk/lurker/message/20010723.185051.94ce743c.en.html

defining ZRELADDR as PHYS_OFFSET + TEXT_OFFSET
http://lists.arm.linux.org.uk/lurker/message/20100127.101228.78a1533e.en.html

Linux bluetooth for console-only devices

File transfer without X

I was using a notebook running Ubuntu and a development board with an ARM CPU and ARM Debian installed, which currently has only a serial console.

bluetooth
bluez
bluez-gnome
obexftp
obexd-server

DUT to other device

insert required module
if DBus/Bluetoothd is not up yet, bring it up.
/etc/init.d/dbus restart;
/etc/init.d/bluetooth restart;
hcitool dev
l2test
(For some unknown reason I could not find l2test in any Ubuntu package, so I downloaded a fresh bluez-4.58 and ran configure with --enable-test to get l2test. It works just fine.)
l2test -I 2000 -r&
l2test -s 00:11:67:8A:A9:26
./l2test -s 0:11:67:8A:A9:27
start authorization agent
bluetooth-agent 0000 &
do file transfer

bluetooth-sendto --dest=0E:70:24:91:66:01 video.mp4
(require X)

obexftp -b 0E:70:24:91:66:01 -p 2008-10-09-211.jpg
(doesn't require X)

other device to DUT

enable inquiry scan
hciconfig hci0 piscan
enable file transfer server
obexftpd -b /
(doesn't require X)

obex-data-server
(require X)

BlueZ
http://www.bluez.org/

Linux BlueZ Howto
http://jeremythompson.uklinux.net/RH8-0/bluezhowto.pdf

The Penguin with the BlueZ
http://fedoraproject.org/w/uploads/4/40/FUDCon_FUDCon2_FUDCon2MarcelHoltmann.pdf

BlueZ aware applications
http://wiki.bluez.org/wiki/UsingBluez

BlueZ: HOWTO/Authorization
http://wiki.bluez.org/wiki/HOWTO/Authorization

BlueZ: Bluetooth Services
http://wiki.bluez.org/wiki/Services

pastebin - collaborative debugging tool
http://pastebin.com

Tuesday, December 15, 2009

git-svn

Reconstructing git-svn metadata after a git clone
http://www.spinics.net/lists/git/msg130949.html

git clone git://git.webkit.org/WebKit.git WebKit
cd WebKit
git svn init -T trunk http://svn.webkit.org/repository/webkit
git update-ref refs/remotes/trunk origin/master

(there are other ways to get the svn metadata... some work, some don't...)

git svn clone --stdlayout file:///tmp/test/hello hello-git

git svn clone --username your-name -s https://your-project.googlecode.com/svn
# older versions of git: replace "-s" with "-Ttrunk -bbranches -ttags"
(perform git operations)
git svn rebase # think "svn update"
git svn dcommit # think "svn commit"

Integrating git and svn with git-svn
http://blog.kanru.info/archives/466/comment-page-1

git-svn(1) Manual Page
http://www.kernel.org/pub/software/scm/git/docs/git-svn.html

[Linux][Software] A brief introduction to using git-svn
http://antontw.blogspot.com/2008/05/linux-git-svn.html

Develop with Git on a Google Code Project
http://google-opensource.blogspot.com/2008/05/develop-with-git-on-google-code-project.html

Monday, December 14, 2009

Cache

Fact
VA-to-PA translation works at page granularity, so a VA and the PA it maps to always have the same page offset; i.e. for a 4KB page size, their bits 0-11 are identical.

Assumption
Cache of 32KB, 4-way, 32B cache line

8KB per way = 2^13
32B cache line = 2^5

Page size 4KB = 2^12

[wiki] Harvard architecture
http://en.wikipedia.org/wiki/Harvard_architecture

[wiki] Cache
http://en.wikipedia.org/wiki/Cache

[wiki] Cache coherence
http://en.wikipedia.org/wiki/Cache_coherency

[wiki] CPU cache
http://en.wikipedia.org/wiki/CPU_cache

A short note on cache aliases
http://www.cash.idv.tw/wordpress/?p=2273

Understanding Caching
http://www.linuxjournal.com/article/7105

kmalloc never allocates two regions that overlap in a cache line

(...........)
PIPT

get virt_addr, translate the address to get phy_addr, look up the cache to get data
slow; address translation and cache lookup have to be sequential
no alias

VIVT

get virt_addr, look up the cache to get data
fast; does not need address translation
alias; the cache cannot tell whether multiple virt_addr are mapped to the same phy_addr

VIPT

get virt_addr, do (cache lookup) and (address translation)
faster than PIPT; (cache lookup) and (address translation) can be done in parallel
COULD be no alias; an alias can be detected by seeing the same tag (phy_addr) under multiple indexes (virt_addr), IF the hardware implements this.
larger tag; it has to hold the phy_addr
(...........)

The Aliasing Problem

Any time the kernel sets up more than one virtual mapping for the same physical page, cache line aliasing may occur. The kernel is careful to avoid aliasing, so it usually occurs only in one particular instance: when the user mmaps a file. Here, the kernel has one virtual address for pages of the file in the page cache, and the user may have one or more different virtual addresses. This is possible because nothing prevents the user from mmaping the file at multiple locations.

When a file is mmaped, the kernel adds the mapping to one of the inode's lists: i_mmap, for maps that cannot change the underlying data, or i_mmap_shared, for maps that can change the file's data. The API for bringing the cache aliases of a page into sync is:

void flush_dcache_page(struct page *page);

This API must be called every time data on the page is altered by the kernel, and it should be called before reading data from the page if page->mapping->i_mmap_shared is not empty. In architecture-specific code, flush_dcache_page loops over the i_mmap_shared list and flushes the cache data. It then loops over the i_mmap list and invalidates it, thus bringing all the aliases into sync.

Separate Instruction and Data Caches

In their quest for efficiency, processors often have separate caches for the instructions they execute and the data on which they operate. Often, these caches are separate mechanisms, and a data write may not be seen by the instruction cache. This causes problems if you are trying to execute instructions you just wrote into memory, for example, during module loading or when using a trampoline(?). You must use the following API:

void
flush_icache_range(unsigned long start,
unsigned long end);

to ensure that the instructions are seen by the instruction cache prior to execution. start and end are the starting and ending addresses, respectively, of the block of memory you modified to contain your instructions.

(...........)

(MY COMMENT)
On SMP systems there is USUALLY some hardware that keeps the L1 caches of all CPUs in sync, or otherwise maintains cache coherency, so that one inv/flush/clean issued on one CPU takes effect on all CPU caches. On SOME systems that do not do this (like ARM MPCore), the operation has to be performed on EVERY CPU, via mechanisms like IPI (Inter-Processor Interrupt), to ensure that all CPUs are updated.

For more details on MPCore, refer to
ARM11 MPCore DMA DCache issue with Linux SMP
http://mkl-note.blogspot.com/2009/12/linux-arm11-mpcore-smp-cache-issue.html

The MIPS Cache Architecture
http://people.openrays.org/~comcat/mydoc/mips.cache.arch.pdf
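(In Simplified Chinese. The author is also a kernel developer, so it is especially readable: the explanations are clear and there are many practical examples.)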

Writing data
  Cache Hit
    Write Through
      write to cache and RAM
    Write Back
      write to cache only
  Cache Miss
    Write Allocate
      allocate cache, read data from RAM, write data to cache
    No Write Allocate
      write to next level Cache/Memory directly
Read
  Cache Hit
    send data to CPU
  Cache Miss
    Read Allocate
      allocate cache, read data from RAM, send data to CPU
    No Read Allocate
      read data from RAM and send data to CPU directly
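
The write half of the outline above can be written down as a small decision function. This is only a sketch of the policy combinations, not a cache model: struct write_effect and cache_write() are names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* What a single store does under a given policy combination. */
struct write_effect {
    bool line_fetched;   /* line is read from RAM into the cache first */
    bool cache_written;  /* the store lands in the cache */
    bool memory_written; /* the store is forwarded to the next level */
    bool line_dirty;     /* cache now holds data newer than memory */
};

/*
 * hit:            the line is already in the cache
 * write_back:     true = write back, false = write through
 * write_allocate: true = write allocate, false = no write allocate
 */
struct write_effect cache_write(bool hit, bool write_back, bool write_allocate)
{
    struct write_effect e = { false, false, false, false };

    if (!hit && !write_allocate) {
        /* No write allocate: the store bypasses the cache. */
        e.memory_written = true;
        return e;
    }
    if (!hit)
        e.line_fetched = true;   /* write allocate: fill the line first */

    e.cache_written = true;
    if (write_back)
        e.line_dirty = true;     /* memory is updated later, on eviction */
    else
        e.memory_written = true; /* write through: update both copies */

    /* A line is never both dirty and written through by the same store. */
    assert(!(e.line_dirty && e.memory_written));
    return e;
}
```

For a write-back, write-allocate cache (the MPCore data cache configuration), a write miss gives line_fetched, cache_written and line_dirty, with memory left untouched.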
(.............)

5. Cache Aliases Issue
For VIPT, Cache Alias Issue exists if WAY_SIZE > PAGE_SIZE.
Page address is always page-size-aligned, no matter VA or PA.
WAY_SIZE (8K, 2^13) = CACHE_SIZE (32K) / WAY (4)
PAGE_SIZE (4K, 2^12)

INDEX_SIZE=log2(WAY_SIZE) (13)
VA lower INDEX_SIZE bits are used as index.

color bit=log2(WAY_SIZE)-log2(PAGE_SIZE) = 13-12 = 1
if 2 VAs map to the same PA:
if they have the same color, the index (VI) and the tag (PT) are both the same, so both land in the same cache slot. No problem.
if they have different colors, they land in 2 different cache slots, which is the Cache Alias Issue
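
The color arithmetic above can be checked with a few lines of C. A minimal sketch, assuming power-of-two sizes; color_bits(), page_color() and may_alias() are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* log2 of a power-of-two value. */
static unsigned int ilog2_u32(uint32_t v)
{
    unsigned int n = 0;

    assert(v != 0 && (v & (v - 1)) == 0);
    while (v >>= 1)
        n++;
    return n;
}

/* Number of cache index bits that lie above the page offset. */
unsigned int color_bits(uint32_t cache_size, uint32_t ways, uint32_t page_size)
{
    uint32_t way_size = cache_size / ways;

    if (way_size <= page_size)
        return 0; /* index fits inside the page offset: no aliasing */
    return ilog2_u32(way_size) - ilog2_u32(page_size);
}

/* Color of a virtual address: the index bits just above the page offset. */
uint32_t page_color(uint32_t va, uint32_t cache_size, uint32_t ways,
                    uint32_t page_size)
{
    unsigned int bits = color_bits(cache_size, ways, page_size);

    return (va >> ilog2_u32(page_size)) & ((1u << bits) - 1);
}

/* Two VAs mapped to the same PA can alias only if their colors differ. */
int may_alias(uint32_t va1, uint32_t va2, uint32_t cache_size, uint32_t ways,
              uint32_t page_size)
{
    return page_color(va1, cache_size, ways, page_size) !=
           page_color(va2, cache_size, ways, page_size);
}
```

With the 32KB, 4-way cache and 4KB pages assumed above, color_bits() gives 1, so VAs 0x1000 and 0x2000 mapped to the same PA may alias, while 0x1000 and 0x3000 may not.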

Solution
remove the color bit: let PAGE_SIZE = WAY_SIZE
only use VAs of the same color to map a PA (somewhat complicated)
flush the cache whenever there might be a cache alias

Focus (mostly occurs when...)
copy_to_user_page/copy_from_user_page
fork
while forking, the child process COWs (copy-on-write) some pages of the parent process. During the copy, the VA that the child process uses is usually different from the VA of the parent process, which introduces a cache alias.

ARM11 MPCore™ Processor, Technical Reference Manual, Revision: r2p0

Each MP11 CPU features:
‧ an integer unit with integral EmbeddedICE-RT logic
‧ an 8-stage pipeline
‧ branch prediction with return stack
‧ coprocessors 14 and 15
‧ Instruction and Data Memory Management Units (MMUs), managed using MicroTLB structures backed by a unified main TLB

(32KB, 4-way set-associative, for each of the I and D caches)

‧ Instruction and data caches, including a non-blocking data cache with Hit-Under-Miss (HUM)
‧ a data cache that is physically indexed, physically tagged (PIPT)
‧ a data cache that is write back, write allocate only
‧ an instruction cache that is virtually indexed, physically tagged (VIPT)
‧ 32-bit interface to the instruction cache and 64-bit interface to the data cache
‧ hardware support for data cache coherency (among CPUs, but not with DMA)
‧ Vector Floating-Point (VFP) coprocessor support
‧ JTAG-based debug.

write-allocate

CSE 141 Ungraded Homework #5 Answer Sheet
http://cseweb.ucsd.edu/~carter/141/hw5ans.html

NOTE FROM GREG: Last week in section we discussed how caches deal with stores. Specifically, we looked at caches that are write-back vs. write-through, and write-allocate vs. write around. I believe I may have oversimplified things and would like to provide some clarification. It is true that write-back vs. write-through deals with what happens when you write to the cache and find the data present in the cache. However, some students may have been a little confused in reading the solutions to HW #5 when we actually set the dirty bit on a cache miss. The natural question is, "Hey! I thought we only worry about write-back vs. write-through when we have a cache HIT." The potential tricky point here is what happens if your cache is write-back and write-allocate? In this situation, suppose you have a cache miss. The write-allocate policy of the cache will load the data in question into the cache, and the write-back policy will cause only the cache copy to be modified, also turning on the corresponding dirty bit. On the other hand, if your cache is write-through and write allocate, the same thing will happen, but then both the cache copy and the memory copy will be modified.

Secondly, during section we only spoke of a write-allocate cache in terms of a write-allocate and write-through cache. From the previous paragraph, just make note that it is possible for a write-allocate cache to also be write-back, in which case it is not necessarily true that both the cache copy and memory copy are updated on a write. Sorry for the confusion!

VIVT

[wiki] CPU cache: Virtual tags and vhints
http://en.wikipedia.org/wiki/CPU_cache#Virtual_tags_and_vhints

Linux cache handling as seen from ARM VIVT
http://docs.google.com/Doc?id=dcbsxfpf_282csrs2pfn
http://blog.chinaunix.net/u2/79526/showart_1200081.html

ARM Architecture Support
http://msdn.microsoft.com/en-us/library/bb905767.aspx

On ARMv4 and ARMv5 processors, cache is organized as a virtual-indexed, virtual-tagged (VIVT) cache in which both the index and the tag are based on the virtual address. The main advantage of this method is that cache lookups are faster because the translation look-aside buffer (TLB) is not involved in matching cache lines for a virtual address. However, this caching method does require more frequent cache flushing because of cache aliasing, in which the same physical address can be mapped to multiple virtual addresses.

On ARMv6 and ARMv7 processors, cache is organized as a virtual-indexed, physical-tagged (VIPT) cache. The cache line index is derived from the virtual address. However, the tag is specified by using the physical address. The main advantage is that cache aliasing is not an issue because every physical address has a unique tag in the cache. However, a cache entry cannot be determined to be valid until the TLB has translated the virtual address to a physical address that matches the tag. Generally, the TLB lookup cost offsets the performance gain achieved by avoiding cache aliasing.

(......................)

For ARMv6 and ARMv7 processors, cache flushing in thread switching to a process other than the current active process is limited to the following instances:

* The hardware does not support VIPT I-cache: In ARMv6 and ARMv7, it is optional for I-cache to be VIPT. Data cache is VIPT or physically-indexed and physically-tagged (PIPT) in MPCore systems. If the hardware does not support VIPT I-cache, the OS flushes the I-cache.
* The system is out of address-space identifiers (ASIDs) for each virtual address: In this case, the OS flushes the whole TLB.

This means that, whereas on ARMv4 and ARMv5 processors the whole cache, I-cache, D-cache, and TLB, is flushed on every thread switch to a different process. On ARMv6 and ARMv7 processors, the D-cache is never flushed on thread switch. The I-cache is flushed only if the processor does not support VIPT cache. The TLB is flushed only if all 255 supported ASIDs have been used. This reduction of cache flushes should improve overall system performance.

In addition, moving to VIPT has performance advantages for the following OS features in CE 6.0:

* Memory-mapped files: On an ARMv4 or ARMv5 system, all read/write views are marked as uncached to prevent aliasing. Marking the views as uncached affects overall system performance. However, in VIVT, you must prevent aliasing. On ARMv6 and ARMv7 systems, views are marked as cached.
* VirtualAllocCopyEx: In CE 6.0, if a kernel mode driver creates an explicit alias in which two virtual addresses map to the same physical address by using VirtualAllocCopyEx, the OS marks both the source and destination addresses as uncached to avoid cache aliasing on ARMv4 and ARMv5 systems. On ARMv6 and ARMv7 systems, source and destination addresses are marked as cached. Even though this function can be called only from kernel mode, this affects both kernel-mode and user-mode drivers. Device Manager copies the data only for user-mode drivers.

Linux barriers and ARM barriers

./Documentation/memory-barriers.txt

SMP barriers semantics
http://eeek.borgchat.net/lists/linux-arch/msg09402.html
http://marc.info/?l=linux-arch&m=126752718913718&w=2
http://www.spinics.net/lists/linux-arch/msg09402.html
http://article.gmane.org/gmane.linux.kernel.cross-arch/5250

http://www.spinics.net/lists/linux-arch/msg09406.html

The SMP barriers are only required to order cacheable accesses. The plain (non-SMP) barriers (mb, wmb, rmb) are required to order both cacheable and non-cacheable accesses.

ARM11 MPCore™ Processor r2p0 Technical Reference Manual

Data Synchronization Barrier
The Data Synchronization Barrier (DSB) operation acts as a special kind of memory barrier. In the program flow, the DSB occurs at the MCR instruction that performs the DSB. The DSB completes when:
all explicit reads and writes before the DSB complete
all Cache, Branch predictor and TLB maintenance operations before the DSB complete.
No instruction after the DSB can execute until the DSB completes.

Data Memory Barrier
The Data Memory Barrier (DMB) is a general memory barrier with the following behavior. This description considers the program flow as executing instructions in program order. The DMB occurs at the MCR instruction that performs the DMB.
Any explicit memory access by an instruction before the DMB is globally observed before any memory accesses caused by an instruction after the DMB.
The DMB has no effect on the ordering of any other instructions executing on the processor.
As such, DMB ensures the apparent order of the explicit memory operations before and after the DMB instruction, but does not ensure the completion of those memory operations. For more information see the ARM Architecture Reference Manual.

ARM: Change the mandatory barriers implementation
http://www.spinics.net/lists/arm-kernel/msg84605.html

The mandatory barriers (mb, rmb, wmb) are used even on uniprocessor
systems for things like ordering Normal Non-cacheable memory accesses
with DMA transfer (via Device memory writes). The current implementation
uses dmb() for mb() and friends but this is not sufficient. The DMB only
ensures the relative ordering of the observability of accesses by other
processors or devices acting as masters. In case of DMA transfers
started by writes to device memory, the relative ordering is not ensured
because accesses to slave ports of a device are not considered
observable by the DMB definition.

A DSB is required for the data to reach the main memory (even if mapped
as Normal Non-cacheable) before the device receives the notification to
begin the transfer. Furthermore, some L2 cache controllers (like L2x0 or
PL310) buffer stores to Normal Non-cacheable memory and this would need
to be drained with the outer_sync() function call.

The patch also allows platforms to define their own mandatory barriers
implementation by selecting CONFIG_ARCH_HAS_BARRIERS and providing a
mach/barriers.h file.

Note that the SMP barriers are unchanged (being DMBs as before) since
they are only guaranteed to work with Normal Cacheable memory.

dma_free_coherent cannot run with IRQs disabled..

dma_free_coherent cannot run with IRQs disabled.
But if something really has to be freed in IRQ context,
moving the actual free into a workqueue does the job.

ref dev_kfree_skb_any, net_tx_action, completion_queue


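#include <linux/dma-mapping.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

/* Frees requested while IRQs were disabled, waiting for the worker. */
static LIST_HEAD(tofree_list);
static DEFINE_SPINLOCK(tofree_list_lock);

struct free_param {
        struct list_head list;

        void *addr;
        dma_addr_t dma_addr;
        size_t size;
};

/* Runs in process context with IRQs enabled, so dma_free_coherent() is OK. */
static void free_list_agent_fn(struct work_struct *work)
{
        LIST_HEAD(free_list);
        struct free_param *cur, *next;
        unsigned long flags;

        /* Move all pending entries to a private list; hold the lock briefly. */
        spin_lock_irqsave(&tofree_list_lock, flags);
        list_splice_init(&tofree_list, &free_list);
        spin_unlock_irqrestore(&tofree_list_lock, flags);

        list_for_each_entry_safe(cur, next, &free_list, list) {
                list_del(&cur->list);
                dma_free_coherent(NULL, cur->size, cur->addr, cur->dma_addr);
                kfree(cur);
        }
}
static DECLARE_WORK(free_list_agent, free_list_agent_fn);

/* desc_addr, dma_desc_addr and count come from the surrounding driver. */
void some_free_func(void)
{
        if (irqs_disabled()) {
                unsigned long flags;
                /* GFP_ATOMIC: sleeping is not allowed with IRQs disabled. */
                struct free_param *fp = kmalloc(sizeof(*fp), GFP_ATOMIC);

                if (!fp)
                        return; /* out of memory: the DMA buffer is leaked */
                fp->addr = desc_addr;
                fp->dma_addr = dma_desc_addr;
                fp->size = count * sizeof(dwc_otg_dma_desc_t);

                spin_lock_irqsave(&tofree_list_lock, flags);
                list_add(&fp->list, &tofree_list);
                spin_unlock_irqrestore(&tofree_list_lock, flags);

                schedule_work(&free_list_agent);
                return;
        }
        dma_free_coherent(blablabla); /* arguments elided in the original note */
}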

Friday, December 11, 2009

Linux clock

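A clocksource is an interface that provides a counter for the kernel to read, along with the parameters that cyc2ns() uses to convert cycles into real time (nsec).

A clockevent is what actually generates the interrupts that drive the Linux time subsystem.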

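To change HZ on ARM, change the default of HZ in arch/arm/Kconfig directly. The timeout cycle count that drives the timer in the CLOCK_EVT_MODE_PERIODIC case of clockevent.set_mode() must also be updated to match HZ, i.e. to the number of cycles corresponding to (1/HZ) sec (hardware dependent). If hrtimer is used (which uses CLOCK_EVT_MODE_ONESHOT), however, it is enough to make sure clockevent.mult is set to div_sc(timer_clk_in_hz, NSEC_PER_SEC, shift). This is independent of HZ, because the time at which the kernel wants the clockevent to fire is converted into cycles with clockevent.mult in clockevents_program_event() and then passed to clockevent.set_next_event().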
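
The two conversions can be tried out in userspace. A minimal sketch: scaled_div() mirrors the arithmetic of the kernel's div_sc() as I understand it ((ticks << shift) / nsec), while ns_to_cycles() and periodic_reload() are hypothetical helper names; the 250 MHz timer clock in the note below is just an assumed example value.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for div_sc(): (ticks << shift) / nsec. */
uint32_t scaled_div(uint32_t ticks, uint32_t nsec, unsigned int shift)
{
    uint64_t tmp = (uint64_t)ticks << shift;

    assert(nsec != 0);
    return (uint32_t)(tmp / nsec);
}

/*
 * One-shot mode: convert a requested delay in ns into timer cycles,
 * (delta_ns * mult) >> shift. HZ does not appear anywhere.
 */
uint64_t ns_to_cycles(uint64_t delta_ns, uint32_t mult, unsigned int shift)
{
    return (delta_ns * mult) >> shift;
}

/* Periodic mode: timer reload value for one tick of (1/HZ) seconds. */
uint32_t periodic_reload(uint32_t timer_clk_hz, uint32_t hz)
{
    assert(hz != 0);
    return timer_clk_hz / hz;
}
```

With an assumed 250 MHz timer clock and shift = 32, mult comes out as 2^30. A 10 ms one-shot request then converts to 2,500,000 cycles, the same value as the periodic reload for HZ=100.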

Kernel Timer Systems - eLinux.org
http://elinux.org/Kernel_Timer_Systems#clock_events

Chapter 7: The Linux kernel's clock interrupt (middle part)
http://www.myfaq.com.cn/2005September/2005-09-13/202037.html

Lab 7 timer interrupt
http://opencsl.openfoundry.org/Lab07_timer_interrupt.rst.html

hrtimer + clockevent + timekeeping
http://www.360doc.com/content/09/0715/19/74585_4282710.shtml
[Featured] A study of hrtimer and the changes in the kernel clock/timer subsystem
http://www.unixresources.net/linux/clf/linuxK/archive/00/00/66/47/664730.html

struct clocksource
clocksource_register()

struct clock_event_device
clockevents_register_device()

The kernel currently has three ways of handling the clock interrupt: periodic, highres and dynamic tick.

Times in Kernel

system time
Nanoseconds elapsed since system boot.
A monotonically increasing value that represents the amount of time the system has been running.
It can be computed from the time source, xtime and wall_to_monotonic.

system_time = xtime + cyc2ns(clock->read() - clock->cycle_last) + wall_to_monotonic;

wall time (xtime)
A value representing the human time of day, as seen on a wrist-watch. This is the realtime clock, kept in xtime.

time source (clocksource)
A representation of a free running counter running at a known frequency, usually in hardware, e.g. GPT. The counter value can be read with clocksource->read().

tick
A periodic interrupt generated by a hardware-timer, typically with a fixed interval

defined by HZ: jiffies

Real time is the number of nanoseconds from 1970 until now.
real_time = xtime + cyc2ns(clock->read() - clock->cycle_last)

wall_to_monotonic records the negated xtime (real time) at boot.
system_time = xtime + cyc2ns(clock->read() - clock->cycle_last) + wall_to_monotonic;

2.6.31.1, getrawmonotonic():
system_time = clocksource->raw_time + cyc2ns(clock->read() - clock->cycle_last)

initialization:
xtime = read_persistent_clock(), which usually returns 0; arch code may instead get the value from an RTC or flash.
wall_to_monotonic = -xtime
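
The formulas above fit together as in the following userspace sketch. struct timekeeper_sketch and the tk_*() functions are made-up names for illustration only; the real kernel keeps these values in separate variables and in struct clocksource.

```c
#include <assert.h>
#include <stdint.h>

struct timekeeper_sketch {
    int64_t  xtime_ns;             /* wall time at the last update */
    int64_t  wall_to_monotonic_ns; /* offset from wall time to system time */
    uint64_t cycle_last;           /* counter value at the last update */
    uint32_t mult;                 /* cycles-to-ns multiplier */
    uint32_t shift;                /* cycles-to-ns shift */
};

/* Same arithmetic as cyc2ns(): (cycles * mult) >> shift. */
static int64_t cyc_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (int64_t)((cycles * mult) >> shift);
}

/* Initialization: xtime = persistent clock, wall_to_monotonic = -xtime. */
void tk_init(struct timekeeper_sketch *tk, int64_t persistent_ns,
             uint64_t now_cycles, uint32_t mult, uint32_t shift)
{
    tk->xtime_ns = persistent_ns;
    tk->wall_to_monotonic_ns = -persistent_ns;
    tk->cycle_last = now_cycles;
    tk->mult = mult;
    tk->shift = shift;
}

/* real_time = xtime + cyc2ns(read() - cycle_last) */
int64_t tk_real_time(const struct timekeeper_sketch *tk, uint64_t now_cycles)
{
    assert(now_cycles >= tk->cycle_last);
    return tk->xtime_ns +
           cyc_to_ns(now_cycles - tk->cycle_last, tk->mult, tk->shift);
}

/* system_time = real_time + wall_to_monotonic */
int64_t tk_system_time(const struct timekeeper_sketch *tk, uint64_t now_cycles)
{
    return tk_real_time(tk, now_cycles) + tk->wall_to_monotonic_ns;
}
```

Because wall_to_monotonic starts out as -xtime, the system time is 0 at initialization no matter what the persistent clock returned, and it then grows with the counter.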

Tuesday, December 8, 2009

Using OProfile

Build Requirement
binutils
popt

Required Kernel Config
CONFIG_SMP cannot be enabled on ARM11 MPCore, because arch/arm/oprofile/op_model_mpcore.c is specific to arch/arm/mach-realview/.

Linux 2.6.35.12

General setup --->
[*] Profiling support
<M> OProfile system profiling

Kernel modules must be built with "-g" and not stripped. Since the compiler can optimize the code to further improve performance, "-O3" is suggested.

Kernel hacking --->
[*] Kernel debugging
[*] Compile the kernel with debug info

To support call-graph profiles, the kernel and modules must NOT be built with gcc's -fomit-frame-pointer; alternatively, enable CONFIG_FRAME_POINTER. But only x86 has CONFIG_ARCH_WANT_FRAME_POINTERS enabled in 2.6.35.

Kernel hacking --->
[*] Compile the kernel with frame pointers

Required tools
awk, objdump

Required files
vmlinux is required as input to oprofile.
An external disk may be required: the log and samples, stored at /var/lib/oprofile/, together with vmlinux, may take up to tens of MBs.

Commands
opcontrol --reset;
opcontrol --setup --vmlinux=<vmlinux_path>
opcontrol --start -V all;
opcontrol --shutdown;
opreport image:<vmlinux_path>,<module_path> -p <module_dir> -l -g -w

References
4. Configuration details
http://oprofile.sourceforge.net/doc/detailed-parameters.html

4.3. OProfile in timer interrupt mode
Note
This section applies to 2.6 kernels and above only.

In 2.6 kernels on CPUs without OProfile support for the hardware performance counters, the driver falls back to using the timer interrupt for profiling. Like the RTC mode in 2.4 kernels, this is not able to profile code that has interrupts disabled. Note that there are no configuration parameters for setting this, unlike the RTC and hardware performance counter setup.

You can force use of the timer interrupt by using the timer=1 module parameter (or oprofile.timer=1 on the boot command line if OProfile is built-in).

3. Interpreting call-graph profiles
http://oprofile.sourceforge.net/doc/interpreting-callgraph.html

OProfile usage
http://blog.chinaunix.net/space.php?uid=20585891&do=blog&cuid=1110505

Using oprofile to analyze performance bottlenecks (1)
http://tw.myblog.yahoo.com/chimei-015/article?mid=1023&prev=1024&next=1022

OProfile fails to capture sample data: the problem and its solution
http://blog.yufeng.info/archives/1283

warning: /no-vmlinux could not be found.

OProfile unable to find image.
http://old.nabble.com/OProfile-unable-to-find-image.-td29336543.html

The "--no-vmlinux" option is used when you are not interested in analyzing the samples from the kernel. The samples from the kernel are recorded in /novmlinux. However, the needed information used by opreport is missing.

CPU: CPU with timer interrupt, speed 0 MHz (estimated)

OProfile gets the CPU speed by parsing /proc/cpuinfo. Here it simply did not find the line it expects.
libutil/op_cpufreq.c

double op_cpu_frequency(void)
{

(................)

FILE * fp = op_try_open_file("/proc/cpuinfo", "r");

(................)

/* x86/parisc/ia64/x86_64 */
if (sscanf(line, "cpu MHz : %lf", &fval) == 1)
break;
/* ppc/ppc64 */
if (sscanf(line, "clock : %lfMHz", &fval) == 1)
break;
/* alpha */
if (sscanf(line, "cycle frequency [Hz] : %lu", &uval) == 1) {
fval = uval / 1E6;
break;
}
/* sparc64 if CONFIG_SMP only */
if (sscanf(line, "Cpu0ClkTck : %lx", &uval) == 1) {
fval = uval / 1E6;
break;
}
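The patterns in the excerpt can be tried in isolation. The sketch below is a hypothetical helper (parse_cpu_mhz is not an OProfile function) that reuses two of the sscanf formats quoted above. A line that matches none of them leaves the speed unset, which is consistent with the "speed 0 MHz (estimated)" report. The sample lines in the usage note are assumptions about what a /proc/cpuinfo might contain.

```c
#include <stdio.h>

/* Returns 1 and stores the frequency in MHz if the line matches one of the
 * two patterns tried here, 0 otherwise. */
static int parse_cpu_mhz(const char *line, double *mhz)
{
    double fval;

    /* x86 style, e.g. "cpu MHz : 2400.000" */
    if (sscanf(line, "cpu MHz : %lf", &fval) == 1) {
        *mhz = fval;
        return 1;
    }
    /* ppc style, e.g. "clock : 1000.0MHz" */
    if (sscanf(line, "clock : %lfMHz", &fval) == 1) {
        *mhz = fval;
        return 1;
    }
    return 0;
}
```

Feeding it "cpu MHz\t\t: 2400.000" yields a match, while a line such as "BogoMIPS\t: 398.13" does not, so a platform whose cpuinfo has no matching line ends up with no frequency.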

Sunday, December 6, 2009

How all the secondary cores boot in MPCore

http://lists.arm.linux.org.uk/lurker/message/20080611.175337.ecfc6e1c.en.html#linux-arm

Author: Catalin Marinas
Date: 2008-06-12 01:53 +800
To: Charly Bechara
CC: linux-arm
Subject: Re: How all the secondary cores boot in MPCore
On Wed, 2008-06-11 at 15:33 +0000, Charly Bechara wrote:
> I am investigating the boot process of the ARM11 MPCore on the
> PB11MPCore (mach-realview) board.
>
> Initially, the start_kernel() function is executing on CPU0, it
> creates kernel_init thread which I assume it is also executing on CPU0
> or am I wrong?

That's correct, it runs on CPU 0.

> When kernel_init thread executes, it calls smp_prepare_cpus() in
> arch/arm/mach-realview/platsmp.c code, where is supposed to start the
> secondary processors using secondary_startup() (head.S)

smp_prepare_cpus() calls poke_milo() which triggers the other CPUs to
execute realview_secondary_startup.

> After this stage, I am completely lost and I couldn't find any related
> documentation or understand the code :(

Maybe not that clear but it might help:

secondary_startup (in arch/arm/kernel/head.S) does a similar thing to
the initial CPU setup (stext in the same file), i.e. it looks up the
processor type and calls the processor initialisation function
(__v6_setup in arch/arm/mm/proc-v6.S). When returning from the setup
function, it gets into __enable_mmu followed by __turn_mmu_on. The
latter branches to __secondary_data_switch which branches to
secondary_start_kernel (in arch/arm/kernel/smp.c) after setting the
stack pointer to a thread structure allocated via cpu_up() called from
smp_init() called from kernel_init().

secondary_start_kernel (in arch/arm/kernel/smp.c) does some further
initialisation (local timers etc.) and calls cpu_idle() (note that
secondary_start_kernel is already considered a kernel thread as
described above). From this point, it is up to the scheduler to migrate
threads to the new CPUs since they are initially only executing the idle
task.

--
Catalin

Saturday, December 5, 2009

ARM11 MPCore Cache Coherency issue with DMA and SMP

Implementing DMA on ARM SMP Systems
http://infocenter.arm.com/help/topic/com.arm.doc.dai0228a/DAI228A_DMA_on_SMP_systems.pdf
In short, on ARM11 MPCore with SMP enabled, cache operations (inv/clean/...) must be performed on ALL of the cores; otherwise stale data could still be accessed. Four solutions are provided. Solutions A and B are application dependent.

PERFORMANCE ISSUE
http://lists.infradead.org/pipermail/linux-arm-kernel/2009-December/005915.html
Linux 2.6.31.1, pcie sata adapter, ahci.c, read 1GB file

1CPU:13.56/13.56/13.60 (sec) ~ 75.5MBps
SMP+IPI:16.29/16.14/16.05 (sec) ~ 63.8MBps
SMP+RFO/WFO:21.71/21.72/21.70 (sec) ~ 47.18MBps
SMP+RFO/WFO/pld:21.63/21.46/21.41 (sec) ~ 47.82MBps
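The MBps figures above appear to be the file size divided by the fastest of the three runs, taking 1 GB as 1024 MB (an assumption on my part). A small sketch of that arithmetic, with hypothetical helper names:

```c
/* Throughput in MB/s for size_mb megabytes read in `seconds` seconds. */
static double throughput_mbps(double size_mb, double seconds)
{
    return size_mb / seconds;
}

/* Relative drop, in percent, from `baseline` to `measured`. */
static double drop_percent(double baseline, double measured)
{
    return (baseline - measured) / baseline * 100.0;
}
```

For instance, throughput_mbps(1024.0, 13.56) is about 75.5, and drop_percent(75.5, 63.8) is about 15.5, i.e. the IPI variant is roughly 15% slower than the single-CPU case.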

AHCI interrupts and DMA cache IPIs (number of IPIs for DMA cache operations / number of AHCI interrupts)
MYARCH_ahci: 98509/ 4505
pcie_ahci: 81792/ 4501

Both solutions suffer a performance drop. Drivers that accept cacheable buffers and perform DMA on them are affected, e.g. network, USB, storage, etc., which unfortunately are usually important blocks. For now I think ARM11 MPCore cannot be used for Linux SMP. There might be a chance for AMP.

Solution C. Read for ownership
http://lists.infradead.org/pipermail/linux-arm-kernel/2009-December/005854.html

* Drop my DMA broadcasting patch
* In the dma_cache_maint (and the contiguous one) do the following
  based on direction:
    * TO_DEVICE: read each cache line in the buffer (you can
      read a long variable every 32 bytes) before the local
      cache maintenance operations. This ensures that any
      dirty cache lines on other CPUs are transferred to L2
      (or main memory) or the current CPU and the cache
      operation would have the intended effect. The cache
      lines on other CPUs may change to a Shared state (see
      the MESI protocol)
    * FROM_DEVICE: we don't care about the buffer, just write
      0 in each cache line (as above, you can only write a
      long every 32 bytes). This ensures that the cache lines
      become Exclusive to the current CPU (no other CPU has
      any copies) and the invalidation would ensure that there
      are no cache lines on any CPU
    * BIDIRECTIONAL: read a word in a cache line and write it
      back. After cache clean&invalidate, the cache lines
      would be removed from all the existing CPUs
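The three cases can be sketched in plain C as below. This only models the per-cache-line touch pattern in user space; rfo_touch and the enum are hypothetical names, the 32-byte line size is taken from the quoted mail, and in the kernel the local cache maintenance operation would follow the loop.

```c
#include <stddef.h>

#define CACHE_LINE_SIZE 32  /* assumption: 32-byte lines, as in the mail */

enum dma_dir_model { MODEL_TO_DEVICE, MODEL_FROM_DEVICE, MODEL_BIDIRECTIONAL };

/* Touch one long per cache line so the line migrates to the current CPU
 * before the local cache maintenance operation runs. */
static void rfo_touch(unsigned long *buf, size_t size_bytes,
                      enum dma_dir_model dir)
{
    volatile unsigned long *p = buf;
    size_t step = CACHE_LINE_SIZE / sizeof(unsigned long);
    size_t n = size_bytes / sizeof(unsigned long);
    size_t i;

    for (i = 0; i < n; i += step) {
        unsigned long tmp;

        switch (dir) {
        case MODEL_TO_DEVICE:      /* read: pulls dirty data from other CPUs */
            tmp = p[i];
            (void)tmp;
            break;
        case MODEL_FROM_DEVICE:    /* write 0: old contents do not matter */
            p[i] = 0;
            break;
        case MODEL_BIDIRECTIONAL:  /* read a word and write it back */
            tmp = p[i];
            p[i] = tmp;
            break;
        }
    }
}
```

Note that the FROM_DEVICE case destroys one long per line, which is acceptable only because the device is about to overwrite the buffer anyway.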

No formal patch was available at first; try Catalin Marinas's patch:
http://lists.infradead.org/pipermail/linux-arm-kernel/2009-December/005860.html
Or my patch:
http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20091210/aed3f989/attachment-0001.obj
Catalin Marinas's formal commit:
http://linux-arm.org/git?p=linux-2.6.git;a=commitdiff;h=8108d60829c2d10fe62aaa7b2fae10f0e4abad36

There is a performance drop. In my case, SATA read performance was 64MBps with the IPI patch and fell to 44MBps with the RFO patch, which is about 31% lower.
http://lists.infradead.org/pipermail/linux-arm-kernel/2009-December/005863.html

I would be very surprised if going down this route doesn't result in
block IO data performance (and network performance) dropping by more
than 60% of the DMA value (that's DMA performance * 0.4).

Solution D. Broadcast of cache maintenance operations
By IPI
Currently this solution would cause a deadlock in ata_scsi_queuecmd. The deadlock situation and the patch that could fix this:
http://lists.infradead.org/pipermail/linux-arm-kernel/2009-December/006051.html
http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20091210/520c074d/attachment-0001.obj
However, it is not a one-shot solution; each deadlock has to be fixed as it is encountered. It also breaks the atomic context:
http://lists.infradead.org/pipermail/linux-arm-kernel/2009-December/005858.html

Having functions which enable interrupts while the parent is supposed to
be in an atomic context is definitely a recipe for things to go badly
wrong - this is not a solution anyone in their right mind should
contemplate.

The following patch uses IPI (Inter-Processor Interrupt) to broadcast the dma_cache_maint operation.
http://linux-arm.org/git?p=linux-2.6-stable.git;a=commitdiff;h=95298b1792121e7068258de451caec7f3dda0e78
linux-2.6-stable.git@linux-arm.org
commit 95298b1792121e7068258de451caec7f3dda0e78
Author: Catalin Marinas <catalin.marinas@arm.com>
Date: Tue Mar 10 10:22:54 2009 +0000

http://linux-arm.org/git?p=linux-2.6.git;a=commitdiff;h=f1c242dc5f326713578e469c9f5be647978ebe24
linux-2.6-git@linux-arm.org
commit f1c242dc5f326713578e469c9f5be647978ebe24
Author: Catalin Marinas <catalin.marinas@arm.com>
Date: Wed Oct 28 13:27:49 2009 +0000

http://lists.infradead.org/pipermail/linux-arm-kernel/2009-December/005568.html
The new updated patch which includes patch for dma_cache_maint_contiguous
http://linux-arm.org/git?p=linux-2.6.git;a=commitdiff;h=6dd5056b9abe1e38fae3eb8d576e562f49452b0f

Even with the above patch, tests might still fail. I used USB EHCI and a flash device to verify this issue, by repeatedly reading a 1MB file from flash and comparing it with the original file in a ramdisk.
Without SMP: no failures.
SMP without L1: no failures.
SMP with L1: 13.26% failed.
If I force dma_unmap_single to do dma_cache_maint, the failure rate for SMP with L1 drops to 0.3~0.4%.

By FIQ
Laguna SMP Benchmarks 10001092-00.pdf
http://trac.gateworks.com/wiki/laguna%3Agw2388-4%3Asmp_benchmarks

linux-3.2 in src/linux/laguna – DD-WRT
http://svn.dd-wrt.com:8000/browser/src/linux/laguna/linux-3.2
http://svn.dd-wrt.com:8000/browser/src/linux/laguna/linux-3.2?rev=18083
first patch on the site
http://svn.dd-wrt.com:8000/browser/src/linux/laguna/linux-3.0?rev=17578

[RFC PATCH] Broadcast the DMA cache operations on ARMv6 SMP hardware
http://lists.arm.linux.org.uk/lurker/message/20080620.124707.2bff9c7f.en.html
http://lists.arm.linux.org.uk/lurker/message/20080620.154546.aaa33d72.en.html

> By the way, besides these dmac cache functions,
> don't you think other cache maintain functions
> like clean_dcache_area() need broadcasting?

AFAIK, it was discussed some years ago and these operations don't
require broadcasting. Basically, once a write to a memory location
occurs, the MESI protocol used by the SCU ensures that the owner of that
cache line is CPU that did the writing. If the cacheline exists on other
CPUs, it is invalidated. Therefore a cache cleaning operation on the CPU
that did the writing is enough since no other CPU has a valid cache
line.

The problem is slightly different with the DMA API since the driver
might invalidate an area of memory (dma_map_single(FROM_DEVICE)) without
reading or writing it before and hence the CPU is not the owner of those
lines. The same goes for cleaning or flushing since some drivers may run
dma_map_single(TO_DEVICE) in an interrupt routine handled on a CPU but
the buffer to be transmitted could have been written on a different CPU.

Regarding the I-cache invalidation (which, BTW, is completely missing
from the mainline kernel), the patch I proposed (posted again last week)
does this when a thread migrates to another CPU that it hadn't run on
before and there is no need for broadcasting as we track the CPU via
mm->cpu_vm_mask (see switch_mm in mmu_context.h).

The following patch fixes my issue. The change made to dma_cache_maint should also be applied to dma_cache_maint_contiguous, which is called by dma_cache_maint_page.

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index be56c43..15dafb6 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -584,15 +584,15 @@ static void dma_cache_maint_contiguous(struct page *page,

   switch (direction) {
   case DMA_FROM_DEVICE:           /* invalidate only */
-               inner_op = dmac_inv_range;
+               inner_op = smp_dma_inv_range;
           outer_op = outer_inv_range;
           break;
   case DMA_TO_DEVICE:             /* writeback only */
-               inner_op = dmac_clean_range;
+               inner_op = smp_dma_clean_range;
           outer_op = outer_clean_range;
           break;
   case DMA_BIDIRECTIONAL:         /* writeback and invalidate */
-               inner_op = dmac_flush_range;
+               inner_op = smp_dma_flush_range;
           outer_op = outer_flush_range;
           break;
   default:

Call flush_dcache_page after PIO data transfer in libata-aff.c
http://linux-arm.org/git?p=linux-2.6.git;a=commitdiff;h=026f474ca17dd758#patch1

When reading data from an ATA device using PIO, the kernel dirties the
D-cache but there is no flush_dcache_page() call in ata_pio_sector().
Since neither the VFS layer calls this function, a subsequent
update_mmu_cache() is not aware of the dirty page which may lead to
cache incoherency in user space.

Call flush_dcache_page in usb_stor_access_xfer_buf
http://linux-arm.org/git?p=linux-2.6.git;a=commitdiff;h=d0c91030c392ef4e

Transferring buffers using memcpy dirties the D-cache but there is no
corresponding flush_dcache_page call which leads to data corruption in
user-space.

in setup_processor():
struct cpu_cache_fns *cpu_cache = __v6_proc_info.cache = v6_cache_fns

Monday, November 30, 2009

How to choose fairy tales for children of different ages

Streaming DMA mappings

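(What follows is my reading notes and my own thinking, not necessarily what Linux actually does. ARM, for example, does not do exactly this, which leaves me unsure whether I have misunderstood or whether it is an ARM Linux problem...)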

Linux Device Drivers, 3rd ed, Chapter 15: Memory Mapping and DMA

dma_map_single/dma_unmap_single: set up/undo the transfer buffer for device DMA.
dma_sync_single_for_cpu/dma_sync_single_for_device: for temporary use of the buffer

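They mainly address the following two problems:
cache coherence (CC): The contents of the cache and of memory may disagree. Data the CPU hands to the device may still be in the cache, so the device may read stale data from memory; or the device has already changed memory, but the CPU reads stale data from the cache.
bounce buffer (BB): If for some reason the device cannot DMA with the given buffer address, another buffer has to be allocated to stand in for the original one. The data in the original buffer is copied to the new buffer, and copied back once the DMA completes.

Direction of the data flow:
DMA_TO_DEVICE
Before the DMA starts, BB must copy the original buffer to the new buffer; CC must "clean" (writeback) the cache to memory
(dma_map_single, dma_sync_single_for_device)

DMA_FROM_DEVICE
After the DMA ends, CC must invalidate the cache; BB must copy the DMA result from the new buffer to the original buffer
(dma_unmap_single, dma_sync_single_for_cpu)

DMA_BIDIRECTIONAL
Before the DMA starts, BB must copy the original buffer to the new buffer; CC must "clean" (writeback) the cache to memory
After the DMA ends, CC must invalidate the cache; BB must copy the DMA result from the new buffer to the original buffer
In practice, however, the kernel "flushes" the cache -- writeback and invalidate
Before the DMA starts, BB must copy the original buffer to the new buffer; CC must flush the cache
After the DMA ends, BB must copy the DMA result from the new buffer to the original buffer

dma_map_single/dma_unmap_single may use flush, but dma_sync_single_for_device and dma_sync_single_for_cpu should not

dma_sync_single_for_cpu
Before the buffer is handed to the CPU, CC must invalidate the cache; BB must copy the DMA result from the new buffer to the original buffer

dma_sync_single_for_device
Before the buffer is handed back to the device for DMA, BB must copy the original buffer to the new buffer; CC must "clean" (writeback) the cache to memory

These four functions are used to separate the periods of CPU access from the periods of device access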
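The cache-coherence half of these notes, for the dma_map_single/dma_unmap_single pair, can be written down as a small lookup. This is a sketch of my notes above rather than of any particular kernel version, and all names in it are hypothetical.

```c
enum dma_dir { DIR_TO_DEVICE, DIR_FROM_DEVICE, DIR_BIDIRECTIONAL };
enum cache_op { OP_NONE, OP_CLEAN, OP_INVALIDATE, OP_FLUSH };

/* Cache operation needed before the device starts the DMA (map time). */
static enum cache_op op_before_dma(enum dma_dir dir)
{
    switch (dir) {
    case DIR_TO_DEVICE:
        return OP_CLEAN;      /* write dirty lines back to memory */
    case DIR_BIDIRECTIONAL:
        return OP_FLUSH;      /* writeback and invalidate in one go */
    default:
        return OP_NONE;       /* FROM_DEVICE: nothing to write back */
    }
}

/* Cache operation needed after the DMA has finished (unmap time). */
static enum cache_op op_after_dma(enum dma_dir dir)
{
    /* Only FROM_DEVICE still has to drop stale lines; BIDIRECTIONAL
     * already invalidated them as part of the flush. */
    return dir == DIR_FROM_DEVICE ? OP_INVALIDATE : OP_NONE;
}
```

The table makes the asymmetry visible: a flush at map time lets the bidirectional case skip the invalidate at unmap time, which is why the same shortcut is not suitable for the dma_sync_single_* calls.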

Linux Device Drivers, 3rd ed, Chapter 15: Memory Mapping and DMA, page 445.

Bounce buffers are created when a driver attempts to perform DMA on an address that is not reachable by the peripheral device - a high-memory address, for example.

What about an address that is not aligned as the device requires?

Tuesday, November 24, 2009

Linux CONFIG_XXX definition in C code

include/linux/autoconf.h

An option set to 'y' is defined as
#define CONFIG_XXX 1

An option set to 'm' is defined as
#define CONFIG_XXX_MODULE 1

If you use a config macro in C code, be careful with options that can be built as modules: for those, CONFIG_XXX itself is not defined.
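A minimal sketch of the pitfall, using a hypothetical option CONFIG_FOO that is built as a module (so only CONFIG_FOO_MODULE exists). The function names are made up for illustration.

```c
/* Hypothetical option built as a module: only the _MODULE macro exists. */
#define CONFIG_FOO_MODULE 1

/* Checks CONFIG_FOO alone, so it misses the modular case. */
static int foo_builtin(void)
{
#ifdef CONFIG_FOO
    return 1;
#else
    return 0;
#endif
}

/* Checks both macros, so it covers 'y' and 'm'. */
static int foo_available(void)
{
#if defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE)
    return 1;
#else
    return 0;
#endif
}
```

Here foo_builtin() returns 0 while foo_available() returns 1, which is the difference to keep in mind when guarding code with a tristate option.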

Friday, November 20, 2009

Investment targets

Stock:Bond:Other
46:46:8

Stock
VWO(0.27), VT(0.3)
VT=VXUS,+PG+JNJ+KO

Bond
IEF(0.15), BWX(0.5) TIP(0.2) WIP(0.5)

Other
GLD, EMB

Stock
VWO(0.27), VT(0.3)
VIG(0.23),SDY/SCHE(0.25),

Bond
BND(0.12), IEF(0.15), BWX(0.5) TIP(0.2) WIP(0.5)
SPDR DB INTERNATIONAL GOVERNMENT INFLATION-PROTECTED BOND ETF WIP : NYSE Arca
SPDR BARCLAYS CAPITAL INTERNATIONAL TREASURY BOND ETF BWX (AAA-BBB)
ISHARES S&P/CITIGROUP 1-3 YEAR INTERNATIONAL TREASURY BOND FUND ISHG (AAA-A,BB, Not Rated)(0.35)
ISHARES S&P/CITIGROUP INTERNATIONAL TREASURY BOND FUND EX-US IGOV(0.35)(AAA-AA,BB,Not Rated)

Reit
VNQ(0.13)
SPDR DOW JONES INTERNATIONAL REAL ESTATE ETF RWX : NYSE Arca (0.59)
POWERSHARES EMERGING MARKETS SOVEREIGN DEBT PORTFOLIO PCY : NYSE Arca(0.5)

Commodity
GSG, DBC

The World's Cheapest ETF Model Portfolio Gets Cheaper
http://buffettism.blogspot.com/2010/08/sharethis-blog-worlds-cheapest-etf.html

100 ETFs with a market cap over US$1 billion: classification and performance comparison
http://buffettism.blogspot.com/2009/06/100etfs_18.html

Vanguard Dividend Appreciation ETF (VIG)
https://www.etrade.wallst.com/v1/stocks/fund_portfolio/portfolio.asp?symbol=VIG
https://personal.vanguard.com/us/funds/snapshot?FundId=0920&FundIntExt=INT#hist=tab%3A2

Vanguard Emerging Markets ETF (VWO)
https://personal.vanguard.com/us/funds/snapshot?FundId=0964&FundIntExt=INT#hist=tab%3A2

Asset Allocation in Essence: Determine Allocation Weight (continued 2)
http://greenhornfinancefootnote.blogspot.com/2008/07/2asset-allocation-in-essencedetermine.html
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0yN3_xonTahoRV8TRISI7AJ33WYf-OptuJrLmH1BvrBALfB5H0woyRIiL-czpAyIm6-XWedhU6QrM6blWHezMhf-NQG9dZL4_TG9RqC7rf7KO9JofajzsbOPk_2EzPsVMW2Rqq8f2h1c/s1600-h/AA32+Bond+ETF.JPG

ISHARES TR S&P CITINT TBD IGOV : NASDAQ
SPDR SERIES TRUST BRCLYS INTER ETF ITE : NYSE Arca
SPDR SERIES TRUST BRCLYS LG TRS ET TLO : NYSE Arca

The 10 Best U.S. Dividend Stocks
http://seekingalpha.com/article/165326-the-10-best-u-s-dividend-stocks?source=feed
http://buffettism.blogspot.com/2009/11/10-best-us-dividend-stocks.html

1. Johnson & Johnson (JNJ)
2. The Procter & Gamble Company (PG)
3. 3M Co. (MMM)
4. SYSCO Corporation (SYY)
5. Emerson Electric Co. (EMR)
6. Abbott Laboratories (ABT)
7. McDonald’s Corporation (MCD)
8. The Coca-Cola Company (KO)
9. Wal-Mart Stores (WMT)
10. Automatic Data Processing Inc. (ADP)

VANGUARD DIVIDEND GROWTH FD (VDIGX)
MORGAN STANLEY GLOBAL DIVIDEND GROWTH SECURITIES A (GLBAX) (load)
FIDELITY STRATEGIC DIV AND INCOME (FSDIX) (no-load)

Tuesday, November 17, 2009

Truing spokes

http://bbs.nsysu.edu.tw/txtVersion/treasure/bicycle/M.861880797.A/M.1053249416.A.html

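First, go buy a spoke wrench; the Park ones are good.
If the rim deviates to the right at some spot, tighten the left-side spokes there and loosen the right-side ones, and vice versa.
Remember not to let the tension of the left and right spokes differ too much, or the whole rim will sit off to one side.
The rim should also be round; if both the left and right spokes at some spot are too tight or too loose, the rim goes out of round.
Spokes that are too loose or too tight are both bad: too loose wastes pedaling power, too tight shortens the life of the rim.
A beginner should probably adjust only a half or a quarter turn at a time,
and getting the rim to stop wobbling left and right is already a good result.
Tightening here means: looking from the outside of the rim toward the hub, clockwise tightens and counterclockwise loosens.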

Seating the tire

http://www.mobile01.com/topicdetail.php?f=268&t=1161017&m=f&p=1&img=0##13618980

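When switching to narrow tires, it is best to seat the tire properly.

First put a little air into the inner tube.

Then work your way around the tire like this (seating it) and check that the inner tube is not pinched.

What seating achieves: 1: keeps the tube from being pinched by the tire wall; 2: makes the inner tube sit snugly against the tire (when the tube is put into the tire it sometimes bunches up)

The normal case

Causes I have run into where the tire is fine but ends up deformed:
1: The inner tube bunches up too badly inside the tire, so it deforms once inflated.
2: The tire wall is pinched, so it deforms.
3: The tire may be defective (for example excessively worn or with broken casing fibers), so it deforms.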

Sunday, November 15, 2009

[綠角] Unfounded Fear for Long-Term Investing series

Unfounded Fear for Long-Term Investing
http://greenhornfinancefootnote.blogspot.com/2009/11/unfounded-fear-for-long-term-investing.html

Unfounded Fear for Long-Term Investing (continued 1)
http://greenhornfinancefootnote.blogspot.com/2009/11/unfounded-fear-for-long-term-investing1.html

Unfounded Fear for Long-Term Investing (continued 2)
http://greenhornfinancefootnote.blogspot.com/2009/11/unfounded-fear-for-long-term-investing2.html

Unfounded Fear for Long-Term Investing (continued 3)
http://greenhornfinancefootnote.blogspot.com/2009/11/unfounded-fear-for-long-term-investing3.html

Unfounded Fear for Long-Term Investing (continued 4)
http://greenhornfinancefootnote.blogspot.com/2009/11/unfounded-fear-for-long-term-investing4.html

Thursday, November 12, 2009

Wiring money to etrade

Wire money into my account
https://us.etrade.com/e/t/estation/help?id=1901000000#Wire

If your financial institution is located outside the United States:

* The amount you want to wire in U.S. dollars
* The receiving institution information:

The Bank of New York
One Wall Street
New York, NY 10286
Swift Code: irvtus3n
Routing Number: 021000018
FBO: E*TRADE Clearing LLC
Account Number: 890 034 6256

* Your eight-digit E*TRADE Securities account number (add the letters "ET" before your account number, for example ET12345678)
* Your name and address
* Please note that international wires usually take 2-7 business days to process.

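RichXXX's first-hand experience

I sent the wire from 中國信託 (Chinatrust).
Chinatrust's wire fee is NT$900, and the intermediary bank fee is US$18 (subject to change)

Beneficiary information
Account number (A/C No.): 8900346256
Account name: E*Trade Securities
Country: USA

Beneficiary bank
SWIFT/Bank Code: irvtus3n
Bank Name: The Bank of New York
Address: One Wall Street New York, NY 10286
City/Country: New York/USA

In the message field, write:
NAME: your Chinese name as on your passport, the same as on your application form
Account NO: ET12345678 (12345678 is your E*Trade account number)
ADDR: the English address you registered with E*Trade

Knowing how to wire money in, you should also know how to wire it back out. Oddly, etrade does not provide a direct link; I found this one through the FAQ. I get the feeling it does not want me to withdraw my money, so it keeps the link hidden...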
Know all the ways I can withdraw money
https://us.etrade.com/e/t/estation/contexthelp?id=1903000000&traxui=TRAXUI#Know

Wire Request
https://us.etrade.com/e/t/xferwire/WireXfrAuthentication
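~~I have an employee stock plan account, but the account I want to wire money back from is the securities account; that may be why it failed~~ It turned out the employee ID was filled in wrong...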

Erroneous Employee ID and/or Company ticker match: The employee Id and/or company ticker entered does not match our records please re-enter.

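Now the wire back works (Note: TaiwanXXX)

But using the Security code still requires the Account Holder's Social Security Number... isn't that just making things hard for me?
So I do not know how to wire the money back this way: the Security code requires a Social Security number, and without one it cannot be used.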

https://us.etrade.com/e/t/xferwire/abalookuppage

Note: Names on the sending and receiving accounts must be the same, even for joint accounts. You may check the registered name(s) on the Change My Info page. If you are wiring funds to an account that is not registered in the same name, please first mail us a notarized Letter of Authorization that is signed by all registered account holders.

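I asked customer service: the notarized Letter of Authorization (NLOA) has no specific format; it only needs to state the wire instruction clearly. In other words, etrade provides no form for transfers out. If you really need one, use the transfer-in form as a reference for how to fill in a transfer out.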
Account Transfer Form
https://content.etrade.com/etrade/estation/pdf/Acct_Trans.pdf#xml=http://search1w4m3.etrade.com:8222/inquiraapp/ui.jsp?ui_mode=answer&prior_transaction_id=87317&iq_action=6&answer_id=16777220&highlight_info=16777243,140,161&turl=https%3A%2F%2Fcontent.etrade.com%2Fetrade%2Festation%2Fpdf%2FAcct_Trans.pdf
The catch is that one has to be mailed for every transfer.

Tuesday, November 10, 2009

[etrade] Must you live in the US to buy mutual funds?

Message from etrade

After reviewing your account, I see that you have a foreign address. Mutual funds are not permitted for investors located outside the US. This is both an E*TRADE requirement as well as a requirement of the mutual fund companies. Their funds are not registered with the appropriate authorities to be sold outside the US. If you live in the US in the future, you will be able to invest in mutual funds at that time.

http://greenhornfinancefootnote.blogspot.com/2007/05/blog-post_936.html?showComment=1195179180000#c7506260630406398277

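子飛 wrote...
I place orders at etrade. Because mine is an international account, it will not let me buy US domestic open-end funds, since those funds require you to be in the US in person and to hold a domestic account. What I would like to ask is: is the account you opened at Firstrade an international account, or did you already open it while you were still in the US?

(............)

綠角 wrote...
子飛, Firstrade's international accounts can buy US funds.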

message from etrade:

To purchase shares of a U.S.-based mutual fund, you must have a physical address in the United States and have a W-9 Form on file with us to certify your Social Security number or Tax-Identification Number. You can review any individual fund prospectus by bringing up a quote for the fund on our website or visiting the fund company website directly to view eligibility information for the fund.

Vanguard 500 Index Fund Investor Shares (VFINX)
https://personal.vanguard.com/us/funds/snapshot?FundId=0040&FundIntExt=INT -> [View prospectus and reports], page 42, Account Registration Form

Vanguard funds are registered for sale to U.S. residents only.
You must provide your U.S. address on this form.

Download a Fidelity Account® Application
https://scs.fidelity.com/accounts/services/content/fidaccount_details.shtml?tab=MostRequested&source=aong

This account is intended for U.S. residents only with a valid U.S. address. Non-U.S residents should visit our International Investment site.

Wednesday, November 4, 2009

Dollar cost averaging (regular fixed-amount investing)

[wiki] Dollar cost averaging
http://en.wikipedia.org/wiki/Dollar_cost_averaging

Dollar cost averaging is also called DCA and constant dollar plan in the US, pound-cost averaging in the UK, and by the currency-neutral terms unit cost averaging and cost average effect

Regular fixed-amount subscription: Dollar-cost averaging
http://fund.bot.com.tw/z/glossary/glexp_4448.asp.htm

etrade calls it Automatic Investment Plan (AIP)
https://us.etrade.com/e/t/pict/automaticinvestment#
firstrade calls it Periodic Investment Plan (PIP)
http://www.firstrade.com/public/en_us/productsservices/investmentchoices/mutualfunds/

Tuesday, November 3, 2009

Individual Retirement Account (IRA)

[wiki] Individual Retirement Account
http://en.wikipedia.org/wiki/Individual_Retirement_Account

retirement plan account that provides some tax advantages for retirement savings in the United States

My take on Morningstar
http://greenhornfinancefootnote.blogspot.com/2007/05/blog-post_2374.html

Also, they often mention tax-advantaged retirement accounts such as IRA and 401K; those discussions are of no use to us.

My view of Ultra ProShares (2X Volatility Does Not Mean 2X Return)
http://greenhornfinancefootnote.blogspot.com/2008/02/ultra-proshares2x-volatility-does-not.html

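Shorting in accounts where margin is prohibited: UltraShort ETFs can be used in tax-deferred portfolios (such as IRA, Roth IRA, 401(k), and so on), which do not allow margin trading. With an UltraShort ETF, you can short stocks by investing in a single fund.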

Asset Allocation in Essence: Rebalancing Strategies (continued 2)
http://greenhornfinancefootnote.blogspot.com/2008/09/2asset-allocation-in-essencerebalancing.html

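True, in the US you can use a 401k or IRA to avoid the tax problem, but 401K and IRA accounts usually come with more restrictions, and most TSA accounts such as 401K have no index funds to choose from. So to choose investments freely you are probably left with an ordinary personal account. I do not know whether the authors who propose these rebalancing strategies have considered taxes. If taxes are taken into account, I personally think one should invest about once a year, and the rebalancing interval must be longer than one year (at least 366 days), to make sure the (15%) tax rate applies.
Otherwise the only option is to open a traditional IRA account and manage it yourself. But then there is the restriction that withdrawals are only possible from age 59 and a half, which makes the money less flexible to use.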

Ask Mr. Firstrade column
http://greenhornfinancefootnote.blogspot.com/2008/02/ask-mr-firstrade.html

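Q: Must the beneficiary of an IRA account have a Social Security number (SSN)?

A: Yes, this is a rule of the US IRS. Because an IRA is a tax-advantaged account whose purpose is to help US residents save tax, the beneficiary must be someone with a US SSN. If you are really worried about how your overseas relatives would handle your assets should something happen to you, the best approach is a proper will. As long as it is signed and notarized by witnesses, I believe it will be the most powerful document in court.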

W-8BEN Form

Message from etrade:

E*TRADE Securities has updated its Form W-8BEN process to conform with new Internal Revenue Service electronic documentation requirements. In order to ensure that all Form W-8`s will follow this new process we will require you to submit a new IRS Form W-8 to certify your foreign tax status for the account shown above. As a result your current Form W-8 on file will be treated as expiring as of December 31, 2009.

If you do not recertify by December 31, 2009, your account will be subject to a U.S. withholding tax of at least 28% to 30% on all trade gross proceeds, taxable dividends, interest payments, and other taxable payments starting on January 1, 2010. If U.S. tax is withheld, you will need to work with the Internal Revenue Service to obtain any refund.

"W-8BEN Certification of Foreign Status" at etrade.com/forms
http://etrade.com/forms#Tax

http://greenhornfinancefootnote.blogspot.com/2009/06/analysis-of-blackrock-global-allocation.html

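W-8BEN is filed with the US IRS to declare that you are not a US person; otherwise there will be a withholding tax problem. European banks and Taiwanese banks do not have this problem. Transaction taxes, income taxes, and fund industry policies differ from country to country...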

http://greenhornfinancefootnote.blogspot.com/2007/08/firstrade.html

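Another difference from what we have domestically is that Firstrade cannot clear trades itself, so it delegates clearing and settlement to a company called Ridge Clearing (but check with Firstrade; as I recall they recently switched to a company called Penson). So the money you wire goes not to Firstrade but to the clearing company. (Scottrade cites as one of its advantages that it can clear trades itself, so the wire beneficiary is Scottrade. If Firstrade went under, investors would have to claim their money from Ridge Clearing; the risk is not great, but it is a hassle.)

(...........)

A question for 綠角: does PART II, Claim of Tax Treaty Benefits (if applicable), on the W-8BEN need to be checked? None of the three options seems to match my situation.

(...........)

Taiwan and the US currently have no income tax treaty, so we have no tax treaty benefits. You are right; leave this section blank.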

Monday, November 2, 2009

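Differences between the 08 and 09 Giant Yukon models

1. Fork finish: 08 XCM, 09 XCM v2
2. Fork suspension: the 08 fork can only be locked out while stopped, with a plastic knob; the 09 fork can be locked out while riding, switched by a remote cable routed to the handlebar (although some people seem to have a knob)
3. Weld at the point where the seatstays merge into one: the 08 has no black plug and is welded in an inverted Y shape; the 09 has a black plug, which looks like a drop in quality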

[GIANT] YUKON (AS840) 24-speed bicycle $8800
http://buy.yahoo.com.tw/gdsale/gdsale.asp?gdid=673394#
Probably the 08 model

Question: which model year of YUKON is this?
http://www.mobile01.com/topicdetail.php?f=315&t=712788&p=2#

(........)
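The bike is an 09; the difference from the 08 is the front suspension (the blue thing on the right),
which was changed so that its stiffness can be adjusted by hand while riding
(........)
At 10800 it must be the 09 model,
the 08 model is priced at 8800,
If you bought an 08 model at a shop for 10800, remember to keep the evidence and report it to 巨大 (Giant, the company's name) ^^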
(........)

Are the frame finishes of the 08 and 09 Giant Yukon identical? (Is only the fork different?)
http://www.mobile01.com/topicdetail.php?f=315&t=723067&p=1

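Fork -> 08: XCM, 09: XCM v2; both can be locked out, you misread
Weld -> where the seatstays merge into one, the 08 has no black plug and is welded in an inverted Y shape; the 09 has a black plug, which looks like a drop in quality
(........)
Both the 08 and 09 forks can be locked out.
The 08 fork can only be locked out while stopped.
The 09 fork can be locked out while riding, and the knob is different too.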
(........)

[Share + sign-in] 08 Giant Yukon in silver-white (many photos), sign-in book for Yukon owners of all years
http://www.mobile01.com/topicdetail.php?f=315&t=358946#

The biggest differences between the 07 and 08 Yukon are the different tire specification and the revised finish
07:Kenda
08:Michelin country trail

(Share) New Yukon and Iguana bikes arrived
http://www.mobile01.com/topicdetail.php?f=268&t=355833&p=3

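My height is 183
Weight... sigh... what I fear most is being asked my weight
1xx... each x is 4 or higher; I will leave the number to your imagination
The 17-inch frame really felt too low to me, because my legs could not fully extend while pedaling
Raising the seat a lot would let them extend, but I was afraid the post under the seat would snap (I am heavy, hence the worry)
Switching to the 19-inch felt just about right
When riding I feel I can put full power into the pedals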

Sunday, November 1, 2009

USB2.0 do_next_ping

Re: [linux-usb-devel] Scenario: usbtest test-14 failure: MUSBHDRC + Netchip2280
http://www.mail-archive.com/linux-usb-devel@lists.sourceforge.net/msg52405.html

Thus, using the PING protocol involves the host switching between two modes: one where it must send a PING and one where it can send an OUT. The phrase "return to using a PING token until the endpoint indicates it has space" means that the host must switch from the second mode back to the first. "return" == return to the first mode. "using a PING token" == send a PING before doing another OUT != send a PING.

http://www.mail-archive.com/linux-usb-devel@lists.sourceforge.net/msg52356.html

It says "return to using a PING token". As I mentioned above, that is
different from "send a PING".

EHCI r1.0, p.89

3 A Nyet response to an OUT means that the device has accepted the data, but cannot receive any more at this time. Host must (1)advance the transfer state and additionally, (2)transition the Ping State bit to Do Ping.

The Ping State bit has the following encoding:
Value:Meaning
0B: Do OUT The host controller will use an OUT PID during the next bus transaction to this endpoint.
1B: Do Ping The host controller will use a PING PID during the next bus transaction to this endpoint.

The defined ping protocol (see USB 2.0 Specification, Chapter 8) allows the host to be imprecise on the initialization of the ping protocol (i.e. start in Do OUT when we don't know whether there is space on the device or not).

The host controller manages the Ping State bit. System software sets the initial value in the queue head when it initializes a queue head. The host controller preserves the Ping State bit across all queue advancements. This means that when a new qTD is written into the queue head overlay area, the previous value of the Ping
State bit is preserved.

So, upon receiving NYET, the host does not have to send a PING immediately. The PING can be deferred until just before the next OUT is sent.
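The Ping State bit can be pictured as a two-state machine. The sketch below is a simplified model with hypothetical names. Only the NYET transition is taken from the EHCI text quoted above; the handling of ACK and NAK is my assumption based on the USB 2.0 ping protocol.

```c
enum ping_state { DO_OUT = 0, DO_PING = 1 };   /* encoding of the Ping State bit */
enum handshake { HS_ACK, HS_NAK, HS_NYET };

/* Next value of the Ping State bit after a transaction in state `cur`
 * received the handshake `hs`. */
static enum ping_state next_ping_state(enum ping_state cur, enum handshake hs)
{
    if (cur == DO_PING) {
        /* PING acked: the endpoint has space, so the next token is OUT.
         * Anything else (assumed: NAK) keeps pinging. */
        return hs == HS_ACK ? DO_OUT : DO_PING;
    }
    /* cur == DO_OUT: NYET means the data was accepted but there is no room
     * for more, so a PING must precede the next OUT. NAK is assumed to
     * behave the same way; ACK stays in Do OUT. */
    return hs == HS_ACK ? DO_OUT : DO_PING;
}
```

The state only selects the token for the next bus transaction to the endpoint, so nothing forces a PING to be issued at the moment the NYET arrives.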

USB 2.0 Specification, 8.5.1 NAK Limiting via Ping Flow Control, p.217

If the endpoint instead responds to the OUT/DATA transaction with a NYET handshake, this means that the endpoint accepted the data but does not have room for another wMaxPacketSize data payload. The host controller must return to using a PING token until the endpoint indicates it has space.

Wednesday, October 28, 2009

[Repost] Conflict is a necessary part of a young child's development

Linux Printing with Samba

SAMBA server - setting up as a print server (CUPS system)
http://linux.vbird.org/linux_server/0370samba.php#server_printer

CUPS
http://www.cups.org/

CUPS - Download CUPS v1.4.1 (4.2M)
http://ftp.easysw.com/pub/cups/1.4.1/cups-1.4.1-source.tar.bz2

Linux printing components (print jobs, queues, services, and printers)
http://linux.vbird.org/linux_basic/0610hardware.php#cups

How to print under Linux? (LPRng)
http://www.imacat.idv.tw/tech/lnxprint.html

LPRng
www.lprng.com
http://www.lprng.com/downloads.html

LPRng @ sourceforge
http://sourceforge.net/projects/lprng/
LPRng-3.8.Arc4.tar.gz
http://nchc.dl.sourceforge.net/project/lprng/lprng/release%20candidate%204%20of%203.8.A/LPRng-3.8.Arc4.tar.gz

[ARM] the *_for_cpu memory sync functions don't flush/invalidate memory as requested

linux-2.6.31.1, ARM, without CONFIG_DMABOUNCE
(the code seems to have changed in 2.6.28)

I ported my driver from linux-2.6.27.4 to linux-2.6.31.1 and found that some functions no longer worked. After some debugging, I found that only the *_for_device functions actually invalidate/flush memory, while the *_for_cpu functions do not.

dma_sync_single
dma_sync_single_for_cpu

pci_dma_sync_single_for_cpu
dma_sync_single_for_cpu
dma_sync_single_range_for_cpu
dmabounce_sync_for_cpu
==> nothing done.

pci_dma_sync_single_for_device
dma_sync_single_for_device
dma_sync_single_range_for_device

dmabounce_sync_for_device
dma_cache_maint

Any call to dmabounce_sync_for_* simply returns 1 and does nothing else.

With CONFIG_DMABOUNCE not defined:

arch/arm/include/asm/dma-mapping.h

/**
* dma_sync_single_range_for_cpu
* @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
* @handle: DMA address of buffer
* @offset: offset of region to start sync
* @size: size of region to sync
* @dir: DMA transfer direction (same as passed to dma_map_single)
*
* Make physical memory consistent for a single streaming mode DMA
* translation after a transfer.
*
* If you perform a dma_map_single() but wish to interrogate the
* buffer using the cpu, yet do not wish to teardown the PCI dma
* mapping, you must call this function before doing so.  At the
* next point you give the PCI dma address back to the card, you
* must first the perform a dma_sync_for_device, and then the
* device again owns the buffer.
*/
static inline void dma_sync_single_range_for_cpu(struct device *dev,
          dma_addr_t handle, unsigned long offset, size_t size,
          enum dma_data_direction dir)
{
  BUG_ON(!valid_dma_direction(dir));

  dmabounce_sync_for_cpu(dev, handle, offset, size, dir);
}

arch/arm/include/asm/dma-mapping.h

#ifdef CONFIG_DMABOUNCE

(....................)

#else
static inline int dmabounce_sync_for_cpu(struct device *d, dma_addr_t addr,
  unsigned long offset, size_t size, enum dma_data_direction dir)
{
  return 1;
}

static inline int dmabounce_sync_for_device(struct device *d, dma_addr_t addr,
  unsigned long offset, size_t size, enum dma_data_direction dir)
{
  return 1;
}
#endif /* CONFIG_DMABOUNCE */
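To make the asymmetry concrete, here is a small userspace model of the two call paths without CONFIG_DMABOUNCE. It is not kernel code: every `model_*` name is made up for this note, the arguments are simplified, and the _for_device path (cache maintenance when the stub returns non-zero on a non-coherent system) is my reading of the call chain listed earlier, so treat it as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often the modelled cache maintenance routine runs. */
static int cache_maint_calls;

/* Stand-in for dma_cache_maint(); the real one cleans/invalidates cache lines. */
static void model_dma_cache_maint(const void *start, size_t size)
{
    (void)start;
    (void)size;
    cache_maint_calls++;
}

/* Stubs as in the !CONFIG_DMABOUNCE case: return 1, do nothing else. */
static int model_dmabounce_sync_for_cpu(void)    { return 1; }
static int model_dmabounce_sync_for_device(void) { return 1; }

/* Device -> CPU handoff: only the stub is called, the cache is not touched. */
static void model_sync_single_for_cpu(const void *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    (void)model_dmabounce_sync_for_cpu();
}

/*
 * CPU -> device handoff (assumed behaviour): the stub returns 1, meaning
 * "no bounce buffer handled this", so cache maintenance is performed.
 */
static void model_sync_single_for_device(const void *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    if (!model_dmabounce_sync_for_device())
        return;
    model_dma_cache_maint(buf, size);
}
```

Calling `model_sync_single_for_cpu()` leaves `cache_maint_calls` unchanged, while `model_sync_single_for_device()` increments it, which mirrors what I saw in my driver: a driver that relied on the _for_cpu call to invalidate the cache gets no cache operation at that point.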

[ARM] dma: don't touch cache on dma_*_for_cpu()
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.29.y.git;a=commitdiff;h=309dbbabee7b19e003e1ba4b98f43d28f390a84e

author Russell King <rmk@dyn-67.arm.linux.org.uk>
Mon, 29 Sep 2008 18:50:59 +0000 (19:50 +0100)
committer Russell King <rmk+kernel@arm.linux.org.uk>
Tue, 30 Sep 2008 10:01:36 +0000 (11:01 +0100)

As per the dma_unmap_* calls, we don't touch the cache when a DMA
buffer transitions from device to CPU ownership. Presently, no
problems have been identified with speculative cache prefetching
which in itself is a new feature in later architectures. We may
have to revisit the DMA API later for these architectures anyway.

[PATCH 16/25] [ARM] add highmem support to DMA mapping functions
http://lists.arm.linux.org.uk/lurker/message/20081011.035554.0e3d3b9b.en.html
A long thread that discusses why dma_*_for_cpu() does not touch the cache.

http://lists.arm.linux.org.uk/lurker/message/20080926.034224.1a2bec3e.en.html

BTW what's the point of calling dma_cache_maint() in
dma_sync_single_range_for_cpu()? When the CPU regains ownership of the
buffer, the cache is always clean making this call useless.

http://lists.arm.linux.org.uk/lurker/message/20080929.164105.eb5f7e5a.en.html

Let's consider a DMA buffer which starts in the middle of a cache line
meant to receive data from a device.

Upon dma_map_single() the first cache line is first cleaned then the
whole buffer is invalidated.

Upon dma_unmap_single() nothing is done. However, if the first part of
the shared cache line gets a miss, the whole cache line could be
repopulated _before_ the device has stored its data in memory
corresponding to the second half of the same cache line, hence driver
will obtain bad data from the DMA buffer.

If instead we use dma_sync_single_for_cpu() which currently perform
another cleaning of the shared cache line and invalidation of the whole
buffer then the issue above won't occur. things can be even worse when
that first half cache line gets dirty though. Upon the cleaning of that
first cache line, the device data stored in memory corresponding to the
second half cache line will be overwritten and lost. Not cleaning the
first cache line and simply invalidating the whole buffer would preserve
integrity of the device data but will lose the first half cache line
content which is not any better, and only if cache eviction doesn't
happen first.

So I don't see how the cache maintenance performed in
dma_sync_single_range_for_cpu() solves anything besides wasting cycles.
Sure, DMA buffers may span cache lines that overlaps with other data,
but if that data is touched while the DMA buffer is owned by the device
then we're screwed anyway, and I don't see any solution for that besides
completely forbiding DMA mappings that don't start and end on cache line
boundaries as well as disabling cache prefetching.

Since this doesn't appear to be a significant issue in practice given
that luck is on our side and things just work anyway, we could remove
that false "protection" from dma_sync_single_range_for_cpu().
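The shared cache line problem in the quoted mail only arises when a DMA buffer does not start and end on cache line boundaries. A driver could check this with something like the sketch below. The 32-byte line size and the names `ASSUMED_CACHE_LINE` and `dma_buf_is_line_aligned` are assumptions for illustration; the real line size depends on the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes; must be a power of two. */
#define ASSUMED_CACHE_LINE 32u

/*
 * Hypothetical helper: returns 1 if the buffer [start, start + len) begins
 * and ends on a cache line boundary, so that no cache line is shared
 * between the DMA buffer and unrelated data. Returns 0 otherwise.
 */
static int dma_buf_is_line_aligned(uintptr_t start, size_t len)
{
    /* The mask trick below only works for power-of-two line sizes. */
    assert((ASSUMED_CACHE_LINE & (ASSUMED_CACHE_LINE - 1u)) == 0);

    uintptr_t mask = ASSUMED_CACHE_LINE - 1u;

    return len != 0 &&
           (start & mask) == 0 &&
           ((start + len) & mask) == 0;
}
```

With a 32-byte line, a 64-byte buffer at 0x1000 passes the check, while the same buffer at 0x1010 shares its first and last cache lines with neighbouring data, which is the case the mail describes.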
