Monday, June 30, 2008

[Repost] Tracking down hardware exceptions on hardware

from http://www.newlc.com/tracking-down-hardware-exceptions-hardware

Hardware exceptions are abnormal events triggered by the ARM instruction being executed. Typical causes include accessing non-existent memory, accessing misaligned memory, writing to a read-only area of memory, jumping to an invalid memory address, and overflowing a local (stack) buffer.

This tutorial is divided into the following sections:

  1. Section 1 describes the classification of debugging techniques.
  2. Section 2 very briefly describes the ARM processor modes.
  3. Section 3 explains an algorithm for debugging hardware exceptions.

Tracking down hardware exceptions is a developer’s nightmare.

My device driver works correctly and gives the desired result with one sample test application, but when I use another application the device crashes and reboots. What did I do wrong? How do I find the culprit?

Let us look at how to tackle this in the following sections.

1. Classification of Debugging Techniques


Debugging techniques can be classified broadly into the following four categories:

  1. Crash and Burn (Debug Trace)
  2. Debugger Monitor (aka ROM monitor)
  3. In-Circuit Emulator
  4. On Chip Debugger

Crash and Burn/Debug Trace


We sprinkle output messages at various points in our code. Different tasks or routines can “announce” their execution by displaying a message on the output device. This is not necessarily a bad method, but it can be extremely slow.
Moreover, displaying a message on the output device can also affect the real-time performance of the system.

Debugger Monitor


This is a small piece of code (that’s a relative statement) that helps with debugging. It usually communicates with a host computer or some form of terminal via a serial or JTAG interface. A basic monitor may allow downloading code, reading and writing memory and registers, and, perhaps most importantly, setting breakpoints, single stepping, and real-time execution.

In-Circuit Emulator


The in-circuit emulator (ICE) remains the most powerful debugging tool available. Nothing else approaches an ICE’s wide range of troubleshooting resources; no other tool so completely addresses problems stemming from hardware/software interaction.
Emulators traditionally provide many debug features that are common to all debugging tools: breakpoints, single stepping, and ways to display and alter registers and memory. Like similar tools, they link your source code to the debugging session—you break on a line of C code, not hex, and data is shown in its appropriate type.
Unlike other debugging tools though, an ICE includes features you’ll get only with a device packed with hardware. Real-time trace, emulation RAM, complex breakpoints, and other features assist in finding the more difficult problems associated with real-time issues or complicated interactions in the code.

On-Chip Debugging


In the general sense, on-chip debugging is a combination of hardware and software, both on and off the chip. The part that resides on the chip is implemented in various ways. It may be a microcode based monitor (Motorola CPU32) or hardware implemented resources (IBM PPC403/ARM).
A target system with an OCD processor is useless unless it has a host to communicate with. The host runs debugger software and interfaces to the OCD in various ways.

2. A brief introduction to ARM processor modes


Note: This section does not discuss the ARM register sets or the details of the processor modes; it gives only a very brief introduction to the ARM modes. Please refer to the ARM Architecture Reference Manual, available at www.arm.com, for more details about the ARM architecture.

ARM processors support several modes. They are:

  1. User (usr)
  2. System (sys)
  3. Supervisor (svc)
  4. IRQ (irq)
  5. FIQ (fiq)
  6. Abort (abt)
  7. Undefined (und)

The processor modes are used to save context when handling an exception or interrupt, and to restrict some instructions to privileged modes.

Processor modes and Symbian OS


In Symbian the processor is never put in sys mode.

The processor mode and Symbian OS execution:

  1. usr: User-mode code runs entirely in the usr mode of ARM. It has access to only a limited region of memory.
  2. sys: This mode is not used in Symbian.
  3. svc: This mode is entered on a SWI instruction. Device drivers and the kernel execute user-side requests and DFCs in this mode.
  4. irq: This mode is entered when a normal interrupt occurs (an interrupt on the IRQ pin of the ARM). The ISRs of device drivers are executed in this processor mode. The kernel's IRQ dispatcher routine is also executed in this mode.
  5. fiq: This mode is entered when a fast interrupt occurs. The kernel’s FIQ dispatcher executes in this mode.
  6. abt: This mode is entered on hardware exceptions. The exception can be a data abort or a prefetch abort. (These are not detailed here.) The Symbian OS exception handler routine is executed in this mode.
  7. und: This mode is entered on execution of an undefined ARM instruction. This is the least common mode of the ARM processor. It can be used for emulating unsupported instructions. The Symbian OS exception handler routine is executed in this mode.

Insight into the FAR and FSR registers of ARM

Fault Status Register

This register contains the source of an abort, similar to Symbian error codes. It is updated after a data abort only. There is also an IFSR (Instruction FSR), which is updated after a prefetch abort.
The values of the FSR have the following meanings:

FSR[3:0]  Source
0010      Terminal exception
0000      Vector exception
00x1      Alignment
1100      External abort on translation, 1st level
1110      External abort on translation, 2nd level
0101      Translation section
0111      Translation page
1001      Domain section
1011      Domain page
1101      Permission section
1111      Permission page
0100      External abort on linefetch section
0110      External abort on linefetch page
1000      External abort on non-linefetch section
1010      External abort on non-linefetch page
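
The table above can be turned into a small lookup. The following is only a sketch in standard C++ (not Symbian code): the names `FaultSource` and `DecodeFsr` are made up for illustration, and only bits [3:0] of the FSR are examined.

```cpp
#include <cstdint>

// Fault sources, named after the rows of the FSR table above.
// (Hypothetical names, introduced only for this illustration.)
enum class FaultSource {
    VectorException,
    Alignment,
    TerminalException,
    ExternalAbortTranslation1stLevel,
    ExternalAbortTranslation2ndLevel,
    TranslationSection,
    TranslationPage,
    DomainSection,
    DomainPage,
    PermissionSection,
    PermissionPage,
    ExternalAbortLinefetchSection,
    ExternalAbortLinefetchPage,
    ExternalAbortNonLinefetchSection,
    ExternalAbortNonLinefetchPage
};

// Decode FSR[3:0]; all upper bits of the register are ignored here.
constexpr FaultSource DecodeFsr(std::uint32_t fsr) {
    switch (fsr & 0xFu) {
        case 0x0: return FaultSource::VectorException;                   // 0000
        case 0x1:
        case 0x3: return FaultSource::Alignment;                         // 00x1
        case 0x2: return FaultSource::TerminalException;                 // 0010
        case 0x4: return FaultSource::ExternalAbortLinefetchSection;     // 0100
        case 0x5: return FaultSource::TranslationSection;                // 0101
        case 0x6: return FaultSource::ExternalAbortLinefetchPage;        // 0110
        case 0x7: return FaultSource::TranslationPage;                   // 0111
        case 0x8: return FaultSource::ExternalAbortNonLinefetchSection;  // 1000
        case 0x9: return FaultSource::DomainSection;                     // 1001
        case 0xA: return FaultSource::ExternalAbortNonLinefetchPage;     // 1010
        case 0xB: return FaultSource::DomainPage;                        // 1011
        case 0xC: return FaultSource::ExternalAbortTranslation1stLevel;  // 1100
        case 0xD: return FaultSource::PermissionSection;                 // 1101
        case 0xE: return FaultSource::ExternalAbortTranslation2ndLevel;  // 1110
        default:  return FaultSource::PermissionPage;                    // 1111
    }
}
```

For example, `DecodeFsr(0x5)` yields `FaultSource::TranslationSection`, and both `DecodeFsr(0x1)` and `DecodeFsr(0x3)` yield `FaultSource::Alignment`.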

Fault Address Register

This register is updated after a data abort only. It contains the data address that triggered the exception.

CPSR (Current Program Status Register)

The Current Program Status Register is accessible in all processor modes. It contains the condition code flags, the interrupt disable bits, the current processor mode, and other status and control information.
Bits [4:0] represent the current processor mode.

CPSR[4:0]  Mode
10000      User
10001      FIQ
10010      IRQ
10011      Supervisor
10111      Abort
11011      Undefined
11111      System

Each exception mode also has a Saved Program Status Register (SPSR), which is used to preserve the value of the CPSR when the associated exception occurs.
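
Because the CPSR and the SPSRs share the same layout, one helper can decode the mode field of either. The sketch below is plain standard C++ rather than Symbian code; `Mode`, `DecodeMode`, `IrqDisabled` and `FiqDisabled` are illustrative names, and the interrupt disable bits are assumed to be the I bit (bit 7) and F bit (bit 6) as defined by the ARM architecture.

```cpp
#include <cstdint>

// Processor modes from the CPSR[4:0] table above, plus a catch-all
// for bit patterns that are not listed there.
enum class Mode { User, Fiq, Irq, Supervisor, Abort, Undefined, System, Invalid };

// Decode bits [4:0] of a CPSR or SPSR value.
constexpr Mode DecodeMode(std::uint32_t psr) {
    switch (psr & 0x1Fu) {
        case 0x10: return Mode::User;        // 10000
        case 0x11: return Mode::Fiq;         // 10001
        case 0x12: return Mode::Irq;         // 10010
        case 0x13: return Mode::Supervisor;  // 10011
        case 0x17: return Mode::Abort;       // 10111
        case 0x1B: return Mode::Undefined;   // 11011
        case 0x1F: return Mode::System;      // 11111
        default:   return Mode::Invalid;
    }
}

// Interrupt disable bits: I (bit 7) masks IRQ, F (bit 6) masks FIQ.
constexpr bool IrqDisabled(std::uint32_t psr) { return (psr & 0x80u) != 0; }
constexpr bool FiqDisabled(std::uint32_t psr) { return (psr & 0x40u) != 0; }
```

For instance, a value of 0x80000097 decodes to `Mode::Abort` with IRQ disabled and FIQ enabled, while 0x80000013 decodes to `Mode::Supervisor`.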

3. A brief introduction to debugging Hardware exceptions

Hardware exceptions are different from a Symbian “Panic”. They are events triggered by the ARM instruction being executed.

Aborts are triggered by a failed memory access. There are two kinds of abort: data abort and prefetch abort. A data abort is triggered while loading data from or storing data to memory; a prefetch abort is triggered while fetching an instruction. Typical causes are:

  1. Accessing non-existent memory (data abort)
    TInt* p = NULL;
    *p = 123;
  2. Accessing misaligned memory (data abort)
    HBufC8* des = …;
    const TUint8* p = des->Ptr();
    const TUint32* q = (const TUint32*)(p+7);
    return *q;
  3. Writing to read-only memory (data abort)
    TUint8* p = ReadOnlyBuffer();
    *p = 42;
  4. Jumping to an invalid address (prefetch abort)
    void (*pfn)(void) = NULL;
    (*pfn)();
  5. Local buffer overflow
    void F()
    {
    // overflow trashes saved return address on stack
    TUint buf[10];
    for (TInt i=0; i<100; i++)
        { Function1(); buf[i] = 0; }
    }

Whenever an exception occurs, the PC (Program Counter) is changed to point to a predefined address (the table of these addresses is known as the exception vector table), and execution continues from that address.

Below is the exception vector table:

Exception              Mode  Normal @    High @
Reset                  Svc   0x00000000  0xFFFF0000
Undefined instruction  Und   0x00000004  0xFFFF0004
Software interrupt     Svc   0x00000008  0xFFFF0008
Prefetch abort         Abt   0x0000000C  0xFFFF000C
Data abort             Abt   0x00000010  0xFFFF0010
IRQ                    Irq   0x00000018  0xFFFF0018
FIQ                    Fiq   0x0000001C  0xFFFF001C

Each entry at the specified address contains one 32-bit instruction, which is generally a jump to a handler, e.g. __ArmVectorSwi(), __ArmVectorAbortData(), …
FIQ is the last entry in the table, so the FIQ handler can be placed directly at the FIQ vector address. This is usually done to save one jump from the FIQ vector address to the FIQ handler.
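
Every vector address in the table is a fixed offset from one of two bases, which the following sketch makes explicit. It is standard C++ written for illustration only; `Exception` and `VectorAddress` are hypothetical names, not Symbian or ARM APIs.

```cpp
#include <cstdint>

// Offsets of each vector from the base of the exception vector table,
// taken from the table above.
enum class Exception : std::uint32_t {
    Reset                = 0x00,
    UndefinedInstruction = 0x04,
    SoftwareInterrupt    = 0x08,
    PrefetchAbort        = 0x0C,
    DataAbort            = 0x10,
    Irq                  = 0x18,
    Fiq                  = 0x1C
};

// Address the PC is set to when the given exception is taken.
// highVectors selects the 0xFFFF0000 table instead of the one at 0.
constexpr std::uint32_t VectorAddress(Exception e, bool highVectors) {
    return (highVectors ? 0xFFFF0000u : 0x00000000u)
           + static_cast<std::uint32_t>(e);
}
```

`VectorAddress(Exception::DataAbort, false)` gives 0x00000010 and `VectorAddress(Exception::PrefetchAbort, true)` gives 0xFFFF000C, matching the corresponding rows of the table.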

With the ARM processor modes and exception handling covered, let us now look at a few methods that can be used to track down a bug that causes an exception.

When an exception occurs, the ARM CPU performs the following actions:

  1. Saves the return link in R14_<mode>
  2. Saves the CPSR (Current Program Status Register) in SPSR_<mode> (Saved Program Status Register)
  3. Sets the exception mode in the CPSR
  4. Switches to the ARM instruction set
  5. Disables IRQ
  6. Disables FIQ as well if <mode> == fiq (FIQ is also disabled on reset)
  7. Jumps to the exception handler through the vector table (explained earlier)
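
To make these steps concrete, here is a minimal model of what the core does on entry to a data abort. It is a sketch in standard C++, not real kernel or hardware code: `AbortState` and `EnterDataAbort` are invented names, the vector base is passed in as a parameter because it depends on the system configuration, and the link value of "faulting instruction + 8" is the data abort case specifically.

```cpp
#include <cstdint>

// Register values of interest right after the core has taken a data abort.
struct AbortState {
    std::uint32_t cpsr;      // CPSR after exception entry
    std::uint32_t spsr_abt;  // copy of the CPSR from before the exception
    std::uint32_t r14_abt;   // return link
    std::uint32_t pc;        // address of the data abort vector
};

// Models the entry sequence for a data abort taken at faultingInstr.
constexpr AbortState EnterDataAbort(std::uint32_t faultingInstr,
                                    std::uint32_t oldCpsr,
                                    std::uint32_t vectorBase) {
    return AbortState{
        // Clear the mode bits [4:0] and the Thumb bit (bit 5), then set
        // abort mode (10111) and the IRQ disable bit (bit 7).
        (oldCpsr & ~0x3Fu) | 0x17u | 0x80u,
        oldCpsr,               // SPSR_abt preserves the old CPSR
        faultingInstr + 8u,    // R14_abt for a data abort
        vectorBase + 0x10u     // data abort vector offset
    };
}
```

Feeding in a faulting address of 0xF80818C0 and an old CPSR of 0x80000013 (supervisor mode) produces R14_abt = 0xF80818C8, SPSR_abt = 0x80000013 and CPSR = 0x80000097.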

On an exception, the Symbian kernel:

  1. Runs the first half of the exception handler (__ArmVectorAbortData()) in the exception mode.
  2. Runs the second half of the exception handler (Exc::Dispatch()) in supervisor mode and ends up in Kern::Fault(). The kernel faults and the phone reboots.
  3. If the Kernel Debugger is present, it is run after the kernel fault. For information about the kernel debugger and its available commands, refer to the “Device Driver Guide for EKA2 versions of Symbian OS”.

A sample Debugging Session


Now let us take an example and try to locate and fix the bug. In the example, a ROM debug monitor is present and runs when the kernel faults. The debug monitor itself is not explained in this document.

After a new application is added, the driver crashes and causes a system reboot.

In this scenario, the output on the debug port looks like this:
[Image: debug_monitor.jpg]

Extracting information from Kernel register Dump

[Image: exception_details.jpg]

In our example the abort is a data abort (Exc = 1), with FSR = 5 and FAR = 0.
Since it is a data abort, the FSR and FAR are valid.
FSR = 5 (binary 0101) indicates that the error is “Translation section”.
FAR = 0 is the address that triggered the exception.
From this it looks like the problem was caused by dereferencing a NULL pointer.

Now we know why the fault occurred. What next?
Let us try to find the instruction that dereferences the NULL pointer.

Inspecting the ARM registers

Using Debug Monitor
When the kernel faults, the Symbian Kernel Debugger runs. (It is not present in production ROMs; it is used for debugging purposes only.)
The command “r” dumps the contents of the ARM registers.

The contents of registers in the example scenario are:

MODE_USR:
R0=6571de54 R1=0000002a R2=00000002 R3=ffffffff
R4=0000002a R5=f8170414 R6=6571df14 R7=6403cba8
R8=00000001 R9=6403c41c R10=640002f8 R11=6571de70
R12=00000020 R13=00404e00 R14=f80818c0 R15=f800bfa8
CPSR=80000097
MODE_FIQ:
R8=00000000 R9=ffffffff R10=ffffffff R11=00000000
R12=00000000 R13=64000d0c R14=c080079c SPSR=e00000dc
MODE_IRQ:
R13=6400110c R14=00000013 SPSR=80000013
MODE_SVC:
R13=6571de54 R14=f80328bc SPSR=60000010
MODE_ABT:
R13=6400090c R14=f80818c8 SPSR=80000013
MODE_UND:
R13=6400090c R14=95221110 SPSR=f000009d

The register contents can also be obtained using an IDE that supports on-target debugging.

From bits [4:0] of the CPSR (0x97 & 0x1F = 10111) it is clear that the current processor mode is abort, and the SPSR of abort mode gives the previous mode.
In our case SPSR=80000013, whose bits [4:0] are 10011, so the previous mode was supervisor (svc). This indicates that a kernel-side component dereferenced the NULL pointer.

The Link Register R14:
In the case of a data abort, the link register R14_ABT points to faulty_inst + 8. This does not hold for other exceptions; for a prefetch abort, for example, R14_ABT points to faulty_inst + 4.

In our case the value of R14_ABT is f80818c8, so the instruction that generated this exception is at address “f80818c0”.
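
The subtraction can be written down as a small helper. This is an illustrative sketch in standard C++; `AbortKind` and `FaultingInstruction` are hypothetical names, and the offsets assume the fault was taken from ARM state (8 for a data abort, 4 for a prefetch abort).

```cpp
#include <cstdint>

enum class AbortKind { Prefetch, Data };

// Recover the address of the faulting instruction from R14_abt.
// Data abort:     R14_abt = faulting instruction + 8
// Prefetch abort: R14_abt = faulting instruction + 4
constexpr std::uint32_t FaultingInstruction(std::uint32_t r14_abt,
                                            AbortKind kind) {
    return r14_abt - (kind == AbortKind::Data ? 8u : 4u);
}
```

With the R14_ABT value from the register dump, `FaultingInstruction(0xF80818C8, AbortKind::Data)` returns 0xF80818C0.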

The information we have so far:

  1. The exception type: “Data Abort”
  2. The cause: dereferencing a NULL pointer
  3. A kernel-side component performed the access
  4. The address of the instruction that caused the exception: f80818c0

How to proceed further

Restart the hardware, insert a breakpoint at address “f80818c0”, and run the application. (If the IDE does not support unwinding the call stack after the exception has occurred, insert two breakpoints: one at the faulty instruction and one at the instruction before it.)

When execution stops at the breakpoint, unwinding the call stack will show the calling method. In most cases this will be a device driver that is not part of the standard Symbian delivery.
Inspecting the source code of the method will reveal the exact location and instruction causing the exception. Most probably the cause is a parameter or variable that was not checked against NULL.

Footer:
One golden rule of thumb: “Always check pointers against NULL before dereferencing them.” This holds for both user-side and kernel-side components.
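
A minimal sketch of the rule in standard C++ follows. `TRequest` and `RequestLength` are hypothetical names used only for illustration, and `KErrArgument` is defined locally as a stand-in with the same value as Symbian's error code of that name (-6).

```cpp
// Local stand-in for Symbian's KErrArgument error code.
constexpr int KErrArgument = -6;

// Hypothetical request structure passed to a driver.
struct TRequest {
    int iLength;
};

// Returns the request length, or KErrArgument if no request was supplied.
// The pointer is checked before it is dereferenced, so a NULL argument
// produces an error code instead of a data abort.
constexpr int RequestLength(const TRequest* aRequest) {
    return aRequest == nullptr ? KErrArgument : aRequest->iLength;
}
```

A caller that passes a NULL pointer gets `KErrArgument` back and can report the error, rather than taking down the kernel.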

References


  1. ARM Architecture Reference Manual
  2. Hardware debugging presentation, Symbian.
  3. Device Driver Guide for EKA2, Symbian Help manual, Symbian.

This tutorial does not help with debugging on the emulator.
Basic knowledge of the ARM architecture and of on-target debugging is a prerequisite.
