Cortex M0 HardFault_Handler and getting the fault address - debugging

I'm having a HardFault when executing my program. I've found dozens of ways to get PC's value, but I'm using Keil uVision 5 and none of them has worked.
As far as I know I'm not in a multitasking context, and PSP contains 0xFFFFFFF1, so adding 24 to it would cause overflow.
Here's what I've managed to get working (as in, it compiles and execute):
enum { r0, r1, r2, r3, r12, lr, pc, psr};
extern "C" void HardFault_Handler()
{
uint32_t *stack;
__ASM volatile("MRS stack, MSP");
stack += 0x20;
pc = stack[pc];
psr = stack[psr];
__ASM volatile("BKPT #01");
}
Note the "+= 0x20", which is here to compensate for C function stack.
Whenever I read the PC's value, it's 0.
Would anyone have working code for that?
Otherwise, here's how I do it manually:
Put a breakpoint on HardFault_Handler (the original one)
When it breaks, look as MSP
Add 24 to its value.
Dump memory at that address.
And there it is, 0x00000000.
What am I doing wrong?

A few problems with your code
uint32_t *stack;
__ASM volatile("MRS stack, MSP");
MRS supports register destinations only. Your assembler migt be clever enough to transfer it to a temporary register first, but I'd like to see the machine code generated from that.
If you are using some kind of multitasking system, it might use PSP instead of MSP. See the linked code below on how one can distinguish that.
pc = stack[pc];
psr = stack[psr];
It uses the previous values of pc and psr as an index. Should be
pc = stack[6];
psr = stack[7];
Whenever I read the PC's value, it's 0.
Your program might actually have jumped to address 0 (e.g. through a null function pointer), tried to execute the value found there, which was probably not a valid instruction but the initial SP value from the vector table, and faulted on that. This code
void (*f)(void) = 0;
f();
does exactly that, I'm seeing 0x00000000 at offset 24.
Would anyone have working code for that?
This works for me. Note the code choosing between psp and msp, and the __attribute__((naked)) directive. You could try to find some equivalent for your compiler, to prevent the compiler from allocating a stack frame at all.

Related

Debugging a hard fault in ARM Cortex-M4

I am at my wit's end trying to debug a hard fault on an EFR32BG12 processor. I've been following the instructions in the Silicon Labs knowledge base here:
https://www.silabs.com/community/mcu/32-bit/knowledge-base.entry.html/2014/05/26/debug_a_hardfault-78gc
I've also been using the Keil app note here to fill in some details:
http://www.keil.com/appnotes/files/apnt209.pdf
I've managed to get the hard fault to occur quite consistently in one place. When the hard fault occurs, the code from the knowledge base article gives me the following values (pushed onto the stack by the processor before calling the hard fault handler):
Name Type Value Location
~~~~ ~~~~ ~~~~~ ~~~~~~~~
cfsr uint32_t 0x20000 (Hex) 0x2000078c
hfsr uint32_t 0x40000000 (Hex) 0x20000788
mmfar uint32_t 0xe000ed34 (Hex) 0x20000784
bfar uint32_t 0xe000ed38 (Hex) 0x20000780
r0 uint32_t 0x0 (Hex) 0x2000077c
r1 uint32_t 0x8 (Hex) 0x20000778
r2 uint32_t 0x0 (Hex) 0x20000774
r3 uint32_t 0x0 (Hex) 0x20000770
r12 uint32_t 0x1 (Hex) 0x2000076c
lr uint32_t 0xab61 (Hex) 0x20000768
pc uint32_t 0x38dc8 (Hex) 0x20000764
psr uint32_t 0x0 (Hex) 0x20000760
Looking at the Keil app note, I believe a CFSR value of 0x20000 indicates a Usage Fault with the INVSTATE bit set, i.e.:
INVSTATE: Invalid state: 0 = no invalid state 1 = the processor has
attempted to execute an instruction that makes illegal use of the
Execution Program Status Register (EPSR). When this bit is set, the PC
value stacked for the exception return points to the instruction that
attempted the illegal use of the EPSR. Potential reasons: a) Loading a
branch target address to PC with LSB=0. b) Stacked PSR corrupted
during exception or interrupt handling. c) Vector table contains a
vector address with LSB=0.
The PC value pushed onto the stack by the exception (provided by the code from the knowledge base article) seems to be 0x38dc8. If I go to this address in the Simplicity Studio "Disassembly" window, I see the following:
00038db8: str r5,[r5,#0x14]
00038dba: str r0,[r7,r1]
00038dbc: str r4,[r5,#0x14]
00038dbe: ldr r4,[pc,#0x1e4] ; 0x38fa0
00038dc0: strb r1,[r4,#0x11]
00038dc2: ldr r5,[r4,#0x64]
00038dc4: ldrb r3,[r4,#0x5]
00038dc6: movs r3,r6
00038dc8: strb r1,[r4,#0x15]
00038dca: ldr r4,[r4,#0x14]
00038dcc: cmp r7,#0x6f
00038dce: cmp r6,#0x30
00038dd0: str r7,[r6,#0x14]
00038dd2: lsls r6,r6,#1
00038dd4: movs r5,r0
00038dd6: movs r0,r0
The address appears to be well past the end of my code. If I look at the same address in the "Memory" window, this is what I see:
0x00038DC8 69647561 2E302F6F 00766177 00000005 audio/0.wav.....
0x00038DD8 00000000 000F4240 00000105 00000000 ....#B..........
0x00038DE8 00000000 00000000 00000005 00000000 ................
0x00038DF8 0001C200 00000500 00001000 00000000 .Â..............
0x00038E08 00000000 F00000F0 02F00001 0003F000 ....ð..ð..ð..ð..
0x00038E18 F00004F0 06010005 01020101 01011201 ð..ð............
0x00038E28 35010121 01010D01 6C363025 2E6E6775 !..5....%06lugn.
0x00038E38 00746164 00000001 000008D0 00038400 dat.....Ð.......
Curiously, "audio/0.wav" is a static string which is part of the firmware. If I understand correctly, what I've learned here is that PC somehow gets set to this point in memory, which of course is not a valid instruction and causes the hard fault.
To debug the issue, I need to know how PC came to be set to this incorrect value. I believe the LR register should give me an idea. The LR register pushed onto the stack by the exception seems to be 0xab61. If I look at this location, I see the following in the Disassembly window:
1270 dp->sect = clst2sect(fs, clst);
0000ab58: ldr r0,[r7,#0x10]
0000ab5a: ldr r1,[r7,#0x14]
0000ab5c: bl 0x00009904
0000ab60: mov r2,r0
0000ab62: ldr r3,[r7,#0x4]
0000ab64: str r2,[r3,#0x18]
It looks to me like the problem occurs during this call specifically:
0000ab5c: bl 0x00009904
This makes me think that the problem occurs as a result of a corrupt stack, which causes clst2sect to return to an invalid part of memory rather than to 0xab60. The code for clst2sect is pretty innocuous:
/*-----------------------------------------------------------------------*/
/* Get physical sector number from cluster number */
/*-----------------------------------------------------------------------*/
DWORD clst2sect ( /* !=0:Sector number, 0:Failed (invalid cluster#) */
FATFS* fs, /* Filesystem object */
DWORD clst /* Cluster# to be converted */
)
{
clst -= 2; /* Cluster number is origin from 2 */
if (clst >= fs->n_fatent - 2) return 0; /* Is it invalid cluster number? */
return fs->database + fs->csize * clst; /* Start sector number of the cluster */
}
Does this analysis sound about right?
I suppose the problem I've run into is that I have no idea what might cause this kind of behaviour... I've tried putting breakpoints in all of my interrupt handlers, to see if one of them might be corrupting the stack, but there doesn't seem to be any pattern--sometimes, no interrupt handler is called but the problem still occurs.
In that case, though, it's hard for me to see how a program might try to execute code at a location well past the actual end of the code... I feel like a function pointer might be a likely candidate, but in that case I would expect to see the problem show up, e.g., where a function pointer is used. However, I don't see any function pointers used near where the error is occurring.
Perhaps there is more information I can extract from the debug information I've given above? The problem is quite reproducible, so if there's something I have not tried, but which you think might give some insight, I would love to hear it.
Thanks for any help you can offer!
After about a month of chasing this one, I managed to identify the cause of the problem. I hope I can give enough information here that this will be useful to someone else.
In the end, the problem was caused by passing a pointer to a non-static local variable to a state machine which changed the value at that memory location later on. Because the local variable was no longer in scope, that memory location was a random point in the stack, and changing the value there corrupted the stack.
The problem was difficult to track down for two reasons:
Depending on how the code compiled, the changed memory location could be something non-critical like another local variable, which would cause a much more subtle error. Only when I got lucky would the change affect the PC register and cause a hard fault.
Even when I found a version of the code that consistently generated a hard fault, the actual hard fault typically occurred somewhere up the call stack, when a function returned and popped the stack value into PC. This made it difficult to identify the cause of the problem--all I knew was that something was corrupting the stack before that function return.
A few tools were really helpful in identifying the cause of the problem:
Early on, I had identified a block of code where the hard fault usually occurred using GPIO pins. I would toggle a pin high before entering the block and low when exiting the block. Then I performed many tests, checking if the pin was high or low when the hard fault occurred, and used a sort of binary search to determine the smallest block of code which consistently contained all the hard faults.
The hard fault pushes a number of important registers onto the stack. These helped me confirm where the PC register was becoming corrupt, and also helped me understand that it was becoming corrupt as a result of a stack corruption.
Starting somewhere before that block of code and stepping forward while keeping an eye on local variables, I was able to identify a function call that was corrupting the stack. I could confirm this using Simplicity Studio's memory view.
Finally, stepping through the offending function in detail, I realized that the problem was occurring when I dereferenced a stored pointer and wrote to that memory location. Looking back at where that pointer value was set, I realized it had been set to point to a non-static local variable that was now out of scope.
Thanks to #SeanHoulihane and #cooperised, who helped me eliminate a few possible causes and gave me a little more confidence with the debugging tools.

Branch to an address using GCC inline ARM asm

I want to branch to a particular address(NOT a label) using ARM assembly, without modifying the LR register. So I go with B instead of BL or BX.
I want this to be done in GCC inline asm.
Here is the documentation, and here is what I 've tried:
#define JMP(addr) \
__asm__("b %0" \
: /*output*/ \
: /*input*/ \
"r" (addr) \
);
It is a C macro, that can be called with an address. When I run it I get the following error:
error: undefined reference to 'r3'
The error is because of the usage of "r". I looked into it a bit, and I've found that it could be a bug on gcc 4.9.* version.
BTW, I am using Android/Linux Gcc 4.9 cross compiler, on an OSX.
Also, I don't know wether I should have loaded something on Rm.
Cheers!
Edit:
I changed the macro to this, and I still get undefined reference to r3 and r4:
#define JMP(addr) \
__asm__("LDR r5,=%0\n\t" \
"LDR r4,[r5]\n\t"\
"ADD r4,#1\n\t" \
"B r4" \
: /*output*/ \
: /*input*/ \
"r" (addr) \
: /*clobbered*/ \
"r4" ,"r5" \
);
Explanation:
load the address of the variable to r5, then load the value of that address to r4. Then add 1 to LSB (emm required by ARM specification?). And finally Branch to that address.
Since you are programming in C, you could just use a plain C approach without any assembly at all: just cast the variable, that holds the pointer to address to which you want to jump, to a function pointer and call it right away:
((void (*)(void)) addr)();
just an explanation to this jungle of brackets:
with this code you are casting addr to a pointer (signified by the star (*)) to a function that takes no argument (the second void means there are no arguments) and that also returns nothing (first void). finally the last two brackets are the actual invocation of that function.
Google for "C function pointer" for more information about that approach.
But if that doesn't work for you and you still want to go with the assembly approach, the instruction that you are in looking for is in fact BX (not sure why you excluded that initially. but I can guess that the name "Branch and Exchange" mislead you to believe that the register argument is swapped (and thereby changed) with the program counter, which is NOT the case, but it confused me in the beginning, too).
For that just a simple recap of the instructions:
B would take a label as an argument. Actually the jump will be encoded as an offset from the current position, which tells the processor to jump that many instruction forwards or backwards (normally the compiler, assembler or linker will take care of calculating that offset for you). During execution, control flow will simply be transferred to that position without changing any register (this means also the link register LR will stay unchanged)
BX R0 will take the absolute (so not an offset) address from a register, in this case R0, and continue execution at that address. This is also done without changing any other register.
BL and BLX R0 are the corresponding counterparts to the previous two instruction. They will do the same thing control flow wise, but on top of that save the current program counter in the link register LR. This is needed if the called function is supposed to return later on.
so in essence, what you would need to do is:
asm("BX %0" : : "r"(addr));
instructing the compiler to make sure the variable addr is in a register (r), which you are promising to only read and not to change. on top of that, upon return you won't have changed (clobbered) any other register.
See here
https://gcc.gnu.org/onlinedocs/gcc/Constraints.html
for more information about inline assembly constraints.
To help you understand why there are also other solutions floating around, here some things about the ARM architecture:
the program counter PC is for many instruction accessible as a regular register R15. It's just an alias for that exact register number.
this means that almost all arithmetic and register altering instructions can take it as an argument. However, for many of them it is highly deprecated.
if you are looking at the disassembly of a program compiled to ARM code, any function will end with one of three things:
BX LR which does exactly what you want to do: take the content of the link register (LR is an alias for R14) and jump to that location, effectively returning to the caller
POP {R4-R11, PC} restoring the caller saved register and jumping back to the caller. This will almost certainly have counterpart of PUSH {R4-R11, LR} in the beginning of the function: your are pushing the content of the link register (the return address) onto the stack but store it back into the program counter effectively returning to the caller in the end
B branch to a different function, if this function ends with a tail call and leaving it up to that function to return to the original caller.
Hope that helps,
Martin
You can't branch to a register, you can only branch to a label. If you want to jump to address in a register you need to move it into the PC register (r15).
#define JMP(addr) \
__asm__("mov pc,%0" \
: /*output*/ \
: /*input*/ \
"r" (addr) \
);

GDB doesn't disassemble program running in RAM correctly

I have an application compiled using GCC for an STM32F407 ARM processor. The linker stores it in Flash, but is executed in RAM. A small bootstrap program copies the application from Flash to RAM and then branches to the application's ResetHandler.
memcpy(appRamStart, appFlashStart, appRamSize);
// run the application
__asm volatile (
"ldr r1, =_app_ram_start\n\t" // load a pointer to the application's vectors
"add r1, #4\n\t" // increment vector pointer to the second entry (ResetHandler pointer)
"ldr r2, [r1, #0x0]\n\t" // load the ResetHandler address via the vector pointer
// bit[0] must be 1 for THUMB instructions otherwise a bus error will occur.
"bx r2" // jump to the ResetHandler - does not return from here
);
This all works ok, except when I try to debug the application from RAM (using GDB from Eclipse) the disassembly is incorrect. The curious thing is the debugger gets the source code correct, and will accept and halt on breakpoints that I have set. I can single step the source code lines. However, when I single step the assembly instructions, they make no sense at all. It also contains numerous undefined instructions. I'm assuming it is some kind of alignment problem, but it all looks correct to me. Any suggestions?
It is possible that GDB relies on symbol table to check instruction set mode which can be Thumb(2)/ARM. When you move code to RAM it probably can't find this information and opts back to ARM mode.
You can use set arm force-mode thumb in gdb to force Thumb mode instruction.
As a side note, if you get illegal instruction when you debugging an ARM binary this is generally the problem if it is not complete nonsense like trying to disassembly data parts.
I personally find it strange that tools doesn't try a heuristic approach when disassembling ARM binaries. In case of auto it shouldn't be hard to try both modes and do an error count to decide which mode to use as a last resort.

Context switch using arm inline assembly

I have another question about an inline assembly instruction concerning a context switching. This code may work but I'm not sure at 100% so I submit this code to the pros of stackoverflow ;-)
I'm compiling using gcc (no optimization) for an arm7TDMI. At some point, the code must do a context switching.
/* Software Interrupt */
/* we must save lr in case it is called from SVC mode */
#define ngARMSwi(code) __asm__("SWI %0" : : "I"(code) : "lr")
// Note : code = 0x23
When I check the compiled code, I get this result :
svc 0x00000023
The person before me who coded this wrote "we must save lr" but in the compiled code, I don't see any traces of lr being saved.
The reason I think that code could be wrong is that the program run for some time before going into a reset exception and one of the last thing the code execute is a context switch...
The __asm__ statement lists lr as a clobbered register. This means that the compiler will save the register if it needs to.
As you're not seeing any save, I think you can assume the compiler was not using that register (in your testcase, at least).
I think that SWI instruction should be called in the user mode. if this is right. The mode of ARM is switched to SVC mode after this instruction. then the ARM core does the copy operation that the CPSR is copied into SPSR_svc and LR is copied into LR_svc. this should be used for saving the user mode cpu's context to return from svc mode. if your svc exception handler use lr like calling another function the lr register should be required to be preserved like using stack between the change of the mode. i guess the person before you wrote like that to talk about this situation.

Simple "Hello-World", null-free shellcode for Windows needed

I would like to test a buffer-overflow by writing "Hello World" to console (using Windows XP 32-Bit). The shellcode needs to be null-free in order to be passed by "scanf" into the program I want to overflow. I've found plenty of assembly-tutorials for Linux, however none for Windows. Could someone please step me through this using NASM? Thxxx!
Assembly opcodes are the same, so the regular tricks to produce null-free shellcodes still apply, but the way to make system calls is different.
In Linux you make system calls with the "int 0x80" instruction, while on Windows you must use DLL libraries and do normal usermode calls to their exported functions.
For that reason, on Windows your shellcode must either:
Hardcode the Win32 API function addresses (most likely will only work on your machine)
Use a Win32 API resolver shellcode (works on every Windows version)
If you're just learning, for now it's probably easier to just hardcode the addresses you see in the debugger. To make the calls position independent you can load the addresses in registers. For example, a call to a function with 4 arguments:
PUSH 4 ; argument #4 to the function
PUSH 3 ; argument #3 to the function
PUSH 2 ; argument #2 to the function
PUSH 1 ; argument #1 to the function
MOV EAX, 0xDEADBEEF ; put the address of the function to call
CALL EAX
Note that the argument are pushed in reverse order. After the CALL instruction EAX contains the return value, and the stack will be just like it was before (i.e. the function pops its own arguments). The ECX and EDX registers may contain garbage, so don't rely on them keeping their values after the call.
A direct CALL instruction won't work, because those are position dependent.
To avoid zeros in the address itself try any of the null-free tricks for x86 shellcode, there are many out there but my favorite (albeit lengthy) is encoding the values using XOR instructions:
MOV EAX, 0xDEADBEEF ^ 0xFFFFFFFF ; your value xor'ed against an arbitrary mask
XOR EAX, 0xFFFFFFFF ; the arbitrary mask
You can also try NEG EAX or NOT EAX (sign inversion and bit flipping) to see if they work, it's much cheaper (two bytes each).
You can get help on the different API functions you can call here: http://msdn.microsoft.com
The most important ones you'll need are probably the following:
WinExec(): http://msdn.microsoft.com/en-us/library/ms687393(VS.85).aspx
LoadLibrary(): http://msdn.microsoft.com/en-us/library/windows/desktop/ms684175(v=vs.85).aspx
GetProcAddress(): http://msdn.microsoft.com/en-us/library/ms683212%28v=VS.85%29.aspx
The first launches a command, the next two are for loading DLL files and getting the addresses of its functions.
Here's a complete tutorial on writing Windows shellcodes: http://www.codeproject.com/Articles/325776/The-Art-of-Win32-Shellcoding
Assembly language is defined by your processor, and assembly syntax is defined by the assembler (hence, at&t, and intel syntax) The main difference (at least i think it used to be...) is that windows is real-mode (call the actual interrupts to do stuff, and you can use all the memory accessible to your computer, instead of just your program) and linux is protected mode (You only have access to memory in your program's little cubby of memory, and you have to call int 0x80 and make calls to the kernel, instead of making calls to the hardware and bios) Anyway, hello world type stuff would more-or-less be the same between linux and windows, as long as they are compatible processors.
To get the shellcode from your program you've made, just load it into your target system's
debugger (gdb for linux, and debug for windows) and in debug, type d (or was it u? Anyway, it should say if you type h (help)) and between instructions and memory will be the opcodes.
Just copy them all over to your text editor into one string, and maybe make a program that translates them all into their ascii values. Not sure how to do this in gdb tho...
Anyway, to make it into a bof exploit, enter aaaaa... and keep adding a's until it crashes
from a buffer overflow error. But find exactly how many a's it takes to crash it. Then, it should tell you what memory adress that was. Usually it should tell you in the error message. If it says '9797[rest of original return adress]' then you got it. Now u gotta use ur debugger to find out where this was. disassemble the program with your debugger and look for where scanf was called. Set a breakpoint there, run and examine the stack. Look for all those 97's (which i forgot to mention is the ascii number for 'a'.) and see where they end. Then remove breakpoint and type the amount of a's you found out it took (exactly the amount. If the error message was "buffer overflow at '97[rest of original return adress]" then remove that last a, put the adress you found examining the stack, and insert your shellcode. If all goes well, you should see your shellcode execute.
Happy hacking...

Resources