ARM Cortex M4 SVC_Handler "UsageFault" - gcc

I'm writing a context switch for a personal mini ARM kernel project, entirely in assembly. The problem is that when I make an SVC call (svc 0) I enter SVC_Handler, but when I try to execute the next instruction I land in a different handler (UsageFault_Handler). The fault occurs before I can pop any of the registers in SVC_Handler.
Here's a register dump of my gdb screen (right after I enter SVC_Handler and encounter UsageFault_Handler):
(gdb) i r
r0 0x1 1
r1 0x20000bcc 536873932
r2 0x40004404 1073759236
r3 0x1 1
r4 0x0 0
r5 0xc 12
r6 0x3 3
r7 0x20000fe4 536874980
r8 0x1 1
r9 0x0 0
r10 0xb 11
r11 0xa 10
r12 0x2 2
sp 0x2001ffa8 0x2001ffa8
lr 0xfffffff1 4294967281
pc 0x8000188 0x8000188 <UsageFault_Handler>
cpsr 0x3 3
And my context switch:
activate:
cpsie i
/* save kernel state into msp */
mrs ip, msp
push {r4-r11,ip,lr}
/* retrieve routine parameters and switch to the process stack psp */
ldmfd r0!, {ip,lr}
msr control, ip
isb
msr psp, r0
/* software stack frame. load user state */
pop {r4-r11}
/* hardware stack frame. the cpu pops r0-r3, r12 (IP), LR, PC, xPSR automatically */
/* jump to user task*/
bx lr
SVC_Handler:
/* automatically use the msp as the sp when entering handler mode */
/* pop msp stack */
pop {r4-r11,ip,lr}
mov sp, ip
/* back to the thread mode if no other active exception */
bx lr
Not sure what could be causing this problem, because I made sure interrupts are enabled and initialized the SVC priority to 0x0 (highest priority). Also, I'm using an ARM Cortex-M4 STM32F411E evaluation board.

The problem was that my interrupt vector addresses were all even numbers (ARM state). Attempting to execute instructions when the T bit (the least significant bit of the vector address) is 0 results in a fault or lockup. Since Cortex-M runs only in Thumb state, I had to mark my exception handler as Thumb code by placing ".thumb_func" above it in my context-switch assembly.
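A minimal sketch of the fix, reusing the handler from the question (GNU as syntax; the .syntax/.thumb directives are assumed context, not from the original post). With .thumb_func, the linker sets bit 0 of the symbol's address, so the vector table entry is odd and the core stays in Thumb state:

.syntax unified
.thumb

.thumb_func            /* mark SVC_Handler as Thumb code so its vector */
SVC_Handler:           /* table entry gets the T bit (bit 0) set */
    pop {r4-r11,ip,lr}
    mov sp, ip
    bx lr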


AVR gcrt1.S weird call

So basically I have disassembled a simple unoptimized program and saw that it runs through gcrt1.S, so I dived into the assembly and tried to understand what exactly it does. Here is the code and my assumption of what it does:
00000034 CLR R1 Clear Register
00000035 OUT 0x3F,R1 Out to I/O location
00000036 SER R28 Set Register
00000037 LDI R29,0x08 Load immediate
00000038 OUT 0x3E,R29 Out to I/O location
00000039 OUT 0x3D,R28 Out to I/O location
0000003A CALL 0x00000040 Call subroutine
0000003C JMP 0x00000050 Jump
0000003E JMP 0x00000000 Jump
Clear R1
Clear status register
Set R28 to 1111 1111
Here is where my questions start:
Load R29 from 0x08 (PORTC ?)
OUT to SPH <-R29
OUT to SPL <-R28
Call Main
The confusion I have is why it loads a byte from the PORTC register, since the default would be 0x00 anyway.
The microcontroller is an ATmega328P (link to the datasheet).
Load R29 from 0x08 (PORTC ?)
The instruction is LDI R29,0x08, which loads the immediate value 8 into R29. LDI is "load immediate to register"; it does not read from memory, see section "31. Instruction Set Summary" in the ATmega328 manual you are using. The code is initializing the frame pointer Y (R29:R28) from the symbol __stack and then copying it into the stack pointer via SPH/SPL; see the startup code in gcrt1.S.
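A hedged reading of the corresponding gcrt1.S sequence (symbol names from avr-libc; __stack defaults to RAMEND, which is 0x08FF on the ATmega328P):

clr  r1                 ; r1 is avr-gcc's fixed zero register
out  0x3F, r1           ; clear SREG
ldi  r28, lo8(__stack)  ; lo8(0x08FF) = 0xFF, hence the SER R28 in the dump
ldi  r29, hi8(__stack)  ; hi8(0x08FF) = 0x08, hence the LDI R29,0x08
out  0x3E, r29          ; SPH
out  0x3D, r28          ; SPL: the stack pointer now points at the end of SRAM
call main               ; then jump to exit, as in the disassembly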

avr-gdb cannot understand my input address

I use simavr and avr-gdb to debug a .hex file, here is the problem:
(gdb) i r pc
pc 0xcd0 0xcd0
(gdb) x/10i 0xc4
0x8000c4: nop
0x8000c6: nop
0x8000c8: nop
0x8000ca: nop
0x8000cc: nop
0x8000ce: nop
0x8000d0: nop
0x8000d2: nop
0x8000d4: nop
0x8000d6: nop
(gdb) x/10i $pc-0xc0c
0xc4: eor r1, r1
0xc6: out 0x3f, r1 ; 63
0xc8: ldi r28, 0xFF ; 255
0xca: ldi r29, 0x08 ; 8
0xcc: out 0x3e, r29 ; 62
0xce: out 0x3d, r28 ; 61
0xd0: ldi r17, 0x05 ; 5
0xd2: ldi r26, 0x00 ; 0
0xd4: ldi r27, 0x01 ; 1
0xd6: ldi r30, 0xEA ; 234
(gdb)
It seems that avr-gdb cannot understand my input address and adds an offset.
I'm the author of simavr. Sorry, I'm not a member of Stack Overflow.
The reason you see these addresses is that gdb/gcc do not handle architectures with overlapping 'address spaces' very well. The AVR SRAM starts at 0x000, the AVR flash also starts at 0x000, and the EEPROM is ALSO considered to be at 0x000; this is the 'Harvard' architecture.
So, to make gcc/gdb work, everything is compiled into 'virtual address spaces' by adding an arbitrary constant to these offsets. The breakdown is that the flash is considered to be at 0x000 (fine!), the SRAM at 0x800000, and the EEPROM at 0x810000.
This keeps gcc/gdb happy; the price to pay is that you will see these weird addresses when debugging, as gdb firmly believes everything is at those offsets.
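For example (0x800100 is an illustrative SRAM address; the $pc arithmetic is taken from the transcript above): to inspect SRAM you add the 0x800000 offset yourself, and to disassemble flash you can anchor on $pc, which gdb already knows is code:

(gdb) x/4xb 0x800100    # SRAM byte 0x0100, through the 0x800000 data offset
(gdb) x/10i $pc-0xc0c   # flash 0xc4, reached relative to $pc as in the question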
The best way to handle this is to ... ignore it! There's very little we can do; I didn't come up with it, it was rolled into avr-gcc long before simavr started in 2009.
You can see the 'address decoder' for simavr addresses here; perhaps it makes things a little clearer:
https://github.com/buserror/simavr/blob/4c9efe1fc44b427a4ce1ca8e56e0843c39d0014d/simavr/sim/sim_gdb.c#L357
Hope this helps. If you have further questions, feel free to pop into freenode #simavr, or even open an 'issue' on GitHub.

LDMIA instruction not working correctly on external SRAM on Cortex-M4

I am using an STM32L486ZG board in Thumb mode, running a simple bare-metal application without any RTOS. I have external SRAM connected to the board through the FMC, located at address 0x60000000. The system is initialized and running at 72 MHz (I have tried this at frequencies from 18 to 80 MHz). Now, in my main function, I have the following code:
int main(void) {
    asm volatile (
        "push {r0}\n"
        "mov r0, #0x60000000\n"
        "add r0, #0x400\n"
        "stmdb r0!, {r1-r12}\n"
        "ldmia r0!, {r1-r12}\n"
        "pop {r0}\n"
    );
}
According to this code, no register should be changed after main has executed, but that's not the case after the following instruction:
ldmia r0!, {r1-r12}
r9 is not correct after execution. The stmdb instruction works correctly, but ldmia does not load the data correctly. I have verified this by viewing the contents of memory.
This issue persists with any arguments to the ldmia instruction: the 9th register is always affected.
Explanation:
Let's say I am debugging this code and the next instruction to execute is this:
stmdb r0!, {r1-r12}
After stepping over it, all these registers have been saved to memory and the value of r0 is 0x600003d0.
The contents of memory:
0x600003D0 00000000 40021008 0000000C .......#....
0x600003DC 40000000 00000000 00000000 ...#........
0x600003E8 20017FEC 00000000 00000000 ì.. ........
0x600003F4 00000000 00000000 00000000 ............
The contents of the registers:
r0 0x600003d0
r1 0x00000000
r2 0x40021008
r3 0x0000000c
r4 0x40000000
r5 0x00000000
r6 0x00000000
r7 0x20017fec
r8 0x00000000
r9 0x00000000
r10 0x00000000
r11 0x00000000
r12 0x00000000
This shows that all the registers were successfully saved to memory. Now I step over the next instruction:
ldmia r0!, {r1-r12}
After this, these are the contents of the registers:
r0 0x60000400
r1 0x00000000
r2 0x40021008
r3 0x0000000c
r4 0x40000000
r5 0x00000000
r6 0x00000000
r7 0x20017fec
r8 0x00000000
r9 0x555555d5
r10 0x00000000
r11 0x00000000
r12 0x00000000
As you can see, all the registers are restored except r9, which oddly has its value "popped" from 0x60000000 instead of 0x600003F0.
Any idea what could be causing this issue? I am using a J-Link to write to flash.
P.S. This issue doesn't occur when the registers are saved to on-chip SRAM as opposed to external SRAM.
Edit:
If the instruction
ldmia r0!, {r1-r12}
is split into two parts like:
ldmia r0!, {r1-r6}
ldmia r0!, {r7-r12}
then all the registers are restored successfully.
You need to read the STM32L4xx6xx silicon limitations document (DocID026121 Rev 4, available from ST), section 2.2.4, "Read burst access of nine words or more is not supported by FMC":
"CPU read burst access equal to or more than 9 registers to FMC returns corrupted data
starting from the 9th read word. These bursts can only be generated by Cortex®-M4 CPU
and not by the other masters (i.e not by DMA).
This issue occurs when the stack is remapped on the external memory on the FMC and
POP operations are performed with 9 or more registers.
This also occurs when LDM/VLDM operations are used with 9 or more registers."
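Consistent with that errata entry and with the asker's own edit, a workaround sketch is to keep every read burst from FMC-mapped memory below nine words:

stmdb r0!, {r1-r12}   @ stores are unaffected by this erratum
ldmia r0!, {r1-r6}    @ six-word read burst, below the nine-word limit
ldmia r0!, {r7-r12}   @ second six-word read burst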

SysTick Interrupt pending but won't execute, debug interrupt mask issue?

I've been trying to get a SysTick interrupt to work on a TM4C123GH6PM7. It's a Cortex-M4-based microcontroller. When using the Keil debugger I can see that the SysTick interrupt is pending in the NVIC, but it won't execute the handler. There are no other exceptions enabled and I have cleared the PRIMASK register. The code below is how I initialize the interrupt:
systck_init
        LDR     R0,=NVIC_ST_CTRL_R      ; SysTick control/status register
        LDR     R1,=NVIC_ST_RELOAD_R    ; SysTick reload value register
        LDR     R2,=NVIC_ST_CURRENT_R   ; SysTick current value register
        MOV     R3,#0
        STR     R3,[R0]                 ; disable SysTick while configuring
        STR     R3,[R2]                 ; clear the current count
        MOV     R3,#0x000020
        STR     R3,[R1]                 ; reload value of 0x20 (32 ticks)
        MOV     R3,#7
        STR     R3,[R0]                 ; enable counter, interrupt, core clock
        LDR     R3,=NVIC_EN0_R
        LDR     R4,[R3]
        ORR     R4,#0x00008000
        STR     R4,[R3]                 ; set NVIC EN0 bit 15
        CPSIE   I                       ; clear PRIMASK
        MOV     R3,#0x3
        MSR     CONTROL,R3              ; unprivileged thread mode, SP = PSP
After a lot of searching, I found that it may be the debugger masking all interrupts. The bit that controls this is in a register called the Debug Halting Status and Control Register (DHCSR). Though I can't seem to view it in the debugger, nor read/write it with debug commands.
I used the Startup.s supplied by Keil and as far as I can tell the vectors/labels are correct.
And yes I know. Why bother doing it all in assembly.
Any ideas would be greatly appreciated. First time posting :)
I can see that the Systick interrupt is pending int NVIC
SysTick has neither enable nor pending register bits in the NVIC. It is special that way, being tightly coupled to the MCU core itself.
Using 0x20 for the reload value is also dangerously low. You may get "stuck" in the SysTick handler, unable to leave it, because the next interrupt triggers too early. Remember that the Cortex-M4 requires at least 12 clocks each to enter and to exit an interrupt handler; that consumes 24 out of your 32 cycles.
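For illustration only, assuming a 16 MHz core clock (an invented figure) and remembering that SysTick fires every RELOAD+1 clocks, a 1 ms tick leaves plenty of headroom:

        LDR     R0,=NVIC_ST_RELOAD_R    ; symbol from the question's code
        LDR     R3,=15999               ; 16 MHz / 1 kHz - 1
        STR     R3,[R0]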
Additional hint: your last instruction changes the register used for the SP from MSP to PSP, but I don't see your code setting up the PSP first.
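A minimal sketch of what is missing, where PSP_TOP is a hypothetical label for the top of a stack area you reserve for thread mode:

        LDR     R0,=PSP_TOP             ; hypothetical top-of-process-stack address
        MSR     PSP,R0                  ; PSP must hold a valid stack top first
        ISB
        MOV     R3,#0x3                 ; unprivileged thread mode, SP = PSP
        MSR     CONTROL,R3
        ISB                             ; barrier required after writing CONTROL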
Be sure to implement the HardFault_Handler; your code most likely triggers it.

What is ALIGN in arch/i386/kernel/head.S in the Linux source code?

In the head.S file in the Linux source code at arch/i386/kernel/head.S, ALIGN is used as seen in the code snippet below, after a ret instruction. My question is: what is this ALIGN? As far as I know it is neither an instruction nor an assembler directive, so what is it, and why is it used here?
You can get the code of head.S at the site below:
http://kneuro.net/cgi-bin/lxr/http/source/arch/i386/kernel/head.S?v=2.4.0
Path: arch/i386/kernel/head.S
/*
* We depend on ET to be correct. This checks for 287/387.
*/
check_x87:
movb $0,X86_HARD_MATH
clts
fninit
fstsw %ax
cmpb $0,%al
je 1f
movl %cr0,%eax
xorl $4,%eax
movl %eax,%cr0
ret
ALIGN /* why ALIGN is used and what it is? */
1: movb $1,X86_HARD_MATH
.byte 0xDB,0xE4
ret
Actually, ALIGN is just a macro, defined in the include/linux/linkage.h file:
#ifdef __ASSEMBLY__
#define ALIGN __ALIGN
And the __ALIGN definition depends on the architecture. For x86 you have the following definition (in kernel 2.4), in the same file:
#if defined(__i386__) && defined(CONFIG_X86_ALIGNMENT_16)
#define __ALIGN .align 16,0x90
#define __ALIGN_STR ".align 16,0x90"
#else
#define __ALIGN .align 4,0x90
#define __ALIGN_STR ".align 4,0x90"
#endif
So in the end the ALIGN macro is just the .align asm directive, giving either 4-byte or 16-byte alignment (depending on the CONFIG_X86_ALIGNMENT_16 option value), with 0x90 as the fill byte.
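As a small standalone illustration (the labels are invented for the example), the 0x90 fill is the x86 NOP opcode, so the padding is harmless even if execution falls through it:

.text
func_a:
        ret
        .align 16,0x90          /* pad to a 16-byte boundary with NOP bytes */
func_b:                         /* entry now sits on a 16-byte boundary */
        ret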
You can figure out your CONFIG_X86_ALIGNMENT_16 option value from the arch/i386/config.in file. This value depends on your processor family.
Another question is why such alignment is needed at all. My understanding is as follows. Usually the CPU can only access aligned addresses on the bus (for a 32-bit bus the address usually should be aligned to 4 bytes, e.g. you can access addresses 0x0, 0x4, 0x8, etc., but you can't access 0x1 or 0x3, because that would be an unaligned access on the bus).
But here I believe that's not the issue, and the alignment is done only for performance reasons. Basically this alignment allows the CPU to fetch the code at the 1: label more quickly:
ALIGN
1: movb $1,X86_HARD_MATH
.byte 0xDB,0xE4
ret
So it seems this ALIGN is just a minor optimization.
See also these topics:
[1] Why should code be aligned to even-address boundaries on x86?
[2] Performance optimisations of x86-64 assembly - Alignment and branch prediction
