Context switch using arm inline assembly - gcc

I have another question about an inline assembly instruction, this time concerning a context switch. The code may work, but I'm not 100% sure, so I'm submitting it to the pros of Stack Overflow ;-)
I'm compiling with gcc (no optimization) for an ARM7TDMI. At some point, the code must perform a context switch.
/* Software Interrupt */
/* we must save lr in case it is called from SVC mode */
#define ngARMSwi(code) __asm__("SWI %0" : : "I"(code) : "lr")
// Note : code = 0x23
When I check the compiled code, I get this result:
svc 0x00000023
The person before me who coded this wrote "we must save lr", but in the compiled code I don't see any trace of lr being saved.
The reason I think this code could be wrong is that the program runs for some time before taking a reset exception, and one of the last things it executes is a context switch...

The __asm__ statement lists lr as a clobbered register. This means that the compiler will save the register if it needs to.
As you're not seeing any save, I think you can assume the compiler was not using that register (in your testcase, at least).

The SWI instruction is normally issued from User mode. Executing it switches the core to SVC mode, and as part of the exception entry the CPSR is copied into SPSR_svc and the return address is copied into LR_svc; this is what allows execution to resume after the SVC handler returns. If your SVC exception handler itself uses lr, for example by calling another function, then lr_svc has to be preserved (typically on the stack) across that call, or the return address is lost. I guess that is the situation the previous developer's comment refers to.
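To make the point concrete, here is a minimal sketch (not the original project's handler; the helper name handle_swi and the use of GCC's interrupt("SWI") attribute are my assumptions) of a SWI handler written in C that calls into another function. Because the handler contains a call, the compiler must preserve lr_svc in its prologue; the clobber in the ngARMSwi macro only affects code generation at the call site.

/* Hedged sketch for a classic ARM core built with GCC; handle_swi is a
   hypothetical helper, not part of the original code. */
void handle_swi(unsigned long swi_number);

void __attribute__((interrupt("SWI"))) swi_handler(void)
{
    /* The SWI number could be decoded from the instruction at lr - 4;
       that step is omitted to keep the sketch short. */
    handle_swi(0x23);

    /* Because of the call above, GCC saves lr_svc in the prologue and
       restores it before the exception return. */
}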

Related

Cortex M3 - Calling a SVC inside a C function, and returning to thread mode

I am writing system calls, as recommended by Joseph Yiu (M3 Guide), by taking the arguments from the stack. The assembly SVC_Handler looks like this:
SVC_Handler:
MOV R0, #0
MSR CONTROL, R0
CMP LR, #0xFFFFFFFD
BEQ KernelEntry
B KernelExit
KernelEntry:
<save current user stack>
B svchandler_main
KernelExit:
<code to restore the very same saved user stack saved before>
MOV LR, #0xFFFFFFFD
BX LR
Well, svchandler_main is a C function that recovers the immediates (the arguments of the system call), creates the kernel stack and branches to 0xFFFFFFF9 (MSP, privileged). The system call itself is made like this:
#define svc(code) asm volatile ("svc %[immediate]"::[immediate] "I" (code))
void SysCall_MyCall(int32_t args)
{
svc(CallBack_Number);
}
That said, the callback function, running in handler mode:
void SysCallBack(void* args)
{
/* <my c routine>*/
asm volatile("svc #0"); //to exit the kernel mode
}
The last SVC is issued so that the assembly SVC_Handler can identify that it is coming from handler mode (MSP, privileged) and exit the kernel - a kind of cooperative scheduling in the kernel. The problem is that the context is saved with PSP pointing inside SysCall_MyCall, so execution returns there and never exits. If I use inline functions, I will lose the handy svchandler_main. Any ideas? I didn't include svchandler_main here because it is classic code found in ARM application notes. Thanks.
Edit, to clarify: I am not branching to the callback function INSIDE the SVC handler. It creates the callback stack, changes LR to 0xFFFFFFF9 and executes a BX LR, exiting the interrupt and continuing on the indicated MSP. To exit the kernel another SVC is issued, and the user thread is resumed.
It seems that you misunderstand how exception entry and return works on the Cortex-M. When you issue an SVC instruction from thread mode, the CPU transitions to handler mode just as for any other exception.
Handler mode is always privileged, and always uses the main stack (MSP). Thread mode can be either privileged or unprivileged depending on the nPRIV bit (bit 0) in the CONTROL register, and may be configured to use the process stack (PSP) by setting the SPSEL bit (bit 1) in the CONTROL register from thread mode.
On entry to handler mode, r0-r3, r12, lr, pc and xPSR are pushed to the active stack (PSP or MSP, depending on which is in use) and an exception return value is loaded into lr. The stack is switched to MSP. At the end of the handler, a BX lr instruction (or equivalent) causes this value to be used as a branch target, which automatically causes a restoration of the previous mode and stack, and pops r0-r3, r12, lr, pc and xPSR. The pop of pc restores execution from the place where the interrupt occurred.
The important thing about this mechanism is that it is 100% compatible with the ARM ABI. In other words, it is possible to write an ordinary function and use it as an exception handler, just by placing the address of the function in the appropriate place in the interrupt vector table. That's because the return at the end of a function is actioned by BX lr or equivalent, which is exactly the same instruction that triggers a return from handler mode.
So to write an SVC handler that makes use of callbacks, it is necessary to:
Work out which stack was in use when the SVC instruction was issued
Dig out the stacked pc to find the address of the SVC instruction itself, and extract the 8-bit constant from within the SVC instruction
Use the constant to work out which callback to invoke
Branch to (not call) the appropriate callback
The callback can be a perfectly ordinary function. When it returns, it will trigger the return from handler mode because the appropriate exception return code will still be in lr.
A handler that does all of this is presented in the M3 Guide, chapter 10.
If it is required that the callback receives arguments, this is a bit more complex but I can expand my answer if you'd like. Generally handler callbacks execute in handler mode (that's pretty much the point of SVC). If for some reason you need the callback to be executed without privilege, that's more complex still; there is an example in Chapter 23 of the M3 guide, though. You refer in the comments to not wanting to "manage nested interrupts" but really nested interrupts just manage themselves.
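For reference, here is a minimal sketch of those steps in GCC syntax (the split into a naked wrapper plus a C routine named svc_handler_c is my own arrangement; chapter 10 of the M3 Guide remains the authoritative version):

#include <stdint.h>

/* Naked wrapper: determine which stack was active, then branch (not call)
   to a C routine with the address of the stacked frame in r0. */
void __attribute__((naked)) SVC_Handler(void)
{
    __asm volatile (
        "tst   lr, #4        \n"   /* EXC_RETURN bit 2: 0 = MSP, 1 = PSP */
        "ite   eq            \n"
        "mrseq r0, msp       \n"
        "mrsne r0, psp       \n"
        "b     svc_handler_c \n"   /* plain branch keeps EXC_RETURN in lr */
    );
}

/* Ordinary C function; when it returns, the EXC_RETURN value still in lr
   triggers the exception return, exactly as described above. */
void svc_handler_c(uint32_t *frame)
{
    uint8_t svc_number = ((uint8_t *)frame[6])[-2];  /* byte before the stacked pc */

    switch (svc_number) {
    /* branch to the appropriate callback here */
    default:
        break;
    }
}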

Cortex M0 HardFault_Handler and getting the fault address

I'm having a HardFault when executing my program. I've found dozens of ways to get PC's value, but I'm using Keil uVision 5 and none of them has worked.
As far as I know I'm not in a multitasking context, and PSP contains 0xFFFFFFF1, so adding 24 to it would cause overflow.
Here's what I've managed to get working (as in, it compiles and execute):
enum { r0, r1, r2, r3, r12, lr, pc, psr};
extern "C" void HardFault_Handler()
{
uint32_t *stack;
__ASM volatile("MRS stack, MSP");
stack += 0x20;
pc = stack[pc];
psr = stack[psr];
__ASM volatile("BKPT #01");
}
Note the "+= 0x20", which is here to compensate for C function stack.
Whenever I read the PC's value, it's 0.
Would anyone have working code for that?
Otherwise, here's how I do it manually:
Put a breakpoint on HardFault_Handler (the original one)
When it breaks, look at MSP.
Add 24 to its value.
Dump memory at that address.
And there it is, 0x00000000.
What am I doing wrong?
A few problems with your code
uint32_t *stack;
__ASM volatile("MRS stack, MSP");
MRS supports register destinations only. Your assembler might be clever enough to transfer it to a temporary register first, but I'd like to see the machine code generated from that.
If you are using some kind of multitasking system, it might use PSP instead of MSP. See the linked code below on how one can distinguish that.
pc = stack[pc];
psr = stack[psr];
This uses the previous values of pc and psr as indices. It should be:
pc = stack[6];
psr = stack[7];
Whenever I read the PC's value, it's 0.
Your program might actually have jumped to address 0 (e.g. through a null function pointer), tried to execute the value found there, which was probably not a valid instruction but the initial SP value from the vector table, and faulted on that. This code
void (*f)(void) = 0;
f();
does exactly that; I'm seeing 0x00000000 at offset 24.
Would anyone have working code for that?
This works for me. Note the code choosing between psp and msp, and the __attribute__((naked)) directive. You could try to find some equivalent for your compiler, to prevent the compiler from allocating a stack frame at all.
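In case a concrete starting point helps, here is a rough GCC-flavoured sketch of that pattern (the name hard_fault_c and the exact register choices are mine, not the linked code, and Keil's toolchain needs its own spelling of the naked attribute and the inline assembly):

#include <stdint.h>

/* Ordinary C function: receives a pointer to the hardware-stacked frame in r0. */
void hard_fault_c(uint32_t *frame)
{
    volatile uint32_t stacked_pc  = frame[6];   /* pc at the time of the fault   */
    volatile uint32_t stacked_psr = frame[7];   /* xPSR at the time of the fault */
    (void)stacked_pc;
    (void)stacked_psr;
    __asm volatile ("bkpt #1");                 /* inspect the locals here */
}

/* Naked wrapper: no compiler-generated prologue, so MSP/PSP still point at
   the stacked frame when we read them. Uses only ARMv6-M instructions. */
void __attribute__((naked)) HardFault_Handler(void)
{
    __asm volatile (
        "movs r0, #4              \n"
        "mov  r1, lr              \n"
        "tst  r0, r1              \n"   /* EXC_RETURN bit 2 selects the stack */
        "beq  1f                  \n"
        "mrs  r0, psp             \n"
        "b    2f                  \n"
        "1: mrs r0, msp           \n"
        "2: ldr r1, =hard_fault_c \n"
        "bx   r1                  \n"   /* tail-branch; lr keeps EXC_RETURN */
        ".ltorg                   \n"   /* keep the literal pool in range */
    );
}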

PPC64 subroutine call from inline assembly

I am trying to write a simple function using inline assembly in C for powerpc64. My function calls another function, and I have a couple of questions related to that.
1) How do I save the LR register before branching to the subroutine with 'bl'?
Specifically, for this code:
void *func(void *arg1, void *arg2)
{
    void *result;
    __asm__ volatile (
        ...
        ...
        "bl <address>\n"   // Call to subroutine
        "nop\n"
        ...
        : [result]"=r"(result)
        : [arg1]"r"(arg1),
          [arg2]"r"(arg2)
    );
    return result;
}
The compiler generates the prologue code for this without the "mflr 0; std 0, 16(1)" instructions to save LR, since it does not know that a subroutine is being called in my assembly code. Do I include these instructions in my assembly code? If so, how do I know the stack size created by the compiler's prologue code, so that I can reach the LR save area of the function calling 'func'? (According to the PowerPC assembly tutorials on developerWorks, the LR register needs to be saved in the 'calling' function's stack frame.)
2) I believe I will need to save arg1 and arg2 before calling the subroutine. Which is the right place to temporarily store these parameters before making the call - the parameter save area or non-volatile registers? I just want to know how this is done in production-quality ppc64 code.
Thanks in advance!
AFAIK there is no way to do this properly. Inline asm is not designed for calling functions.
You can't reliably know the size of the stack frame the compiler has generated, in fact the entire function could be inlined, or as you observed the compiler might not generate a stack frame at all.
But you don't have to store LR in the caller's stack frame; it's best if you do, but it's not 100% required. So just put it in a non-volatile register, mark that register as clobbered, and restore it on the way back.
You shouldn't need to save arg1 and arg2, but what you must do is mark all the volatile registers as clobbered. Then the compiler will save anything that is in volatile registers (like arg1 and arg2) before it calls your asm. Also remember that some CR fields might be clobbered. I'd also add 'memory' to the clobbers so that GCC is pessimistic about optimising across the asm.
If you do all that it might work, unless I'm forgetting something :)
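To make that concrete, here is a very rough sketch of the "save LR in a non-volatile register and clobber everything volatile" approach for the ELFv2 ABI (my_target is a hypothetical external callee; treat this as a starting point rather than production code):

extern void *my_target(void *, void *);   /* hypothetical callee */

void *func(void *arg1, void *arg2)
{
    register void *r3 __asm__("r3") = arg1;   /* 1st argument / return value */
    register void *r4 __asm__("r4") = arg2;   /* 2nd argument                */
    unsigned long saved_lr;

    __asm__ volatile (
        "mflr %[savelr] \n\t"   /* stash LR in a compiler-chosen non-volatile */
        "bl   my_target \n\t"   /* the call overwrites LR                     */
        "nop            \n\t"   /* TOC restore slot after an external call    */
        "mtlr %[savelr] \n\t"   /* put LR back before the compiler's epilogue */
        : "+r"(r3), "+r"(r4), [savelr] "=&r"(saved_lr)
        :
        : "r0", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12",
          "ctr", "xer", "cr0", "cr1", "cr5", "cr6", "cr7", "lr", "memory");

    return r3;   /* the result comes back in r3 */
}

Because saved_lr is an output operand and every volatile register is listed as clobbered, GCC allocates it to a non-volatile register and takes care of saving and restoring that register in the prologue and epilogue.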

GDB doesn't disassemble program running in RAM correctly

I have an application compiled with GCC for an STM32F407 ARM processor. The linker places it in Flash, but it is executed from RAM. A small bootstrap program copies the application from Flash to RAM and then branches to the application's ResetHandler.
memcpy(appRamStart, appFlashStart, appRamSize);
// run the application
__asm volatile (
"ldr r1, =_app_ram_start\n\t" // load a pointer to the application's vectors
"add r1, #4\n\t" // increment vector pointer to the second entry (ResetHandler pointer)
"ldr r2, [r1, #0x0]\n\t" // load the ResetHandler address via the vector pointer
// bit[0] must be 1 for THUMB instructions otherwise a bus error will occur.
"bx r2" // jump to the ResetHandler - does not return from here
);
This all works OK, except that when I try to debug the application from RAM (using GDB from Eclipse) the disassembly is incorrect. The curious thing is that the debugger gets the source code right, and will accept and halt on breakpoints that I have set. I can single-step the source code lines. However, when I single-step the assembly instructions, they make no sense at all, and the disassembly contains numerous undefined instructions. I'm assuming it is some kind of alignment problem, but it all looks correct to me. Any suggestions?
It is possible that GDB relies on the symbol table to determine the instruction set mode, which can be Thumb(2) or ARM. When you move code to RAM it probably can't find this information and falls back to ARM mode.
You can use set arm force-mode thumb in GDB to force Thumb-mode disassembly.
As a side note, if you get illegal instructions while debugging an ARM binary, this is generally the problem, provided the output is not complete nonsense like an attempt to disassemble data sections.
I personally find it strange that tools don't try a heuristic approach when disassembling ARM binaries. In the auto case it shouldn't be hard to try both modes and use an error count to decide which mode to use as a last resort.
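For example, a quick check from the GDB console (generic commands, nothing project-specific) could be:

(gdb) set arm force-mode thumb
(gdb) disassemble /r $pc,+32

show arm force-mode reports the current setting if you want to confirm it took effect.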

inline assembly error: can't find a register in class 'GENERAL_REGS' while reloading 'asm'

I have an inline AT&T-style assembly block which works with XMM registers. There are no problems in the Release configuration of my Xcode project; however, I've stumbled upon this strange error (which is supposedly a GCC bug) in the Debug configuration... Can I fix it somehow? There is nothing special in the assembly code, but I am using a lot of memory constraints (12 constraints); could this be the cause of the problem?
Not a complete answer, sorry, but the comments section is too short for this ...
Can you post a sample asm("..." :::) line that demonstrates the problem?
The use of XMM registers is not the issue, the error message indicates that GCC wanted to create code like, say:
movdqa (%rax),%xmm0
i.e. memory loads/stores through pointers held in general-purpose registers, and you specified more memory locations than there are general-purpose registers available (it's probably 12 in debug mode because RBP and RSP are used for the frame/stack pointer, likely RBX for the global offset table, and RAX is reserved for returns), without realizing the register re-use potential.
You might be able to eke things out by doing something like:
void *all_mem_args_tbl[16] = { memarg1, memarg2, ... };
void *trashme;
asm ("movq   (%1), %0\n\t"
     "movdqa (%0), %%xmm0\n\t"
     "movq   8(%1), %0\n\t"
     "movdqa (%0), %%xmm1\n\t"
     ...
     : "=&r"(trashme)
     : "r"(all_mem_args_tbl)
     : "xmm0", "xmm1", ..., "memory");
i.e. put all the memory locations into a table that you pass as an operand, and then manage the actual general-purpose register use on your own. It means two pointer accesses through the indirection table, but whether that makes a difference is hard to say without seeing your complete assembly code.
The Debug configuration uses -O0 by default. Since this flag disables optimisations, the compiler is probably unable to allocate registers under the constraints specified by your inline assembly code, resulting in register starvation.
One solution is to specify a different optimisation level, e.g. -Os, which is the one used by default in the Release configuration.
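If changing the whole Debug configuration is not an option, one possible workaround (a sketch that assumes a GCC version supporting the optimize function attribute, which older Apple GCC releases may not) is to raise the optimisation level only for the function that contains the register-hungry asm block:

/* Sketch: request -Os-style optimisation for this one function so the
   register allocator has room even in an -O0 Debug build.  The function
   name and parameters are placeholders. */
__attribute__((optimize("Os")))
static void simd_kernel(const float *src, float *dst)
{
    /* ... the inline asm block with its many memory constraints ... */
}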
