Why Interrupt handler entry code check Carry flag? - linux-kernel

I am trying to generate an interrupt in a VM and have written a simple interrupt handler but when I try to test this interrupt generation and handling, kernel crashes because of page fault. Now I debugged the issue and found out that in 'entry_64.S' file where 'error_entry' is called to push registers onto stack and check for GS there following code:
xorl %ebx,%ebx
testl $3,CS+8(%rsp)
je error_kernelspace
error_swapgs:
SWAPGS
When interrupt is handled, CPU will push EFLAGS to (rsp)+CS+8 location. So in above code 'testl' instruction check if flag's Carry flag was set at the time of interrupt to detect if interrupt was in kernel mode or in user mode.
Can please someone explain why Carry flag is checked here?

Actually, I think it's checking whether CS corresponds to a kernel thread, see the comment for a similar construct at ret_from_fork.

Related

How is execution resumed after a hardware breakpoint without an infinite loop?

As far as I know SW-breakpoints are working as follows:
The instruction the BP is set to gets substituted by a int/trap instruction, than the trap is handled in a trap handler, on continue the trap is replaced by the original instruction, the instruction is executed in single step mode, now the PC points to the next instruction and the original instruction is replaced again by a int/trap instruction.
HW Breakpoints work as follows according to my understanding:
The address of the instruction the BP is set to is written in a HW-BP Register. If the instruction is hit respectively the PC matches the address in the HW-BP Register, the CPU raises an interrupt which is also handled by a trap handler. Now if the program returns to the orignial instruction the HW BP is still active and one is caught in an infinite loop.
How is that problem treated?
Is the HW BP disabled before continuing and is the orignal instruction also getting executed in single step mode? Or is the original instruction executed before the trap handler is entered, so that the trap handler returns to the instruction after the original instruction? Or is there an other mechanism?
In case of the Intel 64 and IA-32 ("x64/x86") architectures, this is the task of the Resume Flag (RF), bit 16 in EFLAGS. (Other processor architectures that support hardware breakpoints probably have a similar mechanism.)
See section 18.3.1.1 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B:
Because the debug exception for an instruction breakpoint is generated before the instruction is executed, if the instruction breakpoint is not removed by the exception handler; the processor will detect the instruction breakpoint again when the instruction is restarted and generate another debug exception. To prevent looping on an instruction breakpoint, the Intel 64 and IA-32 architectures provide the RF flag (resume flag) in the EFLAGS register (see Section 2.3, “System Flags and Fields in the EFLAGS Register,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A). When the RF flag is set, the processor ignores instruction breakpoints.
[...]
The RF Flag is cleared at the start of the instruction after the check for code breakpoint, CS limit violation and FP exceptions.
[...]
If the RF flag in the EFLAGS image is set when the processor returns from the exception handler, it is copied into the RF flag in the EFLAGS register by IRETD/IRETQ or a task switch that causes the return. The processor then ignores instruction breakpoints for the duration of the next instruction. (Note that the POPF, POPFD, and IRET instructions do not transfer the RF image into the EFLAGS register.) Setting the RF flag does not prevent other types of debug-exception conditions (such as, I/O or data breakpoints) from being detected, nor does it prevent non-debug exceptions from being generated.
(Emphasis mine.)
So, the debugger will set RF before returning from the exception handler so that instruction breakpoints are "muted" for one instruction, after which the flag is automatically cleared by the processor.
Note that this is not a concern in the case of data breakpoints because these will fire after the instruction that triggered the read/write operation.
Recommendation: I find the slides of "Intermediate x86 Part 4" by Xeno Kovah to be helpful in understanding these things. He talks about various topics there but starts with debugging. This information in particular can be found on slides 12-13:
Image credit: Xeno Kovah, CC BY-SA 3.0

When kernel stack's esp is stored to TSS for interrupt return iret?

When I read Intel's X86 programmer's manual, see the following for interrupt & interrupt return with stack switching:
interrupt:
If a stack switch does occur, the processor does the following:
Temporarily saves (internally) the current contents of the SS, ESP, EFLAGS, CS, and EIP registers.
Loads the segment selector and stack pointer for the new stack (that is, the stack for the privilege level being called) from the TSS into the SS and ESP registers and switches to the new stack.
Pushes the temporarily saved SS, ESP, EFLAGS, CS, and EIP values for the interrupted procedure’s stack onto the new stack.
Pushes an error code on the new stack (if appropriate).
Loads the segment selector for the new code segment and the new instruction pointer (from the interrupt gate or trap gate) into the CS and EIP registers, respectively.
If the call is through an interrupt gate, clears the IF flag in the EFLAGS register.
Begins execution of the handler procedure at the new privilege level.
On return:
Performs a privilege check.
Restores the CS and EIP registers to their values prior to the interrupt or exception.
Restores the EFLAGS register.
Restores the SS and ESP registers to their values prior to the interrupt or exception, resulting in a stack switch back to the stack of the interrupted procedure.
Resumes execution of the interrupted procedure.
For example, one linux process P:
It's initially in kernel mode
It returns to user mode by iret. But from the manual, there is no change to TSS
It traps into kernel by int. Here it needs to find the kernel stack from ESP & SS in TSS. How is this kernel stack value set up, since they are not stored to TSS in step 2?
Once the kernel returns to user-space for a given task, it's done with that task's kernel stack until the next interrupt / exception. There's no useful data on it, so the TSS can hold a fixed SS:[ER]SP value that points to the top of the virtual page[s] allocated as the kernel stack for the current task.
Kernel state doesn't live on the kernel stack between entries into the kernel; it's kept elsewhere in a process control block. (Context switches between asks actually happen in the kernel, switching kernel stacks to the formerly-sleeping task's kernel stack, so eventually returning to user-space means returning up the call-chain of whatever that task was doing in the kernel first).
BTW, unless the kernel pushes a new CS:EIP / EFLAGS / SS:ESP for iret to pop, the stuff it pops will be the stuff pushed by hardware at the address specified in the TSS. So even if there was some desire to re-enter the kernel with the stack as you left it, that would normally be at the TSS location anyway. But this is irrelevant because Linux doesn't keep stuff on a task's kernel stack while user-space is running, except for a pointer to per-task stuff at the bottom of the region where the kernel can find it with [ER]SP & -16384.
(I think this is right; I've looked at a few bits of Linux kernel code but haven't really gotten my hands dirty experimenting with things. I think this is how Linux works, and a consistent viable design.)

How does the tct command work under the hood?

The windbg command tct executes a program until it reaches a call instruction or a ret instruction. I am wondering how the debugger implements this functionality under the hood.
I could imagine that the debugger scans the instructions from the current instructions for the next call or ret and sets according breakpoints on the found instructions. However, I think this is unlikely because it would also have to take into account jmp instructions so that there are an arbitrary number of possible call or ret instructions where such a breakpoint would have to be set.
On the other hand, I wonder if the x86/x64 CPU provides a functionality that raises an exception to be caught by the debugger whenever the CPU is about to process a call or ret instruction. Yet, I have not heard of such a functionality.
I'd guess that it single-steps repeatedly, until the next instruction is a call or ret, instead of trying to figure out where to set a breakpoint. (Which in the general case could be as hard as solving the Halting Problem.)
It's possible it could optimize that by scanning forward over "straight line" code and setting a breakpoint on the next jmp/jcc/loop or other control-transfer instruction (e.g. xabort), and also catching signals/exceptions that could transfer control to an SEH handler.
I'm also not aware of any HW support for breaking on a certain type of instruction or opcode: the x86 debug registers DR0..7 allow hardware breakpoints at code addresses without rewriting machine code to int3, and also hardware watchpoints (to trap data load/store to a specific address or range of addresses). But not filtering by opcode.

SysTick Interrupt pending but won't execute, debug interrupt mask issue?

I've been trying to get a SysTick interrupt to work on a TM4C123GH6PM7. It's a cortex m4 based microcontroller. When using the Keil Debugger I can see that the Systick interrupt is pending int NVIC but it won't execute the handler. There are no other exceptions enabled and I have cleared the PRIMASK register. The code below is how I initialise the interrupt:
systck_init LDR R0,=NVIC_ST_CTRL_R
LDR R1,=NVIC_ST_RELOAD_R
LDR R2,=NVIC_ST_CURRENT_R
MOV R3,#0
STR R3,[R0]
STR R3,[R2]
MOV R3,#0x000020
STR R3,[R1]
MOV R3,#7
STR R3,[R0]
LDR R3,=NVIC_EN0_R
LDR R4,[R3]
ORR R4,#0x00008000
STR R4,[R3]
CPSIE I
MOV R3,#0x3
MSR CONTROL,R3
After a lot of searching I found that it may be the debugger masking all interrupts. The bit to control this is in a register called the Debug Halting Status and Control Register. Though I can't seem to view it in the debugger nor read/write to it with debug commands.
I used the Startup.s supplied by Keil and as far as I can tell the vectors/labels are correct.
And yes I know. Why bother doing it all in assembly.
Any ideas would be greatly appreciated. First time posting :)
I can see that the Systick interrupt is pending int NVIC
Systick has neither Enable nor Pending register bits in the NVIC. It is special that way, being tightly coupled to the MCU core itself.
Using 0x20 for the reload value is also dangerously low. You may get "stuck" in the Systick Handler, unable to leave it because the next interrupt triggers too early. Remember that Cortex M4 requires at least 12 clocks to enter and exit an interrupt handler - that consumes 24 out of your 32 cycles.
Additional hint: You last instruction changes the register used for the SP from MSP to PSP, but I don't see your code setting up the PSP first.
Be sure to implement the Hardfault_Handler - your code most likely triggers it.

Display call stack below IRQ handler on MSP430 with IAR

I'm trying to find a stack overflow in a project on MSP430, and found that it occurs mainly when an IRQ occurs after the stack is pretty full.
I've set a breakpoint on a stack pointer write with a value that is smaller than the start address of the stack, and the CPU halts in the IRQ handler.
The call stack display in IAR C-SPY then terminates at the handler function, however I'd be interested in what is below this, as this is what filled the stack.
Is there a way to display the call stack below the current interrupt handler?
If the interrupt handler is written in C, this should work correctly, as the generated CFI (call frame information) should be correct even for interrupt functions.
However, if this (for some reason) should not work, or if the interrupt routine is written in assembler (without proper CFI directives), you can use a little trick. You can manually modify the PC and SP registers in the register window by retrieving the PC from the stack and by "backing up" the SP the amount that it was adjusted inside the function. After this, the debugger will display the function that was executing when the interrupt occurred.
Note, in the traditional MSP430 core, the PC is stored as a plain 16 bit value. However, in the MSP430X core the 20 bits are a bit intertwined with the status register, see the architecture manual for details.

Resources