For a project with specific hardware integration, we need to modify the Linux kernel's page fault handler, and I wondered whether the following is possible:
1) During do_page_fault, can we find out which thread (and process) generated that fault? The platform is ARM, so ARM-specific interrupt registers can be used if helpful.
2) Can we access the user-space memory of that process and read some information that our user-mode library left for us beforehand (assuming it is already probed and locked in memory)?
Further explanation is in the comments, if desired.
# 1): If the page fault happened while accessing memory from the user-space application (which is likely), then the page fault handler runs in the context of that process. From the CPU's point of view, it enters kernel mode because of an exception from the MMU. So yes, you can get the PID/TID of the user-space process that was interrupted.
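A minimal sketch of what that looks like inside the handler (the helper function is hypothetical; the fields come from the mainline task_struct, where current->pid is the thread ID and current->tgid is the process ID):

```c
/* Hypothetical instrumentation for arch/arm/mm/fault.c:do_page_fault().
 * Because the handler runs in the context of the interrupted task,
 * `current` points at that task's task_struct. */
#include <linux/sched.h>
#include <linux/printk.h>

static void report_faulting_task(unsigned long fault_addr)
{
	pr_info("fault at %#lx: tid=%d, pid=%d, comm=%s\n",
		fault_addr,
		current->pid,	/* thread ID of the faulting thread */
		current->tgid,	/* process ID (thread group leader) */
		current->comm);	/* executable name, for convenience */
}
```

Note that this only holds for faults taken while the CPU was executing user code (or kernel code on behalf of that task); faults taken in interrupt context have no meaningful `current` for this purpose.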
# 2): Yes, the kernel can access all memory. On a 32-bit system you need highmem support; on a 64-bit system you get it out of the box.
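A hedged sketch of reading such a block, assuming the user-mode library leaves it at a known address (info_uaddr and struct fault_info are made-up names and an assumed layout). Since the handler runs in the faulting task's context, the ordinary user-access helpers apply:

```c
/* Sketch: copy a structure that the user-mode library placed in
 * (already probed and locked) user memory, from within the handler. */
#include <linux/uaccess.h>
#include <linux/errno.h>

struct fault_info {		/* assumed layout, agreed with user space */
	unsigned long magic;
	unsigned long data;
};

static int read_user_info(unsigned long info_uaddr, struct fault_info *out)
{
	/* copy_from_user validates the address range and handles a
	 * missing mapping gracefully instead of recursively faulting. */
	if (copy_from_user(out, (const void __user *)info_uaddr, sizeof(*out)))
		return -EFAULT;
	return 0;
}
```

Since the question says the pages are locked in memory, copy_from_user should not sleep here in practice, but checking its return value is still mandatory.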
I'm trying to understand context switching in OSs in general. I have a couple of questions that I could not find the answers to.
I would really appreciate any insight on these.
Do context switches happen mid instructions? If not, is it true for multi-step instructions (x86) like INC, XADD?
On which processor is the code responsible for context switching is run? If it is run on an arbitrary processor, that could modify the registers on that processor, right? So how does the OS manage to save that particular processor's state?
First of all, please do not restrict OS to Windows :D
Do context switches happen mid instructions? If not, is it true for multi-step instructions (x86) like INC, XADD?
With software context switching, a context switch happens on a specific interrupt (a hardware timer, or an internal CPU tick counter). All CPU architectures (AFAIK) have a register or flag to notify the fetch unit that an interrupt is pending; the CPU then starts executing the ISR by setting the PC register. Note that the context switch is performed inside an ISR. Because of how the interrupt mechanism works, an interrupt occurring during the execution of an instruction causes no conflict: the current instruction executes to completion, and only then does the fetch unit load the first ISR instruction (after the hardware stack-frame operation, on most architectures).
Some recent CPU architectures also have a hardware context-switching mechanism, in which all of the context-switch work is done and handled by the CPU itself. To trigger a context switch and tell the CPU where to load its new state from, the far versions of the CALL and JMP instructions are used on Intel architectures.
On which processor is the code responsible for context switching is run? If it is run on an arbitrary processor, that could modify the registers on that processor, right? So how does the OS manage to save that particular processor's state?
Each processor performs its own context switches. Each processor has its own scheduler in the kernel, and the OS (observing the load balance across processors) assigns each task to one of the processors (at least in Linux).
I'm working on a custom Linux kernel for a RISC-V architecture. I am debugging using GDB/QEMU now that those tools are available. As I debug, I notice that I am not able to access memory at addresses that are virtualized. That is, once memory gets transitioned from physical to virtual addressing in the kernel, I can no longer access those memory locations in GDB. For example, the kernel shows up like this in QEMU's info mem command:
paddr: 0x80200000 --> vaddr: 0xffffffff80000000
I think this question/problem is more an issue with QEMU, or maybe with my understanding of how to use QEMU correctly. As it stands, single stepping up to the point in my kernel where virtual memory starts being used is fine, but single stepping beyond it causes QEMU to effectively stop: it reports the same instruction on each step. However, if I continue, it boots in QEMU. How can I debug this via single stepping? Is there something I need to switch in GDB/QEMU?
I did try to access an address, 0xffffffff8000007c for example, and I could read it successfully; QEMU just doesn't transition to virtual memory when I single step past that point.
I'm experiencing a similar problem and have formed the following hypothesis:
I think the kernel is switching to a reduced page table when idle, one that does not map loaded-module memory. An asynchronous break from GDB has a high likelihood of interrupting the CPU while it is idle, of course.
Single stepping out of idle (e.g., after hitting a key in the Linux console) and re-attempting to set a breakpoint on the loaded module succeeds at some point.
A viable strategy is probably to break on the conclusion of the module loading code and to set relevant breakpoints at that point.
I know about the system calls the OS provides to protect programs from accessing other programs' memory. But that only helps if I use the system call library provided by the OS. What if I write assembly code myself that sets the CPU bit for kernel mode and executes a privileged instruction (say, modifying the OS's program segment in memory)? Can the OS protect against that?
P.S. This is an out-of-curiosity question. If any good blog or book reference can be provided, that would be helpful, as I want to study OSes in as much detail as possible.
The processor protects against such malicious mischief by (1) requiring you to be in an elevated mode (for our example here, kernel mode); and (2) limiting how kernel mode can be entered.
In order to enter kernel mode from user mode, there has to be either an interrupt (not applicable here) or an exception. Usually both are handled the same way, but there are some bizarre processors (did anyone say Intel?) that do things a bit differently.
The operating system's exception and interrupt handlers must limit what the user-mode program can do.
What if I write a assembly code myself that sets CPU bit for kernel mode and executes a privileged instruction
You can't just set the kernel-mode bit in the processor status register to enter kernel mode.
Can OS protect against that ?
The CPU protects against that.
If any good blog or book reference can be provided, that would be helpful as I want to study OS in as much detail as possible.
The VAX/VMS Systems Internals book is old but it is cheap and shows how a real OS has been implemented.
This blog clearly explains what my confusion was.
http://minnie.tuhs.org/CompArch/Lectures/week05.html
Even though user programs can switch to kernel mode, they have to do it through an interrupt instruction (int, in the case of x86), and the interrupt handler for that interrupt is written by the OS (probably while it was in kernel mode at boot time). So all privileged instructions can only be executed by OS code.
From what I understand, a thread that executes in user mode can eventually enter code that switches to kernel mode (using sysenter). But how can a thread that originated in user code execute kernel code?
E.g.: I'm calling CreateFile(), which delegates to NtCreateFile(), which in turn calls ZwCreateFile(), then ZiFastSystemCall()... then sysenter... profit, kernel access?
Edit
This question:
How does Windows protect transition into kernel mode has an answer that helped me understand; see this quote:
"The user mode thread is causing an exception that's caught by the Ring 0 code. The user mode thread is halted and the CPU switches to a kernel/ring 0 thread, which can then inspect the context (e.g., call stack & registers) of the user mode thread to figure out what to do." Also see this blog post, very informative: http://duartes.org/gustavo/blog/post/cpu-rings-privilege-and-protection
The short answer is that it can't.
What happens is that when you create a user-mode thread, the kernel creates a matching kernel mode thread. When "your" thread needs to execute some code in kernel mode, it's actually executed in the matching kernel mode thread.
Disclaimer: The last time I really looked closely at this was probably with Win2K or maybe even NT4 -- but I doubt much has changed in this respect.
I need a trace of the memory accesses of a process for simulation.
For this purpose, I need a tool that captures and logs all memory accesses of a process.
Is there any program to do so? If there is not, can I do it with modifications to the Linux kernel?