Is it possible to allow a particular user-level application to run in kernel-mode? - linux-kernel

This is a hypothetical question. Suppose there is an application (which typically executes in user mode) that wants to access kernel data structures, read register values, and perform some kernel-level functions.
Is there a way for kernel and/or CPU to allow this application to perform its functions while maintaining the normal user-level/kernel-level isolation for other applications except this one?

In order to either put your app in kernel space (kernel memory) or to run it in ring 0 CPU mode, you will need to do that from kernel code. In normal state of operation you can't run app from the kernel with mentioned privileges (at least there is no existing API to do that). It's probably possible to implement some kernel code which is able of this. But it will be tricky and will mess up the whole concept of kernel-space/user-space separation, and if any advanced user-space API was used -- it won't work anyway.
If you are thinking about just giving your app ring 0 privileges -- it won't work either, because kernel has its own stack and because of kernel-space/user-space memory separation, so you won't be able to run internal kernel API.
Basically, you can achieve the same thing by writing kernel module instead. And for running some kernel code on behalf of user-space app -- you can use system calls interface.
So, answering your question: no, it's not possible to run user-space app in kernel mode so it can use internal kernel API.

Related

is there a workqueue feature in xnu kernel?

I need to use workqueue-like feature on Mac OSX (kernel mode driver) and am looking for a way to add work into a queue to be processed by a kernel thread later. Conceptually this is the same thing as workqueue feature available in Linux kernel. Is there something similar on XNU kernel as well?
I don't think there's a direct equivalent as such, although I admit I'm not intimately familiar with the Linux side, so I'll avoid comparing and just tell you about what's available on macOS/xnu.
I/O Kit IOWorkLoops
If you're building an I/O Kit driver, and especially if you're writing a secondary interrupt handler, you'll be using IOWorkLoops. Interrupts are abstracted by IOEventSource objects, which schedule secondary interrupt handlers to run on the driver's IOWorkLoop.
Each IOWorkLoop wraps one kernel thread and also provides a serialisation/locking mechanism for resources shared with that thread. All jobs submitted to a workloop either explicitly through an IOCommandGate or the workloop object directly, or as a result of an IOEventSource event will be serialised. Note that IOCommandGate jobs will run synchronously on the calling thread, not the workloop thread.
As always with macOS/OSX internals, you will want to look at the header file comments and possibly the implementation in the xnu source for details. I personally find IOWorkLoops a bit clumsy for some tasks, but if you're dealing with PCI devices, etc. you don't really have a choice.
thread_call
A more lightweight background work mechanism is the thread_call API. It's defined in <kern/thread_call.h> and supports running functions on an OS-managed background thread, optionally after a delay or with a specific priority. This is probably closer to what you know from Linux, has a fairly straightforward API, but is not suitable for secondary interrupt handlers.

What are methods used to identify the system call and pass function tasks to the operating system?

Is this based on context switching that schedules processes on the cpu? Im a bit lost with understand how this works
system call is not context switch based. context switch is exchange of process running on cpu. which call is going to be used is decided on the system call number which is used as index in system call table. only process context is changed from user to kernel. I always suggest for reading understanding linux kernel an excellent book

Accessing Platform Device from Userpace

From a general standpoint, I am trying to figure out how to access a platform device from userspace. To be more specific, I have a EMIF controller on and SoC of which I have added to my device tree and I believe it is correctly bound to a pre-written EMIF platform device driver. Now I am trying to figure out how I can access this EMIF device from a userspace application. I have come accross a couple different topics that seem to have some connection to this issue but I cannot quite find out how they relate.
1) As I read it seems like most I/O is done through the use of device nodes which are created by mknod(), do I need to create a device node in order to access this device?
2) I have read a couple threads that talk about writting a Kernel module (Character?, Block?) that can interface with both userspace and the platform device driver, and use it as an intermediary.
3) I have read about the possibility of using mmap() to map the memory of my platform device into my virtual memory space. Is this possible?
4) It seems that when the EMIF driver is instantiated, it calls the probe() fucntion. What functions would a userpace application call in the driver?
It's not completely clear what you're needing to do (and I should caveat that I have no experience with EMIF or with "platform devices" specifically), but here's some overview to help you get started:
Yes, the usual way of providing access to a device is via a device node. Usually this access is provided by a character device driver unless there's some more specific way of providing it. Most of the time if an application is talking "directly" to your driver, it's a character device. Most other types of devices are used in interfacing with other kernel subsystems: for example, a block device is typically used to provide access from a file system driver (say) to an underlying disk drive; a network driver provides access to the network from the in-kernel TCP/IP stack, etc.
There are several char device methods or entry points that can be supported by your driver, but the most common are "read" (i.e. if a user-space program opens your device and does a read(2) from it), "write" (analogous for write(2)) and "ioctl" (often used for configuration/administrative tasks that don't fall naturally into either a read or write). Note that mknod(2) only creates the user-space side of the device. There needs to be a corresponding device driver in the kernel (the "major device number" given in the mknod call links the user-space node with the driver).
For actually creating the device node in the file system, this can be automated (i.e. the node will automatically show up in /dev) if you call the right kernel functions while setting up your device. There's a special daemon that gets notifications from the kernel and responds by executing the mknod(2) system call.
A kernel module is merely a dynamically loadable way of creating a driver or other kernel extension. It can create a character, block or network device (et al.), but then so can a statically linked module. There are some differences in capability mostly because not all kernel functions you might want to use are "exported" to (i.e. visible to) dynamically loaded modules.
It's possible to support mapping of the device memory into user virtual memory space. This would be implemented by yet another driver entry point (mmap). See struct file_operations for all the entry points a char driver can support.
This is pretty much up to you: it depends on what the application needs to be able to do. There are many drivers in the kernel that provide no direct function to user-space, only to other kernel code. As to "probe", there are many probe functions defined in various interfaces. In most cases, these are called by the kernel (or perhaps by a 'higher level "class" driver') to allow the specific driver to discover, identify and "claim" individual devices. They (probe functions) don't usually have anything directly to do with providing access from user-space but I might well be missing something in a particular interface.
You need to create a device node in order to access the device.
The probe function is called when the driver finds a matching device.
For information on platform device API, the following articles could be useful.
The platform device API
Platform devices and device trees

Windows kernel ReadProcessMemory() / WriteProcessMemory()?

It's simple and straightforward in user mode because of those APIs.
How do you read/write specified process's userspace memory from a windows kernel module?
driver target platform is windows xp/2003
Use NtWriteVirtualMemory / NtReadVirtualMemory to write to other processes - you will need to open a handle to the process first.
Note that if you're already in the process, you can just write directly - for example if you're responding to a DeviceIoControl request from a process you can write directly to user-mode addresses and they will be in the address space of the process that called you.
I'm also starting in the world of windows drivers and from what I've read XxxProcessMemory calls NtXxxVirtualMemory in ntdll (R3-UserMode).
The NtXxxVirtualMemory calls the ZwXxxVirtualMemory (R0-KernelMode) also in the ntdll.
I believe you should use the ZwXxxVirtualMemory.
In the krnel, ZwXxx routines are just wrappers around NtXxx ones, telling the kernel that the caller is a kernel mode component, rather than an user application. When a call comes from usermode, the kernel performs additional security checks.
So, use ZwXxx when in the kernel.
An alternative approach for reading/writing memory from/to another process is:
obtain address of its process object (PsLookupProcessByProcessId),
switch the current thread to its address space (KeStackAttachProcess),
perform the operation (read/write...),
switch the address space back (KeUnstackDetachProcess),
decrement reference count incremented by (1) (ObDereferenceObject).

Windows processes in kernel vs system

I have a few questions related to Windows processes in kernel and usermode.
If I have a hello world application, and a hello world driver that exposes a new system call, foo(), I am curious about what I can and can't do once I am in kernel mode.
For starters, when I write my new hello world app, I am given a new process, which means I have my own user mode VM space (lets keep it simple, 32 bit windows). So I have 2GB of space that I "own", I can poke and peek until my hearts content. However, I am bound by my process. I can't (lets not bring shared memory into this yet) touch anyone elses memory.
If, I write this hello world driver, and call it from my user app, I (the driver code) is now in kernel mode.
First clarification/questions:
I am STILL in the same process as the user mode app, correct? Still have the same PID?
Memory Questions:
Memory is presented to my process as VM, that is even if I have 1GB of RAM, I can still access 4GB of memory (2GB user / 2GB of kernel - not minding details of switches on servers, or specifics, just a general assumption here).
As a user process, I cannot peek at any kernel mode memory address, but I can do whatever I want to the user space, correct?
If I call into my hello world driver, from the driver code, do I still have the same view of the usermode memory? But now I also have access to any memory in kernel mode?
Is this kernel mode memory SHARED (unlike User mode, which is my own processes copy)? That is, writing a driver is more like writing a threaded application for a single process that is the OS (scheduling aside?)
Next question. As a driver, could I change the process that I am running. Say, I knew another app (say, a usermode webserver), and load the VM for that process, change it's instruction pointer, stack, or even load different code into the process, and then switch back to my own app? (I am not trying to do anything nefarious here, I am just curious what it really means to be in kernel mode)?
Also, once in kernel mode, can I prevent the OS from preempting me? I think (in Windows) you can set your IRQL level to do this, but I don't fully understand this, even after reading Solomons book (Inside Windows...). I will ask another question, directly related to IRQL/DPCs but, for now, I would love to know if a kernel driver has the power to set an IRQL to High and take over the system.
More to come, but answers to these questions would help.
Each process has a "context" that, among other things, contains the VM mappings specific to that process (<2 GB normally in 32bit mode). When thread executing in user mode enteres kernel mode (e.g. from a system call or IO request), the same thread is still executing, in the process, with the same context. PsGetCurrentProcessId will return the same thing at this point as GetCurrentProcessID would have just before in user mode (same with thread IDs).
The user memory mappings that came with the context are still in place upon entering kernel mode: you can access user memory from kernel mode directly. There are special things that need to be done for this to be safe though: Using Neither Buffered Nor Direct I/O. In particular, an invalid address access attempt in the user space range will raise a SEH exception that needs to be caught, and the contents of user memory can change at any time due to the action of another thread in that process. Accessing an invalid address in the kernel address range causes a bugcheck. A thread executing in user mode cannot access any kernel memory.
Kernel address space is not part of a process's context, so is mapped the same between all of them. However, any number of threads may be active in kernel mode at any one time, so it is not like a single threaded application. In general, threads service their own system calls upon entering kernel mode (as opposed to having dedicated kernel worker threads to handle all requests).
The underlying structures that save thread and process state is all available in kernel mode. Mapping the VM of another process is best done ahead of time from the other process by creating an MDL from that process and mapping it into system address space. If you just want to alter the context of another thread, this can be done entirely from user mode. Note that a thread must be suspended to change its context without having a race condition. Loading a module into a process from kernel mode is ill advised; all of the loader APIs are designed for use from user mode only.
Each CPU has a current IRQL that it is running at. It determines what things can interrupt what the CPU is currently doing. Only an event from a higher IRQL can preempt the CPU's current activity.
PASSIVE_LEVEL is where all user code and most kernel code executes. Many kernel APIs require the IRQL to be PASSIVE_LEVEL
APC_LEVEL is used for kernel APCs
DISPATCH_LEVEL is for scheduler events (known as the dispatcher in NT terminology). Running at this level will prevent you from being preempted by the scheduler. Note that it is not safe to have any kind of page fault at this level; there would be a deadlock possibility with the memory manager trying to retrieve pages. The kernel will bugcheck immediately if it has a page fault at DISPATCH_LEVEL or higher. This means that you can't safely access paged pool, paged code segments or any user memory that hasn't been locked (i.e. by an MDL).
Above this are levels connected to hardware device interrupt levels, known as DIRQL.
The highest level is HIGH_LEVEL. Nothing can preempt this level. It's used by the kernel during a bugcheck to halt the system.
I recommend reading Scheduling, Thread Context, and IRQL
A good primer for this topic would be found at: http://www.codinghorror.com/blog/archives/001029.html
As Jeff points out for the user mode memory space:
"In User mode, the executing code has no ability to directly access hardware or reference memory. Code running in user mode must delegate to system APIs to access hardware or memory. Due to the protection afforded by this sort of isolation, crashes in user mode are always recoverable. Most of the code running on your computer will execute in user mode."
So your app will have no access to the Kernel Mode memory, infact your communication with the driver is probably through IOCTLs (i.e. IRPs).
The kernel however has access to everything, including to mappings for your user mode processes. This is a one way street, user mode cannot map into kernel mode for security and stability reasons. Even through kernel mode drivers can map into user mode memory I would advise against it.
At least that's the way it was back before WDF. I am not sure of the capabilities of memory mapping with user mode drivers.
See also: http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fdownload.microsoft.com%2Fdownload%2Fe%2Fb%2Fa%2Feba1050f-a31d-436b-9281-92cdfeae4b45%2FKM-UMGuide.doc&ei=eAygSvfuAt7gnQe01P3gDQ&rct=j&q=user+mode+mapping+into+kernel+mode&usg=AFQjCNG1QYQMcIpcokMoQSWJlGSEodaBHQ

Resources