Does SMAP/SMEP block allocation in userland (kmalloc)? - linux-kernel

I am trying to exploit a heap overflow in the Linux kernel with all the protections enabled, i.e. SMAP and SMEP. What I understand is that SMEP doesn't allow me to execute userland code and SMAP doesn't allow me to read or write userland memory. But I am wondering: if I replace the free list pointer of some free object with a userland pointer, will I be able to allocate the object in userland?
I also tried replacing the free list pointer, but I am getting weird crashes that I am not able to debug properly. I am not sure whether they are caused by my trying to allocate something in userland.

SMAP / SMEP are mitigations that are implemented at the hardware level. If enabled, the CPU generates an exception on invalid memory accesses that is then caught by the registered kernel exception handler which will either panic, kill the current process or do whatever else is needed. It doesn't matter if the invalid access comes from a free list pointer or from something else. Kernel heap pointers are not some "special" kind of pointers that would allow you to bypass SMAP.
The only way something like what you describe can work is if, for whatever reason, SMAP is disabled at the point where the user space address is read or written. For example, if you manage to corrupt a pointer that is then used in an execution path that does: SMAP OFF → R/W → SMAP ON. The thing is, you don't usually find such a usage, because something like this is only done when the kernel knows it needs to handle a user pointer (e.g. copy_{to,from}_user()). Any other instance of temporarily disabling SMAP and then using a pointer should essentially be considered a bug.
So no, you will not be able to allocate a kernel object in user space, unless it's some really buggy kernel module that you're trying to exploit which wrongly disables SMAP when it's not needed.
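The SMAP OFF → R/W → SMAP ON window described above can be illustrated with a small userland toy model. This is only a sketch: nothing here is kernel code, all names (ac_flag, supervisor_access_ok, toy_copy_from_user_ok) are hypothetical, and the address split assumes x86-64 with the user half below 0x0000800000000000. On real hardware the window is opened and closed by setting and clearing EFLAGS.AC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: x86-64 layout where user addresses sit in the lower half. */
#define TOY_USER_TOP 0x00007fffffffffffULL

/* Models EFLAGS.AC: while set, SMAP does not block supervisor accesses. */
static bool ac_flag;

static bool is_user_addr(uint64_t addr)
{
    return addr <= TOY_USER_TOP;
}

/* True if a supervisor-mode access would succeed, false if it would fault. */
static bool supervisor_access_ok(uint64_t addr)
{
    return !is_user_addr(addr) || ac_flag;
}

/* Models the only legitimate pattern: open the window, access, close it. */
static bool toy_copy_from_user_ok(uint64_t uaddr)
{
    bool ok;

    assert(!ac_flag);                 /* the window must not already be open */
    ac_flag = true;                   /* SMAP OFF */
    ok = supervisor_access_ok(uaddr); /* R/W    */
    ac_flag = false;                  /* SMAP ON  */
    return ok;
}
```

In this model an allocator that follows a corrupted free list pointer into user memory calls supervisor_access_ok() with ac_flag clear, so the access faults no matter where the pointer came from.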

Related

How to access physical address during interrupt handler linux

I wrote an interrupt handler in Linux.
As part of the handler logic, I need to access a physical address.
I used the ioremap function, but then I dropped into KDB while the handler was running.
I started to debug it and saw the code below, which is eventually called by ioremap.
What should I do? Is there any way other than mapping the region beforehand?
If I need to map it beforehand, it means that I will probably need to map and cache a lot of unused area.
BTW, what are the limits for ioremap?
Setting up a new memory mapping is an expensive operation, which typically requires calls to potentially blocking functions (e.g. grabbing locks). So your strategy has two problems:
Calling a blocking function is not possible in your context (there is no kernel thread associated with your interrupt handler, so there is no way for the kernel to resume it if it had to be put to sleep).
Setting up/tearing down a mapping per IRQ would be a bad idea performance-wise (even if we ignore the fact that it can't be done).
Typically, you would set up any mappings you need in a driver's probe() function (or in the module's init() if it's more of a singleton thing). This mapping is then kept in some private device data structure, which is passed as the last argument to some variant of request_irq(), so that the kernel then passes it back as the second argument to the IRQ handler.
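The "map once in probe(), reuse in the handler" pattern can be sketched as a userland toy model. This is not kernel code: toy_request_irq(), toy_raise_irq(), struct toy_dev and toy_hw_regs are all hypothetical stand-ins, chosen only to show how the private data travels from registration (last argument) back into the handler (second argument).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_IRQS 16u

typedef int (*toy_irq_handler_t)(int irq, void *dev_id);

struct toy_irq_slot {
    toy_irq_handler_t handler;
    void *dev_id;
};

static struct toy_irq_slot toy_irq_table[TOY_NR_IRQS];

/* Stand-in for device registers; a real driver would get these from ioremap(). */
static volatile uint32_t toy_hw_regs[4];

struct toy_dev {
    volatile uint32_t *regs;   /* mapping created once, at probe time */
    unsigned int irq_count;
};

/* Hypothetical stand-in for request_irq(): dev_id is the last argument. */
static int toy_request_irq(unsigned int irq, toy_irq_handler_t handler, void *dev_id)
{
    assert(handler != NULL);
    if (irq >= TOY_NR_IRQS)
        return -1;
    toy_irq_table[irq].handler = handler;
    toy_irq_table[irq].dev_id = dev_id;
    return 0;
}

/* Simulates the core IRQ code: dev_id comes back as the second argument. */
static int toy_raise_irq(unsigned int irq)
{
    if (irq >= TOY_NR_IRQS || toy_irq_table[irq].handler == NULL)
        return -1;
    return toy_irq_table[irq].handler((int)irq, toy_irq_table[irq].dev_id);
}

/* The handler only dereferences the existing mapping; it never maps anything. */
static int toy_isr(int irq, void *dev_id)
{
    struct toy_dev *dev = dev_id;

    (void)irq;
    dev->regs[0] = 0;          /* e.g. acknowledge the interrupt */
    dev->irq_count++;
    return 1;                  /* "handled" */
}

static int toy_probe(struct toy_dev *dev, unsigned int irq)
{
    dev->regs = toy_hw_regs;   /* the one-time "mapping" step */
    dev->irq_count = 0;
    return toy_request_irq(irq, toy_isr, dev);
}
```

The point of the design is that everything that may block happens in toy_probe(), while toy_isr() touches only memory that is already mapped.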
Not sure what you mean by "need to map and cache a lot of unused area".
Depending on your particular system, you may end up consuming an entry in your CPU's MMU, or you may just re-use a broader mapping that was set up by whoever wrote the BSP. That's just the cost of doing business on a virtual memory system.
Caching is typically not enabled on I/O memory because of the many side-effects of both reads and writes. For the odd cases when you need it, you have to use ioremap_cache() (spelled ioremap_cached() in some older ARM kernels).

Can I call dma_map_single() on DeviceB using an addresses returned from dma_alloc_coherent on DeviceA?

I am writing a custom Linux driver that needs to DMA memory around between multiple PCIe devices. I have the following situation:
I'm using dma_alloc_coherent to allocate memory for DeviceA
I then use DeviceA to fill the memory buffer.
Everything is fine so far, but at this point I would like to DMA the memory to DeviceB and I'm not sure of the proper way of doing it.
For now I am calling dma_map_single for DeviceB using the address returned from dma_alloc_coherent called on DeviceA. This seems to work fine on x86_64, but it feels like I'm breaking the rules because:
1. dma_map_single is supposed to be called with memory allocated from kmalloc ("and friends"). Is it a problem to call it with an address returned from another device's dma_alloc_coherent call?
2. If #1 is "ok", then I'm still not sure whether it is necessary to call the dma_sync_* functions, which are needed for dma_map_single memory. Since the memory was originally allocated from dma_alloc_coherent, it should be uncached memory, so I believe the answer is "dma_sync_* calls are not necessary", but I am not sure.
I'm worried that I'm just getting lucky having this work and that a future kernel update will break me, since it is unclear whether I'm following the API rules correctly.
My code eventually will have to run on ARM and PPC too, so I need to make sure I'm doing things in a platform independent manner instead of getting by with some x86_64 architecture hack.
I'm using this as a reference:
https://www.kernel.org/doc/html/latest/core-api/dma-api.html
dma_alloc_coherent() acts similarly to __get_free_pages(), but takes a size rather than a page order, so I would guess there is no issue here.
Call dma_mapping_error() right after dma_map_single() to catch any platform-specific mapping failure. The dma_sync_*() helpers are used by streaming DMA operations to keep the device and the CPU in sync. At a minimum, dma_sync_single_for_cpu() is required, because buffers modified by the device must be synchronized before the CPU accesses them.

Making a virtual IOPCIDevice with IOKit

I have managed to create a virtual IOPCIDevice which attaches to IOResources and basically does nothing. I'm able to get existing drivers to register and match to it.
However when it comes to IO handling, I have some trouble. IO access by functions (e.g. configRead, ioRead, configWrite, ioWrite) that are described in IOPCIDevice class can be handled by my own code. But drivers that use memory mapping and IODMACommand are the problem.
There seem to be two things that I need to manage: IODeviceMemory (described in the IOPCIDevice) and DMA transfers.
How could I create an IODeviceMemory that ultimately points to memory/RAM, so that when a driver tries to communicate with the PCI device, it ultimately does nothing or just moves the data to RAM, so my userspace client can handle this data and act as an emulated PCI device?
And could DMA commands then also be directed to my userspace client without modifying the source code of existing drivers that use IODMACommand?
Thanks!
Trapping memory accesses
So in theory, to achieve what you want, you would need to allocate a memory region, set its protection bits to read-only (or possibly neither read nor write if a read in the device you're simulating has side effects), and then trap any writes into your own handler function where you'd then simulate device register writes.
As far as I'm aware, you can do this sort of thing in macOS userspace, using Mach exception handling. You'd need to set things up so that page protection fault exceptions from the process you're controlling get sent to a Mach port you control. In that port's message handler, you'd:
check where the access was going to
if it's the device memory, you'd suspend all the threads of the process
switch the thread where the write is coming from to single-step, temporarily allow writes to the memory region
resume the writer thread
trap the single-step message. Your "device memory" now contains the written value.
Perform your "device's" side effects.
Turn off single-step in the writer thread.
Resume all threads.
As I said, I believe this can be done in user space processes. It's not easy, and you can cobble together the Mach calls you need to use from various obscure examples across the web. I got something similar working once, but can't seem to find that code anymore, sorry.
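The core idea of trapping writes to a protected region can be shown with a much simpler POSIX analogue. Note that this sketch swaps in a different mechanism from the one described above: it uses mprotect() and a signal handler inside a single process rather than Mach exception ports, and it does not single-step or re-protect the page, so only the first write is trapped. The names region, trapped_writes, on_fault and trap_setup are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile unsigned char *region;        /* the fake "device memory" page */
static size_t region_len;
static volatile sig_atomic_t trapped_writes;  /* how many faults we intercepted */

static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)region;

    (void)sig;
    (void)ctx;
    if (addr < base || addr >= base + region_len)
        _exit(1);               /* a genuine crash, not an access to our region */
    trapped_writes++;           /* this is where side effects would be simulated */
    /* Allow writes so the faulting store is retried and succeeds. */
    mprotect((void *)region, region_len, PROT_READ | PROT_WRITE);
}

/* Maps one read-only page and installs the fault handler. Returns 0 on success. */
static int trap_setup(void)
{
    struct sigaction sa;
    long page = sysconf(_SC_PAGESIZE);
    void *p;

    assert(page > 0);
    region_len = (size_t)page;
    p = mmap(NULL, region_len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    region = p;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    /* Protection faults arrive as SIGSEGV on Linux and may be SIGBUS elsewhere. */
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

After trap_setup(), a store such as region[0] = 42 faults once, runs on_fault(), and then completes. Trapping every write, as in the steps above, would additionally require restoring the protection after each store, which is what the single-step dance is for.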
… in the kernel
Now, the other problem is you're trying to do this in the kernel. I'm not aware of any public KPIs that let you do anything like what I've described above. You could start looking for hacks in the following places:
You can quite easily make IOMemoryDescriptors backed by system memory. Don't worry about the IODeviceMemory terminology: these are just IOMemoryDescriptor objects; the IODeviceMemory class is a lie. Trapping accesses is another matter entirely. In principle, you can find out what virtual memory mappings of a particular MD exist using the "reference" flag to the createMappingInTask() function, and then call the redirect() method on the returned IOMemoryMap with a NULL backing memory argument. Unfortunately, this will merely suspend any thread attempting to access the mapping. You don't get a callback when this happens.
You could dig into the guts of the Mach VM memory subsystem, which mostly lives in the osfmk/vm/ directory of the xnu source. Perhaps there's a way to set custom fault handlers for a VM region there. You're probably going to have to get dirty with private kernel APIs though.
Why?
Finally, why are you trying to do this? Take a step back: what is it you're ultimately trying to do with this? It doesn't seem like simulating a PCI device in this way is an end in itself, so is this really the only way to reach the greater goal you're ultimately trying to achieve? See: XY problem

What happens when I printk a char * that was initialized in userspace?

I implemented a new system call as an intro exercise. All it does is take in a buffer and printk that buffer. I later learned that the correct practice would be to use copy_from_user.
Is this just a precautionary measure to validate the address, or is my system call causing some error (page fault?) that I cannot see?
If it is just a precautionary measure, what is it protecting against?
Thanks!
There are several reasons.
Some architectures employ segmented memory, where there is a separate segment for the user memory. In that case, copy_from_user is essential to actually get the right memory address.
The kernel has access to everything, including (almost by definition) a lot of privileged information. Not using copy_from_user could allow information disclosure if a user passes in a kernel address. Worse, if you are writing to a user-supplied buffer without copy_to_user, the user could overwrite kernel memory.
You'd like to prevent the user from crashing the kernel module just by passing in a bad pointer; using copy_from_user protects against faults so e.g. a system call handler can return EFAULT in response to a bad user pointer.
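The second and third reasons can be made concrete with a userland toy model. This is a sketch, not kernel code: toy_mem, TOY_USER_LIMIT, toy_copy_from_user() and toy_sys_print() are hypothetical, and "addresses" are just offsets into one array whose upper half plays the role of kernel memory. The only real convention borrowed is that copy_from_user() returns the number of bytes it could not copy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_MEM_SIZE   256u
#define TOY_USER_LIMIT 128u    /* offsets below this are "user" memory */

static unsigned char toy_mem[TOY_MEM_SIZE];

/* Like copy_from_user(): returns the number of bytes NOT copied (0 on success). */
static size_t toy_copy_from_user(void *dst, size_t src_off, size_t n)
{
    assert(dst != NULL);
    /* Written this way so that src_off + n cannot overflow. */
    if (src_off > TOY_USER_LIMIT || n > TOY_USER_LIMIT - src_off)
        return n;              /* range is not entirely in user memory */
    memcpy(dst, &toy_mem[src_off], n);
    return 0;
}

/* Toy "system call": copies a user buffer into out, or fails cleanly. */
static long toy_sys_print(size_t user_off, size_t len, char *out, size_t out_sz)
{
    if (len >= out_sz)
        return -EINVAL;
    if (toy_copy_from_user(out, user_off, len) != 0)
        return -EFAULT;        /* bad pointer: report an error instead of crashing */
    out[len] = '\0';
    return (long)len;
}
```

A caller that passes an offset in the upper half, hoping to have "kernel" bytes printed back, gets -EFAULT, and so does a range that starts in user memory but runs past its end.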

Change user space memory protection flags from kernel module

I am writing a kernel module that has access to a particular process's memory. I have done an anonymous mapping on some of the user space memory with do_mmap():
#define MAP_FLAGS (MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS)
prot = PROT_WRITE;
retval = do_mmap(NULL, vaddr, vsize, prot, MAP_FLAGS, 0);
vaddr and vsize are set earlier, and the call succeeds. After I write to that memory block from the kernel module (via copy_to_user), I want to remove the PROT_WRITE permission on it (like I would with mprotect in normal user space). I can't seem to find a function that will allow this.
I attempted unmapping the region and remapping it with the correct protections, but that zeroes out the memory block, erasing all the data I just wrote; setting MAP_UNINITIALIZED might fix that, but, from the man pages:
MAP_UNINITIALIZED (since Linux 2.6.33)
Don't clear anonymous pages. This flag is intended to improve performance on embedded
devices. This flag is only honored if the kernel was configured with the
CONFIG_MMAP_ALLOW_UNINITIALIZED option. Because of the security implications, that option
is normally enabled only on embedded devices (i.e., devices where one has complete
control of the contents of user memory).
so, while that might do what I want, it wouldn't be very portable. Is there a standard way to accomplish what I've suggested?
After some more research, I found a function called get_user_pages() (best documentation I've found is here) that returns a list of pages from userspace at a given address that can be mapped to kernel space with kmap() and written to that way (in my case, using kernel_read()). This can be used as a replacement for copy_to_user() because it allows forcing write permissions on the pages retrieved. The only drawback is that you have to write page by page, instead of all in one go, but it does solve the problem I described in my question.
In userspace there is a system call, mprotect, that can modify the protection flags on an existing mapping. You probably need to follow the implementation of that system call, or maybe simply call it directly from your code. See mm/mprotect.c.
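For reference, this is what the userspace side of that behaviour looks like: mprotect() changes the protection of an existing mapping in place, so the data already written survives, unlike with unmapping and remapping. This is a minimal userspace sketch only; it does not show how to do the same from inside a kernel module, and make_sealed_page() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Maps one anonymous page read/write, copies msg into it, then drops the
 * write permission. Returns the page, or NULL on failure.
 */
static char *make_sealed_page(const char *msg)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len;
    char *p;

    assert(msg != NULL);
    if (page <= 0)
        return NULL;
    len = (size_t)page;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* The page starts zero-filled, so copying len - 1 bytes keeps it terminated. */
    strncpy(p, msg, len - 1);

    /* Same mapping, same contents, new protection. */
    if (mprotect(p, len, PROT_READ) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

After make_sealed_page("hello") the returned page still reads back "hello", while any further store to it raises a protection fault.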

Resources