Why does Linux KASLR also randomize physical addresses? - linux-kernel

This looks strange because once the MMU is enabled we operate with virtual addresses and don't use physical addresses directly.

I suppose that it is a hardening of the kernel.
Suppose that an attacker is able to corrupt a PTE.
If the physical location of the kernel is always known, then the attacker can immediately remap that page onto the kernel's known physical location and get code execution as a privileged user.
I think 'protection from DMA-capable devices' is not a valid answer.
If a malicious DMA-capable device has access to all of the physical memory, e.g. no protection through IOTLB, then the device can scrape memory and immediately find where the kernel is located in physical memory.
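One way to see the physical randomization in practice (a minimal sketch, and an assumption on my part rather than something from the question: it needs root, since unprivileged reads of /proc/iomem show zeroed addresses) is to print the "Kernel code" range from /proc/iomem and compare it across reboots; with physical KASLR enabled, the range moves around.

```c
/*
 * Minimal sketch: print the "Kernel code" line(s) from /proc/iomem.
 * Run as root and compare the output across reboots to observe that
 * physical KASLR moves the kernel image in physical memory.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/iomem", "r");
    if (!f) { perror("fopen /proc/iomem"); return 1; }

    char line[256];
    while (fgets(line, sizeof(line), f)) {
        if (strstr(line, "Kernel code"))
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}
```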

Related

Access to physical ram addresses

While studying how cheats and anti-cheats work, I became interested in how access to physical (not virtual) RAM addresses is done; for example, on Windows, MmAllocateContiguousMemory is used (in some examples) to read from a physical address.
But how does access to physical memory addresses actually work? (I have not found any asm/C examples that do not use the native API or WinAPI.) I suppose that Windows takes full control over memory and provides only wrappers for working with it, but memory access has to be implemented somehow, in Windows as well as in other OSes. How does it work, and is it possible to read physical memory without WinAPI / native API? (memtest works somehow.)
When an OS boots up, it enables virtual memory. From that point on, every access to memory (address space) goes through the MMU of the CPU.
Controlling the MMU is a privileged operation, only privileged code (read: the OS) can change the MMU configuration.
The MMU configuration controls what physical address a virtual address is mapped to.
When you ask Windows to "read" a range of physical memory, it actually sets up the MMU so that the returned virtual range maps to the given physical memory.
You then read the virtual memory as usual.
Virtual memory is not something alongside physical memory; it is on top of physical memory.
It's always there.
So a user-space program cannot access physical memory without the help of the OS; the CPU will always use virtual memory once it is enabled.
In fact, this would pose a security risk as it bypasses the security mechanism virtual memory provides on top of physical memory (memory isolation).
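To illustrate the point about the OS setting up the mapping for you, here is a minimal Linux sketch (Linux rather than Windows, simply because the rest of this page is Linux-centric): it asks the kernel to map a physical range through /dev/mem and then reads it through ordinary virtual addresses. It assumes root privileges and a kernel whose strict /dev/mem policy permits the access; the physical address used is just a placeholder.

```c
/*
 * Minimal sketch: read a few bytes of physical memory on Linux via /dev/mem.
 * Assumptions: running as root, and the kernel allows the access
 * (CONFIG_STRICT_DEVMEM may restrict which ranges are readable).
 * PHYS_ADDR is an arbitrary placeholder (legacy BIOS ROM area here).
 */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

#define PHYS_ADDR 0x000F0000UL
#define MAP_LEN   4096UL

int main(void)
{
    int fd = open("/dev/mem", O_RDONLY);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    /* The kernel sets up page tables so this virtual mapping points at the
     * requested physical range; we then read it like ordinary memory. */
    uint8_t *p = mmap(NULL, MAP_LEN, PROT_READ, MAP_SHARED, fd, PHYS_ADDR);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    for (int i = 0; i < 16; i++)
        printf("%02x ", p[i]);
    printf("\n");

    munmap(p, MAP_LEN);
    close(fd);
    return 0;
}
```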

Writing to a disk without syscall

I am trying to understand how ring 3 to ring 0 transfer works in operating systems.
I think I understand how a syscall works.
My understanding is that when a user-mode program wants to make a syscall it sets up the call arguments and issues an INT that transfers control to the OS, which then reads the args, does the work, and returns control back to the user program. There are also more optimized sysenter/sysexit variants.
All this makes sense to me if the user voluntarily makes the syscall.
However, to guarantee safety the OS cannot assume that callers will use syscalls to access resources.
My question is: what happens if a user program tries to access a resource (the disk) directly, without involving the OS?
How does the OS intercept it?
Any piece of I/O hardware, such as the disk controller, will (designer's choice) either respond to an I/O port address or a memory-space address, or possibly both. There is no other way to talk to the hardware. The hardware is sitting out on some bus. Program code must read/write some I/O port or must read/write some "memory" address which is really the device rather than actual RAM.
On x86, since the kernel controls access to both:
I/O ports, by setting or not setting the I/O port permissions, preventing ring 3 access
physical memory-space addresses (by controlling the virtual-to-physical address mapping)
then it can absolutely remove access from user mode.
So there is no instruction that user mode can execute that addresses the device. This is the fundamental aspect of the kernel/user split on any hardware: the kernel can control what user mode can do.
To pick up on a comment by #sawdust - once the kernel has set up the above restrictions, then:
an attempt to issue an I/O port instruction will trap to the kernel because access has not been granted.
access to memory-space device addresses is simply inexpressible; there is no user-space virtual address that equates to the particular physical address required.
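A minimal sketch of the first point on x86 Linux (assuming root for the ioperm() call, and using port 0x80 as a harmless example): without port permission the inb() below traps and the process is killed with SIGSEGV; run it with any argument so it calls ioperm() first, and the same instruction succeeds.

```c
/*
 * Minimal sketch (x86 Linux): with no I/O port permission, the inb() below
 * causes a general-protection fault and the kernel delivers SIGSEGV; pass any
 * command-line argument to request access via ioperm() (needs root), and the
 * same read succeeds. Port 0x80 is the POST diagnostic port, harmless to read.
 */
#include <stdio.h>
#include <sys/io.h>

int main(int argc, char **argv)
{
    if (argc > 1 && ioperm(0x80, 1, 1) < 0) {
        perror("ioperm");
        return 1;
    }

    /* Without permission the CPU raises #GP here and the process dies with
     * SIGSEGV; with permission, the read returns normally. */
    unsigned char v = inb(0x80);
    printf("read 0x%02x from port 0x80\n", v);
    return 0;
}
```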

Can LXC be secure enough for IaaS?

I found in the Debian Handbook some isolation limits concerning LXC.
Those limits are about:
Memory isolation
Shared filesystems
Kernel messages
Kernel compromise possibilities
For memory isolation and filesystems, this does not seem to be a problem, because it's possible to configure containers to isolate them. But is there a way to secure the kernel enough to ensure that an untrusted user can't compromise it and can't read kernel messages?
If that is possible, is such restricted user access too constraining for an IaaS? Or is it better to use real virtualization or para-virtualization to offer IaaS solutions?
All the Linux containers still run under one kernel. If that kernel is compromised, then, since it runs in the most privileged hardware mode (ring 0 on x86), the compromise can affect every running container. With traditional hardware virtualization, even if one guest kernel is compromised, the hypervisor exists in another ring of protection (again, x86 terminology) that isolates the virtual guests. It is of course possible to compromise the hypervisor, assuming there is an error in its implementation, but compromising one virtual machine will not directly affect the other guests.
Also, a compromised guest could indirectly affect the other guests via the (virtualized) network, i.e. by sending malicious messages, but that is analogous to one machine on a network being compromised and attacking another machine, without any virtualization involved. Furthermore, a compromised guest could degrade the performance of the other machines via micro-architectural elements, e.g. by thrashing the cache, or use those micro-architectural elements as a side channel to glean information about another virtual machine.

How is a page fault triggered in the Linux kernel?

I understand that the Linux kernel implements demand paging: a page is not allocated until it is first accessed. This is all handled in the page fault handler. But what I don't understand is how the page fault is triggered. More precisely, what triggers the call to the page fault handler? Is it the hardware?
The page fault is caused by the CPU (more specifically, the MMU) whenever the application tries to access a virtual memory address which isn't mapped to a physical address. The page fault handler (in the kernel) then checks whether the page is currently swapped out to disk (and swaps it back in) or has been reserved but not committed (and commits it), then returns control to the application so it can retry the memory access instruction. If, on the other hand, the application doesn't have that virtual address allocated, the kernel sends a segmentation fault signal (SIGSEGV) to the application.
So it's most accurate to say that the hardware triggers the call.
When a mapping points to memory that does not exist (the virtual-to-physical translation fails), the MMU reports that there is no corresponding physical memory and informs the operating system; this is known as a "page fault". The operating system determines which page is needed and reads it back in from disk if it was swapped out. The page the MMU was trying to find is reloaded, the page table and memory map are updated accordingly, and control is given back to the user application at the exact point where the page fault occurred, so the instruction is executed again. This time the MMU outputs the correct address to the memory system, and everything continues.
Since the page fault is triggered by the MMU, which is part of the hardware, the hardware is responsible for it.
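A minimal user-space sketch of the demand-paging behaviour described above: the anonymous mmap() only reserves virtual address space, and each first touch of a page traps into the kernel's page fault handler, which is visible in the minor-fault counter from getrusage(). (The 4 KiB page size is an assumption of this sketch.)

```c
/*
 * Minimal sketch of demand paging on Linux: the anonymous mmap() below only
 * reserves virtual address space; physical pages are allocated one page fault
 * at a time as the loop touches them, visible in ru_minflt from getrusage().
 */
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>

int main(void)
{
    const size_t len = 64UL << 20;   /* 64 MiB of virtual address space */
    struct rusage before, after;

    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    getrusage(RUSAGE_SELF, &before);

    /* Touch one byte per page (assuming 4 KiB pages): each first access
     * traps into the kernel's page fault handler, which allocates and maps
     * a physical page before resuming the program. */
    for (size_t off = 0; off < len; off += 4096)
        buf[off] = 1;

    getrusage(RUSAGE_SELF, &after);
    printf("minor page faults taken: %ld\n",
           after.ru_minflt - before.ru_minflt);

    munmap(buf, len);
    return 0;
}
```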

Can I allocate memory pages at a specified physical address in a kernel module?

I am writing a kernel module in a guest operating system that will be run on a virtual machine using KVM. Here I want to allocate a memory page at a particular physical address. kmalloc() gives me memory, but at a physical address chosen by the OS.
Background: I am writing a device emulation technique in QEMU that wouldn't cause a VM exit when the guest communicates with the device (an exit happens, for example, with I/O-mapped as well as port-mapped devices). The basic idea is as follows: the guest device driver will write to a specific (guest) physical memory address. A thread in the QEMU process will poll it continuously to check for new data (through some status bits etc.) and will act accordingly without causing an exit. Since there is no (existing) way for the guest to tell the host what address is being used by the device driver, I want a pre-specified memory page to be allocated for it.
You cannot allocate memory at a specific address; however, you can reserve certain physical address ranges at boot time using reserve_bootmem(). Calling reserve_bootmem() early during boot (it requires a modified kernel, of course) ensures that the reserved memory is never handed to the buddy system (i.e. alloc_pages() and higher-level friends such as kmalloc()), and you will be able to use that memory for any purpose.
It sounds like you should be attacking this from the other side, by having a physical memory range reserved in the memory map that the QEMU BIOS passes to the guest kernel at boot.
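Assuming a physical range really has been kept out of the allocator at boot by one of the approaches above (the base address and size below are made-up placeholders, e.g. something reserved with the memmap= boot parameter or an early reserve_bootmem()/memblock_reserve() call), a guest module could then map and use it roughly like this minimal sketch:

```c
/*
 * Minimal sketch of a guest kernel module using a physical range that was
 * reserved at boot. RESERVED_PHYS and RESERVED_SIZE are hypothetical
 * placeholders; they must match whatever was actually reserved.
 */
#include <linux/module.h>
#include <linux/init.h>
#include <linux/io.h>
#include <linux/string.h>

#define RESERVED_PHYS 0x20000000UL   /* hypothetical reserved physical base */
#define RESERVED_SIZE (1UL << 20)    /* hypothetical size: 1 MiB */

static void *shared;

static int __init resmap_init(void)
{
	/* Map the reserved physical range into the kernel's virtual space. */
	shared = memremap(RESERVED_PHYS, RESERVED_SIZE, MEMREMAP_WB);
	if (!shared)
		return -ENOMEM;

	/* The guest driver could now place status bits / data here for the
	 * polling thread on the QEMU side to observe, without a VM exit. */
	memset(shared, 0, RESERVED_SIZE);
	return 0;
}

static void __exit resmap_exit(void)
{
	memunmap(shared);
}

module_init(resmap_init);
module_exit(resmap_exit);
MODULE_LICENSE("GPL");
```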
