x86 - kernel - programs, cleaning and memory overwrite - memory-management

I am not sure about something. Take linux for example; when a program exits, the kernel is responsible for cleaning after the process.
How can one be sure that physical memory is never overwritten from process A to process B (different virtual memories (page entries) leading to the same physical allocation)?
How is it prevented?

Linux assigns pages to and frees pages from processes using the facilities described here.(Search the kernel sources for more detailed information.)
That means, the kernel saves information about the used pages in some data structure (could be a bitmap, for example) and only the unused ones are exposed as usable to new processes.
That prevents mistakenly assigning pages in use to new process. Any behavior beyond that would be a bug and a magnificent security hole.

Related

AArch64 memory synchronization operations on multiply-mapped addresses

Suppose I have two pages that map to the same physical memory. Would an acquire operation (or fence) on a virtual address in one page properly synchronize with a release operation (or fence) on a virtual address in the other? Secondly, would cache maintenance operations (dc, ic), too, work with such multiply-mapped memory?
In other words...
...would a stlr (or dmb ishst if fence) on one core to one page properly synchronize with ldar (or dmb ishld if fence) on another core to the other page?
...would a dc whatever on one virtual address have the same effect as a dc whatever on the other?
As to memory ordering, yes, this is fine. The ARMv8 memory model is defined in terms of reads and writes of a Location, which is defined as "a byte that is associated with an address in the physical address space". See B2.3.1 in the Architecture Reference Manual, version H.a. (Older versions left out the "physical" part so it seems someone noticed that this was ambiguous.)
Likewise, an exclusive load ldxr says in the manual that it marks the physical address as an exclusive access.
Note that if this weren't the case, then on typical OSes, shared memory between processes (e.g. shmget, mmap(MAP_SHARED), etc) would be unusable, as the shared mappings are normally at different virtual addresses in the different processes.
I can't answer the part about cache right now.

What happens when I change data at shared library?

I have some shared libraries mapped into virtual address space of my task. What happens when I change some data for example in .bss section? I do it using kmap with physical page address as argument. I can suggest 2 ways. Data is changed and it influences at all tasks which use the library or the certain page is copied due to COW.
I think it's neither. The .bss area is set up when the executable is loaded. Virtual memory space is allocated for it at that time, and that space won't be shared with any other task. Pages won't be allocated initially (by default, mlock* can change that); they will be faulted in (i.e. demand-zeroed) as referenced.
I think that even if the process forks before touching the memory, the new process would then just get the equivalent (same virtual memory space marked as demand-zero).
So if you already have a physical address for it, I would think that's already happened and you won't be changing anything except the one page belonging to the current process.

How different in management page table entries (PTE) in kernel space and user space?

In Linux OS, after enable the page table, kernel will only map PTEs belong to kernel space once and never remap them again ? This action is opposite with PTEs in the user space which needs to remap every time process switching happening ?
So, I want know the difference in management of PTEs in kernel and user space.
This question is a extended part from the question at:
Page table in Linux kernel space during boot
Each process has its own page tables (although the parts that describe the kernel's address space are the same and are shared.)
On a process switch, the CPU is told the address of the new table (this is a single pointer which is written to the CR3 register on x86 CPUs).
So, I want know the difference in management of PTEs in kernel and user space.
See these related questions,
Does Linux use self map for page tables?
Linux Virtual memory
Kernel developer on memory management
Position independent code and shared libraries
There are many optimizations to this,
Each task has a different PGD, but PTE values maybe shared between processes, so large chunks of memory can be mapped the same for each process; only the top-level directory (CR3 on x86, TTB on ARM) is updated.
Also, many CPUs have a TLB and cache. These need to be maintained with the memory mapping. Some caches are VIVT, VIPT and PIPT. The first two have to have some cache flushing iff the PGD and/or PTE change. Often a CPU will support a process, thread or domain id. The OS only needs to switch this register during a context switch. The hardware cache and TLB entries must contains tags with the process, thread, or domain id. This is an implementation detail for each architecture.
So it is possible that TLB flushes could be needed when a top level page registers changes. The CPU could flush the entire TLB when this happens. However, this would be a disadvantage to pages that remain mapped.
Also, sub-sections of memory can be the same. A loader or other library can use mmap to create code that is similar between processes. This common code may not need to be swapped at the page table level, depending on architecture, loader and Linux version. It could of course have a virtual alias and then it needs to be swapped.
And the final point to the answer; kernel pages are always mapped. Only a non-preemptive OS could not map the kernel, but that would make little sense as every process wants to call the kernel. I guess the micro-kernel paradigm allows for device drivers to unload when they are not in use. Linux uses module loading to handle this.

Mapping of Page allocated to user process in Kernel virtual address space

When a page is created for a process (which will be mapped into process address space), will that page be mapped into kernel address space ?
If not, then it won't have kernel virtual address. Then how the swapper will find the page and swap that out, if a need arises ?
If we're talking about the x86 or similar (in terms of page translation) architectures, at any given time there's one virtual address space and normally one part of it is reserved for the kernel and the other for user-mode processes.
On a context switch between two processes only the user-mode part of the virtual address space changes.
With such an organization, the kernel always has full access to the current user-mode process, because, again, there's only one current virtual address space at any moment for both the kernel and a user-mode process, it's not two, it's one. So, the kernel doesn't really have to have another, extra mapping for user-mode pages. But that's not the main point.
The main point is that the kernel keeps some sort of statistics for every page that if needed can be saved to the disk and reused elsewhere. The CPU marks each page's page table entry (PTE) as accessed when the page is first read from or written to and as dirty when it's first written to.
The kernel scans the PTEs periodically, reads the accessed and dirty markers to update said statistics and clears accessed and dirty so it can detect a change in them later (of course, if any). Based on this statistics it determines which pages are rarely used or long unused and can be repurposed.
If the "swapper" runs in the context of the current process and if it runs in the kernel, then in theory it has enough information from the kernel (the list of rarely used or long unused pages to save and unmap if dirty or just unmap if not dirty) and sufficient access to the pages of interest.
If the "swapper" itself runs as a user-mode process, things become more complicated because it doesn't have access to another process' pages by default and has to either create a mapping or ask the kernel do some extra work for it in the context of the process of interest.
So, finding rarely used and long unused pages and their addresses occurs in the kernel. The CPU helps by automatically marking PTEs as accessed and dirty. There may need to be an extra mapping to dirty pages if they get saved to the disk not in the context of the process that owns them.

Kernel mode transition

If I understand correctly, a memory adderss in system space is accesible only from kernel mode. Does it mean when components mapped in system space are executed the processor must be swicthed to kernel mode?
For ex: the virtual memory manager is a frequently used component and is mapped in system space. Whenever the VMM runs in the context of user process (lets say it translated an address), does the processor must be swicthed to kernel mode?
Thanks,
Suresh.
Typically, there's 2 parts involved.The MMU(Memory manage unit) which is a hardware component that does the translation from virtual addresses to physical addresses. And the operating system VM subsystem.
The operating system part needs to run in privileged mode (a.k.a. kernel mode) and will set up/change the mapping in the MMU based on the the user space needs.
E.g. to request more (virtual) memory, or map a file into memory, a transition to kernel mode is needed and the VM subsystem can change the mapping of the process.
Around this there's often a ton of tricks to be made - e-g. map the whole address space of the kernel into the user process virtual space, but change its access so the process can't use that memory - this means whenever you transit to kernel mode you don't need to reload the mapping for the kernel.
Taking your example of the virtual memory manager, it never actually runs in user space. To allocate memory, user mode applications make calls to the Win32 API (NTDLL.DLL as one example) to routines such as VirtualAlloc.
With regards to address translation, here's a summary of how it works (based on the content from Windows Internals 5th Edition).
The VMM uses page tables which the CPU uses to translate virtual addresses to physical addresses. The page tables live in the system space. Each table contains many PTEs (page table entries) which stores the physical address to which a virtual address is mapped. I won't go into too much detail here, but the point is that all of the VMM's work is performed in system space and not in user space.
As for context switching - when a thread running in user space needs to run in the system space, then a context switch will occur. Since the memory manager lives in system space, it's threads never need to make a context switch, since it already lives in the system space.
Apologies for the simplistic explanation, this is quite a complicated topic of discussion in depth. I would highly recommend that you pick up a copy of Windows Internals as this sounds like it would come in handy for you.

Resources