I'm new to the concepts of paging and virtual memory. Can a page belong to the working sets of two processes at the same time?
I think the answer is NO, because if a page could be simultaneously in two working sets, one process could interfere with another...
You are incorrect.
In every logical memory system I am aware of, there is a range of shared addresses for the system. They are protected from "interference" by limiting access according to processor privilege level.
In addition, processes can create shared pages in any mode (called global sections or shared memory, depending on the system). These can be mapped into multiple processes. If applications use shared pages, they have to synchronize their access to them or suffer the consequences of not doing so.
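As a concrete illustration, here is a minimal POSIX sketch of one process creating a shared page that other processes can map. The segment name "/demo_page" is invented for this example, and error handling is abbreviated:

    /* Minimal sketch: create a named shared-memory object and map it.
       POSIX only; on older Linux, link with -lrt. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        /* "/demo_page" is an arbitrary name chosen for this example. */
        int fd = shm_open("/demo_page", O_CREAT | O_RDWR, 0600);
        if (fd == -1) { perror("shm_open"); return 1; }
        if (ftruncate(fd, 4096) == -1) { perror("ftruncate"); return 1; }

        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* Any process that shm_open()s the same name and mmap()s it will
           see this write -- hence the need to synchronize access. */
        strcpy(p, "hello from the creator");

        munmap(p, 4096);
        close(fd);
        return 0;
    }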
I'm looking into MPI one-sided communication, specifically shared memory. Before you can allocate a section of memory to be shared between processes, you need to split them into groups that are able to share memory. This is done using the function MPI_Comm_split_type.
This is supposed to do everything for you: split your communicator into groups of processes that can share memory, and return the new communicator to you.
My question is: how does it know whether two processes can share memory? Do I need to have something set up properly on my end for it to accurately determine the memory layout of my system?
So far, when I've used it, it seems to create shared memory the way I would expect. I'm just worried about whether it will continue to identify the processes correctly when I port the code to different systems.
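For reference, here is a minimal sketch of the calls in question (standard MPI-3 API; error handling omitted). The standard only guarantees that each resulting group can create a shared-memory region; in practice, MPI_COMM_TYPE_SHARED groups the ranks running on the same node:

    /* Sketch: split MPI_COMM_WORLD into shared-memory groups and
       allocate a window in that shared memory (MPI-3). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        /* Ranks that can share memory (typically: same node) end up
           in the same node_comm. */
        MPI_Comm node_comm;
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                            MPI_INFO_NULL, &node_comm);

        int node_rank;
        MPI_Comm_rank(node_comm, &node_rank);

        /* Every rank in node_comm can load/store directly into the
           segments allocated by the other ranks. */
        double *base;
        MPI_Win win;
        MPI_Win_allocate_shared(1024 * sizeof(double), sizeof(double),
                                MPI_INFO_NULL, node_comm, &base, &win);

        printf("node-local rank %d\n", node_rank);

        MPI_Win_free(&win);
        MPI_Comm_free(&node_comm);
        MPI_Finalize();
        return 0;
    }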
I have studied the cache and how to utilise it effectively for a few years now. I know all about the hierarchy of caches, how memory is fetched a cache line at a time, how the prefetcher detects memory access patterns and fetches memory in advance accordingly, and even how caching works with threads and the pitfalls of caching in multi-threaded programs.
What I have never been able to find out after all this time is how caching works on a computer with multiple concurrently running processes. Over the years, I've realised that each of my programs is just another process running alongside other processes on the computer. Even if my program is the only program being run, the OS will still be running in the background.
With that being said, how do the caches work with multiple processes running concurrently? Are they shared between each of the processes or is the cached memory of one process evicted upon a context switch? Perhaps the answer is a bit of a hybrid of both?
There are a few scenarios; let's pick one. In this scenario, caches are accessed with physical addresses.
All the processes (P1, P2, ..., Pn) executing in parallel operate on virtual addresses. The TLB (which holds virtual-to-physical translations) can flush its entries on a context switch. All the processes can have the same number of virtual pages, but at a given time only a few of them are referenced by a process. Therefore, you can keep these most-used pages in physical memory and the rest on the hard disk. This applies to all the currently active processes.
When process P1 is running and data needs to be fetched from memory, the procedure is similar to how it is done when there is only one process. One thing to note here is that when a page fault happens for process P1, if the page to be replaced in physical memory belongs to another process, then that process's page table needs to be updated to reflect this.
If you examine the contents of physical memory, it can hold pages from multiple processes. This is fine, as the page table of each process knows which virtual page is in which physical location.
Most CPUs are designed with caches that cache based on physical address, so they can still be hot after a context switch, even if a TLB invalidation requires a page walk to find the right physical page for a virtual address.
If a process migrates to another CPU core, private L1 and L2 will be cold, but shared L3 will still be hot.
I am looking for some scheduling options based on the data accessed by threads. Is there any way to find the pages of cache accessed by a specific thread?
If I have two threads from two different processes, is it possible to find the common data/pages accessed by both threads?
Two threads from the same process potentially share the whole process memory space. If the program does not restrict which regions of memory each thread accesses, it might be difficult to know exactly which thread should be assigned to which CPU.
With more threads, the problem becomes more difficult, as a thread may share different data with several different threads, creating a network of relationships. If a relation between two threads mandates their affinity to a given CPU core, then by transitivity all the threads of their relationship network should be bound to that very same core as well.
Perhaps counting the number of relations, or some form of clustering analysis (biconnectivity), would help.
Regarding your specific question: if two threads are sharing data but are from different processes, then these processes are necessarily sharing those pages voluntarily, using shm_open (to create a shared memory segment) and mmap (to map that segment into the process memory). It is not possible to share data pages between processes otherwise, except implicitly (again) via the copy-on-write mechanism used by the OS for forked processes, in which case each page remains shared only until one process writes to it.
The explicit sharing of pages (via shm_open) may be used to programmatically give both threads the same CPU affinity: perhaps by convention in both programs, associating the relevant threads with the first core, or through a small handshaking protocol established at some point through the shared memory object (for instance, the first byte of the memory segment could be set to the chosen CPU number + 1 by the first thread to access it, 0 meaning no affinity yet).
Unfortunately, the POSIX thread API doesn't provide a way to set CPU affinity for threads. You may use the non-portable extension provided on the Linux platform, pthread_attr_setaffinity_np, together with the cpuset family of functions, to configure a thread's affinity; see the sketch below.
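As a rough sketch of both ideas combined (Linux-specific: _GNU_SOURCE is needed for the _np call; the segment name "/affinity_demo" and the first-byte convention are just the example convention described above):

    /* Sketch: open a shared segment created by another process, read
       the "chosen CPU + 1" byte described above, and start a thread
       pinned to that CPU. Linux-only; error handling abbreviated. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static void *worker(void *arg) {
        printf("worker running on CPU %d\n", sched_getcpu());
        return NULL;
    }

    int main(void) {
        int fd = shm_open("/affinity_demo", O_RDWR, 0600);
        if (fd == -1) { perror("shm_open"); return 1; }
        unsigned char *seg = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
        if (seg == MAP_FAILED) { perror("mmap"); return 1; }

        /* Convention from the text: seg[0] == chosen CPU + 1,
           0 meaning "no affinity chosen yet". */
        if (seg[0] != 0) {
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(seg[0] - 1, &set);

            pthread_attr_t attr;
            pthread_attr_init(&attr);
            pthread_attr_setaffinity_np(&attr, sizeof(set), &set);

            pthread_t t;
            pthread_create(&t, &attr, worker, NULL);
            pthread_join(t, NULL);
            pthread_attr_destroy(&attr);
        }
        return 0;
    }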
references:
cpuset
pthread_attr_setaffinity_np
I do not quite understand the benefit of "multiple independent virtual addresses which point to the same physical address", even though I have read many books and posts.
E.g., in a similar question, Difference between physical addressing and virtual addressing concept,
the answer claims that programs will not crash each other, and that
"in general, a particular physical page only maps to one application's
virtual space"
Well, in http://tldp.org/LDP/tlk/mm/memory.html, in the section "Shared Virtual Memory", it says:
"For example there could be several processes in the system running
the bash command shell. Rather than have several copies of bash, one
in each processes virtual address space, it is better to have only one
copy in physical memory and all of the processes running bash share
it."
If one physical address (e.g., the shell program) is mapped to two independent virtual addresses, how can this not crash? Wouldn't it be the same as using physical addressing?
What does virtual addressing provide that is not possible or convenient with physical addressing? If no virtual memory existed, i.e., the two processes pointed directly to the same physical memory, I think it could still work by using some coordination mechanism. So why bother with virtual addressing, the MMU, virtual memory, and all that?
There are two main uses of this feature.
First, you can share memory between processes, which can then communicate via the shared pages. In fact, shared memory is one of the simplest forms of IPC.
But shared read-only pages can also be used to avoid useless duplication: most of the time, the code of a program does not change after it has been loaded into memory, so its memory pages can be shared among all the processes that are running that program. Obviously only the code is shared; the memory pages containing the stack, the heap and, in general, the data (or, if you prefer, the state) of the program are not shared.
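On Linux you can actually observe this layout: the program's code appears as a read-only, executable, file-backed mapping, while the heap and stack are private anonymous mappings. A throwaway sketch (Linux-specific) that dumps the calling process's own map:

    /* Sketch (Linux-only): print this process's memory map. The r-xp
       file-backed segment is the program code; since it is read-only,
       its physical pages can be shared by every process running the
       same binary. */
    #include <stdio.h>

    int main(void) {
        FILE *f = fopen("/proc/self/maps", "r");
        if (!f) { perror("fopen"); return 1; }
        int c;
        while ((c = fgetc(f)) != EOF)
            putchar(c);
        fclose(f);
        return 0;
    }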
This read-only sharing is improved with "copy on write". The code of executables usually doesn't change while running, but there are programs that are actually self-modifying (they were quite common in the past, when most development was still done in assembly); to support them, the operating system does read-only sharing as explained before, but if it detects a write to one of the shared pages, it disables the sharing for that page, creating an independent copy of it and letting the program write there.
This trick is particularly useful in situations in which there's a good chance that the data won't change, but it may happen.
Another case in which this technique is used is when a process forks: instead of copying every memory page (which is completely useless if the child process immediately does an exec), the new process shares all its memory pages with the parent in copy-on-write mode, allowing quick process creation while still "faking" the classic fork behavior.
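A tiny demonstration of that fork semantics (POSIX; the variable and values are arbitrary). The copying happens lazily under the hood, but the observable behavior is as if the whole address space had been duplicated:

    /* Sketch: after fork(), parent and child share pages copy-on-write.
       The child's write triggers a private copy of the page, so the
       parent's value is unaffected. Error handling omitted. */
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int value = 42;

        pid_t pid = fork();
        if (pid == 0) {                 /* child */
            value = 99;                 /* write -> kernel copies the page */
            printf("child sees %d\n", value);     /* prints 99 */
            return 0;
        }
        wait(NULL);                     /* parent */
        printf("parent still sees %d\n", value);  /* prints 42 */
        return 0;
    }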
If one physical address (e.g., the shell program) is mapped to two independent virtual addresses
Multiple processes can be built to share a piece of memory, e.g. with one acting as a server that writes to the memory and the other as a client reading from it, or with both reading and writing. This is a very fast way of doing inter-process communication (IPC). (Other solutions, such as pipes and sockets, require copying data to the kernel and then to the other process; shared memory skips that step.) But, as with any IPC solution, the programs must coordinate their reads and writes to the shared memory region through some messaging protocol.
Also, the "several processes in the system running the bash command shell" from the example will be sharing the read-only part of their address spaces, which includes the code. They can execute the same in-memory code concurrently, and won't kill each other since they can't modify it.
In the quote
in general, a particular physical page only maps to one application's virtual space
the "in general" part should really be "typically": memory pages are not shared unless you set them up to be, or unless they are read-only.
The function CreateFileMapping can be used to allocate space in the pagefile (if the first argument is INVALID_HANDLE_VALUE). The allocated space can later be memory-mapped into the process's virtual address space.
Why would I want to do this instead of using just VirtualAlloc?
It seems that both functions do almost the same thing. Memory allocated by VirtualAlloc may at some point be paged out to the pagefile. Why would I need an API that specifically requests that my pages be allocated there in the first place? Why should I care where my private pages live?
Is it just a hint to the OS about my expected memory usage patterns? (I.e., is the former a hint to swap those pages out more aggressively?)
Or is it simply a convenience when working with very large datasets in 32-bit processes? (I.e., I can use CreateFileMapping to make >4 GB allocations, then memory-map smaller chunks of the space as needed. Using the pagefile saves me the work of manually managing my own set of files to "swap" to.)
PS. This question was sparked by an article I read recently: http://blogs.technet.com/markrussinovich/archive/2008/11/17/3155406.aspx
From the CreateFileMapping documentation:
A single file mapping object can be shared by multiple processes.
Can virtual memory be shared across multiple processes?
One reason is to share memory among different processes. Different processes, knowing only the name of the mapping object, can communicate over the page file. This is preferable to creating a real file and carrying out the communication through it. Of course, there may be other use cases. You can refer to Using a File Mapping for IPC on MSDN for more information.
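A minimal sketch of that pattern (standard Win32 calls; the mapping name Local\DemoMapping and the 64 KiB size are arbitrary choices for this example):

    /* Sketch: create a pagefile-backed file mapping and map a view of
       it. A second process attaches by calling OpenFileMappingA with
       the same name. Win32; error handling abbreviated. */
    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        /* INVALID_HANDLE_VALUE => backed by the system paging file. */
        HANDLE hMap = CreateFileMappingA(
            INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
            0, 64 * 1024,              /* size high/low: 64 KiB */
            "Local\\DemoMapping");     /* name other processes look up */
        if (hMap == NULL) { printf("CreateFileMapping failed\n"); return 1; }

        char *view = MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, 0);
        if (view == NULL) { printf("MapViewOfFile failed\n"); return 1; }

        lstrcpyA(view, "hello across processes");

        /* The reader: OpenFileMappingA(FILE_MAP_ALL_ACCESS, FALSE,
           "Local\\DemoMapping"), then MapViewOfFile on that handle. */

        UnmapViewOfFile(view);
        CloseHandle(hMap);
        return 0;
    }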