Trying to share memory across multiple processes, in multiple languages, on Windows

I have an application where I have 3 different processes that need to run concurrently, in 3 different languages, on Windows:
1. A "data gathering" process, which interfaces with a sensor array. The developers of the sensor array have been kind enough to provide us with their C# source code, which I can modify. This should be generating raw data and shoving it into shared memory.
2. A "post-processing" process. This is C++ code that uses CUDA to get the processing done as fast as possible. This should be taking raw data, moving it to the GPU, then taking the results from the GPU and communicating them to--
3. A feedback controller written in Matlab, which takes the results of the post-processing and uses them to make decisions on how to control a mechanical system.
I've done coursework on parallel programming, but that coursework was all on Linux, where I used mmap (from sys/mman.h) for coordination between multiple processes. This makes sense to me--you ask the OS for a page in virtual memory to be mapped to shared physical memory addresses, and the OS gives you some shared memory.
Googling around, it seems like the preferred way to set up shared memory between processes in Windows (in fact, the only "easy" way to do it in Matlab) is to use memory-mapped files, but this seems completely bonkers to me. If I'm understanding correctly, a memory-mapped file grabs some disk space and maps it to the physical address space, which is then mapped into the virtual address space for any process that accesses the same memory-mapped file.
This seems about three times more complex than it needs to be just to get multiple processes to map pages in their virtual address space to the same physical memory. I don't feel like I should be doing anything remotely related to disk I/O for what I'm trying to accomplish, especially since performance is a big issue for me (ideally I should be able to process 1000 sets of data per second, though that's not a hard limit). Is this really the right way to coordinate my processes?
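For what it's worth, a Windows memory-mapped "file" need not involve the disk at all: a mapping backed by the system paging file (CreateFileMapping with INVALID_HANDLE_VALUE) is just named shared RAM, and its pages only ever hit disk if the system would have swapped them anyway. A minimal cross-platform sketch using Python's multiprocessing.shared_memory, which uses exactly that pagefile-backed mechanism on Windows (the name "sensor_frame_demo" is made up for illustration):

```python
import struct
from multiprocessing import shared_memory

# Producer side: create a named block and write one "frame" of raw data.
# On Windows this is a pagefile-backed file mapping; no on-disk file is created.
shm = shared_memory.SharedMemory(name="sensor_frame_demo", create=True, size=4096)
shm.buf[:8] = struct.pack("<d", 3.14)  # one little-endian double

# Consumer side (normally a separate process): attach by the same name and read.
view = shared_memory.SharedMemory(name="sensor_frame_demo")
(value,) = struct.unpack("<d", bytes(view.buf[:8]))

view.close()
shm.close()
shm.unlink()  # POSIX-only cleanup; a no-op on Windows
```

In C# the equivalent is MemoryMappedFile.CreateNew (non-persisted, i.e. pagefile-backed), and in C++ CreateFileMapping/MapViewOfFile with the same name. Matlab's memmapfile, however, only attaches through an actual file path, which is presumably why the file-backed route gets recommended whenever Matlab is in the loop.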

Related

How does MPI_Comm_split_type know what processes on your computer can create shared memory?

I'm looking into MPI one-sided communication, specifically shared memory. Before you allocate a section of memory to be shared between processes, you need to split them into groups which are able to share memory. This is done using the function MPI_Comm_split_type.
This is supposed to do everything for you: split your communicator into groups of processes which can share memory, and return the new communicator to you.
My question is: how does it know whether two processes can share memory? Do I need to have something set up properly on my end in order for it to accurately determine the memory layout of my system?
So far when I've used it, it seems to create shared memory the way I would expect. I'm just worried about porting the code to different systems--whether it will continue to correctly identify which processes can share memory.
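For concreteness, a typical use looks roughly like this (sketched with mpi4py; the C calls MPI_Comm_split_type and MPI_Win_allocate_shared are analogous). The grouping comes from the MPI runtime's own process-placement information--ranks it launched on the same node end up in the same communicator--so there is nothing to configure on your end:

```python
# Sketch with mpi4py; run under e.g. `mpiexec -n 4 python split_demo.py`.
from mpi4py import MPI

world = MPI.COMM_WORLD

# MPI_COMM_TYPE_SHARED: group ranks that can share memory. The runtime
# decides this from where it placed each rank, not from user configuration.
node = world.Split_type(MPI.COMM_TYPE_SHARED)

# A window allocated on `node` is directly load/store-accessible to every
# rank in it; Shared_query returns the base address of rank 0's segment.
win = MPI.Win.Allocate_shared(8 * node.Get_size(), 8, comm=node)
buf, itemsize = win.Shared_query(0)
```

Since the split is recomputed at every run from the runtime's own placement data, portability mostly reduces to whether the MPI implementation on the new system is configured correctly, not to anything in your code.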

Is it possible to "gracefully" use virtual memory in a program whose regular use would consume all physical RAM?

I am intending to write a program to create huge relational networks out of unstructured data - the exact implementation is irrelevant but imagine a GPT-3-style large language model. Training such a model would require potentially 100+ gigabytes of available random access memory as links get reinforced between new and existing nodes in the graph. Only a small portion of the entire model would likely be loaded at any given time, but potentially any region of memory may be accessed randomly.
I do not have a machine with 512 GB of physical RAM. However, I do have one with a 512 GB NVMe SSD that I can dedicate to the purpose. I see two potential options for making this program work without specialized hardware:
I can write my own memory manager that would swap pages between "hot" resident memory and "cold" storage on the disk, probably using memory-mapped files or some similar construct. This would require coding all memory accesses in the modeling program to go through this custom memory manager, plus coding the page cache, concurrent-access handlers and all of the other low-level stuff that comes along with it, which would take days and very likely introduce bugs. Performance would also likely be poor. Or,
I can configure the operating system to use the entire SSD as a page file / swap space, and then just have the program reserve as much virtual memory as it needs - the same as any other normal program - relying on the kernel's memory manager, which is already doing the page mapping + swapping + caching for me.
The problem I foresee with #2 is making the operating system understand what I am trying to do in a "cooperative" way. Ideally I would like to hint to the OS that I only want a specific fraction of resident memory and that it should swap the rest, to keep overall system RAM usage below 90% or so. Otherwise the OS will allocate 99% of physical RAM and then start aggressively compacting and cutting down memory from other background programs, which ends up making the whole system unresponsive. Linux apparently just starts sacrificing entire processes (the OOM killer) if it gets too bad.
Does any operating system expose a call that would let me tell it to chill out and proactively swap my memory to disk? I have looked through the VMM functions in kernel32.dll and the documentation for the Linux paging and swap daemon (kswapd), but nothing looks like what I need. Perhaps some way to reserve, say, 1 GB of pages and then "donate" them back to the kernel to make sure they get used for processes that aren't my own? Or some way to configure memory pressure or limits, or make kswapd work more aggressively for just my process?
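There are partial mechanisms on both sides: Windows can cap a process's resident set (SetProcessWorkingSetSize, job-object limits), and on Linux a cgroup's memory.high applies gentle reclaim pressure to just your process. For per-range hints, madvise(2) is the closest thing to "you may page this out"; a minimal Linux sketch (Python 3.8+ exposes it as mmap.madvise; the sizes here are arbitrary):

```python
import mmap
import os
import tempfile

size = 1 << 20                      # 1 MiB backing region for the sketch
fd, path = tempfile.mkstemp()
os.ftruncate(fd, size)
m = mmap.mmap(fd, size)             # file-backed, MAP_SHARED by default

m.madvise(mmap.MADV_RANDOM)         # random access expected: don't read ahead
m[0:5] = b"hello"                   # touch a "hot" page
m.madvise(mmap.MADV_DONTNEED, 0, mmap.PAGESIZE)  # hand the page back early

# For a shared file-backed mapping, the data survives MADV_DONTNEED and is
# repopulated from the page cache / file on the next access (anonymous
# memory would read back as zeros instead).
data = bytes(m[0:5])

m.close()
os.close(fd)
os.unlink(path)
```

None of this forces the kernel to swap proactively, but combining a file-backed mapping (so the "cold" state lives in the page cache, evictable without touching swap) with a resident-set or cgroup limit gets close to the cooperative behavior described above.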

Do low-end embedded systems have process isolation?

I am studying memory management. In particular, I am studying MMU and the mapping between the process logical space pages and the RAM frames.
My question is: what about low-end embedded systems? If I'm correct, an MMU can't be used in these systems due to their smaller memory. So how can computers with less memory available avoid the problem of shared memory between processes?
For embedded systems, the kind of MMU you speak of is only present in high-end parts like PowerPC or ARM Cortex-A.
Low-end to mid-range microcontrollers do often have some simpler form of MMU though. Not as advanced as one used to create virtual memory sections, but a simpler kind which allows remapping of RAM, flash, registers and so on. Similarly, they often have various mechanisms (often a separate MPU, memory protection unit) for protecting certain parts of memory from accidental writes. They may or may not be smart enough to make an "MMU-like" realization that code is executing from data memory, or that data access is happening in code memory. Harvard vs. von Neumann architecture also matters here.
As for multiple processes in an RTOS, it can't be compared with multiple processes on a desktop computer. Each process in an RTOS typically gets its own stack, but that's about it - the MMU isn't involved; it's handled by the RTOS. Code in embedded systems is typically executed directly from flash, so it doesn't make sense to assign chunks of RAM for executable code like on a PC. Several processes will simply execute code from flash, and it might be the same code or different code between processes, depending on whether they share common code or not.
Similarly, it is senseless to use heap allocation in embedded systems (see Why should I not use dynamic memory allocation in embedded systems?), so we don't need to create a RAM image for that purpose either. The only things left as unique per process are the stack, as well as separate parts of .data/.bss.

Virtual memory - I don't fully understand why we need it beyond security reasons

In several books and on websites, a reason given for virtual memory management is that it allows only part of a program to be loaded into RAM, and therefore more efficient use of RAM is made.
1) Why do we need virtual memory management to only load part of a program? Why could we not load part of a program using physical addresses?
2) Beyond the security reasons for separating the different parts (stack, heap, etc.) of a process' memory into various physical locations, I really don't see what other benefits there are to virtual memory.
3) Why is it important that the process thinks the addresses are contiguous (courtesy of virtual addresses) when in reality they are discontiguous?
EDIT: I know the obvious reason, that virtual memory allows more memory to be treated as if it were RAM.
There are a number of advantages to using virtual memory over strictly physical memory, some of which you've already listed. Basically it allows your programs to just use memory without having to worry about where it comes from or what else might be competing for it. It makes memory appear to be flat and contiguous, even if it's actually spread out across various sections of physical memory or paged out to disk.
1) Why do we need virtual memory management to only load part of a program? Why could we not load part of a program using physical addresses?
You can try that with purely physical addresses, but what if there's not a large enough single block available? With virtual addresses you can bridge sections of physical RAM and make them appear as one large block. You can also move things around in memory without interrupting processes that would be surprised to have that happen.
2) Beyond the security reasons for separating the different parts (stack, heap, etc.) of a process' memory into various physical locations, I really don't see what other benefits there are to virtual memory.
It also helps keep memory from getting overly fragmented, and makes it easier to segregate memory in use by one process from memory in use by another.
3) Why is it important that the process thinks the addresses are contiguous (courtesy of virtual addresses) when in reality they are discontiguous?
Try iterating over an array that's split between two non-contiguous sections of memory and then come ask that again. Or try allocating a buffer for some serial communications - there are any number of cases where your software expects a single chunk of memory.
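The pain is easy to simulate in any language: with one contiguous block, indexing is a single offset calculation, while with the same data split across two blocks, every access first needs bookkeeping to find the right block - which is exactly the translation the MMU does for you in hardware. A toy sketch:

```python
# One contiguous block: indexing is a single offset, iteration is trivial.
contiguous = bytes(range(10))
total = sum(contiguous)

# The same data split across two separate blocks: every access must first
# work out which block the index falls into -- bookkeeping that address
# translation otherwise hides from the program entirely.
blocks = [bytes(range(6)), bytes(range(6, 10))]

def read(i):
    for block in blocks:
        if i < len(block):
            return block[i]
        i -= len(block)
    raise IndexError(i)

assert sum(read(i) for i in range(10)) == total  # same data, more work
```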
1) Why do we need virtual memory management to only load part of a program? Why could we not load part of a program using physical addresses?
Some of us are old enough to remember 32-bit systems with 8MB of memory. Even compressing a small image file would exceed the physical memory of the system.
It's likely that the paging aspect of virtual memory will vanish in the future as system memory and storage merge.
2) Beyond the security reasons for separating the different parts (stack, heap, etc.) of a process' memory into various physical locations, I really don't see what other benefits there are to virtual memory.
See #1. The amount of memory required by a program may exceed the physical memory available.
That said, the main security reasons are to separate the various processes and the system address space. Any separation of stack, heap, and code is usually for convenience and error detection.
Advantages then include:
Process memory in excess of physical memory
Separation of processes
Managed access to the kernel (and in some systems other modes as well)
Prevention of executing non-executable pages
Prevention of writing to read-only pages (code, data)
Ease of programming
3) Why is it important that the process thinks the addresses are contiguous (courtesy of virtual addresses) when in reality they are discontiguous?
I presume you are referring to virtual addresses. This is simply a matter of convenience. It would make no sense to make them non-contiguous.
1) Why do we need virtual memory management to only load part of a program? Why could we not load part of a program using physical addresses?
Well, you are obviously aware that program sizes can range from a few KB to several GB or even more. But since we have a limit on main memory, aka RAM (because of cost issues), a program bigger than RAM can't be loaded as a whole at once. So, to achieve the desired result, computer scientists developed a method: virtual memory. It helps achieve:
a) an address space backed by some portion of the hard disk (not all of it), large enough to accommodate the parts of a running program. If the running program's size exceeds the size of RAM, the program is, in effect, cut into pieces; only the relevant part that fits in memory is loaded, and subsequent code is brought in on demand as execution reaches it.
b) less burden on physical memory, thereby enabling other programs in main memory to keep running. There are several more reasons as well!
2) Beyond the security reasons for separating the different parts (stack, heap, etc.) of a process' memory into various physical locations, I really don't see what other benefits there are to virtual memory.
Separation of the stack, heap, etc. is about storing different kinds of data with different lifetimes. They are different data structures: the stack stores things like a call's return address and local variables (say, the return address of a recursive call), whereas the heap holds dynamically allocated data that outlives a single call.
Also, it is not virtual memory itself that imposes this storage scheme - the layout actually sits in main memory. There are also several regions within the heap that serve entirely different functions. And as already mentioned, virtual memory brings further benefits: it helps in running several programs simultaneously and enables addressing schemes such as paging and segmentation.
3) Why is it important that the process thinks the addresses are contiguous (courtesy of virtual addresses) when in reality they are discontiguous?
Would it be better if counting went 1, 2, 3, 4, 5, the pattern we are all familiar with, or 1, 5, 2, 4, 3 - even if you knew the underlying rule? I'd certainly choose the regular pattern for any task. It is similar with physical (main) memory: the physical addresses behind a process's data really are scattered, fetched in a discontiguous, mingled order.
But we have a mechanism, virtual memory, that maps those scattered physical locations onto a regular, contiguous range of addresses. Through paging (with relative indexing within each page) and segmentation (addresses relative to a segment's start), the address space appears contiguous even though the actual location is always determined by the paging scheme or the segment's starting address. Hence virtual memory lets us work as if with one continuous block of memory. Isn't that better?

Virtual addressing vs. physical addressing

I do not quite understand the benefit of "multiple independent virtual addresses, which point to the same physical address", even though I have read many books and posts.
E.g., in a similar question, Difference between physical addressing and virtual addressing concept,
the post claims that programs will not crash each other, and that
"in general, a particular physical page only maps to one application's virtual space"
Well, in http://tldp.org/LDP/tlk/mm/memory.html, in the section "shared virtual memory", it says:
"For example there could be several processes in the system running the bash command shell. Rather than have several copies of bash, one in each processes virtual address space, it is better to have only one copy in physical memory and all of the processes running bash share it."
If one physical address (e.g., the shell program) is mapped into two independent virtual address spaces, how can this not crash? Wouldn't it be the same as using physical addressing?
What does virtual addressing provide that is not possible or convenient with physical addressing? If no virtual memory existed, i.e., the two processes pointed directly at the same physical memory, I think that, with some coordinating mechanism, it could still work. So why bother with "virtual addressing, MMU, virtual memory" and all that?
There are two main uses of this feature.
First, you can share memory between processes, which can then communicate via the shared pages. In fact, shared memory is one of the simplest forms of IPC.
But shared read-only pages can also be used to avoid useless duplication: most of the time, the code of a program does not change after it has been loaded into memory, so its memory pages can be shared among all the processes that are running that program. Obviously only the code is shared; the memory pages containing the stack, the heap and in general the data (or, if you prefer, the state) of the program are not shared.
This trick is improved with "copy on write". The code of executables usually doesn't change when running, but there are programs that are actually self-modifying (they were quite common in the past, when most development was still done in assembly); to support this, the operating system does read-only sharing as explained before, but, if it detects a write on one of the shared pages, it disables the sharing for that page, creating an independent copy of it and letting the program write there.
This trick is particularly useful in situations in which there's a good chance that the data won't change, but it may happen.
Another case in which this technique can be used is when a process forks: instead of copying every memory page (which is completely useless if the child process immediately does an exec), the new process shares all its memory pages with the parent in copy-on-write mode, allowing quick process creation while still "faking" the "classic" fork behavior.
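The copy-on-write fork behavior is observable from user space on any POSIX system: right after fork(), parent and child share the same physical pages, and the child's first write triggers a private copy, so the parent's view is unaffected. A small sketch (the pipe is just there to report the child's view back to the parent):

```python
import os

data = bytearray(b"shared")

r, w = os.pipe()
pid = os.fork()
if pid == 0:                      # child: initially shares the parent's pages
    data[0:6] = b"child!"         # first write forces a private page copy
    os.write(w, bytes(data))
    os._exit(0)

os.close(w)
child_view = os.read(r, 6)        # what the child saw after its write
os.waitpid(pid, 0)
os.close(r)

assert child_view == b"child!"    # the child sees its own modification
assert bytes(data) == b"shared"   # the parent's copy of the page is untouched
```

The asserts confirm the fork semantics that copy-on-write implements; the actual page duplication happens invisibly inside the kernel at the moment of the child's write.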
If one physical address (e.g., the shell program) is mapped into two independent virtual address spaces
Multiple processes can be built to share a piece of memory; e.g. with one acting as a server that writes to the memory, the other as a client reading from it, or with both reading and writing. This is a very fast way of doing inter-process communication (IPC). (Other solutions, such as pipes and sockets, require copying data to the kernel and then to the other process, which shared memory skips.) But, as with any IPC solution, the programs must coordinate their reads and writes to the shared memory region by some messaging protocol.
Also, the "several processes in the system running the bash command shell" from the example will be sharing the read-only part of their address spaces, which includes the code. They can execute the same in-memory code concurrently, and won't kill each other since they can't modify it.
In the quote
in general, a particular physical page only maps to one application's virtual space
the "in general" part should really be "typically": memory pages are not shared unless you set them up to be, or unless they are read-only.
