I'm on a Windows platform, and I've been researching how the OS works and what C#, C++, VB, and C can do, but I keep running into this "insulated space" problem across multiple programs.
Is it possible, however difficult, to start two programs, each in its own insulated space, and have them share memory?
For example:
A.exe
Start of program, in physical RAM: 0x0000 0000 0000 00FF
B.exe
Start of program, in physical RAM: 0x0000 0000 00FF 0000
These programs each have their own variable "foo" at the address "0x0F". For obvious boundary reasons, 0x0F is a local address, and it means nothing between the two programs: each of the two programs resolves 0x0F differently, to two different values.
Is it possible to, somehow, share this memory, physically, just by making that local variable in the two separate programs point to the same physical address?
I'm aware that local addresses are normally resolved by adding the program's start offset to them, to get the physical location of the variable.
I've shoddily drawn an image to represent my point:
In short, how do I map local memory in two separate programs, to point to the absolute same location physically? Or at the very least, how do I get the programs to be aware of physical addresses (such as where they are, where their variables are physically, so they can see one another in physical RAM and modify one another) so that I can emulate this behavior?
Note: This has to be done by mapping, otherwise I would just be copying the memory through serialization techniques.
Yes I'm aware this would violate a lot of OS safeties and design specifications. Yes I'm aware this could pose huge security risks. Yes I'm aware this can also affect stability of the system. Yes I'm aware the OS might move memory around due to pagefile and would force the programs to re-negotiate their addressing. I still wish to achieve this effect regardless. Even if I have to write my own driver(s). I just have no idea how to start this project.
PS: Something like a raw pointer to actual physical memory in RAM, shared between the two programs. But it has to be allocated in a DLL or a 3rd program, or in one of the two programs, so the OS doesn't just overwrite it. I need to access this data directly, without any proxies (i.e. the OS), because I need the performance!
PPS: I was thinking marshalling, invoke, DLLs, and things of that nature might help, but they're all in their insulated space, due to the nature of Windows' memory management. I have no idea if Linux is the same in this regard. I'd love to jump ship if Linux can actually support the above idea easily.
PPPS: Sort of like how SQL works, except I don't want the delay between writing a query, sending it to the server, waiting for the server to pass it back serially, then deserializing the data, and using it.
PPSPSP: This sort of behavior exists in embedded machines: since all the software runs in the same shared physical space, programs can see one another and all the data. I wish to emulate this behavior in an operating system, without performance losses (i.e. serializing data between app domains), and without creating specialized software to run my programs inside of (an emulator).
PPPSSSPP: I want program A and program B to access the same physical memory in RAM, as a form of IPC. I need the actual physical access, not an imitation of it. Pretend it's a 1920x1080 image that I need to iterate through and modify with two programs at the same time, where the image would be a 2D array of struct Pixel(int A,R,G,B). I have too many uses and needs for this technique to start listing all of them.
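For what it's worth, the closest supported equivalent on Windows seems to be a named section object backed by the system paging file: both programs map the same physical pages into their own virtual address spaces and then read and write them directly, with no copying or serialization in between. A minimal sketch, assuming both programs agree on a name ("Local\\PixelShare" below is just a placeholder):

```cpp
// Minimal sketch: two processes sharing one block of RAM on Windows via a named
// section object backed by the paging file (no disk file is involved).
// The name "Local\\PixelShare" and the 1920x1080 Pixel layout are illustrative only.
#include <windows.h>
#include <cstdio>

struct Pixel { int a, r, g, b; };
constexpr size_t kPixels = 1920 * 1080;
constexpr DWORD  kBytes  = static_cast<DWORD>(kPixels * sizeof(Pixel));

int main() {
    // Both A.exe and B.exe run this same code; whichever runs first creates the
    // section, the second one just opens the existing mapping by name.
    HANDLE section = CreateFileMappingA(
        INVALID_HANDLE_VALUE,   // back the section with the paging file, not a file on disk
        nullptr, PAGE_READWRITE,
        0, kBytes,              // size of the shared region
        "Local\\PixelShare");   // name both processes agree on
    if (!section) return 1;

    // Map the shared pages into this process's virtual address space.
    auto *pixels = static_cast<Pixel *>(
        MapViewOfFile(section, FILE_MAP_ALL_ACCESS, 0, 0, kBytes));
    if (!pixels) return 1;

    // Direct reads/writes: the other process sees them immediately, because the
    // two views resolve to the same physical pages.
    pixels[0] = Pixel{255, 10, 20, 30};
    std::printf("pixel 0: A=%d R=%d G=%d B=%d\n",
                pixels[0].a, pixels[0].r, pixels[0].g, pixels[0].b);

    UnmapViewOfFile(pixels);
    CloseHandle(section);
    return 0;
}
```

Whichever program runs first creates the section; the second call with the same name opens the existing one, so both views end up on the same physical pages (the memory manager may still page them out, but each program's mapped pointer stays valid).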
Related
On Linux x86_64, I have a simple application for which I track all memory accesses with Intel's PIN. The program uses only "a bit" of memory, most of it for dynamically allocated matrices (I've bisected the right value with ulimit). However, the memory accesses span the whole range of the VM address space: low addresses for what I presume are global variables in the code, high addresses for the malloc()ed arrays.
There's a huge gap in the middle, and even in the high addresses the range is between 0x7fff4e1a37f4 and 0x7fea3af99000, which is much larger than what I would assume my application to use in total.
The post-processing that I need to do on the memory accesses deals very badly with these sparse accesses, so I'm looking for a way to restrict the virtual address range available to the process so that "it just fits", and accesses will show addresses between 0 and some more reasonable value for dynamically allocated memory (somewhere around the 40 MB that I've discovered through ulimit).
Q: Is there an easy way to limit the available address space (and hence implicitly, available memory) to an individual process on Linux, ideally from the command line on a per-process basis?
Further notes:
I can link my application statically.
Even if I limit the memory with ulimit, the process still uses the full VM address range (not entirely unexpected).
I know about /proc/${pid}/maps, but I would like to avoid creating wrappers to deal with this, and I'm not sure how to actually use the data in there.
I've heard about prelink (which may not apply to my static binary, but only to libraries?) and can imagine that there are more intrusive ways to interfere with malloc(), but these solutions are too far out of my expertise to evaluate their usefulness (Limiting the heap area's Virtual address range, https://stackoverflow.com/a/29960208/60462).
If there's no simple command-line solution, instead of going for an elaborate hack, I'll probably just wing it in the post-processing and "normalize" the addresses (e.g. via a few lines of Perl).
ld.so(8) lists LD_PREFER_MAP_32BIT_EXEC as a way to restrict the address space to the lower 2 GiB, which is significantly less than the normal 64-bit address space, but possibly still not small enough for your purposes.
It may also be possible to use the prctl(2) PR_SET_MM options to control the addresses of different parts of the program to solve your problem.
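If a per-process launcher is acceptable, one way to apply the ld.so approach is simply to set the variable before exec'ing the instrumented binary; this is a hedged sketch only, the target path is a placeholder, and since the variable is read by the dynamic loader it won't help a fully static build:

```cpp
// Hedged sketch: a tiny launcher that sets LD_PREFER_MAP_32BIT_EXEC before exec'ing
// the target, so the dynamic loader prefers mapping executable pages below 2 GiB
// (x86-64, glibc >= 2.23). "./my_app" is a placeholder path.
#include <cstdio>
#include <cstdlib>
#include <unistd.h>

int main() {
    setenv("LD_PREFER_MAP_32BIT_EXEC", "1", 1);              // any value enables it
    execl("./my_app", "./my_app", static_cast<char *>(nullptr)); // replace this process
    perror("execl");                                          // only reached if exec failed
    return 1;
}
```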
I have an application where I have 3 different processes that need to run concurrently, in 3 different languages, on Windows:
A "data gathering" process, which interfaces with a sensor array. The developers of the sensor array have been kind enough to provide us with their C# source code, which I can modify. This should be generating raw data and shoving it into shared memory
A "post-processing" process. This is C++ code that uses CUDA to get the processing done as fast as possible. This should be taking raw data, moving it to the GPU, then taking the results from the GPU and communicating it to--
A feedback controller written in Matlab, which takes the results of the post-processing and uses it to make decisions on how to control a mechanical system.
I've done coursework on parallel programming, but that coursework was all on Linux, where I used mmap for coordination between multiple processes. This makes sense to me--you ask the OS for a page of virtual memory to be mapped to shared physical memory, and the OS gives you some shared memory.
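For reference, a minimal sketch of that Linux pattern (an anonymous MAP_SHARED region visible to a forked child; unrelated processes would use shm_open with a name instead):

```cpp
// Minimal sketch (Linux/POSIX): one anonymous MAP_SHARED page is a single set of
// physical pages that both the parent and the forked child see.
#include <cstdio>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    void *p = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return 1;
    int *shared = static_cast<int *>(p);

    *shared = 0;
    if (fork() == 0) {      // child
        *shared = 42;       // the write lands in the shared physical page
        return 0;
    }
    wait(nullptr);          // parent: wait for the child to exit
    std::printf("parent reads %d\n", *shared);   // prints 42
    return 0;
}
```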
Googling around, it seems like the preferred way to set up shared memory between processes in Windows (in fact, the only "easy" way to do it in Matlab) is to use memory-mapped files, but this seems completely bonkers to me. If I'm understanding correctly, a memory-mapped file grabs some disk space and maps it to the physical address space, which is then mapped into the virtual address space for any process that accesses the same memory-mapped file.
This seems about three times more complex than it needs to be just to get multiple processes to map pages in their virtual address space to the same physical memory. I don't feel like I should be doing anything remotely related to disk I/O for what I'm trying to accomplish, especially since performance is a big issue for me (ideally I should be able to process 1000 sets of data per second, though that's not a hard limit). Is this really the right way to coordinate my processes?
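One detail that may take the "bonkers" edge off, offered as a hedged aside: on Windows a file mapping does not have to be backed by a file on disk. Passing INVALID_HANDLE_VALUE to CreateFileMapping creates a section backed by the system paging file, which is essentially the Windows counterpart of an anonymous shared mmap, and no disk I/O occurs unless the pages are actually paged out. A rough sketch of what the C++/CUDA consumer side might look like (the section and event names are placeholders that the producer process would have to use as well):

```cpp
// Hedged sketch: pagefile-backed shared memory plus a named event for signalling,
// as the C++/CUDA consumer might use it. "Local\\SensorFrames" and "Local\\FrameReady"
// are placeholder names shared with the producer process.
#include <windows.h>
#include <cstdio>

constexpr DWORD kFrameBytes = 1 << 20;   // illustrative frame size (1 MiB)

int main() {
    // Create (or open, if the producer created it first) the shared section.
    // INVALID_HANDLE_VALUE => backed by the paging file, not by a file on disk.
    HANDLE section = CreateFileMappingA(INVALID_HANDLE_VALUE, nullptr,
                                        PAGE_READWRITE, 0, kFrameBytes,
                                        "Local\\SensorFrames");
    HANDLE ready   = CreateEventA(nullptr, FALSE, FALSE, "Local\\FrameReady");
    if (!section || !ready) return 1;

    void *frame = MapViewOfFile(section, FILE_MAP_ALL_ACCESS, 0, 0, kFrameBytes);
    if (!frame) return 1;

    for (;;) {
        WaitForSingleObject(ready, INFINITE);   // producer signals a new frame
        // ... hand `frame` to CUDA (e.g. copy host->device), process, publish results ...
        std::printf("got a frame at %p\n", frame);
    }
}
```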
I do not quite understand the benefit of "multiple independent virtual addresses, which point to the same physical address", even though I have read many books and posts.
E.g., in a similar question, Difference between physical addressing and virtual addressing concept, the post claims that programs will not crash each other, and that "in general, a particular physical page only maps to one application's virtual space".
Well, in http://tldp.org/LDP/tlk/mm/memory.html, in section "shared virtual memory", it says
"For example there could be several processes in the system running
the bash command shell. Rather than have several copies of bash, one
in each processes virtual address space, it is better to have only one
copy in physical memory and all of the processes running bash share
it."
If one physical address (e.g., the shell program) is mapped to two independent virtual addresses, how can this not crash? Wouldn't it be the same as using physical addressing?
What does virtual addressing provide that is not possible or convenient with physical addressing? Even if no virtual memory existed, i.e., two processes pointed directly to the same physical memory, I think it could still work with some coordinating mechanism. So why bother with all this "virtual addressing, MMU, virtual memory" stuff?
There are two main uses of this feature.
First, you can share memory between processes, which can then communicate via the shared pages. In fact, shared memory is one of the simplest forms of IPC.
But shared read-only pages can also be used to avoid useless duplication: most of the time, the code of a program does not change after it has been loaded into memory, so its memory pages can be shared among all the processes that are running that program. Obviously only the code is shared; the memory pages containing the stack, the heap, and in general the data (or, if you prefer, the state) of the program are not shared.
This trick is improved with "copy on write". The code of executables usually doesn't change when running, but there are programs that are actually self-modifying (they were quite common in the past, when most development was still done in assembly); to support this, the operating system does read-only sharing as explained before, but, if it detects a write to one of the shared pages, it disables the sharing for that page, creating an independent copy of it and letting the program write there.
This trick is particularly useful in situations in which there's a good chance that the data won't change, but it may happen.
Another case in which this technique is used is when a process forks: instead of copying every memory page (which would be completely wasted work if the child process immediately does an exec), the new process shares all its memory pages with the parent in copy-on-write mode, allowing quick process creation while still "faking" the "classic" fork behavior.
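A minimal sketch of that fork behavior (POSIX, assuming Linux): the child's first write to a shared page triggers the copy, so the parent never sees the child's modification.

```cpp
// Minimal sketch of copy-on-write after fork(): right after fork() parent and child
// share the same physical pages; the child's write gives it a private copy, so the
// parent still reads the old value.
#include <cstdio>
#include <sys/wait.h>
#include <unistd.h>

int counter = 100;   // one shared (copy-on-write) page holds this right after fork()

int main() {
    pid_t pid = fork();
    if (pid == 0) {                 // child
        counter = 999;              // write triggers copy-on-write: child gets its own page
        std::printf("child  sees counter = %d\n", counter);
        return 0;
    }
    wait(nullptr);                  // parent: wait for the child to finish
    std::printf("parent sees counter = %d\n", counter);   // still 100
    return 0;
}
```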
If one physical address (e.g., shell program) mapped to two independent virtual addresses
Multiple processes can be built to share a piece of memory; e.g. with one acting as a server that writes to the memory, the other as a client reading from it, or with both reading and writing. This is a very fast way of doing inter-process communication (IPC). (Other solutions, such as pipes and sockets, require copying data to the kernel and then to the other process, which shared memory skips.) But, as with any IPC solution, the programs must coordinate their reads and writes to the shared memory region by some messaging protocol.
Also, the "several processes in the system running the bash command shell" from the example will be sharing the read-only part of their address spaces, which includes the code. They can execute the same in-memory code concurrently, and won't kill each other since they can't modify it.
In the quote
in general, a particular physical page only maps to one application's virtual space
the "in general" part should really be "typically": memory pages are not shared unless you set them up to be, or unless they are read-only.
I have really tried to understand the von Neumann architecture, but there is one thing I can't understand: how can the user know whether a number in the computer's memory is a command or data?
I know there is a "stored-program concept", but I don't really understand it...
Can someone explain it to me in two sentences?
Thanks!
Put simply, the user cannot look at a memory address and determine if it is a command or data. It can be both.
It's all in the interpretation: if the program counter points to a memory address, it will be interpreted as a command. If it is referenced by a read instruction, it is data.
The point of this is flexibility. A program can write (or re-write) programs into memory, which can then be executed by setting the program counter to the start address.
Modern operating systems limit this behaviour with data execution prevention (DEP), keeping parts of memory from being interpreted as commands.
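As a hedged illustration of both points (bytes are commands only when executed, and DEP means you must ask for executable memory explicitly), here is a sketch for x86-64 Windows; the byte values are the standard encoding of "mov eax, 42; ret":

```cpp
// Hedged sketch (x86-64 Windows): the same bytes are "data" while we copy them
// around, and become "commands" the moment the instruction pointer is sent there.
// Because of DEP, we must explicitly request executable memory with VirtualAlloc.
#include <windows.h>
#include <cstdio>
#include <cstring>

int main() {
    // Machine code for: mov eax, 42 ; ret
    unsigned char code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};

    // Without PAGE_EXECUTE_* protection, jumping into this buffer would fault (DEP).
    void *mem = VirtualAlloc(nullptr, sizeof(code), MEM_COMMIT | MEM_RESERVE,
                             PAGE_EXECUTE_READWRITE);
    if (!mem) return 1;
    std::memcpy(mem, code, sizeof(code));            // here the bytes are just data

    auto fn = reinterpret_cast<int (*)()>(mem);      // now treat the same bytes as code
    std::printf("the 'data' executed and returned %d\n", fn());

    VirtualFree(mem, 0, MEM_RELEASE);
    return 0;
}
```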
The basic idea of the stored-program concept is storing data and instructions together in main memory.
NOTE: This is a vastly oversimplified answer. I intentionally left a lot of things out for the sake of making the point.
Remember that all computer memory is, for all intents and purposes on modern machines, a long list of bytes. The numbers are meaningless unless the thing that put them there has a specific purpose for them.
I could put the number 5 at address 0. It could represent the 5th instruction specified by my CPU's instruction-set manual. It could represent the number of hours of sleep I had last week. It's meaningless unless it's assigned some meaning.
So how do computers know what to actually "do" with the numbers?
It's a large combination of standards and specifications, which are documents or code that specify which data should go where, what each piece of data means, what acceptable values for the data are, etc. Such standards are (usually) agreed upon by the masses.
Standards exist everywhere. Your BIOS has specifications as to where to look for the main operating system entry point on the boot media (your hard disk, a live CD, a bootable USB stick, etc.).
From there, the operating system adheres to standards that dictate where in memory the VGA buffer exists (0xb8000 on x86 machines, for example) in order to output all of that boot up text you see when you start your machine.
So on and so forth.
A Portable Executable (Windows), an ELF image (Linux), or a Mach-O image (macOS) is just a file that follows a specification, usually mandated by the operating system manufacturer, that puts pieces of code at specific positions in the file. Then that file is simply loaded into memory, given a specific virtual address in user space, and then the operating system knows exactly where the entry point for your program is.
From there, it sets up the instruction pointer (IP) to point to the current instruction byte. On most CPUs, the current byte pointed to by the IP activates specific circuits in the CPU to perform some action.
For example, on x86 CPUs, byte 0x04 is the ADD instruction that takes the next byte (so IP + 1), reads it as an unsigned 8 bit number, and adds it to the al register. This is mandated by the x86 specification, which all x86 CPUs have agreed to implement.
That means when the IP register is pointing to a byte with the value 0x04, the CPU will perform the add and increase the IP by 2: one byte to skip the ADD instruction itself, and one to skip the "argument" (operand) to the ADD instruction.
The IP advances as fast as the CPU (and the operating system's scheduler) will allow it to - which amounts to a "running" program.
What the data mean is defined entirely by what's creating the data and what's using it. In the best of circumstances, the two parties agree, usually via a standard or specification of some sort.
I wanted to know why the code segment is common for different instances of the same program.
For example: consider P1.exe running; if another copy of P1.exe is started, the code segment will be common to both running instances. Why is that?
If the code segment in question is loaded from a DLL, it might be the operating system being clever and re-using the already loaded library. This is one of the core points of using dynamically loaded library code, it allows the code to be shared across multiple processes.
Not sure if Windows is clever enough to do this with the code sections of regular EXE files, but it would make sense if possible.
It could also be virtual memory fooling you; two processes can look like they have the same thing on the same address, but that address is virtual, so they really are just showing mappings of physical memory.
Code is typically read-only, so it would be wasteful to make multiple copies of it.
Also, Windows (at least, I can't speak for other OS's at this level) uses the paging infrastructure to page code in and out direct from the executable file, as if it were a paging file. Since you are dealing with the same executable, it is paging from the same location to the same location.
Self-modifying code is effectively no longer supported by modern operating systems. Generating new code is possible (by setting the correct flags when allocating memory) but this is separate from the original code segment.
The code segment is (supposed to be) static (does not change) so there is no reason not to use it for several instances.
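As a hedged way to see the "identical addresses are virtual, the code pages behind them are shared" point for yourself, the sketch below can be run twice at the same time; on a typical Windows setup both instances print the same values (ASLR normally picks one randomized base per image per boot session), even though each process has its own address space.

```cpp
// Hedged illustration: compile this once and run two instances side by side.
// The printed values are *virtual* addresses; the code page behind
// shared_code_example() is typically one physical page shared by both instances,
// while the writable global gets a private page per process once it is written.
#include <cstdio>

int shared_code_example() { return 42; }   // lives in the read-only code section
int mutable_global = 0;                    // lives in a writable data section

int main() {
    std::printf("code   at %p\n", reinterpret_cast<void *>(&shared_code_example));
    std::printf("global at %p\n", static_cast<void *>(&mutable_global));
    mutable_global = 1;   // writing gives this process its own copy of the data page
    std::getchar();       // keep the process alive so both instances can be compared
    return 0;
}
```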
Just to start at a basic level: segmentation is just a way to implement memory isolation and partitioning. Paging is another way to achieve this. For the most part, anything you can achieve via segmentation, you can achieve via paging. As such, most modern operating systems on x86 forgo using segmentation at all, instead relying completely on paging facilities.
Because of this, all processes will usually be running under the trivial segment of (Base = 0, Limit = 4GB, Privilege level = 3), which means the code/data segment registers play no real part in determining the physical address, and are just used to set the privilege level of the process. All processes will usually be run at the same privilege, so they should all have the same value in the segment register.
Edit
Maybe I misinterpreted the question. I thought the question author was asking why both processes have the same value in the code segment register.