I have two separate PowerPoint windows open, so how many processes are running? If more than one process is running, what do these processes have in common in their memory layout: text part, data part, heap, stack?
I'm trying to find out how the application is organized into processes.
Related
I have two programs. The first program (let's call it A) creates a huge chunk of data and saves it to disk; the second program (let's call it B) reads the data from disk and performs the data processing. The old workflow is: run program A, save the data to disk, then run program B, load the data from disk, and process it. However, this is very time-consuming, since we need two rounds of disk I/O for large data.
One trivial way to solve this problem is to simply merge the two programs. However, I do NOT want to do this (imagine that, with a single dataset, we want multiple data-processing programs running in parallel on the same node, which makes it necessary to keep the two programs separate). I was told that there is a technique called memory-mapped files, which allows multiple processes to communicate and share memory. I found some reference material at https://man7.org/linux/man-pages/man3/shm_unlink.3.html.
However, in the example shown there, the execution of the two programs (processes) overlaps, and the two processes communicate with each other in a "bouncing" fashion. In my case, I am not allowed to have such a communication pattern. For some reason I have to make sure that program B is executed only after program A has finished (a serial workflow). I just wonder whether mmap can still be used in my case. I know it seems weird, since at some point there is memory allocated by program A while no program is running (between A and B), which might lead to a memory leak, but if this optimization is possible, it would be a huge improvement. Thanks!
Memory mapped files and shared memory are two different concepts.
The former enables you to map a file into memory so that reads from the memory region read the file and writes to the memory region write into the file. This kind of operation is very useful for abstracting I/O accesses as plain memory reads/writes. It is especially useful for big-data applications (or simply to reuse code so it can operate on files directly).
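As a minimal sketch of the idea on Linux (the file name data.bin is purely illustrative, and error handling is trimmed), reads and writes through the mapping go straight to the file:

```cpp
// Minimal sketch: map an existing file and modify it through memory.
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main() {
    int fd = open("data.bin", O_RDWR);   // hypothetical file name
    if (fd == -1) return 1;

    struct stat st{};
    if (fstat(fd, &st) != 0) return 1;
    size_t len = static_cast<size_t>(st.st_size);

    // Reads/writes through this mapping are reads/writes of the file itself.
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return 1;

    auto* bytes = static_cast<unsigned char*>(p);
    if (len > 0) bytes[0] ^= 0xFF;       // toggles the first byte of the file on disk

    munmap(p, len);
    close(fd);
    return 0;
}
```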
The latter is typically used by multiple running programs to communicate with each other while being in different processes. For example, programs like the Chrome/Chromium browser use it to communicate between tabs, which are different processes (for the sake of security). It is also used in HPC for fast MPI communication between processes on the same compute node.
Linux also lets you use pipes so one process can send data to another. The pipe is closed when the process emitting the data ends. This is useful for dataflow-based processing (e.g. text filtering using grep).
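A tiny sketch of the pipe pattern, assuming a fork()-based producer/consumer (the message is illustrative only):

```cpp
// Parent streams data to a child through a pipe; the child sees EOF once
// the parent closes the write end.
#include <cstdio>
#include <cstring>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    int fds[2];
    if (pipe(fds) != 0) return 1;

    pid_t pid = fork();
    if (pid == 0) {                        // child: the "consumer"
        close(fds[1]);                     // close unused write end
        char buf[256];
        ssize_t n;
        while ((n = read(fds[0], buf, sizeof buf)) > 0)
            fwrite(buf, 1, static_cast<size_t>(n), stdout);
        close(fds[0]);
        return 0;
    }

    close(fds[0]);                         // parent: the "producer"
    const char msg[] = "some data\n";
    write(fds[1], msg, strlen(msg));
    close(fds[1]);                         // signals EOF to the child
    waitpid(pid, nullptr, 0);
    return 0;
}
```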
In your case, it seems that one process runs and then the other starts only when the first has finished. This means the data has to be stored in a file; shared memory cannot be used here. That being said, it does not mean the file has to be stored on a storage device. On Linux, for example, you can store files in RAM using RAMFS, a filesystem kept in RAM. Note that files stored in such a filesystem are not saved anywhere when the machine is shut down (accidentally or deliberately), so it should not be used for critical data unless you can be sure the machine will not crash or be shut down. RAMFS has limited space and, AFAIK, configuring such a filesystem requires root privileges.
An alternative solution is to create a mediator process (M) with one purpose: receiving data from one process and serving it to other processes. Shared memory can be used in this case, since A and B each communicate with M, and each pair of processes is alive simultaneously. A can write directly into the memory shared by M, and B can read it later. M needs to be created before A/B and finish after A/B.
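A rough sketch of that mediator layout, assuming POSIX shared memory (shm_open/mmap) on Linux; the segment name and size below are invented for illustration:

```cpp
// Mediator M keeps the segment alive; A writes into it, B later reads it.
// On older glibc versions you may need to link with -lrt.
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

constexpr const char* kShmName = "/dataset_shm";  // hypothetical name
constexpr size_t kShmSize = 1 << 20;              // hypothetical size (1 MiB)

// Run by the mediator M before A starts: create and size the segment.
void* create_segment() {
    int fd = shm_open(kShmName, O_CREAT | O_RDWR, 0600);
    if (fd == -1) return nullptr;
    if (ftruncate(fd, kShmSize) != 0) { close(fd); return nullptr; }
    void* p = mmap(nullptr, kShmSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                                    // the mapping stays valid
    return p == MAP_FAILED ? nullptr : p;
}

// Run by A (to write) or later by B (to read) while M keeps the segment alive.
void* attach_segment() {
    int fd = shm_open(kShmName, O_RDWR, 0600);
    if (fd == -1) return nullptr;
    void* p = mmap(nullptr, kShmSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? nullptr : p;
}

// Run by M once B is done: the name disappears and the memory is freed
// after the last mapping goes away.
void cleanup_segment() { shm_unlink(kShmName); }
```

M would call create_segment() at startup and cleanup_segment() at the end; A and B call attach_segment() while M is alive, so the segment never loses its last user between the two programs.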
Question
Are there any notable differences between context switching between processes running the same executable (for example, two separate instances of cat) vs processes running different executables?
Background
I already know that having the same executable means that it can be cached in the same place in memory and in any of the CPU caches that might be available, so I know that when you switch from one process to another, if they're both executing the same executable, your odds of having a cache miss are smaller (possibly zero, if the executable is small enough or they're executing in roughly the same "spot", and the kernel doesn't do anything in the meantime that could cause the relevant memory to be evicted from the cache). This of course applies "all the way down", to memory still being in RAM vs. having been paged out to swap/disk.
I'm curious if there are other considerations that I'm missing? Anything to do with virtual memory mappings, perhaps, or if there are any kernels out there which are able to somehow get more optimal performance out of context switches between two processes running the same executable binary?
Motivation
I've been thinking about the Unix philosophy of small programs that do one thing well, and how, taken to its logical conclusion, it leads to lots of small executables being forked and executed many times. (For example, 30-something runsv processes getting started up nearly simultaneously on Void Linux boot - note that runsv is only a good example during startup, because they mostly spend their time blocked waiting for events once they start their child service, so besides early boot there isn't much context switching between them happening. But we could easily imagine numerous cat or /bin/sh instances running at once or whatever.)
The context switching overhead is the same. That is usually done with a single (time consuming) instruction.
There are some more advanced operating systems (i.e. not eunuchs) that support installed shared programs. They have reduced overhead when more than one process accesses them; e.g., only one copy of the read-only data is loaded into physical memory.
TL;DR: Does it make sense to write multiple dumps for the same crash event, and if yes, what do you need to look out for?
We're using MiniDumpWriteDump to write a crash dump when there is an unhandled exception / abort / you-name-it in our application.
The code so far actually writes two dumps:
One with MiniDumpWithDataSegs, to get a small dump that can be sent even by crappy email without problems once zipped.
A full one with MiniDumpWithFullMemory, to have the full info available should we need it.
To make this work, we call MiniDumpWriteDump twice (see the sketch after the steps below):
1. Open/create the file for the small dump
2. Write the small dump
3. Open/create the file for the large dump
4. Write the large dump
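Roughly, the scheme looks like the sketch below; the file names and the WriteOneDump helper are simplified stand-ins for our real code:

```cpp
// Two-dump scheme from an unhandled-exception filter (simplified sketch).
#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")

static void WriteOneDump(const wchar_t* path, MINIDUMP_TYPE type,
                         EXCEPTION_POINTERS* ep) {
    HANDLE file = CreateFileW(path, GENERIC_WRITE, 0, nullptr,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return;

    MINIDUMP_EXCEPTION_INFORMATION mei{};
    mei.ThreadId = GetCurrentThreadId();
    mei.ExceptionPointers = ep;
    mei.ClientPointers = FALSE;

    MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), file,
                      type, ep ? &mei : nullptr, nullptr, nullptr);
    CloseHandle(file);
}

LONG WINAPI CrashFilter(EXCEPTION_POINTERS* ep) {
    WriteOneDump(L"crash_small.dmp", MiniDumpWithDataSegs, ep);   // steps 1+2
    WriteOneDump(L"crash_full.dmp", MiniDumpWithFullMemory, ep);  // steps 3+4
    return EXCEPTION_EXECUTE_HANDLER;
}
```

The filter is registered at startup with SetUnhandledExceptionFilter(CrashFilter).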
As far as I can tell, one additional idea behind this scheme was that writing the small dump is faster. It is basically always sub-second, while writing the large dump can often take quite a few seconds, especially when the application is fully loaded and the large dump easily reaches 1.2 GB or more.
The idea behind writing the small dump first, as far as I can tell, was that because it's faster, it would take a more detailed snapshot of the crashed process at the point in time it crashed, as the process is heavily multithreaded.
Obviously, the threads of the process continue to run between the end of the first call and the start of the second call to MiniDumpWriteDump, so we do see quite a few cases where the info in the small dump is actually more accurate than the info in the large dump.
After thinking about this, I would assume, however, that MiniDumpWriteDump has to suspend the threads of the process anyway in order to write the dump, so if we were to write the large dump first, the large dump would be the more accurate one rather than the small one.
Question
Should we write the large dump before the small one? Should we even be writing two dumps? Could we somehow have the system first suspend the threads of the process and then write two dumps that are completely "synchronous"?
I analyzed dumps from various customers for a couple of years; the following is only my personal perspective on your question. Hope it helps.
Should we write the large dump before the small one?
I don't consider the order important for typical issues such as crashes and hangs: the crash spot is there, the deadlock is there in the dump, whether it is captured first or second.
Should we even be writing two dumps?
I would suggest writing at least one full dump. The small dump is very convenient for getting an initial impression of what the problem is, but it is very limited, especially when your application crashes. So you might ask the customer to email you the small dump for a first round of triage, and if that does not help you find the root cause, then ask for the full dump. Technically you can strip a small dump out of a full dump; however, you may not want your customer to do this sort of work for you. So this depends on how you interact with your customer.
Could we somehow have the system first suspend the threads of the process and then write two dumps that are completely "synchronous"?
Technically this is doable. For example, it's relatively easy out-of-process: a single NtSuspendProcess() call suspends all target threads, but it has to be called from another process. If you prefer to do it in-process, you have to enumerate all threads and call SuspendThread() on each of them, which is how MiniDumpWriteDump() works. However, I don't think sync vs. async affects the accuracy of the dump.
Writing two dumps at the same time is not advisable because MiniDumpWriteDump is not thread safe.
All DbgHelp functions, such as this one, are single threaded. Therefore, calls from more than one thread to this function will likely result in unexpected behavior or memory corruption.
Whether you should write a large dump alongside a small dump depends on your application and the kind of bugs you might expect. The minidump only contains stack information; it does not contain heap memory, handle information, or recently unloaded modules.
Obtaining stack information will obviously give you a stack trace, but if the stack trace only tells you that your last action was to reference some memory on the heap, the trace isn't much use. Your expected failure modes will dictate which makes more sense. If you have legacy code that maybe isn't using RAII to manage handles, or the handling of heap-allocated memory isn't as disciplined as you'd like, then a full dump will be useful.
You should also consider the person who will submit the memory dump. If your customers are on the Internet, they might not appreciate submitting a sizable memory dump. They might also be worried about the private data that may be submitted along with a full memory dump. A minidump is much smaller, easier to submit, and less likely (though not impossible) to contain private data. If your customers are running on an internal network, then a full memory dump is more acceptable.
It is better to write a minidump first and then a large dump. This way you are more likely to get some data out quickly, rather than waiting for a full dump. A full dump can take a while, and users are often impatient; they may decide to kill the process so they can get back to work. Also, if the disk is getting full (potentially the cause of the crash), it's slightly more likely that you have room for a minidump than for a full dump.
DbgHelp.dll imports SuspendThread and ResumeThread. You can do the same thing: call SuspendThread for all threads (minus the current one, of course), call MiniDumpWriteDump as many times as you need, then call ResumeThread on each thread you suspended. This should give you consistently accurate dumps.
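A hedged sketch of that approach, using the Toolhelp snapshot API to enumerate the threads of the current process (the helper names are mine, and error handling is trimmed):

```cpp
// Suspend every other thread, write both dumps, then resume.
// Assumes in-process dumping from a dedicated watchdog thread.
// Link against dbghelp.lib.
#include <windows.h>
#include <tlhelp32.h>
#include <dbghelp.h>
#include <vector>

static std::vector<HANDLE> SuspendOtherThreads() {
    std::vector<HANDLE> suspended;
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
    if (snap == INVALID_HANDLE_VALUE) return suspended;

    THREADENTRY32 te;
    te.dwSize = sizeof(te);
    for (BOOL ok = Thread32First(snap, &te); ok; ok = Thread32Next(snap, &te)) {
        if (te.th32OwnerProcessID != GetCurrentProcessId()) continue;
        if (te.th32ThreadID == GetCurrentThreadId()) continue;   // keep ourselves running
        HANDLE h = OpenThread(THREAD_SUSPEND_RESUME, FALSE, te.th32ThreadID);
        if (h == nullptr) continue;
        if (SuspendThread(h) == (DWORD)-1) { CloseHandle(h); continue; }
        suspended.push_back(h);
    }
    CloseHandle(snap);
    return suspended;
}

static void ResumeSuspendedThreads(const std::vector<HANDLE>& threads) {
    for (HANDLE h : threads) { ResumeThread(h); CloseHandle(h); }
}

// Both dumps now describe the same frozen set of threads.
void WriteConsistentDumps(HANDLE smallFile, HANDLE fullFile) {
    std::vector<HANDLE> paused = SuspendOtherThreads();
    MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), smallFile,
                      MiniDumpWithDataSegs, nullptr, nullptr, nullptr);
    MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), fullFile,
                      MiniDumpWithFullMemory, nullptr, nullptr, nullptr);
    ResumeSuspendedThreads(paused);
}
```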
I understand that delete returns memory to the heap that was allocated from the heap, but what is the point? Computers have plenty of memory, don't they? And all of the memory is returned as soon as you "X" out of the program.
Example:
Consider a server that allocates a Packet object for each packet it receives (this is deliberately bad design, for the sake of the example).
A server, by nature, is intended to never shut down. If you never delete the thousands of Packet objects your server handles per second, your system is going to swamp its memory and crash within minutes.
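As a sketch of what that looks like (Packet and receive_packet() are hypothetical stand-ins for real networking code):

```cpp
// The point is only what happens with and without delete.
struct Packet { char payload[1500]; };

Packet* receive_packet() { return new Packet{}; }   // one allocation per packet

void leaky_server() {
    for (;;) {
        Packet* p = receive_packet();
        // ... process *p ...
        // no delete: thousands of Packets per second pile up until the
        // machine starts swapping and the process finally dies
    }
}

void correct_server() {
    for (;;) {
        Packet* p = receive_packet();
        // ... process *p ...
        delete p;   // memory goes back to the heap, ready for the next packet
    }
}
```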
Another example:
Consider a video game that allocates particles for special effects every time a new explosion is created (and never deletes them). In a game like StarCraft (or other recent ones), after a few minutes of hilarity and destruction (and hundreds of thousands of particles), the lag will be so huge that your game turns into a PowerPoint slideshow, effectively making your players unhappy.
Not all programs exit quickly.
Some applications may run for hours, days or longer. Daemons may be designed to run without cease. Programs can easily consume more memory over their lifetime than available on the machine.
In addition, not all programs run in isolation. Most need to share resources with other applications.
There are a lot of reasons why you should manage your memory usage, as well as any other computer resources you use:
What might start off as a lightweight program could soon become more complex; depending on your design, some areas of memory consumption may grow far beyond what you expected.
Remember you are sharing memory resources with other programs. Being a good neighbour allows other processes to use the memory you free up, and helps to keep the entire system stable.
You don't know how long your program might run for. Some people hibernate their session (or never shut their computer down) and might keep your program running for years.
There are many other reasons, I suggest researching on memory allocation for more details on the do's and don'ts.
I see your point that computers have lots of memory, but you are wrong. As an engineer you have to create programs that use computer resources properly.
Imagine you made a program that runs the whole time the computer is on. It sometimes creates objects/variables with "new". After some time you don't need them anymore, but you don't delete them. Such a situation occurs from time to time, and you slowly take RAM out of the available pool. After a while the user has to terminate your program and launch it again. That is not catastrophic, but it is not comfortable either; what is more, your program may take a while to load. Because of this the user feels bad about your silly decision.
Another thing: when you use "new" to create an object you call its constructor, and "delete" calls its destructor. Let's say you need to open some file and the destructor closes it, making it accessible to other processes again. In this case you would steal not only memory but also files from other processes.
If you don't want to call "delete" yourself, you can use shared pointers (they free the object automatically through reference counting).
It can be found in the standard library as std::shared_ptr. It has one disadvantage: Windows XP SP2 and older do not support it. So if you want to create something for the public, you can use Boost, which also has boost::shared_ptr. To use Boost you need to download it from here and configure your development environment to use it.
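For illustration, a minimal std::shared_ptr example (the Resource type is made up):

```cpp
// The object is destroyed automatically when the last shared_ptr that
// owns it goes away (C++11 and later).
#include <iostream>
#include <memory>

struct Resource {
    Resource()  { std::cout << "acquired\n"; }
    ~Resource() { std::cout << "released\n"; }
};

int main() {
    std::shared_ptr<Resource> a = std::make_shared<Resource>(); // count = 1
    {
        std::shared_ptr<Resource> b = a;                        // count = 2
    }                                                           // count back to 1
    return 0;                                                   // count 0 -> "released"
}
```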
I noticed that all the _EPROCESS objects are linked to each other via the ActiveProcessList link. What is the purpose of this list? What does the OS use this list of active processes for?
In Windows NT, the schedulable unit is the thread. Processes serve as a container of threads, and also as an abstraction that defines what virtual memory map is active (and some other things).
All operating systems need to keep this information available. At different times, different components of the operating system could need to search for a process that matches a specific characteristic, or would need to assess all active processes.
So, how do we store this information? Why not a gigantic array in memory? Well, how big is that array going to be? Are we comfortable limiting the number of active processes to the size of this array? What happens if we can't grow the array? Are we prepared to reserve all that memory up front to keep track of the processes? In the low process use case, isn't that a lot of wasted memory?
So we can keep them on a linked list.
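To illustrate the idea (this is a toy sketch in the spirit of LIST_ENTRY and CONTAINING_RECORD, not the real Windows structures), the link fields live inside each record, so the list grows and shrinks with the processes themselves instead of needing a preallocated array:

```cpp
// Toy intrusive doubly linked list; names are invented for this sketch.
#include <cstddef>
#include <cstdio>

struct ListEntry { ListEntry* next; ListEntry* prev; };

struct Process {
    unsigned  pid;
    ListEntry activeLinks;   // link field embedded in each process record
};

// Circular list head, analogous to a global "active process list" head.
ListEntry g_head = { &g_head, &g_head };

void insert_tail(ListEntry* head, ListEntry* e) {
    e->prev = head->prev;
    e->next = head;
    head->prev->next = e;
    head->prev = e;
}

// Recover the containing record from its embedded link (CONTAINING_RECORD idea).
Process* from_link(ListEntry* e) {
    return reinterpret_cast<Process*>(
        reinterpret_cast<char*>(e) - offsetof(Process, activeLinks));
}

int main() {
    Process p1{ 100, {} }, p2{ 200, {} };
    insert_tail(&g_head, &p1.activeLinks);
    insert_tail(&g_head, &p2.activeLinks);

    // Walking the list visits every "process", however many there are.
    for (ListEntry* e = g_head.next; e != &g_head; e = e->next)
        std::printf("pid %u\n", from_link(e)->pid);
    return 0;
}
```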
There are some occasions in NT where we care about process context but not thread context. One of those is I/O completion. When an I/O operation is handled asynchronously by the operating system, the eventual completion of that I/O could be in a process context that is different from the requesting process context. So, we need some records and information about the originating process so that we can "attach" to this process. "Attaching" to the process swaps us into the appropriate context with the appropriate user-mode memory available. We don't care about thread context, we care about process context, so this works.