Using hundreds of child processes in Windows?

I have a Windows application which uses many third-party modules of questionable reliability. My app has to create many objects from those modules, and one bad object may cause a lot of problems.
I was thinking of a multi-process scheme where the objects are created in a separate process (the interfaces are basically all the same, so creating the cross-process communication shouldn't be so difficult). At the extreme, I'm considering one object per process, so I might end up with anywhere between 20 and a few hundred processes launched from my main app.
Would that cause Windows any problems? I'm using Windows 7, 64-bit, and memory and CPU power won't be an issue.

As long as you have enough CPU power and memory, there should be no problems. Follow the general rules for distributed applications, multithreading (yes, multithreading), deadlocks, atomic operations, and so on, and everything should be fine.
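If it helps to picture the one-object-per-process setup, here is a minimal sketch (not from the original answer) of launching a batch of worker processes with CreateProcess; worker.exe, the --object argument, and the actual IPC layer are all placeholders you would replace with your own:

    // Hypothetical sketch: one worker process per object on Windows.
    // "worker.exe" and "--object" are made-up names; real IPC (pipes,
    // COM, sockets, ...) is omitted.
    #include <windows.h>
    #include <string>
    #include <vector>

    struct Worker {
        PROCESS_INFORMATION pi{};
    };

    static bool LaunchWorker(const std::wstring& objectId, Worker& out) {
        std::wstring cmd = L"worker.exe --object " + objectId;
        // CreateProcessW may modify the command-line buffer, so pass a writable copy.
        std::vector<wchar_t> cmdBuf(cmd.begin(), cmd.end());
        cmdBuf.push_back(L'\0');

        STARTUPINFOW si{};
        si.cb = sizeof(si);

        return CreateProcessW(nullptr, cmdBuf.data(), nullptr, nullptr,
                              FALSE, CREATE_NO_WINDOW, nullptr, nullptr,
                              &si, &out.pi) != 0;
    }

    int main() {
        std::vector<Worker> workers;
        for (int i = 0; i < 200; ++i) {          // a few hundred is fine resource-wise
            Worker w;
            if (LaunchWorker(std::to_wstring(i), w))
                workers.push_back(w);
        }
        // ... talk to the workers over your IPC mechanism of choice ...
        for (auto& w : workers) {                // shut down and release handles
            TerminateProcess(w.pi.hProcess, 0);  // a real app would ask them to exit cleanly
            CloseHandle(w.pi.hThread);
            CloseHandle(w.pi.hProcess);
        }
    }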

Related

How to randomize address space at runtime for benchmarking purposes

I'm looking for a mechanism like ASLR for Linux in order to benchmark a distributed application while accounting for incidental layout changes. For background and motivation, see the Stabilizer paper.
The goal is to recreate the behavior of Stabilizer, but in a distributed environment with complex deployment. (As far as I can tell, that project is no longer maintained and never made it past a prototype phase.) In particular, the randomization should take place (repeatedly) at runtime and without needing to invoke the program through a special binary or debugger. On the other hand, I assume full access to source code and the ability to arbitrarily change/recompile the system under test.
By "complex deployment", I mean that there may be different binaries running on different machines, possibly written in different languages. Think Java programs calling into JNI libraries (or other languages using FFI), "main" binaries communicating with sidecar services, etc. The key in all of these cases is that the native code (the target of randomization) is not manually invoked, but is somehow embedded by another program.
I'm only concerned about the randomization aspect (i.e., assume that metrics collection/reporting is handled externally). It's fine if the solution is system-specific (e.g., only runs on Linux or with C++ libraries), but ideally it would be a general pattern that can be applied "anywhere", regardless of the compiler/toolchain/OS.
Side note: layout issues are less of a concern on larger systems thanks to the extra sources of random noise (network, CPU temperatures/throttling, IPC overheads, etc). However, in many cases, distributed applications are deployed on "identical" machines with uniform environments, so there's still plenty of room for correlated performance impacts. The goal is just to debias the process for decision making.
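For what it's worth, here is a very rough sketch of the crudest form of what I mean by runtime layout perturbation (my own illustration, not Stabilizer and not a full solution): leak a random-sized heap allocation and shift the stack by a random offset at startup, assuming Linux/glibc for alloca. It randomizes only once per run and does not touch code layout or re-randomize repeatedly:

    // Crude illustration only: shift heap and stack offsets once at startup.
    // Stabilizer-style randomization would re-randomize repeatedly and also
    // cover code layout.
    #include <alloca.h>   // Linux/glibc assumption
    #include <cstdlib>
    #include <random>

    static void* g_heap_padding = nullptr;   // leaked on purpose

    static int run_real_main(int argc, char** argv) {
        // ... the original program entry point would go here ...
        (void)argc; (void)argv;
        return 0;
    }

    int main(int argc, char** argv) {
        std::random_device rd;
        std::mt19937 gen(rd());
        std::uniform_int_distribution<size_t> dist(0, 1 << 20);   // up to 1 MiB

        g_heap_padding = std::malloc(dist(gen) + 1);               // shift subsequent heap layout
        volatile char* pad = static_cast<char*>(alloca(dist(gen) % 4096 + 16));
        pad[0] = 0;                                                // shift the stack frames below

        return run_real_main(argc, argv);
    }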

Stress testing concurrency: how to slow down thread execution in my application?

I have hit situations where certain race conditions only occur when I run my application in the valgrind emulator; if I run the executable as a normal program, these issues do not occur.
My theory is that the emulator slows down execution so much that threads have larger common time frames in which they execute and operate on shared data structures, thereby increasing the chance of hitting race conditions on these shared resources.
I would like to have more fine-grained control over, e.g., a virtual clock frequency by means of a specialized emulator.
Does anyone know of an existing tool for this job? I have tried searching online; there should be some academic paper dealing with this, but so far I haven't found one.
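As a crude illustration of the theory above (my own sketch, not an existing tool): one way to widen race windows without an emulator is to inject random delays around accesses to shared state in debug builds. race_fuzz_delay and shared_queue below are made-up names:

    // Inject random jitter where threads touch shared data, so interleavings
    // that are normally rare become much more likely.
    #include <chrono>
    #include <random>
    #include <thread>

    inline void race_fuzz_delay() {
        thread_local std::mt19937 gen{std::random_device{}()};
        std::uniform_int_distribution<int> us(0, 500);   // up to 0.5 ms of jitter
        std::this_thread::sleep_for(std::chrono::microseconds(us(gen)));
    }

    // Usage: sprinkle race_fuzz_delay() before and after accesses to shared
    // structures in debug builds, e.g.:
    //
    //   race_fuzz_delay();
    //   shared_queue.push(item);   // hypothetical shared structure
    //   race_fuzz_delay();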

Does process termination automatically free all memory used? Any reason to do it explicitly?

In Windows NT and later I assume that when a process expires, either because it terminated itself or was forcefully terminated, the OS automatically reclaims all the memory used by the process. Are there any situations in which this is not true? Is there any reason to free all the memory used by a user-mode application explicitly?
Whenever a process ends, all memory pages mapped to it are returned to the available state. This could qualify as "reclaiming the memory," as you say. However, it does not do things such as running destructors (if you are using C++).
I highly recommend freeing all memory, not from a resources perspective, but from a development perspective. Trying to free up memory encourages you to think about the lifespan of memory, and helps you make sure you actually clean up properly.
This does not matter in the short run, but I have dealt with countless software programs which assumed that they owned the process and so didn't have to clean up after themselves. However, there are a lot of reasons to want to run a program in a sandbox: many randomized testing scenarios can run much faster if they don't have to recreate the process every time. I've also had several programs which were thought to be standalone, only to later be integrated into a larger software package. At that point, we found out all the shortcuts that had been taken with memory management.
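As a small demonstration of the destructor point above (my example, not the answerer's): a local object's destructor is skipped when the process exits abruptly, even though the OS still reclaims the pages.

    // The OS reclaims the memory either way, but the destructor's side
    // effects (flushing files, releasing named OS objects, ...) never happen
    // on an abrupt exit.
    #include <cstdio>
    #include <cstdlib>

    struct Logger {
        ~Logger() { std::puts("flushing log..."); }   // skipped on abrupt exit
    };

    int main() {
        Logger log;
        // std::exit() does not unwind local stack frames, so ~Logger() is not
        // called; returning from main normally would call it.
        std::exit(0);
    }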

Analogs of Intel's Cluster OpenMP

Are there analogs of Intel Cluster OpenMP? This library simulates a shared-memory machine (like an SMP or NUMA system) while running on a distributed-memory machine (like an Ethernet-connected cluster of PCs).
It allows OpenMP programs to be started directly on a cluster.
I am searching for:
libraries which allow multithreaded programs to run on a distributed cluster,
or libraries (replacements for e.g. libgomp) which allow OpenMP programs to run on a distributed cluster,
or compilers capable of generating cluster code from OpenMP programs, besides Intel C++.
The keyword you want to be searching for is "distributed shared memory"; there's a Wikipedia page on the subject. MOSIX, which became openMOSIX, which is now being developed as part of LinuxPMI, is the closest thing I'm aware of; but I don't have much experience with the current LinuxPMI project.
One thing you need to be aware of is that none of these systems works especially well, performance-wise. (Maybe a more optimistic way of saying it is that it's a tribute to the developers that these things work at all.) You can't just abstract away the fact that accessing on-node memory is very, very different from accessing memory on some other node over a network. Even making local memory systems fast is difficult and requires a lot of hardware; you can't just hope that a little bit of software will hide the fact that you're now doing things over a network.
The performance ramifications are especially important when you consider that the OpenMP programs you might want to run are almost always going to be written assuming that memory accesses are local and thus cheap, because, well, that's what OpenMP is for. False sharing is bad enough when you're talking about different sockets accessing a common cache line; page-based false sharing across a network is just disastrous.
Now, it could well be that you have a very simple program with very little actual shared state, and a distributed shared memory system wouldn't be so bad -- but in that case I've got to think you'd be better off in the long run just migrating the problem away from a shared-memory-based model like OpenMP towards something that'll work better in a cluster environment anyway.
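To make the false-sharing point concrete, here is a minimal OpenMP illustration (mine, not the answerer's): each thread increments its own counter, yet the counters sit in one cache line, so under page-based distributed shared memory every increment would ping-pong a whole page across the network.

    // Compile with OpenMP enabled (e.g. -fopenmp). The counters are logically
    // independent but physically share a cache line (or, under page-based
    // DSM, a whole page), so every update invalidates the other threads' copies.
    #include <omp.h>
    #include <cstdio>

    int main() {
        long counters[8] = {0};            // adjacent: all in one cache line/page
        #pragma omp parallel num_threads(8)
        {
            int id = omp_get_thread_num();
            for (long i = 0; i < 100000000; ++i)
                counters[id]++;            // independent work, shared storage
        }
        std::printf("%ld\n", counters[0]);
        return 0;
    }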

How do I limit RAM to test low memory situations?

I'm trying to reproduce a bug that seems to appear when a user is using up a bunch of RAM. What's the best way to either limit the available RAM the computer can use, or fill most of it up? I'd prefer to do this without physically removing memory and without running a bunch of arbitrary, memory-intensive programs (e.g., Photoshop, Quake, etc.).
Use a virtual machine and set resource limits on it to emulate the conditions that you want.
VMware is one of the leaders in this area, and they have a free VMware Player that lets you do this.
I'm copying my answer from a similar question:
If you are testing a native/unmanaged/C++ application, you can use AppVerifier and its Low Resource Simulation setting, which uses fault injection to simulate errors in memory allocations (among many other things). It's also really useful for finding a ton of other subtle problems that often lead to application crashes.
You can also use consume.exe, which is part of the Microsoft Windows SDK for Windows 7 and .NET Framework 3.5 Service Pack 1, to easily use up a lot of memory, disk space, CPU time, page file, or kernel pool and see how your application handles the lack of available resources. (Does it crash? How is performance affected? etc.)
Use either a job object or ulimit(1).
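For the job-object route, here is a hedged sketch (my own, with an arbitrary 256 MB example limit) of capping the committed memory of the current process on Windows:

    // Cap this process's committed memory with a job object; allocations
    // that push it past the limit will fail, which is the low-memory
    // condition you want to test.
    #include <windows.h>
    #include <cstdio>

    int main() {
        HANDLE job = CreateJobObjectW(nullptr, nullptr);
        if (!job) return 1;

        JOBOBJECT_EXTENDED_LIMIT_INFORMATION limits{};
        limits.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_PROCESS_MEMORY;
        limits.ProcessMemoryLimit = 256 * 1024 * 1024;   // example: 256 MB commit limit

        if (!SetInformationJobObject(job, JobObjectExtendedLimitInformation,
                                     &limits, sizeof(limits)) ||
            !AssignProcessToJobObject(job, GetCurrentProcess())) {
            std::puts("failed to apply job limit");
            return 1;
        }
        // ... run the memory-hungry code path under test here ...
        return 0;
    }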
Create a virtual machine and set its RAM to what you need.
The one I use is VirtualBox from Sun.
http://www.virtualbox.org/
It is easy to set up.
If you are developing in Java, you can set the memory limits for the JVM at startup (e.g., with the -Xmx option).
