How can I find an app that needs parallelized systems?

I have an assignment for school to adapt an existing app to use parallel systems. Can someone give me an idea of what I should write about?

An easy option, if you are new to parallel systems, would be to choose something that is "trivially parallelisable" or "embarrassingly parallelisable", where no task depends on any other. Try Googling those terms.
Examples might be:
ping 200 IP addresses in parallel - hint: use GNU Parallel
resize 100,000 images to 50% of their original width and height - hint: use ImageMagick and GNU Parallel
grab 100 webpages in parallel
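The last of these is about as small as a parallel program gets. Here is a minimal sketch in Python rather than GNU Parallel (the URLs are made-up placeholders), just to show how little is needed when every task is fully independent:

```python
# Fetch many webpages in parallel; each download is independent of the
# others, which is exactly what "embarrassingly parallel" means.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

urls = [f"https://example.com/page/{i}" for i in range(100)]  # placeholder URLs

def fetch(url):
    with urlopen(url, timeout=10) as resp:
        return url, len(resp.read())  # page size as a simple result

with ThreadPoolExecutor(max_workers=20) as pool:
    for url, size in pool.map(fetch, urls):
        print(f"{url}: {size} bytes")
```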

Related

OpenMDAO: Use Parallel Groups to run the same computations in parallel, to make the optimization faster

I am writing this post in the hope of understanding how to set up parallel computations using OpenMDAO. I have been told that OpenMDAO has an option called Parallel Groups (https://openmdao.org/newdocs/versions/latest/features/core_features/working_with_groups/parallel_group.html) and I am wondering if this option could help me make a gradient-free optimizer able to run in parallel the computations of the function that it has to study.
Do you know if I can create 2 or 3 instances of the function that I am trying to optimize, so that OpenMDAO can run the instances of the function with different chosen inputs, in order to find the optimal result in less time than if it had to work with only one function instance?
I saw that this thread was close to what I am trying to do: parallelize openmdao optimization with different initial guesses. I think it could have brought me some answers, but it appears that the link proposed as an answer is no longer available.
Many thanks in advance for your help
To start, you'll need to install MPI, mpi4py, PETSc, and petsc4py. These can be installed without too much effort on Linux. They are a little harder on OS X, and VERY hard on Windows.
You can use parallel groups to run multiple components in parallel. Whether or not you can make use of that for a gradient-free method, though, is a trickier question. Unfortunately, as of V3.17, none of the current gradient-free drivers are set up to work that way.
You could very probably make it work, but it will require some development on your part. You'll need to find a way to map the "generation data" (using that GA term as a generic reference to the set of parallel cases you can run at once for a gradient-free method) down into the model. That will almost certainly involve setting up a for loop outside the normal OpenMDAO run method.
You would set up the model with n instances in parallel, where n is equal to the size of a generation. Then write your own code around a call to run_model that would map the gradient-free data down into that model to run the cases all at once.
I am essentially proposing that you forgo the driver API and write your own execution code around OpenMDAO. This modeling approach was prototyped in the 2020 Reverse Hackathon, where we discussed how the driver API is not strictly necessary.
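To make that concrete, here is a rough sketch of the shape such code might take. The component, the generation size N, the generation count, and the propose_candidates helper are all hypothetical stand-ins for your own function and gradient-free logic; you would run this under mpirun with as many processes as parallel instances:

```python
import openmdao.api as om

N = 4  # generation size = number of parallel instances (hypothetical)

class MyFunc(om.ExplicitComponent):
    """Hypothetical stand-in for the function you are trying to optimize."""
    def setup(self):
        self.add_input('x', 0.0)
        self.add_output('f', 0.0)

    def compute(self, inputs, outputs):
        outputs['f'] = (inputs['x'] - 3.0) ** 2  # toy objective

prob = om.Problem()
par = prob.model.add_subsystem('par', om.ParallelGroup())
for i in range(N):
    par.add_subsystem(f'case{i}', MyFunc())
prob.setup()

# Your own execution loop, in place of the driver API:
for generation in range(50):
    candidates = propose_candidates(N)  # hypothetical gradient-free logic
    for i, x in enumerate(candidates):
        prob.set_val(f'par.case{i}.x', x)
    prob.run_model()  # the N cases evaluate concurrently under MPI
    scores = [float(prob.get_val(f'par.case{i}.f', get_remote=True))
              for i in range(N)]
    # ...feed scores back into your optimizer to build the next generation...
```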

Most effective method to use parallel computing on different architectures

I am planning to write something to take advantage of the many devices that I have at home.
Basically my aim is to use the laptop to execute calculations, and also to use my main desktop computer to add more power (and finish the task quicker). I work with cellular simulation and chemical interactions, so it would be great for me to take advantage of all that I have available at home.
I am mainly using OS X, so I need something that works with that OS. I can code in Objective-C, C and C++.
I am aware of GCD, OpenCL and MPI, but I am not sure which way to go.
I was planning not to use the full power of my desktop but only some of the available cores (that way I can continue to do other, less resource-intensive tasks on the desktop). In particular I would love to use the graphics card's power (it is an ATI card, so no CUDA), since all I mainly do is spreadsheets, word processing and coding with Xcode, and the graphics card's resources are basically unused in that scenario.
Is there a specific set of libraries or APIs, among the aforementioned three, that would allow me to selectively route tasks and use resources on another machine without leaving control totally to the compiler? I've heard that GCD is great but gives very limited control over where the blocks are executed, while MPI is on the other side of the spectrum... OpenCL seems to be in the middle.
Before diving into one of these technologies I would like to know which one would most likely suit my needs; I am sure that some other researcher has already used parallel computing successfully to achieve what I am trying to achieve.
Thanks in advance.
MPI is more for large-scale scientific computing (many processors, many nodes, etc.), not for a weekend project. For what you describe, I would suggest using OpenCL, or one of the more distributed message-queuing frameworks such as ZeroMQ or RabbitMQ, or a combination of OpenCL and a message queue. Even simpler, consider multithreading; I would suggest OpenMP for that. I'm not sure if you are looking for direct solvers or parallel functions, but many exist for both GPUs and CPUs, and you can find them on the web.
Sorry, but this question simply cannot be meaningfully answered as posed. To be sure, I could toss out a collection of buzzwords describing various technologies to look at like GCD, OpenMPI, OpenCL, CUDA and any number of other technologies which allow one to run a single program on multiple cores, multiple programs on different cooperating computers, or a single program distributed across CPU and GPU, and it sounds like you know about a number of those already so I wouldn't even be adding much value in listing the buzzwords.
To simply toss out such terms without knowing the full specifics of the problem you're trying to solve, however, is a bit like saying that you know English, French and a little German so sure, by all means - mix them all together in a single paragraph without knowing anything about the target audience! Similarly, you can parallelize a given computation in any number of ways, across any number of different processing elements, but whether that parallelization is actually a win or not is going to be entirely dependent on the nature of the algorithm, its data dependencies, how much computation is expected for each reasonable "work chunk", and whether it can be executed on a GPU with sufficient numeric precision, among many other factors. The more complex the technology you choose, the more those factors matter and the greater the possibility that the resulting code will actually be slower than its single-threaded, single machine counterpart. IPC overhead and data copying can, and frequently do, swamp all of the gains one might realize from trying to naively parallelize something and then add additional overhead on top of that, resulting in a net loss. This is why engineers who can do this kind of work meaningfully and well are in such high demand. :)
Without knowing anything about your calculations, I would move in baby steps. First try a simple multi-processor framework like GCD (which is already built into OS X and requires no additional dependencies to use) and figure out how to factor your code such that it can effectively use all of the available cores on a single machine. Once you've learned where the wins are (and if there even are any - if multi-threading isn't helping, multi-machine parallelization almost certainly won't either), try setting up several instances of the calculation on several machines with a simple IPC model that allows for distributing the work. Having already factored your algorithm(s) for multiple threads, it should be comparatively straightforward to further generalize the approach across multiple machines (though it bears noting that the two are NOT the same problem, and either way you still want to use all the cores available on any of the given target machines, so the two challenges are both complementary and orthogonal).
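As an illustration of that first baby step (using several cores on one machine), here is a minimal sketch using Python's multiprocessing rather than GCD, just to show the shape of the refactoring; simulate_cell is a hypothetical stand-in for one independent chunk of your simulation:

```python
from multiprocessing import Pool

def simulate_cell(params):
    # Hypothetical stand-in for one independent chunk of the simulation.
    concentration, rate = params
    for _ in range(1_000_000):  # pretend this is expensive chemistry
        concentration += rate * (1.0 - concentration) * 1e-6
    return concentration

if __name__ == '__main__':
    parameter_sets = [(0.1 * i, 0.5) for i in range(32)]
    # Cap the worker count below the core count so the desktop stays usable.
    with Pool(processes=4) as pool:
        results = pool.map(simulate_cell, parameter_sets)
    print(results[:3])
```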

Blackbox Performance Testing Snort

I am looking for several ways to test Snort and compare its overall speed when changing specific variables, such as how many rules are run. What I am interested in is the best practice for obtaining the raw data I am looking for. Do I simply use the time function in any standard Linux distro, or are there specific programs capable of performing multiple tests and correlating the data as specified by my input?
To expand on this question: are there more general steps I can take to perform black-box performance testing on a variety of other programs?
How do I obtain the raw data to begin with?
Well, the closest thing I could find was gprof. If anyone has any other suggestions, that would be great.
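For the raw numbers themselves, you often don't need much more than the time-style approach from the question, repeated enough times to smooth out noise. Here is a minimal harness sketched in Python; the Snort command line and pcap path are placeholders for whatever configuration you are varying:

```python
import statistics
import subprocess
import time

# Placeholder command: replay a capture file through Snort with a given config.
CMD = ["snort", "-q", "-c", "/etc/snort/snort.conf", "-r", "capture.pcap"]

def run_once():
    """Run the command once and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(CMD, check=True, capture_output=True)
    return time.perf_counter() - start

samples = [run_once() for _ in range(10)]  # repeat to get a usable sample
print(f"mean = {statistics.mean(samples):.3f}s, "
      f"stdev = {statistics.stdev(samples):.3f}s")
```

Rerun the harness for each rule set you want to compare and the mean/stdev pairs give you directly comparable raw data.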

How to speed up MATLAB code?

As great as MATLAB is as a mathematical language, its speed is not as fast as one would like it to be. I am wondering what the general practices are to speed up running MATLAB code. For example, I know that if, instead of running for loops, one does computations in vector/matrix form, one will see a speedup in running the code.
I am wondering what are other suggestions.
Here are a few basic performance tips:
Learn to use the profiler to understand which parts of your computation are slow
Limit the number of expensive function calls via vectorization
Preallocate arrays instead of growing them in loops
Use multithreaded functions (such as bsxfun)
Use the latest version of MATLAB - there have been tremendous performance gains over the last 5 years
Use the Parallel Computing Toolbox for multicore and/or GPU processing
Use efficient algorithms
Use Java or C/C++ code where appropriate (though the speed-up can be disappointing)
If you're doing a lot of easily parallelizable operations, you can use parfor to parallelize your for loops: http://www.mathworks.com/help/toolbox/distcomp/parfor.html
Installing Lightspeed.
I have recently been through the frustrating process of installing Tom Minka's Lightspeed on my Mac. Along the way I learnt a few hard lessons worth sharing with other Mac users.
My system has the following specifications
OS X version 10.8.5
Xcode version 4.6.3
Matlab version 2011a
1) Make sure that Lightspeed is installed on a path with NO spaces in its name. I made the mistake of putting it inside "Library/Application Support/Matlab", which caused me endless trouble. In particular, it led to the same issue reported by Tomer Levinboim (levinboim.blogspot.co.nz), with the added problem that his fixes did not fully resolve it!
2) Read Michel Valstar's notes "Compiling Matlab Mex Files on a Mac" and install the recommended patch from Mathworks (http://www.mathworks.com/matlabcentral/answers/94092). This patch applies all the needed flag/option changes that Levinboim identifies.
3) Change the line options.COMPFLAGS in the install_lightspeed.m file inside the lightspeed folder to point to:
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk
4) In Matlab, check that the current path points at the Lightspeed folder. Run the command install_lightspeed. If successful, run test_lightspeed. You should now have a working version of Lightspeed!
5) Path settings persist between sessions so the startup.sh approach suggested in the Read Me appears to be unnecessary on a Mac. However, if you wish to go down that track, first read:
Where is startup.m supposed to be?
http://obasic.net/set-your-customized-startup-file-for-matlab .
You might begin by reviewing some ways to think about vectorization here.
After that, the PDF given here, even though incomplete, provides many Matlab idioms that give good performance.
I just found this: Writing Fast MATLAB Code by Pascal Getreuer, and this: the Lightspeed Toolbox. Great stuff...

Parallel STL algorithms in OS X

I'm working on converting an existing program to take advantage of some of the parallel functionality of the STL.
Specifically, I've re-written a big loop to work with std::accumulate. It runs nicely.
Now, I want to have that accumulate operation run in parallel.
The documentation I've seen for GCC outlines two specific steps.
Include the compiler flag -D_GLIBCXX_PARALLEL
Possibly add the header <parallel/algorithm>
Adding the compiler flag doesn't seem to change anything. The execution time is the same, and I don't see any indication of multiple core usage when monitoring the system.
I get an error when adding the parallel/algorithm header. I thought it would be included with the latest version of gcc (4.7).
So, a few questions:
Is there some way to definitively determine if code is actually running in parallel?
Is there a "best practices" way of doing this on OS X? (Ideal compiler flags, header, etc?)
Any and all suggestions are welcome.
Thanks!
See http://threadingbuildingblocks.org/
If you only ever parallelize STL algorithms, you are generally going to be disappointed in the results. Those algorithms generally only begin to show a scalability advantage when working over very large datasets (e.g. N > 10 million).
TBB (and others like it) work at a higher level, focusing on the overall algorithm design, not just the leaf functions (like std::accumulate()).
A second alternative is to use OpenMP, which is supported by both GCC and Clang; it is not STL by any means, but it is cross-platform.
A third alternative is to use Grand Central Dispatch - the official multicore API in OS X - again, hardly STL.
A fourth alternative is to wait for C++17, which will have a Parallelism module.
