In the lab we have a piece of software that is used for MRI analysis, which involves a lot of data crunching. Is there a way to redistribute the load generated by the program across multiple computers/GPUs without editing the program itself?
I'm trying to research how to implement the memory management system in a DBMS. I know the theoretical part, but can't find anything about bypassing the memory management system in Linux, Windows or macOS.
I'm trying to implement an efficient DBMS myself for fun, which should be able to run on any common OS. In Database Systems: The Complete Book, they discuss the importance of fitting the relations/tables into adjacent sectors/blocks on the hardware for faster reads/writes. But I can't seem to find anything when crawling the web. Is there a way to bypass the memory management system so that I can write to specific locations on disk, such that the data saved there is placed in contiguous blocks and the file system can still recognize the data as a file?
I know that this is not a general coding question.
I'm used to writing in C, C#, F#, ML, Python and a little GNU assembly.
For clarification, I do not mean which DBMS should I use; I'm trying to implement the DBMS myself.
I was told this belongs in programming, not in the Signal Processing sub-exchange.
Is there a way to implement spectrum analysis (specifically FFT) on live audio in Windows?
I want to be able to read in audio from a microphone, and have a display on the screen that shows the resultant Fourier Transform.
It would also be useful if I could execute a program if a certain set of FFT characteristics occurred.
Thanks guys!
To do this you have a couple of options, depending mainly on your preferred language/framework. I'm not sure how new you are to signal processing, so I'll suggest a few options.
Visual Programming
These are all visual programming environments which don't actually require writing any code; however, Simulink and Pure Data both require a runtime for the user to run the program.
Simulink (Paid)
MathWorks/Matlab's visual programming tool, which works really well in real-time (in my opinion). Using the Audio System Toolbox, you can easily capture microphone input from your system in realtime, carry out the FFT processing, plot the spectrum and, like you said, carry out some further processing if certain FFT conditions are met.
This isn't free software and requires having the Matlab/Simulink run-time installed to be used. You can also script your processing in Matlab's .m language as desired (a cross between Java, JS and C).
Max MSP (Paid)
Similar to Simulink, but developed as a standalone visual programming tool. This will allow you similar freedom to Simulink, but I think it will be easier for re-distribution.
You can compile MAX MSP into executables to give to someone straight away. Here is a reference to get you started on using the FFT in MAX. Again, this isn't free but if you wanted to learn more about it then I think it's worth the money (if I recall it's not too expensive).
If you need more custom processing than the built-in modules provide, I believe you can design custom MAX modules using C or JavaScript. Max is designed to easily get system audio inputs / outputs, and here's a link to get you started.
Bonus: You can design your own Ableton Live plugins with the Max4Live addon which just lets your MAX MSP projects get compiled into .VST format. So you can build custom FX if you are into music production.
Pure Data (PD) (Free)
A very bland open source version of MAX MSP but completely free. It may look dull at first, but a lot of researchers I know use it to build fairly complex systems that can do some serious data processing. There are also lots of community-built extras for PD if you ever need a custom module. Here is a link to get you started on the FFT in PD. You cannot compile applications with PD, but since it's completely free to install, anyone can run your program after installing PD. Another link for troubleshooting audio I/O in PD (if it isn't working right out of the box).
Programming Languages
Now, the visual stuff is a really good way to get started if you aren't already introduced to DSP or audio programming. Otherwise, here are just a few options and links to get started with what I would recommend.
Matlab & Octave
Like before, the Audio Systems Toolbox supports realtime audio I/O within a Matlab script. This, combined with Matlab's built-in FFT function, can have you set up programming realtime FFTs and plotting the response in no time at all (less than 10 lines of code or something).
Octave has its own version of the FFT function and different backends for rendering plot responses, but no Audio Systems Toolbox. However, Playrec is an open-source alternative for audio I/O in Matlab/Octave that supports realtime audio input and output.
(Octave is an open-source equivalent to Matlab, which requires a paid license to develop a program, but it does not support all Matlab features.)
Python
Thanks to the PyAudio module, realtime audio I/O and DSP are possible in Python! I would definitely recommend Python if you are just starting out, since it's a nice introduction to programming and can help with teaching the fundamentals of DSP before attempting lower-level languages.
Here's where you can get started with real-time non-blocking audio I/O in Python with PyAudio. To plot your data you can use a library such as matplotlib (designed to be similar to Matlab's easy plotting functionality).
For your FFT there are multiple libraries out there but I'd start with the Scipy / Numpy one.
C
One of the classic (and sometimes most daunting) programming languages. With no objects (unless you want to make them yourself) or other high-level abstractions, C is one of the few languages that still feels like you're building a lot from the ground up (which personally I like).
To get started with audio, I'd look at what is, in my opinion, the most widely used cross-platform audio I/O library: PortAudio. This will let you access the soundcard's data inputs and outputs in realtime on Mac, Linux & Windows.
Once you get this up and running, an FFT library I would use to get started is KissFFT, just because of its sheer simplicity to use. If you want to plot the data, I would maybe look at gnuplot, but this isn't a very pretty route in terms of development.
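To make that concrete, here is a rough sketch of how PortAudio and KissFFT fit together. It's illustrative only: error handling is omitted, printing from the audio callback is something you'd avoid in a real program, and the block size / sample rate are just example choices.

/* Sketch: capture mic input with PortAudio and push each block through KissFFT. */
#include <stdio.h>
#include <math.h>
#include <portaudio.h>
#include "kiss_fft.h"

#define SAMPLE_RATE 44100
#define FRAMES      1024      /* block size == FFT size, for simplicity */

static int on_audio(const void *input, void *output, unsigned long frameCount,
                    const PaStreamCallbackTimeInfo *timeInfo,
                    PaStreamCallbackFlags statusFlags, void *userData)
{
    const float *in = (const float *)input;
    kiss_fft_cfg cfg = (kiss_fft_cfg)userData;
    kiss_fft_cpx fin[FRAMES], fout[FRAMES];

    for (unsigned long i = 0; i < FRAMES; i++) {
        fin[i].r = (in && i < frameCount) ? in[i] : 0.0f;  /* real samples */
        fin[i].i = 0.0f;                                   /* no imaginary part */
    }
    kiss_fft(cfg, fin, fout);

    /* Magnitude of one low bin, just to show where the spectrum ends up;
       a real program would hand the whole spectrum to a plotting thread. */
    printf("|bin 1| = %f\n",
           sqrtf(fout[1].r * fout[1].r + fout[1].i * fout[1].i));
    return paContinue;
}

int main(void)
{
    kiss_fft_cfg cfg = kiss_fft_alloc(FRAMES, 0 /* forward */, NULL, NULL);
    PaStream *stream;

    Pa_Initialize();
    Pa_OpenDefaultStream(&stream, 1 /* mono in */, 0 /* no output */,
                         paFloat32, SAMPLE_RATE, FRAMES, on_audio, cfg);
    Pa_StartStream(stream);
    Pa_Sleep(5000);                     /* listen for ~5 seconds */
    Pa_StopStream(stream);
    Pa_CloseStream(stream);
    Pa_Terminate();
    kiss_fft_free(cfg);
    return 0;
}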
If you are very new to programming I would not recommend this unless you really want to get stuck in.
C++
Both KissFFT and portaudio will also compile with C++ code, but here are a couple of higher level alternatives.
One of my favourites is the JUCE framework / development environment. It has built-in cross-platform audio I/O and already has a custom FFT function as part of the framework. You can build custom VSTs for your music DAW if you want to as well. It also comes with 'easy' (if you know C++) access to graphics windows with higher-level access to OpenGL, so you could get fancy when plotting your data in real-time. If I remember correctly, one of the demo projects on first installation is a real-time FFT plot you can compile and see the input from your laptop mic. JUCE is free for personal use, but comes with a small license fee as an indie developer.
Otherwise, another one that comes to mind is the Qt C++ library/framework for UI design (mainly). This is a cross-platform, easy-to-use GUI designer that also has high-level classes for obtaining audio input from a Mac/Win/Linux mic. Here is just one example I came across using Qt's multimedia classes and FFTReal to plot a realtime FFT spectrum.
Summary
I've suggested a lot of options, but I've also missed out a few that other people may recommend, like R, C#, Java, Rust, etc. There are so many suggestions that it's impossible to cover them all, but I think this should be enough to get started. If it were me, in terms of experience:
Complete Beginner to Programming: Max MSP
Novice / Knows their way around a little bit: Python (with PyAudio)
Programmed in other languages, maybe looking to gain more programming skills: C++ with JUCE
Any of these languages you pick will be a good reference for future software positions, and many companies / researchers use them to prototype / develop realtime audio processing software.
This is just my opinion but hopefully this gets you well along your way!
I want to know if it would be possible to run an OpenMP program on multiple hosts. So far I only heard of programs that can be executed on multiple thread but all within the same physical computer. Is it possible to execute a program on two (or more) clients? I don't want to use MPI.
Yes, it is possible to run OpenMP programs on a distributed system, but I doubt it is within the reach of every user around. ScaleMP offers vSMP - an expensive commercial hypervisor software that allows one to create a virtual NUMA machine on top of many networked hosts, then run a regular OS (Linux or Windows) inside this VM. It requires a fast network interconnect (e.g. InfiniBand) and dedicated hosts (since it runs as a hypervisor beneath the normal OS). We have an operational vSMP cluster here and it runs unmodified OpenMP applications, but performance is strongly dependent on data hierarchy and access patterns.
NICTA used to develop a similar SSI hypervisor named vNUMA, but development has also stopped. Besides, their solution was IA64-specific (IA64 is Intel Itanium, not to be confused with Intel 64, which is their current generation of x86 CPUs).
Intel used to develop Cluster OpenMP (ClOMP; not to be confused with the similarly named project to bring OpenMP support to Clang), but it was abandoned due to "general lack of interest among customers and fewer cases than expected where it showed a benefit" (from here). ClOMP was an Intel extension to OpenMP and was built into the Intel compiler suite, i.e. you couldn't use it with GCC (this request to start ClOMP development for GCC went into limbo). If you have access to old versions of Intel compilers (versions 9.1 to 11.1), you would have to obtain a (trial) ClOMP license, which might be next to impossible given that the product is dead and old (trial) licenses have already expired. Then again, starting with version 12.0, Intel compilers no longer support ClOMP.
Other research projects exist (just search for "distributed shared memory"), but only vSMP (the ScaleMP solution) seems to be mature enough for production HPC environments (and it's priced accordingly). Seems like most efforts now go into development of co-array languages (Co-Array Fortran, Unified Parallel C, etc.) instead. I would suggest that you have a look at Berkeley UPC or invest some time in learning MPI as it is definitely not going away in the years to come.
Before that, there was Cluster OpenMP.
Cluster OpenMP was an implementation of OpenMP that could make use of multiple SMP machines without resorting to MPI. This advance had the advantage of eliminating the need to write explicit messaging code, as well as not mixing programming paradigms. The shared memory in Cluster OpenMP was maintained across all machines through a distributed shared-memory subsystem. Cluster OpenMP is based on the relaxed memory consistency of OpenMP, allowing shared variables to be made consistent only when absolutely necessary. source
Performance Considerations for Cluster OpenMP
Some memory operations are much more expensive than others. To achieve good performance with Cluster OpenMP, the number of accesses to unprotected pages must be as high as possible, relative to the number of accesses to protected pages. This means that once a page is brought up-to-date on a given node, a large number of accesses should be made to it before the next synchronization. In order to accomplish this, a program should have as little synchronization as possible, and re-use the data on a given page as much as possible. This translates to avoiding fine-grained synchronization, such as atomic constructs or locks, and having high data locality source.
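As a rough illustration of that advice in plain OpenMP C (nothing Cluster-OpenMP-specific; the array size is an arbitrary example): a coarse parallel loop with a reduction reuses each page many times and synchronizes only once at the end, whereas an atomic update on every iteration is exactly the fine-grained synchronization being warned against.

/* Illustrative only: coarse-grained loop with a reduction vs. the fine-grained
   atomic variant that would hammer the DSM layer with consistency traffic. */
#include <stdio.h>
#include <omp.h>

#define N 10000000

static double a[N];

int main(void)
{
    double sum = 0.0;

    for (long i = 0; i < N; i++)
        a[i] = (double)i;

    /* Good: each thread works on one contiguous chunk of pages and the only
       synchronization is the implicit barrier at the end of the loop. */
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (long i = 0; i < N; i++)
        sum += a[i] * a[i];

    /* Bad (left commented out): an atomic on every iteration means
       synchronization on essentially every access.
       #pragma omp parallel for
       for (long i = 0; i < N; i++) {
           #pragma omp atomic
           sum += a[i] * a[i];
       }
    */

    printf("sum = %f\n", sum);
    return 0;
}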
Another option for running OpenMP programs on multiple hosts is the remote offloading plugin in the LLVM OpenMP runtime.
https://openmp.llvm.org/design/Runtimes.html#remote-offloading-plugin
The big issue with running OpenMP programs on distributed memory is data movement. Coincidentally, that is also one of the major issues in programming GPUs. Extending OpenMP to handle GPU programming has given rise to OpenMP directives that describe data transfer. Programming GPUs has also forced programmers to think more carefully about building programs that consider data movement.
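For instance, the data-transfer directives look like this in standard OpenMP 4.x C. This is illustrative only; which device the region actually runs on (a GPU, or another host via the remote-offloading plugin) depends on how the runtime is built and configured, and the array size is arbitrary.

/* Sketch of an OpenMP target region with explicit data-movement clauses. */
#include <stdio.h>

#define N 1024

int main(void)
{
    float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = (float)i; y[i] = 2.0f * i; }

    /* map(to:)     copies x to the device before the region runs,
       map(tofrom:) copies y there and back again afterwards. */
    #pragma omp target map(to: x[0:N]) map(tofrom: y[0:N])
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        y[i] += 3.0f * x[i];

    printf("y[0] = %f, y[%d] = %f\n", y[0], N - 1, y[N - 1]);
    return 0;
}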
Is there a difference in performance or otherwise between creating a multi-process program using the Linux fork and the functions available in the MPI library?
Or is it just easier to do it in MPI because of the ready to use functions?
They don't solve the same problem. Note the difference between parallel programming and distributed-memory parallel programming.
The fork/join model you mentioned is usually used for parallel programming on the same physical machine. You generally don't distribute your work to other connected machines (with the exceptions of some of the models in the comments).
MPI is for distributed-memory parallel programming. Instead of using a single processor, you use a group of machines (even hundreds of thousands of processors) to solve a problem. While these are sometimes considered one large logical machine, they are usually made up of lots of processors. The MPI functions are there to simplify communication between these processes on distributed machines to avoid having to do things like manually open TCP sockets between all of your processes.
So there's not really a way to compare their performance unless you're only running your MPI program on a single machine, which isn't really what it's designed to do. Yes, you can run MPI on a single machine and people do that all the time for small test codes or small projects, but that's not the biggest use case.
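To make the distinction concrete, here are two minimal sketches: with fork() you get an extra process on the same machine, while with MPI you get ranks that may live on completely different hosts and communicate through library calls. (The second one is a separate program, built with mpicc and launched with mpirun.)

/* --- program 1: fork, one extra process, same physical machine --- */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    if (fork() == 0) {
        printf("child %d handles one half of the work\n", (int)getpid());
        return 0;
    }
    printf("parent %d handles the other half\n", (int)getpid());
    wait(NULL);
    return 0;
}

/* --- program 2 (separate file): MPI, each rank may be on a different host --- */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("rank %d of %d says hello\n", rank, size);
    MPI_Finalize();
    return 0;
}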
I have a C library, which I build as a shared object for Linux and a DLL for Windows with MinGW32. The API depends on a couple of data files (statistical models) which I'd really like to roll in with the SO/DLL so that deployment is just one file.
It looks like I can achieve this for Windows with a "resource file" compiled with windres, but then I've got to write a bunch of resource-handling code for Windows, and I'm still stuck with the files on Linux.
Is there a way to achieve the same functionality on Linux?
Even better, is there a portable solution?
It's actually quite simple on Linux and other ELF systems: http://www.linuxjournal.com/content/embedding-file-executable-aka-hello-world-version-5967
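A minimal sketch of that technique (the file name model.dat is a made-up example; the symbol names are derived from whatever path you pass to the linker):

# turn the raw data file into an object file you link into the .so;
# this creates _binary_model_dat_start / _end / _size symbols
ld -r -b binary -o model_data.o model.dat

/* embedded.c: reading the embedded bytes back from C */
#include <stdio.h>

extern const char _binary_model_dat_start[];
extern const char _binary_model_dat_end[];

int main(void)
{
    size_t len = (size_t)(_binary_model_dat_end - _binary_model_dat_start);
    printf("embedded model: %zu bytes, first byte 0x%02x\n",
           len, (unsigned char)_binary_model_dat_start[0]);
    return 0;
}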
OS X has bundles, so you just build your library as a framework and put the file in the bundle.
Two potential solutions:
Phong Vo's sfio library, which is part of the AT&T Advanced Software Technology toolset, is a wonderful replacement for C stdio.h, and it will allow you to open either files or memory blocks using a single API. So you can easily convert your existing files to C initialized data to include in your DLL or SO file.
This is a good cross-platform solution, but the penalty is that the learning curve to get started is pretty high. They don't make it easy to figure out how stuff works or to take one part of their toolset and split it out for use independently of the other parts. But the good news is that if you want to adopt their U/Win system for running Unix codes on Windows (all part of the same toolset), you can create DLLs and SOs using the same system.
For this kind of problem I often fall back on Lua; I can store Lua data either in external files or within C as initialized data. This is great for distributing everything in one .so file; I do this for my students.
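A small sketch of what that looks like (the chunk and the field names, model and threshold, are made up for illustration; the calls are the standard Lua C API):

/* Sketch: model parameters kept as a Lua chunk compiled into the library
   as an ordinary C string, then read back through the Lua C API. */
#include <stdio.h>
#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>

static const char *model_chunk =
    "model = { threshold = 0.75, bins = { 1, 2, 4, 8 } }";

int main(void)
{
    lua_State *L = luaL_newstate();
    luaL_openlibs(L);

    if (luaL_dostring(L, model_chunk) != 0) {
        fprintf(stderr, "bad model data: %s\n", lua_tostring(L, -1));
        return 1;
    }

    lua_getglobal(L, "model");
    lua_getfield(L, -1, "threshold");
    printf("threshold = %f\n", lua_tonumber(L, -1));

    lua_close(L);
    return 0;
}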
Again the downside is that you have to master and incorporate a new technology.
In my own work I use Lua over the AT&T stuff for these reasons:
Lua has a much smaller footprint and is designed to play well with others; with AST you really have to adopt their way of doing things.
The learning curve with Lua is much less steep; you can be productive very quickly.
Lua is dead easy to install and it's easy to get information about it. AST has its own quirky installation process shared by nobody else in the world; it's often hard to make the installation work; and it's harder to get information about it.
Using Lua has a lot of other payoffs, so the effort spent learning Lua and learning how to incorporate Lua into C codes is easy to amortize over multiple projects.