what does K in KMP_AFFINITY mean? - openmp

I just learned about KMP_AFFINITY in OpenMP. Could anyone tell me why its name starts with K? A lot of variables start with KMP_ in the Intel OpenMP implementation. Does it stand for kernel?

Kuck, as in David Kuck's organization KAI (Kuck and Associates, Inc.): https://en.wikipedia.org/wiki/David_Kuck (disbanded subsequent to Kuck's retirement).

Related

OpenMP #pragma num_threads

I am trying to understand OpenMP a bit, but I am confused about why I am allowed to pass variables to num_threads() when it is part of a #pragma, which is a place to give information to the compiler. I was expecting that variables would not be allowed as a parameter to num_threads, but it looks like they are. How is that? How does it work?
The compiler converts the pragma into a call into the OpenMP runtime; that is why a variable is allowed here.
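For illustration, a minimal hypothetical sketch (the file layout and variable names are made up) showing a value that is only known at run time being passed to the clause; the compiler lowers the construct to a runtime call that receives the evaluated expression:

```cpp
#include <omp.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv) {
    // The thread count can come from anywhere at run time, e.g. a
    // command-line argument; the clause accepts any integer expression.
    int n = (argc > 1) ? std::atoi(argv[1]) : 3;

    #pragma omp parallel num_threads(n)
    {
        #pragma omp single
        std::printf("team of %d threads\n", omp_get_num_threads());
    }
    return 0;
}
```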
Don't hard-wire thread counts into your code :-), leave it to the runtime to do the right thing, or set it from the environment (use OMP_NUM_THREADS, or KMP_HW_SUBSET with LLVM or Intel compilers); there is a small sketch after the questions below.
Certainly never put in num_threads(constant).
Or, at least, consider some questions before you do...
How can you choose the right number of threads?
If you answer: "It's the number of cores in my machine", OK, next questions
Will you always use that machine, and only that machine?
Is no-one else ever going to use this code?
If you answer: "It's the number that perform best", next question:
As above: will that be true on all machines you ever use? On machines other people use to run your code?
Answering like that implies you have run scaling studies, which clearly required changing the number of threads. If you hard-wire this (with a constant), how can you ever run them again without editing and recompiling...
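As a rough sketch of leaving the choice to the runtime and the environment (a hypothetical program; compile with OpenMP enabled):

```cpp
#include <omp.h>
#include <cstdio>

int main() {
    // No num_threads(constant): the runtime picks the team size from
    // OMP_NUM_THREADS (or its own default), so a scaling study is just
    //   OMP_NUM_THREADS=1 ./a.out, OMP_NUM_THREADS=2 ./a.out, ...
    // with no editing or recompiling.
    #pragma omp parallel
    {
        #pragma omp single
        std::printf("team size: %d\n", omp_get_num_threads());
    }
    return 0;
}
```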

Scientific library in C/C++ with OpenMP

I have to exploit OpenMP in some algorithm, and for this purpose I need some mathematical functions, like eig or svd, as are available in MATLAB (where they are quite fast). I have already tried the following libraries with OpenMP:
GSL - GNU Scientific Library
Eigen C++ template library
but I don't know why my OpenMP-parallelised code is much slower than the serial code. Maybe there is something wrong in the library, or the functions random, eig, or svd are blocking? I have no idea how to figure it out. Can somebody suggest which math library is most compatible with OpenMP?
I can recommend Intel's MKL; note that it costs money, which may affect your decision. I neither know nor care what language(s) it is written in, just so long as it provides APIs callable from my chosen language. Mine is Fortran, but it has bindings for C too.
If you look around SO you'll find many questions from people whose first (or second or third) OpenMP programs were actually slower than their serial versions. Look at some of the answers. Don't conclude that there is a magic bullet, in the shape of a library, to make your code faster. Instead, realise that it is most likely that you've written a poorly-parallelised program and fix that.
Finally, if you have an installation of Matlab, don't expect to be able to write your own routines to outperform Matlab's. I won't say it can't be done, but I think you'll find it very difficult.
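To make the "poorly-parallelised program" point concrete, here is a hypothetical sketch (not taken from the question's code) of one very common mistake: putting the parallel region on a short inner loop, so the fork/join and scheduling overhead swamps the tiny amount of work per iteration.

```cpp
#include <omp.h>
#include <vector>

// Often slower than serial: every outer iteration pays thread start-up
// and synchronisation cost for almost no work.
void inner_parallel(std::vector<std::vector<double>>& m) {
    for (std::size_t i = 0; i < m.size(); ++i) {
        #pragma omp parallel for
        for (long long j = 0; j < (long long)m[i].size(); ++j)
            m[i][j] *= 2.0;
    }
}

// Usually much better: parallelise the outer loop so each thread gets a
// large, independent chunk of work.
void outer_parallel(std::vector<std::vector<double>>& m) {
    #pragma omp parallel for
    for (long long i = 0; i < (long long)m.size(); ++i)
        for (std::size_t j = 0; j < m[i].size(); ++j)
            m[i][j] *= 2.0;
}

int main() {
    std::vector<std::vector<double>> m(10000, std::vector<double>(8, 1.0));
    inner_parallel(m);
    outer_parallel(m);
    return 0;
}
```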
GSL is compatible with OpenMP. You can also try the Intel Math Kernel Library, which is available as a free trial version.
If the speed-up is small, then the code probably is not very parallelizable. You may want to debug and look at the details of the running threads in Intel Thread Checker; that could help you see where the bottlenecks are.
I think you just want to find a fast implementation of lapack (or related routines) which is already threaded, but it's a little hard to tell from your question. High Performance Mark suggests MKL, which is an excellent example; others include ATLAS or FLAME which are open source but take some doing to build.
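For reference, a minimal sketch of an SVD with Eigen, one of the libraries the question already tried; as far as I know Eigen's JacobiSVD is not itself multithreaded, which is one reason a threaded LAPACK implementation (MKL, or open-source builds such as ATLAS) can be much faster on large matrices:

```cpp
#include <Eigen/Dense>
#include <iostream>

int main() {
    // Random test matrix standing in for the real data.
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(200, 100);

    // Thin SVD: A = U * S * V^T.
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);

    std::cout << "largest singular value: " << svd.singularValues()(0) << "\n";
    return 0;
}
```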

Fastest math programming language?

I have an application that requires millions of subtractions and remainders. I originally programmed this algorithm in C#.NET, but it takes five minutes to process the information and I need it faster than that.
I have considered Perl, and that seems to be the best alternative now. VB.NET was slower in testing. C++ may be better also. Any advice would be greatly appreciated.
You need a compiled language like Fortran, C, or C++. Other languages are designed to give you flexibility, object-orientation, or other advantages, and assume absolutely fastest performance is not your highest priority.
Know how to get maximum performance out of a single thread, and after you have done so, investigate sharing the work across multiple cores, for example with MPI. To get maximum performance in a single thread, one thing I do is single-step it at the machine-instruction level to make sure it's not dawdling in stuff that could be removed.
Some calculations are regular enough to take advantage of GPGPUs: recent graphics cards are essentially specialized massively parallel numerical co-processors. For instance, you could code your numerical kernels in OpenCL. Otherwise, learn C++11 (not some earlier version of the C++ standard) or C. And in many cases OCaml could be nearly as fast as C++ but much easier to code with.
Perhaps your problem can be handled by Scilab or R; I did not understand it well enough to help more.
And you might take advantage of your multi-core processor by e.g. using Pthreads or MPI.
Finally, the Linux operating system is perhaps better suited to massive calculations. It is significant that most supercomputers use it today.
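To make the multi-core suggestion above concrete, here is a minimal C++ sketch; it uses OpenMP (the topic of this thread) rather than raw Pthreads or MPI, and the data and sizes are made up:

```cpp
#include <omp.h>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    // Made-up inputs standing in for the application's data.
    const long long n = 10'000'000;
    std::vector<std::int64_t> a(n), b(n), r(n);
    for (long long i = 0; i < n; ++i) { a[i] = 7 * i + 3; b[i] = i % 1000 + 1; }

    // Millions of subtractions and remainders, split across the cores.
    #pragma omp parallel for
    for (long long i = 0; i < n; ++i)
        r[i] = (a[i] - b[i]) % b[i];

    std::printf("r[42] = %lld\n", static_cast<long long>(r[42]));
    return 0;
}
```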
If execution speed is the highest priority, that usually means Fortran.
Try Julia: its killer feature is being easy to code in a high-level, concise way while keeping performance in the same order of magnitude as Fortran/C.
PARI/GP is the best I have used so far. It's written in C.
Try looking at the DMelt mathematical program. The program calls Java libraries. The Java virtual machine can optimize long mathematical calculations for you.
The standard tool for mathematical numerical operations in engineering is often Matlab (or, as free alternatives, Octave or the already mentioned Scilab).

tuning performance of a program by using different combinations of gcc options

As I recall, there was a project to explore the best combination of gcc options (CFLAGS) for getting the best performance out of a program.
If I'm not mistaken, they did it with randomized tests.
Could somebody remind me of the name of the project?
It is difficult to dig up via Google since the project was halted.
Thanks!
I think you mean ACOVEA (Analysis of Compiler Options via Evolutionary Algorithm).
You probably mean acovea.
http://www.coyotegulch.com/products/acovea/

Proving correctness of multithread algorithms

Multithreaded algorithms are notoriously hard to design/debug/prove. Dekker's algorithm is a prime example of how hard it can be to design a correct synchronized algorithm. Tanenbaum's Modern Operating Systems is filled with examples in its IPC section. Does anyone have a good reference (books, articles) for this? Thanks!
It is impossible to prove anything without building upon guarantees, so the first thing you want to do is get familiar with the memory model of your target platform; Java and x86 both have solid and standardized memory models. I'm not so sure about the CLR, but if all else fails, you'll have to build upon the memory model of your target CPU architecture. The exception to this rule is if you intend to use a language that does not allow any shared mutable state at all - I've heard Erlang is like that.
The first problem of concurrency is shared mutable state.
That can be fixed by:
Making state immutable
Not sharing state
Guarding shared mutable state with the same lock (two different locks cannot guard the same piece of state, unless you always use exactly those two locks); see the sketch after this list
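The answer's vocabulary is Java-flavoured, but the third option looks the same in any language; a minimal C++ sketch:

```cpp
#include <mutex>
#include <thread>
#include <vector>
#include <cstdio>

// Every access to the shared counter goes through the same lock, so no
// thread ever observes a half-updated value.
struct Counter {
    long value = 0;
    std::mutex m;                          // the single lock guarding `value`
    void add(long x) {
        std::lock_guard<std::mutex> g(m);
        value += x;
    }
};

int main() {
    Counter c;
    std::vector<std::thread> ts;
    for (int i = 0; i < 4; ++i)
        ts.emplace_back([&c] { for (int k = 0; k < 100000; ++k) c.add(1); });
    for (auto& t : ts) t.join();
    std::printf("%ld\n", c.value);         // always 400000
    return 0;
}
```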
The second problem of concurrency is safe publication. How do you make data available to other threads? How do you perform a hand-over? You'll find the solution to this problem in the memory model, and (hopefully) in the API. Java, for instance, has many ways to publish state, and the java.util.concurrent package contains tools specifically designed to handle inter-thread communication.
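The answer points at java.util.concurrent; as an illustration of the same hand-over idea in C++, a small sketch in which a mutex/condition-variable pair guarantees the consumer sees the fully constructed data:

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <string>
#include <cstdio>

std::mutex m;
std::condition_variable cv;
std::string payload;       // data being handed over
bool ready = false;        // publication flag, guarded by m

int main() {
    std::thread producer([] {
        std::lock_guard<std::mutex> g(m);
        payload = "result computed by producer";
        ready = true;
        cv.notify_one();
    });
    std::thread consumer([] {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return ready; });   // blocks until safely published
        std::printf("%s\n", payload.c_str());
    });
    producer.join();
    consumer.join();
    return 0;
}
```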
The third (and hardest) problem of concurrency is locking. Mismanaged lock ordering is the source of deadlocks. You can prove analytically, building upon the memory-model guarantees, whether or not deadlocks are possible in your code. However, you need to design and write your code with that in mind; otherwise the complexity of the code can quickly render such an analysis impossible to perform in practice.
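A tiny C++ illustration of the lock-ordering point (the two mutexes and the function are made up): deadlock appears when two threads acquire the same pair of locks in opposite orders, and disappears when both acquisitions follow one global order or are taken together.

```cpp
#include <mutex>
#include <thread>

std::mutex a, b;

// Deadlock-prone pattern (shown only as comments):
//   thread 1: lock a, then b
//   thread 2: lock b, then a

// Fix: acquire both locks in one agreed order, or let std::scoped_lock
// take them together with its built-in deadlock-avoidance algorithm.
void safe() {
    std::scoped_lock both(a, b);   // C++17; use std::lock(a, b) pre-C++17
    // ... touch the state guarded by both locks ...
}

int main() {
    std::thread t1(safe), t2(safe);
    t1.join();
    t2.join();
    return 0;
}
```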
Then, once you have proved (or before you prove) the correct use of concurrency, you will have to prove single-threaded correctness. The set of bugs that can occur in a concurrent code base is the set of single-threaded program bugs plus all the possible concurrency bugs.
The Pi-Calculus, A Theory of Mobile Processes is a good place to begin.
"Principles of Concurrent and Distributed Programming", M. Ben-Ari
ISBN-13: 978-0-321-31283-9
It is available for reading on Safari Books Online:
http://my.safaribooksonline.com/9780321312839
Short answer: it's hard.
There was some really good work in the DEC SRC Modula-3 and Larch material from the late 1980s.
e.g.
Thread synchronization: A formal specification (1991)
by A D Birrell, J V Guttag, J J Horning, R Levin
System Programming with Modula-3, chapter 5
Extended static checking (1998)
by David L. Detlefs, K. Rustan M. Leino, Greg Nelson, James B. Saxe
Some of the good ideas from Modula-3 are making it into the Java world, e.g.
JML, though "JML is currently limited to sequential specification" says the intro.
I don't have any concrete references, but you might want to look into the Owicki-Gries theory (if you like theorem proving) or process theory/algebra (for which there are also various model-checking tools available).
Just in case: it is possible. But from what I learnt, doing so for a non-trivial algorithm is a major pain. I leave that sort of thing to brainier people. I learnt what I know from Parallel Program Design: A Foundation (1988)
by K M Chandy, J Misra
