Sequential version of an OpenMP program

I'm trying to evaluate the speedup obtained in OpenMP parallel programs, using the NAS Parallel Benchmarks, relative to sequential execution. How do I run the sequential version of OpenMP programs?

By not turning on the compiler/linker switch, i.e. for GCC remove -fopenmp and -lgomp. The #pragma omp directives are then ignored and the code compiles as plain sequential C.

Why don't you just run it with a single thread?
omp_set_num_threads(1)

Related

Advantages of using OpenMP in Z3 nix version

I've seen that Z3 can be built from source using --noomp so it doesn't link with OpenMP.
What is the advantage of using OpenMP? If I use the flag --noomp, will the solving process be slower?
Thank you
Use OpenMP if you use Z3 from multiple threads within the same process. Otherwise, Z3 is faster without OpenMP. When you enable OpenMP, it uses locks to protect potentially shared memory, but if you never use different threads (NB. all operations on the same context should take place on the same thread, or at least serialized) then these locks are pure overhead.

How can I compile a C program for multiple cores with mingw?

I have a C program that I am compiling with mingw, but it runs on only one core of my 8-core machine. How do I compile it to run on multiple cores?
(To clarify: I am not looking to use multiple cores to compile, as compilation time is low. It's runtime where I want to use my full CPU capacity.)
There is no way around writing a multithreaded program. You first need to work out how to split your task into independent parts that can then run in threads simultaneously.
This cannot be fully automated. You may consider using the threading additions in the C11 standard (<threads.h>), or taking a look at pthreads or OpenMP.

Could GPU accelerate gcc/g++ compilation

When I'm building my Gentoo system, my NVIDIA GPU is usually unused. Can I make some use of it?
No, you cannot.
GPUs are typically best at accelerating massively parallel math-heavy tasks that involve little branching. Compiling software is basically the exact opposite of this - it's branch-heavy and does not parallelize well beyond the file level.

How is -march different from -mtune?

I tried to scour the GCC man page for this, but still don't really get it.
What's the difference between -march and -mtune?
When does one use just -march, vs. both? Does it ever make sense to use just -mtune?
If you use -march then GCC will be free to generate instructions that work on the specified CPU, but (typically) not on earlier CPUs in the architecture family.
If you just use -mtune, then the compiler will generate code that works on any of them, but will favour instruction sequences that run fastest on the specific CPU you indicated. e.g. setting loop-unrolling heuristics appropriately for that CPU.
-march=foo implies -mtune=foo unless you also specify a different -mtune. This is one reason why using -march is better than just enabling options like -mavx without doing anything about tuning.
Caveat: -march=native on a CPU that GCC doesn't specifically recognize will still enable new instruction sets that GCC can detect, but will leave -mtune=generic. Use a new enough GCC that knows about your CPU if you want it to make good code.
This is what I've googled up:
The -march=X option takes a CPU name X and allows GCC to generate code that uses all features of X. GCC manual explains exactly which CPU names mean which CPU families and features.
Because features are usually added but not removed, a binary built with -march=X will run on CPU X and has a good chance of running on CPUs newer than X, but it will almost certainly not run on anything older than X. Certain instruction sets (3DNow!, I guess?) may be specific to a particular CPU vendor; making use of these will probably get you binaries that don't run on competing CPUs, newer or otherwise.
The -mtune=Y option tunes the generated code to run faster on Y than on other CPUs it might run on. -march=X implies -mtune=X. -mtune=Y will not override -march=X, so, for example, it probably makes no sense to -march=core2 and -mtune=i686 - your code will not run on anything older than core2 anyway, because of -march=core2, so why on Earth would you want to optimize for something older (less featureful) than core2? -march=core2 -mtune=haswell makes more sense: don't use any features beyond what core2 provides (which is still a lot more than what -march=i686 gives you!), but do optimize code for much newer haswell CPUs, not for core2.
There's also -mtune=generic. generic makes GCC produce code that runs best on current CPUs (meaning of generic changes from one version of GCC to another). There are rumors on Gentoo forums that -march=X -mtune=generic produces code that runs faster on X than code produced by -march=X -mtune=X does (or just -march=X, as -mtune=X is implied). No idea if this is true or not.
Generally, unless you know exactly what you need, it seems that the best course is to specify -march=<oldest CPU you want to run on> and -mtune=generic (-mtune=generic is there to counter the implicit -mtune=<oldest CPU you want to run on>, because you probably don't want to optimize for the oldest CPU). Or just -march=native, if you are only ever going to run on the same machine you build on.

strange behavior of an OpenMP program

I'm debugging an OpenMP program. Its behavior is strange.
1) If a simple program P (a while(1) loop) occupies one core at 100%, the OpenMP program pauses even though it occupies all the remaining cores. Once I terminate program P, the OpenMP program continues to execute.
2) The OpenMP program can execute successfully in situation 1 if I set OMP_NUM_THREADS to 32/16/8.
I tested on both 8-core x64 machines and 32-core Itanium machines. The former uses GCC and libgomp; the latter uses the proprietary aCC compiler and its libraries. So it is unlikely to be specific to one compiler or runtime.
Could you help point out any possible reasons for this behavior? Why can it be affected by another program?
Thanks.
I am afraid that you need to give more information.
What is the OS you are running on?
When you run using 16 threads, are you doing this on the 8-core or the 32-core machine?
What is the simple while(1) program doing in its loop?
What is the OpenMP program doing (in general terms - if you can't be specific)?
Have you tried using a profiling tool to see what the OpenMP program is doing?
