Speed up compile time with make

Is there any way to speed up the time it takes to run a make compile? We have a package that takes 12 minutes to build, and we are looking to speed that up. Are there any flags to pass to make, or a way to run it in parallel?

Try running make -jN, with N being the number of cores in your system, if you haven't already.
Try using fewer compile-time optimizations if that is an option (avoid -O3 in particular).
You can also take a look at distcc.
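
If you don't know the core count offhand, nproc (from GNU coreutils) can fill in N; a minimal sketch:

$ make -j"$(nproc)"

Some people go one or two above the core count so a job is always ready while others wait on I/O, but the core count itself is a safe starting point.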

Related

Computer comparison

I have access to many different computers, and I want to compare their performance.
So I ran a benchmark program, and wanted to see on which computer it runs faster.
Is it important where I compile the program? Are there compilation flags that make this matter (like -xhost or -march=native)?
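
For context, GCC's -march=native (and similarly ICC's -xhost) tunes the binary to the instruction set of the machine doing the compiling, so the same source built on two machines can yield binaries with different performance. A hedged example, where bench.c stands in for the benchmark source:

$ gcc -O2 -march=native -o bench bench.c

A binary built this way may use instructions an older machine lacks, so for a fair comparison it should be rebuilt on each machine, or built without -march=native.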

OpenCL speedup obtained is above 7000

I have an OpenCL sequential program and a parallel program that implement the same algorithm. I measured an execution time of 133000 milliseconds for the sequential version and a kernel time of 17 milliseconds for the parallel version. When I calculate the speedup as 133000/17, I get 7823. Is this much speedup possible?
Such a speedup might happen, but it seems quite big; to me, a speedup of 7823 looks suspicious, though not entirely impossible. A 100x factor would seem more reasonable. High-end graphics cards can deliver several teraflops, while a single CPU core gives only gigaflops, and some programs even run slower on a GPGPU than on the CPU.
When benchmarking your CPU code, be sure to enable optimizations in your compiler (with GCC, compile with at least gcc -O2). Without optimization (gcc -O0) CPU performance is poor; a 3x factor between binaries built with gcc -O0 and gcc -O2 is common.
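
A quick way to see the effect (a hedged sketch; bench.c is a placeholder for your sequential CPU code):

$ gcc -O0 -o bench_O0 bench.c && time ./bench_O0
$ gcc -O2 -o bench_O2 bench.c && time ./bench_O2

If the -O2 binary runs several times faster, your 133000 ms baseline, and therefore the computed speedup, shrinks accordingly.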
BTW, cache behavior matters a lot for CPU performance. If you wrote your numerical CPU code without taking it into account, the code may be quite slow (in the unlucky case where it has bad locality of reference).
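
To illustrate locality of reference, here is a small self-contained C sketch (not taken from the question's code): both loops below compute the same sum, but the column-order loop strides across memory and typically runs several times slower on a large array.

#include <stdio.h>

#define N 2048
static double a[N][N];

/* Row-major traversal: consecutive addresses, cache-friendly. */
static double sum_rows(void) {
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Column-major traversal: large strides, many cache misses. */
static double sum_cols(void) {
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

int main(void) {
    printf("%f %f\n", sum_rows(), sum_cols());
    return 0;
}

Timing each function separately (e.g. with clock_gettime, or time on two variants) makes the gap visible even though the arithmetic is identical.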
Also verify that the kernel really executed: if the kernel function has a problem and never actually ran, the timing results are meaningless.

Serial program runs slower with multiple instances or in parallel

I have a Fortran code that I use to calculate some quantities related to my work. The code itself involves several nested loops and requires very little disk I/O. Whenever the code is modified, I run it against a suite of several input files (just to make sure it's working properly).
To make a long story short, the most recent update has increased the run time of the program by about a factor of four, and running each input file serially with one CPU takes about 45 minutes (a long time to wait, just to see whether anything was broken). Consequently, I'd like to run each of the input files in parallel across the 4 cpus on the system. I've been attempting to implement the parallelism via a bash script.
The interesting thing I have noted is that when only one instance of the program is running on the machine, it takes about three and a half minutes to crank through one of the input files. When four instances are running, it takes more like eleven and a half minutes per input file (bringing my total run time down from about 45 minutes to 36 minutes; an improvement, yes, but not quite what I had hoped for).
I've tried implementing the parallelism using GNU parallel, xargs, wait, and even just starting four instances of the program in the background from the command line. Regardless of how the instances are started, I see the same slowdown, so I'm pretty sure this isn't an artifact of the shell scripting but something going on in the program itself.
I have tried rebuilding the program with debugging symbols turned off, and also using static linking. Neither of these had any noticeable impact. I'm currently building the program with the following options:
$ gfortran -Wall -g -O3 -fbacktrace -ffpe-trap=invalid,zero,overflow,underflow,denormal -fbounds-check -finit-real=nan -finit-integer=nan -o [program name] {sources}
Any help or guidance would be much appreciated!
On modern CPUs you cannot expect a linear speedup. There are several reasons:
Hyperthreading: GNU/Linux sees each hyperthread as a core even though it is not a real core; it behaves more like 30% of a core.
Shared caches: If your cores share the same cache and a single instance of your program already uses the full shared cache, you will get more cache misses when you run more instances.
Memory bandwidth: Similar to the shared cache is the shared memory bandwidth. If a single thread saturates the memory bandwidth, running more jobs in parallel congests it. This can partly be solved on a NUMA machine, where each CPU has some RAM that is "closer" than the rest.
Turbo mode: Many CPUs can run a single thread at a higher clock rate than multiple threads, due to thermal limits.
All of these exhibit the same symptom: a single thread runs faster than each of several parallel threads, but the total throughput of the parallel threads is greater than that of the single thread.
Though I must admit your case sounds extreme: with 4 cores I would have expected a speedup of at least 2.
How to identify the reason
Hyperthreading: Use taskset to select which cores to run on. If you use 2 of the 4 cores, is there any difference between using cores #1+#2 and cores #1+#3? (See the command sketch after this list.)
Turbo mode: Use cpufreq-set to force a low frequency. Is the speed now the same whether you run 1 or 2 jobs in parallel?
Shared cache: I am not sure how to test this directly, but if it is somehow possible to disable the cache, comparing 1 job to 2 jobs run at the same low frequency should give an indication.
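
A hedged sketch of the first two tests (the core numbers and frequency are placeholders; taskset ships with util-linux, cpufreq-set with cpufrequtils and may require the userspace governor):

$ taskset -c 0,1 ./program   # run on cores 0 and 1
$ taskset -c 0,2 ./program   # same test on cores 0 and 2
$ sudo cpufreq-set -c 0 -f 1.2GHz   # force core 0 to a fixed low frequency

Which logical CPUs are hyperthread siblings varies by machine; /sys/devices/system/cpu/cpu0/topology/thread_siblings_list shows the pairing, and a pairing that shares a physical core should be noticeably slower.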

How can I compile a C program for multiple cores with mingw?

I have a C program that I am compiling with mingw, but it runs on only one core of my 8-core machine. How do I compile it to run on multiple cores?
(To clarify: I am not looking to use multiple cores to compile, as compilation time is low. It's runtime where I want to use my full CPU capacity.)
There is no way around writing a multithreaded program. You first need to work out how to split your task into independent parts that can then run in threads simultaneously.
This cannot be fully automated. You may consider using the threading additions of the C11 standard, or take a look at pthreads or OpenMP.
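
As a minimal hedged sketch of the OpenMP route (the loop body is a placeholder for whatever independent work your program does; recent MinGW-w64 toolchains accept gcc -fopenmp):

#include <omp.h>
#include <stdio.h>

int main(void) {
    double sum = 0.0;
    /* Iterations are independent, so OpenMP may split the loop across
       cores; the reduction clause safely combines the partial sums. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 100000000; i++) {
        sum += i * 0.5;   /* placeholder for real per-item work */
    }
    printf("sum = %f, up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}

Build with gcc -fopenmp; without the flag the pragma is ignored and the program runs serially, which makes for an easy before/after comparison.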

Speedup GNU make build process - Parallelism?

I build a huge project frequently, and it takes a long time (more than one hour) to finish even after configuring precompiled headers. Are there any guidelines or tricks to make make work in parallel (e.g., starting gcc in the background, etc.) for faster builds?
Note: sources and binaries are too large to be placed in a RAM file system, and I don't want to change the directory structure or the build philosophy.
You can try
make -j<number of jobs to run in parallel>
make -jN is a must now that most machines are multi-core. If you don't want to write -jN each time, you can put
export MAKEFLAGS=-jN
in your .bashrc.
You may also want to check out distcc.
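A minimal distcc sketch (the host names are placeholders; each host must run the distcc daemon with a compatible compiler):

$ export DISTCC_HOSTS='localhost buildhost1 buildhost2'
$ make -j12 CC=distcc

Here -j may usefully exceed the local core count, since jobs are farmed out to the other hosts.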
If your project is becoming too big for one machine to handle, you can use one of the distributed make replacements, such as Electric Cloud.
If you want to run your build in parallel,
make -jN
does the job, but keep in mind:
N is best set to the number of hardware threads your machine supports; make itself does not cap N, so a much larger value just oversubscribes the CPU with more jobs than there are threads to run them, which rarely helps.
make doesn't support parallel builds with -jN on MS-DOS; it forces N=1 and does a serial build.
Read more in this look at the make source: http://cmdlinelinux.blogspot.com/2014/04/parallel-build-using-gnu-make-j.html
