Trouble with parallel make not always starting another job when one finishes - makefile

I'm working on a system with four logical CPUs (two dual-core CPUs, if it matters). I'm using make to parallelize twelve trivially parallelizable tasks, and I'm running it from cron.
The invocation looks like:
make -k -j 4 -l 3.99 -C [dir] [12 targets]
The trouble I'm running into is that sometimes one job will finish but the next one won't start up, even though it shouldn't be blocked by the load-average limiter. Each target takes about four hours to complete, and I'm wondering if this might be part of the problem.
Edit: Sometimes a target does fail, but I use the -k option to have the rest of the make continue. I haven't noticed any correlation between jobs failing and the next job not starting.

I'd drop the '-l'
If all you plan to run on the system is this build, I think -j 4 does what you want.
If memory serves, anything else running (crond?) can push the load average over 4 and keep new jobs from starting.
GNU make ref
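Concretely, the cron entry would become (keeping the directory and targets as in the question):

make -k -j 4 -C [dir] [12 targets]

If you'd rather gather evidence before changing anything, GNU make's --debug=j option prints details about how each job gets started; capturing that alongside the load-average history can help confirm or rule out the -l limiter:

make -k -j 4 -l 3.99 --debug=j -C [dir] [12 targets] > make-debug.log 2>&1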

Does make think one of the targets is failing? If so, it will stop the make after the running jobs finish. You can use -k to tell it to continue even if an error occurs.

#BCS
I'm 99.9% sure that the -l isn't causing the problem, because I can watch the load average on the machine: it drops down to about three, and sometimes as low as one (!), without the next job starting.

Related

Gnu Make: When invoking parallel make, if pre-requisites are supplied during the build, will make try to remake those?

This is an order of operations question.
Suppose I declare a list of requirements:
required := $(patsubst %.foo,%.bar,$(shell find * -name '*.foo'))
And a rule to make those requirements:
$(required):
	./foo.py $@
Finally, I invoke the work with:
make foo -j 10
Suppose further the job is taking days and days (up to a week on this slow desktop computer).
In order to speed things up, I'd like to generate a list of commands and do some of the work on a much faster laptop. I can't do all of the work on the laptop because, for whatever reason, it can't stay up for hours and hours without discharging and suspending (if I had to guess, probably due to thermal throttling):
make -n foo > outstanding_jobs
cat outstanding_jobs | sort -r | sponge outstanding_jobs
scp slow_box:outstanding_jobs fast_laptop:outstanding_jobs
ssh fast_laptop
head -n 200 outstanding_jobs | parallel -j 12
scp *.bar slow_box:.
The question is:
If I put *.bar in the directory where the original make job was run, will make still try to do that job on the slow box?
OR do I have to halt the job on the slow box and re-invoke make to "get credit" in the make recipe for the new work that I've synced over onto the slow box?
NOTE: substantially revised.
Before it starts building anything, make constructs a dependency graph to guide it, based on an analysis of the requested goal(s), the applicable build rules, and, to some extent, the files already present. It then walks the graph, starting from the goal nodes, determining which targets are out of date with respect to their prerequisites and updating them.
Although it does not necessarily evaluate the whole graph before running any recipes, once it decides that a given target needs to be updated, make is committed to updating it. In particular, once make decides that some direct or indirect prerequisite of T is out of date, it is committed to (re)building T, too, regardless of any subsequent action on T by another process.
So, ...
If I put *.bar in the directory where the original make job was run,
will make still try to do that job on the slow box?
Adding files to the build directory after make starts building things will not necessarily affect which targets the ongoing make run attempts to build, nor which recipes it uses to build them. The nearer a target is to a root of the dependency graph, the less likely it is that the approach described will affect whether make performs a rebuild, especially if you're running a parallel make.
It's possible that you would see some time savings, but you must also consider the possibility that you end up with an inconsistent build.
OR do I have to halt the job on the slow box and re-invoke make to "get credit" in the make recipe for the new work that I've synced over onto the slow box?
If the possibility of an inconsistent build can be discounted, then that is probably a viable option. A new make run will take the then-existing files into account. Depending on the defined rules and the applicable timestamps, it is still possible that some targets would be rebuilt that did not really need to be, but unless the makefile engages in unusual shenanigans, chances are good that at least most of the built files imported from the helper machine will be accepted and used without rebuilding.
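One low-risk way to verify, after syncing the laptop-built *.bar files into the build directory, is to ask make what a fresh run would still do (a sketch: foo is the goal from the question, and some/file.bar stands in for any one of the synced files):

# Dry run: list the recipes a new invocation would still execute
make -n foo

# Or query one file: -q/--question exits 0 if it is already up to date
make -q some/file.bar && echo "up to date" || echo "would be rebuilt"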

Make uses multiple cores even without -j argument?

I've noticed on my MacBook Pro (quad-core) that when I run make, it takes the same amount of time as make -j, and sure enough, Activity Monitor shows all four cores under high usage. Why is this? Is there some default setting that Apple has? I mean, it would make sense for -j to be the default, but from what I've seen on the web, make with no arguments should only run one job at a time.
This isn't necessarily a problem, but I'd like to understand the cause nonetheless.
The -j|--jobs flag specifies/limits the number of commands that can be run simultaneously, not the number of threads to allocate to a single command. Think of this option as concurrency rather than parallelism. Note that the commands make runs can themselves be multi-threaded, which is how all four cores can show high usage even when make runs one command at a time.
For example, I can specify --jobs=2 and have both an ES6 transpiler and a SASS preprocessor running in the background, in the same terminal window, watching for any file changes I may make.
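As a toy illustration (hypothetical targets, nothing platform-specific), the following makefile runs its two recipes back to back under plain make, but concurrently under make -j2; each recipe is still a single process unless its command spawns threads of its own. Recipe lines must start with a tab:

.PHONY: both task1 task2
both: task1 task2

task1:
	sleep 2; echo task1 done

task2:
	sleep 2; echo task2 done

Plain make both takes about four seconds; make -j2 both takes about two, because the two sleeps overlap.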

Get current job number in makefile

Is there a way to get the current job number for use in a makefile rule?
Let me give you a little context.
I am using a tool that runs on multiple files. Naturally, I use parallel jobs to speed things up. The real issue here is that this tool spawns multiple threads, and I would like each invocation to run on a single core, since my tests show it runs faster that way.
I need the job number to set the process affinity to a CPU core.
There is no way to get a "job number", because make doesn't track one. Basically, all the instances of make in a single build share a pool of identical tokens. When make wants to start a job, it obtains a token; when the job is finished, it puts the token back. If make tries to get a token and none is available, it sleeps until one becomes available. Since there is no distinguishing characteristic to the tokens, there is no way to have a "job number".
To learn more about how GNU make handles parallel builds, you can read http://make.mad-scientist.net/jobserver.html
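The linked article describes the mechanism in detail, but the gist can be sketched in shell (an illustration of the idea, not make's actual code):

#!/bin/sh
# Sketch of the jobserver for -j4: each make process may run one job
# "for free", and a shared pipe holds the other N-1 identical tokens.
mkfifo tokens
exec 3<>tokens            # open the token pipe read/write on fd 3
printf 'xxx' >&3          # deposit 3 tokens

run_job() {
    dd bs=1 count=1 <&3 >/dev/null 2>&1   # take a token; blocks if none left
    "$@"                                  # run the actual job
    printf 'x' >&3                        # return the token
}

Every token is the same single byte, so holding one tells a process that it may run, but not which "slot" it occupies.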
I'm not quite sure how this helps you anyway. Make doesn't know anything about threads, it only starts processes. If a single process consists of multiple threads, make will still think of it as a single job.
EDIT:
Assuming that you are in a single, non-recursive invocation of make, you can count the rules that have run like this:

COUNT :=
%.foo:
	$(eval COUNT += x)
	@echo "$@: Number of rules run is $(words $(COUNT))"

all: a.foo b.foo c.foo d.foo
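Assuming none of the .foo files exist yet, a serial run would print something like:

$ make all
a.foo: Number of rules run is 1
b.foo: Number of rules run is 2
c.foo: Number of rules run is 3
d.foo: Number of rules run is 4

Bear in mind this counts how many rules have started, not which job slot is free, so under -j it keeps increasing even as earlier jobs finish.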

makefile with multiple jobs -j and subdirectories

Imagine you have a makefile with subdirectories to be built using "make -C". Imagine you have 4 target directories: 3 out of 4 are done, 1 needs to run. You run the overall makefile with:
make -j 4
Is there a way to tell make to run the remaining target with make -C <dir> -j 4 instead of just 1? If two targets were missing, I would like make -C <dir> -j 2 for each one.
I'll expand on Beta's (correct) answer. All the individual make processes communicate with each other and guarantee that there will never be more than N jobs running across all the different make invocations, when you use -jN. At the same time, they always guarantee that (assuming there are at least N jobs that can possibly be run across all the make invocations), N jobs will always be running.
Suppose instead that you had 4 directories with "something to do", which somehow you could know a priori, and so instead of invoking one instance of make with -j4 and letting that make invoke the 4 submakes normally, you force each of the submakes to be invoked with -j1. Now suppose that the first directory had 10 targets out of date, the second had 5, the third had 20, and the fourth had 100 out of date targets. At first you have 4 jobs running in parallel. Then once the second directory's 5 targets are built, you only have 3 jobs running in parallel, then 2, then for the rest of the build of the fourth directory you'll have only one target being built at a time and no parallelism. That's much Less Good.
The way GNU make works, instead, all four instances of make are communicating. When the second directory is done, the jobs it was running are available to the other directories, etc. By the end of the build the fourth directory is building four jobs at a time in parallel. That's much More Good.
Maybe if you explained why you want to do this, it would be more helpful to us in constructing an answer.
Make handles that automatically. From the manual:
If you set [-j] to some numeric value 'N' and your operating system
supports it... the parent make and all the sub-makes will communicate
to ensure that there are only 'N' jobs running at the same time
between them all.
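Note that this only works when the submakes are invoked through the $(MAKE) variable (or the recipe line is marked with +), because that is how make knows to pass its jobserver along. A minimal sketch, with placeholder directory names:

SUBDIRS := dir1 dir2 dir3 dir4

.PHONY: all $(SUBDIRS)
all: $(SUBDIRS)

$(SUBDIRS):
	$(MAKE) -C $@    # $(MAKE), not a literal 'make', so the jobserver is shared

With this, a single top-level make -j4 keeps four jobs running across whichever directories still have work; there is no need to compute per-directory -j values.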

Do multiple runs make it parallel?

I have written a short Python script to process my big FASTQ files, ranging in size from 5 GB to 35 GB. I am running the script on a Linux server that has many cores. The script is not parallelized at all, and it takes about 10 minutes to finish for a single file on average.
If I run the same script on several files like
$ python my_script.py file1 &
$ python my_script.py file2 &
$ python my_script.py file3 &
using the & sign to run each process in the background,
do those scripts run in parallel, and will I save some time?
It seems not to me: using the top command to check processor usage, I see each one's usage drop as I add new runs. Shouldn't each be using somewhere close to 100%?
So if they are not running in parallel, is there a way to make the OS run them in parallel?
Thanks for any answers
Commands executed this way do indeed run in parallel. The reason they're not using 100% of your CPU time might be that they're I/O-bound rather than CPU-bound. The description of what the script does ("big FASTQ files, ranging in size from 5 GB to 35 GB") suggests that this might just be the case.
If you look at the process list given by ps, though, you should see three python processes there - unless one or more of them has terminated by the time you run ps.
Time spent waiting on I/O operations is accounted as a different kind of CPU usage, usually %wa. You are probably just looking at %us (user CPU time).
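A quick way to see this (using the question's file names): launch the runs in the background, then watch the CPU breakdown while they work:

python my_script.py file1 &
python my_script.py file2 &
python my_script.py file3 &
wait    # block until all three background jobs finish

# Meanwhile, in another terminal, sample the CPU breakdown once a second
# and compare the us (user) and wa (I/O wait) columns:
vmstat 1

If wa is high while us stays low, the processes are spending their time waiting on the disk rather than competing for CPU.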
