Get current job number in makefile - parallel-processing

Is there a way to get the current job number for use in a makefile rule?
Let me give you a little context.
I am using a tool which runs on multiple files. Naturally I use parallel jobs to speed things up. The real issue is that this tool spawns multiple threads, and I would like all of them to run on a single core, since my tests show that it runs faster that way.
I need the job number to set the process affinity to a CPU core.

There is no way to get a "job number" because make doesn't track a job number. Basically all the instances of make in a single build share a list of identical tokens. When make wants to start a job it obtains a token. When make is finished with a job it adds the token back to the list. If it tries to get a token and one is not available it will sleep until one becomes available. There's no distinguishing characteristic to the tokens so there's no way to have a "job number".
To learn more about how GNU make handles parallel builds, you can read http://make.mad-scientist.net/jobserver.html
I'm not quite sure how this helps you anyway. Make doesn't know anything about threads, it only starts processes. If a single process consists of multiple threads, make will still think of it as a single job.
EDIT:
Assuming that you are in a single, non-recursive invocation of make you can do it like this:
COUNT :=
%.foo :
	$(eval COUNT += x)
	@echo "$@: Number of rules run is $(words $(COUNT))"
all: a.foo b.foo c.foo d.foo
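Since make itself offers no job number, one workaround for the affinity use case is to let each recipe claim a numbered slot of its own. The following is only a sketch under stated assumptions: acquire_slot and release_slot are hypothetical helper names, the lock directory is a made-up path, and the approach relies on mkdir being atomic on the filesystem in use. It is not a feature of make.

```shell
#!/bin/sh
# Hypothetical helpers (not part of make): claim and release a numbered slot.
# Relies on mkdir failing if the directory already exists, which makes the
# claim atomic between concurrently running recipes.

# acquire_slot LOCK_DIR MAX_SLOTS
# Prints the first free slot number in 0..MAX_SLOTS-1, or returns 1 if none.
acquire_slot() {
    lock_dir=$1
    max_slots=$2
    mkdir -p "$lock_dir"
    i=0
    while [ "$i" -lt "$max_slots" ]; do
        if mkdir "$lock_dir/slot.$i" 2>/dev/null; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# release_slot LOCK_DIR SLOT
# Frees a slot previously printed by acquire_slot.
release_slot() {
    rmdir "$1/slot.$2"
}
```

From a shell, a job could then pin itself to the claimed core with something like the line below, where mytool stands in for the real tool and taskset is the Linux command for setting CPU affinity. In a makefile recipe each $ would have to be written as $$.

slot=$(acquire_slot /tmp/build-slots 4) && taskset -c "$slot" mytool input; release_slot /tmp/build-slots "$slot"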

Related

GNU Make - how to add timestamp output (with minimal makefile modification)

I want to get a better idea of my build job metrics but unfortunately, make doesn't output timestamps per se.
If I run make --print-data-base, for a given target it outputs a line
# Last modified 2016-08-15 13:53:16
but that doesn't give me the duration.
QUESTION
Is there a way to get duration of building a target without modifying each target? Some targets are inside makefiles which are generated DURING the build so not feasible to modify their recipes.
POSSIBLE SOLUTION
I could implement a pre- and post-recipe for every target and output a timestamp that way.
Is that a good idea given this is parallel make? Obviously there would be increased build time from calling a pre- and post-recipe for every target but I'd be fine with that.
If this is a parallel make, then the "preactions", "actions" and "postactions" may be interleaved. That is, you might get output like:
Pre-action 12:03:05
Pre-action 12:03:06
building foo...
building bar...
Post-action 12:04:17
Post-action 12:04:51
So it would behoove you to pass a TARGETNAME variable to the pre-action and post-action scripts.
Also, when you are running things in parallel, start and end times are not all there is to know about how long an action takes: rule A might take longer than rule B simply because rule B is running alone while rule A is sharing the processor with rules C through J.
Other than that, I see no problem with this approach.
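The pre-action and post-action idea, with the target name attached to each line, can be sketched as a small wrapper. This is only an illustration: timed is a hypothetical function name, the output format is made up, and date +%s (seconds since the epoch) is assumed to be available, as it is with GNU and BSD date.

```shell
#!/bin/sh
# Hypothetical wrapper: timed TARGETNAME COMMAND [ARGS...]
# Prints a start line and an end line tagged with the target name, so that
# interleaved output from parallel jobs can still be matched up.
timed() {
    target=$1
    shift
    start=$(date +%s)
    echo "[$target] start $(date '+%H:%M:%S')"
    status=0
    "$@" || status=$?
    end=$(date +%s)
    echo "[$target] end $(date '+%H:%M:%S') ($((end - start))s, exit $status)"
    return "$status"
}
```

Because every line carries the target name, the two lines for foo can be paired even when lines for bar appear between them. The measured duration is still wall-clock time, so it remains subject to the caveat about jobs sharing the processor.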

Can GNU makefiles rules have processes as requirements, if so how?

At some step of my software build automation, which I am implementing with GNU make Makefiles, I have a target whose requirements are not only source files. As a different kind of requirement, I would like the target to depend on another program having been started, and hence existing as an operating system process.
Such a program could be a background process, but also a foreground process such as a web browser running an HTML5 application, which might play a role in the build by, for instance, interacting with files it is fed during the build.
I would hence like to write a rule somewhat like this:
.PHONY: firefoxprocess
Html5DataResultFile: HTML5DataSourceFile firefoxprocess
	cp HTML5DataSourceFile folder/checked/by/html5app/
	waitforHtml5DataResultFile
firefoxprocess:
	/usr/bin/firefox file://url/to/html5app &
As you can see, my idea is that .PHONY targets are non-file targets and hence might allow requiring that a process be started.
Yet I am unsure whether that is right. The documentation of GNU make is excellent and quite large, and I am not sure I understood it completely. To the best of my knowledge it does not cover using processes as prerequisites in rules, which motivates the question here.
My feeling has been that pidfiles are something of a link between processes and files, but they come with several problems (e.g. race conditions, uniqueness, etc.).
Sometimes a Makefile dependency tree includes elements that aren't naturally or necessarily time-dependent files. There are two answers:
create a file to represent the step, or
just do the work "in line" as part of the step.
The second option is usually easiest. For instance, if a target file is to be created in a directory that might not exist yet, you don't want to make the directory name itself a dependency, because that would cause the file to be out of date whenever the directory changed. Instead, I do:
d/foo:
	@test -d d || mkdir -p d
	...
In your case, you could do something similar; you just need a way to test for a running instance of firefox, and to be able to start it. Something like this might do:
Html5DataResultFile: HTML5DataSourceFile
	pgrep firefox || { /usr/bin/firefox & sleep 5; }
	cp HTML5DataSourceFile folder/checked/by/html5app/
	waitforHtml5DataResultFile
The sleep call just lets FF initialize, because it might not be ready to do anything the instant it returns.
The problem with option #1 in your case is that it's undependable and a little circular. Firefox won't reliably remove the pidfile if the process dies. If it does successfully remove the file when it exits, and re-creates it when it restarts, you have a new problem: the timestamp on the file spuriously defines any dependencies as out of date, when in fact the restarted process hasn't invalidated them.

makefile with multiple jobs -j and subdirectories

Imagine you have a makefile with subdirectories to be built using "make -C". Imagine you have 4 target directories: 3 out of 4 are done and 1 still needs to run. You run the overall makefile with:
make -j 4
Is there a way to tell the makefile system to run the remaining target with make -C -j 4 instead of just 1? If two targets were missing, I would like make -C -j 2 for each one.
I'll expand on Beta's (correct) answer. All the individual make processes communicate with each other and guarantee that there will never be more than N jobs running across all the different make invocations, when you use -jN. At the same time, they always guarantee that (assuming there are at least N jobs that can possibly be run across all the make invocations), N jobs will always be running.
Suppose instead that you had 4 directories with "something to do", which somehow you could know a priori, and so instead of invoking one instance of make with -j4 and letting that make invoke the 4 submakes normally, you force each of the submakes to be invoked with -j1. Now suppose that the first directory had 10 targets out of date, the second had 5, the third had 20, and the fourth had 100 out of date targets. At first you have 4 jobs running in parallel. Then once the second directory's 5 targets are built, you only have 3 jobs running in parallel, then 2, then for the rest of the build of the fourth directory you'll have only one target being built at a time and no parallelism. That's much Less Good.
The way GNU make works, instead, all four instances of make are communicating. When the second directory is done, the jobs it was running are available to the other directories, etc. By the end of the build the fourth directory is building four jobs at a time in parallel. That's much More Good.
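The arithmetic behind that comparison can be worked through with a small, idealized sketch. It assumes every target takes one unit of time and that all targets are independent, which real builds rarely satisfy; static_makespan and shared_makespan are hypothetical names used only for this illustration.

```shell
#!/bin/sh
# Idealized model: each target costs one time unit, no dependencies.

# static_makespan COUNT...
# Each directory is pinned to one job (-j1), so the build lasts as long as
# the largest directory.
static_makespan() {
    max=0
    for n in "$@"; do
        if [ "$n" -gt "$max" ]; then
            max=$n
        fi
    done
    echo "$max"
}

# shared_makespan NJOBS COUNT...
# All directories draw from one pool of NJOBS jobs, so the build lasts
# ceil(total / NJOBS) units.
shared_makespan() {
    njobs=$1
    shift
    total=0
    for n in "$@"; do
        total=$((total + n))
    done
    echo $(( (total + njobs - 1) / njobs ))
}

static_makespan 10 5 20 100      # prints 100
shared_makespan 4 10 5 20 100    # prints 34
```

Under these assumptions, forcing -j1 per directory takes 100 time units for the example above, while a shared pool of 4 jobs finishes the same 135 targets in 34.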
Maybe if you explained why you want to do this, it would be more helpful to us in constructing an answer.
Make handles that automatically. From the manual:
If you set [-j] to some numeric value ‘N’ and your operating system
supports it... the parent make and all the sub-makes will communicate
to ensure that there are only ‘N’ jobs running at the same time
between them all.

How can I tell what -j option was provided to make

In Racket's build system, we have a build step that invokes a program that can run several parallel tasks at once. Since this is invoked from make, it would be nice to respect the -j option that make was originally invoked with.
However, as far as I can tell, there's no way to get the value of the -j option from inside the Makefile, or even as an environment variable in the programs that make invokes.
Is there a way to get this value, or the command line that make was invoked with, or something similar that would have the relevant information? It would be ok to have this only work in GNU make.
In make 4.2.1 finally they got MAKEFLAGS right. That is, you can have in your Makefile a target
opts:
	@echo $(MAKEFLAGS)
and making it will correctly report the value of the -j parameter.
$ make -j10 opts
-j10 --jobserver-auth=3,4
(In make 4.1 it is still broken.) Needless to say, instead of echo you can invoke a script that parses MAKEFLAGS properly.
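A minimal sketch of such a parsing step is shown here. It assumes MAKEFLAGS has the shape shown in the output above, with the job count appearing as a separate -jN word; jobs_from_makeflags is a hypothetical name, and other make versions may format the flags differently.

```shell
#!/bin/sh
# Hypothetical helper: jobs_from_makeflags "FLAGS STRING"
# Prints N for the first word of the form -jN and returns 0.
# Returns 1 if no such word is present (for example a bare -j).
jobs_from_makeflags() {
    # Intentionally unquoted so the string is split into words.
    for flag in $1; do
        case $flag in
            -j[0-9]*)
                echo "${flag#-j}"
                return 0
                ;;
        esac
    done
    return 1
}

jobs_from_makeflags "-j10 --jobserver-auth=3,4"   # prints 10
```

In a recipe the function would be given the value of MAKEFLAGS, and the caller would need to pick its own default when the function returns 1.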
Note: this answer concerns make version 3.82 and earlier. For a better answer as of version 4.2, see the answer by Dima Pasechnik.
You cannot tell what -j option was provided to make. Information about the number of jobs is not accessible in the regular way from make or its sub-processes, according to the following quote:
The top make and all its sub-make processes use a pipe to communicate with
each other to ensure that no more than N jobs are started across all makes.
(taken from the file called NEWS in the make 3.82 source code tree)
The top make process acts as a job server, handing out tokens to the sub-make processes via the pipe. It seems to be your goal to do your own parallel processing and still honor the indicated maximum number of simultaneous jobs as provided to make. In order to achieve that, you would somehow have to insert yourself into the communication via that pipe. However, this is an unnamed pipe and as far as I can see, there is no way for your own process to join the job-server mechanism.
By the way, the "preprocessed version of the flags" that you mention contains the expression --jobserver-fds=3,4, which is used to communicate information about the endpoints of the pipe between the make processes. This exposes a little bit of what is going on under the hood...

Trouble with parallel make not always starting another job when one finishes

I'm working on a system with four logical CPUs (two dual-core CPUs, if it matters). I'm using make to parallelize twelve trivially parallelizable tasks and running it from cron.
The invocation looks like:
make -k -j 4 -l 3.99 -C [dir] [12 targets]
The trouble I'm running into is that sometimes one job will finish but the next one won't start up, even though it shouldn't be stopped by the load average limiter. Each target takes about four hours to complete and I'm wondering if this might be part of the problem.
Edit: Sometimes a target does fail but I use the -k option to have the rest of the make still run. I haven't noticed any correlation with jobs failing and the next job not starting.
I'd drop the '-l'
If all you plan to run on the system is this build, I think the -j 4 does what you want.
Based on my memory, if you have anything else running (crond?), that can push the load average over 4.
GNU make ref
Does make think one of the targets is failing? If so, it will stop the make after the running jobs finish. You can use -k to tell it to continue even if an error occurs.
@BCS
I'm 99.9% sure that the -l isn't causing the problem, because I can watch the load average on the machine and it drops down to about three and sometimes as low as one (!) without starting the next job.
