multiple targets from one recipe and parallel execution - parallel-processing

I have a project which includes a code generator which generates several .c and .h files from one input file with just one invocation of the code generator. I have a rule which has the .c and .h files as multiple targets, the input file as the prerequisite, and the recipe is the invocation of the code generator. I then have further rules to compile and link the generated .c files.
This works fine with a -j factor of 1, but if I increase the -j factor, I get multiple invocations of the code generator, up to the -j factor or the number of expected target files, whichever is smaller. This is bad because concurrent invocations of the code generator can cause failures when the generated code is written multiple times.
I'm not going to post my actual (large) code here, but I have been able to construct a small example which appears to demonstrate the same behavior.
The Makefile looks like this:
output.concat: output5 output4 output3 output2 output1
	cat $^ > $@
output1 output2 output3 output4 output5: input
	./frob input
clean:
	rm -rf output*
Instead of a code generator, for this example I have written a simple shell script, frob, which generates multiple output files from one input file:
#!/bin/bash
for i in {1..5}; do
{
echo "This is output${i}, generated from ${1}. input was:"
cat ${1}
} > output${i}
done
When I run this Makefile with non-unity -j factors, I get the following output:
$ make -j2
./frob input
./frob input
cat output5 output4 output3 output2 output1 > output.concat
$
We see ./frob here gets invoked twice, which is bad. Is there some way I can construct this rule such that the recipe only gets invoked once, even with a non-unity -j factor?
I have considered changing the rule so that just one of the expected output files is the target, then adding another rule with no recipe such that its targets are the remaining expected output files, and the prerequisite is the first expected output file. But I'm not sure this would work, because I don't know if I can guarantee the order in which the files are generated, and thus may end up with circular dependencies.

This is how make is defined to work. A rule like this:
foo bar baz : boz ; $(BUILDIT)
is exactly equivalent, to make, to writing these three rules:
foo : boz ; $(BUILDIT)
bar : boz ; $(BUILDIT)
baz : boz ; $(BUILDIT)
There is no way (in GNU make) to define an explicit rule with the characteristics you want; that is that one invocation of the recipe will build all three targets.
However, if your output files and your input file share a common base, you CAN write a pattern rule like this:
%.foo %.bar %.baz : %.boz ; $(BUILDIT)
Strangely, for implicit rules with multiple targets GNU make assumes that a single invocation of the recipe WILL build all the targets, and it will behave exactly as you want.
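As a rough sketch of the pattern-rule approach applied to a code generator: the names codegen, parser.def, parser.c and parser.h below are hypothetical, and the sketch assumes the generator writes both outputs in one run and that all outputs share the stem of the input.

```makefile
# Hypothetical generator: ./codegen parser.def writes parser.c and parser.h
# in a single invocation.
all: parser.c parser.h

# One pattern rule with several target patterns: GNU make assumes a single
# run of the recipe produces every target that matches the same stem.
%.c %.h: %.def
	./codegen $<
```

With this layout, make -j2 all should run ./codegen once, because both targets are matched by the same pattern rule with the same stem.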

To correctly generate and update multiple targets a b c under parallel make -j from input files i1 i2:
all: a b c
.INTERMEDIATE: d
a: d
b: d
c: d
d: i1 i2
	cat i1 i2 > a
	cat i1 i2 > b
	cat i1 i2 > c
If any of a,b,c are missing, the pseudo-target d is remade. The file d is never created; the single rule for d avoids several parallel invocations of the recipe.
.INTERMEDIATE ensures that the missing file d does not, by itself, trigger the d recipe when a, b and c are up to date.
Some other ways to handle multiple targets are described in John Graham-Cumming's "The GNU Make Book", pp. 92-96.

@MadScientist's answer is promising - I think I could possibly use that. In the meantime, I have been playing with this some more and come up with a different possible solution, as hinted at in the question. I can split the rule in two as follows:
INPUT_FILE = input
OUTPUT_FILES = output5 output4 output3 output2 output1
OUTPUT_FILE1 = $(firstword $(OUTPUT_FILES))
OUTPUT_FILES_REST = $(wordlist 2,$(words $(OUTPUT_FILES)),$(OUTPUT_FILES))
$(OUTPUT_FILE1): $(INPUT_FILE)
	./frob $<
	touch $(OUTPUT_FILES_REST)
$(OUTPUT_FILES_REST): $(OUTPUT_FILE1)
Giving only one output file as a target fixes the possible parallelism problem. We then make this one output file a prerequisite of the rest of the output files. Importantly, in the frob recipe we touch all the output files except the first, so we are guaranteed that the first will have an older timestamp than all the rest.

As of make 4.3 (Jan 2020) make allows grouped targets. As per docs the following will update all targets only once if any of the targets is missing or outdated:
foo bar biz &: baz boz
	echo $^ > foo
	echo $^ > bar
	echo $^ > biz
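Applied to the example from the question, a grouped-target version might look like the sketch below. The .FEATURES guard is an addition of mine, not part of the answer above; it only makes older versions of GNU make fail with a clear message instead of misreading the "&:" separator.

```makefile
# Grouped targets need GNU make 4.3 or newer; stop early otherwise.
ifeq ($(filter grouped-target,$(.FEATURES)),)
$(error grouped targets need GNU make 4.3 or newer)
endif

output.concat: output5 output4 output3 output2 output1
	cat $^ > $@

# "&:" declares that one run of the recipe updates all five targets.
output1 output2 output3 output4 output5 &: input
	./frob input

clean:
	rm -rf output*
```

Under this rule, make -j2 is expected to invoke ./frob only once before running the cat recipe.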

The answer by Ivan Zaentsev almost worked for me, with the exception of the following issue. Only when running parallel make (-j2 or above), when a prerequisite of the generated file was changed, the generated file was regenerated successfully, but the subsequent targets that depend on the generated file were not rebuilt.
The workaround I found was to provide a recipe for the generated files (the trivial copy command), besides the dependency on the intermediate target (d):
d: i1 i2
	cat i1 i2 > a.gen
	cat i1 i2 > b.gen
	cat i1 i2 > c.gen
.INTERMEDIATE: d
a.gen : d
b.gen : d
c.gen : d
a: a.gen d
	cp $< $@
b: b.gen d
	cp $< $@
c: c.gen d
	cp $< $@
e: a b c
	some_command $@ $^
The clue was this debug output from make when running without the workaround (where 'e' was not rebuilt with make -j2, despite a,b,c being rebuilt):
Finished prerequisites of target file `a'.
Prerequisite `d' of target `a' does not exist.
No recipe for `a' and no prerequisites actually changed.
No need to remake target `a'.

Here is the solution that seemed to work for me (credit to @Ivan Zaentsev for the main solution and to @alexei for pointing out the problem with it). It is similar to the original approach with one major change: instead of generating temporary files (*.gen as suggested), it just touches the files that depend on the INTERMEDIATE file:
default: f
.INTERMEDIATE: temp
a b c: temp
	touch $@
temp: i1 i2
	echo "BUILD: a b c"
	cat i1 i2 > a
	cat i1 i2 > b
	cat i1 i2 > c
e: a b c
	echo "BUILD: e"
	touch $@
f: e
	echo "BUILD: f"
	touch $@

Related

How to write makefile so that it ignores "irrelevant" changes?

Say I have a Makefile like this
B: A
	quick-custom-script < $< > $@
C: B
	slow-custom-script < $< > $@
Also assume that it may well happen that changes in A would produce the same B. I would like to achieve that, in such a case, the expensive remaking of C is skipped, because it is certainly unnecessary work when its input is unchanged.
My idea was to put the output of quick-custom-script to a temporary file, diff that against the current B and overwrite B only if differences are found.
In this case, the C rule would still see the old B and do nothing. Unfortunately, this produces another problem that I see (and perhaps more?): on any subsequent run, even without any changes made, A will be newer than the non-overwritten B and hence (even if it is quick) the first script will run unnecessarily.
I think this can somewhat be minimized as follows
Btemp: A
	quick-custom-script < $< > $@
B: Btemp
	diff -q $< $@ || cp $< $@
C: B
	slow-custom-script < $< > $@
Nevertheless I wonder if there is any smarter way to achieve my goal?
This is pretty close to the usual convention. It puts the comparison into the same recipe, like this:
B: A
	quick-custom-script < $< > $@T
	$(move-if-change)
With this definition:
move-if-change = @if cmp -s $@ $@T ; then rm $@T ; else mv $@T $@; fi
This combination into one rule has the advantage that if quick-custom-script terminates abnormally and the makefile is run again, the recipe will start from scratch, and the partially written output file is discarded.
Using cmp is usually quicker than diff, and mv is atomic, so it avoids reintroducing the same potential corruption.

code management: generate source files with slight variations of various rules

I have a source file in a declarative language (twolc, actually) that I need to write many variations on: a normative version and many non-normative versions, each with one or more variations from the norm. For example, say the normative file has three rules:
Rule A:
Do something A-ish
Rule B:
Do something B-ish
Rule C:
Do something C-ish
Then one variation might have the exact same rules as the norm for A and C, but a different rule for B, which I will call B-1:
Rule A:
Do something A-ish
Rule B-1:
Do something B-ish, but with a flourish
Rule C:
Do something C-ish
Imagine that you have many different subtle variations on many different rules, and you have my situation. The problem I am worried about is code maintainability. If, later on, I decide that Rule A needs to be refactored somehow, then I will have 50+ files that need to have the exact same rule edited by hand.
My idea is to have separate files for each rule and concatenate them into variations using cat: cat A.twolc B.twolc C.twolc > norm.twolc, cat A.twolc B-1.twolc C.twolc > not-norm.twolc, etc.
Are there any tools designed to manage this kind of problem? Is there a better approach than the one I have in mind? Does my proposed solution have weaknesses I should watch out for?
As you added the makefile tag, here is a GNU-make-based (and GNU make only) solution:
# Edit this
RULES := A B B-1 C
VARIATIONS := norm not-norm
norm-rules := A B C
not-norm-rules := A B-1 C
# Do not edit below this line
VARIATIONSTWOLC := $(patsubst %,%.twolc,$(VARIATIONS))
all: $(VARIATIONSTWOLC)
define GEN_rules
$(1).twolc: $$(patsubst %,%.twolc,$$($(1)-rules))
	cat $$^ > $$@
endef
$(foreach v,$(VARIATIONS),$(eval $(call GEN_rules,$(v))))
clean:
	rm -f $(VARIATIONSTWOLC)
patsubst is straightforward. The foreach-eval-call is a bit more tricky. Long story short: it loops over all variations (foreach). For each variation v, it expands (call) GEN_rules by replacing $(1) by $(v) (the current variation) and $$ by $. Each expansion result is then instantiated (eval) as a normal make rule. Example: for v=norm, the GEN_rules expansion produces:
norm.twolc: $(patsubst %,%.twolc,$(norm-rules))
	cat $^ > $@
which is in turn expanded as (step-by-step):
step1:
norm.twolc: $(patsubst %,%.twolc,A B C)
	cat $^ > $@
step2:
norm.twolc: A.twolc B.twolc C.twolc
	cat $^ > $@
step3:
norm.twolc: A.twolc B.twolc C.twolc
	cat A.twolc B.twolc C.twolc > norm.twolc
which does what you want: if norm.twolc does not exist or if any of A.twolc, B.twolc, C.twolc is more recent than norm.twolc, the recipe is executed.

Switching directories in makefile

I have written my project in some directory d. I have a huge set of test cases, so I would like to put them in an internal directory din of d. I would also like to store the output files in dout. How can I achieve this?
For now, part of my makefile looks like this:
test:
./a.out < input1.c > output1.txt
./a.out < input2.c > output2.txt
.
.
.
.
./a.out < inputn.c > outputn.txt
(n is known)
make test now executes the program.
Create a subdirectory din. Put your Makefile to run the tests in there.
In the main directory's Makefile, enter the rule
test:
	$(MAKE) -C din test
I would also rewrite your existing Makefile.
First of all, you don't want to repeat the same thing for every file. You can use a pattern rule here. So the rule becomes
output%.txt: input%.c
	./a.out < $< > $@
Now how to make the test target depend on every output file?
It would be great if there was a construct to list every file that can be made from an existing file using the above rule, but there isn't. However, we can take the list of existing files ending in .c and convert it to the corresponding output files with GNU Make's wildcard and patsubst functions, although it is ugly:
test: $(patsubst input%,output%,$(patsubst %.c,%.txt,$(wildcard *.c)))
Now, you have only two rules and one command.
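For the din/dout layout asked about in the question, one possible arrangement keeps a single Makefile in d and puts the directory names into the pattern rule. This is only a sketch under assumptions: the inputs are named din/inputN.c, the outputs go to dout/outputN.txt, and a.out already exists in d or is built by some other rule.

```makefile
# Assumed layout: din/input1.c ... din/inputN.c, results in dout/.
INPUTS  := $(wildcard din/input*.c)
OUTPUTS := $(patsubst din/input%.c,dout/output%.txt,$(INPUTS))

.PHONY: test
test: $(OUTPUTS)

# Each output depends on its input and on the program under test.
# "| dout" is an order-only prerequisite: the directory must exist,
# but its timestamp does not force the tests to rerun.
dout/output%.txt: din/input%.c a.out | dout
	./a.out < $< > $@

dout:
	mkdir -p $@
```

Because each output file is its own target, make -j can run the test cases in parallel, and only tests whose input or a.out changed are rerun.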

Makefile, how to have prerequisites given by function with target as argument

I have a list of files contained in another file. I want to use this list as prerequisite for some target and for doing so I use a function that reads the list from file.
The problem is that I have different lists for different targets so I need to pass the target as argument to the function that reads the list. Something like that (that does not work):
getlist = $(shell cat $1)
tmp%: $(call getlist, %)
	@cat file1 > $@
	@cat file2 >> $@
file%:
	@touch $@
	@echo "$@" > $@
clean:
	@rm file1 file2 tmp
where the list for building the tmp1 file is in the 1 file, the one for building the tmp2 file is in the 2 file and so on and so forth.
If I instead write tmp1: $(call getlist, 1), everything works, but I need something capable of handling different file names.
If needed for the solution I can also change the way I named the files.
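One direction that may be worth trying is GNU make's secondary expansion, which is not mentioned in the question. The sketch below assumes GNU make and assumes that the file named by the stem (for example the file 1 for tmp1) holds a whitespace-separated list of prerequisite names. The recipe using $^ is my simplification of the original two cat commands.

```makefile
# Enable a second expansion of prerequisite lists, performed when the
# rule is actually matched, so the stem is known at that point.
.SECONDEXPANSION:

getlist = $(shell cat $1)

# $$* survives the first expansion as $* and becomes the stem
# (e.g. "1" for tmp1) during the second expansion.
tmp%: $$(call getlist,$$*)
	@cat $^ > $@

file%:
	@echo "$@" > $@
```

With a file named 1 containing "file1 file2", make tmp1 would be expected to build file1 and file2 first and then concatenate them into tmp1.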

Makefile with variable number of targets

I am attempting to do a data pipeline with a Makefile. I have a big file that I want to split in smaller pieces to process in parallel. The number of subsets and the size of each subset is not known beforehand. For example, this is my file
$ for i in {1..100}; do echo $i >> a.txt; done
The first step in the Makefile should compute the ranges; let's make them fixed for now:
ranges.txt: a.txt
	for i in 0 25 50 75; do echo $$(($$i+1))'\t'$$(($$i+25)) >> $@; done
Next step should read from ranges.txt, and create a target file for each range in ranges.txt, a_1.txt, a_2.txt, a_3.txt, a_4.txt. Where a_1.txt contains lines 1 through 25, a_2.txt lines 26-50, and so on... Can this be done?
You don't say what version of make you're using, but I'll assume GNU make. There are a few ways of doing things like this; I wrote a set of blog posts about metaprogramming in GNU make (by which I mean having make generate its own rules automatically).
If it were me I'd probably use the constructed include files method for this. So, I would have your rule above for ranges.txt instead create a makefile, perhaps ranges.mk. The makefile would contain a set of targets such as a_1.txt, a_2.txt, etc. and would define target-specific variables defining the start and stop values. Then you can -include the generated ranges.mk and make will rebuild it. One thing you haven't described is when you want to recompute the ranges: does this really depend on the contents of a.txt?
Anyway, something like:
.PHONY: all
all:
ranges.mk: a.txt # really? why?
	for i in 0 25 50 75; do \
	    echo "a_$$i.txt : RANGE_START := $$(($$i+1))"; \
	    echo "a_$$i.txt : RANGE_END := $$(($$i+25))"; \
	    echo "TARGETS += a_$$i.txt"; \
	done > $@
-include ranges.mk
all: $(TARGETS)
$(TARGETS) : a.txt # seems more likely
	process --out $@ --in $< --start $(RANGE_START) --end $(RANGE_END)
(or whatever command; you don't give any example).
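To make the idea more concrete, here is roughly what the generated ranges.mk would contain for the first two ranges, assuming the shell expands $i and the arithmetic in the echo arguments (which requires them to be double-quoted rather than single-quoted). The entries for a_50.txt and a_75.txt follow the same shape.

```makefile
# Target-specific variables: each a_N.txt carries its own range.
a_0.txt : RANGE_START := 1
a_0.txt : RANGE_END := 25
TARGETS += a_0.txt
a_25.txt : RANGE_START := 26
a_25.txt : RANGE_END := 50
TARGETS += a_25.txt
```

When make runs the recipe for a_25.txt, $(RANGE_START) and $(RANGE_END) expand to 26 and 50 for that target only, which is how a single $(TARGETS) rule can serve every range.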

Resources