I am attempting to build a data pipeline with a Makefile. I have a big file that I want to split into smaller pieces to process in parallel. The number of subsets and the size of each subset are not known beforehand. For example, this is my file
$ for i in {1..100}; do echo $i >> a.txt; done
The first step in the Makefile should compute the ranges; let's make them fixed for now
ranges.txt: a.txt
	for i in 0 25 50 75; do echo $$(($$i+1))'\t'$$(($$i+25)) >> $@; done
The next step should read ranges.txt and create a target file for each range in it: a_1.txt, a_2.txt, a_3.txt, a_4.txt, where a_1.txt contains lines 1 through 25, a_2.txt lines 26-50, and so on. Can this be done?
You don't say what version of make you're using, but I'll assume GNU make. There are a few ways of doing things like this; I wrote a set of blog posts about metaprogramming in GNU make (by which I mean having make generate its own rules automatically).
If it were me I'd probably use the constructed include files method for this. So, I would have your rule above for ranges.txt instead create a makefile, perhaps ranges.mk. The makefile would contain a set of targets such as a_1.txt, a_2.txt, etc. and would define target-specific variables defining the start and stop values. Then you can -include the generated ranges.mk and make will rebuild it. One thing you haven't described is when you want to recompute the ranges: does this really depend on the contents of a.txt?
Anyway, something like:
.PHONY: all
all:
ranges.mk: a.txt # really? why?
	for i in 0 25 50 75; do \
		echo "a_$$i.txt : RANGE_START := $$(($$i+1))"; \
		echo "a_$$i.txt : RANGE_END := $$(($$i+25))"; \
		echo "TARGETS += a_$$i.txt"; \
	done > $@
-include ranges.mk
all: $(TARGETS)
$(TARGETS) : a.txt # seems more likely
	process --out $@ --in $< --start $(RANGE_START) --end $(RANGE_END)
(or whatever command; you don't give any example).
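Since the question does not say what the processing command is, here is a minimal shell sketch of one possible stand-in for `process`, assuming the goal is only to copy lines START through END of the input into the output file. The function name `extract_range` and its argument order are hypothetical and not part of the original answer.

```shell
# Hypothetical stand-in for the unspecified "process" command.
# Usage: extract_range INPUT START END OUTPUT
# Copies lines START..END (1-based, inclusive) of INPUT into OUTPUT.
extract_range() {
    sed -n "${2},${3}p" "$1" > "$4"
}
```

In the recipe above this could be called as something like `extract_range $< $(RANGE_START) $(RANGE_END) $@`, provided the function is made available to the recipe shell (for example as a small script on the PATH).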
Related
I have two lists of files as prerequisites
input_i.xx
config_j.yy
and I need to run all of their combinations. A single one looks like this:
input1_config3.output: input1.xx config3.yy
run_script $^
Also in reality, their names are not numbered, but I already have their stems defined in INPUTS and CONFIGS. With that, I can generate all the targets together
TARGETS:=$(foreach input,$(INPUTS),$(foreach config,$(CONFIGS),$(input)_$(config).output))
But I have difficulty with the prerequisites. It seems I need to
get basename
split on _
add the extensions .xx and .yy
.SECONDEXPANSION
$(TARGETS): $(basename $@)
run_script $^
Can someone show me how to do that? I'm not sure if this is the proper way; maybe a bottom-up way is easier?
make is not really suitable for keeping track of an M x N matrix of results. The fundamental problem is that you can't have two stems in a rule, so you can't say something like
# BROKEN
input%{X}_config%{Y}.output: input%{X}.xx config%{Y}.yy
As a rough approximation, you could use a recursive make rule to set a couple of parameters, and take it from there, but this is rather clumsy.
.PHONY: all
all:
$(MAKE) -$(MAKEFLAGS) X=1 Y=6 input1_config6.output
$(MAKE) -$(MAKEFLAGS) X=1 Y=7 input1_config7.output
$(MAKE) -$(MAKEFLAGS) X=2 Y=6 input2_config6.output
:
input$X_config$Y.output: input$X.xx config$Y.yy
run_script $^
It would be a lot easier if you provided a complete sample example with a complete set of targets and prerequisites and exactly what you wanted to happen.
Using .SECONDEXPANSION might work, but you're not using it correctly; please re-read the documentation. The critical aspect of .SECONDEXPANSION is that you have to escape the variables that you want to avoid expanding until the second pass. In your example you've not escaped anything, so .SECONDEXPANSION isn't actually doing anything at all here. However, as @tripleee points out, it's not easy to use multiple variable values in a single target.
To do this more easily you'll probably want to use eval. Something like this:
define DECLARE
$1_$2.output: $1.xx $2.yy
TARGETS += $1_$2.output
endef
TARGETS :=
$(foreach input,$(INPUTS),$(foreach config,$(CONFIGS),$(eval $(call DECLARE,$(input),$(config)))))
$(TARGETS):
run_script $^
I have another solution using include and bash for loop.
include trees.mk
trees.mk:
	@for input in $(INPUTS); do \
		for config in $(CONFIGS); do \
			echo $${input}_$$config.output : $${input}.xx $$config.yy; \
			printf '\trun_script $$^\n'; \
		done; \
	done > $@
At the beginning, trees.mk doesn't exist. The nested for loops write the rules out to the target using the file redirection > $@.
I got this idea from Managing Projects with GNU Make, Third Edition, by Robert Mecklenburg, page 56.
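To make it easier to see what such a generated include file contains, here is a small shell sketch of the same nested loop outside of make. The function name `emit_rules` is hypothetical; it takes the two lists as space-separated strings and uses `printf` so that the recipe line starts with a real tab regardless of which shell runs it.

```shell
# Hypothetical helper: print one make rule per (input, config) pair.
# Usage: emit_rules "INPUT_STEMS" "CONFIG_STEMS"  (space-separated lists)
emit_rules() {
    for input in $1; do
        for config in $2; do
            # Rule header: target and its two prerequisites.
            printf '%s_%s.output: %s.xx %s.yy\n' "$input" "$config" "$input" "$config"
            # Recipe line; it must begin with a tab. $^ is left for make to expand.
            printf '\trun_script $^\n'
        done
    done
}
```

Running `emit_rules "input1 input2" "config3" > trees.mk` by hand is one way to inspect the rules before letting make generate the file itself.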
This is on GNU Make 3.82, RHEL 7. Make appears to be running sequentially even though I passed in --jobs.
I'm doing about 700K trivial jobs - concatenating large gzip files onto other gzip files. If there is only one file to concatenate, then I create a symbolic link instead. Here is the command:
# Pattern to rebuild gzip file - concatenate if needed, otherwise just link
$(THISDIR)/%.tgz:
	mkdir -p $$(dirname $@) && \
	if [ $$(echo '$^' | wc -w) -gt 1 ]; then cat $^ > $@; else ln -s $^ $@; fi
I already joined the commands with && to avoid another shell invocation, but it made no difference.
About 600K of the 700K jobs are just creating symbolic links. For the remainder, the average number of files to concatenate is four.
Why is this so slow? I'm getting 5-8 TPS. More importantly, even though I specified the following (on a machine with 64 CPUs):
make --jobs --max-load=48
I see very few processes in top. So it appears that Make is not running parallel jobs at all. Is there a minimal job length for parallelism to work efficiently in GNU Make?
The load average from top right now is
top - 22:50:32 up 3 days, 13:13, 32 users, load average: 7.96, 7.44, 5.73
A few further details that might be helpful:
Make itself is running at close to 100% CPU.
There is no dependency between any of the files other than, of course, target and dependencies on the same rule. In other words, there are no files that appear both in $@ and $^.
Files are being created and read from NFS mounts
I've generated the 700K dependencies as rules that get read into the Makefile with an include. That process itself takes 25 minutes or so.
It is possible to improve performance, especially when a large number of files are being rebuilt, by using (GNU) make functions to replace shell commands. This reduces the number of fork and exec calls required to complete the tasks:
%.tgz:
	mkdir -p $(@D) && \
	$(if $(findstring $(words $^),1),ln -s $^ $@,cat $^ > $@)
For the mkdir command, using $(@D) (the directory part of the target) eliminates the call to dirname.
For the cat/ln command, $(findstring ...) and $(words ...) replace the echo ... | wc pipe, and $(if ...) replaces the shell if statement.
Overall, only 2 commands (mkdir, cat/ln) run per target instead of 5 (mkdir, dirname, echo, wc, cat/ln). Performance improves by about 2x.
Make was spending a large portion of the prep time trying to match each of the targets to all the built-in rules for things like C files. Adding
.SUFFIXES:
MAKEFLAGS += --no-builtin-rules
made a huge difference. It still spends a few minutes after reading all the patterns in, but the benefits now outweigh that cost.
I am checking for the existence of a flag that the user passes to my GNUmakefile.
Basically, I am checking whether the user has passed -j to my makefile. I have added the if condition below. But before that, I tried displaying MAKEFLAGS, and the output is empty for that variable.
ifneq (,$(findstring j,$(MAKEFLAGS)))
....
Am I missing anything here?
Sometimes users may also pass --jobs instead of -j, and I also need to check whether the value passed to -j/--jobs is greater than 2.
Is there an easy way to do this in GNU make in a single if condition?
The answer to your question depends on what version of GNU make you're using.
If you're using GNU make 4.1 or below, then the answer is "no, it's not possible" from within a makefile (of course you can always write a shell script wrapper around make and check the arguments before invoking make).
If you're using GNU make 4.2 or above, then the answer is "yes, it's quite possible". See this entry from the GNU make NEWS file:
Version 4.2 (22 May 2016)
The amount of parallelism can be determined by querying MAKEFLAGS, even when
the job server is enabled (previously MAKEFLAGS would always contain only
"-j", with no number, when job server was enabled).
This is a tricky question because MAKEFLAGS is a very strange make variable. First of all, with GNU make 4.3, -jN, -j N, --jobs N and --jobs=N are all converted to -jN in MAKEFLAGS, which looks interesting. You could thus try something like:
J := $(patsubst -j%,%,$(filter -j%,$(MAKEFLAGS)))
to get the N value passed on the command line or the empty string if -j and --jobs have not been used. But then, if you try the following you will see that it is not the whole story:
$ cat Makefile
.PHONY: all
J := $(patsubst -j%,%,$(filter -j%,$(MAKEFLAGS)))
ifneq ($(J),4)
all:
	@echo MAKEFLAGS=$(MAKEFLAGS)
	@echo patsubst...=$(patsubst -j%,%,$(filter -j%,$(MAKEFLAGS)))
	@echo J=$(J)
else
all:
	@echo J=4
endif
$ make -j4
MAKEFLAGS= -j4 -l8 --jobserver-auth=3,4
patsubst...=4
J=
Apparently MAKEFLAGS is not set when the Makefile is parsed (so the J make variable is assigned the empty string), but it is set when the recipes are executed. So, using MAKEFLAGS with conditionals does not work. But if you can move your tests into a recipe, something like the following could work:
.PHONY: all
all:
j=$(patsubst -j%,%,$(filter -j%,$(MAKEFLAGS))); \
if [ -n "$$j" ] && [ $$j -gt 2 ]; then \
<do something>; \
else \
<do something else>; \
fi
Or:
.PHONY: all
J = $(patsubst -j%,%,$(filter -j%,$(MAKEFLAGS)))
all:
	@if [ -n "$(J)" ] && [ $(J) -gt 2 ]; then \
<do something>; \
else \
<do something else>; \
fi
Note the use of the recursively expanded variable assignment (J = ...) instead of simple assignment (J := ...).
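The same extraction can also be done entirely on the shell side of the recipe. Here is a minimal sketch, assuming MAKEFLAGS contains the job count in the `-jN` form shown in the output above; the function name `jobs_from_flags` is hypothetical.

```shell
# Hypothetical helper: print N from the first -jN word in a flags string.
# Prints nothing if no such word is present (for example a bare -j).
jobs_from_flags() {
    for flag in $1; do
        case $flag in
            -j[0-9]*)
                # Strip the leading "-j" and keep the number.
                printf '%s\n' "${flag#-j}"
                return 0
                ;;
        esac
    done
    return 0
}
```

A recipe could then test the result with something like `j=$$(jobs_from_flags "$(MAKEFLAGS)")` followed by the same `[ -n "$$j" ] && [ $$j -gt 2 ]` check, assuming the function is available to the recipe shell.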
I am processing some files and want to at one point create two categories depending on the filename so I can compare the two. Is this possible in a makefile?
%.output1: %.input
ifneq (,$(findstring filename1,$(echo $<)))
mv $<.output1 $@
endif
%.output2: %.input
ifneq (,$(findstring filename2,$(echo $<)))
mv $<.output2 $@
endif
%.output_final: %.output1 %.output2
do_something
I think there are two things wrong with this code:
There is a mistake in the ifneq line.
%.output1 %.output2 will always use the same filename - it may not be possible to do this in 'make' and this may require ruffus.
You have tab-indented the ifneq line so make doesn't consider it a make directive and is considering it a shell command and attempting to pass it to the shell to execute (hence the shell error you removed in your recent edit).
Use spaces (or no indentation) on that line to have make process it correctly. That said, even after doing that you cannot use $< in the comparison, because it is not set when make parses the makefile.
$(echo) is also not a make function. You have mixed/confused processing times. You cannot combine make and shell operations that way. (Not that you need echo there to begin with.)
If you want the comparison to happen at shell time then do not use make constructs and instead use shell constructs:
%.output1: %.input
	if [ filename1 = '$<' ]; then \
		mv $<.output1 $@; \
	fi
Though that is also incorrect, as $< is %.input and $@ is %.output1 for whatever stem matched the %. That rule should probably look more like this (though I'm having trouble understanding what you are even trying to have this rule do, so I may have gotten this wrong).
# If the stem that matched the '%' is equal to 'filename1',
# then copy the prerequisite/input file to the output file name.
%.output1: %.input
	if [ filename1 = '$*' ]; then \
		cp $< $@; \
	fi
I'm not sure I understand your second point. The % in a single rule will always match the same thing, but between rules it can differ.
This %.output_final: %.output1 %.output2 rule will map the target file foo.output_final to the prerequisite files foo.output1 and foo.output2. It will also map any other *.output_final file to the appropriately matching prerequisite files.
I am trying to write a makefile that does something like the following:
%-foo-(k).out : %-foo-(k-1).out
# do something, e.g.
	cat $< $@
i.e. there are files with arbitrary stems, then -foo-, then an integer, followed by .out. Each file depends on the one with the same name, with integer one smaller.
For instance, if the file blah/bleh-foo-1.out exists, then
make blah/bleh-foo-2.out
would work.
I could do this with multiple stems if there were such a thing... what's another way to do this sort of thing in (gnu) make?
There is no easy way to do something like this. You basically have two options: you can use auto-generated makefiles, or you can use $(eval ...). To me auto-generated makefiles are easier, so here's a solution:
SOURCELIST = blah/bleh-foo-1.out
all:
-include generated.mk
generated.mk: Makefile
	for f in $(SOURCELIST); do \
		n=`echo "$$f" | sed -n 's/.*-\([0-9]*\)\.out$$/\1/p'`; \
		echo "$${f%-foo-[0-9]*.out}-foo-`expr $$n + 1`.out: $$f ; cat \$$< > \$$@"; \
	done > $@
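The name arithmetic inside that recipe can be tried on its own in a shell. This is a minimal sketch of the same step, assuming file names of the form STEM-foo-N.out with a plain decimal N (no leading zeros); the function name `next_target` is hypothetical.

```shell
# Hypothetical helper: given STEM-foo-N.out, print STEM-foo-(N+1).out.
next_target() {
    f=$1
    # Extract the trailing integer N from the file name.
    n=$(printf '%s\n' "$f" | sed -n 's/.*-\([0-9]*\)\.out$/\1/p')
    # Strip the "-foo-N.out" suffix and append the incremented number.
    printf '%s-foo-%s.out\n' "${f%-foo-[0-9]*.out}" "$((n + 1))"
}
```

For example, `next_target blah/bleh-foo-1.out` prints `blah/bleh-foo-2.out`, which is the target name the generated rule would declare for that source file.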