How to provide parameters to a submake when running in parallel?

I am trying to use make to handle some data processing.
Consider the following simple rule in a makefile, makefile-month:
output_$(YEAR)_$(MONTH): input_$(YEAR)_$(MONTH)
	foo input_$(YEAR)_$(MONTH) output_$(YEAR)_$(MONTH)
This rule can be used to process any required month using, e.g.
make -f makefile-month YEAR=2006 MONTH=2
And this works fine.
What I am really interested in now is using make to process several months in parallel.
However, I cannot find a simple way of achieving this.
Notice that using a shell for loop does not work with parallel make.
Defining a global makefile,
all:
	for year in 2006; do \
		for month in 1 2 3 4 5 6 7 8 9 10 11 12; do \
			$(MAKE) -f makefile-month YEAR=$$year MONTH=$$month; \
		done; \
	done
and running,
make -j 12
does not execute each month in parallel.
Each call to the sub-make is executed in serial.
Any ideas?

There are lots of different ways to handle the details, but the overall solution is to move away from for loops in a single recipe and switch to individual targets. So for example:
YEARS := 2006 2007
MONTHS := 1 2 3 4 5 6 7 8 9 10 11 12
TARGETS := $(foreach Y,$(YEARS),$(foreach M,$(MONTHS),month.$Y.$M))
.PHONY: all $(TARGETS)
all: $(TARGETS)
$(TARGETS):
	$(MAKE) -f makefile-month YEAR=$(word 2,$(subst ., ,$@)) MONTH=$(word 3,$(subst ., ,$@))
(note I didn't test this but hopefully you get the idea).
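The target-per-month idea above can be tried out with a small shell sketch. It writes a throwaway makefile, builds one target, then builds all of them with -j. The temporary file, the shortened month list, and the echo recipe standing in for the real sub-make are assumptions for illustration. The recipe is placed after a ; on the rule line only so the sketch does not depend on tab characters.

```shell
#!/bin/sh
# Sketch: one phony target per (year, month). The recipe only echoes the
# parameters instead of calling the real sub-make (illustrative stand-in).
set -e
mk=$(mktemp)

cat > "$mk" <<'EOF'
YEARS   := 2006 2007
MONTHS  := 1 2 3
TARGETS := $(foreach Y,$(YEARS),$(foreach M,$(MONTHS),month.$Y.$M))
.PHONY: all $(TARGETS)
all: $(TARGETS)
# The subst call turns month.2006.2 into "month 2006 2"; words 2 and 3 follow.
$(TARGETS): ; @echo "YEAR=$(word 2,$(subst ., ,$@)) MONTH=$(word 3,$(subst ., ,$@))"
EOF

# Build a single target: the year and month are recovered from its name.
one=$(make -s -f "$mk" month.2006.2)
echo "$one"

# Build everything with up to 4 jobs; the order may vary, the count does not.
count=$(make -s -f "$mk" -j4 | wc -l)
echo "targets built: $count"
rm -f "$mk"
```

Because every month is its own target, make -j can schedule them independently, which a shell for loop inside a single recipe cannot do.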

Related

How do I start an unknown (but limited) number of lambda functions, each with an incremental port number, in a Makefile?

In effect, I want the opposite of this Makefile recipe:
.PHONY: stop-lambdas # Stop the lambdas.
stop-lambdas:
	@$(MAKE) --no-print-directory --makefile LastBuildMakefile report-build
	@echo ''
	@$(MAKE) --no-print-directory lambda-stopper
	@echo ''
lambda-stopper: $(patsubst %,-stop-%,$(LAMBDAS))
$(patsubst %,-stop-%,$(LAMBDAS)):
	@$(MAKE) --no-print-directory validate-lambda lambda="$(patsubst -stop-%,%,$@)"
	@echo "Start the lambda function in lambdas/$(patsubst -stop-%,%,$@)."
	@docker stop "lambda_$(patsubst -stop-%,%,$@)">/dev/null
	@echo ''
So far I've got:
.PHONY: start-lambdas # Get the lambdas up and running to allow you to make calls to them.
start-lambdas:
	@$(MAKE) --no-print-directory --makefile LastBuildMakefile report-build
	@echo ''
	@$(MAKE) --no-print-directory lambda-starter
	@echo ''
lambda-starter: $(patsubst %,-start-%,$(LAMBDAS))
$(patsubst %,-start-%,$(LAMBDAS)):
	@$(MAKE) --no-print-directory validate-lambda lambda="$(patsubst -start-%,%,$@)"
	@echo "Start the lambda function in lambdas/$(patsubst -start-%,%,$@)."
	@docker run --detach --name "lambda_$(patsubst -start-%,%,$@)" --publish XXXXXXX:8080 --rm "${CONTAINER_GROUP}/lambda_$(patsubst -start-%,%,$@):${LAST_CONTAINER_TAG_BRANCH}_${LAST_CONTAINER_TAG_VERSION}" > /dev/null
	@echo ''
The one bit I can't work out is how to get XXXXXXX defined.
I have two ideas for this:
1. Simply (I hope) increment a variable starting at 49900 for each lambda function (there are about 10 at the moment, and more could be added to this project, but not enough to exhaust the port numbers).
2. Define the list of port numbers manually (one for each lambda function) and then programmatically identify which variable to use for each lambda_$(patsubst -start-%,%,$@).
For the second idea, I can hard-code a setup in the Makefile such as:
QUEUE_PORT_NUMBER=49901
QUEUE_ADMIN_PORT_NUMBER=49902
QUEUE_API_PORT_NUMBER=49903
...
In playing around:
LAMBDA_PORT_NUMBER_VAR_NAME=$(patsubst -start-%,%,$@)_PORT_NUMBER; echo $${LAMBDA_PORT_NUMBER_VAR_NAME^^};
outputs:
QUEUE_PORT_NUMBER
QUEUE_ADMIN_PORT_NUMBER
QUEUE_API_PORT_NUMBER
...
So I can programmatically define the variable that has the port number.
What I can't work out is how to take a variable with the name of a variable and then get the value from that into my recipe.
Some additional thoughts:
The port number is irrelevant from run to run. I'll be displaying the assigned port. I would expect them to be consistent between multiple make start-lambdas.
This is only for local devs to get the lambdas execution locally and to allow them to run tests ... all local.
The docker attach and detach already is used when we're running unit tests on each lambda in turn. This currently uses port 49990 every time. We fire a load of curl requests to the running (but detached) lambda, validate the results, and then kill the lambda, and move on to the next one. So that bit all works.
We just want to have all the lambdas up and running simultaneously.
Not sure I understood all details. Let's suppose you want to define a number, starting at 49901, for each of your -start-lambda, and that you want this number to be the same each time you invoke make. We can use the join GNU make function to assemble a list of -start-lambda-number tokens from your list of -start-lambda and a computed list of numbers. We store the list of tokens in make variable TOKEN.
In the following we also factorize several statements of your current Makefile with make variables (STARTER) and shell variables in the recipe (lambda). For the latter there are 2 important aspects to remember:
make expands the recipe before passing it to the shell. So, when using the value of a shell variable we must write $$lambda (or $$number) instead of $lambda. After the make expansion it becomes $lambda, which is what we want to pass to the shell, and not just ambda.
Each line of a recipe is executed by a different shell. In order to use a shell variable in several recipe lines we must join them together with ; (or &&, as you wish) such that they become one single line, executed by one single shell. But for better readability we can use the line continuation (with a trailing \).
The number for a given -start-lambda target is extracted inside the recipe and stored in a second shell variable (number) with $(patsubst $@-%,%,$(filter $@-%,$(TOKEN))), that is, find the corresponding token with function filter and extract the number with patsubst.
LAMBDAS := a b c d
STARTER := $(patsubst %,-start-%,$(LAMBDAS))
NUMBER := $(shell seq 49901 `expr 49900 + $(words $(STARTER))`)
TOKEN := $(join $(addsuffix -,$(STARTER)),$(NUMBER))
.PHONY: start-lambdas # Get the lambdas up and running to allow you to make calls to them.
start-lambdas:
	@$(MAKE) --no-print-directory --makefile LastBuildMakefile report-build
	@echo ''
	@$(MAKE) --no-print-directory lambda-starter
	@echo ''
lambda-starter: $(STARTER)
$(STARTER):
	@lambda="$(patsubst -start-%,%,$@)"; \
	number="$(patsubst $@-%,%,$(filter $@-%,$(TOKEN)))"; \
	$(MAKE) --no-print-directory validate-lambda lambda="$$lambda"; \
	echo "Start the lambda function in lambdas/$$lambda."; \
	docker run --detach --name "lambda_$$lambda" --publish "$$number:8080" --rm "${CONTAINER_GROUP}/lambda_$$lambda:${LAST_CONTAINER_TAG_BRANCH}_${LAST_CONTAINER_TAG_VERSION}" > /dev/null
	@echo ''
Remember that the $$ and the line continuations (the trailing ; \) are essential to guarantee the proper expansion of the shell variables lambda and number, and their availability in all lines of the recipe.
Note: this works only if $(filter $@-%,$(TOKEN)) returns only one token. If you have two lambdas named foo and foo-bar the corresponding tokens would be, for instance, -start-foo-49901 and -start-foo-bar-49907, and for target -start-foo both would match $@-%. So, if you have such lambda names, change the separator between the -start-lambda part and the number. Use, e.g., | instead of - if you do not have | characters in your lambda names.
Demo with a simplified dummy example:
$ cat Makefile
LAMBDA := a b c d
STARTER := $(patsubst %,-start-%,$(LAMBDA))
NUMBER := $(shell seq 49901 `expr 49900 + $(words $(STARTER))`)
TOKEN := $(join $(addsuffix -,$(STARTER)),$(NUMBER))
.PHONY: lambda-starter $(STARTER)
lambda-starter: $(STARTER)
$(STARTER):
	@lambda="$(patsubst -start-%,%,$@)"; \
	number="$(patsubst $@-%,%,$(filter $@-%,$(TOKEN)))"; \
	echo "Start the lambda function in lambdas/$$lambda with $$number."
$ make
Start the lambda function in lambdas/a with 49901.
Start the lambda function in lambdas/b with 49902.
Start the lambda function in lambdas/c with 49903.
Start the lambda function in lambdas/d with 49904.
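The separator caveat in the note above can be checked with a short sketch. The names foo and foo-bar and the two ports are illustrative assumptions, and $(info ...) is used so the lookup is printed while the makefile is parsed, without running a lambda at all.

```shell
#!/bin/sh
# Sketch: "-" is an ambiguous separator when one name is a prefix of another;
# a separator that cannot occur in the names avoids the double match.
# The names and ports below are illustrative assumptions.
set -e
mk=$(mktemp)

cat > "$mk" <<'EOF'
NAMES := foo foo-bar
PORTS := 49901 49902
DASH  := $(join $(addsuffix -,$(NAMES)),$(PORTS))
PIPE  := $(join $(addsuffix |,$(NAMES)),$(PORTS))
# Look up the port of "foo" with each separator, at parse time.
$(info dash: $(patsubst foo-%,%,$(filter foo-%,$(DASH))))
$(info pipe: $(patsubst foo|%,%,$(filter foo|%,$(PIPE))))
all: ; @:
EOF

out=$(make -s -f "$mk")
echo "$out"
rm -f "$mk"
```

With the dash separator the lookup for foo also picks up the foo-bar token, while the | separator yields a single port.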
Answered by Renaud Pacalet - Thank you very much.
I've taken his answer and adapted it further to my needs.
Below is one adaptation to show the list of documented Makefile targets, as well as reporting the list of lambdas and the port numbers that will be used for testing.
LAMBDAS := $(notdir $(wildcard ./src/lambdas/*))
PORT_NUMBERS := $(shell seq 49901 `expr 49900 + $(words $(LAMBDAS))`)
LAMBDA_TOKENS := $(join $(addsuffix -,$(LAMBDAS)),$(PORT_NUMBERS))
START_LAMBDAS := $(patsubst %,-start-%,$(LAMBDA_TOKENS))
LIST_LAMBDAS := $(patsubst %,-list-%,$(LAMBDA_TOKENS))
.PHONY: list # Generate list of targets
.PHONY: $(LIST_LAMBDAS)
list:
	@grep '^.PHONY: .* #' $(MAKEFILE_LIST) | sort | sed 's/^.*\.PHONY: \(.*\) # \(.*\)/\1 \2/' | expand -t16
	@echo ''
	@echo 'The list of lambda functions known to this Makefile are:'
	@echo ''
	@$(MAKE) --no-print-directory list-lambdas
	@echo ''
list-lambdas: $(LIST_LAMBDAS)
$(LIST_LAMBDAS):
	@token=$(filter $(patsubst -list-%,%,$@),$(LAMBDA_TOKENS)); \
	parts=($${token//-/ }); \
	echo " src/lambdas/$${parts[0]} Will use port number $${parts[1]} for testing."| expand -t26
This outputs:
$ make list
build Build the containers
list Generate list of targets
test Test the containers
check-container Check the version of Node in the AWS NodeJS container
clean Remove images, logs, NodeJS cache, node_modules, and built distributions
npm-ci Run `npm ci` for the lambda functions using the AWS NodeJS container
npm-update Run `npm update` for the lambda functions using the AWS NodeJS container
shell Get a bash shell prompt using the AWS NodeJS container, with the current directory mounted in /srv/queue
The list of lambda functions known to this Makefile are:
src/lambdas/api Will use port number 49901 for testing.
src/lambdas/queue Will use port number 49902 for testing.
src/lambdas/queue_admin Will use port number 49903 for testing.
src/lambdas/rendering Will use port number 49904 for testing.
src/lambdas/reporting Will use port number 49905 for testing.
src/lambdas/schedules Will use port number 49906 for testing.
So I now have 1 pair of lambda and port numbers and then use that to build the targets for Makefile. Certainly understandable for those that can read Makefiles!

Why would GNU Make run sequential even though I've added --jobs and --max-load isn't even close?

This is on GNU Make 3.82, RHEL 7. Make appears to be running sequentially even though I passed in --jobs.
I'm doing about 700K trivial jobs - concatenating large gzip files onto other gzip files. If there is only one file to concatenate, then I create a symbolic link instead. Here is the command:
# Pattern to rebuild gzip file - concatenate if needed, otherwise just link
$(THISDIR)/%.tgz:
	mkdir -p $$(dirname $@) && \
	if [ $$(echo '$^' | wc -w) -gt 1 ]; then cat $^ > $@; else ln -s $^ $@; fi
I already separated by && to avoid another shell invocation, made no difference.
About 600K of the 700K jobs are just creating symbolic links. For the remainder, the average number of files to concatenate is four.
Why is this so slow? I'm getting 5-8 TPS. More importantly, even though I specified (on a machine with 64 CPUS):
make --jobs --max-load=48
I see very few processes on top. So it appears that Make is not running parallel jobs at all. Is there a minimal job length for parallelism to work efficiently on GNU Make?
The load average from top right now is
top - 22:50:32 up 3 days, 13:13, 32 users, load average: 7.96, 7.44, 5.73
A few further details that might be helpful:
Make itself is running at close to 100% CPU.
There is no dependency between any of the files other than, of course, target and dependencies on the same rule. In other words, there are no files that appear both in $@ and $^.
Files are being created and read from NFS mounts
I've generated the 700K dependencies as rules that get read into the Makefile with an include. That process itself takes 25 minutes or so.
Performance can be improved, especially when a large number of files are being rebuilt, by using (GNU) make functions to replace shell commands. This reduces the number of fork and exec calls required to complete the tasks:
%.tgz:
	mkdir -p $(@D) && \
	$(if $(findstring $(words $^),1),ln -s $^ $@,cat $^ > $@)
For the mkdir command, using $(@D) (the directory part of the target) eliminates the call to dirname.
For the cat/ln command, $(findstring ...) and $(words ...) replace the echo ... | wc pipe, and $(if ...) replaces the shell if statement.
Overall, only 2 commands run per target (mkdir, cat/ln) instead of 5 (mkdir, dirname, echo, wc, cat/ln). Performance improves by about 2X.
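A minimal sketch of the idea of letting make choose between linking and concatenating while it expands the recipe. The file names are illustrative assumptions, and the recipe only echoes the decision instead of running ln or cat. The recipe sits after a ; on the rule line purely to avoid depending on tab characters.

```shell
#!/bin/sh
# Sketch: make decides "link" vs "concat" while expanding the recipe, so no
# dirname/echo/wc processes are spawned. File names are illustrative.
set -e
work=$(mktemp -d)
cd "$work"
mkdir in
touch in/x in/y

cat > Makefile <<'EOF'
# Prerequisite lists, as a generated include file might provide them.
out/a.tgz: in/x
out/b.tgz: in/x in/y
%.tgz: ; @echo "$@ dir=$(@D) action=$(if $(findstring $(words $^),1),link,concat)"
EOF

out=$(make -s out/a.tgz out/b.tgz)
echo "$out"
cd /
rm -rf "$work"
```

The target with one prerequisite reports the link action and the one with two prerequisites reports concat, both decided without any extra shell commands.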
Make was spending a large portion of the prep time trying to match each of the targets to all the built-in rules for things like C files. Adding
.SUFFIXES:
MAKEFLAGS += --no-builtin-rules
made a huge difference. It still spends a few minutes after reading all the patterns in, but the benefits now outweigh that cost.

Need to check existence of flag in GNUmakefile

I am checking for the existence of a flag that the user passes to my GNUmakefile.
Basically, I want to check whether the user has passed -j to make. I have added the if condition below, but when I display MAKEFLAGS before it, the variable appears to be empty.
ifneq (,$(findstring j,$(MAKEFLAGS)))
....
Am I missing anything here?
Sometimes users may also pass --jobs instead of -j, and I also need to check whether the value passed to -j/--jobs is greater than 2.
Is there an easy way in GNU make to do this in a single if condition?
The answer to your question depends on what version of GNU make you're using.
If you're using GNU make 4.1 or below, then the answer is "no, it's not possible" from within a makefile (of course you can always write a shell script wrapper around make and check the arguments before invoking make).
If you're using GNU make 4.2 or above, then the answer is "yes, it's quite possible". See this entry from the GNU make NEWS file:
Version 4.2 (22 May 2016)
The amount of parallelism can be determined by querying MAKEFLAGS, even when
the job server is enabled (previously MAKEFLAGS would always contain only
"-j", with no number, when job server was enabled).
This is a tricky question because MAKEFLAGS is a very strange make variable. First of all, with GNU make 4.3, -jN, -j N, --jobs N and --jobs=N are all converted to -jN in MAKEFLAGS, which looks interesting. You could thus try something like:
J := $(patsubst -j%,%,$(filter -j%,$(MAKEFLAGS)))
to get the N value passed on the command line or the empty string if -j and --jobs have not been used. But then, if you try the following you will see that it is not the whole story:
$ cat Makefile
.PHONY: all
J := $(patsubst -j%,%,$(filter -j%,$(MAKEFLAGS)))
ifneq ($(J),4)
all:
	@echo MAKEFLAGS=$(MAKEFLAGS)
	@echo patsubst...=$(patsubst -j%,%,$(filter -j%,$(MAKEFLAGS)))
	@echo J=$(J)
else
all:
	@echo J=4
endif
$ make -j4
MAKEFLAGS= -j4 -l8 --jobserver-auth=3,4
patsubst...=4
J=
Apparently MAKEFLAGS is not set when the Makefile is parsed (and the J make variable is assigned the empty string) but it is when the recipes are executed. So, using MAKEFLAGS with conditionals does not work. But if you can move your tests into a recipe, something like the following could work:
.PHONY: all
all:
	j=$(patsubst -j%,%,$(filter -j%,$(MAKEFLAGS))); \
	if [ -n "$$j" ] && [ $$j -gt 2 ]; then \
		<do something>; \
	else \
		<do something else>; \
	fi
Or:
.PHONY: all
J = $(patsubst -j%,%,$(filter -j%,$(MAKEFLAGS)))
all:
	@if [ -n "$(J)" ] && [ $(J) -gt 2 ]; then \
		<do something>; \
	else \
		<do something else>; \
	fi
Note the use of the recursively expanded variable assignment (J = ...) instead of simple assignment (J := ...).

Makefile with variable number of targets

I am attempting to do a data pipeline with a Makefile. I have a big file that I want to split in smaller pieces to process in parallel. The number of subsets and the size of each subset is not known beforehand. For example, this is my file
$ for i in {1..100}; do echo $i >> a.txt; done
The first step in the Makefile should compute the ranges; let's make them fixed for now:
ranges.txt: a.txt
	for i in 0 25 50 75; do echo $$(($$i+1))'\t'$$(($$i+25)) >> $@; done
Next step should read from ranges.txt, and create a target file for each range in ranges.txt, a_1.txt, a_2.txt, a_3.txt, a_4.txt. Where a_1.txt contains lines 1 through 25, a_2.txt lines 26-50, and so on... Can this be done?
You don't say what version of make you're using, but I'll assume GNU make. There are a few ways of doing things like this; I wrote a set of blog posts about metaprogramming in GNU make (by which I mean having make generate its own rules automatically).
If it were me I'd probably use the constructed include files method for this. So, I would have your rule above for ranges.txt instead create a makefile, perhaps ranges.mk. The makefile would contain a set of targets such as a_1.txt, a_2.txt, etc. and would define target-specific variables defining the start and stop values. Then you can -include the generated ranges.mk and make will rebuild it. One thing you haven't described is when you want to recompute the ranges: does this really depend on the contents of a.txt?
Anyway, something like:
.PHONY: all
all:
ranges.mk: a.txt # really? why?
	for i in 0 25 50 75; do \
		echo "a_$$i.txt : RANGE_START := $$(($$i+1))"; \
		echo "a_$$i.txt : RANGE_END := $$(($$i+25))"; \
		echo "TARGETS += a_$$i.txt"; \
	done > $@
-include ranges.mk
all: $(TARGETS)
$(TARGETS) : a.txt # seems more likely
	process --out $@ --in $< --start $(RANGE_START) --end $(RANGE_END)
(or whatever command; you don't give any example).
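The constructed-include approach can be exercised end to end with a short sketch. The fixed ranges, the 100-line input, and the use of sed in place of the unspecified process command are assumptions for illustration. Recipes are written after a ; on the rule line only to avoid depending on tab characters.

```shell
#!/bin/sh
# Sketch: a generated include file (ranges.mk) defines one target per range
# with target-specific RANGE_START/RANGE_END. The fixed ranges and the sed
# command standing in for the real processing step are illustrative.
set -e
work=$(mktemp -d)
cd "$work"
seq 1 100 > a.txt

cat > Makefile <<'EOF'
.PHONY: all
all:

ranges.mk: a.txt ; @for i in 0 25 50 75; do \
  echo "a_$$i.txt: RANGE_START := $$(($$i + 1))"; \
  echo "a_$$i.txt: RANGE_END := $$(($$i + 25))"; \
  echo "TARGETS += a_$$i.txt"; \
done > $@

-include ranges.mk

all: $(TARGETS)

$(TARGETS): a.txt ; @sed -n '$(RANGE_START),$(RANGE_END)p' $< > $@
EOF

# make first builds ranges.mk, re-reads the makefile, then builds the pieces.
make -s -j4
first=$(head -n 1 a_25.txt)
last=$(tail -n 1 a_25.txt)
lines=$(wc -l < a_25.txt)
echo "a_25.txt: lines $first..$last ($lines lines)"
cd /
rm -rf "$work"
```

Since each piece is an ordinary target once ranges.mk has been included, the pieces can be built in parallel with -j.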

Restructure makefile to avoid redundant implementation

A part of my makefile is as follows:
list1: all
	for f in \
	`less fetch/list1.txt`; \
	do \
	...
	./$(BIN) $$f & \
	...
	done
list2: all
	for f in \
	`less fetch/list2.txt`; \
	do \
	...
	./$(BIN) $$f & \
	...
	done
fetch/list1.txt and fetch/list2.txt contain two lists of files (path+filename), and make list1 and make list2 respectively go through the two lists and run $(BIN) against the files. This works fine.
The problem is that I have several file lists like list1 and list2, and the make process is always the same. Does anyone know how to simplify the makefile so that make listA, make list4, etc. do what they are supposed to do?
You can use a Pattern Rule:
all:
	@echo "The all recipe"
list%: all
	@echo "This recipe does something with $@.txt"
Output is:
$ make list1
The all recipe
This recipe does something with list1.txt
$ make list256
The all recipe
This recipe does something with list256.txt
$
I do not recommend performing scripting within makefiles. It will very often lead to arbitrary, inconsistent bugs, and other forms of frustration.
Use Make for execution control with dependencies (that is, determining what gets executed when), but write your actual primitives (scripts or other programs) separately, and call them from within Make.
bar: foo1 foo2 foo3
# bar is the make target. foo1, foo2, and foo3 are sub-targets needed to make bar.
foo1:
	fooscript1 # Written as an individual script, outside Make.
	fooscript2
How about:
all:
	@echo "All"
action_%:
	@./$(BIN) $*
ACTION=$(patsubst %,action_%,$(shell cat $(ACT_FILE)))
actionList:
	@$(MAKE) $(ACTION)
list%: all
	@$(MAKE) ACT_FILE=fetch/list$*.txt actionList
Supports all list :-)
Rather than allowing unbounded parallelism (you were using ./$(BIN) fileName &), you can control the actual parallelism using Make's built-in features:
make -j8 list1
# ^ Parallelism set to 8

Resources