I would like to process files generated from a list into a summary, processing each intermediate file in parallel. make might be well suited for this.
Let's take an example: given a list of URLs, download the files, process them in parallel, and generate a report from the processed files.
Example (this won't work):
all : report_file
report_file : $(wildcard data/*.processed)
...
data/%.processed : data/%.input
... # this should be processed in parallel
data/%.input : filelist
download all lines of filelist to N files.
filelist :
generate_list url_file > $@
I'd like the processing (and maybe downloading) of each file to be done in parallel and I don't know how many lines will be generated in filelist.
This won't work because $(wildcard data/*.processed) is expanded when the makefile is read. At that point no .processed file exists yet, so the rule that builds report_file is given an empty prerequisite list.
It would also be useful to skip downloading files that are less than one day old, and therefore skip reprocessing them. That means make-style dependency tracking has a real use here.
I could generate a separate makefile from the list, but is there a way to do it with a single makefile?
Since report_file's dependencies can't be evaluated until after all the target-dependency information is parsed, you'll need to refresh this information again. The only way to do this that I know of is to invoke a submake.
all: filelist
$(MAKE) $(shell cat $<) #make data/a.processed data/b.processed etc...
$(MAKE) report_file
report_file : $(wildcard data/*.processed)
...
data/%.processed : data/%.input
... # this should be processed in parallel
data/%.input :
download all lines of filelist to N files.
filelist :
generate_list url_file > $@ #url list
sed -i 's;\(.*\);\1.processed;g' $@ #append .processed to all urls
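Below is a minimal, runnable sketch of the sub-make approach. It rests on a few assumptions:

- The URL list is replaced by a hypothetical names.txt.
- "Downloading" is simulated with echo.
- "Processing" is an upper-casing tr.

The first make pass only builds filelist. The recipe for all then re-invokes $(MAKE), so the next pass reads the makefile again with the .processed targets named explicitly, and make -j can build them in parallel. The .RECIPEPREFIX setting (GNU make 3.82+) is used only so the listing doesn't rely on literal tab characters.

```shell
#!/usr/bin/env bash
# Sketch: two-pass build where the target list is only known after filelist is made.
set -eu
cd "$(mktemp -d)"

cat > Makefile <<'EOF'
# GNU make >= 3.82: recipe lines start with '>' instead of a tab.
.RECIPEPREFIX := >
.PHONY: all
# Keep the simulated downloads instead of deleting them as intermediate files.
.SECONDARY:

# Pass 1 builds filelist; the sub-makes re-read this file with real targets.
all: filelist
>$(MAKE) $$(cat filelist)
>$(MAKE) report_file

# Expanded each time make reads this file, so only the second sub-make sees files.
report_file: $(sort $(wildcard data/*.processed))
>cat $^ > $@

# Independent per-file work: this is what make -j runs in parallel.
data/%.processed: data/%.input
>tr a-z A-Z < $< > $@

# Hypothetical stand-in for downloading one URL.
data/%.input: | data
>echo "contents of $*" > $@

data:
>mkdir -p $@

# One target path per line of the (hypothetical) input list.
filelist: names.txt
>sed 's;.*;data/&.processed;' $< > $@
EOF

printf '%s\n' alpha beta gamma > names.txt
make -j4
cat report_file
```

Two details in the sketch are deliberate:

- $(sort ...) is there because the order of $(wildcard ...) results may differ between GNU make versions.
- .SECONDARY: stops make from deleting the simulated downloads as intermediate files.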
I am trying to use a makefile to manage the build process of a small project, where the number and names of the targets are not known beforehand but depend on the input. Specifically, I want to generate a set of data files (say .csv files) from a cities_list.txt file that contains a list of city names. For example, if the contents of the txt file are:
newyork
washington
toronto
then a script called write_data.py would generate three files called newyork.csv, washington.csv and toronto.csv. When the contents of cities_list.txt change, I want make to handle the change cleverly, i.e. only build the files for newly added cities.
I tried putting variable names in target names to make this happen, but didn't succeed. I'm now trying to create a set of intermediate .name files as below:
all: *.csv
%.name: cities_list.txt
/bin/bash gen_city_files.sh $<
%.csv: %.name write_data.py
python3 write_data.py $<
clean:
rm *.name *.csv
This seems very close to working, but it only gives me one .csv file. The reason is clear: make can't determine which files should be generated for the all target. How can I tell make that *.csv should cover every file for which a corresponding *.name file exists? Or is there a better way to achieve what I'm trying to do here?
All right, this should do it. We'd like a variable assignment at the head of the file:
CITY_FILES := newyork.csv washington.csv toronto.csv
There are two ways to do this. This way:
-include cities.mak
# this rule can come later in the makefile, near the bottom
cities.mak: cities_list.txt
@sed 's/^/CITIES += /' $< > $@
and this way:
CITIES := $(shell cat cities_list.txt)
After we've done one of those two, we can construct the list of needed files:
CITY_FILES := $(addsuffix .csv, $(CITIES))
and build them:
# It is convenient to have this be the first rule in the makefile.
all: $(CITY_FILES)
%.csv: write_data.py
python3 $< $*.name
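Here is a sketch of the -include variant. It assumes write_data.py is replaced by a hypothetical echo recipe that also logs what it built to build.log.

Each line of cities_list.txt becomes its own assignment in cities.mak, so the sed uses += to keep every city. Whenever the list changes, make regenerates cities.mak, restarts, and then builds only the .csv files that don't exist yet.

```shell
#!/usr/bin/env bash
# Sketch: derive the .csv targets from cities_list.txt via a generated cities.mak.
set -eu
cd "$(mktemp -d)"

cat > Makefile <<'EOF'
# GNU make >= 3.82: recipe lines start with '>' instead of a tab.
.RECIPEPREFIX := >
-include cities.mak

CITY_FILES := $(addsuffix .csv,$(CITIES))

.PHONY: all
all: $(CITY_FILES)

# Rebuilt (followed by a make restart) whenever the list changes.
# '+=' keeps every line; ':=' would leave only the last city.
cities.mak: cities_list.txt
>sed 's/^/CITIES += /' $< > $@

# Hypothetical stand-in for write_data.py; logs each build.
%.csv:
>echo "$*" > $@
>echo "built $@" >> build.log
EOF

printf '%s\n' newyork washington > cities_list.txt
make
sleep 1                       # make sure the edited list gets a newer timestamp
printf '%s\n' newyork washington toronto > cities_list.txt
: > build.log                 # only record what the second run builds
make
cat build.log
```

The .csv files have no prerequisites in this sketch (the answer's rule depends only on write_data.py). Existing files therefore count as up to date, and only toronto.csv is built on the second run.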
I searched for answers to this question, but I apologize in advance if this is a duplicate of one I didn't find. I'm also sorry that I can't share the code I'm actually working with (it would require a lot of environmental dependencies anyway).
I have a sequence of actions, which all depend on the success of the previous actions, and also don't need to be repeated unless they are out of date. A make solution seemed like the proper one. I've come up with a solution that does almost all of it. Here is the sequence of steps I am trying to replicate, with the output of each step listed below its input:
ZIP file
extract to package/
package/directory/*.comp
execute uncomp.py to create a .uncomp file from a .comp file
Everything works fine up to this point
package/directory/*.uncomp
For *.uncomp files, execute script1 to produce a .html file
For *_ext.uncomp files, execute script2 to produce numbered *_ext.##.png file(s)
Multiple numbered files (_ext.0.png, _ext.1.png, _ext.2.png) are possible, and may not be present at the time make is run. However, make should know that they are the output of the previous step, and only run this recipe if these files (a) don't exist or (b) any are older than the *_ext.uncomp file.
I have put together a Makefile which does almost what I'm looking for, except that it delegates all of the last portion (numbered files) to a shell script which I could program to look at file times, but that defeats the purpose of using make in the first place, in my opinion.
Environment
Debian 8.8 (x86)
GNU Make 4.0
Built for x86_64-pc-linux-gnu
My Question
What rules and recipes can I use to tell GNU make about the relationship between the *_ext.uncomp files and the _ext.##.png files? I need three things:

- Those recipes run only when necessary, and make reports the target as up to date if all .png files are at least as new as the _ext.uncomp file.
- The rules don't also apply to the plain *.uncomp files.
- Everything still works if there are no .png files in the output yet.
I also need to express the relationship between the non-_ext files and their corresponding HTML counterparts, so that script1 runs only when the HTML file is out of date or doesn't exist. This rule should ignore the _ext.uncomp files.
Any other advice on my Makefile would also be appreciated, because I am not overly familiar with it.
Generalized contents of my current Makefile
.PHONY : all
all : package package/directory/*.uncomp
./process $^
%.comp.uncomp : %.comp package
python uncomp.py $<
package : *.zip
rm -rf package/
unzip *.zip -d package/
Contents of the process script
This script should no longer exist if all the goals of the question are met (make will handle everything). It works great, but it always processes .uncomp files no matter what, even if the output from them already exists and is newer than the source.
#!/bin/bash
if [ $# -lt 2 ]; then
echo "$0 expects at least 2 arguments"
exit 1
fi
# Discard the first argument; it's always 'package'
shift
# Iterate over each of the remaining arguments
while [ $# -gt 0 ]; do
if [[ $1 == *_ext.uncomp ]] ; then
python script2 $1
elif [[ $1 == *.uncomp ]] ; then
python script1 $1
else
echo "Warning: Unknown file type: $1"
fi
shift
done
I learned a lot about GNU make trying to get this to work. I discovered that the solution to my problem was in not overthinking it.
The most important realization was that I didn't need make to track all of the numbered output files, but just the first one (if the first one is out of date or missing, they all will be, and they all get re-extracted by the script, so a 1:1 relationship was all I needed to indicate there).
I found out that when several pattern rules match, GNU make 3.82 and later pick one in "shortest stem first" order, whereas earlier versions use definition order. To make my file work with both behaviors, I made sure to define the rules with the most specific patterns (shortest stems) first.
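The small experiment below illustrates shortest-stem-first matching. It assumes GNU make 3.82 or later, and the file names and echo recipes are hypothetical.

The generic rule is deliberately defined first. Even so, a_ext.png is built by the more specific %_ext.png rule, because its stem (a) is shorter than the generic rule's stem (a_ext). Under the older definition-order behavior the generic rule would win. Putting the specific rule first keeps both versions in agreement.

```shell
#!/usr/bin/env bash
# Sketch: with GNU make >= 3.82, the matching pattern rule with the shortest stem wins.
set -eu
cd "$(mktemp -d)"

cat > Makefile <<'EOF'
# GNU make >= 3.82: recipe lines start with '>' instead of a tab.
.RECIPEPREFIX := >
# Generic rule, deliberately defined first.
%.png: %.uncomp
>echo generic > $@

# More specific rule: for a_ext.png its stem is "a", shorter than "a_ext".
%_ext.png: %_ext.uncomp
>echo ext > $@
EOF

touch a_ext.uncomp b.uncomp
make a_ext.png b.png
cat a_ext.png b.png
```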
After that it was a matter of setting up some implicit rules and telling make what it should expect to find. The concept is a little backwards from my usual way of thinking, which is why I had some trouble at first: look for this file that doesn't exist yet; now, here's a way to make it from a file that does exist. The end result, fully functional:
PACKAGE := package
COMP := .comp
UNCOMP := .comp.uncomp
PNG0 := .comp.0.png
TXT := .comp.txt
SUFFIX := _ext
COMPFILES = $(wildcard $(PACKAGE)/subdir/*$(COMP))
UNCOMPFILES = $(COMPFILES:$(COMP)=$(UNCOMP))
SUFFIXFILES = $(filter %$(SUFFIX)$(UNCOMP),$(UNCOMPFILES))
PNGFILES = $(SUFFIXFILES:$(UNCOMP)=$(PNG0))
NOSUFFIXFILES = $(filter-out %$(SUFFIX)$(UNCOMP),$(UNCOMPFILES))
TXTFILES = $(NOSUFFIXFILES:$(UNCOMP)=$(TXT))
.PHONY : all
all : pngs txts htaccess
.PHONY : txts
txts : $(TXTFILES)
.PHONY : pngs
pngs : $(PNGFILES)
.PHONY : uncomp
uncomp : $(UNCOMPFILES)
$(MAKE) pngs
$(MAKE) txts
.PHONY : htaccess
htaccess : $(PACKAGE)/.htaccess
%$(SUFFIX)$(PNG0) : %$(SUFFIX)$(UNCOMP)
## Ignore failures when extracting PNG files
-python script1.py $<
%$(TXT) : %$(UNCOMP)
## Ignore failures when dumping TXT files
-python script2.py $< > $@
%$(UNCOMP) : %$(COMP)
## Ignore decompression failure
-python uncomp.py $<
$(PACKAGE)/.htaccess : .htaccess | $(PACKAGE)
cp .htaccess $(PACKAGE)/
$(PACKAGE) : *.zip
rm -rf $(PACKAGE)/
unzip *.zip -d $(PACKAGE)/
$(MAKE) uncomp
.PHONY : clean
clean :
rm -rf $(PACKAGE)/
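The sketch below shows the "track only the first numbered file" idea. It assumes a hypothetical stand-in for script1.py that writes three numbered PNG files and logs each run to build.log.

Only doc_ext.comp.0.png is declared as a target; the recipe produces the others as a side effect. A second make run does nothing, and touching the .uncomp file makes the extraction run once more. If there are no *_ext files at all, PNG0 is empty and all simply has nothing to do.

```shell
#!/usr/bin/env bash
# Sketch: declare only the first numbered output as the target of the extraction rule.
set -eu
cd "$(mktemp -d)"

cat > Makefile <<'EOF'
# GNU make >= 3.82: recipe lines start with '>' instead of a tab.
.RECIPEPREFIX := >
UNCOMP := $(wildcard *_ext.comp.uncomp)
PNG0   := $(UNCOMP:.comp.uncomp=.comp.0.png)

.PHONY: all
all: $(PNG0)

# Hypothetical stand-in for script1.py: writes .0, .1 and .2 in one go,
# but make only compares .0 against the .uncomp file.
%_ext.comp.0.png: %_ext.comp.uncomp
>for i in 0 1 2; do echo "page $$i" > $*_ext.comp.$$i.png; done
>echo "extracted $<" >> build.log
EOF

touch doc_ext.comp.uncomp
make            # extracts
make            # nothing to be done
sleep 1
touch doc_ext.comp.uncomp
make            # source is newer than .0.png, so it extracts again
```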
I'm pretty new to GNU Make. This is a simplified version of something more general I have been trying to get working.
I have many input files with the same name format, *.txt, and a shell script that takes an input file and generates an output with the same name but a different extension, .wc. I have written the following Makefile.
# name of dependencies
SRC = $(wildcard *.txt)
# get name of targets (substitute .wc for .txt)
TAR = $(SRC:.txt=.wc)
all: $(TAR)
%.wc: %.txt
sh word_count.sh $<
This runs fine and generates all the .wc output files. However, if I modify one of the input (prerequisite) files, they are all rebuilt. So the question is: what is the best way to get GNU Make to process only the modified .txt files in the directory?
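Below is a self-contained check of the Makefile above. It assumes word_count.sh is replaced by wc -w, and that the recipe writes its result to exactly $@ (the .wc file make expects).

Under that assumption, modifying a.txt rebuilds only a.wc. One possible cause of full rebuilds, not confirmed by the question, is a script that writes its output under a different name than $@. In that case make never finds an up-to-date target.

```shell
#!/usr/bin/env bash
# Sketch: the %.wc pattern rule rebuilds only outputs older than their .txt.
set -eu
cd "$(mktemp -d)"

cat > Makefile <<'EOF'
# GNU make >= 3.82: recipe lines start with '>' instead of a tab.
.RECIPEPREFIX := >
SRC = $(wildcard *.txt)
TAR = $(SRC:.txt=.wc)

.PHONY: all
all: $(TAR)

# Stand-in for "sh word_count.sh $<": the output must be written to $@.
%.wc: %.txt
>wc -w < $< > $@
>echo "rebuilt $@" >> build.log
EOF

echo "one two" > a.txt
echo "three" > b.txt
make
sleep 1
echo "four five" >> a.txt     # modify only a.txt
: > build.log                 # only record what the second run rebuilds
make
cat build.log
```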
Before I start, I'll mention that I'm not using GNU Make in this case for building a C/C++ project.
Makefile:
DEST_DIR = build/
SRC_DIR = src/
$(SRC_DIR)a/ : $(SOMETHING_ELSE)
$(DO_SOMETHING_TO_GENERATE_A_DIR)
$(DEST_DIR)% : $(SRC_DIR)%
cp -r $^ $@
ALL_DEPS += <SOMETHING>
... more code which appends to ALL_DEPS ...
.PHONY: all
all : $(ALL_DEPS)
Some files in $(SRC_DIR) are not generated by Make rules. (For the sake of this example, let's say there's a directory $(SRC_DIR)b/ and a file $(SRC_DIR)c.)
I want to append to ALL_DEPS all targets which represent files or directories in $(DEST_DIR) so that "make all" will run all of the available $(DEST_DIR)% rules.
I thought to do something like this:
ALL_DEPS += $(addprefix $(DEST_DIR),$(notdir $(wildcard $(SRC_DIR)*)))
But of course, that doesn't catch anything that hasn't been made yet. It doesn't append $(DEST_DIR)a/ to the list, because $(SRC_DIR)a/ doesn't exist yet when $(wildcard ...) is evaluated, so it isn't included in the results.
So, rather than a function which finds all (currently-existing) files matching a pattern, I need one which finds all targets matching a pattern. Then, I could do something like this:
ALL_DEPS += $(addprefix $(DEST_DIR),$(notdir $(targetwildcard $(SRC_DIR)*)))
If it matters any, I've got much of the GNU Make code split across multiple files and included by a "master" Makefile. The ALL_DEPS variable is appended to in any of these files which has something to add to it. This is in an attempt to keep the build process modular as opposed to dropping it all in one monster Makefile.
I'm definitely still learning GNU Make, so it's not unlikely that I'm missing something fairly obvious. If I'm just going about this all wrong, please let me know.
Thanks!
It is simply not possible to do what you're trying to do; you're trying to get make to recognise something that doesn't exist.
This is part of the reason why wildcards are bad in general; the other reason is that you can end up including stuff you didn't mean to. The right thing to do here is to explicitly create a list of source files (ls -1 | sed -e 's/\(.*\)/sources+=\1/' > dir.mk) and perform the patsubst transformation on that list.
If you have additional files that are generated as part of the build, you can append them to that list and their rules will be found as you'd expect.
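The sketch below applies the explicit-list approach to a hypothetical layout: src/a.txt and src/b.txt are hand-written sources, and src/generated.txt is produced by the build. It works in three steps:

1. dir.mk is written with the same ls | sed command as above.
2. The generated file is appended by hand.
3. patsubst maps every source to its build/ counterpart, so the build/% pattern rule is found for all of them, including the file that doesn't exist yet.

$(sort ...) only removes duplicates, in case a later ls also picks up generated.txt.

```shell
#!/usr/bin/env bash
# Sketch: list sources explicitly in dir.mk, then map them to build/ with patsubst.
set -eu
cd "$(mktemp -d)"
mkdir src
echo a > src/a.txt
echo b > src/b.txt

cat > Makefile <<'EOF'
# GNU make >= 3.82: recipe lines start with '>' instead of a tab.
.RECIPEPREFIX := >
include dir.mk

# Files produced by the build are appended by hand so their rules are found.
sources += generated.txt

DEST := $(patsubst %,build/%,$(sort $(sources)))

.PHONY: all
all: $(DEST)

build/%: src/% | build
>cp $< $@

# Hypothetical generated source that does not exist before the build.
src/generated.txt:
>echo generated > $@

build:
>mkdir -p $@
EOF

# Same command as in the answer, run inside the source directory.
(cd src && ls -1) | sed -e 's/\(.*\)/sources+=\1/' > dir.mk
make
ls build
```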
I have some XML source files that need to be processed by a Ruby script to create generated C# files before my main target can be built. The script's start-up cost is much greater than the time it takes to process each file, so processing them one by one, as makefiles usually do, is quite inefficient. What I want is to collect them all together and pass them as a list to the script, which should run just before the main target is updated.
What I have now is something like:
_generated_/%.xml.cs : %.xml
#execute ruby script to generate .cs file
out.exe : a.cs b.cs _generated_/e.xml.cs ....
#compile .cs files
I came across the idea of using eval for this. If the files to be processed have the suffix .s and the script turns each one into a file with the suffix .t, my idea was to do this:
%.xml : _generated_/%.xml.cs
$(eval SOURCE_FILES += $<)
However, this rule won't trigger unless there is a shell command after the eval (echo will do). I guess that's because make knows that simply calling a function can't possibly produce a file. Another idea I had was to collect the list of files in a temporary file instead.
.INTERMEDIATE: source_list.txt
%.xml : _generated_/%.xml.cs
echo $< >> source_list.txt
While these will probably both work, I am wondering if there is a better way to do this.
Update:
What I ended up doing was something like the following. The # prefix on the eval call fools make into believing that a shell command is being executed.
_generated_/%.xml.cs : %.xml
# $(eval DIRTY_XML += $(<))
out.exe : a.cs b.cs _generated_/e.xml.cs ....
# Create generated cs files
# by running ruby script with DIRTY_XML as input
# Compile all .cs files
Use an empty file called, say, ruby-marker, to record that all of the XML files have been processed. Its modification time can be compared to those of the "x.s" files. Then use $? to select only the prerequisite "x.s" files that have changed since the last run of the Ruby script.
main-target: ruby-marker
whatever...
ruby-marker: foo.s bar.s baz.s
ruby-script $?
@touch $@
You could use $(filter) on $?, which is the list of prerequisites that are newer than the target.
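Here is a sketch of the marker-file pattern. It assumes ruby-script is replaced by a hypothetical echo that logs the batch it receives, and it uses .xml names instead of .s.

Makefile is added as a prerequisite only to show $(filter) removing non-XML entries from $?. After the first build, touching bar.xml leads to a single batched call containing just bar.xml.

```shell
#!/usr/bin/env bash
# Sketch: one batched script call per build, fed only the changed inputs via $?.
set -eu
cd "$(mktemp -d)"

cat > Makefile <<'EOF'
# GNU make >= 3.82: recipe lines start with '>' instead of a tab.
.RECIPEPREFIX := >
XML := foo.xml bar.xml baz.xml

main-target: ruby-marker
>echo linked > $@

# $? = prerequisites newer than ruby-marker; $(filter) keeps only the .xml ones
# (Makefile is listed just to show it being filtered out).
# The echo is a hypothetical stand-in for "ruby-script $?".
ruby-marker: $(XML) Makefile
>echo "batch: $(filter %.xml,$?)" >> build.log
>@touch $@
EOF

touch foo.xml bar.xml baz.xml
make
sleep 1
touch bar.xml
make
cat build.log
```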