Processing multiple files generated from single input - makefile

I have a data file that is processed by a script to produce multiple output files. Each of these output files is then processed further. Which files are created depends on the contents of the input file, so I can't list them explicitly. I can't quite figure out how to refer to the various files that are generated in a makefile.
Currently, I have something like this:
final.out: *.out2
	merge_files final.out $(sort $^)

%.out2: %.out1
	convert_files $?

%.out1: data.in
	extract_data data.in
This fails with No rule to make target '*.out2', needed by 'final.out'. I assume this is because the .out2 files don't exist yet and therefore the wildcard expression isn't replaced the way I would like it to. I have tried to use the wildcard function but that fails because the list of prerequisites ends up being empty.
Any pointers would be much appreciated.

EDIT: fixed the list of prerequisites in second pass.
You apparently cannot compute the list of intermediate files before running the extract_data command. In this case, one solution is to run make twice: a first pass to generate the *.out1 files and a second pass to finish the job. You can use an empty dummy file to record whether the extract_data command needs to be run again:
ifeq ($(FIRST_PASS_DONE),)
final.out: .dummy
	$(MAKE) FIRST_PASS_DONE=yes

.dummy: data.in
	extract_data $<
	touch $@   # create/refresh the empty marker file
else
OUT1 := $(wildcard *.out1)
OUT2 := $(patsubst %.out1,%.out2,$(OUT1))

final.out: $(OUT2)
	merge_files $@ $(sort $^)

%.out2: %.out1
	convert_files $?
endif
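To see what the second pass computes, the two variable assignments can be tried in isolation. This is only a sketch: the names a.out1 and b.out1 are hypothetical stand-ins for whatever extract_data produces, and the list is hard-coded here instead of coming from $(wildcard *.out1).

```make
# Hypothetical file names standing in for the result of $(wildcard *.out1).
OUT1 := a.out1 b.out1
# Replace the .out1 suffix with .out2 on every word of the list.
OUT2 := $(patsubst %.out1,%.out2,$(OUT1))

# $(info ...) prints while the makefile is parsed, before any recipe runs.
$(info OUT2 = $(OUT2))

.PHONY: show
show: ;
```

Running make on this file should print OUT2 = a.out2 b.out2 during parsing, which is the prerequisite list that final.out receives in the second pass.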

Unfortunately, your question is missing some details that I would ask about immediately if a software developer presented this makefile for review:
Does extract_data provide the list of files it generates?
Does convert_files convert one file or several? The example seems to imply that it converts several.
If so, I have to question the decision to split extract, convert and merge into separate rules, as you will not benefit from a parallel build anyway.
The following is the approach I would choose. I'm going to use a tar file as an example of an input file that results in multiple output files:
generate a makefile fragment for the sorted list of files
use the tar option v to print files while they are extracted
convert each line into a makefile variable assignment
include the fragment to define $(DATA_FILES)
if the fragment needs to be regenerated, make will restart after it has generated it
use static pattern rule for the conversion
use the converted file list as dependency for the final target
.PHONY: all
all: final.out

# extract files and create sorted list of files in $(DATA_FILES)
# note: 'set -o pipefail' needs a shell that supports it, such as bash
Makefile.data_files: data.tar
	set -o pipefail; tar xvf $< | sort | sed 's/^/DATA_FILES += /' >$@

DATA_FILES :=
include Makefile.data_files

CONVERTED_FILES := $(DATA_FILES:%.out1=%.out2)
$(CONVERTED_FILES): %.out2: %.out1
	convert_files $< >$@

final.out: $(CONVERTED_FILES)
	merge_files final.out $^
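For illustration, assume a hypothetical data.tar that contains only the members b.out1 and a.out1. tar prints one member per line, sort orders them, and sed prefixes each line, so the generated Makefile.data_files would look roughly like this:

```make
# Makefile.data_files (generated; the file names are hypothetical)
DATA_FILES += a.out1
DATA_FILES += b.out1
```

After the include, $(DATA_FILES) holds a.out1 b.out1 and $(CONVERTED_FILES) holds a.out2 b.out2, which is what the static pattern rule and final.out then use.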
UPDATE: if extract_data doesn't provide the list of files, you could modify my example like this. Of course, this relies on there being no other files matching *.out1 in your directory.
# extract files and create sorted list of files in $(DATA_FILES)
Makefile.data_files: data.in
	set -o pipefail; \
	extract_data $< && \
	(ls *.out1 | sort | sed 's/^/DATA_FILES += /') >$@

Related

Always process outermost file extension (and strip extensions along the way)

I have a bunch of different source files in my static HTML blog. The outermost extensions explain the format to be processed next.
Example: Source file article.html.md.gz (with target article.html) should be processed by gunzip, then by my markdown processor.
Further details:
The order of the extensions may vary
Sometimes an extension is not used (article.html.gz)
I know how to process all different extensions
I know that the final form is always article.html
Ideally I would have liked to just write rules as follows:
...
all-articles: $(ALL_HTML_FILES)

%: %.gz
	gunzip ...

%: %.md
	markdown ...

%: %.zip
	unzip ...
And let make figure out the path to take based on the sequence of extensions.
From the documentation however, I understand that there are constraints on match-all rules, and the above is not possible.
What's the best way forward? Can make handle this situation at all?
Extensions are made up examples. My actual source files make more sense :-)
I'm on holiday so I'll bite.
I'm not a fan of pattern rules; for my tastes they are too restricted and yet too arbitrary at the same time. You can achieve what you want quite nicely in pure make:
.DELETE_ON_ERROR:

all: # Default target

files := a.html.md.gz b.html.gz

cmds<.gz> = gzip -d <$< >$@
cmds<.md> = mdtool $< -o $@

define rule-text # 1:suffix 2:basename
$(if $(filter undefined,$(flavor cmds<$1>)),$(error Cannot handle $1 files: [$2$1]))
$2: $2$1 ; $(value cmds<$1>)
all: $2
endef

emit-rule = $(eval $(call rule-text,$1,$2))# 1:suffix 2:basename
emit-hierachy = $(if $(suffix $2),$(call emit-rule,$1,$2)$(call emit-hierachy,$(suffix $2),$(basename $2)))# 1:suffix 2:basename
emit-rules = $(foreach _,$1,$(call emit-hierachy,$(suffix $_),$(basename $_)))# 1:list of source files

$(call emit-rules,${files})

.PHONY: all
all: ; : $@ Success
The key here is to set $files to your list of files.
This list is then passed to emit-rules.
emit-rules passes each file one-at-a-time to emit-hierachy.
emit-hierachy strips off each extension in turn,
generates the appropriate make syntax, which it passes to $(eval …).
emit-hierachy carries on until the file has only one extension left.
Thus a.html.md.gz becomes this make syntax:
a.html.md: a.html.md.gz ; gzip -d <$< >$@
a.html: a.html.md ; mdtool $< -o $@
all: a.html
Similarly, b.html.gz becomes:
b.html: b.html.gz ; gzip -d <$< >$@
all: b.html
Neato, or what?
If you give emit-rules a file with an unrecognised extension (c.html.pp say),
you get a compile-time error:
1:20: *** Cannot handle .pp files: [c.html.pp]. Stop.
Compile-time? Yeah, before any shell commands are run.
You can tell make how to handle .pp files by defining cmds<.pp> :-)
For extra points it's also parallel safe. So you can use -j9 on your 8 CPU laptop, and -j33 on your 32 CPU workstation. Modern life eh?
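When debugging this kind of generated makefile, it can help to print the text instead of evaluating it. The following is a minimal sketch, not part of the answer's makefile: rule-text here is a simplified copy without the recipe and the error check, and show-rule is a hypothetical helper name.

```make
# 1:suffix 2:basename. Simplified copy of rule-text without the recipe.
define rule-text
$2: $2$1
all: $2
endef

# Print the text that would otherwise be handed to $(eval ...).
show-rule = $(info $(call rule-text,$1,$2))

$(call show-rule,.gz,a.html.md)

.PHONY: all
all: ;
```

While parsing, make should print the two generated lines, a.html.md: a.html.md.gz and all: a.html.md. Swapping $(eval …) for $(info …) in emit-rule in the same way shows everything the real makefile generates.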

one target for one dependency in makefile

I am trying to use make to generate thumbnails of photos by typing "make all". If the thumbnails have not been generated yet, make all should generate them; otherwise it should regenerate only the thumbnails of modified photos. For this I need one target (thumbnail) for each dependency (photo). My code looks like this:
input = pictures/*.jpg
output = $(subst pictures,thumbs,$(wildcard $(input)))

all : $(output)
	echo "Thumbnails generated !"

$(output) : $(input)
	echo "Converting ..."
	convert -thumbnail 100 $(subst thumbs,pictures,$@) $@
How can I modify it to get the desired result ?
Your problem is this line
$(output) : $(input)
The output variable is the list of every output file.
The input variable is the wildcard pattern.
This makes the wildcard pattern the prerequisite list of every output target, which means that if any picture changes, every output file is considered out of date and is rebuilt.
The fix for this is to either use a static pattern rule like this
$(output) : thumbs/% : pictures/%
which says to build all the files in $(output) by matching them against the pattern thumbs/% and using the part that matches % (called the stem) in the prerequisite pattern (pictures/%).
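Put together, a complete makefile using the static pattern rule might look like the sketch below. The order-only thumbs prerequisite that creates the output directory is an addition for illustration, not something from the question, and the sketch assumes pictures/ contains at least one .jpg file.

```make
input  := $(wildcard pictures/*.jpg)
output := $(patsubst pictures/%,thumbs/%,$(input))

.PHONY: all
all: $(output)
	@echo "Thumbnails generated!"

# Each thumbnail depends only on the photo with the same stem.
# 'thumbs' after the | is order-only: it must exist, but its timestamp
# never forces a thumbnail to be rebuilt.
$(output): thumbs/%: pictures/% | thumbs
	convert -thumbnail 100 $< $@

# Assumed helper rule: create the output directory if it is missing.
thumbs:
	mkdir -p $@
```

With this layout, $< is the single matching photo and $@ is its thumbnail, so the $(subst …) calls in the recipe are no longer needed.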
Alternatively, you could construct a set of specific input/output matches for each file with something like
infiles = $(wildcard pictures/*.jpg)
$(foreach file,$(infiles),$(eval $(subst pictures/,thumbs/,$(file)): $(file)))

$(output):
	echo "Converting ..."
	convert -thumbnail 100 $(subst thumbs,pictures,$@) $@
Which uses the eval function to create explicit thumbs/file.jpg: pictures/file.jpg target/prerequisite pairs for each input file.

How to write a (GNU)makefile with output different than the target?

I have script that takes in a filename and generates multiple files with same name but different extension. I want to write a makefile that depends on files generated with different extension but only specify the filename. I have a dummy example to explain it below.
test_output: test_input
	genereate.py -i $^ -o $@
The above makefile dependency generates multiple files with same filename but different extension, but won't generate the actual target. For example, it generates
test_output.a test_output.b test_output.c
The way it's written above is not efficient because no file named test_output is ever created, so the recipe runs every time even though the outputs already exist.
How would I write the makefile so that I can still name the target test_output, but make actually tracks the files that are generated, such as test_output.a or any of the others?
If you use GNU make (you didn't say) you can use pattern rules to tell make about a rule that generates multiple outputs based on a single stem. So for example you can write:
%.a %.b %.c : test_input
	genereate.py -i $^ -o $*
(it would work a lot better if the input filename was related to the output filenames with the same stem, but the above will work although you'll have to write a different one for each input file).
Typically that's what you want, so that other targets that need these outputs can depend on them.
If you really want to have a target without any extension as well, just create it:
test_output : test_output.a test_output.b test_output.c

%.a %.b %.c : test_input
	genereate.py -i $^ -o $*
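If the three files should stay explicit targets instead of coming from a pattern, GNU make 4.3 and later also offer grouped targets, written with &:. This is a different technique from the pattern rule in the answer. A minimal sketch, assuming make 4.3 or newer and that genereate.py takes the output base name via -o as in the question:

```make
# Grouped explicit targets (GNU make 4.3+): one run of the recipe
# is understood to update all three files.
test_output.a test_output.b test_output.c &: test_input
	genereate.py -i $< -o test_output

# Convenience alias so that 'make test_output' works; it creates no file.
.PHONY: test_output
test_output: test_output.a test_output.b test_output.c
```

On older versions of make the &: separator is not recognized, so the pattern-rule form remains the portable choice.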

Compress Makefile Intermediates (Two ways to create same target)

I have a Makefile I'm currently using for purposes other than compiling. Instead of deleting intermediate files, I'd like to keep them, but gzip them, and then later have Makefile detect that an intermediate file exists and instead of recomputing it, simply unzip it.
Let's suppose I have target target.txt that depends on an intermediate file called intermediate.txt, which itself depends on prereq.txt. So something like:
target.txt: intermediate.txt
intermediate.txt: prereq.txt
Now by default Make deletes the intermediate file, but we can disable that. Let's say that computing intermediate.txt takes a long time, so I'll disable its automatic deletion. But what if intermediate.txt is also very large, so that I'd like to compress it (gzip) to intermediate.txt.gz? Then, instead of recomputing the file, I'd like Make to unzip the existing zipped file, i.e. run gunzip intermediate.txt.gz.
The larger question I suppose I'm asking is I have two ways of making a target, based on two different dependencies. I'd like Make to execute the rule that has the prerequisite that exists, and ignore the other rule, but perhaps delete the zipped version and recompute it only if the prerequisite to the intermediate has a newer timestamp. Does anyone have any suggestions?
If you are using GNU Make, you can do this with pattern rules (tac is used to represent whatever processing you're doing):
%.txt: %.i.txt
	tac $^ > $@   # make .txt file the normal way
	gzip $^       # gzip the intermediate file

%.txt: %.i.txt.gz
	gunzip < $^ | tac > $@   # make .txt by streaming in the gzipped intermediate

%.i.txt: %.p.txt
	tac $^ > $@   # make the intermediate file from the prereq
This works for pattern rules because if the .i.txt file is not found, Make falls through to the next pattern and looks for the .i.txt.gz version. This does not work for explicit rules, because later rules simply replace earlier rules.
I would guess that you do NOT want to just uncompress the .gz version if prereq.txt is newer than the gzipped file. In that case, I would tend to just use shell tests to store off and restore the gzipped file and not get make directly involved:
target.txt: intermediate.txt
	...same as it ever was ....

intermediate.txt: prereq.txt
	if [ $@.gz -nt $< ]; then \
	    gunzip <$@.gz >$@; \
	else \
	    whatever >$@ <$<; \
	    gzip <$@ >$@.gz; \
	fi
where 'whatever' is the command that creates intermediate.txt from prereq.txt

echo for one target

I have a makefile that looks for .txt files in a directory and, for each file, echoes its name.
pchs := $(wildcard $(OUTPUT:%=%/*.txt))
txt: $(pchs)

%.txt:
	echo $@
But when I run it, make reports that there is nothing to be done for txt. Why?
EDIT1:
After some answers I understand what I should do with my makefile. Now it looks like this:
pchs := $(wildcard $(OUTPUT:%=%/*.txt))
.PHONY : $(pchs)
txt: $(pchs)

%.txt:
	@echo pch is '$<'
But .PHONY does not help; the result is the same.
Why does make say that there is nothing to do? Because make works out the dependencies of targets, which are usually files, and the "txt" target produces no file.
.PHONY is for targets that produce no file, like the clean target.
This here should work:
pchs := $(wildcard $(OUTPUT:%=%/*.txt))
.PHONY: txt
txt: $(pchs)
	echo $@
But, since you only echo the filename, I guess that you are post processing this output. Maybe you could formulate this post processing as a rule in the makefile?
Because makefiles define what you want to have built. And the .txt files already exist, so there is nothing to do.
To solve this there are a number of possibilities, but you should look into the .PHONY record if using gnu-make at least.
You can build fake-things out of the txt records and mark them as phony. But... it might just be easier to do this:
pchs := $(wildcard $(OUTPUT:%=%/*.txt))
txt:
	for i in $(pchs) ; do echo $$i ; done
That's because every .txt file you've listed in $(pchs) is up-to-date and Make decides to take no action.
Add them to .PHONY target to force rebuilding them every time you run Make:
.PHONY : $(pchs)
UPD.
Also check that the $(pchs) list is not empty, for example as follows:
txt : $(pchs)
	@echo pchs is '$^'
I would use Bash to determine the *.txt files, instead of Make:
txt:
	ls | grep -F '.txt'
You could also use this as a template to make a more general target, that echos any files that exist in the directory with a particular extension.
You may want the target to be PHONY, since it's not making a file.
