Make for identical workflow in separate directories

I'm using make to automate some of my data analysis. I have several directories, each containing a different realization of the data, which consists of several files representing the state of the data at a given time, like so:
├── a
│ ├── time_01.dat
│ ├── time_02.dat
│ ├── time_03.dat
│ └── ...
├── b
│ ├── time_01.dat
│ └── ...
├── c
│ ├── time_01.dat
│ └── ...
├── ...
The number of datafiles in each directory is unknown, and more can be added at any time. The files all have the same naming convention in each directory.
I want to use make to run the exact same set of recipes in each directory (to analyze each dataset separately and uniformly). In particular, there is one script that should run any time a new datafile is added and that creates an output file (analysis_time_XX.txt) for each datafile in the directory. This script does not update any files that have already been created, but it does create all the missing ones. Refactoring this script is not an option, unfortunately.
So I have one recipe creating many targets, yet it must run separately for each directory. The solutions I've found to create multiple targets with one recipe (e.g. here) do not work in my case, as I need one rule to do this separately for multiple sets of files in separate directories.
These intermediate files are needed for their own sake (as they help validate the data collected), but are also used to create a final comparison plot between the datasets.
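To make the "one run, many targets" behaviour concrete, here is a small hypothetical Python model of what the question says script.py does: each time_XX.dat maps to analysis_time_XX.txt in the same directory, and a run creates only the outputs that are still missing. The function names are illustrative assumptions; the real script.py is not shown.

```python
from __future__ import annotations

from pathlib import Path


def analysis_name(datafile: Path) -> Path:
    """Map e.g. a/time_01.dat to a/analysis_time_01.txt (same directory)."""
    return datafile.with_name("analysis_" + datafile.stem + ".txt")


def missing_outputs(directory: Path) -> list[Path]:
    """Analysis files that do not exist yet for the time_*.dat files in directory."""
    expected = (analysis_name(df) for df in sorted(directory.glob("time_*.dat")))
    return [out for out in expected if not out.exists()]


def emulate_script(directory: Path) -> list[Path]:
    """Hypothetical stand-in for `python script.py DIR`: create every missing
    output, leave existing ones untouched, and return what was created."""
    created = missing_outputs(directory)
    for out in created:
        out.write_text(f"analysis of {out.name}\n")
    return created
```

A single call can therefore satisfy several make targets at once, which is exactly what one rule per output file cannot express.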
My current setup is an ugly combination of functions and .SECONDEXPANSION, like this:
dirs = a b c
datafiles = $(foreach dir,$(dirs),$(wildcard $(dir)/*.dat))
df_to_analysis = $(subst .dat,.txt,$(subst time_,analysis_time_,$(1)))
analysis_to_df = $(subst .txt,.dat,$(subst analysis_time_,time_,$(1)))
analysis_files = $(foreach df,$(datafiles),$(call df_to_analysis,$(df)))

all: final_analysis_plot.png

.SECONDEXPANSION:
$(analysis_files): %: $$(call analysis_to_df,%)
	python script.py $(dir $@)

final_analysis_plot.png: $(analysis_files)
	python make_plot.py $(analysis_files)
Note that script.py creates all of the analysis_time_XX.txt files in the given directory. The flaw with this setup is that make does not know that a single run of the script generates all of these targets, so under parallel make it may start script.py several times for the same directory. For my application parallel make is a necessity: these scripts have a long runtime, and since the setup is "embarrassingly parallel," parallelization saves a lot of time.
Is there an elegant way to fix this issue? Or even an elegant way to clean up the code I have now? I've shown a simple example here, which already requires a good bit of setup, and doing this for several different scripts gets unwieldy quickly.

I think that in your case there's no need to bother with the .txt files as make targets. If script.py were nicer and could work per file, there would be value in writing individual file rules. Here, we need to introduce an intermediate .done file per directory instead.
DATA_DIRS := a b c
# A directory/.done.analysis file means that `script.py` was run here.
DONE_FILES := $(DATA_DIRS:%=%/.done.analysis)

all: final_analysis_plot.png

# .done.analysis depends on all the source data files.
# When a .dat file is added or changes, it will be newer than
# the .done.analysis file, and the analysis will be re-run.
# Secondary expansion is needed so that the stem (the directory, $*)
# is known when the wildcard is evaluated.
.SECONDEXPANSION:
$(DONE_FILES): %/.done.analysis: $$(wildcard $$*/*.dat)
	python script.py $(@D)
	touch $@

final_analysis_plot.png: $(DONE_FILES)
	python make_plot.py $(DATA_DIRS:%=%/analysis_time_*.txt)
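The rule above relies on make's timestamp comparison. As a rough sketch (Python, assuming the a/b/c layout from the question; the function names are hypothetical), this is the decision make takes for each DIR/.done.analysis stamp. Because every directory has its own stamp target, make -j can process the directories in parallel while running script.py at most once per directory.

```python
from __future__ import annotations

from pathlib import Path

STAMP = ".done.analysis"  # per-directory stamp name used in the makefile above


def stamp_is_stale(directory: Path) -> bool:
    """Would make rebuild DIR/.done.analysis from its DIR/*.dat prerequisites?

    True if the stamp is missing or any .dat file is strictly newer than it.
    """
    stamp = directory / STAMP
    if not stamp.exists():
        return True
    stamp_mtime = stamp.stat().st_mtime
    return any(df.stat().st_mtime > stamp_mtime for df in directory.glob("*.dat"))


def stale_dirs(dirs: list[Path]) -> list[Path]:
    """Directories whose analysis would run; each has its own stamp, so they
    are independent of one another."""
    return [d for d in dirs if stamp_is_stale(d)]
```

Like make, this compares modification times only, so a newly copied .dat file that keeps a timestamp older than the stamp would not trigger a rerun.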

Related

Is there a command to generate all .mo files from multiple .po files?

I have multiple .po files in a standard directory structure of GNU gettext:
locales/
├── en_US
│   └── LC_MESSAGES
│       └── myapp.po
└── zh_TW
    └── LC_MESSAGES
        └── myapp.po
I know that I could write a script that uses msgfmt to generate .mo files from these .po files, something like:
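# generate-mo-files.sh
for PO_FILE in locales/*/LC_MESSAGES/*.po
do
    # Strip the trailing .po and append .mo (portable POSIX sh;
    # the ${VAR/pattern/replacement} form is a bash extension).
    MO_FILE="${PO_FILE%.po}.mo"
    msgfmt -o "$MO_FILE" "$PO_FILE"
done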
However, writing this script for every project I work on is a bit tedious. Is there a ready-made script for such a use case?
If you are using GNU autotools, you can use the script gettextize to prepare your project for gettext. Other build systems have similar special tools.
Otherwise, your little script is exactly the right solution.
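If the same loop is wanted from Python (for example in a project that already drives its build from Python scripts), a minimal sketch could look like the following. It assumes msgfmt is on PATH and, unlike the shell loop, skips .mo files that are already at least as new as their .po source, which is the check make would perform. The function names are hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def po_mo_pairs(root: Path) -> list[tuple[Path, Path]]:
    """Pair every ROOT/*/LC_MESSAGES/*.po with the .mo path next to it."""
    return [(po, po.with_suffix(".mo")) for po in sorted(root.glob("*/LC_MESSAGES/*.po"))]


def compile_catalogs(root: Path, dry_run: bool = False) -> list[Path]:
    """Run msgfmt for each .po whose .mo is missing or older; return the .mo paths built."""
    built = []
    for po, mo in po_mo_pairs(root):
        if mo.exists() and mo.stat().st_mtime >= po.stat().st_mtime:
            continue  # up to date, as make would decide
        if not dry_run:
            subprocess.run(["msgfmt", "-o", str(mo), str(po)], check=True)
        built.append(mo)
    return built
```

Calling compile_catalogs(Path("locales")) from the project root would rebuild only the stale catalogs; dry_run=True just reports what would be built.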

Custom targets using patterns in Makefile

I have a tree of files that looks like this:
.
├── Makefile
├── README.md
├── exercises
│   ├── 100-exercises.ipynb
│   ├── 200-exercises.ipynb
│   ├── 300-exercises.ipynb
│   └── 400-exercises.ipynb
├── notes
│   ├── 101-notes-pandas.ipynb
│   ├── 102-notes-matplotlib-1.ipynb
│   ├── 103-notes-numpy-scipy.ipynb
│   └── 104-notes-matplotlib-seaborn.ipynb
└── tasks
    ├── 101-tasks-pandas.ipynb
    ├── 102-tasks-matplotlib-1.ipynb
    ├── 103-tasks-numpy-scipy.ipynb
    └── 104-tasks-matplotlib-seaborn.ipynb
I would like to add some targets that only operate on files matching patterns in their filenames. For example:
make lecture-1
make lecture-1-notes
make lecture-1-exercises
make lecture-2
make notes
make exercises
...
etc.
where lecture-1 refers to the set of targets whose filename begins with a 1, e.g. tasks/101-tasks-pandas.ipynb. To be clear, the patterns are:
notes -> ./notes/*
exercises -> ./exercises/*
tasks -> ./tasks/*
lecture-1 -> ./*/1[0-9][0-9]*.ipynb
lecture-2 -> ./*/2[0-9][0-9]*.ipynb
lecture-1-notes -> ./notes/1[0-9][0-9]*.ipynb
The long way would be to have a separate target for each but I feel like there must be some kind of pattern/regex matching that can be done to avoid this.
EDIT:
For more information on the operations done on each target: I have an executable command which basically converts the IPython notebook to HTML. This is stored in a make variable called RENDER_HTML.
For example, at the moment, to render everything in the notes folder I have the following sections in my Makefile:
RENDER_HTML=jupyter nbconvert --execute --to html
NOTES_TARGETS=$(wildcard ./notes/*.ipynb)
...
.PHONY: notes
notes: ${NOTES_TARGETS}
	@mkdir -p $@/html/
	${RENDER_HTML} $^
	@mv $@/*.html $@/html/
It would be a lot easier for us to help you if you provided an example of what kinds of rules you want and what they would do: maybe implement the rule for one of these by hand so it can serve as an example.
Without having any idea what the targets and prerequisites are, what I suggest is that you use recursive make to compute a list of targets to build; something like this:
lecture-%:
	$(MAKE) $(patsubst ???,???,$(wildcard */$(*)*))
I just used ??? here since you don't provide any information on how the source files are translated to targets: you'll have to do that part yourself :).
If you can define a rule that builds a single output file, like this:
notes/%:
	@mkdir -p $(@D)/html/
	${RENDER_HTML} $@.ipynb
	@mv $(@D)/*.html $(@D)/html/
then you can do this:
lecture-%:
	$(MAKE) $(patsubst %.ipynb,%,$(wildcard */$(*)*))
The mv command there confuses me somewhat (it seems like there should be a better way to do that), but it's what you have in your question, so I guess it's right.
I'm suspicious that this won't work for you, depending on the answer to my question above. If the render command needs to see ALL the files (for example to build an index.html or something) then I don't quite understand how you want this to work, when it builds only some of the files. Basically, the problem is still under-specified to allow us to give a working solution. But maybe there's enough info here to get you started.
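Whatever rule form is chosen, the core step is translating a target name into the file pattern listed in the question. A hypothetical Python model of that translation (the names and the regex are assumptions, not part of either post) makes the mapping explicit and easy to test:

```python
from __future__ import annotations

import fnmatch
import re

FOLDERS = ("notes", "exercises", "tasks")


def target_to_glob(target: str) -> str:
    """Translate a target name into the glob pattern from the question."""
    if target in FOLDERS:
        return f"{target}/*"  # notes -> notes/*
    m = re.fullmatch(r"lecture-(\d)(?:-(notes|exercises|tasks))?", target)
    if m is None:
        raise ValueError(f"unknown target: {target}")
    folder = m.group(2) or "*"  # lecture-1 -> any folder, lecture-1-notes -> notes
    return f"{folder}/{m.group(1)}[0-9][0-9]*.ipynb"


def select(target: str, files: list[str]) -> list[str]:
    """Files (relative paths) that the given target should operate on."""
    pattern = target_to_glob(target)
    return [f for f in files if fnmatch.fnmatchcase(f, pattern)]
```

Note that fnmatch's `*` also matches `/`, unlike a shell or make glob; that makes no difference for this flat tree.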

How do you make a makefile target depend on a file with the same name as the target file's directory?

Suppose you have the following project structure:
.
├── Makefile
└── src
    └── 1.py
The program 1.py creates multiple (0, 1, or more) files in the directory build/1. This generalizes to arbitrary numbers, i.e. a program x.py where x is some natural number would create multiple files in the directory build/x. The project can consist of many python(3) files.
A makefile for the specific scenario above could look like this:
PYTHON_FILES := $(shell find src -name '*.py')
TXT_FILES := build/1/test.txt

.PHONY: clean all
all: $(TXT_FILES)

build/1/test.txt: src/1.py
	mkdir -p build/1
	touch build/1/test.txt # emulates: python3 src/1.py
	echo "success!"

clean:
	rm -rf build
Running make with the above project structure and makefile results in the following project structure:
.
├── Makefile
├── build
│   └── 1
│       └── test.txt
└── src
    └── 1.py
How do I generalize the rule head build/1/test.txt: src/1.py to handle projects with any number of python programs (or, equivalently, build subdirectories) and any number of output files per python program?
You can generalize the existing rule to work on ANY Python code in src: use '%' in the pattern rule, and use '$*' to refer to the number in the recipe.
The rule will re-run the test whenever the Python test is modified. It will record "success" only if the Python test indicates completion without an error.
Update 2019-11-24: Generalized the test to handle N tests, each generating multiple files, with rebuild.
Note 1: Make needs a way to know whether the Python test passed without ANY failure. The solution assumes that the Python code exits with a non-zero code on failure, or that there is another way to tell whether all tests have passed.
Note 2: The done file captures the list of output files generated in the folder (excluding task.done itself). If needed, this can be used in a separate target to verify that NO output file was removed, compensating for the lack of explicit targets for the files generated by the process.
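# List the tasks by hand (e.g. TASK_LIST = 1 2 3 4) or derive them from src/*.py:
TASK_LIST := $(patsubst src/%.py,%,$(wildcard src/*.py))
all: ${TASK_LIST:%=build/%/task.done}
build/%/task.done: src/%.py
	mkdir -p build/$*
	touch build/$*/test.txt # emulates: python3 src/$*.py
	# Run script src/$*.py; it should exit non-zero on failure.
	# `|| true` keeps an empty listing (zero output files) from failing grep.
	{ ls build/$* | grep -xv "$(@F)" || true; } > $@
	touch $@ # Mark "success!"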
GNU Make documentation: https://www.gnu.org/software/make/manual/html_node/Automatic-Variables.html
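Note 2 mentions using the recorded listing in a separate target to check that no output file was removed. A minimal Python sketch of that check, assuming task.done holds one filename per line as written by the `ls | grep -xv` recipe line (the function name is hypothetical):

```python
from __future__ import annotations

from pathlib import Path


def missing_recorded_outputs(task_dir: Path, done_name: str = "task.done") -> list[str]:
    """Return output files listed in DONE_NAME that no longer exist in task_dir."""
    text = (task_dir / done_name).read_text()
    recorded = [line for line in text.splitlines() if line]  # one filename per line
    return [name for name in recorded if not (task_dir / name).exists()]
```

An empty result means every output recorded at build time is still present, e.g. missing_recorded_outputs(Path("build/1")).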

Is there a way for having multiple wildcards?

I am trying to use GNU make to organize my research data, processing, and visualization, as recommended by the Cookiecutter Data Science project. My raw data is structured like this:
.
├── data
│   ├── interim
│   │   └── cleaned
│   └── raw
│       ├── ex01
│       └── ex02
I keep the data of experiments 1 and 2 separate but combine them after cleaning. E.g. data/raw/ex01/p0-c0.csv becomes data/interim/cleaned/ex01-p0-c0.hdf.
In make I use two rules like this:
data/interim/cleaned/ex01-%.hdf: data/raw/ex01/source0/%.csv \
                                 data/raw/ex01/source1/%.csv
	$(PYTHON) src/data/make_dataset.py $^ $@

data_interim_cleaned_ex01: $(addprefix $(CLEANED_DIR)/ex01-, $(addsuffix .hdf, $(basename $(basename $(notdir $(wildcard data/raw/ex01/source0/*.csv))))))
This strikes me as oddly verbose (especially because I copied the block for experiment 2), and my intuition tells me that it would be easier if there were multiple (named) wildcards. I guess regexps would help, but they are not (easily) available in make.
Is there a canonical way to solve this?
The following solution isn't really a canonical makefile, but IMHO much of make's canonical functionality is too hard to grasp and remember anyway. Questions like "how can I transform my set of filenames from shape X to Y" come up all the time, because users do employ directory and filename structure as a means to organize their projects (a very natural and logical way), and make is really badly equipped to handle such tasks programmatically.
One way is to use the usual command-line tools such as sed; another is to use a helper library such as gmtt to take strings apart:
include gmtt-master/gmtt.mk
COMMON_ROOT = data/raw
COMMON_DEST = data/interim/cleaned
SOURCE = data/raw/ex01/p0-c0.csv data/raw/ex01/p1-c1.csv data/raw/ex02/p0-c0.csv data/raw/ex02/p1-c1.csv
# a pattern which separates a string into 5 parts (see below)
SEP_PATTERN = $(COMMON_ROOT)/ex*/*.csv
# use the elements (quoted variable-references '$$'!) in the new filename
OUTPUT_PATTERN = $(COMMON_DEST)/ex$$2-$$4.hdf
# glob-match tests a glob pattern on a string and returns the string cut up at the border of
# the glob elements (*,?,[] and verbatim strings). We immediately turn this into a gmtt table
# by prepending the number of columns (5) to it:
SEPARATED = 5 $(foreach fname,$(SOURCE),$(call glob-match,$(fname),$(SEP_PATTERN)))
$(info $(SEPARATED))
$(info -----------------)
$(info $(call map-tbl,$(SEPARATED),$(OUTPUT_PATTERN)$$(newline)))
Output:
$ make
5 data/raw/ex 01 / p0-c0 .csv data/raw/ex 01 / p1-c1 .csv data/raw/ex 02 / p0-c0 .csv data/raw/ex 02 / p1-c1 .csv
-----------------
data/interim/cleaned/ex01-p0-c0.hdf
data/interim/cleaned/ex01-p1-c1.hdf
data/interim/cleaned/ex02-p0-c0.hdf
data/interim/cleaned/ex02-p1-c1.hdf
make: *** No targets.  Stop.
I fear that turning the makefile into one which generates rules dynamically is inevitable, though.
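For comparison, the "multiple named wildcards" the question asks about are straightforward outside make. This hypothetical Python helper uses a regular expression with two named groups (experiment number and file stem) to map the example path from the question. It ignores the source0/source1 subdirectories used in the rule, and a script like this could be used to generate the target lists or rules that make then consumes.

```python
from __future__ import annotations

import re

# Two named "wildcards": the experiment number and the file stem.
RAW = re.compile(r"data/raw/ex(?P<ex>\d+)/(?P<stem>[^/]+)\.csv")
CLEANED = "data/interim/cleaned/ex{ex}-{stem}.hdf"


def cleaned_name(raw_path: str) -> str:
    """Map data/raw/exNN/STEM.csv to data/interim/cleaned/exNN-STEM.hdf."""
    m = RAW.fullmatch(raw_path)
    if m is None:
        raise ValueError(f"not a raw data file: {raw_path}")
    return CLEANED.format(**m.groupdict())
```

Applied to the four SOURCE paths from the gmtt example, this yields the same four .hdf names as the gmtt output above.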
The answer is perhaps not one you are going to like, but it is to not introduce variability or repetition in your file names. There are easy, or at least reasonable, ways to articulate relationships in a Makefile between stem names where you add or remove a prefix (such as a directory name) or a suffix. Anything else creates complications where you end up with tortured and complex transformation rules or external helper scripts to manage the mappings, or, in the worst case, a situation where you simply have to abandon make for dependency management.
One workaround which sort of allows you to keep your cake and eat it too is to set up symlinks between your preferred, human-friendly naming conventions and the structures managed by make; but this is a crutch at best.
Another technique which may be useful to you is to touch a simple flag file to mark a complex set of dependencies as handled. Especially if there are dependencies which do not map directly to a set of input file names for another target, putting all of those behind a simple
.input-files-done: some complex dependencies
	touch $@
and then just depending on .input-files-done for the targets which share these dependencies can simplify your Makefile and your workflow.
But in summary, my main recommendation would be to keep file names uniform, so that you can always declare an explicit dependency from one file name to another with a simple rule.

Make starts in wrong directory under FreeBSD

I have a very simple Makefile that just shells out to another Makefile:
all:
	cd src && make all
My directory structure (the Makefile is in the top-level directory):
[I] mqudsi@php ~/bbcp> tree -d
.
├── bin
│   └── FreeBSD
├── obj
│   └── FreeBSD
├── src
└── utils
This works just fine under Linux, but under FreeBSD, it gives an error about src not being found.
To debug, I updated the Makefile command to pwd; cd src && make all and I discovered that somehow when I run make in the top-level directory, it is being executed under ./obj instead, meaning it's looking for ./obj/src/ to cd into.
Aside from the fact that I have no clue why it's doing that, I presumed that calling gmake instead of make under FreeBSD would surely take care of it, but that wasn't the case (and I'm relieved, because I can't believe there is that big a difference between BSD make and GNU make in terms of core operation).
The odd thing is, deleting obj makes everything work perfectly. So in the presence of an obj directory, make cds into ./obj first; otherwise it executes as you'd expect it to.
Answering my own question here.
From the FreeBSD make man page:
.OBJDIR     A path to the directory where the targets are built. Its
            value is determined by trying to chdir(2) to the following
            directories in order and using the first match:
            1.   ${MAKEOBJDIRPREFIX}${.CURDIR}
                 (Only if `MAKEOBJDIRPREFIX' is set in the environment
                 or on the command line.)
            2.   ${MAKEOBJDIR}
                 (Only if `MAKEOBJDIR' is set in the environment or
                 on the command line.)
            3.   ${.CURDIR}/obj.${MACHINE}
            4.   ${.CURDIR}/obj
            5.   /usr/obj/${.CURDIR}
            6.   ${.CURDIR}
            Variable expansion is performed on the value before it's
            used, so expressions such as
                  ${.CURDIR:S,^/usr/src,/var/obj,}
            may be used. This is especially useful with
            `MAKEOBJDIR'.
            `.OBJDIR' may be modified in the makefile via the special
            target `.OBJDIR'. In all cases, make will chdir(2) to
            the specified directory if it exists, and set `.OBJDIR'
            and `PWD' to that directory before executing any targets.
The key part is:
In all cases, make will chdir(2) to the specified directory if it exists, and set `.OBJDIR' and `PWD' to that directory before executing any targets.
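The search order quoted above explains the observed behaviour: with ./obj present, candidate 4 matches before the final fallback to ${.CURDIR}. A simplified Python model of that lookup (it ignores variable modifiers such as :S and the .OBJDIR special target; the function name is hypothetical):

```python
from __future__ import annotations

import os


def resolve_objdir(curdir: str, env: dict[str, str], machine: str) -> str:
    """Return the first existing directory from the .OBJDIR search order."""
    candidates = []
    if "MAKEOBJDIRPREFIX" in env:
        candidates.append(env["MAKEOBJDIRPREFIX"] + curdir)  # 1. prefix + .CURDIR
    if "MAKEOBJDIR" in env:
        candidates.append(env["MAKEOBJDIR"])  # 2.
    candidates += [
        f"{curdir}/obj.{machine}",  # 3.
        f"{curdir}/obj",  # 4. the case hit in the question
        f"/usr/obj/{curdir}",  # 5.
        curdir,  # 6. always exists, so it is the fallback
    ]
    for path in candidates:
        if os.path.isdir(path):
            return path
    return curdir
```

Deleting obj, as the question observed, makes the lookup fall through to ${.CURDIR}, so the recipes run in the top-level directory again.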
By contrast, the GNU make manual page makes no such reference to any sort of automatic determination of OBJDIR, only that it will be used if it is set.
The solution was to override the OBJDIR variable via the pseudotarget .OBJDIR:
.OBJDIR: ./

all:
	cd src && make

clean:
	cd src && make clean
An alternative solution is to prefix the paths in the cd commands with ${.CURDIR} (e.g. cd ${.CURDIR}/src), since .CURDIR isn't modified by the chdir into the object directory.
I don't get why gmake behaved the same way, however. That feels almost like a bug to me.
