I try to use GNU make to organize my research data, processing and visualization as recommended by the Data Science CookieCutter project. My raw data is structured like this:
.
├── data
│ ├── interim
│ │ └── cleaned
│ └── raw
│ ├── ex01
│ └── ex02
I keep the data of experiments 1 and 2 separated but combine them after cleaning. E.g. data/raw/ex01/p0-c0.csv becomes data/interim/cleaned/ex01-p0-c0.hdf.
In make I use two rules like this:
data/interim/cleaned/ex01-%.hdf: data/raw/ex01/source0/%.csv \
                                 data/raw/ex01/source1/%.csv
	$(PYTHON) src/data/make_dataset.py $^ $@
data_interim_cleaned_ex01: $(addprefix $(CLEANED_DIR)/ex01-, $(addsuffix .hdf, $(basename $(basename $(notdir $(wildcard data/raw/ex01/source0/*.csv))))))
This strikes me as oddly verbose (especially because I copied the block for experiment 2), and my intuition tells me that it would be easier if there were multiple (named) wildcards. I guess regexps would help, but they are not (easily) available in make.
Is there a canonical way to solve this?
The following solution isn't really a canonical make file but IMHO much of the canonical functionality of make is too hard to grasp and remember anyway. Questions like "how can I transform my set of filenames from shape X to Y" come up all the time because users do employ directory and filename structure as means to organize their projects (a very natural and logical way) and make is really badly equipped to handle such tasks programmatically.
One way is to use the usual command-line tools such as sed; another is a helper library such as gmtt, which can take strings apart:
include gmtt-master/gmtt.mk
COMMON_ROOT = data/raw
COMMON_DEST = data/interim/cleaned
SOURCE = data/raw/ex01/p0-c0.csv data/raw/ex01/p1-c1.csv data/raw/ex02/p0-c0.csv data/raw/ex02/p1-c1.csv
# a pattern which separates a string into 5 parts (see below)
SEP_PATTERN = $(COMMON_ROOT)/ex*/*.csv
# use the elements (quoted variable-references '$$'!) in the new filename
OUTPUT_PATTERN = $(COMMON_DEST)/ex$$2-$$4.hdf
# glob-match tests a glob pattern on a string and returns the string cut up at the border of
# the glob elements (*,?,[] and verbatim strings). We immediately turn this into a gmtt table
# by prepending the number of columns (5) to it:
SEPARATED = 5 $(foreach fname,$(SOURCE),$(call glob-match,$(fname),$(SEP_PATTERN)))
$(info $(SEPARATED))
$(info -----------------)
$(info $(call map-tbl,$(SEPARATED),$(OUTPUT_PATTERN)$$(newline)))
Output:
$ make
5 data/raw/ex 01 / p0-c0 .csv data/raw/ex 01 / p1-c1 .csv data/raw/ex 02 / p0-c0 .csv data/raw/ex 02 / p1-c1 .csv
-----------------
data/interim/cleaned/ex01-p0-c0.hdf
data/interim/cleaned/ex01-p1-c1.hdf
data/interim/cleaned/ex02-p0-c0.hdf
data/interim/cleaned/ex02-p1-c1.hdf
make: *** No targets.  Stop.
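For this particular naming scheme the same list can also be produced with built-in functions only. The following is a minimal sketch, not part of gmtt, and it assumes exactly one directory level below data/raw, because every remaining "/" is turned into "-":

```makefile
# File list copied from the example above.
SOURCE := data/raw/ex01/p0-c0.csv data/raw/ex01/p1-c1.csv \
          data/raw/ex02/p0-c0.csv data/raw/ex02/p1-c1.csv

# 1. strip the common root:        data/raw/ex01/p0-c0.csv -> ex01/p0-c0.csv
# 2. turn the remaining "/" into "-":                      -> ex01-p0-c0.csv
# 3. swap the suffix and prepend the destination directory
RELATIVE := $(patsubst data/raw/%,%,$(SOURCE))
FLAT     := $(subst /,-,$(RELATIVE))
CLEANED  := $(patsubst %.csv,data/interim/cleaned/%.hdf,$(FLAT))

$(info $(CLEANED))

# Dummy target so that make has something to do.
all: ; @:
```

The variable names RELATIVE, FLAT and CLEANED are only illustrative. This gives the list of target names, but it does not by itself tell make which source belongs to which target.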
I fear that turning the makefile into one which generates rules dynamically is inevitable, though.
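A rough sketch of what such rule generation could look like with define, call and eval is shown below. It assumes the layout from the question's example (data/raw/<experiment>/<name>.csv becoming data/interim/cleaned/<experiment>-<name>.hdf); the names CLEAN_RULE, EXPERIMENTS, RAW_DIR and the cleaned target are made up for illustration:

```makefile
PYTHON      ?= python
RAW_DIR     := data/raw
CLEANED_DIR := data/interim/cleaned
EXPERIMENTS := ex01 ex02

# Template for one experiment; $(1) is the experiment name.
# "$$" defers expansion until the generated text is evaluated.
define CLEAN_RULE
CLEANED_$(1) := $$(patsubst $(RAW_DIR)/$(1)/%.csv,$(CLEANED_DIR)/$(1)-%.hdf,$$(wildcard $(RAW_DIR)/$(1)/*.csv))

$(CLEANED_DIR)/$(1)-%.hdf: $(RAW_DIR)/$(1)/%.csv
	$$(PYTHON) src/data/make_dataset.py $$^ $$@

.PHONY: data_interim_cleaned_$(1)
data_interim_cleaned_$(1): $$(CLEANED_$(1))
endef

# Instantiate the template once per experiment.
$(foreach ex,$(EXPERIMENTS),$(eval $(call CLEAN_RULE,$(ex))))

.PHONY: cleaned
cleaned: $(addprefix data_interim_cleaned_,$(EXPERIMENTS))
```

Adding a third experiment then only means extending EXPERIMENTS, at the price of the doubled dollar signs inside the template.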
The answer is perhaps not one you are going to like, but it is to not introduce variability or repetition in your file names. There are easy or at least reasonable ways to articulate relationships in a Makefile between stem names where you add or remove a prefix (such as a directory name) or a suffix. Anything else creates complications where you end up with tortured and complex transformation rules or external helper scripts to manage the mappings, or, in the worst case, a situation where you simply have to abandon make for dependency management.
One workaround which sort of allows you to keep your cake and eat it too is to set up symlinks between your preferred, human-friendly naming conventions and the structures managed by make; but this is a crutch at best.
Another technique which may be useful to you is to touch a simple flag file to mark a complex set of dependencies as handled. Especially if there are dependencies which do not map directly to a set of input file names for another target, putting all of those behind a simple
.input-files-done: some complex dependencies
	touch $@
and then just depending on .input-files-done for the targets which share these dependencies can simplify your Makefile and your workflow.
But in summary, my main recommendation would be to keep file names uniform, so that you can always declare an explicit dependency from one file name to another with a simple rule.
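As a slightly fuller sketch of the flag-file idea, assuming that the inputs are all CSV files one level below data/raw and that report.txt stands in for whatever target consumes them (both are placeholders, not taken from the question):

```makefile
# Hypothetical inputs: every CSV in any experiment directory under data/raw.
INPUTS := $(wildcard data/raw/*/*.csv)

# The flag file is newer than every input once this step has run.
# $? lists the prerequisites that are newer than the flag file.
.input-files-done: $(INPUTS)
	@echo "changed inputs: $?"
	touch $@

# Downstream targets depend on the flag file only, not on each input.
report.txt: .input-files-done
	@echo "inputs are up to date" > $@
```

The real processing command would go before the touch, so that the flag file is only written after that command has succeeded.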
I am trying to build a framework which is supposed to apply similar operations to different designs/projects. Therefore, I have a general Makefile which defines general targets used for most of the operations. The idea is then that each design has its own main Makefile. This main Makefile includes the general Makefile for the general functionality, defines some variables for some basic configuration of the general Makefile, but can also extend or override variables from the general Makefile or define new targets or override targets when they are not applicable.
So the simplified directory structure looks something like this:
<Root Dir>
| -- targets.mk
| -- design1
| -- Makefile
| -- design2
| -- Makefile
The simplified general Makefile targets.mk looks something like this
${FF_LIST}: ${SRC_FILES}
	@echo "Extract FF List for ${DESIGN_NAME}"
.PHONY: get_ff_list
get_ff_list: ${FF_LIST}
	@echo "Get FF list for ${DESIGN_NAME} from ${FF_LIST}"
And the simplified design specific Makefile looks something like this:
include ../targets.mk
DESIGN_NAME = design1
FF_LIST = ./misc/ff_list.csv
With this implementation, I have the problem now, when calling the target get_ff_list within the design1 directory, that the recipe for the get_ff_list target is executed but the prerequisites are not, although the echo prints the right file.
user:/tmp/make_test/design1$ make get_ff_list
Get FF list for design1 from ./misc/ff_list.csv
It seems that the target ${FF_LIST} is not expanded correctly. I can understand that this variable does not exist yet at the time I include the targets.mk Makefile. However, my understanding of Makefile's recursive variable declaration with = is that the variable should be expanded every time it is used (as it is done and seems to work within the recipe itself).
I could include the targets.mk Makefile at the end after the configuration/setting the variables, like:
DESIGN_NAME = design1
FF_LIST = ./misc/ff_list.csv
include ../targets.mk
This seems to work and solves this particular issue. However, when I also want to extend or override variables/targets from the general Makefile, it becomes a bit less obvious where to include it, especially if I am not the only one using the framework and other users create their own new designs.
Maybe this is even not a good way to use Makefiles to begin with. I would also be happy to get suggestions of better ways to implement this.
"However, my understanding of Makefile's recursive variable declaration with = should expand the variable every time the variable is used (as it is done and seems to work within the recipe itself)."
No. Read the section of the manual on How make Reads a Makefile to understand when variables are expanded immediately, and when the expansion is deferred.
The simplest way to do what you want is for the include targets.mk to come at the end of the Makefile, not at the beginning. If that's not feasible then you'll have to split the main makefile into two parts, one that sets variables and is included first, and the other that defines rules and is included last.
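The difference between immediate and deferred expansion can be seen in a small stand-alone makefile. This is only an illustration with made-up names (early, needed, LATE), not part of the framework in the question:

```makefile
# Target and prerequisite lists are expanded while the makefile is read.
# LATE is still empty here, so "early" ends up with no prerequisites.
early: $(LATE)
	@echo "recipe sees LATE='$(LATE)'"

# Recipes are expanded only when they run, so the recipe above does see
# this value even though the prerequisite list did not.
LATE := needed

needed:
	@echo "building needed"
```

Running make early with this file executes only the recipe of early; needed is never built, which mirrors what happens with ${FF_LIST} when targets.mk is included before the variable is set.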
I'm using make to automate some of my data analysis. I have several directories, each containing a different realization of the data, which consists of several files representing the state of the data at a given time, like so:
├── a
│ ├── time_01.dat
│ ├── time_02.dat
│ ├── time_03.dat
│ └── ...
├── b
│ ├── time_01.dat
│ └── ...
├── c
│ ├── time_01.dat
│ └── ...
├── ...
The number of datafiles in each directory is unknown, and more can be added at any time. The files all have the same naming convention in each directory.
I want to use make to run the exact same set of recipes in each directory (to analyze each dataset separately and uniformly). In particular, there is one script that should run any time a new datafile is added, and creates an output file (analysis_time_XX.txt) for each datafile in the directory. This script does not update any files that have been previously created, but does create all the missing files. Refactoring this script is not a possibility, unfortunately.
So I have one recipe creating many targets, yet it must run separately for each directory. The solutions I've found to create multiple targets with one recipe (e.g. here) do not work in my case, as I need one rule to do this separately for multiple sets of files in separate directories.
These intermediate files are needed for their own sake (as they help validate the data collected), but are also used to create a final comparison plot between the datasets.
My current setup is an ugly combination of functions and .SECONDEXPANSION
dirs = a b c
datafiles = $(foreach dir,$(dirs),$(wildcard $(dir)/*.dat))
df_to_analysis = $(subst .dat,.txt,$(subst time_,analysis_time_,$(1)))
analysis_to_df = $(subst .txt,.dat,$(subst analysis_time_,time_,$(1)))
analysis_files = $(foreach df,$(datafiles),$(call df_to_analysis,$(df)))
all: final_analysis_plot.png
.SECONDEXPANSION:
$(analysis_files): %: $$(call analysis_to_df,%)
	python script.py $(dir $@)
final_analysis_plot.png: $(analysis_files)
python make_plot.py $(analysis_files)
Note that script.py creates all of the analysis_time_XX.txt files in the given directory. The flaw with this setup is that make does not know that the first script generates all the targets, and so runs unnecessarily when parallel make is used. For my application parallel make is a necessity, as these scripts have a long runtime, and parallelization saves a lot of time as the setup is "embarrassingly parallel."
Is there an elegant way to fix this issue? Or even an elegant way to clean up the code I have now? I've shown a simple example here, which already requires a good bit of setup, and doing this for several different scripts gets unwieldy quickly.
I think in your case there's no need to bother with rules for the individual .txt files. If script.py were nicer and could work per file, there would be value in writing individual file rules. In this case, we need to introduce intermediate per-directory .done files.
DATA_DIRS := a b c
# A directory/.done.analysis file means that `script.py` was run here.
DONE_FILES := $(DATA_DIRS:%=%/.done.analysis)
# .done.analysis depends on all the source data files.
# When a .dat file is added or changes, it will be newer than
# a .done.analysis file; and the analysis would be re-run.
# Secondary expansion makes the stem ($*) available inside $(wildcard).
.SECONDEXPANSION:
$(DONE_FILES): %/.done.analysis: $$(wildcard $$*/*.dat)
	python script.py $(@D)
	touch $@
final_analysis_plot.png: $(DONE_FILES)
	python make_plot.py $(wildcard $(addsuffix /analysis_time_*.txt,$(DATA_DIRS)))
I have a project that involves sub-directories with sub-makefiles. I'm aware that I can pass variables from a parent makefile to a sub-makefile through the environment using the export command. Is there a way to pass variables from a sub-makefile to its calling makefile? I.e. can export work in the reverse? I've attempted this with no success. I'm guessing once the sub-make finishes its shell is destroyed along with its environment variables. Is there another standard way of passing variables upward?
The short answer to your question is: no, you can't [directly] do what you want for a recursive build (see below for a non-recursive build).
Make executes a sub-make process as a recipe line like any other command. Its stdout/stderr get printed to the terminal like any other process. In general, a sub-process cannot affect the parent's environment (obviously we're not talking about environment here, but the same principle applies) -- unless you intentionally build something like that into the parent process, but then you'd be using IPC mechanisms to pull it off.
There are a number of ways I could imagine for pulling this off, all of which sound like an awful thing to do. For example you could write to a file and source it with an include directive (note: untested) inside an eval:
some_target:
${MAKE} ${MFLAGS} -f /path/to/makefile
some_other_target : some_target
$(eval include /path/to/new/file)
... though it has to be in a separate target as written above because all $(macro statements) are evaluated before the recipe begins execution, even if the macro is on a later line of the recipe.
gmake v4.x has a new feature that allows you to write out to a file directly from a makefile directive. An example from the documentation:
If the command required each argument to be on a separate line of the
input file, you might write your recipe like this:
program: $(OBJECTS)
	$(file >$@.in) $(foreach O,$^,$(file >>$@.in,$O))
	$(CMD) $(CMDFLAGS) @$@.in
	@rm $@.in
(gnu.org)
... but you'd still need an $(eval include ...) macro in a separate recipe to consume the file contents.
I'm very leery of using $(eval include ...) in a recipe; in a parallel build, the included file can affect make variables and the timing for when the inclusion occurs could be non-deterministic w/respect to other targets being built in parallel.
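A different technique that avoids include inside a recipe is to let make rebuild an included file by itself: GNU make tries to remake every makefile it includes and restarts if one of them changed. The following is an untested sketch under that assumption; child/, vars.mk and CHILD_VAR are hypothetical names, and the two files are shown in one listing:

```makefile
# ---- parent Makefile ----
.PHONY: all
all:
	@echo "value from child: $(CHILD_VAR)"

# The leading "-" suppresses the error when vars.mk does not exist yet.
-include child/vars.mk

# Because there is a rule for the included file, make runs the sub-make
# first and then re-reads all makefiles, so CHILD_VAR is set before any
# other target is built.
child/vars.mk: child/Makefile
	$(MAKE) -C child vars.mk

# ---- child/Makefile ----
# vars.mk: Makefile
# 	echo 'CHILD_VAR := hello-from-child' > $@
```

The file is read while make parses the makefiles rather than in the middle of a parallel build, which avoids the timing problem described above.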
You'd be much better off finding a more natural solution to your problem. I would start by taking a step back and asking yourself "what problem am I trying to solve, and how have other people solved that problem?" If you aren't finding people trying to solve that problem, there's a good chance it's because they didn't start down a path you're on.
Edit: you can do what you want for a non-recursive build. For example:
# makefile1
include makefile2
my_tool: ${OBJS}
# makefile2
OBJS := some.o list.o of.o objects.o
... though I caution you to be very careful with this. The build I maintain is extremely large (around 250 makefiles). Each level includes with a statement like the following:
include ${SOME_DIRECTORY}/*/makefile
The danger here is you don't want people in one tree depending on variables from another tree. There are a few spots where for the short term I've had to do something like what you want: sub-makefiles append to a variable, then that variable gets used in the parent makefile. In the long term that's going away because it's brittle/unsafe, but for the time being I've had to use it.
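The append pattern mentioned above might look roughly like this. It is a sketch with hypothetical file names (module.mk in each subdirectory), not the layout of the build described here:

```makefile
# ---- top-level Makefile ----
OBJS :=

# Each fragment appends its own objects, for example dir1/module.mk
# could contain:  OBJS += dir1/some.o dir1/list.o
include $(wildcard */module.mk)

# Prerequisite lists are expanded when the rule is read, so this rule
# has to come after the includes that fill OBJS.
my_tool: $(OBJS)
	$(CC) $(LDFLAGS) -o $@ $^ $(LDLIBS)
```

Every fragment shares one variable namespace, which is exactly why depending on another tree's variables is easy to do by accident.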
I suggest you read the paper Recursive Make Considered Harmful (if that link doesn't work, just google the name of the paper).
Your directory structure probably looks like this:
my_proj
|-- Makefile
|-- dir1
| `-- Makefile
`-- dir2
`-- Makefile
And what you are doing in your parent Makefile is probably this:
make -C ./dir1
make -C ./dir2
This actually spawns/forks a new child process for every make call.
You are asking for updating the environment of the parent process from its children, but that's not possible by design (1, 2).
You still could work around this by:
using a file as shared memory between two processes (see Brian's answer)
using the child's exit error code as a trigger for different actions [ugly trick]
I think the simplest solution is using standard out from a sub Makefile.
Parent Makefile
VAR := $(shell $(MAKE) -s -C child-directory)
all:
echo $(VAR)
Child Makefile
all:
	@echo "MessageToTheParent"
TL;DR: How can I use find in a Makefile in order to identify the relevant source files (e.g., all .c files)? I know how to use a wildcard but I'm not able to get find to work.
Longer version:
I'm putting together a Makefile as part of an exercise on shared libraries; I noticed that when I use the following lines to specify the source and object files (i.e., .c files) for my shared library, I get an error after running make (gcc fatal error: no input files):
SRC=$(find src/ -maxdepth 1 -type f -regex ".*\.c")
OBJ=$(patsubst %.c,%.o,$(SRC))
*rest-of-makefile*
However, it compiles correctly when I use wildcard instead of find:
SRC=$(wildcard src/*.c)
OBJ=$(patsubst %.c,%.o,$(SRC))
*rest-of-makefile*
(As reference, included below is confirmation that the find command does indeed return the intended file when run from the shell.)
What is the correct syntax for using the find command (in my Makefile) to search for my source files (if it's at all possible)?
(Why would I prefer to use find?: I like the fact that I can quickly double-check the results of a find statement by running the command from the shell; I can't do that with wildcard. Also, I'd like to rely on regexes if possible. )
As reference, below is the relevant tree structure. As you can see (from the second code-block below), running the find command as specified in the Makefile (i.e., from above) does indeed return the intended file (src/libex29.c). In other words, the issue described above isn't because of a syntax problem in the find options or the regex.
.
├── build
├── Makefile
├── src
│ ├── dbg.h
│ ├── libex29.c
│ └── minunit.h
└── tests
├── libex29_tests.c
└── runtests.sh
Results of running find from the . folder above:
~/lchw30$ find src/ -maxdepth 1 -type f -regex ".*\.c"
src/libex29.c
P.S. I know this post technically violates the rule that all posted code must compile - I just thought that including the entire code for the both the Makefile as well as the libex29.c source file would be overkill. Let me know if that's not the case - happy to post the files in their entirety, if folks prefer.
Make doesn't have a find function. You have to use the shell function to run find. Also you should always use := not = for shell (and wildcard, for that matter) for performance reasons. And you should put spaces around assignments in make, just for clarity:
SRC := $(shell find src/ -maxdepth 1 -type f -regex ".*\.c")
Also, I don't see why you want to use find here. find is good if you want to search an entire subdirectory structure which contains more than one level, but wildcard is far more efficient for simple directory lookups.
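To double-check what either approach returns without leaving make, a small helper target can print the lists. This is just a sketch; the show target and the SRC_FIND and SRC_GLOB names are made up, and -name is used instead of -regex for brevity:

```makefile
# Both forms list the .c files directly under src/.
SRC_FIND := $(shell find src/ -maxdepth 1 -type f -name '*.c')
SRC_GLOB := $(wildcard src/*.c)

OBJ := $(patsubst %.c,%.o,$(SRC_GLOB))

# "make show" prints the lists so they can be compared with the shell output.
.PHONY: show
show:
	@echo "find:     $(SRC_FIND)"
	@echo "wildcard: $(SRC_GLOB)"
	@echo "objects:  $(OBJ)"
```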
Before I start, I'll mention that I'm not using GNU Make in this case for building a C/C++ project.
Makefile:
DEST_DIR = build/
SRC_DIR = src/
$(SRC_DIR)a/ : $(SOMETHING_ELSE)
$(DO_SOMETHING_TO_GENERATE_A_DIR)
$(DEST_DIR)% : $(SRC_DIR)%
	cp -r $^ $@
ALL_DEPS += <SOMETHING>
... more code which appends to ALL_DEPS ...
.PHONY: all
all : $(ALL_DEPS)
I've got some files not generated via Make rules in $(SRC_DIR). (For the sake of this example, let's say there's a directory $(SRC_DIR)b/ and a file $(SRC_DIR)c .)
I want to append to ALL_DEPS all targets which represent files or directories in $(DEST_DIR) so that "make all" will run all of the available $(DEST_DIR)% rules.
I thought to do something like this:
ALL_DEPS += $(addprefix $(DEST_DIR),$(notdir $(wildcard $(SRC_DIR)*)))
But of course, that doesn't catch anything that hasn't yet been made. (i.e. it doesn't append $(DEST_DIR)a/ to the list because $(SRC_DIR)a/ doesn't yet exist when the $(wildcard ...) invocation is evaluated and the shell doesn't include it in the results returned by the $(wildcard ...) invocation.)
So, rather than a function which finds all (currently-existing) files matching a pattern, I need one which finds all targets matching a pattern. Then, I could do something like this:
ALL_DEPS += $(addprefix $(DEST_DIR),$(notdir $(targetwildcard $(SRC_DIR)*)))
If it matters any, I've got much of the GNU Make code split across multiple files and included by a "master" Makefile. The ALL_DEPS variable is appended to in any of these files which has something to add to it. This is in an attempt to keep the build process modular as opposed to dropping it all in one monster Makefile.
I'm definitely still learning GNU Make, so it's not unlikely that I'm missing something fairly obvious. If I'm just going about this all wrong, please let me know.
Thanks!
It is simply not possible to do what you're trying to do; you're trying to get make to recognise something that doesn't exist.
This is part of the reason why, in general, wildcards are bad (the other being that you can end up including stuff you didn't mean to). The right thing to do here is to explicitly create a list of source files (ls -1 | sed -e 's/\(.*\)/sources+=\1/' > dir.mk) and perform the patsubst transformation on that list.
If you have additional files that are generated as part of the build, then you can append them to that list and their rules will be found as you'd expect.
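Applied to the makefile from the question, that could look roughly like the sketch below. The entries a, b and c come from the question's example; the mkdir recipe is only a stand-in for whatever really generates $(SRC_DIR)a:

```makefile
DEST_DIR := build/
SRC_DIR  := src/

# Explicit list of entries expected under $(SRC_DIR). "b" and "c" exist
# already; "a" is generated by a rule, so a wildcard would miss it.
sources := b c
sources += a

ALL_DEPS += $(addprefix $(DEST_DIR),$(sources))

# The prerequisites of "all" are expanded when this line is read, so
# every "ALL_DEPS +=" has to appear before it.
.PHONY: all
all: $(ALL_DEPS)

# Stand-in for the rule that generates the directory.
$(SRC_DIR)a:
	mkdir -p $@

$(DEST_DIR)%: $(SRC_DIR)%
	mkdir -p $(DEST_DIR)
	cp -r $< $@
```

With the makefile split across several files, each file would append to sources (or ALL_DEPS) itself, and the file defining all would be included last.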