I'm using make to control the data flow in a statistical analysis. I have my raw data in a directory ./data/raw_data_files, and I've got a data manipulation script that creates a cleaned-data cache at ./cache/clean_data. The make rule is something like:
cache/clean_data:
scripts/clean_data
I do not want to touch the data in ./data/, either with make, or any of my data munging scripts. Is there any way in make to create a dependency for the cache/clean_data that just checks whether specific files in ./data/ are newer than last time make ran?
If clean_data is a single file, just let it depend on all data files:
cache/clean_data: data/*
scripts/clean_data
If it is a directory containing multiple cleaned files, the easiest way is to write a stamp file and have that depend on your data files:
cache/clean_data-stamp: data/*
scripts/clean_data
touch cache/clean_data-stamp
Note that this regenerates all clean_data files if one data file changes. A more elaborate approach is possible if you have a 1-to-1 mapping between data and cleaned files. The GNU Make Manual has a decent example of this. Here is an adaptation:
DATAFILES:= $(wildcard data/*)
CACHEFILES:= $(patsubst data/%,cache/clean_data/%,$(DATAFILES))
cache/clean_data/% : data/%
scripts/clean_data --input $< --output $@
all: $(CACHEFILES)
Here, we use wildcard to get a list of all files under data. Then we replace the data path with the cache path using patsubst. We tell make how to generate cache files via a pattern rule, and finally we define a target all which builds all the required cache files.
Of course you can also list your CACHEFILES explicitly in the Makefile (CACHEFILES:= cache/clean_data/a cache/clean_data/b), but it is typically more convenient to let make handle that automatically, if possible.
Note that this more elaborate example relies on GNU Make features (wildcard, patsubst, pattern rules) and will not work in, for example, Windows' nmake. For further information, consult the GNU Make Manual; it is a great resource for all your Makefile needs.
Related
I'm using GNU Make to build graphs for a paper. I have two targets:
data, which regenerates the CSV files in the data/ folder. This is very computationally expensive. (Also in terms of money.)
plot, which rebuilds the plots from the data/ folder.
Now, because of how expensive data is to compute, I committed the resulting files to git. I'd like to avoid regenerating them whenever possible. But git does not preserve modification times, so when someone clones the repository, the files' mtimes are set to checkout time, and make plot wants to rebuild data even though the files are already there.
That said, I don't want to remove the target dependency! If, for some reason, I recompute something in data, I want the plots to see that and to be able to rebuild themselves. Also, if one csv is missing, I want it to be computed.
I think ideally, what I want is to have a way to say "if these files are present, assume that they are up to date". Is there a way to do that in GNU Make?
Thanks to the comment of Renaud Pacalet, I used order-only dependencies to rewrite my rule like this:
data/%.csv: | source/%.py
...
The | marks source/%.py as an order-only prerequisite: its timestamp is ignored, so make builds a CSV file only when it is missing and never rebuilds one that is already present.
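As a self-contained illustration of that behavior (the recipe and file names here are made up for demonstration; requires GNU make):

```shell
# Write a throwaway Makefile in which source/%.py is an order-only
# prerequisite (it appears after the |), so its timestamp is ignored.
mkdir -p source data
printf 'data/%%.csv: | source/%%.py\n\tcp source/$*.py $@\n' > Makefile

echo 'x = 1' > source/a.py
make data/a.csv      # data/a.csv is missing, so it gets built

sleep 1
touch source/a.py    # the source is now newer than the csv
make data/a.csv      # still "up to date": order-only mtimes are ignored

rm -f data/a.csv
make data/a.csv      # missing again, so it is rebuilt
```

With a regular (non-order-only) prerequisite, the touch in the middle would have triggered a rebuild.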
I am new to the Go programming language. I am hoping to integrate Go code, if possible, into existing code that contains heterogeneous code. My present organization of code is:
<reverse-TLD>/<component-path>/<code><extension>
where:
<reverse-TLD> is the domain with parts reversed. For example, com.mydomain.mysubdomain.
<component-path> is 1 or more subdirectories under which code lives. For example, image/jpeg.
<code> is the part of a code filename before the extension. For example, jpeg2000.
<extension> is the extension, for example .sh, .py, etc. Taken together with the elements above, an example path would be: com.mydomain.mysubdomain/image/jpeg/jpeg2000.go.
Note that code files other than Go files are in the same directory as Go files.
My issues are:
My existing structure above doesn't include src, pkg, or bin directories. Are there environment or Go env variables that allow me to specify these directories?
The directory <reverse-TLD> and all files under it are read-only. I need the output of the compilation to be based under another directory, given as $BUILD_DIR. That directory can have whatever directories are needed under it.
I am thinking that as a convention, I could use lowercase filenames for Go code that will become an executable command and leading-uppercase filenames for Go code that will become package objects. Is there a best practice naming convention for making this distinction in the Go community?
Is there any problem with my using reverse TLDs? For example, com.mydomain.mysubdomain vs. mysubdomain.mydomain.com.
If the src, pkg, and bin directories are hard requirements, then I think I'll have to write a script that finds the Go files, copies them to a temporary directory that meets the requirements, compiles them, and then moves the built artifacts to $BUILD_DIR. But I'm hoping that Go is flexible enough to let me avoid this.
If it is possible, could you show me the commands or environment variables that are needed to compile given the constraints above? And, comments on items 1-4 above are appreciated. Thank you!
That is against Go's conventions and is not a recommended practice.
Make can tell if a file has been modified since the last make invocation. I guess it compares the files' modification times with the time they were last built. To do this it would have to store the latest times on disk, right?
Anyone know if and where or how it does that?
Thanks.
I guess you didn't look too hard to find an answer:
http://www.gnu.org/software/make/ If a target file is newer than all of its dependencies, then it is already up to date, and it does not need to be regenerated.
http://www.gnu.org/software/make/manual/html_node/Rule-Syntax.html The criterion for being out of date is specified in terms of the prerequisites, which consist of file names separated by spaces. [...] A target is out of date if it does not exist or if it is older than any of the prerequisites (by comparison of last-modification times). The idea is that the contents of the target file are computed based on information in the prerequisites, so if any of the prerequisites changes, the contents of the existing target file are no longer necessarily valid.
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html The make utility examines time relationships and shall update those derived files (called targets) that have modified times earlier than the modified times of the files (called prerequisites) from which they are derived.
It doesn't do that.
Instead, it compares the modification time of the target with the modification times of its dependencies. So when you have a rule
foo-sorted: foo; sort $< > $@
the modification times of foo-sorted and foo are compared.
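To see that comparison concretely, the same check make performs can be imitated in plain shell with the test command's -nt (newer-than) operator; the file names match the rule above:

```shell
printf 'b\na\n' > foo
sort foo > foo-sorted          # what the rule's recipe would do

# make's up-to-date check, spelled out by hand:
if [ ! -e foo-sorted ] || [ foo -nt foo-sorted ]; then
    echo 'out of date: would re-run the recipe'
else
    echo 'up to date: nothing to do'   # this branch runs here
fi

sleep 1 && touch foo           # now foo is strictly newer
[ foo -nt foo-sorted ] && echo 'out of date again: foo was modified'
```

No state is stored anywhere between runs; everything needed is in the files' own modification times, which is exactly why make needs no database of its own.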
I have a huge .tgz file which is structured inside like this:
./RandomFoldername1/file1
./RandomFoldername1/file2
./RandomFoldername2/file1
./RandomFoldername2/file2
etc
What I want to do is have each individual file extracted to standard output so that I can pipe it to another command. While doing this, I also need the RandomFoldername and file name so that I can handle them properly in the second command.
So far, the options I have are:
to extract the whole tarball and deal with the resulting directory structure, which is not an option since the extracted contents don't fit on the hard drive
to loop over the archive, pattern-matching each file and extracting one file at a time; this solves the problem but is far too slow, because the whole tarball is swept once per file
While searching on how to solve this, I've started to fear that there is no better alternative to this.
Using the tar tool alone, I don't believe you have any other options.
Using a tar library for some language of your choice should allow you to do what you want though as it should let you iterate over the entries in the tarball one-by-one and allow you to extract/pipe/etc. each file one-by-one as necessary.
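One caveat to the above: if the tar in use is GNU tar specifically, its --to-command option streams each member to a command's stdin in a single pass, with the member's path exposed in the TAR_FILENAME environment variable. A sketch (archive and file names below are made up to mirror the question):

```shell
# Build a small archive shaped like the question's, for demonstration.
mkdir -p RandomFoldername1 RandomFoldername2
echo one > RandomFoldername1/file1
echo two > RandomFoldername2/file1
tar czf example.tgz RandomFoldername1 RandomFoldername2

# Handler invoked once per regular file in the archive:
# contents arrive on stdin, GNU tar sets TAR_FILENAME to the member path.
cat > handle.sh <<'EOF'
#!/bin/sh
echo "== $TAR_FILENAME =="
cat
EOF
chmod +x handle.sh

# Single sweep over the tarball; nothing is written to disk.
tar xzf example.tgz --to-command=./handle.sh
```

This is GNU tar only (bsdtar on macOS does not support --to-command), which is why a tar library in a general-purpose language remains the portable answer.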
Can you include expressions in the "Output Files" section of a build rule in Xcode? Eg:
$(DERIVED_FILE_DIR)$(echo "/dynamic/dir")/$(INPUT_FILE_BASE).m
Specifically, when translating Java files with j2objc, the resulting files are saved in subfolders, based on the java packages (eg. $(DERIVED_FILE_DIR)/com/google/Class.[hm]). This is without using --no-package-directories, which I can't use because of duplicate file names in different packages.
The issue is in Output Files, because Xcode doesn't know how to search for the output file at the correct location. The default location is $(DERIVED_FILE_DIR)/$(INPUT_FILE_BASE).m, but I need to perform a string substitution to insert the correct path. However any expression added as $(expression) gets ignored, as it was never there.
I also tried to export a variable from the custom script and use it in Output Files, but that doesn't work either, because the Output Files are transformed into SCRIPT_OUTPUT_FILE_X before the custom script is run.
Unfortunately, Xcode's build support is pretty primitive (compared to, say, make, which is thirty-odd years older :-). One option to try is splitting the Java source so that the two classes with the same names are in different sub-projects. If you then use different prefixes for each sub-project, the names will be disambiguated.
A more fragile, but maybe simpler, approach is to define a separate rule for one of the two classes, so that it can have a unique prefix assigned. Then add an early build phase to translate it before any other Java classes, so the rules don't overlap.
For me, the second alternative does work (Xcode 7.3.x) - to a point.
My rule is not for Java, but rather for Google Protobuf, and I tried to maintain the same hierarchy (like your Java package hierarchy) in the generated code as in the source .proto files. Indeed files (.pb.cc and .pb.h) were created as expected, with their hierarchies, inside the Build/Intermediates/myProject.build/Debug/DerivedSources directory.
However, Xcode usually knows to continue and compile the generated output into the current target - but here that breaks, as it only looks for files directly in ${DERIVED_FILE_DIR} - not within sub-directories underneath it.
Could you please explain "Output Files are transformed into SCRIPT_OUTPUT_FILE_X" in more detail? I do not understand.