Makefile: assume file is up to date for a specific target?

I'm using GNU Make to build graphs for a paper. I have two targets:
data, which rebuilds the data/*.csv files. This is very computationally expensive (also in terms of money).
plot, which rebuilds the plots from the data/ folder.
Now, because of how expensive data is to compute, I committed the resulting files in git, and I'd like to avoid regenerating them whenever possible. But when someone clones the repository, the clone does not preserve the files' mtimes, so make plot wants to rebuild data even though the files are already there.
That said, I don't want to remove the target dependency! If, for some reason, I recompute something in data, I want the plots to see that and be able to rebuild themselves. Also, if one CSV is missing, I want it to be computed.
I think ideally, what I want is to have a way to say "if these files are present, assume that they are up to date". Is there a way to do that in GNU Make?

Thanks to Renaud Pacalet's comment, I used an order-only prerequisite to rewrite my rule like this:
data/%.csv: | source/%.py
	...
Using the | makes the prerequisite order-only: make still builds a CSV that is missing, but it never considers an existing CSV out of date just because the script is newer, so a CSV file already present is never rebuilt.
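For illustration, here is a minimal sketch of what such a Makefile might look like. The plot rule, the script names, and the assumption that each source/NAME.py writes its CSV to stdout are mine, not from the original question:

CSVS  := $(patsubst source/%.py,data/%.csv,$(wildcard source/*.py))
PLOTS := $(patsubst data/%.csv,plots/%.png,$(CSVS))

.PHONY: data plot
data: $(CSVS)
plot: $(PLOTS)

# Order-only prerequisite (right of the |): a missing CSV is still built,
# but an existing one is never considered out of date by the script's mtime.
data/%.csv: | source/%.py
	python source/$*.py > $@

# Normal prerequisite: plots are rebuilt whenever their CSV changes.
plots/%.png: data/%.csv
	mkdir -p $(@D)
	python scripts/plot.py $< $@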


How do I not commit the development team lines in project.pbxproj without deselecting those lines manually?

I am collaborating with my friend on an iOS app. We use different Apple IDs in Xcode, so in the "Signing & Capabilities" tab of the project settings, we each select a different team in the "Team" field.
From my observation, changing this affects the MyProject.xcodeproj/project.pbxproj file, which stores the file references that the Xcode project has, in addition to the "Team". Here's a snippet of what is changed:
buildSettings = {
    ASSETCATALOG_COMPILER_APPICON_NAME = AppIcon;
    ASSETCATALOG_COMPILER_GLOBAL_ACCENT_COLOR_NAME = AccentColor;
    CODE_SIGN_STYLE = Automatic;
    DEVELOPMENT_TEAM = <my team ID>; /* this is changed */
    INFOPLIST_FILE = MyProject/Info.plist;
    LD_RUNPATH_SEARCH_PATHS = (
        "$(inherited)",
        "@executable_path/Frameworks",
    );
    PRODUCT_BUNDLE_IDENTIFIER = io.github.sweeper777.MyApp;
    PRODUCT_NAME = "$(TARGET_NAME)";
    SWIFT_VERSION = 5.0;
    TARGETED_DEVICE_FAMILY = 1;
};
The problem arises when one of us commits this file and the other person pulls. The "puller" will now have the "Team" set to something invalid. When that person then tries to run the app on a real device, there will be code signing errors for obvious reasons. To solve this, they must tediously go through all the targets we have and set each "Team" back to their own team.
How can we make it so that, on each person's computer, the "Team" stays the same after pulling, but any other changes to MyProject.xcodeproj/project.pbxproj are applied?
Remarks:
Putting the entire MyProject.xcodeproj/project.pbxproj in .gitignore doesn't work, because that would ignore every other change to it. Adding a new file to the project, for example, also changes MyProject.xcodeproj/project.pbxproj, and we want to be able to pull that change.
Manually deselecting the lines that say "DEVELOPMENT_TEAM = ..." when committing is as tedious as reselecting the correct team every time, so that's not a solution.
I found this. Apparently, I can configure git to run sed before git checkout and git add. However, that answer seems to ignore the line by deleting it completely. This means that my friend, when he pulls, would still have to reselect the correct team. What I want is the kind of "ignore" that simply stops tracking that line. That is, if there is a local version of that line, use that.
I am also aware that none of this would be a problem if we were on the same team. But if I understand this correctly, I can't have multiple people on my team unless I have a Company account, and not only can I not afford that, I don't own a company.
I don't use Xcode itself and do not know how to smuggle Git hooks and scripts past the Xcode interface, so you'll need more than just this answer. But you mention sed in the comments, and given the file format you show above, that may well be the way to go.
Git has the ability to run what it calls clean and smudge filters. These can be used to run any arbitrary program you like, including sed, the "stream editor", which is particularly good at making single-line changes based on regular expression matches.
There is another method that may also work, and may "play better" with Xcode, or may play worse. I'll go over that too, after covering clean and smudge filters.
Before we dive into writing clean and smudge filters, and using them from Git—you'll need to know all of these details as you will have to write your own custom filters—we should start with a simple fact about Git commits: No part of any commit can ever be changed. Once you make a commit, the stuff that's inside the commit—the stored data in all of its files—is the way it is, forever. So these filters have to work within that system. Remember that, as it will help with understanding what we're doing.
How Git makes and stores objects
The files inside a commit are not files, exactly: they're not the same thing as files in your file system, at least. Instead, they are what Git calls objects, specifically blob objects. A blob object holds a file's data; tree objects hold the files' names; and commit objects collect everything together to be used all at once. There's one more internal object type, for annotated tags, but we'll stop here as we're really only interested in the blob-object part.
When Git extracts a commit, it reads the internal blob objects and runs them through internal code to decompress and format them into regular files. This can include doing end-of-line hacking (turning LFs into CRLFs) if desired. Normally, all this happens entirely inside Git, and the end result is that Git writes out an ordinary everyday file for you to use. This ordinary file is what you will work on / with, in Xcode or any other editor and compiler system and so on. These ordinary files are in your working tree.
After you've extracted some commit, you'll do some work on it, by changing some or all of the files in your working tree, to achieve whatever result you wanted. This can include changing the buildSettings, editing Swift code, editing Objective-C code, and so on. You might add all-new files to the working tree, some of which you never commit at all (you can help make sure this never happens by listing such files in .gitignore).
Eventually, though, you'd like to commit the updated code. To do so, you must run git add, or maybe have your IDE run git add for you (perhaps Xcode has clicky buttons to do this). This invokes code in Git that converts the working tree file(s) back to internal blob objects if and as needed.
Again, normally this is all handled entirely inside of Git. Git will read the working tree file, maybe do CRLF-to-LF-only changes, compress the text, search for duplicate objects, and do all the other complicated things necessary to prepare the file, so that it is ready to be committed. The resulting data need not match what's in your working tree at all: it just has to be something that, when Git later goes to extract the file, produces what you will need in your working tree.
Clean and smudge filters
This is where these clean and smudge filters come in. I said, above, that normally, Git does the extraction and insertion all on its own. For binary files, the only thing Git does here is apply lossless compression.[1] For text files, Git can do CRLF/LF substitutions as well. But what if you'd like to do your own operations?
You can: Git will let you do whatever you want during the extract process with a smudge filter, and will let you do whatever you want during the compress process with a clean filter. The clean filter replaces the in-file data, using a stream-edit type process,[2] and then Git does its CRLF hacking (if any) and compressing on the "cleaned" data. The smudge filter replaces the decompressed, post-CRLF-hacking data coming out of Git with the data that should go into the working tree.
Hence you can write, as your clean filter, a sed script of the form:
s/DEVELOPMENT_TEAM = .*;/DEVELOPMENT_TEAM = DEVTEAMTEMPLATE;/
With that as the entire sed script, what sed will do is edit the incoming data stream and replace any actual development team text with the word DEVTEAMTEMPLATE.
Your smudge filter has to work slightly harder: it must find the template line and adjust it so that it contains the correct team ID. Where will you get the correct team ID? That's up to you: perhaps you can store it in a file in your home directory, or in a file that you create in the working tree but never commit in Git. You'll have to write this one or two or however-many-liner sed and/or shell script yourself.
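As a rough sketch (not from the original answer), the smudge filter could be a tiny shell script that reads the real team ID from an untracked local file, say .team-id, and substitutes it for the placeholder:

#!/bin/sh
# smudge-team.sh: restore the local team ID when Git checks the file out.
# Reads the pbxproj content on stdin, writes the smudged version to stdout.
# .team-id is a local, untracked file holding just this developer's team ID.
TEAM_ID=$(cat "$(git rev-parse --show-toplevel)/.team-id")
exec sed "s/DEVELOPMENT_TEAM = DEVTEAMTEMPLATE;/DEVELOPMENT_TEAM = ${TEAM_ID};/"

The file name .team-id and the script name are assumptions; the important part is that the script filters stdin to stdout and never opens the target file by name (see below).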
[1] There are multiple phases of compression; git add does just one, and git checkout undoes all (including reading from "pack" files) as needed. The deeper level of compression, using delta encoding techniques, is entirely invisible at the "object" level, so nobody ever really has to think about it.
[2] With the advent of Git-LFS, Git gained the ability to run long-lived filters. Before that, Git always used simple stream filtering. The stream filtering is easier to understand, but is less efficient for doing en-masse operations on many files. Here, we're only interested in one file per repository anyway, so there's no need to go into the fancier long-lived filter details.
Defining clean and smudge filters
The tricky part here, with Git, is that you must define the filters in one place—in $HOME/.gitconfig or .git/config, for instance—and then tell Git to invoke them from another place, using the .gitattributes file. This is described in the gitattributes documentation. This documentation is pretty thorough, so read it. You can ignore all the long-running filter discussion, as noted above. I will quote one bit from the documentation here for emphasis, though, and expound on it:
Note that "%f" is the name of the path that is being worked on. Depending on the version that is being filtered, the corresponding file on disk may not exist, or may have different contents. So, smudge and clean commands should not try to access the file on disk, but only act as filters on the content provided to them on standard input.
When Git is running the smudge filter, it:
has opened some internal object (which may or may not be packed);
has decompressed it, or is in the process of decompressing it, and pumped / is-pumping out the data; and
this data is being fed to your filter, but is not written out to any file anywhere.
Your filter can use %f to know the name of the target output file, but the data are not in that file yet. The data bytes are only in some OS-level pipes or sockets or whatever your OS uses for connecting the output of one program (Git's internal decompressors) to another (your filter). Your smudge filter must read its standard input to get the data, and write the smudged data to standard output so that Git can read it (if necessary) and/or redirect that output to the correct file. Do not attempt to open the file by name!
(The same holds for the clean filter, except that in many cases, the input to your filter is just the raw data already in the file, so that opening the file and reading it mostly works. So this can mislead you, if you do your tests using a clean filter.)
Note that you can implement this scheme without a clean filter at all: your smudge filter can replace whatever is in the committed file even if it's a real team ID, rather than just a template. If you choose to do this, however, you'll "see" the team ID changing every time a different team-ID commits the file. The nice thing about using the clean filter is that once the committed copies of the file use the template line, every future cleaned file also uses the template line, so that it never changes.
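Putting the pieces together, the wiring might look something like this. The filter name teamid and the script path are illustrative assumptions:

# .gitattributes (committed): attach the filter to just this file
MyProject.xcodeproj/project.pbxproj filter=teamid

# One-time setup in each clone (or use --global and ~/.gitconfig):
git config filter.teamid.clean "sed 's/DEVELOPMENT_TEAM = .*;/DEVELOPMENT_TEAM = DEVTEAMTEMPLATE;/'"
git config filter.teamid.smudge "sh scripts/smudge-team.sh"

With this in place, git add runs the clean command and stores the placeholder, while checkout runs the smudge command and restores each developer's local team ID.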
Alternative: a template file
In general, it's unwise to commit actual configurations. Clean and smudge tricks can work, but they can only go so far: this particular file format works well because the change you want made is on a single line, and Git itself shows you file changes on a line-by-line basis, and sed works well with line-oriented input, and so on.
A lot of configuration files, though, wind up storing at least slightly-sensitive data, or perhaps very-sensitive data such as cleartext passwords. Such files should not be stored in Git at all if at all possible. Instead, you would store a template file in Git.
In this case, for instance, instead of storing MyProject.xcodeproj/project.pbxproj, you might have Git store MyProject.xcodeproj/project.pbxproj.template. This file would have template-ized contents. When you clone and check out the repository, you'd subsequently copy the template file into place and do any required adjustments.
Should the MyProject.xcodeproj/project.pbxproj file itself need to change, e.g., to acquire a new SWIFT_VERSION setting, you'd instead edit the template file, add that to Git, and commit. You would then use the usual "convert template to mine" process, or manually update the MyProject.xcodeproj/project.pbxproj file. Since this file is never committed—and is listed in .gitignore—it never goes into any commit and you never have to worry about collisions within it. Only the template file goes into Git.
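A minimal sketch of that workflow, again assuming an untracked .team-id file and an illustrative setup script that each developer runs after cloning or after pulling a template change:

#!/bin/sh
# setup-project.sh: regenerate the untracked project file from the template.
# MyProject.xcodeproj/project.pbxproj is in .gitignore; only the .template is tracked.
TEAM_ID=$(cat .team-id)
sed "s/DEVTEAMTEMPLATE/${TEAM_ID}/" \
    MyProject.xcodeproj/project.pbxproj.template \
    > MyProject.xcodeproj/project.pbxproj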

How does make tell when a file hasn't been modified?

Make can tell if a file has been modified since the last make invocation. I guess it compares the files' modification times with the time they were last built. To do this it would have to store the latest times on disk, right?
Anyone know if and where or how it does that?
Thanks.
I guess you didn't look too hard to find an answer:
http://www.gnu.org/software/make/ : "If a target file is newer than all of its dependencies, then it is already up to date, and it does not need to be regenerated."
http://www.gnu.org/software/make/manual/html_node/Rule-Syntax.html : "The criterion for being out of date is specified in terms of the prerequisites, which consist of file names separated by spaces. [...] A target is out of date if it does not exist or if it is older than any of the prerequisites (by comparison of last-modification times). The idea is that the contents of the target file are computed based on information in the prerequisites, so if any of the prerequisites changes, the contents of the existing target file are no longer necessarily valid."
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html : "The make utility examines time relationships and shall update those derived files (called targets) that have modified times earlier than the modified times of the files (called prerequisites) from which they are derived."
It doesn't do that.
Instead, it compares the modification time of the target with the modification times of its dependencies. So when you have a rule
foo-sorted: foo; sort $< > $@
the modification times of foo-sorted and foo are compared.
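For example, with that one-line rule in a Makefile, a typical GNU Make session looks roughly like this (messages vary slightly between versions):

$ make foo-sorted        # first run: foo-sorted does not exist, so the recipe runs
sort foo > foo-sorted
$ make foo-sorted        # target is newer than its prerequisite: nothing to do
make: 'foo-sorted' is up to date.
$ touch foo              # bump foo's mtime ...
$ make foo-sorted        # ... and the target is rebuilt
sort foo > foo-sorted

No state is stored between runs; the decision is made purely from the mtimes the filesystem already keeps.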

Expressions in a build rule "Output Files"?

Can you include expressions in the "Output Files" section of a build rule in Xcode? Eg:
$(DERIVED_FILE_DIR)$(echo "/dynamic/dir")/$(INPUT_FILE_BASE).m
Specifically, when translating Java files with j2objc, the resulting files are saved in subfolders, based on the java packages (eg. $(DERIVED_FILE_DIR)/com/google/Class.[hm]). This is without using --no-package-directories, which I can't use because of duplicate file names in different packages.
The issue is in Output Files, because Xcode doesn't know how to look for the output file at the correct location. The default location is $(DERIVED_FILE_DIR)/$(INPUT_FILE_BASE).m, but I need to perform a string substitution to insert the correct path. However, any expression added as $(expression) gets ignored, as if it were never there.
I also tried to export a variable from the custom script and use it in Output Files, but that doesn't work either, because the Output Files are transformed into SCRIPT_OUTPUT_FILE_X before the custom script is run.
Unfortunately, Xcode's build support is pretty primitive (compared to, say, make, which is thirty-odd years older :-). One option to try is splitting the Java source, so that the two classes with the same names are in different sub-projects. If you then use different prefixes for each sub-project, the names will be disambiguated.
A more fragile, but maybe simpler, approach is to define a separate rule for one of the two classes, so that it can have a unique prefix assigned. Then add an early build phase to translate it before any other Java classes, so the rules don't overlap.
For me, the second alternative does work (Xcode 7.3.x) - to a point.
My rule is not for Java, but rather for Google Protobuf, and I tried to maintain the same hierarchy (like your Java package hierarchy) in the generated code as in the source .proto files. Indeed files (.pb.cc and .pb.h) were created as expected, with their hierarchies, inside the Build/Intermediates/myProject.build/Debug/DerivedSources directory.
However, Xcode usually knows to go on and compile the generated output into the current target, but that breaks here, because it only looks for files directly in ${DERIVED_FILE_DIR}, not in sub-directories underneath it.
Could you please explain "Output Files are transformed into SCRIPT_OUTPUT_FILE_X" in more detail? I do not understand.

Make dummy target that checks the age of an existing file?

I'm using make to control the data flow in a statistical analysis. I have my raw data in a directory ./data/raw_data_files, and I've got a data manipulation script that creates a cleaned data cache at ./cache/clean_data. The make rule is something like:
cache/clean_data:
	scripts/clean_data
I do not want to touch the data in ./data/, either with make or with any of my data munging scripts. Is there any way in make to create a dependency for cache/clean_data that just checks whether specific files in ./data/ are newer than the last time make ran?
If clean_data is a single file, just let it depend on all data files:
cache/clean_data: data/*
	scripts/clean_data
If it is a directory containing multiple cleaned files, the easiest way is to write a stamp file and have that depend on your data files:
cache/clean_data-stamp: data/*
	scripts/clean_data
	touch cache/clean_data-stamp
Note that this regenerates all clean_data files if one data file changes. A more elaborate approach is possible if you have a 1-to-1 mapping between data and cleaned files. The GNU Make Manual has a decent example of this. Here is an adaptation:
DATAFILES := $(wildcard data/*)
CACHEFILES := $(patsubst data/%,cache/clean_data/%,$(DATAFILES))
cache/clean_data/%: data/%
	scripts/clean_data --input $< --output $@
all: $(CACHEFILES)
Here, we use wildcard to get a list of all files under data/. Then we replace the data path with the cache path using patsubst. We tell make how to generate cache files via a pattern rule, and finally, we define a target all which generates all the required cache files.
Of course you can also list your CACHEFILES explicitly in the Makefile (CACHEFILES:= cache/clean_data/a cache/clean_data/b), but it is typically more convenient to let make handle that automatically, if possible.
Notice that this complex example probably only works with GNU Make, not in Windows' nmake. For further info, consult the GNU Make Manual, it is a great resource for all your Makefile needs.

How do I get a file manifest for each revision in a git repository?

I have a git repository that was created on Microsoft Windows. Microsoft Windows has a case insensitive file system. The people checking into this repository have not been careful about the case of their filenames. This means that the same directory or file sometimes shows up under two different names.
I mean to fix this problem. But in order to really fix it, I have to get a handle on it.
Is there a quick and simple way to get a list of the files at each revision?
I need this in order to figure out which revisions (if any) have the same file under two different names, so I can decide on a strategy for fixing such cases. This means I need to get this information en masse, as quickly as possible, so the analysis consumes a reasonable amount of time.
One way to get this is with ls-tree:
git ls-tree -r --name-only <commit>
(Note that this looks at the portion of the tree corresponding to your current directory, so you should either run it from the top level of your repo, or give the --full-tree option.)
This is essentially instantaneous, since all Git has to do is recursively examine the tree; it doesn't even have to look at the contents of files.
I'm not sure how you're going to use a list of filenames to detect the same file under two different names. If you just mean that you want to look for filenames that would be the same on a case-insensitive filesystem, then the list of filenames is all you need.
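For example, a quick way to flag paths that collide on a case-insensitive filesystem is to lowercase the listing and look for duplicates (a sketch; substitute the commit you care about):

git ls-tree -r --name-only --full-tree <commit> |
    tr '[:upper:]' '[:lower:]' | sort | uniq -d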
However, if you think the files might actually have the same content, you could drop the --name-only, so that you'll also see the SHA-1s of all the files, and can find identical files by looking for duplicate hashes.
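Something along these lines would list the duplicated blob hashes (the third column of git ls-tree's output); you could then grep the full listing for each hash to see which paths share it:

git ls-tree -r --full-tree <commit> | awk '{print $3}' | sort | uniq -d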
You could run something like this:
git log --name-only --pretty="format:%H"
This command will show the SHA-1 and the list of changed files for every revision.
