distcc - are there cases where it requires a synchronized network filesystem? - makefile

Two simplified makefiles:

makefile1:

a.txt:
	echo "123144234" > a.txt
t2: a.txt
	cat a.txt > b.txt

makefile2:

t1:
	echo "123144234" > a.txt
t2: t1
	cat a.txt > b.txt
Both makefiles have the same functionality. Both can safely be run in parallel because the dependency of t2 is declared: on a.txt in makefile1, and on t1 in makefile2.
However, there is a critical difference which might (or does?) matter when it comes to distributed builds.
In makefile1, t2 depends directly on the artifact a.txt, and the target's name is identical to the artifact it produces. In makefile2, while the recipe and artifact of t1 are the same as those of a.txt, the name of the target is not a.txt.
This difference is key because GNU make (and, I assume, distcc) does NOT parse the recipe - nor analyze the filesystem at runtime - to determine all the artifacts of a given target. In makefile2, GNU make does NOT create ANY relationship between a.txt and t1.
When the build is run as make -j, i.e. parallel but not distributed, this difference is irrelevant because all targets are built on the same machine, i.e. all the make instances access the same filesystem.
But let's consider what could (or does?) happen during a distributed build if the two targets are built on two separate machines.
In both makefiles, the recipe for t2 would be run AFTER the recipe for a.txt/t1.
However, in makefile1 the dependency of t2 on a.txt is explicit, i.e. distcc would know that to make t2 on a separate machine, it must send the file a.txt to that machine.
QUESTION
If makefile2 is run using distcc, without a synchronized distributed filesystem, and t2 is made on another machine, will there be a build error because a.txt is not present on the other machine?
What are the options for a distributed Linux filesystem?

distcc is merely a replacement for gcc. It uses the local gcc to preprocess the source file, then sends it for compilation to another machine, receives back the object file, and saves it into the local filesystem. distcc doesn't require a shared network filesystem or clock synchronization between the participating hosts.
There is also the newer "pump" functionality that preprocesses on the remote servers, but it doesn't require a shared network filesystem or clock synchronization either.
Your make always runs locally.
Answering your questions:
distcc doesn't run make; make runs distcc instead of gcc. make examines dependencies and their timestamps locally.
make runs locally, and it doesn't care whether the filesystem it uses is local or networked.
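To make that concrete, here is a minimal sketch of how distcc typically slots into a Makefile (the file and host names are hypothetical):

# make invokes distcc as the compiler driver; distcc preprocesses locally
# and may farm the actual compilation out to the hosts in DISTCC_HOSTS
CC = distcc gcc

prog: main.o util.o
	$(CC) -o prog main.o util.o   # the link step always runs locally

%.o: %.c
	$(CC) -c $< -o $@

Invoke it with something like DISTCC_HOSTS='localhost build1 build2' make -j8; make itself, including all of its dependency and timestamp checks, never leaves the local machine.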

Related

Run each Command individually but distribute them in Multiple Make-files

I wanted every environment (local, dev, production) to have its own Makefile.
So I created 3 directories and a Makefile for each directory.
Then I created a common Makefile which includes all the child Makefiles.
I was able to include my child commands in the parent file, but the issue is:
If I run make local, it executes all commands inside Makefile.local.
Instead, I want each command to be run individually:
when invoked as make local local_command, or even just make local_command, only local_command must be executed.
You likely want something like:
TOP_LEVEL_TARGS := dev local prod

$(TOP_LEVEL_TARGS):
	make -f config/local/Makefile.$@ $(filter-out $(TOP_LEVEL_TARGS), $(MAKECMDGOALS))
This will invoke a sub-make with all the command goals of the original make invocation (minus the top-level targets).
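As a hypothetical example, assuming Makefile.local defines a local_command target:

$ make local local_command
# the rule above effectively runs:
make -f config/local/Makefile.local local_command

Note that GNU make will afterwards also try to build local_command as a goal of the top-level Makefile; a no-op match-anything rule (e.g. %: ;) is one common way to silence the resulting "no rule to make target" error.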

How to write a Makefile to copy scripts to server

After I finish writing scripts on my local machine, I need to copy them to the cluster to execute the code. For example, I want to copy all the MATLAB files in my current directory to a directory on the server id@server.
Can anyone help to write a very basic Makefile to fulfill this purpose?
Thanks a lot!
John
Here is an adaptation of Jens's answer, together with my answer here, that takes advantage of the capabilities of Make to only copy across those files that have been modified since the last time you copied the files to the server. That way, if you have hundreds of .m files and you modify one of them, you won't copy all of them across to the server.
It makes use of an empty hidden file, .last_push, that serves only to record (through its own timestamp) the time at which we last copied files to the server.
FILES = $(shell find . -name '*.m')
DEST = id@server:path/relative/to/your/serverhomedir
LAST_PUSH = .last_push

.PHONY : push
push : $(LAST_PUSH)

$(LAST_PUSH) : $(FILES)
	scp $? $(DEST)
	touch $(LAST_PUSH)
Run this with make or make push. The key is the variable $?, which is populated with the list of all prerequisites that are newer than the target - in this case, the list of .m files that have been modified more recently than the last push.
How do you copy files to the server? Assuming you have ssh/scp available:
FILES = file1 file2 *.matlab

copy:
	scp $(FILES) id@server:path/relative/to/your/serverhomedir
Run with
$ make copy
As a shell script, it could look like this:
#!/bin/sh
set -- file1 file2 *.matlab
scp "$@" id@server:path/relative/to/your/serverhomedir
Don't forget to chmod u+x yourscript.

gprof output not being generated when executed in bash script

I've compiled and linked my application with -pg, and when running the application I get the correct gmon.out file and can analyze it with gprof. I am running a number of scripts under different conditions to track down a speed issue between two versions of our software.
When I run the application I do produce the gmon.out output.
Since I have to do this for a number of different scripts, I piled them into one script so I can take a nap while it runs. It's not complicated. I'm also running this script at the same time in another directory with the other version of the application.
./test test1.script
gprof test > test1.ver1.stats
rm -f gmon.out
./test test2.script
gprof test > test2.ver1.stats
rm -f gmon.out
These runs do not produce the gmon.out file. Is there any explanation for this behavior? Also, running the script without the analysis of the other version running at the same time (e.g., concatenating the scripts instead of running them in parallel) produces the same behavior.
The scripts I am using to test the application change directories to get at a large data set. This affected the gmon.out location, so when the application exited, the file was written to that far-off directory. As the GNU gprof manual says:
The gmon.out file is written in the program's current working directory at the time it exits. This means that if your program calls chdir, the gmon.out file will be left in the last directory your program chdir'd to. If you don't have permission to write in this directory, the file is not written, and you will get an error message.
Last night I had the scripts running like this:
GMON_OUT_PREFIX=test1.ver1.out ./test test1.script
GMON_OUT_PREFIX=test2.ver1.out ./test test2.script
GMON_OUT_PREFIX=test3.ver1.out ./test test3.script
And although I couldn't find the files in my working directory, I did eventually find them in the data folder. In the above, it is not strictly necessary to specify different names for the output, since each file is suffixed with its process id (as in GMON_OUT_PREFIX.PID), but it was necessary in my case to distinguish the tests.
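Another way to sidestep the chdir behavior entirely (a sketch; this assumes glibc, which uses GMON_OUT_PREFIX verbatim as a path prefix and appends the PID) is to pass an absolute prefix so the profiles land in the invoking directory no matter where the program chdirs to:

GMON_OUT_PREFIX="$PWD/test1.ver1.out" ./test test1.script
gprof test "$PWD"/test1.ver1.out.* > test1.ver1.stats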

How to have make build from one directory if the source file exists, otherwise build from another?

I'm working on modifying a huge recursive makefile project that has 6000+ source files, all of which are ClearCase-controlled. Since I don't want to copy the whole source tree, I'm trying to create a new project containing only the modified source files, and thus pull the sources from the original tree if they don't exist in my modified tree.
I have already modified the makefile in ModDir to check if each folder exists locally and execute make in that folder if it does. Otherwise it executes make in the sourceDir. My issue lies in the subdir makefiles.
Each subdir makefile contains a list of all of the source files needed for that module. I need to find a way to build the file locally if it exists, else build the file from SourceDir/subdir.
I.e. in my image, the Dir1 makefile needs to build F1 from ModDir/Dir1/F1, and build the other files from SourceDir/Dir1/F2-F3.
I tried to use VPATH to tell make to locate the source files in both locations (ModDir first, of course), which works beautifully. However, since make assumes the object files are in ModDir, it can't find any of the object files built in SourceDir.
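Presumably that attempt looked something like this sketch (directory names from the question; the compile rule is a hypothetical stand-in):

# search both trees for sources, the modified tree first
vpath %.c ModDir/Dir1 SourceDir/Dir1

%.o: %.c
	$(CC) -c $< -o $@

vpath locates the .c prerequisites in either tree, but the %.o targets are still expected relative to the current directory, which is why the objects already built under SourceDir are not picked up.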
I also tried making a pre-build rule that modifies the makefile's file list with bash, but I don't know if that's even possible.
How do I use make to build from one directory if the source file exists (ModDir), otherwise build from another (SourceDir)?
The easiest way will be to put your "if ... then ... else" logic in an external bash or batch script (whichever your OS supports) and swap makefiles before calling make.
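A minimal sketch of that wrapper idea (the module list and tree names are hypothetical, following the question's layout):

#!/bin/sh
# For each module, prefer the modified tree if it exists,
# otherwise fall back to the original source tree.
for dir in Dir1 Dir2 Dir3; do
    if [ -d "ModDir/$dir" ]; then
        make -C "ModDir/$dir" "$@"
    else
        make -C "SourceDir/$dir" "$@"
    fi
done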

How do I get GNU make to remove intermediate directories created by implicit rules?

GNU make automatically removes intermediate files created by implicit rules, by calling rm filename at the end. This obviously doesn't work if one of the targets was actually a directory. Take the following example:
.PHONY: all
all: test.target

%.target: tempdir.%
	touch $@

tempdir.%:
	mkdir -p $@
make -n reveals the action plan:
mkdir -p tempdir.test
touch test.target
rm tempdir.test
Is it possible to get GNU make to correctly dispose of intermediate directories? Perhaps by changing rm to rm -rf?
There is no way to make this happen. Although GNU make prints the command "rm", internally it is really running the unlink(2) system call directly, not invoking a shell command. There is no way to configure or modify the command that GNU make runs (short of changing the source code, of course).
However, I feel I should point out that it's just not going to work to use a directory as a normal prerequisite of a target. GNU make uses time-last-modified comparison to tell when targets are up to date, and the time-last-modified of a directory does not follow the standard rules. The TLM of a directory is updated every time a file (or subdirectory) in that directory is created, deleted, or renamed. This means you will create the directory, then build a bunch of files that depend on it: the first one is built and has timestamp N; the last one is built and has timestamp N+x. That also sets the directory's timestamp to N+x. Then the next time you run make, it will notice that the first file has an older timestamp (N) than one of its prerequisites (the directory, at N+x), and rebuild it.
And this will happen forever, until it can build the remaining "out of date" prerequisites fast enough that their timestamp is not newer than the directory.
And, if you were to drop a temporary file or editor backup file or something in that directory, it would start all over again.
Just don't do it.
Some people use an explicit shell command to create directories. Some people create them as a side effect of the target's recipe. Some people use order-only prerequisites to ensure they're created in time, as sketched below.
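For instance, the example from the question can be restructured so the directory is an order-only prerequisite (the part after the |): it is created when missing, but its timestamp can never make the target out of date, and as an explicit target it is no longer an intermediate file for make to delete:

.PHONY: all
all: test.target

# tempdir.test is order-only: created if absent, timestamp ignored
test.target: | tempdir.test
	touch $@

tempdir.test:
	mkdir -p $@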
