I'm using makefiles to structure a data analysis pipeline consisting of steps such as extracting data, sampling and modelling. However, when I create an additional "upper-level" makefile (such as modelling) that depends on the others and then run make again (either on the new makefile or on any other upper-level makefile), it tries to rebuild everything, starting from the lowest-level makefile targets.
Below are the lowest-level makefile and a second one (Makefile.sample.u1) that includes it:
Makefile
all:
	make $(DATA_TRN)
	make $(Y_TRN)
# dates
SNAP_TRN := 2019-06-18
SNAP_TST := 2019-07-24
# Targets
TARGET_CLASS = m1_vol
TARGET_SURV = m1_qty
TARGET_INIT = act_qty
END_DUR = 1
# dirs
DIR_DATA := data
DIR_CONFIG := configs
# data files for training and predict
DATA_TRN := $(DIR_DATA)/processed/churnvol_train_$(SNAP_TRN).csv
DATA_TST := $(DIR_DATA)/processed/churnvol_test_$(SNAP_TST).csv
# labels
Y_TRN := $(DIR_DATA)/processed/label_train_$(SNAP_TRN).csv
Y_TST := $(DIR_DATA)/processed/label_test_$(SNAP_TST).csv
# Config files
CONFIG_PANEL := $(DIR_CONFIG)/config_panel.yaml
CONFIG_INPUT := $(DIR_CONFIG)/config_inpute.yaml
FEATS := $(DIR_CONFIG)/featimp_churnvol.csv
# Generates a clean dataset (inputed and one hot encoded)
$(DATA_TRN): $(CONFIG_INPUT) $(FEATS) | $(DATA_DIR)
	python src/_buildDataset.py --train-file $(DATA_TRN) \
		--test-file $(DATA_TST) \
		--train-date $(SNAP_TRN) \
		--test-date $(SNAP_TST) \
		--config-panel $(CONFIG_PANEL) \
		--config-input $< \
		--feats $(lastword $^)
$(Y_TRN): $(DATA_TRN) | $(DATA_DIR)
	python src/_extractColumns.py --data $< \
		--columns $(TARGET_INIT),$(TARGET_CLASS),$(TARGET_SURV) \
		--file $@
clean:
	-rm -rf $(DATA_TST) $(DATA_TRN) $(Y_TRN) $(Y_TST)
.PHONY: all clean
Makefile.sample.u1
include Makefile
sample:
	make -f Makefile.sample.$(SAMPLE_NAME) $(DATA_TRN_SAMPLE)
SAMPLE_NAME = u1
MIN_RATIO = 0.333333 # ratio of minority class for undersampling
DATA_TRN_SAMPLE := $(DIR_DATA)/processed/churnvol_train_$(SAMPLE_NAME)_$(SNAP_TRN).csv
$(DATA_TRN_SAMPLE): $(DATA_TRN) | $(DATA_DIR)
	python ./src/_generate_sample_u1.py --data-train $< \
		--data-file $@ \
		--min-ratio $(MIN_RATIO) \
		--end-feat $(TARGET_SURV) \
		--end-dur $(END_DUR)
.PHONY: sample
From these two makefiles, I'm assuming that make only needs to rebuild $(DATA_TRN) if either $(CONFIG_INPUT) or $(FEATS) changed after $(DATA_TRN) was built, right? The problem is that both files have timestamps earlier than the target $(DATA_TRN), yet when I run make all it still rebuilds the target. src/_buildDataset.py also has an earlier timestamp.
I've tried to debug this: running make $(DATA_TRN) reports that $(DATA_TRN) is up to date. However, when I run make $(Y_TRN) it rebuilds $(DATA_TRN), which I find odd. This also means that make -f Makefile.sample.u1 sample rebuilds $(DATA_TRN) too.
Can someone help me figure out whether I did something wrong? Or suggest other ways to debug this, such as finding out exactly which file make considered "changed"? Thank you.
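A minimal debugging sketch, assuming GNU make: the automatic variable $? expands to the prerequisites that are newer than the target, so a temporary echo at the top of a recipe shows what triggered the rebuild. The echo line is an illustrative addition and is not part of the original makefile; the existing python command would stay below it unchanged.

```make
# Temporary diagnostic (illustrative addition, remove once the cause is found).
# $@ is the target; $? lists only the prerequisites newer than the target.
# If the target does not exist yet, $? lists every normal prerequisite.
# The original python src/_buildDataset.py command would follow the echo.
$(DATA_TRN): $(CONFIG_INPUT) $(FEATS) | $(DATA_DIR)
	@echo "rebuilding $@; newer prerequisites: $?"
```

Running make -n --debug=b followed by the target name should also report, without executing any recipe, which prerequisite make found newer than which target.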
Related
In the Linux kernel's Makefile.build:
# To build objects in subdirs, we need to descend into the directories
$(subdir-builtin): $(obj)/%/built-in.a: $(obj)/% ;
$(subdir-modorder): $(obj)/%/modules.order: $(obj)/% ;
$(obj)/subdir/built-in.a depends on the prerequisite $(obj)/subdir, but where is the rule that builds $(obj)/subdir?
I assume the following rule applies only to the $(obj)/ directory itself and cannot apply to the subdirectory above.
# Build
# ---------------------------------------------------------------------------
$(obj)/: $(if $(KBUILD_BUILTIN), $(targets-for-builtin)) \
	 $(if $(KBUILD_MODULES), $(targets-for-modules)) \
	 $(subdir-ym) $(always-y)
	@:
Thanks!
I have searched the makefile but have not found any clue.
I already got the answer from a Linux kernel maintainer, as below:
See around line 500.
$(subdir-ym):
	$(Q)$(MAKE) $(build)=$@ \
	need-builtin=$(if $(filter $@/built-in.a, $(subdir-builtin)),1) \
	need-modorder=$(if $(filter $@/modules.order, $(subdir-modorder)),1) \
	$(filter $@/%, $(single-subdir-goals))
I have the following rule in a Makefile
%/collapsed_flow_profile.png: \
		code/piv/plot_collapsed_flow_profiles.R \
		%/experiment2/averaged_flow_profile.csv \
		%/experiment3/averaged_flow_profile.csv \
		%/experiment4/averaged_flow_profile.csv \
		%/experiment6/averaged_flow_profile.csv \
		%/experiment7/averaged_flow_profile.csv
	code/piv/plot_collapsed_flow_profiles.R \
		$*/collapsed_flow_profile.png \
		$*/experiment2/averaged_flow_profile.csv \
		$*/experiment3/averaged_flow_profile.csv \
		$*/experiment4/averaged_flow_profile.csv \
		$*/experiment6/averaged_flow_profile.csv \
		$*/experiment7/averaged_flow_profile.csv
As you can see, I'm passing the target and all the dependencies except the first one as arguments to the script used in the recipe. Since the $^ automatic variable contains all the dependencies, is it possible to remove the first dependency from $^ and pass the rest to the script as arguments?
GNU make provides text functions for this, such as $(firstword a b c), which yields a, as well as $(filter-out ...) and $(wordlist s,e,text). Combining them should let you do what you want. The excellent documentation has the details.
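As a sketch of that combination (assuming GNU make, and shortening the prerequisite list to two experiments for brevity): $< is the first prerequisite, so filtering it out of $^ leaves the remaining ones.

```make
# $< is the first prerequisite (the R script), $@ is the target,
# and $(filter-out $<,$^) is every prerequisite except the first.
%/collapsed_flow_profile.png: \
		code/piv/plot_collapsed_flow_profiles.R \
		%/experiment2/averaged_flow_profile.csv \
		%/experiment3/averaged_flow_profile.csv
	$< $@ $(filter-out $<,$^)
```

$(wordlist 2,$(words $^),$^) selects the same words by position instead of by name.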
PS: What happened to experiment 5? Outlier? Does not fit expectations? :-)
I've to add a file and a folder to the zip that is created when I run make dist command. This is an open-source project.
After some research, I understand that I have to modify Makefile.am, but the examples online either don't work or don't match my current Makefile.am.
Makefile.am
SUBDIRS = bin data po src extensions docs
DISTCLEANFILES = \
intltool-extract \
intltool-merge \
intltool-update
EXTRA_DIST = \
$(bin_SCRIPTS) \
intltool-merge.in \
intltool-update.in \
intltool-extract.in
DISTCHECK_CONFIGURE_FLAGS = --disable-update-mimedb
check-po:
	@for i in $(top_srcdir)/po/*.po ; do \
		if ! grep -q ^`basename $$i | \
			sed 's,.po,,'`$$ $(top_srcdir)/po/LINGUAS ; then \
			echo '***' `basename $$i | \
				sed 's,.po,,'` missing from po/LINGUAS '***' ; \
			exit 1; \
		fi; \
	done;
lint:
	flake8 --ignore E402 $(top_srcdir)/src $(top_srcdir)/extensions
test: lint check-po
	PYTHONPATH=$(pkgdatadir)/extensions:$(PYTHONPATH) \
	python -m sugar3.test.discover $(top_srcdir)/tests
configure.ac
AC_INIT([Sugar],[0.114],[],[sugar])
AC_PREREQ([2.59])
AC_CONFIG_MACRO_DIR([m4])
AC_CONFIG_SRCDIR([configure.ac])
SUCROSE_VERSION="0.114"
AC_SUBST(SUCROSE_VERSION)
AM_INIT_AUTOMAKE([1.9 foreign dist-xz no-dist-gzip])
AM_MAINTAINER_MODE
PYTHON=python2
AM_PATH_PYTHON
AC_PATH_PROG([EMPY], [empy])
if test -z "$EMPY"; then
AC_MSG_ERROR([python-empy is required])
fi
PKG_CHECK_MODULES(SHELL, gtk+-3.0)
IT_PROG_INTLTOOL([0.35.0])
GETTEXT_PACKAGE=sugar
AC_SUBST([GETTEXT_PACKAGE])
AM_GLIB_GNU_GETTEXT
AC_ARG_ENABLE(update-mimedb,
AC_HELP_STRING([--disable-update-mimedb],
[disable the update-mime-database after install [default=no]]),,
enable_update_mimedb=yes)
AM_CONDITIONAL(ENABLE_UPDATE_MIMEDB, test x$enable_update_mimedb = xyes)
GLIB_GSETTINGS
AC_CONFIG_FILES([
bin/Makefile
bin/sugar
data/icons/Makefile
data/Makefile
extensions/cpsection/aboutcomputer/Makefile
extensions/cpsection/aboutme/Makefile
extensions/cpsection/background/Makefile
extensions/cpsection/backup/Makefile
extensions/cpsection/backup/backends/Makefile
extensions/cpsection/datetime/Makefile
extensions/cpsection/frame/Makefile
extensions/cpsection/keyboard/Makefile
extensions/cpsection/language/Makefile
extensions/cpsection/modemconfiguration/Makefile
extensions/cpsection/Makefile
extensions/cpsection/network/Makefile
extensions/cpsection/power/Makefile
extensions/cpsection/updater/Makefile
extensions/cpsection/webaccount/services/Makefile
extensions/cpsection/webaccount/Makefile
extensions/deviceicon/Makefile
extensions/globalkey/Makefile
extensions/webservice/Makefile
extensions/Makefile
Makefile
po/Makefile.in
src/jarabe/config.py
src/jarabe/controlpanel/Makefile
src/jarabe/desktop/Makefile
src/jarabe/frame/Makefile
src/jarabe/intro/Makefile
src/jarabe/journal/Makefile
src/jarabe/Makefile
src/jarabe/model/Makefile
src/jarabe/model/update/Makefile
src/jarabe/util/Makefile
src/jarabe/util/telepathy/Makefile
src/jarabe/view/Makefile
src/jarabe/webservice/Makefile
src/Makefile
])
AC_OUTPUT
When I run make dist, the output archive doesn't include a file and a folder that I now need to add. I can't work out where (Makefile.am or configure.ac) I should make the changes.
I've to add a file and a folder to the zip that is created when I run make dist command.
I take it that these are not already included in the distribution. If you're not sure, then check -- Automake-based build systems such as yours identify a lot of files automatically for inclusion in distribution packages.
Supposing that these files are not already included, there are several ways to cause them to be. Easiest would be to add them to the EXTRA_DIST variable, yielding
EXTRA_DIST = \
$(bin_SCRIPTS) \
intltool-merge.in \
intltool-update.in \
intltool-extract.in \
a_directory \
some_file.ext
Don't forget the trailing backslashes if you continue with the multiline form (which I like, as I find it much easier to read). You can specify a path to the file, the directory, or both. Do note that in the case of the directory, it will be not just the directory itself but all its contents, recursively, that are included in the distribution. This is all documented in the manual.
If you need finer control, then there is also an extension point for managing the contents of the distribution in the form of the "dist hook". This comprises a make target named dist-hook. Like any other literal make rule in your Makefile.am, any rule you provide for building that target is copied to the final generated Makefile, and if such a rule is present then its recipe is run as part of building the distribution, after the distribution directory is otherwise populated but before the archive file is built from that. You can write more or less arbitrary shell code in that target's recipe to tweak the distribution. Follow the above link to the documentation for the gory details.
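A minimal dist-hook sketch, assuming a_directory is already listed in EXTRA_DIST as shown earlier and that stray *.orig files (a hypothetical example) should be kept out of the archive. $(distdir) is the staging directory that Automake populates before creating the archive.

```make
# Makefile.am fragment. The recipe runs after $(distdir) has been populated
# and before the archive is created. The *.orig pattern is hypothetical.
dist-hook:
	chmod -R u+w $(distdir)/a_directory
	find $(distdir)/a_directory -name '*.orig' -exec rm -f {} +
```

The chmod comes first because files under $(distdir) are not guaranteed to be writable, for example when packaging from a read-only source tree.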
EXTRA_DIST variable sounds like the thing.
I am writing a master makefile to compile and install multiple autoconf-based libraries that depend on each other. Everything works on the first run. The issue is that if I work on one of these libraries individually and run "make && make install", the header files in the prefix folder are overwritten (even if they are untouched). This causes all dependent libraries to be compiled from scratch.
Is there a way to avoid the unnecessary recompiles without hacking into the makefiles?
Maybe the solution is a little late, but
./configure INSTALL="install -p"
fixes the recompilation problem. This flag makes GNU install set the timestamps of the installed files to the timestamps of the built files.
You could use sentinel files that exist only to establish your dependency graph. For example:
prefix := /usr/local
.PHONY: all
all: libx-built
libx-built \
  : libx.tar.gz \
  ; tar xzvf $< \
 && ( cd libx \
 && ./configure --prefix=$(prefix) \
 && $(MAKE) && $(MAKE) install ) \
 && touch $@
A dependent library such as liby is then rebuilt only when libx-built is newer than its own sentinel:
liby-built \
: liby.tar.gz libx-built \
; ...
I would like a rule something like:
build/%.ext: src/%.ext
	action
I have a directory of files that I want to optimize and then write to a different folder. However, the files have the same names in the input and output folders. I have tried various iterations of the rule above, but make either always or never rebuilds, depending on how I tweak it. Suggestions?
EDIT:
I ended up with the following solution, which works great!
JS = \
	src/js/script2.js \
	src/js/script1.js
JS_OPT = $(patsubst src/js/%.js,web/js/%.js, $(JS))
all: $(JS_OPT)
$(JS_OPT): web/js/%.js: src/js/%.js
	cat $< | ./bin/jsmin > $@
Try something like this:
INPUT_FILES = \
	src/a.ext \
	src/b.ext
OPTIMIZED_FILES = $(patsubst src/%.ext,build/%.ext,$(INPUT_FILES))
$(OPTIMIZED_FILES): build/%.ext: src/%.ext
	optimize_command $@ $<