Writing custom build tasks to generate build dependencies automatically on each build - gradle

I'm trying to figure out how to use Gradle. I'm having a problem which is trivial to solve with Makefiles, CMake, etc., but I have no idea how to do it in Gradle.
Let's say I have an application that contains some resource files inside. Those resource files are generated from a set of input files, and are put inside the application in compiled form.
I'd like to instruct Gradle to regenerate those resources automatically whenever they are needed, by pointing it at the resource input files. So, effectively, I'd like to do something like this:
all: out1 out2
	#echo done

out1: in1
	cat in1 | xxd > out1

out2: temp1
	cat temp1 | xxd > out2

temp1: in2
	cat in2 | tr '[:lower:]' '[:upper:]' > temp1
(let's ignore the unfortunate effect that if xxd fails, an invalid out1 will still be generated)
Let's say out1 and out2 are my resource files that I'd like to embed inside the application. These files are generated by different rules:
out1 is generated from in1 by processing it with xxd (it's just an example),
out2 is generated from in2 by first uppercasing it and then piping it through xxd. A temporary file, temp1, is created along the way, but it's only a temporary file and not important at all.
So what I'd like to achieve in Gradle is basically an equivalent of the Makefile pasted above, with all its features. I mostly mean that out1 and out2 shouldn't be regenerated if in1 and in2 didn't change (the resource generation phase can be expensive and time-consuming, so I'd like to avoid running it on every build), and that the build system should be able to figure out on its own how to run the build in parallel, so that out1 and out2 are generated at the same time.
I'm trying to dig through the Gradle docs and some examples, but 95% of what I find are opaque scripts that rely on some particular plugin, with no explanation of how it works inside. The docs say that Gradle is an "automation tool", so what I'm trying to achieve should be perfectly doable, but is that really the case? Does it make any sense to use Gradle for the use case described in this post?

Doable, it just needs a bit of imagination. Gradle tasks can be created at runtime, but to my knowledge the inputs you need in order to generate those tasks have to be ready when you invoke Gradle (I'm not sure whether tasks can still be created once another task has already started; I haven't personally tried).
For your use case, you may just need to execute a separate Gradle command (a separate build file) once the input is ready. Gradle allows you to invoke any shell command, so you can just spawn off another Gradle call.
Here's a sample of how you can create new tasks dynamically (multiple tasks are spawned from an array; you could just as well read yours from a file): https://github.com/HCL-TECH-SOFTWARE/DX-Modules-and-ScriptApps/blob/main/06ThemeComponentInApp/DxModule/build.gradle
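To make the incremental part of the question concrete, here is a minimal build.gradle sketch in the Groovy DSL. It is only an illustration, not a definitive implementation: the task names, the generated-resources directory and the aggregate generateResources task are invented for this example, and xxd is assumed to be available on the PATH. The important part is that declaring inputs and outputs lets Gradle mark a task UP-TO-DATE and skip it when in1/in2 haven't changed.

// build.gradle (Groovy DSL) - a sketch, not a drop-in solution
def generatedDir = layout.buildDirectory.dir('generated-resources')

tasks.register('genOut1') {
    def inFile  = file('in1')
    def outFile = generatedDir.get().file('out1').asFile
    inputs.file inFile     // declaring the input ...
    outputs.file outFile   // ... and the output enables Gradle's up-to-date check
    doLast {
        outFile.parentFile.mkdirs()
        // equivalent of: cat in1 | xxd > out1
        outFile.withOutputStream { os ->
            exec {
                commandLine 'xxd', inFile.absolutePath
                standardOutput = os
            }
        }
    }
}

tasks.register('genOut2') {
    def inFile  = file('in2')
    def outFile = generatedDir.get().file('out2').asFile
    inputs.file inFile
    outputs.file outFile
    doLast {
        outFile.parentFile.mkdirs()
        // equivalent of: cat in2 | tr '[:lower:]' '[:upper:]' | xxd > out2
        // (the uppercasing is done in-process, so no temp1 file is needed)
        def upper = inFile.text.toUpperCase()
        outFile.withOutputStream { os ->
            exec {
                commandLine 'xxd'
                standardInput  = new ByteArrayInputStream(upper.bytes)
                standardOutput = os
            }
        }
    }
}

// aggregate task playing the role of the Makefile's "all" target
tasks.register('generateResources') {
    dependsOn 'genOut1', 'genOut2'
}

Running gradle generateResources a second time should report both tasks as UP-TO-DATE; whether independent tasks actually run concurrently depends on your Gradle version and settings (--parallel across subprojects, the Worker API, or the configuration cache). In a JVM project you could additionally register the generated directory as a resource directory (sourceSets.main.resources) and wire the task dependency so it runs before processResources.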

Related

Temp file not being deleted

I'm trying to create a temporary file in my pipeline, then use that file in another rule.
For example, I have two rules in a .smk file:
#Unzip adapter trimmed fastq file
rule unzip_fastq:
    input:
        '{sample}.adapterTrim.round2.fastq.gz',
    output:
        temp('{sample}.adapterTrim.round2.fastq')
    conda:
        '../envs/rep_element.yaml'
    shell:
        'gunzip -c {input[0]} > {output[0]}'

#Run bowtie2 to align to rep elements and parse output
rule parse_bowtie2_output_realtime:
    input:
        '{sample}.adapterTrim.round2.fastq'
    output:
        'rep_element_pipeline/{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam'
    params:
        bt2=config["ref"]["bt2_index_path"], eid=config["ref"]["enst2id"]
    conda:
        '../envs/rep_element.yaml'
    shell:
        'perl ../scripts/parse_bowtie2_output_realtime_includemultifamily.pl '
        '{input[0]} {params.bt2} {output[0]} {params.eid}'
{sample}.adapterTrim.round2.fastq is used once and should ultimately be deleted upon completion. However, I'm finding that this file is uploaded to Amazon S3, even with the addition of temp(). I'm also finding that this file is removed locally, but still persists on S3.
Am I doing this correctly? '{sample}.adapterTrim.round2.fastq' is not currently listed in rule all of the Snakefile.
We ultimately need to prevent this file from being uploaded to S3, so if there is a way to specify not to upload this file in the rule, that would be useful.
It seems that the snippet in the question is not consistent with actual use, since for S3 files one would need to wrap the file names in remote().
However, as a general solution, the documentation contains the following:
The remote() wrapper is mutually-exclusive with the temp() and protected() wrappers.
Hence, if you intend to use a temp file, make sure it's not wrapped in remote, or explicitly wrap the file in local.

Aggregating `bazel test` reports when testing many targets

I am trying to aggregate all the test.xml reports generated after a bazel test run. The idea is to then upload this full report to a CI platform with a nicer interface.
Consider the following example
$ find .
foo/BUILD
bar/BUILD
$ bazel test //...
This might generate
./bazel-testlogs/foo/tests/test.xml
./bazel-testlogs/foo/tests/... # more
./bazel-testlogs/bar/tests/test.xml
./bazel-testlogs/bar/tests/... # more
I would love to know if there is a better way to aggregate these test.xml files into a single report.xml file (or the equivalent). This way I only need to publish 1 report file.
Current solution
The following is totally viable; I just want to make sure I am not missing some obvious built-in feature.
find ./bazel-testlogs | grep 'test.xml' | xargs [publish command]
In addition, I will check out the JUnit output format, and see if just concatenating the reports is sufficient. This might work much better.
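If the CI publisher accepts a glob of files rather than a single merged XML, one possible workaround (a sketch only; the publish step below is the same placeholder as [publish command] above) is to flatten the reports into one directory, giving each copy a name derived from its target path. Producing one genuinely merged report.xml would still require a JUnit-aware merge tool, since the files can't simply be concatenated into valid XML.

# bazel-testlogs is a convenience symlink in the workspace root, hence find -L
mkdir -p combined-reports
find -L bazel-testlogs -name test.xml -print0 |
  while IFS= read -r -d '' xml; do
    # e.g. bazel-testlogs/foo/tests/test.xml -> combined-reports/foo_tests_test.xml
    cp "$xml" "combined-reports/$(echo "${xml#bazel-testlogs/}" | tr '/' '_')"
  done
# [publish command] combined-reports/*.xml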

How does a shell script read the data in a batch test folder

I recently replicated a SEGAN experiment based on TensorFlow 0.12.1. The author provides a shell script for testing (clean_wav.sh), shown in the figure below:
This is the original version provided by the author. I modified it according to the path of my test data, as follows:
Noisy_testset_wav_16k is my test data folder, but when I run the script the system reports an error:
This folder is a directory, but when I change the path to:
NOISY_WAVNAME='/home/zyf/SEGAN/ SEGAN/segan-master1/noisy_testset_wav_16k/p232_023.wav'
the script runs normally and the program works as intended.
However, it can only process one audio file at a time; it cannot process files in batches. If anyone knows the reason or has any suggestions, please give me some advice. Thank you very much.
The code is written in such a way that it only processes a single file; you can add a loop to the shell script to process all the files in the folder:
for f in $NOISY_WAVDIR/*.wav; do
python main.py --init_noise_std 0. --save_path segan_v1.1 \
--batch_size 100 --g_nl prelu --weights SEGAN-41700 \
--preemph 0.95 --bias_deconv True \
--bias_downconv True --bias_D_conv True \
--test_wav $f --save_clean_path $SAVE_PATH
done
but that would not make optimal use of the GPU, since you are not processing the audio in batches. Ideally you'd want to modify the Python code to process audio in batches, but that would not be a trivial task.

Shell script to verify data packages

I need to make a shell script to check my algorithms against loads of data (test packages saved in .in files; every package contains one folder with the .in files and another with the .out files that are supposed to hold the correct results).
Sometimes there are about 1000 files in one package, so there's no point in doing it manually. I need some kind of loop that opens each .in file, redirects it to the input of my C++ program, and also redirects the program's output (saving the result to .out files). But the point is that I can't pick up this language as quickly as I need to.
I would also like the script to compare the results of my algorithm with the .out files from the packages:
for f in ExternalIn/*.in; do  # part of the code which runs my algorithm and compares its .out file to the .out file from the package
Skipping checks for missing files, whitespace-safety, etc., you probably need something like:
for f in ExternalIn/*.in; do
# diff the result of my_cpp_app eating file.in with file.out
# and store the comparison result in file.diff
diff ${f/.in/.out} <(my_cpp_app <$f 2>/dev/null) > ${f/.in/.diff}
done
Although I would probably do it with a find / xargs pipeline, which is not only safer but also allows parallel execution.
Or even write a Makefile for this and use make, which after all is a tool for exactly this kind of work.
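For example, here is a sketch of that find / xargs variant (my_cpp_app is the same placeholder as above; -print0/-0 handle awkward file names and -P4 runs four comparisons at a time):

find ExternalIn -name '*.in' -print0 |
  xargs -0 -n1 -P4 bash -c '
    in=$1
    # run the program on one .in file and diff against the matching .out
    diff "${in%.in}.out" <(my_cpp_app < "$in" 2>/dev/null) > "${in%.in}.diff"
  ' _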

Join multiple Coffeescript files into one file? (Multiple subdirectories)

I've got a bunch of .coffee files that I need to join into one file.
I have folders set up like a rails app:
/src/controller/log_controller.coffee
/src/model/log.coffee
/src/views/logs/new.coffee
CoffeeScript has a command that lets you join multiple CoffeeScript files into one file, but it only seems to work within a single directory. For example, this works fine:
coffee --output app/controllers.js --join --compile src/controllers/*.coffee
But I need to be able to include a bunch of subdirectories kind of like this non-working command:
coffee --output app/all.js --join --compile src/*/*.coffee
Is there a way to do this? Is there a UNIXy way to pass in a list of all the files in the subdirectories?
I'm using terminal in OSX.
They all have to be joined in one file because otherwise each separate file gets compiled & wrapped with this:
(function() { }).call(this);
Which breaks the scope of some function calls.
From the CoffeeScript documentation:
-j, --join [FILE] : Before compiling, concatenate all scripts together in the order they were passed, and write them into the specified file. Useful for building large projects.
So, you can achieve your goal at the command line (I use bash) like this:
coffee -cj path/to/compiled/file.js file1 file2 file3 file4
where file1 through fileN are the paths to the CoffeeScript files you want to compile.
You could write a shell script or Rake task to combine them together first, then compile. Something like:
find . -type f -name '*.coffee' -print0 | xargs -0 cat > output.coffee
Then compile output.coffee
Adjust the paths to your needs. Also make sure that the output.coffee file is not in the same path you're searching with find or you will get into an infinite loop.
http://man.cx/find
http://www.rubyrake.org/tutorial/index.html
Additionally, you may be interested in these other posts on Stack Overflow concerning searching across directories:
How to count lines of code including sub-directories
Bash script to find a file in directory tree and append it to another file
Unix script to find all folders in the directory
I've just released an alpha version of CoffeeToaster; I think it may help you.
http://github.com/serpentem/coffee-toaster
The easiest way is to use the coffee command-line tool:
coffee --output public --join --compile app
app is my working directory holding multiple subdirectories, and public is where the joined output .js file will be placed. It's easy to automate this process if you're writing your app in Node.js.
This helped me (-o output directory, -j join to project.js, -cw compile and watch coffeescript directory in full depth):
coffee -o web/js -j project.js -cw coffeescript
Use cake to compile them all into one (or more) resulting .js file(s). The Cakefile is used as configuration that controls the order in which your coffee scripts are compiled - quite handy with bigger projects.
Cake is quite easy to install and set up; invoking cake from vim while you are editing your project is then simply:
:!cake build
and you can refresh your browser and see results.
As I'm also busy learning the best way of structuring the files and using CoffeeScript in combination with Backbone and Cake, I have created a small project on GitHub to keep as a reference for myself - maybe it will help you too with Cake and some basic things. All compiled files are in the www folder so that you can open them in your browser, and all source files (except for the Cake configuration) are in the src folder. In this example, all .coffee files are compiled and combined into one output .js file, which is then included in the HTML.
Alternatively, you could use the --bare flag, compile to JavaScript, and then perhaps wrap the JS if necessary. But this would likely create problems; for instance, if you have one file with the code
i = 0
foo = -> i++
...
foo()
then there's only one var i declaration in the resulting JavaScript, and i will be incremented. But if you moved the foo function declaration to another CoffeeScript file, then its i would live in the foo scope, and the outer i would be unaffected.
So concatenating the CoffeeScript is a wiser solution, but there's still potential for confusion there; the order in which you concatenate your code is almost certainly going to matter. I strongly recommend modularizing your code instead.
