Why does my large multi-project gradle build start up slowly? - gradle

I am on a team that has a large multi-project build (~350 modules) and we have noticed that doing an individual build (-a) still takes a large amount of time. When building the entire set of things the overhead isn't so bad, but when we do something like:
cd my/individual/project
gradle -a --configure-on-demand --daemon build
that it still takes 30-40 seconds just in the configuration phase (before it starts building anything, which we measure using the --dry-run option)
We have approximately 10 custom tasks and we are setting the inputs and outputs for those tasks but we still see this inordinately long configuration time. We are using gradle 1.10

It's hard to say from a distance. A common mistake is to do work in the configuration phase that should be done in the execution phase. Here is an example:
task myTask {
copy {
from ...
to ...
}
}
Here, the project.copy method will be called (and therefore copying will take place) every single time the task is configured, which is for every single build invocation - no matter which tasks are run! To fix this, you'd move the work into a task action:
task myTask {
doLast {
copy {
from ...
into ...
}
}
}
Or even better, use a predefined task type:
task myTask(type: Copy) {
from ...
into ...
}
Some steps you can take to find the cause(s) for performance problems:
Review the build
Run with --info, --debug, or --profile
Profile the build with a Java profiler
PS: Specifying inputs and outputs doesn't shorten configuration time (but likely execution time).

Related

Execute more than one command in a task without breaking incremental build

We use gradle incremental builds and want to run two commands (ideally from one task). One of the solutions here worked getting the two commands running... however it breaks incremental builds.. It looks something like:
task myTask() {
inputs.files(inputFiles)
project.exec {
workingDir web
commandLine('mycmd')
}
project.exec {
workingDir web
commandLine('mysecond-cmd')
}
}
if running a single command and incremental builds is working, the task looked similar to this, the thing that seems to make the difference is the workingDir:
task myTask(type: Exec) {
workingDir myDir // this seems to trigger/enable continuos compilation
commandLine ('myCmd')
}
the best alternative so far is create 3 tasks, one for each of the cmdline tasks I want to run and a third one to group them, which seems dirty.
The question is: Is there a way to run two or more commands in one task with incremental builds still working?
I'll try to answer the question from the comments: how can I signal from a task that has no output files that the build should watch certain files. Unfortunately, this is hard to answer without knowing the exact use case.
To start with, Gradle requires some form of declared output that it can check to see whether a task has run or whether it needs to run. Consider that the task may have failed during the previous run, but the input files haven't changed since then. Should Gradle run the task?
If you have a task that doesn't have any outputs, that means you need to think about why that task should run or not in any given build execution. What's it doing if it's not creating or modifying files or interacting with another part of the system?
Assuming that incremental build is the right solution — it may not be — Gradle does allow you to handle unusual scenarios via TaskOutputs.upToDateWhen(Action). Here's an example that has a straightforward task (gen) that generates a file that acts as an input for a task (runIt) with no outputs:
task gen {
ext.outputDir = "$buildDir/stuff"
outputs.dir outputDir
doLast {
new File(outputDir, "test.txt").text = "Hurray!\n"
}
}
task runIt {
inputs.dir gen
outputs.upToDateWhen { !gen.didWork }
doLast {
println "Running it"
}
}
This specifies that runIt will only run if the input files have changed or the task gen actually did something. In this example, those two scenarios equate to the same thing. If gen runs, then the input files for runIt have changed.
The key thing to understand is that the condition you provide to upToDateWhen() is evaluated during the execution phase of the build, not the configuration phase.
Hope that helps, and if you need any clarification, let me know.

Execute gradle tasks on sub projects in order

we have a project structure such that
ParentProject
projA
projB
projC
projD
...
We have a bunch of tasks that we run on these projects from the parent project, calling gradle clean build make tag push remove
clean: removes the build directory for each project
build: compiles the source code
make: uses the compiled code to create a docker image (custom task)
tag: tags the docker image (custom task)
push: pushes the tagged docker image to our nexus repo (custom task)
remove: removes the local docker image (custom task)
all the tasks work as intended, but not exactly in the order we want.
our issue is disk space. creating all the docker images (we have many to make) takes several gigabytes of space. Gradle fails because it runs out of disk space before reaching the end. We thought we could solve this issue by simply removing the image after it is pushed and free the disk space, but the way gradle runs, that does not work.
Current, gradle executes in the following manner:
projA:clean
projB:clean
projC:clean
...
projA:build
projB:build
projC:build
...
projA:make
projB:make
projC:make
...
going through like this, everything tries to build before anything gets removed. The way we would like to run gradle is as follows:
projA:clean
projA:build
projA:make
projA:tag
projA:push
projA:remove
projB:clean
projB:build
projB:make
projB:tag
projB:push
projB:remove
...
this way the project cleans itself up and frees disk space before the next project starts building.
is there any way to do this?
edit: There may be times where not every task needs to be run. The image may already built and simply needs to be retagged. Or may need to be built for local testing, but not tagged or pushed.
Looks like tasks make, tag, push, remove are custom tasks.
If you want dependency among tasks, for example taskX should be executed whenever taskY is started, then you should use dependsOn property of task.It is also called as task dependencies.
below shows the build.gradle on parent project that make task of projA dependsOn clean task of projA. you should do this for remaining tasks of projA.
project('projA') {
task make(dependsOn: ':projA:build') {
doLast {
println 'make'
}
}
}
project('projA') {
task build {
doLast {
println 'build'
}
}
}
...
...
Then you should continue for remaining projects and remaining tasks. For example, ProjB:clean should dependOn projA:remove
project('projB') {
task clean(dependsOn: ':projA:remove') {
doLast {
println 'clean'
}
}
}
continue for other tasks in project B and remaining projects.
For more details please follow this link.
We found we can create a task of type 'GradleBuild' where we can specify what tasks we want to run within a single task.
task deploy(type: GradleBuild){
tasks = ['clean','build','make','tag','push','remove']
}
this solution prevents linking of tasks with dependencies, so we can still build sub projects individually without worrying about other tasks being run unnecessarily.

parallelize code in a gradle task

I have a task which essentially performs the following:
['subproj1', 'subproj2'].each { proj ->
GradleRunner.create()
.withProjectDir(file("./examples/${proj}/"))
.withArguments('check')
.build()
}
The check is a system test and requires connecting to 3rd party services, so I would like to parallelize this.
Can this be done in gradle? If so, how?
I tried using java threading but the builds failed with errors which I can't remember what they were exactly, but they suggested that the gradle internal state had gotten corrupted.
Did you tried to use the experimental parallel task execution? On the first glance it's quite simple. You just call ./gradlew --parallel check (or if it turns out to work fine for you can also define this in your gradle.properties). This will start n threads (where n is the number cpu cores) which will execute your tasks. Each thread owns a certain project so the tasks of one project will never be executed in parallel.
You can override the number of tasks (or worker) by setting the property --max-workers at the command line or by setting org.gradle.workers.max=n at your gradle.properies.
If you are just interested in executing tests in parallel than you might try to set Test.setMaxParallelForks(int). That will cause the to execute the tests of one project (if I understood this right) in parallel (with the number of tasks you defined).
Hope that helps. Maybe the gradle documentation gives you some more insights: https://docs.gradle.org/current/userguide/multi_project_builds.html#sec:parallel_execution
While multiproject build is the right way to go in the long term, my short term approach was to use GPars:
import groovyx.gpars.GParsPool
buildscript {
repositories {
mavenCentral()
}
dependencies {
classpath "org.codehaus.gpars:gpars:1.1.0"
}
}
task XXXX << {
GParsPool.withPool(10) {
['subproj1', 'subproj2'].eachParallel { proj ->
GradleRunner.create()
.withProjectDir(file("./examples/${proj}/"))
.withArguments('check')
.build()
}
}
}

How do I make Gradle rerun a task when its dependencies are run?

Let's say that I have a task "main" that depends on another task "dependency." I would like "main" to be rerun any time its dependency (or its dependency's dependencies) is rebuilt because "main" relies on the artifacts produced by "dependency" (or the dependencies of "dependency").
A build.gradle file containing an example of what I'm dealing with is the following:
defaultTasks 'main'
task baseDependency {
outputs.file 'deps.out'
outputs.upToDateWhen { false }
doLast {
exec {
commandLine 'bash', '-c', 'echo hello world > deps.out'
}
}
}
task dependency(dependsOn: baseDependency)
task main(dependsOn: dependency) {
outputs.file 'main.out'
doLast {
exec {
commandLine 'bash', '-c', 'echo hello world > main.out'
}
}
}
Executing gradle the first time:
:baseDependency
:dependency
:main
BUILD SUCCESSFUL
Total time: 0.623 secs
Executing it a second time:
:baseDependency
:dependency
:main UP-TO-DATE
BUILD SUCCESSFUL
Total time: 0.709 secs
I would really love if "main" were not marked "UP-TO-DATE" if its dependencies had to be rebuilt. That seems essential. How do you make sure that's the case?
The standard way to specify dependency between tasks is via task's inputs and outputs. This can be any file, fileset or directory.
In your case you should modify main task and add inputs.file 'deps.out' to its definition.
Note that gradle has an optimization that may lead to unexpected behavior in a simplistic example that you provided.
Before a task is executed for the first time, Gradle takes a snapshot
of the inputs. This snapshot contains the set of input files and a
hash of the contents of each file. Gradle then executes the task. If
the task completes successfully, Gradle takes a snapshot of the
outputs. This snapshot contains the set of output files and a hash of
the contents of each file. Gradle persists both snapshots for the next
time the task is executed.
Each time after that, before the task is executed, Gradle takes a new
snapshot of the inputs and outputs. If the new snapshots are the same
as the previous snapshots, Gradle assumes that the outputs are up to
date and skips the task. If they are not the same, Gradle executes the
task. Gradle persists both snapshots for the next time the task is
executed.
So even if you specify correct inputs in a simple example where the same file is generated the dependent task will be marked as up to date on the second and subsequent runs.
If you do not want or cannot hardcode dependency on the file you can override upToDateWhen for dependent task and calculate the condition if the task is up to date based on dependencies of this task and their state like this:
outputs.upToDateWhen { task ->
task.taskDependencies.inject(true) { r, dep ->
r && dep.values.inject(true) { res, v ->
res && (!(v instanceof Task) || v?.state.getSkipped())
}
}
}
upToDateWhen should return true if this task should not be run at all (because its output are already up-to-date). And this is the case when no dependent task was run (the gradle documentation is a bit vague about this I must admit but getSkipped seems work as expected).

Why do my Gradle tests run repeatedly?

I have a pretty standard Gradle build that's building a Java project.
When I run it for the first time, it compiles everything and runs the tests. When I run it a second time without changing any files, it runs the tests again.
According to this thread, Gradle is supposed to be lazy by defaut and not bother running tests if nothing has changed. Has the default behaviour here been changed?
EDIT:
If I run gradle test repeatedly, the tests only run the first time and are subsequently skipped. However, if I run gradle build repeatedly, the tests get re-run every time, even though all other tasks are marked as up-to-date.
the gradle uptodate check logs on info level why a task is not considered to be up-to-date. please rerun the "gradle build -i" to run with info logging at check the logging output.
cheers,
René
OK, so I got the answer thanks to Rene prompting me to look at the '-i' output...
I actually have 2 test tasks: the 'test' one from the Java plugin, and my own 'integrationTest' one. I didn't mention this in the question because I didn't think it was relevant.
It turns out that these tasks are writing their output (reports, etc.) to the same directory, so Gradle's task-based input and output tracking was thinking that something had changed, and re-running the tests.
So the next question (which I will ask separately) becomes: how do I cleanly (and with minimal Groovy/Gradle code) completely separate two instances of the test task.
You need to create test tasks in your build.gradle and then call those specific tasks to run a specific set of tests. Here is an example that will filter out classes so that they don't get run twice (such as when running a suite and then re-running its child classes independently):
tasks.withType(Test) {
jvmArgs '-Xms128m', '-Xmx1024m', '-XX:MaxPermSize=128m'
maxParallelForks = 4 // this runs tests parallel if more than one class
testLogging {
exceptionFormat "full"
events "started", "passed", "skipped", "failed", "standardOut", "standardError"
displayGranularity = 0
}
}
task runAllTests(type: Test) {
include '**/AllTests.class'
testReportDir = file("${reporting.baseDir}/AllTests")
testResultsDir = file("${buildDir}/test-results/AllTests")
}
task runSkipSuite(type: Test) {
include '**/Test*.class'
testReportDir = file("${reporting.baseDir}/Tests")
testResultsDir = file("${buildDir}/test-results/Tests")
}
Also, concerning your build question. The "build" task includes a clean step which is cleaning tests from your build directory. Otherwise the execution thinks the tests have already been ran.

Resources