We are trying Gradle for our very large and complex enterprise app. We are using a multi-project build structure and are very excited about Gradle's parallel execution feature.
Our codebase is structured in domain layers like this:
UI modules (~20) -> shared ui -> domain -> dao -> framework
Dependencies are unidirectional and the build happens bottom-up.
Unfortunately, we are not seeing a big boost in our build times. It's pretty much the same as what we were getting with Ant before.
Looking at the execution sequence of tasks in parallel mode, a few things don't look right.
Our expectation was that Gradle would run tasks in sequence initially while building the core layers, and then, after it assembles framework, dao, domain and shared ui, kick off everything else in parallel.
But the execution sequence we are seeing is roughly this:
framework.assemble -> dao.assemble -> domain.assemble -> shared.ui.assemble -> Other UI modules.assemble (in parallel) -> war -> Other UI.check + shared.ui.check + dao.check (in parallel) -> domain.check -> framework.check
The bottleneck is at the end, when the checks for domain and framework run in sequence rather than in parallel. These two are our biggest modules, with around 12k unit tests, and they take around 4 minutes to run.
We spent a lot of time looking at the dependencies using gradle tasks --all; the test tasks for these modules are completely independent, and there is nothing that should hold off their execution.
We are wondering if this is a known issue, or whether there is a way to enable some extra debugging in Gradle to get more insight into how Gradle determines execution order in parallel mode. Any help is appreciated.
As of Gradle 1.4, parallel task execution is (intentionally) constrained in a few ways. In particular, the set of tasks executing at any time won't contain two tasks belonging to the same project. This will be improved over time. I'm not aware of any debugging aids other than what you get from the logs (e.g. with --debug).
Note that parallel test execution is a separate feature. If you have a lot of tests in the same project, test.maxParallelForks = x with x > 1 should show a noticeable speedup. The value for x is best determined experimentally. A good starting point is the number of physical cores on the machine (e.g. Runtime.getRuntime().availableProcessors() / 2).
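For example, a minimal sketch of what that could look like in a project's build.gradle (the exact value is just a starting point you would tune experimentally, as described above):

test {
    // fork roughly one test JVM per physical core, but never fewer than one
    maxParallelForks = Math.max(1, Runtime.getRuntime().availableProcessors().intdiv(2))
}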
Related
I'm currently working on performance and memory tuning for a Spark process. As part of this I'm performing multiple runs of different versions of the code and trying to compare their results side by side.
I've got a few questions to ask, so I'll post each separately so they can be addressed separately.
Currently, it looks like getOrCreate() is re-using the Spark Context each run. This is causing me two problems:
Caching from one run may be affecting the results of future runs.
All of the tasks are bundled into a single 'job', and I have to guess at which tasks correspond to which test run.
I'd like to ensure that I'm properly resetting all caches in each run to ensure that my results are comparable. I'd also ideally like some way of having each run show up as a separate job in the local job history server so that it's easier for me to compare.
I'm currently relying on spark.catalog.clearCache(), but I'm not sure it covers everything I need. I'd also like a way to ensure that the tasks for each run are clearly grouped in some fashion for comparison, so I can see where I'm losing time and, ideally, the total memory used by each run as well (this is one of the things I'm currently trying to improve).
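Roughly, what I'm doing between runs today looks like this (sketched in JVM/Groovy syntax; the same calls exist in the Scala and Python APIs, and the app names are just placeholders):

import org.apache.spark.sql.SparkSession

def spark = SparkSession.builder().appName("perf-comparison").getOrCreate()
spark.catalog().clearCache()   // what I currently rely on between runs

// ... run one version of the code ...

// Stopping the session and calling getOrCreate() again would give a fresh context,
// but I'm not sure that's the intended way to keep runs isolated and separately visible.
spark.stop()
spark = SparkSession.builder().appName("perf-comparison-run-2").getOrCreate()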
Is there any way to include the test coverage of Cucumber features and other useful statistics in the SonarQube analysis? I have done a bit of research, but couldn't find a proper plugin.
From this thread (written after the OP's question), David Racadon added:
As far as I understand:
It is not possible to run an analysis on a project containing only test code because the 'sonar.sources' property is mandatory.
Measures on test code are not aggregated at project level.
As far as I am concerned, I consider test files part of the project the same way source files are. Thus, measures of test files should be aggregated on top of source files.
For now, SonarQube shows that your project is 1,000 lines even if you have 0 or 10,000 lines of test code on top of those 1,000 lines of source code. For me, SonarQube gives you a biased estimate of the size of your project and the effort of maintenance.
The closest would then be his plugin racodond/sonar-gherkin-plugin which:
analyzes Cucumber Gherkin feature files and:
Computes metrics: lines of code, number of scenarios, etc.
Checks various guidelines to find out potential bugs and code smells through more than 40 checks
Provides the ability to write your own checks
Parallel builds in Maven 3 are a good thing.
The process uses the dependency graph to evaluate the order of builds, and the docs state the following:
... This goes by declared dependencies in the pom, and there is no good log of how this graph is actually evaluated. (I was hoping to render the actual execution graph, but never got around to finding a cool tool/way to do it - plaintext ascii in the -X log would be one option).
I am wondering if such a rendition exists already and if so, how can it be triggered?
So if we have a dropdb task, a createdb task, a populatedb task, a startserver task, and a runqatest task, and we want to:
1. have independent tasks so I can just call gradle dropdb by itself (or any of the others as well)
2. have runqatest depend on dropdb, createdb, populatedb, startserver
Number 2 above obviously needs to be ordered or it will break, and Gradle does not abide by declaration order the way Ant does. How do we achieve this? I have read plenty about this in this thread:
http://markmail.org/thread/wn6ifkng6k7os4qn#query:+page:1+mid:hxibzgim5yjdxl7q+state:results
though the one user there is wrong about it not being deterministic when you have:
1. e depends on c and d
2. c depends on b and a
3. d depends on a and b
Since e decides that c goes first, the build would run b, a, c, d, and then e, so it is completely deterministic. I do agree that parallelizing a build is much harder if you have ordering the way Ant does, since you can't just run c and d in parallel when order matters (and it's worse because, from a user perspective, the order does not matter most of the time).
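To make the example concrete, here it is as a small build script (the task names a through e are just the placeholders from the list above); running gradle e builds the whole chain:

['a', 'b', 'c', 'd', 'e'].each { n ->
    task(n) { doLast { println n } }
}
e.dependsOn c, d
c.dependsOn b, a
d.dependsOn a, b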
If only they would add a dependsOnOrdered so we could specify ordering when absolutely necessary.
Or does anyone know the way we are supposed to do this? The issue was filed against Gradle in 2009! I still see no documentation in Gradle on how to do ordered tasks when needed.
Dean
Here is one solution:
if (gradle.startParameter.taskNames.contains("qatest")) {
    // wire up the ordering only when qatest was requested on the command line
    qatest.dependsOn startServer
    startServer.dependsOn populatedb
    populatedb.dependsOn createdb
    createdb.dependsOn dropdb
}
The limitation of this approach is that it only works if qatest is part of the initial tasks provided on the command line. Sometimes this is good enough, and you can add a check to make sure that users don't go wrong.
If you need this more often, you can add a little helper method that makes it easier to declare such a workflow. Something like workflow(qatest, dropdb, createdb, populatedb, startserver).
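A sketch of what such a helper could look like (the workflow name and the argument convention, goal first followed by the steps in execution order, are made up here):

def workflow(Task goal, Task... steps) {
    // same trick as above: only wire up the chain when the goal task was requested
    if (!gradle.startParameter.taskNames.contains(goal.name)) {
        return
    }
    for (int i = 1; i < steps.length; i++) {
        steps[i].dependsOn steps[i - 1]
    }
    goal.dependsOn steps[steps.length - 1]
}

workflow(qatest, dropdb, createdb, populatedb, startServer)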
Another approach is to create "clones" of the tasks, and add task dependencies (only) between the clones. Again, you could hide this behind a little abstraction. For example, createWorkflowTask("startServer") { ... } could create and configure both a startServer and a startServerWorkflow task.
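Again just a sketch of the idea (createWorkflowTask and the Workflow suffix are invented names):

def createWorkflowTask(String name, Closure config) {
    def plain = task(name, config)   // the standalone task stays fully independent
    task(name + 'Workflow') {        // the clone exists only for workflow wiring
        dependsOn plain
    }
}

createWorkflowTask('dropdb')      { doLast { println 'dropping db' } }
createWorkflowTask('startServer') { doLast { println 'starting server' } }

// ordering/dependency links go between the clones only
startServerWorkflow.dependsOn dropdbWorkflow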
In summary, the programmability of Gradle makes it possible to overcome the problem that "workflow" isn't yet a first-class concept in Gradle.
Gradle 1.6 added an alternative solution, but it's still incubating: mustRunAfter. See the release notes.
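To sketch how that reads for the tasks above (mustRunAfter only constrains order and doesn't add dependencies, so dropdb and friends stay runnable on their own):

qatest.dependsOn dropdb, createdb, populatedb, startServer

// ordering constraints, applied only when these tasks end up in the same build
createdb.mustRunAfter dropdb
populatedb.mustRunAfter createdb
startServer.mustRunAfter populatedb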
I am working on a project where we have only 13% code coverage with our unit tests. I would like to come up with a plan to improve that, focusing first on the areas where increasing coverage would bring the greatest value.
This project is in C#, we're using VS 2008 and TFS 2008, and our unit tests are written using MSTest.
What methodology should I use to determine which classes we should tackle first?
Which metrics (code or usage) should I be looking at (and how can I get those metrics if this is not obvious)?
I would recommend adding unit tests to all the classes you touch, not retrofitting existing classes.
Most of the advantage of unit testing is in helping programmers code and in ensuring that "fixes" don't actually break anything. For a section of code that you are not adding to and that is hardly ever modified, the benefits of unit tests start to drop off.
You might also want to add unit tests to classes that you rely on if you have nothing better to do.
You absolutely should add tests to new functionality you add, but you should probably also add tests to existing functionality you may break.
If you are doing a big refactor, consider getting 80-100% coverage on that section first.
For some good statistics and deterministic querying of certain methods you could definitely look at NDepend: http://www.ndepend.com/
NDepend exposes a query language called CQL (Code Query Language) that allows you to write queries against your code relating to certain statistics and static analysis.
There is no single true way to determine which classes might benefit the most; however, by setting your own thresholds in CQL you can establish some rules and conventions.
The biggest value of a unit test is for maintenance, to ensure that the code still works after changes.
So, concentrate on methods/classes that are most likely / most frequently changed.
Next in importance are classes/methods with less-than-obvious logic. The unit tests will make them less fragile while serving as extra "documentation" for their contracted API
In general, unit tests are a tool to protect against regression, and regression is most likely to occur in classes with the most dependencies. You should not have to choose; you should test all classes. But if you have to, start with the classes that have the most dependencies.
Arrange all your components into levels. Every class in a given level should only depend on components at a lower level. (A "component" is any logical group of one or more classes.)
Write unit tests for all your level 1 components first. You usually don't need mocking frameworks or other such nonsense because these components only rely on the .NET Framework.
Once level 1 is done, start on level 2. If your level 1 tests are good, you won't need to mock those classes when you write your level 2 tests.
Continue in like fashion, working your way up the application stack.
Tip: Break all your components into level specific DLLs. That way you can ensure that low level components don't accidentally take a dependency on a higher level component.