How can I see Maven's parallel build graph?

Parallel builds in Maven 3 are a good thing.
The process uses the dependency graph to evaluate the order of builds, and the docs state the following:
... This goes by the declared dependencies in the POM, and there is no good log of how this graph is actually evaluated. (I was hoping to render the actual execution graph, but never got around to finding a cool tool/way to do it; plain-text ASCII in the -X log would be one option.)
I am wondering if such a rendition exists already and if so, how can it be triggered?


How to find out what implicit(s) are used in my Scala code

Problem statement:
I read in multiple sources/articles that implicits drive up Scala compilation time
I want to remove/reduce them to the minimum possible to see what compilation time looks like without them (the codebase is around 1000 files of various complexity, based on scalaz & akka & slick)
I don't really know what kind of static analysis I can perform. Any links/references to already existing tooling highly appreciated.
It is true that implicits can degrade compilation speed, especially for code that uses them for type-level computations. It is definitely worth measuring their impact. Unfortunately it can be difficult to track down the culprits. There are tools that can help though:
Run scalac with -Ystatistics:typer to see how many tree nodes are processed during type-checking. E.g. you can check the number of ApplyToImplicitArgs and ApplyImplicitView relative to the total (and maybe compare this to another code base).
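For example, in an sbt build (flag syntax as in recent Scala 2 compilers; adjust for your version):
// build.sbt: have the compiler print typer statistics after compilation
scalacOptions += "-Ystatistics:typer"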
There is currently an effort by the Scala Center to improve the status quo hosted at scalacenter/scalac-profiling. It includes an sbt plugin that should be able to give you an idea about implicit search times, but it's still in its infancy (not published yet at the time of writing). I haven't tested it myself but you can still give it a try.
You can also compile with -Xlog-implicits, pipe the output to a file and analyse the logs. It will show a message for each implicit candidate that was considered but failed, complete with source position, search type and reason for the failure. Such failed searches are expensive. You can write a simple script with your favourite scripting language (why not Scala?) to summarize the data and even plot it with some nice graphics.
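As a starting point, here is a minimal sketch of such a script in Scala (it assumes each message begins with a path:line: source position, which can vary between compiler versions):
import scala.io.Source

// Tally failed implicit searches per source position in an -Xlog-implicits log
// and print the worst offenders first.
object ImplicitLogSummary extends App {
  val pos = """^(\S+\.scala:\d+):.*""".r
  val counts = Source.fromFile(args(0)).getLines()
    .collect { case pos(p) => p }   // keep only lines that carry a position
    .toList
    .groupBy(identity)
    .map { case (p, hits) => p -> hits.size }
    .toList
    .sortBy(-_._2)                  // most failures first
  counts.take(20).foreach { case (p, n) => println(f"$n%6d  $p") }
}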
Aside: How to resolve a specific implicit instance?
Just use reify and good ol' println debugging:
import scala.collection.SortedSet
import scala.reflect.runtime.universe._
println(showCode(reify { SortedSet(1,2,3) }.tree))
// => SortedSet.apply(1, 2, 3)(Ordering.Int)
There is now scalac profiling work being done by the Scala Center: https://scalacenter.github.io/scalac-profiling/

TeamCity breaks build based on code coverage

That's basically the idea. I own a project and I want to break any new build on TeamCity based on a code coverage percentage. Put simply: this percentage can never go down. This way I ensure that new commits are covered.
TeamCity provides this out of the box. Simply go to the configuration for the project and click 'Failure Conditions'. This gives you a place where you can add a failure condition on a metric change. One of the available metric changes is 'Percentage of line coverage'. You can set it so that the build fails if this is less than 0 difference from the last build.
Beware adding this, though, especially if you have projects where the code coverage is not already 100%: a refactoring that reduces the number of lines in the project, where all of those lines happen to be covered by tests, will result in the overall coverage going down, and a failing build despite not adding any new functionality.
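If you manage your TeamCity settings through the Kotlin DSL instead of the UI, the same condition looks roughly like this (a sketch from memory; the exact enum and builder names may differ between TeamCity versions):
// fail when line coverage drops compared to the last successful build
failureConditions {
    failOnMetricChange {
        metric = BuildFailureOnMetric.MetricType.COVERAGE_LINE_PERCENTAGE
        units = BuildFailureOnMetric.MetricUnit.DEFAULT_UNIT
        comparison = BuildFailureOnMetric.MetricComparison.LESS
        compareTo = build { buildRule = lastSuccessful() }
    }
}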

Processing gcov data files for tracing purposes

I'm trying to create a tool similar to TraceGL, but for C-type languages:
That tool highlights code flows that were not executed in red.
In terms of building this tool for Objective-C, for example, I know that gcov (and libprofile_rt in clang) output data files that can help determine how many times a given line of code has been executed. However, would the gcov data files be able to tell me when a given line of code occurred during a program's execution?
For example, if line X is called during code paths A and B, would I be able to ascertain from the gcov data that code paths A and B called line X, given line X alone?
As far as I know, GCOV instrumentation data only tells you that some point in the code was executed (and perhaps how many times). There is no relationship recorded between the instrumented code points.
It sounds like what you want is to determine paths through the code. To do that, you either need to do static analysis of the code (requiring a full C parser, name resolver, and flow analyzer), or you need to couple the dynamic instrumentation points together in execution order.
The first requires you to find machinery capable of processing C in all of its glory; you don't want to build that yourself. GCC, Clang, and our DMS Toolkit are choices. I know that GCC and Clang do pretty serious analysis; I'm pretty sure you could find at least intraprocedural control-flow analysis, and I know that DMS can do this. You'd have to customize GCC or Clang to extract this data; you'd have to configure DMS to extract it. Configuration is easier than customization because it is a design property rather than a "custom" action. YMMV.
Then, using the GCOV data, you could determine the flows between the GCOV data points. It isn't clear to me that this buys you anything beyond what you already get with just the static control flow analysis, unless your goal is to exhibit execution traces.
To do this dynamically, what you could do is force each data collection point in the instrumented code to note that it is the most recent point encountered; before doing that, it would record which point was the most recent before it. This produces, in effect, a chain of references between points which matches the control flow. This has two problems from your point of view, I think: (a) you'd have to modify GCOV or some other tool to insert this different kind of instrumentation; (b) you have to worry about what and how you record "predecessors" when a data collection point gets hit more than once.
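To make the chaining idea concrete, here is a toy, hand-instrumented version in C (this is not what GCOV emits; it only illustrates recording predecessor points as edges):
#include <stdio.h>

#define MAX_POINTS 8

static int last_point = -1;                     /* most recent probe hit */
static long edge_count[MAX_POINTS][MAX_POINTS]; /* predecessor -> successor hits */

static void probe(int point) {
    if (last_point >= 0)
        edge_count[last_point][point]++;  /* counting edges also handles points
                                             that are hit more than once */
    last_point = point;
}

int main(void) {
    probe(0);                        /* entry block */
    for (int i = 0; i < 3; i++)
        probe(1);                    /* loop body block */
    probe(2);                        /* exit block */

    for (int a = 0; a < MAX_POINTS; a++)
        for (int b = 0; b < MAX_POINTS; b++)
            if (edge_count[a][b])
                printf("point %d -> point %d: %ld times\n", a, b, edge_count[a][b]);
    return 0;
}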
gcov (or lcov) is one option. It produces most of what you are looking for, though how often those files are updated depends on how often __gcov_flush() is called. It's not really intended to be real-time, and it does not capture the 'when' you are after. There is a short summary of the gcov data format here and in the header file here; lcov data is described here.
For what you are looking for, DTrace should be able to provide all of the information you need, and in real time. For Objective-C on Apple platforms there are dtrace probes for the runtime which allow you to trace pretty much anything. There are a number of useful guides and examples out there for learning about dtrace and how to write scripts; Brendan Gregg provides some really great examples, and Big Nerd Ranch has done a series of articles on it.
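As a tiny taste of what that looks like, here is a one-liner sketch using the Objective-C provider (attach by pid; probe availability varies by OS version and SIP settings):
# count Objective-C method entries per class in a running process
sudo dtrace -n 'objc$target:::entry { @calls[probemod] = count(); }' -p 1234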

Gradle has no task ordering, so how to achieve this?

So if we have a dropdb task and a createdb task and a startserver task and a runqatest task, and we want to
1. have independent tasks, so I can just call gradle dropdb by itself (or any of the others as well)
2. have runqatest depend on dropdb, createdb, populatedb, startserver
Number 2 above obviously needs to run in a specific order or it will break, and Gradle does not abide by declaration order the way Ant does. How can this be achieved? I have read plenty about this in this post:
http://markmail.org/thread/wn6ifkng6k7os4qn#query:+page:1+mid:hxibzgim5yjdxl7q+state:results
though the one user there is wrong about it not being deterministic when you have
1. e depends on c and d
2. c depends on b, a
3. d depends on a, b
since e decides c will run first, the build would run b, a, c, d, so it is completely deterministic. I do agree that parallelizing a build is much harder when you have ordering like Ant does, as you can't just run c and d in parallel when order matters (and it's worse because, from a user's perspective, order does not matter most of the time).
If only they would add a dependsOnOrdered so we can do order when absolutely necessary.
OR does anyone know the way we are supposed to do this? The issue was filed against Gradle in 2009!!!! I still see no documentation in Gradle on how to do ordered tasks when needed.
Dean
Here is one solution:
if (gradle.startParameter.taskNames.contains("qatest")) {
    qatest.dependsOn startServer
    startServer.dependsOn populatedb
    populatedb.dependsOn createdb
    createdb.dependsOn dropdb
}
The limitation of this approach is that it only works if qatest is part of the initial tasks provided on the command line. Sometimes this is good enough, and you can add a check to make sure that users don't go wrong.
If you need this more often, you can add a little helper method that makes it easier to declare such a workflow. Something like workflow(qatest, dropdb, createdb, populatedb, startserver).
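A minimal sketch of such a helper (the workflow name and signature are hypothetical), matching the call above:
def workflow(Task target, Task... steps) {
    // chain the steps in the given order, then hang the target off the last step
    steps.eachWithIndex { step, i ->
        if (i > 0) step.dependsOn steps[i - 1]
    }
    if (steps) target.dependsOn steps[-1]
}
workflow(qatest, dropdb, createdb, populatedb, startServer)
Like the conditional above, this mutates the real tasks' dependencies, so it carries the same caveats.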
Another approach is to create "clones" of the tasks, and add task dependencies (only) between the clones. Again, you could hide this behind a little abstraction. For example, createWorkflowTask("startServer") { ... } could create and configure both a startServer and a startServerWorkflow task.
In summary, the programmability of Gradle makes it possible to overcome the problem that "workflow" isn't yet a first-class concept in Gradle.
Gradle 1.6 added an alternative solution, but it's still incubating: mustRunAfter. See the release notes.
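For example (Groovy DSL; the ordering rule only applies when both tasks are actually scheduled, so gradle dropdb on its own still works):
createdb.mustRunAfter dropdb
populatedb.mustRunAfter createdb
startServer.mustRunAfter populatedb
qatest.dependsOn dropdb, createdb, populatedb, startServer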

How to loop a Maven execution?

On a Maven project, in the process-test-resources phase I set up the database schemas with sql-maven-plugin. On this project there are N database shards, which I set up with N repeated execution blocks that have exactly the same content bar the database name. Everything works as expected.
The problem here is that with a growing number of shards, the number of similar blocks grows too, which is cumbersome and makes maintenance annoying (since, by definition, all of those databases are literally the same). I would like to be able to define a "list" of database names and let sql-maven-plugin run once for each, without having to define the whole block many times.
I'm not looking for changes in the test setup, as I positively want to set up as many shards as needed in the test environment. I solely need some "Maven sugar" to conveniently define the values over which the executions should "loop".
I understand that Maven does not support iteration by itself and am looking for alternatives or ideas on how to better achieve this. Things that come to my mind are:
Using/writing a "loop" plugin that manages the multiple parameterized executions
Extending sql-maven-plugin to support my use case
???
Does anyone have a better/cleaner solution?
Thanks in advance.
In this case I would recommend using the maven-antrun-plugin to handle this situation, but of course it is also possible to implement a dedicated Maven plugin for this kind of purpose.
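For illustration, a rough sketch of the antrun approach (assuming MySQL, ant-contrib's <for> task, and a placeholder schema.sql path; shard names and credentials are placeholders for your setup):
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <phase>process-test-resources</phase>
      <goals><goal>run</goal></goals>
      <configuration>
        <target>
          <!-- ant-contrib provides the <for> loop -->
          <taskdef resource="net/sf/antcontrib/antlib.xml" classpathref="maven.plugin.classpath"/>
          <for list="shard_01,shard_02,shard_03" param="db">
            <sequential>
              <!-- run the same schema script once per shard database -->
              <sql driver="com.mysql.jdbc.Driver"
                   url="jdbc:mysql://localhost/@{db}"
                   userid="test" password="test"
                   src="${project.basedir}/src/test/sql/schema.sql"/>
            </sequential>
          </for>
        </target>
      </configuration>
    </execution>
  </executions>
  <dependencies>
    <!-- ant-contrib for <for>; your JDBC driver must be added here as well -->
    <dependency>
      <groupId>ant-contrib</groupId>
      <artifactId>ant-contrib</artifactId>
      <version>1.0b3</version>
      <exclusions>
        <exclusion><groupId>ant</groupId><artifactId>ant</artifactId></exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</plugin>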
