Overwrite Databricks Dependency - Maven

In our project we're using com.typesafe:config in version 1.3.4. According to the latest release notes, this dependency is already provided by Databricks on the cluster, but in a very old version (1.2.1).
How can I overwrite the provided dependency with our own version?
We use Maven; in our dependencies we have:
<dependency>
    <groupId>com.typesafe</groupId>
    <artifactId>config</artifactId>
    <version>1.3.4</version>
</dependency>
Our created jar file should therefore contain the newer version.
I created a Job by uploading the jar file. The Job fails because it can't find a method that was only added after version 1.2.1, so it looks like the library we provide is overridden by the older version on the cluster.
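For illustration, the failure looks roughly like the following sketch; the concrete method is just an example (getDuration(path) without a TimeUnit is, as far as I know, a 1.3.x addition), but any API added after 1.2.1 shows the same symptom:
import com.typesafe.config.{Config, ConfigFactory}

object VersionProbe {
  def main(args: Array[String]): Unit = {
    val config: Config = ConfigFactory.parseString("timeout = 5s")
    // Compiles fine against config 1.3.4, but throws java.lang.NoSuchMethodError at
    // runtime if the cluster-provided 1.2.1 jar wins on the classpath
    // (getDuration(path) is, to my knowledge, only available from 1.3.x on).
    println(config.getDuration("timeout"))
  }
}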

In the end we fixed this by shading the relevant classes, adding the following to our build.sbt:
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.typesafe.config.**" -> "shadedSparkConfigForSpark.@1").inAll
)
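What the rule does, roughly: sbt-assembly relocates the matched classes into the renamed package inside the fat jar and rewrites all bytecode references to them, so the sources keep importing com.typesafe.config while at runtime the cluster's 1.2.1 jar is never touched. A throwaway sanity check along these lines (the object name is just a sketch) can confirm the relocation took effect in the assembled jar:
object ShadingSanityCheck {
  def main(args: Array[String]): Unit = {
    // Throws ClassNotFoundException if the shade rule was not applied to the assembly jar
    val relocated = Class.forName("shadedSparkConfigForSpark.ConfigFactory")
    println(s"Relocated Typesafe config present: ${relocated.getName}")
  }
}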

We solved it in the end by utilizing Spark's ChildFirstURLClassLoader. The project is open source, so you can check out both the method and its usage there.
But for reference, here is the method in its entirety. You need to provide a Seq of the jars that you want to override with your own; in our case it's the Typesafe config jar.
import java.net.{URL, URLClassLoader}
import scala.annotation.tailrec
import org.apache.spark.util.ChildFirstURLClassLoader

// Environment, ConfigurationException and logger are helpers from the surrounding project
def getChildFirstClassLoader(jars: Seq[String]): ChildFirstURLClassLoader = {
  val initialLoader = getClass.getClassLoader.asInstanceOf[URLClassLoader]

  @tailrec
  def collectUrls(clazz: ClassLoader, acc: Map[String, URL]): Map[String, URL] = {
    // add urls on this level to accumulator
    val urlsAcc: Map[String, URL] = acc ++ clazz.asInstanceOf[URLClassLoader].getURLs
      .map(url => (url.getFile.split(Environment.defaultPathSeparator).last, url))
      .filter { case (name, url) => jars.contains(name) }
      .toMap
    // check if any jars without URL are left
    val jarMissing = jars.exists(jar => urlsAcc.get(jar).isEmpty)
    // return accumulated if there is no parent left or no jars are missing anymore
    if (clazz.getParent == null || !jarMissing) urlsAcc else collectUrls(clazz.getParent, urlsAcc)
  }

  // search classpath hierarchy until all jars are found or we have reached the top
  val urlsMap = collectUrls(initialLoader, Map())

  // check if everything was found
  val jarsNotFound = jars.filter(jar => urlsMap.get(jar).isEmpty)
  if (jarsNotFound.nonEmpty) {
    logger.info(s"""available jars are ${initialLoader.getURLs.mkString(", ")} (not including parent classpaths)""")
    throw ConfigurationException(s"""jars ${jarsNotFound.mkString(", ")} not found in parent class loaders classpath. Cannot initialize ChildFirstURLClassLoader.""")
  }

  // create child-first classloader
  new ChildFirstURLClassLoader(urlsMap.values.toArray, initialLoader)
}
As you can see, it also has some logic to abort if the jar files you specified do not exist in the classpath.
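To sketch how the returned loader might be used (the jar file name and entry-point class below are hypothetical placeholders, and getChildFirstClassLoader from above is assumed to be in scope): you pass the file names of the jars to override and then load your real entry point reflectively through the child-first loader, so those jars win over the cluster-provided versions:
object ChildFirstLauncher {
  def main(args: Array[String]): Unit = {
    // File name must match the jar as it appears on the classpath (placeholder here)
    val childFirst = getChildFirstClassLoader(Seq("config-1.3.4.jar"))
    // Load the actual job class through the child-first loader, so classes from the
    // listed jars are resolved before the versions provided by the cluster.
    val jobClass = childFirst.loadClass("com.example.MyJob") // hypothetical class name
    val mainMethod = jobClass.getMethod("main", classOf[Array[String]])
    mainMethod.invoke(null, args)
  }
}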

Databricks supports init scripts (cluster-scoped or global) so that you can install or remove any dependency. The details are at https://docs.databricks.com/clusters/init-scripts.html.
In your init script, you can remove the default jar file located on the Databricks driver/executor classpath under /databricks/jars/ and add the expected version there.

Related

Use one version of java library or another => make code compatible for both

I'm working on a Spring Boot project built with Gradle and the main language is Kotlin.
In this project, there is one imported library (developed in Java) which, depending on the version I use, has 5 or 6 parameters in the constructor of a specific class I use.
For now, I switch between the versions manually by changing the version number in the build.gradle.kts file, so my question is: regardless of the version I use, how could my code work for all the versions?
So, basically,
library-version1.jar => Class(6 parameters)
library-version2.jar => Class(5 parameters)
project with library-version1.jar or library-version2.jar imported => universal code to create instance of Class
P.S.: I should add that I have to use both versions of the library.
I found the solution to my initial question:
val yourVariable = YourClass::class.java
val constructor = yourVariable.constructors[0] // The class I use has only one constructor
val implementationVersion = yourVariable.`package`.implementationVersion
if (implementationVersion < "a specific version number") {
    constructor.newInstance(6 parameters) as YourClass
} else {
    constructor.newInstance(5 parameters) as YourClass
}
But now I have a follow-up question that you can find in the comments (I can still post it here though):
In my build.gradle.kts file, I have this line in my dependencies :
dependencies {
    implementation("myLibrary:1.0")
    ...
}
Obviously, I don't want to switch between myLibrary v1 and v2 manually. Just by changing the build.gradle.kts file, would it be possible to have:
build.gradle.kts -> appV1 (if myLibrary v1 used)
-> appV2 (if myLibrary v2 used)
?

Gradle: Set context root of web application

I have 2 Gradle projects, A is the root, and B is a subproject of A.
Project B generates a .war file, which is included in the .ear file, that project A generates.
I'd like to implement a general solution, where I can change the context root of project B.
Based on my research, I should call the ear.deploymentDescriptors.webModule(path, contextRoot) method, where path is the path of the artifact B in the ear.
How can I get the name of the artifact of B from project A, so that I have something to call the above mentioned method?
Is there a better way to set the context root?
Assuming project A has a build.gradle, the code below inside the ear block can solve this:
plugins {
    id 'ear'
}
dependencies {
    deploy project(path: ':b', configuration: 'archives')
}
ear {
    /* some basic configuration */
    libDirName 'APP-INF/lib'
    deploymentDescriptor {
        /* Some basic attributes */
        fileName = "application.xml"
        version = "8"
        Set<Project> subProj = project.getSubprojects()
        subProj.each { proj ->
            if (proj.name.contains("B")) {
                webModule(proj.name + "-" + proj.version + ".war", "/" + proj.name)
            }
        }
    }
}

Reading pom version in a pipeline job if pom is not in project root

I have tried using readMavenPom, as shown below, to get the pom version, and so far this has been working very well because the pom.xml file has been present in the root of the project directory.
pom = readMavenPom file: 'pom.xml'
For some of our projects, the pom.xml won't be available in the root of the project repository; instead it will be inside the parent module, so in that case we modify the Groovy script like the following.
pom = readMavenPom file: 'mtree-master/pom.xml'
There are only two possibilities: either the pom.xml file will be present in the root, or it will be inside the parent module. So is there a way to avoid this customization that we make every time?
I'd check if the file exists in a specific location with fileExists:
def pomPath = 'pom.xml'
if (!fileExists(pomPath)) {
    pomPath = 'mtree-master/pom.xml'
}
pom = readMavenPom file: pomPath
Bonus: Check multiple paths
def pomPaths = ['pom.xml', 'mtree-master/pom.xml']
def pomPath = ''
for (def path : pomPaths) {
    if (fileExists(path)) {
        pomPath = path
        break
    }
}
// check that pomPath is not empty and carry on

Reading includes from idl file in custom task

I want to make my Gradle build intelligent when building my model.
To achieve this, I was planning to read the schema files, work out what is included, and then build the included models first (if they are not present).
I'm pretty new to Groovy and Gradle, so please take that into account.
What I have:
A build.gradle file in the root directory, with n subdirectories (subprojects added to settings.gradle). I have only one Gradle build file, because I defined tasks like:
subprojects {
    task init
    task includeDependencies(type: checkDependencies)
    task build
    task dist
    (...)
}
I will return to checkDependencies shortly.
Schema files located externally, which I can see.
Each of them has from 0 to 3 lines that describe dependencies and look like this:
#include "ModelDir/ModelName.idl"
In my build.gradle I created a task that should open the file, read those dependencies, and preferably return them:
class parsingIDL extends DefaultTask {
    String idlFileName = "*def file name*"
    def regex = ~/#include .*\/(\w*).idl/

    @TaskAction
    def checkDependencies() {
        File idlFile = new File(idlFileName)
        if (!idlFile.exists()) {
            logger.error("File not found")
        } else {
            idlFile.eachLine { line ->
                def dep = []
                def matcher = regex.matcher(line)
                (...)*
            }
        }
    }
}
What should I have in (...)* to find all dependencies, and how should I define that, for example,
subprojectA::build.dependsOn([subprojectB::dist, subprojectC::dist])?
All I could find on the internet created dep, which produced output like this:
[]
[]
[modelName]
[]
[]
(...)

Gradle with Eclipse - incomplete .classpath when multiple sourcesets

I have a Gradle build script with a handful of source sets that all have various dependencies defined (some common, some not), and I'm trying to use the Eclipse plugin to let Gradle generate .project and .classpath files for Eclipse. However, I can't figure out how to get all the dependency entries into .classpath; for some reason, only a few of the external dependencies are actually added to .classpath, and as a result the Eclipse build fails with 1400 errors (building with Gradle works fine).
I've defined my source sets like so:
sourceSets {
    setOne
    setTwo {
        compileClasspath += setOne.runtimeClasspath
    }
    test {
        compileClasspath += setOne.runtimeClasspath
        compileClasspath += setTwo.runtimeClasspath
    }
}
dependencies {
    setOne 'external:dependency:1.0'
    setTwo 'other:dependency:2.0'
}
Since I'm not using the main source-set, I thought this might have something to do with it, so I added
sourceSets.each { ss ->
    sourceSets.main {
        compileClasspath += ss.runtimeClasspath
    }
}
but that didn't help.
I haven't been able to figure out any common properties of the libraries that are included, or of those that are not; I can't find anything I'm sure of (although of course there has to be something). I have a feeling that all the included libraries are dependencies of the test source set, either directly or indirectly, but I haven't been able to verify that beyond noting that all of test's dependencies are there.
How do I ensure that the dependencies of all source-sets are put in .classpath?
This was solved in a way that was closely related to a similar question I asked yesterday:
// Create a list of all the configuration names for my source sets
def ssConfigNames = sourceSets.findAll { ss -> ss.name != "main" }.collect { ss -> "${ss.name}Compile".toString() }
// Find configurations matching those of my source sets
configurations.findAll { conf -> conf.name in ssConfigNames }.each { conf ->
    // Add matching configurations to the Eclipse classpath
    eclipse.classpath {
        plusConfigurations += conf
    }
}
Update:
I also asked the same question in the Gradle forums, and got an even better solution:
eclipseClasspath.plusConfigurations = configurations.findAll { it.name.endsWith("Runtime") }
It is not as precise, in that it adds other stuff than just the things from my source sets, but it guarantees that it will work. And it's much easier on the eyes =)
I agree with Tomas Lycken that it is better to use the second option, but it might need a small correction:
eclipse.classpath.plusConfigurations = configurations.findAll { it.name.endsWith("Runtime") }
This is what worked for me with Gradle 2.2.1:
eclipse.classpath.plusConfigurations = [configurations.compile]
