Where should I get the Maven dependencies when migrating a MapReduce project from HDP to Bigtop?

I am migrating a MapReduce Java project (built with Maven) from Hortonworks (HDP) to Apache Bigtop.
I am trying to figure out the best way to ensure that the dependency versions in my Java project match the jar files that Bigtop deploys on the cluster.
We are currently targeting Bigtop 3.2.0.
I am inspecting its BOM file and using those versions in my POM file.
For example, when we were using HDP, I had something like:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.3.2.3.1.4.0-315</version>
</dependency>
According to the Bigtop BOM file, the Spark version is 3.2.3 and the Scala library version is 2.12.13. Does that mean the new Maven dependency in our project's POM file should be:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.2.3</version>
</dependency>
Is there a place where the exact Maven dependencies are listed? Is this the correct way to migrate our project's POM file?
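A minimal sketch of the approach described in the question, assuming the Bigtop 3.2.0 BOM values quoted above (Spark 3.2.3, Scala 2.12.13). The versions live in properties so they can be updated from the BOM in one place, and the Spark jar is marked `provided` because the cluster already ships it. The property names `spark.version` and `scala.binary.version` are illustrative, not taken from the BOM.

```xml
<properties>
  <!-- Copied by hand from the Bigtop 3.2.0 BOM (assumption: kept in sync manually) -->
  <spark.version>3.2.3</spark.version>
  <!-- The artifact suffix uses only the Scala binary version (major.minor of 2.12.13) -->
  <scala.binary.version>2.12</scala.binary.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <!-- The cluster supplies Spark at runtime, so don't bundle it into the job jar -->
    <scope>provided</scope>
  </dependency>
</dependencies>
```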

Related

How to incorporate BIRT 4.9.0 into POM?

Scenario:
I am refactoring my application to work under Java 17. BIRT runtime 3.7.x is embedded in my application.
To update it to BIRT 4.9.0, I changed my POM as follows:
<dependency>
<groupId>org.eclipse.birt</groupId>
<artifactId>birt-runtime</artifactId>
<version>4.9.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.eclipse.birt/birt-runtime-osgi -->
<dependency>
<groupId>org.eclipse.birt</groupId>
<artifactId>birt-runtime-osgi</artifactId>
<version>4.9.0</version>
</dependency>
When I build, I get the error:
org.eclipse.birt:birt-runtime:jar:4.9.0 was not found in https://repo1.maven.org/maven2 during a previous attempt
I have deleted and rebuilt my local .m2 directory.
Yet when I browse the Maven repository, I find the file at https://repo1.maven.org/maven2/org/eclipse/birt/birt-runtime/4.9.0/
A related question (BIRT latest Runtime as one Maven Dependency for Eclipse) was resolved by manually downloading the file and pointing to a local copy. I'd prefer to avoid that, since Maven exists precisely to avoid that kind of scenario.
I suspect there's something in my Maven setup that I am missing.
Thank you in advance.

Not found: org.apache.hadoop.security.authentication.util.KerberosUtil

I am running a Storm jar on a cluster where I have configured Hadoop, Kafka, and Storm.
When I run the jar in local mode it works fine, but when I run it on the Storm cluster, I see the following error in the Storm UI:
java.lang.NoSuchMethodError: org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosTicket(Ljavax/security/auth/Subject;)Z
at org.apache.hadoop.security.UserGroupInformation.<init>(UserGroupInformation.java:666)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:861)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
After some searching, I found that I need to add the hadoop-auth jar, but even after adding it I get the same error.
I think you're packaging an old Hadoop jar.
Take a look at the storm-hdfs POM https://github.com/apache/storm/blob/v1.0.6/external/storm-hdfs/pom.xml. When you use the Shade plugin, the jar you end up with will contain all your dependencies, including transitive ones brought in through direct dependencies. Storm-hdfs declares a dependency on a list of Hadoop jars. You need to make sure that you're declaring the same list of Hadoop jars in your POM if you want to use a different version of Hadoop from the default.
Specifically, what's happening is that you haven't declared hadoop-auth in your POM, so your jar gets packaged with the default version of that jar (2.6.1). Since that version of hadoop-auth is incompatible with the other Hadoop jars (which are 2.9.1), you get an exception at runtime.
You should either exclude all Hadoop jars from your import of storm-hdfs and then put the jars you want to use in Storm's lib directory, or add the right versions of the Hadoop jars to your dependency list in your POM.
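A sketch of the first option, assuming Maven 3.2.1 or later (which accepts `*` as a wildcard in exclusions) and Storm 1.0.6 to match the storm-hdfs POM linked above. With this in place, the Hadoop jars must come from Storm's lib directory on the cluster instead of from your shaded jar.

```xml
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-hdfs</artifactId>
  <version>1.0.6</version>
  <exclusions>
    <!-- Drop every org.apache.hadoop artifact that storm-hdfs pulls in transitively -->
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```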
Edit:
I think I found your issue. You haven't set the scope of storm-core to provided. Since storm-core depends on hadoop-auth and you haven't declared hadoop-auth explicitly, Maven picks its version through dependency mediation: the declaration nearest to your project in the dependency tree wins, and among equally near ones the first declared wins. Since hadoop-auth appears as 2.9.1 through some of your Hadoop dependencies but as 2.6.1 through storm-core, you happen to get 2.6.1 put in your jar.
If you want to avoid this kind of thing in the future, you should use Maven's dependencyManagement block https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Dependency_Management.
That is, add something like the following to your POM, and then remove the exclusions of Hadoop jars.
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-auth</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
</dependency>
</dependencies>
</dependencyManagement>
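The `dependencyManagement` block above references `${hadoop.version}`, which still has to be defined. A sketch combining it with the storm-core scope fix from the edit, assuming the cluster runs Hadoop 2.9.1 (the version mentioned above) and Storm 1.0.6 (matching the linked storm-hdfs POM); the `storm.version` property name is hypothetical.

```xml
<properties>
  <!-- Hadoop version running on the cluster (2.9.1 per the answer above) -->
  <hadoop.version>2.9.1</hadoop.version>
  <!-- Hypothetical property; 1.0.6 matches the linked storm-hdfs POM -->
  <storm.version>1.0.6</storm.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>${storm.version}</version>
    <!-- Storm supplies storm-core at runtime, so keep it out of the shaded jar -->
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-hdfs</artifactId>
    <version>${storm.version}</version>
  </dependency>
</dependencies>
```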

Maven dependency management in IntelliJ

I'm currently building apps for Apache Spark. At runtime, Spark provides many dependencies, which I still need when I test or run the apps locally in the IDE (IntelliJ).
Is it possible to have a different set of dependencies depending on whether I use the 'package' goal or the usual compile/run in IntelliJ?
For instance, this is a required Hadoop dependency:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.6.0</version>
<scope>provided</scope>
</dependency>
But the scope 'provided' does not work when I run it locally in the IDE.
If you want IntelliJ to use its own build process rather than Maven's, it's probably better to add a (global) library to your project dependencies in the IDE.
The IDE certainly won't provide these Spark JARs by default, yet the provided scope tells Maven that something else will supply them at runtime.
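If you would rather stay with Maven, one common workaround is to make the scope itself a property and override it in a profile that is enabled only for local runs. A sketch, assuming the property name `hadoop.scope` and the profile id `local` (both hypothetical), and that IntelliJ re-imports the project after the profile is ticked in its Maven tool window:

```xml
<properties>
  <!-- Hypothetical property: default scope used when packaging for the cluster -->
  <hadoop.scope>provided</hadoop.scope>
</properties>

<profiles>
  <!-- Hypothetical profile: enable it (-Plocal, or tick it in IntelliJ's Maven
       tool window) to put the jar on the local compile and run classpath -->
  <profile>
    <id>local</id>
    <properties>
      <hadoop.scope>compile</hadoop.scope>
    </properties>
  </profile>
</profiles>

<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0</version>
    <scope>${hadoop.scope}</scope>
  </dependency>
</dependencies>
```

A plain `mvn package` keeps the default `provided` scope, while runs with the profile enabled treat the jar as a normal compile dependency.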

Which pom dependency should I use for jar commons-lang.jar

How do I know which version of a POM dependency I should use if the version is not in the jar name? For example, for the jar commons-lang.jar, which version of the POM dependency should I use?
Here are its search results on Maven Central: http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22net.sf.staccatocommons%22%20AND%20a%3A%22commons-lang%22
First, use the one from Apache.
Second, you have two options, the 2.x or 3.x branches; from searching mvnrepository.com:
2.6
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>2.6</version>
</dependency>
3.1
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.1</version>
</dependency>
If you're using Maven, you shouldn't have "just a jar"; you should only know about POM dependencies.
(As of Feb 2014, the 3.x series is up to 3.3.2 and the 2.x series is still at 2.6. Note that you can use both in the same application because they live in different packages: org.apache.commons.lang and org.apache.commons.lang3.)
While the other answers are correct, a very handy way to find the exact match for an unknown jar, when all you have is the jar itself and it has no useful manifest, is to compute a SHA-1 checksum of the jar and then run a checksum search on http://search.maven.org (in the Advanced Search at the bottom) or on your own Nexus repository server that has downloaded the Central Repository index.
By the way, your search on Central was incorrect because it included the wrong groupId. Here is a corrected link:
http://search.maven.org/#search%7Cga%7C1%7C%22commons-lang%22
If you are migrating to Maven and have only a bunch of jars, you can examine the META-INF/MANIFEST.MF files inside those jars.
I've just opened commons-lang.jar and saw the following in its META-INF/MANIFEST.MF:
...
Implementation-Title: Commons Lang
Implementation-Vendor: The Apache Software Foundation
Implementation-Vendor-Id: org.apache
Implementation-Version: 2.4
...
So you can use Implementation-Version as your version in pom.xml:
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>2.4</version>
</dependency>

Choosing dependency version in maven and maven plugin

I have a Maven plugin that uses hsqldb 1.8.0.10. In the plugin's pom.xml, it is declared like this:
<dependency>
<groupId>hsqldb</groupId>
<artifactId>hsqldb</artifactId>
<version>1.8.0.10</version>
</dependency>
But if I run that plugin from another Maven project, and that project uses a newer version of hsqldb (for instance 1.9.0), how can I configure my plugin so that it uses the newer version of hsqldb without changing its pom.xml?
And is it possible to do this the other way around as well? If my other Maven project uses hsqldb 1.7.0 (for instance), can the plugin still use the 1.8.0.10 version specified in its own POM?
I hope someone can answer my question.
Kind regards,
Walle
Your main question is possible, but it might not work properly if the plugin doesn't work with the newer code for any reason.
A plugin declaration can have its own dependencies section. Maven's dependency mediation picks the nearest definition, so a version declared there overrides the one declared in the plugin's own POM. So you can do:
<plugin>
<groupId>some.group.id</groupId>
<artifactId>some.artifact.id</artifactId>
<version>someversion</version>
<dependencies>
<dependency>
<groupId>hsqldb</groupId>
<artifactId>hsqldb</artifactId>
<version>1.9.0</version>
</dependency>
</dependencies>
</plugin>
I don't think going the other way around is possible, though.
Use a property placeholder for the version, say ${hsqldb.version}, and then declare in each project's POM the version you want it to resolve to.
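A sketch of that suggestion, reusing the placeholder plugin coordinates from the answer above; `hsqldb.version` is the property name suggested here, and 1.8.0.10 is just an example value that each project would set for itself.

```xml
<properties>
  <!-- Each project sets the hsqldb version it wants the plugin to use -->
  <hsqldb.version>1.8.0.10</hsqldb.version>
</properties>

<build>
  <plugins>
    <plugin>
      <groupId>some.group.id</groupId>
      <artifactId>some.artifact.id</artifactId>
      <version>someversion</version>
      <dependencies>
        <!-- Overrides the hsqldb version declared in the plugin's own POM -->
        <dependency>
          <groupId>hsqldb</groupId>
          <artifactId>hsqldb</artifactId>
          <version>${hsqldb.version}</version>
        </dependency>
      </dependencies>
    </plugin>
  </plugins>
</build>
```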
