Spark/Spring validation-api dependencies conflict

I am running a Spring/Spark app and I am facing this problem:
The following method did not exist:
javax.validation.BootstrapConfiguration.getClockProviderClassName()Ljava/lang/String;
The method's class, javax.validation.BootstrapConfiguration, is available from the following locations:
***validation-api-1.1.0.Final.jar!/javax/validation/BootstrapConfiguration.class
***/BOOT-INF/lib/validation-api-2.0.1.Final.jar!/javax/validation/BootstrapConfiguration.class
It was loaded from the following location:
file:/usr/hdp/2.6.3.0-235/spark2/jars/validation-api-1.1.0.Final.jar
How do I make Spark read my dependency first and only then look at the system lib?
I tried specifying it in Oozie and I tried specifying it in spark-submit.
Nothing has worked so far.

I had encountered a similar situation. I finally ended up doing as below, i.e. I copied the required jars to a directory and used the extraClassPath option:
spark-submit --conf spark.driver.extraClassPath="C:\sparkjars\validation-api-2.0.1.Final.jar;C:\sparkjars\gson-2.8.6.jar" myspringbootapp.jar
From the documentation, spark.driver.extraClassPath is "Extra classpath entries to prepend to the classpath of the driver."
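On a Linux/HDP cluster the same idea uses colon-separated entries; if prepending alone is not enough, Spark also has userClassPathFirst switches (marked experimental, and they behave differently in client vs. cluster mode, so check the docs for your Spark version). A rough sketch with placeholder jar paths:
spark-submit \
  --conf spark.driver.extraClassPath=/path/to/validation-api-2.0.1.Final.jar \
  --conf spark.executor.extraClassPath=/path/to/validation-api-2.0.1.Final.jar \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  myspringbootapp.jar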

Related

Shouldn't Oozie/Sqoop jar location be configured during package installation?

I'm using HDP 2.4 on CentOS 6.7.
I have created the cluster with Ambari, so Oozie was installed and configured by Ambari.
I got two errors while running Oozie/Sqoop related to jar file location. The first concerned postgresql-jdbc.jar, since the Sqoop job is incrementally importing from Postgres. I added the postgresql-jdbc.jar file to HDFS and pointed to it in workflow.xml:
<file>/user/hdfs/sqoop/postgresql-jdbc.jar</file>
It solved the problem. But the second error seems to concern kite-data-mapreduce.jar. However, doing the same for this file:
<file>/user/hdfs/sqoop/kite-data-mapreduce.jar</file>
does not seem to solve the problem:
Failing Oozie Launcher, Main class
[org.apache.oozie.action.hadoop.SqoopMain], main() threw exception,
org/kitesdk/data/DatasetNotFoundException
java.lang.NoClassDefFoundError:
org/kitesdk/data/DatasetNotFoundException
It seems strange that this is not automatically configured by Ambari and that we have to copy jar files into HDFS as we start getting errors.
Is this the correct methodology or did I miss some configuration step?
This is happening because of missing jars in the classpath. I would suggest setting the property oozie.use.system.libpath=true in the job.properties file; all the Sqoop-related jars will then be added to the classpath automatically from /user/oozie/share/lib/lib_<timestamp>/sqoop/*.jar. After that, add only the custom jars you need to the lib directory of the workflow application path.
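A minimal job.properties sketch of this setup; the host names and HDFS paths below are placeholders for your environment:
nameNode=hdfs://namenode-host:8020
jobTracker=resourcemanager-host:8050
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/hdfs/sqoop/workflow
# custom jars go into the workflow's lib/ directory, e.g. /user/hdfs/sqoop/workflow/lib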

How to find jar dependencies when running an Apache Pig script?

I am having some difficulty running a simple Pig script that imports data into HBase using HBaseStorage.
The error I have encountered is:
Caused by: <file demo.pig, line 14, column 0> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.pig.backend.hadoop.hbase.HBaseStorage' with arguments '[rdf:predicate rdf:object]'
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.Scan.setCacheBlocks(Z)V
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.initScan(HBaseStorage.java:427)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.<init>(HBaseStorage.java:368)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.<init>(HBaseStorage.java:239)
... 29 more
According to other questions and threads, the main answer to this issue is to register the appropriate jars required for the HBaseStorage references. What I am stumped by is how I am supposed to identify the required jar for a given Pig function.
I even tried opening the various jar files under the hbase and pig folders to make sure the appropriate classes were registered in the Pig script.
For example, since the java.lang.NoSuchMethodError was caused by org.apache.hadoop.hbase.client.Scan.setCacheBlocks(Z)V, I specifically imported the jar that contains org.apache.hadoop.hbase.client.Scan, to no avail.
Pig's documentation does not provide any obvious pointers that I can refer to.
I am using Hadoop 2.7.0, HBase 1.0.1.1, and Pig 0.15.0.
If you need any other clarification, feel free to ask. I would really appreciate it if someone could help me out with this issue.
Also, is it better to install Hadoop and the related software from scratch, or to get one of the available Hadoop bundles?
There is something wrong with the released jar hbase-client-1.0.1.1.jar.
You can test it with this code; the error will show up:
Scan scan = new Scan();
scan.setCacheBlocks(true);
I've tried other setter methods, like setCaching, and they throw the same error, even though those methods exist when I check the source code. Maybe just compile hbase-client-1.0.1.1.jar manually; I'm still looking for a better solution...
Update for the above: I found that the root cause is an incompatibility of hbase-client-1.0.1.1.jar with code compiled against older versions:
https://issues.apache.org/jira/browse/HBASE-10841
https://issues.apache.org/jira/browse/HBASE-10460
The return type of the setter methods changed, so jars compiled against the old version won't work with the current one.
For your question, you can modify the launch script $PIG_HOME/bin/pig and set debug=true; it will then print the run information.
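To address the "how do I identify the required jar" part directly, one common trick is to grep the candidate jars for the missing class. A rough sketch, assuming the JDK jar tool is on the PATH and that HBASE_HOME and PIG_HOME point at your installations:
for f in "$HBASE_HOME"/lib/*.jar "$PIG_HOME"/lib/*.jar; do
  # list each jar's entries and look for the class named in the error
  if jar tf "$f" | grep -q 'org/apache/hadoop/hbase/client/Scan.class'; then
    echo "$f contains org.apache.hadoop.hbase.client.Scan"
  fi
done
In this case the class is present but the method signature differs, which is why registering the jar that contains Scan did not help.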
Did you register the required jars? The most important jars are the HBase, ZooKeeper, and Guava ones.
I solved a similar kind of issue by registering the ZooKeeper jar in my Pig script, roughly as sketched below.
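For reference, registering those jars at the top of the Pig script looks roughly like this; the paths and version numbers are examples only and need to match your installation:
REGISTER /usr/lib/hbase/lib/hbase-client-1.0.1.1.jar;
REGISTER /usr/lib/hbase/lib/hbase-common-1.0.1.1.jar;
REGISTER /usr/lib/zookeeper/zookeeper-3.4.6.jar;
REGISTER /usr/lib/hbase/lib/guava-12.0.1.jar;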

Conflicting jars while using Unirest on CDH

I'm trying to use Unirest to send a POST request from a MapReduce job on a Cloudera Hadoop 5.2.1 cluster.
One of Unirest's dependencies is httpcore-4.3.3.jar. The CDH package includes httpcore-4.2.5.jar in the classpath. While trying to run my code, I got a "ClassNotFound" exception.
I added a line to my code to check where it was getting the conflicting class from, and the answer was troubling: /opt/cloudera/parcels/CDH/jars/httpcore-4.2.5.jar.
I've looked everywhere online and tried everything I found. Needless to say, nothing seems to work.
I tried setting the HADOOP_CLASSPATH environment variable, I tried setting HADOOP_USER_CLASSPATH_FIRST, and I tried using the -libjars parameter of the hadoop jar command.
Anyone have any idea how to solve this?
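For reference, the combination of settings described in the question is usually spelled out like this. This is only a sketch of the attempted approach (jar names, class name, and paths are placeholders); on YARN/MR2 the mapreduce.job.user.classpath.first property is the job-side counterpart worth checking, and -D and -libjars are only parsed when the main class goes through GenericOptionsParser/ToolRunner:
export HADOOP_CLASSPATH=/path/to/httpcore-4.3.3.jar
export HADOOP_USER_CLASSPATH_FIRST=true
hadoop jar my-mr-job.jar com.example.MyJob \
  -D mapreduce.job.user.classpath.first=true \
  -libjars /path/to/httpcore-4.3.3.jar \
  /input /output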

How to use an external Jar file in the Hadoop program

I have a Hadoop program in which I use a couple of external jar files. When I submit the jar file of my program to the Hadoop cluster, it gives me the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: edu/uci/ics/jung/graph/Graph
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.util.RunJar.main(RunJar.java:201)
I understand what the problem is but don't know how to solve it. How can I add the jar files to my program?
I think you can also modify the environment of the job's running task attempts explicitly by specifying the JAVA_LIBRARY_PATH or LD_LIBRARY_PATH variables:
hadoop jar [main class]
-D mapred.child.env="LD_LIBRARY_PATH=/path/to/your/libs" ...
You can use the LIBJARS option when submitting jobs, like this:
export LIBJARS=/path/jar1,/path/jar2
hadoop jar my-example.jar com.example.MyTool -libjars ${LIBJARS} -mytoolopt value
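Note that -libjars (and -D options) are only honored when the main class parses the generic options, typically via ToolRunner. A minimal sketch of such an entry point, reusing the com.example.MyTool name from the command above; the job setup itself is elided:
package com.example;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyTool extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects -libjars and any -D settings;
        // build and submit the MapReduce job here using getConf().
        // "-mytoolopt value" from the example arrives in args.
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options before calling run()
        System.exit(ToolRunner.run(new MyTool(), args));
    }
}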
I would recommend reading this article which describes precisely what you're looking for, in detail:
http://grepalex.com/2013/02/25/hadoop-libjars/
Add the external jar files to the hadoop/lib folder to get rid of the error...

Hadoop WordCount.java Dependency Issues

I am trying to compile the WordCount.java file, located inside /Desktop/Hadoop/playground/src, into a jar.
Here's the command I am using.
javac -classpath hadoop-1.2.1-core.jar -d playground/classes playground/src/WordCount.java
The compiler seems to be getting invoked; however, I am getting tons of errors like this:
error: package org.apache.hadoop.conf does not exist import org.apache.hadoop.conf.Configuration
How do I go about fixing this?
Maybe there is an answer to this issue already; however, I could not fix it.
You need to set the paths of hadoop-1.2.1-core.jar and all the other dependent jars correctly.
Try this exactly while you are in the Desktop/hadoop directory (valid in your case only, based solely on the inputs you provided in the comments):
javac -classpath *:lib/* -d playground/classes playground/src/WordCount.java
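If the compilation succeeds, the classes still have to be packaged into a jar before hadoop jar can run them. A rough sketch of that step; the jar name, main-class name, and HDFS paths are placeholders:
jar -cvf wordcount.jar -C playground/classes/ .
# then run it, substituting your actual main class and paths:
hadoop jar wordcount.jar WordCount /input /output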
