Edit YARN's classpath in Oozie - hadoop

I am trying to run a hadoop job through Oozie. The job uploads data to DynamoDB in AWS. As such, I use AmazonDynamoDBClient. I get the following exception in reducers:
2016-06-14 10:30:52,997 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:458)
at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:379)
at com.amazonaws.util.json.Jackson.<clinit>(Jackson.java:32)
at com.amazonaws.internal.config.InternalConfig.loadfrom(InternalConfig.java:233)
at com.amazonaws.internal.config.InternalConfig.load(InternalConfig.java:251)
at com.amazonaws.internal.config.InternalConfig$Factory.<clinit>(InternalConfig.java:308)
at com.amazonaws.util.VersionInfoUtils.userAgent(VersionInfoUtils.java:139)
at com.amazonaws.util.VersionInfoUtils.initializeUserAgent(VersionInfoUtils.java:134)
at com.amazonaws.util.VersionInfoUtils.getUserAgent(VersionInfoUtils.java:95)
at com.amazonaws.ClientConfiguration.<clinit>(ClientConfiguration.java:42)
at com.amazonaws.PredefinedClientConfigurations.dynamoDefault(PredefinedClientConfigurations.java:38)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.<init>(AmazonDynamoDBClient.java:292)
at com.mypackage.UploadDataToDynamoDBMR$DataUploaderReducer.setup(UploadDataToDynamoDBMR.java:396)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
I use a fat jar that packages all dependencies, and I copied that jar to Oozie's lib directory.
I have also used dependencyManagement in the POM to pin the FasterXML Jackson dependency to 2.4.1 (the version used by the AWS DynamoDB SDK). However, when execution happens in the reducers, some other version of Jackson apparently ends up first on the classpath (or so I believe).
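For reference, a minimal sketch of what such a pin typically looks like, assuming the three core Jackson artifacts (jackson-core, jackson-databind, jackson-annotations) are the ones that need aligning:
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>2.4.1</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.4.1</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-annotations</artifactId>
            <version>2.4.1</version>
        </dependency>
    </dependencies>
</dependencyManagement>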
I also excluded the Jackson dependency from the DynamoDB and AWS SDK artifacts:
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-dynamodb</artifactId>
    <version>1.10.11</version>
    <exclusions>
        <exclusion>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-core</artifactId>
    <version>1.10.11</version>
    <exclusions>
        <exclusion>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
How can I make sure that my jar is the first one on the classpath in mappers and reducers? I tried the suggestion on this page and added the following property to the job's configuration xml:
<property>
    <name>oozie.launcher.mapreduce.user.classpath.first</name>
    <value>true</value>
</property>
But this did not help.
Any suggestions?
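One detail worth noting: the oozie.launcher.* prefix only configures the Oozie launcher job, not the MapReduce job it submits. A minimal sketch of a configuration that requests user-classpath precedence for both the launcher and the child job, assuming a Hadoop 2.x cluster where the child-job property is mapreduce.job.user.classpath.first:
<property>
    <name>oozie.launcher.mapreduce.user.classpath.first</name>
    <value>true</value>
</property>
<property>
    <name>mapreduce.job.user.classpath.first</name>
    <value>true</value>
</property>
Whether this takes effect depends on how the job is submitted (map-reduce action vs. java action), so treat it as something to try rather than a guaranteed fix.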

Have you copied your jar into the lib folder next to the workflow.xml, or into the sharelib?
Check which version of Jackson your Hadoop distribution is using and try to use that version of Jackson everywhere. It might also be worth checking that no other Jackson jars are on the classpath.
From the exception it looks like Hadoop tries to call the method:
com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering
This method was introduced in Jackson 2.3, so an older (pre-2.3) version of Jackson is probably on the classpath somewhere.
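If aligning the Jackson versions turns out to be impractical (the cluster's own Jackson will still be on the task classpath), another common workaround is to relocate Jackson inside the fat jar with the maven-shade-plugin so the job code uses its own, renamed copy. A rough sketch; the plugin version and the shadedPattern prefix are placeholders:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.1</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <relocations>
                    <relocation>
                        <pattern>com.fasterxml.jackson</pattern>
                        <shadedPattern>shaded.com.fasterxml.jackson</shadedPattern>
                    </relocation>
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>
Because the shade plugin also rewrites the bytecode of the classes it packages, the AWS SDK classes in the fat jar end up referencing the relocated Jackson packages, so whatever Jackson version Hadoop puts first on the classpath no longer matters.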

Related

Databricks local test fail with java.lang.NoSuchMethodError: org.apache.hadoop.security.HadoopKerberosName.setRuleMechanism

I have a unit test for Databricks code, and I want to run it locally on Windows. Unfortunately, when I run pytest from PyCharm, it throws the following exception:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.security.HadoopKerberosName.setRuleMechanism(Ljava/lang/String;)V
at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:84)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:575)
at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2747)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2747)
at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:79)
at org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:368)
at org.apache.spark.deploy.SparkSubmit.secMgr$1(SparkSubmit.scala:368)
at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$8(SparkSubmit.scala:376)
at scala.Option.map(Option.scala:230)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:376)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
From the source code, the error comes from the SparkSession initialization:
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .master("local[2]") \
    .appName("Helper Functions Unit Testing") \
    .getOrCreate()
I searched for this error and most of the results are related to Maven configuration, adding a dependency on hadoop-auth. However, for pyspark I don't know how to deal with it. Does anyone have experience with or insight into this error?
My workaround here was to move to Python 3.7 and change the pyspark version to 3.0, and then it seems OK. So it is related to an inconsistent environment and dependencies.
This is limited to my case; from my search on the web, most fixes are about adding the hadoop-auth dependency in Maven for the Hadoop configuration.
I encountered this error in a Maven project written in Scala, not Python. What did it for me was adding not only the hadoop-auth dependency, as the OP mentioned, but also the hadoop-common dependency in my POM file, like so:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>3.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-auth</artifactId>
    <version>3.1.2</version>
</dependency>
Replace 3.1.2 with whatever version you're using. However, I also found that I had to track down the other dependencies that conflicted with hadoop-common and hadoop-auth and add exclusions to them, like so:
<exclusions>
    <exclusion>
        <artifactId>hadoop-common</artifactId>
        <groupId>org.apache.hadoop</groupId>
    </exclusion>
    <exclusion>
        <artifactId>hadoop-auth</artifactId>
        <groupId>org.apache.hadoop</groupId>
    </exclusion>
</exclusions>
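For completeness, these exclusions sit inside whichever dependency drags in the conflicting Hadoop jars; mvn dependency:tree is the usual way to find out which one that is. A hypothetical example, assuming a Spark artifact is the one pulling them in:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.0.1</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-auth</artifactId>
        </exclusion>
    </exclusions>
</dependency>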

Hive crashing with java.lang.IncompatibleClassChangeError

Running hive 3.1.1 against Hadoop 3.2.0 crashes when running 'select * from employee' with
java.lang.IncompatibleClassChangeError: Class com.google.common.collect.ImmutableSortedMap does not implement the requested interface java.util.NavigableMap
Commands like 'show tables' all run fine and data is loaded OK from the CLI as well.
I checked various other commands and, for example, data loads fine. The setup uses MySQL as the metastore with mysql-connector-java-5.1.47.jar. The only other observation is that sometimes I get
WARN DataNucleus.MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored
which other people seem to get as well and which does not seem to be the cause here.
Has anybody seen this as well? Help greatly appreciated ...
2019-04-02 16:24:41,643 INFO metastore.HiveMetaStore: 0: Done cleaning up thread local RawStore
2019-04-02 16:24:41,645 INFO HiveMetaStore.audit: ugi=fdai0145 ip=unknown-ip-addr cmd=Done cleaning up thread local RawStore
Exception in thread "main" java.lang.IncompatibleClassChangeError: Class com.google.common.collect.ImmutableSortedMap does not implement the requested interface java.util.NavigableMap
at org.apache.calcite.schema.Schemas.gatherLattices(Schemas.java:498)
at org.apache.calcite.schema.Schemas.getLatticeEntries(Schemas.java:492)
at org.apache.calcite.jdbc.CalciteConnectionImpl.init(CalciteConnectionImpl.java:153)
at org.apache.calcite.jdbc.Driver$1.onConnectionInit(Driver.java:109)
at org.apache.calcite.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:139)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:150)
at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1414)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1430)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:450)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12161)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Perhaps it's a late answer, but I ran into the same issue. In my case, I found that the hive-exec Maven artifact's jar file shades the Google collections framework. Since other Hadoop/Hive artifacts also use Google Guava (version 11, if I'm not mistaken), there's a good chance that Calcite finds the wrong class definition for ImmutableSortedMap (the one from Guava 11).
For me, excluding Guava from the Hadoop/Hive artifacts that my code uses made Calcite find the correct class version from the Google collections framework:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-minicluster</artifactId>
    <version>${hadoop.version}</version>
    <exclusions>
        <exclusion>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>${hive.version}</version>
    <exclusions>
        <exclusion>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>${hive.version}</version>
    <exclusions>
        <exclusion>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
        </exclusion>
    </exclusions>
</dependency>
This is probably an issue that should be reported to the Hive project, since these kinds of class path collision errors are hard to diagnose. Internally shaded artifacts should have the project's own package prefix to indicate explicit shading of the external code in question.
Oh well. Hope this helps.
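If the clash comes from two different external Guava jars (rather than from the copy shaded inside hive-exec), another option is to force a single Guava version across the build with dependencyManagement. A sketch only; the version to pin is whatever your Hive/Calcite/Hadoop combination actually expects:
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>19.0</version>
        </dependency>
    </dependencies>
</dependencyManagement>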

Spring boot and apache spark - container conflict

I am trying to use Spring Boot 1.1.5 and Apache Spark 1.0.2 together in a project. Apache Spark appears to use a Jetty container internally, and I have configured Spring Boot to use a Tomcat container. However, application startup fails with a SecurityException as the root cause. The full stack trace shows Spring Boot trying to initialize a "jettyEmbeddedServletContainerFactory", which it shouldn't in the first place. It probably picks it up from the classpath because Jetty is present via Spark. If I exclude Jetty from Spark and run again, I don't see the same error, but then SparkContext initialization fails because Jetty cannot be found. How do I tell the Spring Boot runtime to look for the "TomcatEmbeddedServletContainerFactory" instead of the Jetty one?
I got "java.lang.SecurityException: class "javax.servlet.http.HttpSessionIdListener"'s signer information does not match signer information of other classes in the same package"
To fix this issue I needed to remove all the javax.servlet dependencies:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.3.1</version>
    <exclusions>
        <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>javax.servlet-api</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.glassfish</groupId>
            <artifactId>javax.servlet</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.eclipse.jetty.orbit</groupId>
            <artifactId>javax.servlet</artifactId>
        </exclusion>
    </exclusions>
</dependency>
@Joakim Erdfelt, thanks.
I was just waiting to see if someone was familiar with this situation and whether it's just a small configuration change. As it turns out, it is!
@Configuration
@EnableAutoConfiguration(exclude={EmbeddedServletContainerFactory.class})
public class MyConfiguration { }
I defined my own "EmbeddedServletContainerFactory" bean as an org.springframework.boot.context.embedded.tomcat.TomcatEmbeddedServletContainerFactory and it started working as I expected.

Glassfish incremental deployment fails when including Selenium

I have a Java EE project which is meant to run on Glassfish 4.1. I want to use Selenium to collect information from some web pages, i.e. I need to include Selenium in the deployment (not just for tests).
I am using the Eclipse IDE and have previously used Eclipse's incremental deployment to automatically deploy all saved changes to the project. But when I included the Selenium dependencies (via Maven), incremental deployment stopped working. The project can still be deployed to Glassfish, but I have to restart Glassfish between every change. I get the following error in Eclipse:
Exception while loading the app : java.lang.IllegalStateException: ContainerBase.addChild: start: org.apache.catalina.LifecycleException: java.lang.RuntimeException: com.sun.faces.config.ConfigurationException: java.util.concurrent.ExecutionException: com.sun.faces.config.ConfigurationException: Unable to parse document 'bundle://136.0:1/com/sun/faces/jsf-ri-runtime.xml': DTD factory class org.apache.xerces.impl.dv.dtd.DTDDVFactoryImpl does not extend from DTDDVFactory.. Please see server.log for more details.
org.apache.xerces.impl.dv.dtd.DTDDVFactoryImpl is included with Selenium as a transitive dependency (xerces:xercesImpl:2.11.0).
Here are my Maven dependencies:
<dependency>
    <groupId>org.jboss.arquillian.selenium</groupId>
    <artifactId>selenium-bom</artifactId>
    <version>2.44.0</version>
    <type>pom</type>
    <scope>import</scope>
</dependency>
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-htmlunit-driver</artifactId>
</dependency>
I hope there is a solution to this, but after reading Jens Schauder's response to "Dealing with Xerces hell in Java/Maven?", I'm afraid there might not be. Anyone?
I currently can't reproduce the issue with a simple project. Did you make sure that you don't have any other dependencies that pull in another version of xercesImpl?
You can try placing xercesImpl-2.11.0.jar and its transitive dependency xml-apis-1.4.01.jar in the lib folder of your Glassfish domain and excluding them from your dependencies like this:
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-htmlunit-driver</artifactId>
    <version>2.44.0</version>
    <exclusions>
        <exclusion>
            <artifactId>xercesImpl</artifactId>
            <groupId>xerces</groupId>
        </exclusion>
    </exclusions>
</dependency>
See also:
org.apache.xerces.impl.dv.DVFactoryException: DTD factory class org.apache.xerces.impl.dv.dtd.DTDDVFactoryImpl does not extend from DTDDVFactory
Xerces error: org.apache.xerces.impl.dv.dtd.DTDDVFactoryImpl

Spring + Hibernate + Tomcat Dependency problems

When I run Tomcat and the WAR is deployed, I get:
NoClassDefFoundError : org/apache/commons/collections/map/LRUMap
Invocation of init method failed; nested exception is
java.lang.NoClassDefFoundError:
org/apache/commons/collections/map/LRUMap
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:527)
~[spring-beans-3.1.0.RELEASE.jar:3.1.0.RELEASE]
What is strange is that I have commons-collections-2.1.jar (I even tried 3.1) in my WEB-INF/lib folder.
Edit:
I copied the commons-collections jar from WEB-INF/lib to the Tomcat lib directory and it seems to work. However, I won't be able to do that on the production server. Why isn't it picking up my WEB-INF/lib version?
OK, so I put in version 3.2.1 of commons-collections and the error disappeared. Unfortunately, I still don't know which library depends on this version. Even mvn dependency:tree didn't help ...
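When the plain dependency:tree output is too noisy, two things that may help are filtering it (for example mvn dependency:tree -Dincludes=commons-collections -Dverbose, which also shows the conflicting versions Maven omitted) and letting the maven-enforcer-plugin fail the build whenever transitive dependencies disagree on a version. A sketch of the enforcer configuration, with a placeholder plugin version:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-enforcer-plugin</artifactId>
    <version>1.4.1</version>
    <executions>
        <execution>
            <id>enforce-dependency-convergence</id>
            <goals>
                <goal>enforce</goal>
            </goals>
            <configuration>
                <rules>
                    <dependencyConvergence/>
                </rules>
            </configuration>
        </execution>
    </executions>
</plugin>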
I had this exception when xdoclet was among my dependencies. If you have that dependency, just exclude it.
I had the same problem. Maybe it's too late to accept the answer, but it is still beneficial for people who will hit this problem in the future.
I excluded commons-collections from net.sf.jasperreports, and after that Tomcat runs perfectly without any problem:
<dependency>
    <groupId>net.sf.jasperreports</groupId>
    <artifactId>jasperreports</artifactId>
    <version>4.1.1</version>
    <type>jar</type>
    <scope>compile</scope>
    <exclusions>
        <exclusion>
            <artifactId>commons-collections</artifactId>
            <groupId>commons-collections</groupId>
        </exclusion>
    </exclusions>
</dependency>
