java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found - hadoop

I am trying to use the Hadoop HDFS Java API to list all files in HDFS.
I am able to list the files on the remote HDFS when I run the code from my local Eclipse.
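For reference, the listing boils down to the standard Hadoop FileSystem API calls; a simplified sketch of that kind of code (not the exact source; the NameNode URI and the path are placeholders):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHdfsFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "hdfs://namenode:8020" is a placeholder for the real NameNode URI
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        // list everything directly under the root directory
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}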
But when I execute the code from a web server, I get the following exception:
java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem
org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2290)
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2303)
org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:87)
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2342)
org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2324)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:351)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:163)
I have added the Maven dependencies below.
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.0.0-cdh4.5.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-auth</artifactId>
    <version>2.0.0-cdh4.5.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.0.0-cdh4.5.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>2.0.0-mr1-cdh4.5.0</version>
</dependency>
I have also embedded the required jars into the exported jar, and Maven has added them to the build path.
If anyone has encountered this issue before, please share the solution.

I am facing a similar issue with the Apache Hadoop 2.2.0 release. As a workaround I run the job as a separate process:
// {jarfile} and {classfile} are placeholders for the actual jar and main class
final Process p = Runtime.getRuntime().exec("java -jar {jarfile} {classfile}");
// echo the child process's stderr so its output stays visible
final Scanner output = new Scanner(p.getErrorStream());
while (output.hasNextLine()) {
    try {
        System.err.println(output.nextLine());
    } catch (final Exception e) {
        // ignore and keep draining the stream
    }
}
The jar file contains the implementation using the apache hadoop 2.2.0 jars.
I am still searching for an exact solution, though.

For me, hadoop-hdfs-2.6.0.jar was missing from the Zeppelin server's lib directory. I copied it into Zeppelin's lib folder and my problem was resolved. :)
Also add the dependency for hadoop-hdfs 2.6.0 in pom.xml.

Related

Databricks local test fail with java.lang.NoSuchMethodError: org.apache.hadoop.security.HadoopKerberosName.setRuleMechanism

I have a unit test for Databricks code, and I want to run it locally on Windows. Unfortunately, when I run pytest with PyCharm, it throws the following exception:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.security.HadoopKerberosName.setRuleMechanism(Ljava/lang/String;)V
at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:84)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:575)
at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2747)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2747)
at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:79)
at org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:368)
at org.apache.spark.deploy.SparkSubmit.secMgr$1(SparkSubmit.scala:368)
at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$8(SparkSubmit.scala:376)
at scala.Option.map(Option.scala:230)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:376)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
In the source code, it comes from this initialization:
spark = SparkSession.builder \
    .master("local[2]") \
    .appName("Helper Functions Unit Testing") \
    .getOrCreate()
I have searched for the above error, and most results relate to a Maven configuration that adds the hadoop-auth dependency. However, for PySpark I don't know how to deal with it. Does anyone have experience or insight into this error?
My workaround was to move to Python 3.7 and change the PySpark version to 3.0, and then it seems OK. So it is related to inconsistent environment and dependencies.
This is just my particular case; from my search on the web, most answers relate to adding the hadoop-auth jar as a Maven dependency for the Hadoop configuration.
I encountered this error in a Maven project written in Scala, not Python. What did it for me was adding not only the hadoop-auth dependency, as the OP mentioned, but also the hadoop-common dependency in my pom file, like so:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>3.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-auth</artifactId>
    <version>3.1.2</version>
</dependency>
Replace 3.1.2 with whatever version you're using. However, I also found that I had to track down other dependencies that conflicted with hadoop-common and hadoop-auth and add exclusions to them, like so:
<exclusions>
    <exclusion>
        <artifactId>hadoop-common</artifactId>
        <groupId>org.apache.hadoop</groupId>
    </exclusion>
    <exclusion>
        <artifactId>hadoop-auth</artifactId>
        <groupId>org.apache.hadoop</groupId>
    </exclusion>
</exclusions>

TaskID.<init>(Lorg/apache/hadoop/mapreduce/JobID;Lorg/apache/hadoop/mapreduce/TaskType;I)V

import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf

// hbaseConf, tablename and sc come from the surrounding application code
val jobConf = new JobConf(hbaseConf)
jobConf.setOutputFormat(classOf[TableOutputFormat])
jobConf.set(TableOutputFormat.OUTPUT_TABLE, tablename)

val indataRDD = sc.makeRDD(Array("1,jack,15", "2,Lily,16", "3,mike,16"))
indataRDD.map(_.split(','))  // note: this result is unused
val rdd = indataRDD.map(_.split(',')).map { arr =>
  val put = new Put(Bytes.toBytes(arr(0).toInt))
  put.add(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes(arr(1)))
  put.add(Bytes.toBytes("cf"), Bytes.toBytes("age"), Bytes.toBytes(arr(2).toInt))
  (new ImmutableBytesWritable, put)
}
rdd.saveAsHadoopDataset(jobConf)
When I run Hadoop or Spark jobs, I often hit this error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapred.TaskID.<init>(Lorg/apache/hadoop/mapreduce/JobID;Lorg/apache/hadoop/mapreduce/TaskType;I)V
at org.apache.spark.SparkHadoopWriter.setIDs(SparkHadoopWriter.scala:158)
at org.apache.spark.SparkHadoopWriter.preSetup(SparkHadoopWriter.scala:60)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1188)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1161)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1161)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1161)
at com.iteblog.App$.main(App.scala:62)
at com.iteblog.App.main(App.scala)
At first I thought it was a jar conflict, but I checked carefully and there are no other jars. The Spark and Hadoop versions are:
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.0.1</version>

<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>2.6.0-mr1-cdh5.5.0</version>
I found that TaskID and TaskType are both in the hadoop-core jar, but they are not in the same package. Why can mapred.TaskID refer to mapreduce.TaskType?
Oh, I have resolved this problem by adding the Maven dependency:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.6.0-cdh5.5.0</version>
</dependency>
The error disappeared!
I have also faced this issue. It is basically a jar problem.
Add the jar from Maven, spark-core_2.10:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>2.0.2</version>
</dependency>
After changing the jar file, the error went away.

Edit YARN's classpath in Oozie

I am trying to run a Hadoop job through Oozie. The job uploads data to DynamoDB in AWS, so I use AmazonDynamoDBClient. I get the following exception in the reducers:
2016-06-14 10:30:52,997 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:458)
at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:379)
at com.amazonaws.util.json.Jackson.<clinit>(Jackson.java:32)
at com.amazonaws.internal.config.InternalConfig.loadfrom(InternalConfig.java:233)
at com.amazonaws.internal.config.InternalConfig.load(InternalConfig.java:251)
at com.amazonaws.internal.config.InternalConfig$Factory.<clinit>(InternalConfig.java:308)
at com.amazonaws.util.VersionInfoUtils.userAgent(VersionInfoUtils.java:139)
at com.amazonaws.util.VersionInfoUtils.initializeUserAgent(VersionInfoUtils.java:134)
at com.amazonaws.util.VersionInfoUtils.getUserAgent(VersionInfoUtils.java:95)
at com.amazonaws.ClientConfiguration.<clinit>(ClientConfiguration.java:42)
at com.amazonaws.PredefinedClientConfigurations.dynamoDefault(PredefinedClientConfigurations.java:38)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.<init>(AmazonDynamoDBClient.java:292)
at com.mypackage.UploadDataToDynamoDBMR$DataUploaderReducer.setup(UploadDataToDynamoDBMR.java:396)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
I used a fat jar which packages all dependencies and copied the jar to Oozie's lib directory.
I have also used dependency management in the pom to pin the FasterXML Jackson dependency to 2.4.1 (the version used by the AWS DynamoDB SDK). However, when execution happens in the reducers, some other version of Jackson somehow appears first on the classpath (or so I believe).
I also excluded the Jackson dependency from the DynamoDB and AWS SDKs:
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-dynamodb</artifactId>
    <version>1.10.11</version>
    <exclusions>
        <exclusion>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-core</artifactId>
    <version>1.10.11</version>
    <exclusions>
        <exclusion>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
How can I make sure that my jar is the first one on the classpath in mappers and reducers? I tried the suggestion on this page and added the following property to the job's configuration xml:
<property>
    <name>oozie.launcher.mapreduce.user.classpath.first</name>
    <value>true</value>
</property>
But this did not help.
Any suggestions?
Have you copied your jar into the lib folder next to the workflow.xml, or into the sharelib?
Check which version of Jackson your Hadoop distribution is using and try to use that version of Jackson everywhere. It might also be worth checking that no other Jackson jars are on the classpath.
From the exception, it looks like Hadoop tries to call this method:
com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering
This method was introduced in Jackson version 2.3, so probably an even older version of Jackson is in there somewhere.
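To confirm which Jackson actually wins on the task classpath, a small diagnostic sketch could print the jar and version that were really loaded (the class name JacksonProbe is made up; call it from the mapper/reducer setup and check the task logs):

import com.fasterxml.jackson.core.JsonFactory;

public final class JacksonProbe {
    // Call from the reducer's setup() and look at the task logs.
    public static void log() {
        // The jar JsonFactory was actually loaded from
        System.out.println("Jackson loaded from: "
                + JsonFactory.class.getProtectionDomain().getCodeSource().getLocation());
        // The version that jar reports (requiresPropertyOrdering needs 2.3+)
        System.out.println("Jackson version: " + new JsonFactory().version());
    }
}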

Servlet 500 error ClassNotFound exception

I'm building a web app using Vaadin, and it needs to communicate with several REST APIs. I've set it up in IntelliJ with Maven. For the REST client, I was thinking I would use Gson to parse the JSON objects I'd be receiving from the open APIs; however, the application crashes with a servlet exception.
Caused by:
java.lang.ClassNotFoundException: com.google.gwt.json.client.JSONObject
I've added the GSON dependency to the pom.xml:
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.3.1</version>
</dependency>
I have also tried changing the module settings from Provided to Compile to Runtime, but with no change.
I'm just stumped as to why the Gson jar appears in the project dependencies within IntelliJ/Maven but fails at runtime. I've seen references to Eclipse and including the jar in the classpath but, again, I'm using IntelliJ and Maven to build my Vaadin project and manage dependencies.
Any help is greatly appreciated!
The missing class, com.google.gwt.json.client.JSONObject, comes from GWT, not Gson. Add the dependency below to your classpath:
<dependency>
    <groupId>com.google.gwt</groupId>
    <artifactId>gwt-user</artifactId>
    <version>2.3.0</version>
</dependency>
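As a side note, if the goal is really to parse JSON with Gson rather than with GWT's client-side JSON classes, plain Gson needs no GWT dependency at all; a minimal sketch with a made-up payload:

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

public class GsonExample {
    public static void main(String[] args) {
        // Example payload; a real app would read this from the REST API response
        String json = "{\"name\":\"vaadin\",\"version\":7}";
        // new JsonParser().parse(...) works with Gson 2.3.x; newer Gson also has
        // the static JsonParser.parseString(...)
        JsonObject obj = new JsonParser().parse(json).getAsJsonObject();
        System.out.println(obj.get("name").getAsString());
    }
}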

Glassfish incremental deployment fails when including Selenium

I have a Java EE project which is meant to run on Glassfish 4.1. I want to use Selenium to collect information from some web pages, i.e. I need to include Selenium in the deployment (not just for tests).
I am using the Eclipse IDE and have previously used Eclipse's incremental deployment to automatically deploy all saved changes to the project. But when I included the Selenium dependencies (via Maven), incremental deployment stopped working. The project can still be deployed to Glassfish, but I have to restart Glassfish after every change. I get the following error in Eclipse:
Exception while loading the app : java.lang.IllegalStateException: ContainerBase.addChild: start: org.apache.catalina.LifecycleException: java.lang.RuntimeException: com.sun.faces.config.ConfigurationException: java.util.concurrent.ExecutionException: com.sun.faces.config.ConfigurationException: Unable to parse document 'bundle://136.0:1/com/sun/faces/jsf-ri-runtime.xml': DTD factory class org.apache.xerces.impl.dv.dtd.DTDDVFactoryImpl does not extend from DTDDVFactory.. Please see server.log for more details.
org.apache.xerces.impl.dv.dtd.DTDDVFactoryImpl is included with Selenium as a transitive dependency (xerces:xercesImpl:2.11.0).
Here are my Maven dependencies:
<dependency>
    <groupId>org.jboss.arquillian.selenium</groupId>
    <artifactId>selenium-bom</artifactId>
    <version>2.44.0</version>
    <type>pom</type>
    <scope>import</scope>
</dependency>
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-htmlunit-driver</artifactId>
</dependency>
I hope there is a solution to this, but after reading Jens Schauder's response in Dealing with "Xerces hell" in Java/Maven?, I'm afraid there might not be. Anyone?
I currently can't reproduce the issue with a simple project. Did you make sure that you don't have any other dependencies that pull in another version of xercesImpl?
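If it is unclear which copy wins at runtime, a small diagnostic sketch (the class name XercesProbe is made up; call it from code running inside the deployed app) can print where the Xerces base type and its implementation are loaded from. Two different locations usually mean two copies of Xerces are visible, which is what the "does not extend from DTDDVFactory" error points to:

import java.net.URL;
import java.security.CodeSource;

public final class XercesProbe {
    // Call this from application startup code (e.g. a ServletContextListener)
    // so it runs with the same class loaders as the deployed app.
    public static void log() throws ClassNotFoundException {
        Class<?> base = Class.forName("org.apache.xerces.impl.dv.DTDDVFactory");
        Class<?> impl = Class.forName("org.apache.xerces.impl.dv.dtd.DTDDVFactoryImpl");
        System.out.println("DTDDVFactory loaded from: " + location(base));
        System.out.println("DTDDVFactoryImpl loaded from: " + location(impl));
    }

    private static URL location(Class<?> c) {
        CodeSource cs = c.getProtectionDomain().getCodeSource();
        return cs == null ? null : cs.getLocation(); // null means a JVM/boot class loader
    }
}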
You can try to place the xercesImpl-2.11.0.jar and the transitive dependency xml-apis-1.4.01.jar in the lib folder of your Glassfish domain and exclude it from your dependencies like this:
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-htmlunit-driver</artifactId>
    <version>2.44.0</version>
    <exclusions>
        <exclusion>
            <artifactId>xercesImpl</artifactId>
            <groupId>xerces</groupId>
        </exclusion>
    </exclusions>
</dependency>
See also:
org.apache.xerces.impl.dv.DVFactoryException: DTD factory class org.apache.xerces.impl.dv.dtd.DTDDVFactoryImpl does not extend from DTDDVFactory
Xerces error: org.apache.xerces.impl.dv.dtd.DTDDVFactoryImpl
