Alluxio Error: java.lang.IllegalArgumentException: Wrong FS - hadoop

I am able to run wordcount on Alluxio with the example jar provided by Cloudera, using:
sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar wordcount -libjars /home/nn1/alluxio-1.2.0/core/client/target/alluxio-core-client-1.2.0-jar-with-dependencies.jar alluxio://nn1:19998/wordcount alluxio://nn1:19998/wc1
and it's a success.
But I can't run it when I use the jar built from the attached code, which is also a sample wordcount example:
sudo -u hdfs hadoop jar /home/nn1/HadoopWordCount-0.0.1-SNAPSHOT-jar-with-dependencies.jar edu.am.bigdata.C45TreeModel.C45DecisionDriver -libjars /home/nn1/alluxio-1.2.0/core/client/target/alluxio-core-client-1.2.0-jar-with-dependencies.jar alluxio://10.30.60.45:19998/abdf alluxio://10.30.60.45:19998/outabdf
The above code is built using Maven. The pom.xml contains:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>2.6.0-mr1-cdh5.4.5</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.6.0-cdh5.4.5</version>
</dependency>
Could you please help me run my wordcount program on the Alluxio cluster? I hope no extra configuration needs to be added to the pom file to run it.
I am getting the following error after running my jar:
java.lang.IllegalArgumentException: Wrong FS:
alluxio://10.30.60.45:19998/outabdf, expected: hdfs://10.30.60.45:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:657)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1215)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1211)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1211)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1412)
at edu.WordCount.run(WordCount.java:47)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at edu.WordCount.main(WordCount.java:23)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

The problem comes from the call to
FileSystem fs = FileSystem.get(conf);
on line 101. The FileSystem created by FileSystem.get(conf) will only support paths with the scheme defined by Hadoop's fs.defaultFS property. To fix the error, change that line to:
FileSystem fs = FileSystem.get(URI.create("alluxio://nn1:19998/"), conf);
By passing a URI, you override fs.defaultFS, enabling the created FileSystem to support paths using the alluxio:// scheme.
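For context, here is a minimal sketch of what the relevant part of the driver's run() method might look like after the change. The variable names and the delete-if-exists step are illustrative assumptions rather than the attached code; the key point is binding the FileSystem to the alluxio:// scheme (or deriving it from the path itself):
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Inside run(String[] args), assuming args[1] is e.g. alluxio://nn1:19998/outabdf
Configuration conf = getConf();
Path outputPath = new Path(args[1]);
// Bind the FileSystem to the alluxio:// scheme instead of fs.defaultFS
FileSystem fs = FileSystem.get(URI.create("alluxio://nn1:19998/"), conf);
// Alternative that works for any scheme: derive the FileSystem from the path
// FileSystem fs = outputPath.getFileSystem(conf);
if (fs.exists(outputPath)) {
    fs.delete(outputPath, true); // remove a stale output directory
}
Using Path.getFileSystem(conf) avoids hard-coding the scheme, which is handy if the same driver is run against both hdfs:// and alluxio:// paths.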
You could also fix the error by modifying fs.defaultFS in your core-site.xml:
<property>
<name>fs.defaultFS</name>
<value>alluxio://nn1:19998/</value>
</property>
However, this could impact other systems that share the core-site.xml file, so I recommend the first approach of passing an alluxio:// URI to FileSystem.get().

Related

Runtime error when executing a jar file on HDFS

I get the following error message when running this jar file on a Hadoop system. What should I do?
hadoop jar units.jar /input_dir/sample.txt /output_dir/result
Exception in thread "main" java.lang.ClassNotFoundException: /input_dir/sample.txt
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
From the Apache Hadoop docs:
Usage: hadoop jar <jar> [mainClass] args...
Runs a jar file.
You are missing the fully qualified class name in your hadoop jar command. Because the jar's manifest does not specify a main class, Hadoop treats the first argument (/input_dir/sample.txt) as the class to run, which is why that path shows up in the ClassNotFoundException. Pass the driver's fully qualified class name (package.ClassName) right after the jar, before the input and output paths.

Unable to run a Spark Java Program

I am running a Spark program written in Java, using the sample wordcount example.
I have created a jar file, but when I submit the Spark job it throws an error.
$ spark-submit --class WordCount --master local \ home/cloudera/workspace/sparksample/target/sparksample-0.0.1-SNAPSHOT.jar
I am getting the error below:
java.lang.ClassNotFoundException: wordCount
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.util.Utils$.classForName(Utils.scala:175)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Edit:
I am also adding my pom.xml so that you can help.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.igi.sparksample</groupId>
<artifactId>sparksample</artifactId>
<version>0.0.1-SNAPSHOT</version>
<dependencies>
<dependency> <!-- Spark dependency -->
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.6.0</version>
</dependency>
</dependencies>
</project>
After trying many combinations and doing a bit of R&D, I solved my issue.
The issue was in my spark-submit command; I changed it to this:
spark-submit --class com.xxx.sparksample.WordCount --master local /home/cloudera/workspace/sparksample/target/sparksample-0.0.1-SNAPSHOT.jar
and it worked.
It can't find the WordCount class. You probably need to include the package that class is in, so that you pass the fully qualified class name, i.e.:
--class <PACKAGE>.WordCount
The error you posted doesn't show any problem with Spark.
However, you must have a typo in your program. Java threw a ClassNotFoundException looking for wordCount, where it should most probably be WordCount, with a capital W.
Please check the names of your classes and your imports.
Make sure that the name of the class (wordcount or WordCount or whatever...) that you pass to spark-submit is exactly the same as the one you have defined; class names are case-sensitive.
Make sure that the packaging is correct.
To verify, open/extract your jar and see the class name and the package hierarchy.
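To make the package/class relationship concrete, here is a hypothetical skeleton; only the package name com.xxx.sparksample and the class name WordCount come from the post above, and the body is a placeholder. The value passed to --class must be exactly the package declaration plus the class name:
package com.xxx.sparksample;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Submit with: --class com.xxx.sparksample.WordCount (case-sensitive)
public class WordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... word count logic goes here ...
        sc.stop();
    }
}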

Using Oozie to create a Hive table on HBase causes an error with libthrift?

I'm using an Oozie Hive action on Cloudera (CDH 4) to create an HBase-backed Hive table. Running the create table command on my local dev util box executes without error. When I execute the same command via an Oozie Hive action in the cluster, I get this error:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], main() threw exception, org.apache.thrift.EncodingUtils.setBit(BIZ)B
java.lang.NoSuchMethodError: org.apache.thrift.EncodingUtils.setBit(BIZ)B
at org.apache.hadoop.hive.ql.plan.api.Query.setStartedIsSet(Query.java:487)
at org.apache.hadoop.hive.ql.plan.api.Query.setStarted(Query.java:474)
at org.apache.hadoop.hive.ql.QueryPlan.updateCountersInQueryPlan(QueryPlan.java:309)
at org.apache.hadoop.hive.ql.QueryPlan.getQueryPlan(QueryPlan.java:450)
at org.apache.hadoop.hive.ql.QueryPlan.toString(QueryPlan.java:622)
at org.apache.hadoop.hive.ql.history.HiveHistory.logPlanProgress(HiveHistory.java:504)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1106)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:445)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:455)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:713)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:302)
at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:260)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:37)
at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:495)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:394)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Googling around, most answers said that this was due to different versions of Thrift in Hive, HBase, or Hadoop; but as far as I can tell (using find -name in a shell action) they all have version 0.9.0:
Stdoutput ./lib/flume-ng/lib/libthrift-0.9.0.jar
Stdoutput ./lib/hcatalog/share/webhcat/svr/lib/libthrift-0.9.0.jar
Stdoutput ./lib/whirr/lib/libthrift-0.9.0.jar
Stdoutput ./lib/whirr/lib/libthrift-0.5.0.jar
Stdoutput ./lib/hive/lib/libthrift-0.9.0-cdh4-1.jar
Stdoutput ./lib/oozie/libserver/libthrift-0.9.0.jar
Stdoutput ./lib/oozie/libtools/libthrift-0.9.0.jar
Stdoutput ./lib/hbase/lib/libthrift-0.9.0.jar
Stdoutput ./lib/mahout/lib/libthrift-0.9.0.jar
These same versions are on my dev util box, and the hive command works fine. Any ideas what could be causing this issue?
Thanks in advance!
The issue was with a jar included in the workflow's lib directory. This jar had transitive dependencies on an older version of Thrift.
I was able to work around this by running the Hive action in a sub-workflow, then setting
<global>
<configuration>
<property>
<name>oozie.use.system.libpath</name>
<value>false</value>
</property>
<property>
<name>oozie.libpath</name>
<value>${wf:appPath()}/lib</value>
</property>
</configuration>
</global>
on the workflow. This essentially told Oozie to use the lib directory in my sub-workflow's directory, not the main workflow's lib (which included the bad jar).

Unable to configure hive.exec hooks due to missing jar

I am trying to use Hive and to switch databases using the 'use db' command. My setup is Hadoop 2.4.0 and Hive 0.13.1. I add the following three properties to a .settings file:
set hive.exec.failure.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook;
set hive.exec.post.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook;
set hive.exec.pre.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook;
I then open the Hive command line, passing in the .settings file via 'hive -i my.settings', and then I get:
hive> use db;
hive.exec.pre.hooks Class not found:org.apache.hadoop.hive.ql.hooks.ATSHook
FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.hadoop.hive.ql.hooks.ATSHook)
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
It seems there is a jar missing from my classpath. I tried searching the web for a jar containing the org.apache.hadoop.hive.ql.hooks.ATSHook class, but have had no luck. I tried adding all paths with jars in them under HIVE_HOME to yarn-site.xml via:
<property>
<name>yarn.application.classpath</name>
<value>
...
/apps/hive/hive-0.13.1/hcatalog/share/hcatalog/*,
/apps/hive/hive-0.13.1/hcatalog/share/hcatalog/storage-handlers/hbase/lib/*,
/apps/hive/hive-0.13.1/hcatalog/share/webhcat/java-client/*,
/apps/hive/hive-0.13.1/hcatalog/share/webhcat/svr/lib/*,
/apps/hive/hive-0.13.1/lib/*
</value>
</property>
Still no luck. Does anyone know if there is some additional step I need to take to configure these properties?
Apparently the jar is only available in the, as yet unreleased, Hive 0.14.0, so I had to download and build Hive according to the directions on the Hive wiki, which is simply:
mvn clean install -DskipTests -Phadoop-2
Once that was built, I was able to do this:
hive> add jar <HIVE_HOME>/ql/target/hive-exec-0.14.0-SNAPSHOT.jar;
Or by adding this property to hive-site.xml:
<property>
<name>hive.aux.jars.path</name>
<value>file:///<HIVE_HOME>/ql/target/hive-exec-0.14.0-SNAPSHOT.jar</value>
</property>
I also found a nice SlideShare presentation about plugins.

IncompatibleClassChangeError when calling getSplits on Hadoop 2.0.0-cdh4.0.0

I'm using the Cloudera-VM. Hadoop version: Hadoop 2.0.0-cdh4.0.0.
I have written a custom input file format. When the client calls the getSplits method, I get this exception:
IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
I'm using the classes from the mapreduce package, not mapred.
However, when I look at the stack trace, I see that somewhere along the line the calls switch to the mapred package:
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at com.hadoopApp.DataGeneratorFileInput.getSplits(DataGeneratorFileInput.java:27)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1063)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1080)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:992)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
at com.hadoopApp.HBaseApp.generateData(HBaseApp.java:54)
at com.hadoopApp.HBaseApp.run(HBaseApp.java:24)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.hadoopApp.HBaseApp.main(HBaseApp.java:19)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Not sure if this helps, but I'm using this in my Maven pom:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.1</version>
</dependency>
Solved it, though I'm not sure why.
I changed my pom to this and it started working. I'm not sure why this solved it, so your input is appreciated:
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.0.0-cdh4.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.0.0-cdh4.2.0</version>
</dependency>
How can I get around this?
I bumped into the same problem when using HIPI on CDH 4.2.0.
The problem is caused by incompatibilities between Hadoop versions (jobs built with Hadoop 1 may not work on Hadoop 2). Initially you were building the job against Hadoop v1 and running it in a Hadoop 2.0.0 environment (Cloudera uses Hadoop 2.0.0).
Fortunately, the Hadoop 1.x API is fully supported in Hadoop 2.x, so rebuilding the job against the newer version of Hadoop helps.
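For illustration, here is a minimal sketch of what the custom input format looks like when compiled against the Hadoop 2 mapreduce API; the class name comes from the stack trace above, but the split and record-reader logic are placeholders. In Hadoop 2, org.apache.hadoop.mapreduce.JobContext is an interface (it was a class in Hadoop 1), which is why a jar compiled against Hadoop 1 fails at getSplits with IncompatibleClassChangeError:
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

public class DataGeneratorFileInput extends FileInputFormat<LongWritable, Text> {

    @Override
    public List<InputSplit> getSplits(JobContext context) throws IOException {
        // Placeholder: reuse the default file-based split computation.
        // The JobContext parameter here is the Hadoop 2 interface, so this
        // class must be compiled against Hadoop 2 (e.g. CDH 4) jars.
        return super.getSplits(context);
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        // Placeholder record reader; the real format would return its own reader.
        return new LineRecordReader();
    }
}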
