HCatalog with MapReduce - Hadoop

I get the following error while executing a MapReduce program.
I have placed all the jars in the hadoop/lib directory and have also listed them in -libjars.
This is the command I am executing:
$HADOOP_HOME/bin/hadoop --config $HADOOP_HOME/conf jar /home/shash/distinct.jar HwordCount -libjars $LIB_JARS WordCount HWordCount2
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hcatalog.mapreduce.HCatOutputFormat
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:996)
at org.apache.hadoop.mapreduce.JobContext.getOutputFormatClass(JobContext.java:248)
at org.apache.hadoop.mapred.Task.initialize(Task.java:501)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:306)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.ClassNotFoundException: org.apache.hcatalog.mapreduce.HCatOutputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:943)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:994)
... 8 more

Make sure LIB_JARS is a comma-separated list (as opposed to colon-separated like CLASSPATH). Also note that -libjars is only honored when your driver parses generic options, i.e. runs through ToolRunner/GenericOptionsParser; otherwise the flag is silently ignored.
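For example, the two variables would look like this (the jar names and the HCAT_HOME/HIVE_HOME paths are illustrative; substitute the actual HCatalog and Hive jars of your installation):
export LIB_JARS=$HCAT_HOME/share/hcatalog/hcatalog-core.jar,$HIVE_HOME/lib/hive-metastore.jar,$HIVE_HOME/lib/hive-exec.jar
export HADOOP_CLASSPATH=$HCAT_HOME/share/hcatalog/hcatalog-core.jar:$HIVE_HOME/lib/hive-metastore.jar:$HIVE_HOME/lib/hive-exec.jar
The comma-separated list is handed to -libjars so the jars ship with the job to the task nodes; the colon-separated HADOOP_CLASSPATH makes the same classes visible to the client JVM that submits the job.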

Applies To: CDH 5.0.x, CDH 5.1.x, CDH 5.2.x, CDH 5.3.x, Sqoop
Cause: Sqoop cannot pick up the HCatalog libraries because Cloudera Manager does not set the HIVE_HOME environment variable; it needs to be set manually.
This problem is tracked in the following JIRA:
https://issues.apache.org/jira/browse/SQOOP-2145
The fix for this JIRA has been included in CDH since version 5.4.0.
Workaround (applicable to CDH versions lower than 5.4.0):
Execute one of the commands below in the shell before calling Sqoop, or add it to /etc/sqoop/conf/sqoop-env.sh (create the file if it does not already exist):
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive (for parcel installation)
export HIVE_HOME=/usr/lib/hive (for package installation)
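To persist the setting, appending it to sqoop-env.sh could look like this (parcel path shown; adjust for a package install):
echo 'export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive' | sudo tee -a /etc/sqoop/conf/sqoop-env.sh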

Related

Oozie Fails - Running MRv1 on CDH 5.4.2

I'm trying to run an Oozie workflow that should execute an MRv1 Hadoop job.
I started with the Cloudera QuickStart VM 5.4.2-0 and configured it to use MRv1 (configuration appended at the bottom).
But the workflow fails:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.MapReduceMain], main() threw exception, org/apache/hadoop/yarn/exceptions/YarnException
java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/exceptions/YarnException
at org.apache.oozie.action.hadoop.MapReduceMain.run(MapReduceMain.java:58)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:46)
at org.apache.oozie.action.hadoop.MapReduceMain.main(MapReduceMain.java:38)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:228)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.exceptions.YarnException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
The strange thing is that when I look at the classpath, it is loading the jar files from the right location, namely /usr/lib/hadoop-0.20-mapreduce.
----- configurations -----
Configured Oozie to use MapReduce instead of YARN:
Changed the Oozie config in CDH by switching the MapReduce Service from YARN to MapReduce.
Changed the Oozie config by setting this oozie-site.xml safety-valve value:
<property>
<name>oozie.service.HadoopAccessorService.jobTracker.whitelist</name>
<value>quickstart.cloudera:8021</value>
</property>
Changed the Tomcat config using alternatives:
sudo alternatives --config oozie-tomcat-deployment   # chose tomcat-conf.http.mr1
Configured Hadoop to use MapReduce:
sudo alternatives --config hadoop-conf   # chose conf.cloudera.mapreduce
I had forgotten to update the Oozie sharelib (used by the actions) to use MR1:
sudo -u oozie hadoop fs -rmr /user/oozie/share
sudo oozie-setup sharelib create -fs <FS_URI> -locallib /usr/lib/oozie/oozie-sharelib-mr1
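After recreating the sharelib, you can sanity-check that the MR1 jars landed in HDFS (the path below is the default sharelib location):
sudo -u oozie hadoop fs -ls /user/oozie/share/lib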

Error in launching Spark REPL

I got the pre-built Spark 1.4.1 and I'm running HDP 2.6. When I try to run spark-shell, it gives me the following error message:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:111)
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:111)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:111)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:97)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:107)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
What is the issue?
A ClassNotFoundException occurs when the class loader cannot find the required class on the classpath. So, basically, you should check your classpath and add the missing jar to it.
Check whether hadoop-common-0.21.0.jar is added to your classpath.
Is it possible that your Hadoop home is not set, as in here?
Cannot find hadoop installation: $HADOOP_HOME must be set or hadoop must be in the path
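If your download is the "pre-built with user-provided Hadoop" flavor, Spark 1.4+ reads the Hadoop jars from SPARK_DIST_CLASSPATH, so one common fix is (assuming the hadoop launcher is on your PATH):
export SPARK_DIST_CLASSPATH=$(hadoop classpath)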

HBase MapReduce FileNotFoundException

I have installed Hadoop 2.4.1 and HBase 0.98.8 on 2 machines. When I run an HBase MapReduce job, I get the error below:
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://pc1/opt/hbase-0.98.8-hadoop2/lib/hbase-server-0.98.8-hadoop2.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at thesis.test2.run(test2.java:93)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at thesis.test2.main(test2.java:107)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
I can run Hadoop MapReduce jobs and simple HBase jobs without any problems. The code I'm trying to run is an example that is supposed to work.
Please provide the "jps" output, because it seems like your HBase is not working; hopefully the problem is just with ZooKeeper.
I faced the exact same problem. You have to add the HBase library path to your .bashrc file: add the lib folder of HBase to CLASSPATH, and also add the HBase classpath to HADOOP_CLASSPATH.
Your .bashrc file should contain the following:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:`${HBASE_HOME}/bin/hbase classpath`
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:`${HBASE_HOME}/bin/hbase mapredcp`
export CLASSPATH=${HBASE_HOME}/lib/*
Note: CLASSPATH should point to the lib folder of your HBase installation. Use the following to compile and run your Java code:
javac Example.java
java -classpath $CLASSPATH:. Example
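For a one-off job submission without editing .bashrc, the same idea works inline; the job jar name and driver class below are illustrative (taken from the stack trace above):
HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp` hadoop jar test2.jar thesis.test2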

org.apache.nutch.crawl.Crawl missing in nutch 1.9 on hadoop 1.2.1

I have installed fully distributed Hadoop 1.2.1. I was trying to integrate Nutch with the steps below:
Download apache-nutch-1.9-src.zip
Add the http.agent.name value to nutch-site.xml
Copy hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, masters, slaves into $NUTCH_HOME/conf
Compile using ant runtime
Create urls/seed.txt and put it on HDFS
Edit $NUTCH_HOME/conf/regex-urlfilter.txt
Test the crawl using this command:
bin/hadoop jar nutch-1.9.job org.apache.nutch.crawl.Crawl urls -dir urls -depth 1 -topN 5
and get this error:
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.nutch.crawl.Crawl
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
I tried extracting nutch-1.9.job and did not find the Crawl class in org/apache/nutch/crawl.
Do I need to configure something?
Crawl.java was removed in version 1.8; you can use the crawl shell script for all crawling.
The deprecated class o.a.n.crawl.Crawler is still in the code base: https://issues.apache.org/jira/browse/NUTCH-1621
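A minimal invocation of the script would look something like this (the seed dir, crawl dir, Solr URL, and round count are illustrative; run bin/crawl with no arguments to see the exact usage for your version):
bin/crawl urls crawl http://localhost:8983/solr/ 1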

Error in ImportTsv - HBase bulk load

I have HBase 0.94.0. I tried doing a bulk import using the importtsv tool.
Here is the command I gave:
./hadoop jar /home/ericsson/Desktop/ProjectFiles/hbase-0.94.0/hbase-0.94.0.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,a,b,c,d,e,f,g '-Dimporttsv.separator=,' Test1 /home/ericsson/Desktop/ProjectFiles/inputFiles1/CharginUsage-m-00000
Test1 is my table that already exists in HBase.
/home/ericsson/Desktop/ProjectFiles/inputFiles1/CharginUsage-m-00000 is the directory where I have the CSV file.
I got the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/collect/Multimap
at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:43)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.Multimap
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
... 6 more
The importtsv task needs Google's Guava library in order to run. This library is present under $HBASE_HOME/lib/guava-<version>.jar.
It is a matter of telling Hadoop to pick up this Guava jar during execution. You could simply copy the jar from the HBase lib directory to the Hadoop lib directory; a cleaner solution is to add the jar's path to the Hadoop classpath, or to execute the task with the command below.
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_HOME/lib/guava-<version>.jar
OR
export HADOOP_CLASSPATH=`hbase classpath`
./hadoop jar /home/ericsson/Desktop/ProjectFiles/hbase-0.94.0/hbase-0.94.0.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,a,b,c,d,e,f,g '-Dimporttsv.separator=,' Test1 /home/ericsson/Desktop/ProjectFiles/inputFiles1/CharginUsage-m-00000
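The copy-based alternative mentioned above would be something like this (assuming standard HBASE_HOME and HADOOP_HOME layouts):
cp $HBASE_HOME/lib/guava-*.jar $HADOOP_HOME/lib/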
