Oozie Fails - Running MRv1 on CDH 5.4.2 - hadoop

I'm trying to run an oozie workflow the should execute MRv1 hadoop job.
Started with Cloudera QuickStart VM 5.4.2-0. Configured it to use MRv1 (appended at the bottom).
But the workflow fails:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.MapReduceMain], main() threw exception, org/apache/hadoop/yarn/exceptions/YarnException
java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/exceptions/YarnException
at org.apache.oozie.action.hadoop.MapReduceMain.run(MapReduceMain.java:58)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:46)
at org.apache.oozie.action.hadoop.MapReduceMain.main(MapReduceMain.java:38)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:228)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.exceptions.YarnException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
The strange thing is that when I look at the classpath, it is loading the jar files from the right location; namely /usr/lib/hadoop-0.20-mapreduce
-----configurations----
Configured OOzie to use MapReduce instead of yarn:
Changed config for Oozie in CDH by changing: MapReduce Service from YARN to MapReduce
Changed config for Oozie in by setting oozie-site.xml valve value:
<property>
<name>oozie.service.HadoopAccessorService.jobTracker.whitelist</name>
<value>quickstart.cloudera:8021</value>
</property>
Changed tomcat config using alternatives:
sudo alternatives --config oozie-tomcat-deployment -- chose tomcat-conf.http.mr1
Configured Hadoop to use MapReduce:
sudo alternatives --config hadoop-conf -- chose conf.cloudera.mapreduce

I had forgotten to update the actions to use MR1:
$ sudo -u oozie hadoop fs -rmr /user/oozie/share
sudo oozie-setup sharelib create -fs <FS_URI> -locallib /usr/lib/oozie/oozie-sharelib-mr1

Related

issues with Oozie and Sqoop Export

I am trying to do an Sqoop export, the sqoop command works just fine in the local Servers, however, when I try to use the same command as an Oozie action, I am getting the following error, any help would be appreciated.
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], main() threw exception, org.apache.hadoop.hive.ql.io.AcidUtils.isTablePropertyTransactional(Ljava/util/Map;)Z
java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.io.AcidUtils.isTablePropertyTransactional(Ljava/util/Map;)Z
at org.apache.hive.hcatalog.mapreduce.FosterStorageHandler.configureInputJobProperties(FosterStorageHandler.java:134)
at org.apache.hive.hcatalog.common.HCatUtil.getInputJobProperties(HCatUtil.java:458)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.extractPartInfo(InitializeInput.java:161)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:137)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:88)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:95)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:51)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:349)
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:433)
at org.apache.sqoop.manager.SQLServerManager.exportTable(SQLServerManager.java:192)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:197)
at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:179)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:58)
at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:48)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:240)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://EAP-DR/user/sps_hpe_ibproetl/oozie-oozi/0005868-200326034412660-oozie-oozi-W/sqoop_2--sqoop/action-data.seq
7378 [main] INFO org.apache.hadoop.io.compress.zlib.ZlibFactory - Successfully loaded & initialized native-zlib library
7379 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new compressor [.deflate]
Successfully reset security manager from org.apache.oozie.action.hadoop.LauncherSecurityManager#4440750 to null
Oozie Launcher ends
When you run sqoop command from local it uses jars from /usr/lib/sqoop/lib.
And when you use oozie sqoop action it uses jars from hdfs:///user/oozie/share/lib/lib_*/sqoop/
Now looking at your error, you either are
Missing org.apache.hadoop.hive.ql.io.AcidUtils on the classpath
Have the wrong version of org.apache.hadoop.hive.ql.io.AcidUtils on the classpath
Have multiple versions of org.apache.hadoop.hive.ql.io.AcidUtils on the classpath

HiveServer: ClassNotFound (HiveServer)

first of all: I am new to Hive.
I just installed Hive and when I run "hive" the server starts up and brings me into the CLI. But when I try to start it as a service/server with "hive --service hiveserver" I get:
Starting Hive Thrift Server
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.hadoop.hive.service.HiveServer
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Two questions:
Is that the right way to start up a Hive server? (My understanding is that HCat and WebHCat are included automatically!?)
Why does this problem show up and how can I resolve it?
Thanks and regards!
Try to run bin/hive --service hiveserver2 instead of hive --service hiveserver for this version of apache hive

org.apache.nutch.crawl.Crawl missing in nutch 1.9 on hadoop 1.2.1

I have installed fully distributed Hadoop 1.2.1. I was trying to integrated nutch with steps below:
Download apache-nutch-1.9-src.zip
Add value http.agent.name into nutch-site.xml
Copy hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml,
masters, slaves into $NUTCH_HOME/conf
compile using ant runtime
create urls/seed.txt and put on hadoop dfs
edit $NUTCH_HOME/conf/regex-urlfilter.txt
Test crawl using command:
bin/hadoop -jar nutch-1.9.job org.apache.nutch.crawl.Crawl urls -dir urls -depth 1 -topN 5
and get this error:
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.nutch.crawl.Crawl
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
I tried extract nutch-1.9.job and I didn't find out class Crawl in org/apache/nutch/crawl.
Do I need to config something?
Crawl.java removed at 1.8 version. You can use crawl shell script for all crawling.
Deprecated class o.a.n.crawl.Crawler is still in code base https://issues.apache.org/jira/browse/NUTCH-1621

hcatalog with mapreduce

I get the following error while executing a MapReduce program.
I have placed all jars in hadoop/lib directory and have also mentioned the jars in -libjars.
This is the cmd I am executing:
$HADOOP_HOME/bin/hadoop --config $HADOOP_HOME/conf jar /home/shash/distinct.jar HwordCount -libjars $LIB_JARS WordCount HWordCount2
java.lang.RuntimeException: java.lang.ClassNotFoundException:
org.apache.hcatalog.mapreduce.HCatOutputFormat at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:996) at
org.apache.hadoop.mapreduce.JobContext.getOutputFormatClass(JobContext.java:248) at org.apache.hadoop.mapred.Task.initialize(Task.java:501) at
org.apache.hadoop.mapred.MapTask.run(MapTask.java:306) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:415) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127) at
org.apache.hadoop.mapred.Child.main(Child.java:264) Caused by: java.lang.ClassNotFoundException: org.apache.hcatalog.mapreduce.HCatOutputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at
java.net.URLClassLoader$1.run(URLClassLoader.java:355) at
java.security.AccessController.doPrivileged(Native Method) at
java.net.URLClassLoader.findClass(URLClassLoader.java:354) at
java.lang.ClassLoader.loadClass(ClassLoader.java:423) at
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at
java.lang.ClassLoader.loadClass(ClassLoader.java:356) at
java.lang.Class.forName0(Native Method) at
java.lang.Class.forName(Class.java:264) at
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:943)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:994) ...
8 more
Make sure LIB_JARS is a comma-separated list (as opposed to colon-separated like CLASSPATH)
Applies To CDH 5.0.x CDH 5.1.x CDH 5.2.x CDH 5.3.x Sqoop
Cause Sqoop cannot pick up the HCatalog libraries because Cloudera
Manager does not set the HIVE_HOME environment. It needs to be set
manually.
This problem is tracked with below JIRA:
https://issues.apache.org/jira/browse/SQOOP-2145
The fix of this JIRA has been included in CDH since version 5.4.0.
Workaround: Applicable to CDH versions lower than 5.4.0.
Execute below commands in shell before calling Sqoop command or adding them to /etc/sqoop/conf/sqoop-env.sh (create one, if it does not already exists):
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive (for parcel installation)
export HIVE_HOME=/usr/lib/hive (for package installation)

FileNotFoundException for hive/lib/hive-builtins-0.9.0.jar in Hive

I've a configured Hadoop 0.23 on my local box and got it work with a simple map-reduce wordcount program. I Have configured Hive to work with it. All the DDL queries works fine. But when i fire queries that have aggregates (which will trigger Map-educe jobs)
java.io.FileNotFoundException: File does not exist: /Users/varadham/projects/hadoop/hive/lib/hive-builtins-0.9.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:738)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:208)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:71)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:252)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:290)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:361)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:609)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:604)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:604)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:435)
at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:693)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /Users/varadham/projects/hadoop/hive/lib/hive-builtins-0.9.0.jar)'
You should create the same file /Users/varadham/projects/hadoop/hive/lib/hive-builtins-0.9.0.jar in your hadoop file system. Then it should work.
Make sure you have the jar in HDFS.

Resources