issues with Oozie and Sqoop Export - hadoop

I am trying to do an Sqoop export, the sqoop command works just fine in the local Servers, however, when I try to use the same command as an Oozie action, I am getting the following error, any help would be appreciated.
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], main() threw exception, org.apache.hadoop.hive.ql.io.AcidUtils.isTablePropertyTransactional(Ljava/util/Map;)Z
java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.io.AcidUtils.isTablePropertyTransactional(Ljava/util/Map;)Z
at org.apache.hive.hcatalog.mapreduce.FosterStorageHandler.configureInputJobProperties(FosterStorageHandler.java:134)
at org.apache.hive.hcatalog.common.HCatUtil.getInputJobProperties(HCatUtil.java:458)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.extractPartInfo(InitializeInput.java:161)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:137)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:88)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:95)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:51)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:349)
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:433)
at org.apache.sqoop.manager.SQLServerManager.exportTable(SQLServerManager.java:192)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:197)
at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:179)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:58)
at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:48)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:240)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://EAP-DR/user/sps_hpe_ibproetl/oozie-oozi/0005868-200326034412660-oozie-oozi-W/sqoop_2--sqoop/action-data.seq
7378 [main] INFO org.apache.hadoop.io.compress.zlib.ZlibFactory - Successfully loaded & initialized native-zlib library
7379 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new compressor [.deflate]
Successfully reset security manager from org.apache.oozie.action.hadoop.LauncherSecurityManager#4440750 to null
Oozie Launcher ends

When you run sqoop command from local it uses jars from /usr/lib/sqoop/lib.
And when you use oozie sqoop action it uses jars from hdfs:///user/oozie/share/lib/lib_*/sqoop/
Now looking at your error, you either are
Missing org.apache.hadoop.hive.ql.io.AcidUtils on the classpath
Have the wrong version of org.apache.hadoop.hive.ql.io.AcidUtils on the classpath
Have multiple versions of org.apache.hadoop.hive.ql.io.AcidUtils on the classpath

Related

warehouse dir argument and re-attempt of map reduce task

I am using warehouse-dir argument for a reason and not using target-dir in my sqoop job. If Map-reduce task is re-attempted, it throws error given below.
How do I fix this?
Since it is only a re-attempt, it makes no difference if I delete directoy before execution of task.
from oozie logs:
Fetching child yarn jobs
tag id : oozie-8b2856ac4965b4431c0e9a9b80fb2bda
2018-05-23 22:45:15,153 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at ip.compute.internal/172.31.4.192:8032
Child yarn jobs are found - application_1527093425690_0231
Found [1] Map-Reduce jobs from this launcher
Killing existing jobs and starting over:
error message: output directory already exists:
2018-05-23 22:45:26,374 [main] ERROR org.apache.sqoop.tool.ImportTool - Import failed: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /path_to_directory_in_hdfs/already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:270)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1325)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:203)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:176)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:273)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:513)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:243)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:298)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:196)
at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:179)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:60)
at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:48)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:234)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
<<< Invocation of Sqoop command completed <<<
adding --delete-target-dir will delete the directory if exists already

IllegalStateException when running spark wordcount example

Spark version 1.6.2
Hadoop version 2.7.3
when running spark on standalone cluster mode
Command wordcount example:
spark-submit --class org.apache.spark.examples.JavaWordCount --master spark://IP:7077 spark-examples-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar file.txt output
getting following error
INFO cluster.SparkDeploySchedulerBackend: Executor app-20161125052710-0012/10 removed: java.io.IOException: Failed to create directory /usr/hdp/2.5.0.0-1245/spark/work/app-20161125052710-0012/10
ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
This stopped SparkContext was created at:
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
org.apache.spark.examples.JavaWordCount.main(JavaWordCount.java:44)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
The currently active SparkContext was created at:
(No active SparkContext.)
at org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:106)
at org.apache.spark.SparkContext.getSchedulingMode(SparkContext.scala:1602)
at org.apache.spark.SparkContext.postEnvironmentUpdate(SparkContext.scala:2203)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:579)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at org.apache.spark.examples.JavaWordCount.main(JavaWordCount.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/11/25 04:24:48 INFO spark.SparkContext: SparkContext already stopped.
in spark master node url, I see to two workers in ALIVE state
seems like Failed to create directory /usr/hdp/2.5.0.0-1245/spark/work was the root cause. After giving permission to /usr/hdp/2.5.0.0-1245/spark/work path, it worked fine

When spark-shell launch, It has RuntimeException for SimpleUserGroupsMapping

I install HDFS, YARN through Ambari and try to deploy spark on yarn.
But When I execute follow script, Spark has error
How to deploy spark on yarn.
Would you mind explaining how to deploy spark on yarn step by step?
I set HADOOP_CONF_DIR, YARN_CONF_DIR in spark-env.sh and spark.master in spark-defaults.conf.
execute script
./bin/spark-shell --master yarn-client
Error
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.security.SimpleUserGroupsMapping not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2106)
at org.apache.hadoop.security.Groups.<init>(Groups.java:70)
at org.apache.hadoop.security.Groups.<init>(Groups.java:66)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:248)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:763)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:748)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:621)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2136)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2136)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2136)
at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:214)
at org.apache.spark.repl.SparkIMain.<init>(SparkIMain.scala:118)
at org.apache.spark.repl.SparkILoop$SparkILoopInterpreter.<init>(SparkILoop.scala:187)
at org.apache.spark.repl.SparkILoop.createInterpreter(SparkILoop.scala:217)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:949)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.security.SimpleUserGroupsMapping not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2098)
... 33 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.security.SimpleUserGroupsMapping not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
... 34 more
16/02/19 22:07:20 INFO util.ShutdownHookManager: Shutdown hook called
16/02/19 22:07:20 INFO util.ShutdownHookManager: Deleting directory
Check if the class is present in your hadoop classpath.
find $HADOOP_HOME/* -name *.jar -print |xargs grep "org.apache.hadoop.security.SimpleUserGroupsMapping" -0
if present then check if the class is present in spark distribution
grep "org.apache.hadoop.security.SimpleUserGroupsMapping" $SPARK_HOME/lib/*
If the jar is present in hadoop distribution try copy it to $SPARK_HOME/lib/.
If none of the above works try changing
hadoop.security.group.mapping org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback
in core-site.xml and restart hadoop and spark.

ERROR : org.apache.oozie.action.hadoop.PigMain not found

I'm trying to execute a simple pig script through oozie workflow which imports a python jar and as well as some other jar and eventually getting error like:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.PigMain], exception invoking main(), java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.PigMain not found
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.PigMain not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:224)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.PigMain not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
... 9 more
Oozie Launcher failed, finishing Hadoop job gracefully
and for this workflow i added all jars in lib directory including pig.jar .
Please check the Pig Jar should be present in Physical location of the Node where the Oozie Workflow is running.
Also You can plase the Pig jar in hadoop location of Oozie Shared Lib, and pass parameter
oozie.use.system.libpath = true
these will read the jar from Shared Lib Location

ERROR in ImportTsv-HBASE BULK LOAD

I have hbase 0.94.0. I tried doing bulk import using the importtsv tool.
Here is the command i gave
./hadoop jar /home/ericsson/Desktop/ProjectFiles/hbase-0.94.0/hbase-0.94.0.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,a,b,c,d,e,f,g '-Dimporttsv.separator=,' Test1 /home/ericsson/Desktop/ProjectFiles/inputFiles1/CharginUsage-m-00000
Test1-My table that already exists in Hbase.
/home/ericsson/Desktop/ProjectFiles/inputFiles1/CharginUsage-m-00000- My directory where i have the CSV file.
I got the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/collect/Multimap
at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:43)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.Multimap
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
... 6 more
importtsv task needs Google's Guava library in order run. This library is present under $HBASE_HOME/lib/guava-.jar
It is matter of telling hadoop to fetch this guava jar during execution. Simply you could copy the jar from hbase lib to hadoop lib. A more decent solution is to add this jar path to hadoop classpath or execute the hadoop task with the below command.
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_HOME/lib/guava-<version>.jar
OR
export HADOOP_CLASSPATH=`hbase classpath ` /hadoop jar /home/ericsson/Desktop/ProjectFiles/hbase-0.94.0/hbase-0.94.0.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,a,b,c,d,e,f,g '-Dimporttsv.separator=,' Test1 /home/ericsson/Desktop/ProjectFiles/inputFiles1/CharginUsage-m-00000*

Resources