Sqoop Export Oozie Workflow Fails with File Not Found, Works When Run from the Console - hadoop

I have a Hadoop cluster with 6 nodes. I'm pulling data out of MSSQL and writing it back into MSSQL via Sqoop. Sqoop import commands work fine, and I can run a Sqoop export command from the console (on one of the Hadoop nodes). Here's the shell script I run:
SQLHOST=sqlservermaster.local
SQLDBNAME=db1
HIVEDBNAME=db1
BATCHID=
USERNAME="sqlusername"
PASSWORD="password"
sqoop export --connect "jdbc:sqlserver://${SQLHOST};username=${USERNAME};password=${PASSWORD};database=${SQLDBNAME}" --table ExportFromHive --columns col1,col2,col3 --export-dir "/apps/hive/warehouse/${HIVEDBNAME}.db/hivetablename"
When I run this command from an Oozie workflow, passing it the same parameters, I get the following error (found by digging into the actual job run logs from the YARN scheduler screen):
2015-10-01 20:55:31,084 WARN [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Job init failed
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://hadoopnode1:8020/user/root/.staging/job_1443713197941_0134/job.splitmetainfo
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1568)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1432)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1390)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1312)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1080)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1519)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1515)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1448)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://hadoopnode1:8020/user/root/.staging/job_1443713197941_0134/job.splitmetainfo
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:51)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1563)
... 17 more
Has anyone seen this and been able to troubleshoot it? It only happens from the Oozie workflow. There are similar topics, but no one seems to have solved this specific problem.
Thanks!

I was able to solve this problem by setting the user.name property in the Oozie workflow's job.properties file to the user yarn:
user.name=yarn
I think the problem was that the job did not have permission to create the staging files under /user/root. Once I changed the running user to yarn, the staging files were created under /user/yarn, which did have the proper permissions.
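If the diagnosis above is right, an alternative to switching users would be to give the original user a writable HDFS home directory, since the trace shows the job staging files living under /user/<user>/.staging. The following is only a sketch under that assumption: the helper name print_home_dir_fix is made up for illustration, it assumes the HDFS superuser is reachable as "sudo -u hdfs", and it assumes a group named after the user. It prints the commands rather than running them, so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) the HDFS shell commands that
# would give a user a home directory for job staging files.
print_home_dir_fix() {
  local user="$1"
  local home="/user/${user}"
  # Show the current owner and mode of the home directory, if it exists.
  echo "hdfs dfs -ls -d ${home}"
  # Create the directory and hand it to the user (assumes an hdfs superuser).
  echo "sudo -u hdfs hdfs dfs -mkdir -p ${home}"
  # Assumption: a group with the same name as the user exists.
  echo "sudo -u hdfs hdfs dfs -chown ${user}:${user} ${home}"
}

# Dry run for the user that appears in the stack trace above.
print_home_dir_fix root
```

Running the printed commands on a cluster node and then resubmitting the workflow would show whether the staging directory was the only obstacle.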

Related

MapR Windows client not working

I am trying to install the MapR Windows client. I have followed all the steps outlined in the MapR Windows client installation instructions. I copied the ssl_truststore file from our cluster into the C:\opt\mapr\conf folder and ran the configure.bat file. It ran without any errors, and I verified that C:\opt\mapr\conf\mapr-clusters.conf was updated with the cluster name and CLDB nodes.
However, when I change to the folder c:\opt\mapr\hadoop\hadoop-2.7.0\bin and run the following command:
hadoop fs -ls /
I get the following error:
18/01/19 14:05:07 ERROR cldbutils.CLDBRpcCommonUtils: Exception during init
java.lang.UnsatisfiedLinkError: com.mapr.security.JNISecurity.SetClusterOption(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)I
at com.mapr.security.JNISecurity.SetClusterOption(Native Method)
at com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils.init(CLDBRpcCommonUtils.java:163)
at com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils.<init>(CLDBRpcCommonUtils.java:73)
at com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils.<clinit>(CLDBRpcCommonUtils.java:63)
at org.apache.hadoop.conf.CoreDefaultProperties.<clinit>(CoreDefaultProperties.java:69)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2147)
at org.apache.hadoop.conf.Configuration.getProperties(Configuration.java:2362)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2579)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2531)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2444)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1156)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1128)
at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1464)
at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:321)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:487)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
Exception in thread "main" java.lang.UnsatisfiedLinkError: com.mapr.security.JNISecurity.SetParsingDone()V
at com.mapr.security.JNISecurity.SetParsingDone(Native Method)
at com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils.init(CLDBRpcCommonUtils.java:231)
at com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils.<init>(CLDBRpcCommonUtils.java:73)
at com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils.<clinit>(CLDBRpcCommonUtils.java:63)
at org.apache.hadoop.conf.CoreDefaultProperties.<clinit>(CoreDefaultProperties.java:69)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2147)
at org.apache.hadoop.conf.Configuration.getProperties(Configuration.java:2362)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2579)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2531)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2444)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1156)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1128)
at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1464)
at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:321)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:487)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
We use Java 8 and Windows 7.
I have been stuck on this issue for a while. I have tried all the options I could think of, without success. Any help is greatly appreciated.
These are the steps:
1. Install the MapR applications.
2. Configure the MapR client:
/opt/mapr/server/configure.sh -N poc2.cibdatahub.com -c -C (your cluster name)
3. Upload the missing jar from a MapR cluster node to the edge node under:
/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/
4. Create a local mapr account if you don't have one. Example:
useradd mapr
passwd mapr
5. Set the environment for user mapr:
su mapr
vi ~/.bashrc
# append the lines below
export HADOOP_HOME=/opt/mapr/hadoop/hadoop-2.7.0
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export LD_LIBRARY_PATH=$HADOOP_COMMON_LIB_NATIVE_DIR
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Hive home directory configuration
export HIVE_HOME=/opt/mapr/hive/hive-2.1
export PATH="$PATH:$HIVE_HOME/bin"
export HADOOP_USER_NAME="USERname"
# load the environment
source ~/.bashrc
6. Edit /opt/mapr/spark/spark-2.1.0/conf/spark-defaults.conf by adding:
spark.yarn.archive maprfs:///apps/spark/jars/spark-jars.zip
7. Verify the cluster setting: /opt/mapr/conf/mapr-clusters.conf has the option secure=false.
8. Test:
hadoop fs -Dfs.mapr.trace -ls
Please try these options.
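A java.lang.UnsatisfiedLinkError on a native method generally means the JVM did not load a native library that provides it, which is why the steps above point java.library.path at $HADOOP_HOME/lib/native. As a rough sanity check before retrying, the sketch below verifies that such a directory exists and is not empty. The function name check_native_dir is made up for illustration, and the default path is simply the one used in the steps above; it says nothing about which library files a given MapR release expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports whether a native-library directory exists
# and contains at least one entry.
check_native_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  # An empty directory cannot provide the native library either.
  if [ -z "$(ls -A "$dir")" ]; then
    echo "empty: $dir"
    return 1
  fi
  echo "ok: $dir"
}

# Check the directory that HADOOP_OPTS points java.library.path at.
check_native_dir "${HADOOP_HOME:-/opt/mapr/hadoop/hadoop-2.7.0}/lib/native" || true
```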

Permission issue while running a Spark job from a shell action in Oozie

I am trying to run a Spark job on a YARN cluster in "yarn-client" mode, using an Oozie shell action to issue the spark-submit command. Note that the Oozie job is triggered by the logged-in user ("my-user"), but I get an EXECUTE permission error. Please refer to the log below:
For more detailed output, check application tracking page:http://physrv3:8088/proxy/application_1452252389874_0018/Then, click on links to logs of each attempt.
Diagnostics: Permission denied: user=my-user, access=EXECUTE, inode="/user/yarn/.sparkStaging/application_1452252389874_0018":yarn:hdfs:drwx------
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6609)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4223)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:894)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getFileInfo(AuthorizationProviderProxyClientProtocol.java:526)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:822)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
If your Oozie version is older than 4.2, I would suggest using a Java action, as described in "Spark with Java Action".
Otherwise, if you have admin rights to the Oozie server, upgrade to Oozie 4.2.x or later, which has native support for Spark actions.
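Independently of which action type is used, the diagnostic line in the log already states who was denied what: the user, the access type, the path, and the path's owner, group and mode. The sketch below pulls those fields out with sed so the mismatch is easier to see; the function name parse_permission_denied is made up for illustration, and the pattern only assumes the line layout shown in the log above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads an HDFS "Permission denied" diagnostic on stdin
# and prints its fields on one line.
parse_permission_denied() {
  sed -nE 's/.*user=([^,]+), access=([A-Z_]+), inode="([^"]+)":([^:]+):([^:]+):([-a-z]+).*/user=\1 access=\2 path=\3 owner=\4 group=\5 mode=\6/p'
}

# The diagnostic line from the log above.
line='Diagnostics: Permission denied: user=my-user, access=EXECUTE, inode="/user/yarn/.sparkStaging/application_1452252389874_0018":yarn:hdfs:drwx------'
printf '%s\n' "$line" | parse_permission_denied
```

Here the staging directory belongs to yarn with mode drwx------, so no other user, my-user included, can traverse it.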

Hive Internal Error: java.lang.ClassNotFoundException(org.apache.atlas.hive.hook.HiveHook)

I am running a Hive query through Oozie using Hue.
I am creating a table through a Hue-Oozie workflow.
My job is failing, but when I check in Hive the table has been created.
The log shows the error below:
16157 [main] INFO org.apache.hadoop.hive.ql.hooks.ATSHook - Created ATS Hook
2015-09-24 11:05:35,801 INFO [main] hooks.ATSHook (ATSHook.java:<init>(84)) - Created ATS Hook
16159 [main] ERROR org.apache.hadoop.hive.ql.Driver - hive.exec.post.hooks Class not found:org.apache.atlas.hive.hook.HiveHook
2015-09-24 11:05:35,803 ERROR [main] ql.Driver (SessionState.java:printError(960)) - hive.exec.post.hooks Class not found:org.apache.atlas.hive.hook.HiveHook
16159 [main] ERROR org.apache.hadoop.hive.ql.Driver - FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.atlas.hive.hook.HiveHook)
java.lang.ClassNotFoundException: org.apache.atlas.hive.hook.HiveHook
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
I am not able to identify the issue.
I am using HDP 2.3.1.
Basically, this error is due to the Atlas jars missing from the Oozie share lib.
In HDP, the Atlas jars are available in /usr/hdp/2.3.0.0-2557/atlas/
Put all the Atlas-related jars in the Oozie share lib on HDFS:
hadoop fs -put /usr/hdp/2.3.0.0-2557/atlas/hook/hive/* /user/oozie/share/lib/lib200344/hive
Add 'export HIVE_AUX_JARS_PATH=<atlas package>/hook/hive' to hive-env.sh.
Copy <atlas package>/conf/application.properties to the Hive conf directory.
Restart the Oozie services. This should solve the problem. If anybody faces this problem, please comment here so that I can help.
[Comment by Immo Huneke: when using the Hortonworks sandbox VM, I found that just putting the jar files in the share/lib folder under HDFS was enough to resolve the problem. I didn't have to update hive-env.sh or copy the application.properties file. But check the exact path of your share/lib folder by executing the command hdfs dfs -ls /user/oozie/share/lib before copying.]
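Since the share lib directory name differs between installations, as the comment above points out, the path can be looked up instead of typed by hand. The sketch below picks the newest lib directory out of an "hdfs dfs -ls" listing read from stdin. The function name latest_sharelib_dir is made up for illustration, the pattern assumes directory names of the form lib_<digits> or lib<digits>, and the sample listing and the Oozie URL are invented placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `hdfs dfs -ls /user/oozie/share/lib` output on
# stdin and prints the lexicographically last lib directory.
latest_sharelib_dir() {
  # The path is the last field; keep only lib_<digits> / lib<digits> entries.
  awk '$NF ~ /\/lib_?[0-9]+$/ { print $NF }' | sort | tail -n 1
}

# Invented sample listing, used here so the sketch runs without a cluster.
sample='Found 2 items
drwxr-xr-x   - oozie hdfs          0 2015-08-01 09:00 /user/oozie/share/lib/lib_20150801090000
drwxr-xr-x   - oozie hdfs          0 2015-09-24 10:00 /user/oozie/share/lib/lib_20150924100000'
printf '%s\n' "$sample" | latest_sharelib_dir

# On a real cluster (not run here; the Oozie URL is a placeholder):
#   target="$(hdfs dfs -ls /user/oozie/share/lib | latest_sharelib_dir)/hive"
#   hadoop fs -put /usr/hdp/2.3.0.0-2557/atlas/hook/hive/* "$target"
#   oozie admin -oozie http://oozie-host:11000/oozie -sharelibupdate
```

On Oozie versions that support it, the sharelibupdate admin command may make a full service restart unnecessary; otherwise restart Oozie as described above.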
hive>add jar /usr/hdp//atlas/hook/hive/hive-bridge-${VERSION}.jar
That should fix it.
Hope this helps.
It seems you are hitting a ClassNotFoundException.
Have you installed the Oozie share lib? If yes, please update all the Hive-dependent jars in the share lib location and check the status.
Also check that the Hive client is available and running on all the nodes in the cluster.
I tried every possible solution mentioned in this forum and on Stack Overflow, but none resolved my issue.
Finally, I resolved it by copying all the jars in /hook/hive to the lib folder of my Oozie workflow (create a new lib folder at the same level as job.properties).

Job via Oozie HDP 2.1 not creating job.splitmetainfo

When I try to execute a Sqoop job that has my Hadoop program passed as a jar file in the -jarFiles parameter, the execution fails with the error below. I could not find any resolution. Other jobs run by the same Hadoop user execute successfully.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://sandbox.hortonworks.com:8020/user/root/.staging/job_1423050964699_0003/job.splitmetainfo
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1541)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1396)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1363)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:976)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:135)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1241)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1041)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1452)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1448)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1381)
Here is how I solved it. We are using CDH5 to run Camus to pull data from Kafka. We run CamusJob, which is responsible for getting data from Kafka, from the command line:
hadoop jar...
The problem was that the new hosts didn't get the so-called "yarn-gateway". Cloudera calls the pack of configs related to a service and copied to /etc/hadoop/conf a "gateway". So I just clicked "Deploy Client Configuration" in the CM UI. The YARN client conf was copied to each YARN NodeManager node, and that solved the problem.

How to set User/Group permission with Hadoop/Kerberos Setup?

I am trying to set up Hadoop with Kerberos.
I am following the CDH3 Security Guide.
Things have gone pretty well so far (HDFS works OK, etc.), but I am getting the following error when I try to submit a job.
I run the HDFS server as user HDFS and Hadoop as a user called mapred. I submit the job as a user called bob, who is in the mapred group.
These are the values I have in taskcontroller.cfg:
mapred.local.dir=/opt/hadoop-work/local/
hadoop.log.dir=/opt/hadoop-1.0.3/logs
mapreduce.tasktracker.group=mapred
min.user.id=1000
The error I am getting is:
java.io.IOException: Job initialization failed (24) with output: Reading task controller config from /etc/hadoop/taskcontroller.cfg
Can't get group information for mapred - Success.
at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:192)
at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1228)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1203)
at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1118)
at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2430)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
at org.apache.hadoop.util.Shell.run(Shell.java:182)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:185)
... 8 more
The error always refers to the value given to "mapreduce.tasktracker.group=mapred" in taskcontroller.cfg.
I have been debugging and looking into it, and I think the problem is that I have set up the permissions among the different users and groups incorrectly.
Any help is greatly appreciated.
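The message "Can't get group information for mapred" reads as if the group lookup itself failed on the node that ran the task controller, so one thing worth ruling out is that the mapred group, or bob's membership in it, is missing on some TaskTracker hosts. The sketch below is only a quick check under that assumption; the function names group_exists and user_in_group are made up for illustration, and it relies on getent and id being available on the node.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking group setup on a single node.
group_exists() {
  # getent consults the same name service (files, LDAP, ...) as the OS.
  getent group "$1" > /dev/null
}

user_in_group() {
  # id -nG prints the user's group names separated by spaces.
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# Report on the group named in taskcontroller.cfg and on the submitting user.
if group_exists mapred; then
  echo "group mapred: present on $(uname -n)"
else
  echo "group mapred: MISSING on $(uname -n)"
fi
if user_in_group bob mapred; then
  echo "bob is in mapred"
else
  echo "bob is NOT in mapred (or does not exist) on $(uname -n)"
fi
```

Running this on every TaskTracker node, not only on the node used to submit the job, would show whether the group setup is consistent across the cluster.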
