Job via Oozie HDP 2.1 not creating job.splitmetainfo - hadoop

When trying to execute a Sqoop job that has my Hadoop program passed as a jar file in the -jarFiles parameter, the execution fails with the error below. No resolution seems to be available. Other jobs run by the same Hadoop user execute successfully.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://sandbox.hortonworks.com:8020/user/root/.staging/job_1423050964699_0003/job.splitmetainfo
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1541)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1396)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1363)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:976)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:135)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1241)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1041)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1452)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1448)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1381)

So here is how I solved it. We are using CDH5 to run Camus to pull data from Kafka. We run CamusJob, which is responsible for getting data from Kafka, from the command line:
hadoop jar...
The problem is that the new hosts didn't get the so-called "yarn gateway". Cloudera calls the pack of configuration files related to a service, copied to /etc/hadoop/conf, a "gateway". So I just clicked "Deploy Client Configuration" in the CM UI. The YARN client configuration was copied to each YARN NodeManager node, and that solved the problem.

Related

What is this error on spark-submit with HDFS HA on YARN?

Here is my error log:
$ /spark-submit --master yarn --deploy-mode cluster pi.py
...
2021-12-23 01:31:04,330 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby. Visit https://s.apache.org/sbnn-error
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1954)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1442)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1895)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:860)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:526)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
, while invoking ClientNamenodeProtocolTranslatorPB.setPermission over master/172.17.0.2:8020. Trying to failover immediately.
...
Why do I get this error?
NOTE: The Spark master runs on 'master', so the spark-submit command is run on 'master'.
NOTE: The Spark workers run on 'worker1', 'worker2', and 'worker3'.
NOTE: The ResourceManagers run on 'master' and 'master2'.
ADDITIONAL: When the above error is logged, master2's DFSZKFailoverController has disappeared from the jps output.
ADDITIONAL: When the above error is logged, master's NameNode has disappeared from the jps output.
This happens when Spark is unable to access HDFS.
If HA is configured correctly, the HDFS client handles the StandbyException by failing over to the other NameNode and then retrying the operation.
Replace the nameservice URI with the active NameNode's URI manually and check whether you still get the same error; if you don't, HA is not configured properly.
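A quick way to confirm the HA state is to ask each NameNode directly. This is a hedged sketch: nn1 and nn2 are assumed NameNode service IDs, so substitute whatever dfs.ha.namenodes.<nameservice> lists in your hdfs-site.xml:

# ask each NameNode for its current HA state
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
# if neither reports "active", the failover controller has not promoted a new
# active node, which matches the DFSZKFailoverController and NameNode processes
# disappearing from jps above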

Configuring Prometheus JMX exporter for Hadoop2

I am trying to scrape metrics from the following Hadoop2 daemons, running on an EC2 instance, using the Prometheus JMX exporter:
hadoop namenode
hadoop datanode
yarn resourcemanager
yarn nodemanager
I am trying to run the JMX exporter as a Java agent with all four daemons. For this I have added extra Java options in hadoop-env.sh and yarn-env.sh:
export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS -javaagent:/home/ec2-user/jmx_exporter/jmx_prometheus_javaagent-0.10.jar=9102:/home/ec2-user/jmx_exporter/prometheus_config.yml"
export HADOOP_DATANODE_OPTS="$HADOOP_DATANODE_OPTS -javaagent:/home/ec2-user/jmx_exporter/jmx_prometheus_javaagent-0.10.jar=9102:/home/ec2-user/jmx_exporter/prometheus_config.yml"
export YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -javaagent:/home/ec2-user/jmx_exporter/jmx_prometheus_javaagent-0.10.jar=9102:/home/ec2-user/jmx_exporter/prometheus_config.yml"
export YARN_NODEMANAGER_OPTS="$YARN_NODEMANAGER_OPTS -javaagent:/home/ec2-user/jmx_exporter/jmx_prometheus_javaagent-0.10.jar=9102:/home/ec2-user/jmx_exporter/prometheus_config.yml"
A sample prometheus_config.yml for the ResourceManager metric NumAllSources is as follows:
rules:
  - pattern: Hadoop<service=ResourceManager, name=MetricsSystem, sub=Stats><>NumAllSources
    name: sources
    labels:
      app_id: "hadoop_rm"
I am getting the following exception when I restart the ResourceManager or the other daemons with the new configs and Java opts:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:382)
at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:397)
Caused by: java.lang.IllegalArgumentException: Collector already registered that provides name: jmx_scrape_duration_seconds
at io.prometheus.jmx.shaded.io.prometheus.client.CollectorRegistry.register(CollectorRegistry.java:54)
at io.prometheus.jmx.shaded.io.prometheus.client.Collector.register(Collector.java:128)
Any suggestions on how to fix this?
While @chanhou's solution will work, I wanted to keep my edits in hadoop-env.sh, so I went with:
if ! grep -q jmx_prometheus_javaagent <<<"$HADOOP_NAMENODE_OPTS"; then
  HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS -javaagent:/home/caesarli/platform/jmx_prometheus_javaagent-0.12.0.jar=11099:/home/caesarli/platform/hadoop-2.8.4/etc/hadoop/jmx-name.yaml"
fi
and similar for the HADOOP_DATANODE_OPTS.
That is because the -javaagent option gets declared multiple times in $HADOOP_OPTS: when you call /usr/local/hadoop/sbin/hadoop-daemon.sh start datanode, hadoop-daemon.sh eventually calls /usr/local/hadoop/bin/hdfs to start the related service.
During that process hadoop-config.sh is sourced multiple times, and if you echo $HADOOP_OPTS inside the shell script /usr/local/hadoop/bin/hdfs you will find multiple -javaagent entries there.
A workaround is to declare HADOOP_OPTS="$HADOOP_OPTS -javaagent:..." in /usr/local/hadoop/bin/hdfs so that only one -javaagent appears in HADOOP_OPTS, as sketched below.
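A minimal sketch of that guard, reusing the agent jar and config paths from the question (adjust them to your layout); it would go near the top of /usr/local/hadoop/bin/hdfs:

# only append the agent when it is not already present in HADOOP_OPTS
case "$HADOOP_OPTS" in
  *jmx_prometheus_javaagent*) ;;  # agent already added, do nothing
  *) HADOOP_OPTS="$HADOOP_OPTS -javaagent:/home/ec2-user/jmx_exporter/jmx_prometheus_javaagent-0.10.jar=9102:/home/ec2-user/jmx_exporter/prometheus_config.yml" ;;
esac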
I think this is because you are using the same port (9102) for all the registrations; changing the ports will help.
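For example, a hedged variant of the exports from the question that gives each daemon its own exporter port (the 9102-9105 values are arbitrary examples):

# one distinct port per daemon so the agents never collide on the same host
export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS -javaagent:/home/ec2-user/jmx_exporter/jmx_prometheus_javaagent-0.10.jar=9102:/home/ec2-user/jmx_exporter/prometheus_config.yml"
export HADOOP_DATANODE_OPTS="$HADOOP_DATANODE_OPTS -javaagent:/home/ec2-user/jmx_exporter/jmx_prometheus_javaagent-0.10.jar=9103:/home/ec2-user/jmx_exporter/prometheus_config.yml"
export YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -javaagent:/home/ec2-user/jmx_exporter/jmx_prometheus_javaagent-0.10.jar=9104:/home/ec2-user/jmx_exporter/prometheus_config.yml"
export YARN_NODEMANAGER_OPTS="$YARN_NODEMANAGER_OPTS -javaagent:/home/ec2-user/jmx_exporter/jmx_prometheus_javaagent-0.10.jar=9105:/home/ec2-user/jmx_exporter/prometheus_config.yml"

This matters most when two or more of these daemons run on the same host, since two agents cannot bind the same port.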

Permission issue while running a Spark job from a shell action in Oozie

I am trying to run a Spark job on a YARN cluster in "yarn-client" mode, using an Oozie shell action to issue the spark-submit command. Note that the Oozie job is triggered by the logged-in user ("my-user"), but I get an EXECUTE permission error. Please refer to the log below:
For more detailed output, check application tracking page:http://physrv3:8088/proxy/application_1452252389874_0018/Then, click on links to logs of each attempt.
Diagnostics: Permission denied: user=my-user, access=EXECUTE, inode="/user/yarn/.sparkStaging/application_1452252389874_0018":yarn:hdfs:drwx------
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6609)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4223)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:894)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getFileInfo(AuthorizationProviderProxyClientProtocol.java:526)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:822)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
I would suggest using a Java action, as described in "Spark with Java Action", if your Oozie version is older than 4.2.
Otherwise, if you have admin rights on the Oozie server, upgrade to Oozie >= 4.2.x, which has native support for Spark actions.

Sqoop Export Oozie Workflow Fails with File Not Found, Works when run from the console

I have a hadoop cluster with 6 nodes. I'm pulling data out of MSSQL and back into MSSQL via Sqoop. Sqoop import commands work fine, and I can run a sqoop export command from the console (on one of the hadoop nodes). Here's the shell script I run:
SQLHOST=sqlservermaster.local
SQLDBNAME=db1
HIVEDBNAME=db1
BATCHID=
USERNAME="sqlusername"
PASSWORD="password"
sqoop export --connect 'jdbc:sqlserver://'$SQLHOST';username='$USERNAME';password='$PASSWORD';database='$SQLDBNAME'' --table ExportFromHive --columns col1,col2,col3 --export-dir /apps/hive/warehouse/$HIVEDBNAME.db/hivetablename
When I run this command from an Oozie workflow, passing the same parameters, I receive the following error (found by digging into the actual job run logs from the YARN scheduler screen):
2015-10-01 20:55:31,084 WARN [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Job init failed
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://hadoopnode1:8020/user/root/.staging/job_1443713197941_0134/job.splitmetainfo
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1568)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1432)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1390)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1312)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1080)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1519)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1515)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1448)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://hadoopnode1:8020/user/root/.staging/job_1443713197941_0134/job.splitmetainfo
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:51)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1563)
... 17 more
Has anyone seen this before and been able to troubleshoot it? It only happens from the Oozie workflow. There are similar topics, but no one seems to have solved this specific problem.
Thanks!
I was able to solve this problem by setting the user.name property in the job.properties file for the Oozie workflow to the user yarn:
user.name=yarn
I think the problem was that the job did not have permission to create the staging files under /user/root. Once I changed the running user to yarn, the staging files were created under /user/yarn, which did have the proper permissions.
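An alternative that keeps the job running as root (a hedged sketch, not what was done above) is to make sure root's HDFS home directory exists and is owned by root, so the staging files can be created under it:

# run as the HDFS superuser; the /user/root/.staging path from the error
# message lives beneath this directory
hdfs dfs -mkdir -p /user/root
hdfs dfs -chown root /user/root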

hadoop distcp fail over hftp protocol

I want to use distcp over the hftp protocol to copy files from CDH3 to CDH4.
The command looks like this:
hadoop distcp hftp://cluster1:50070/folder1 hdfs://cluster2/folder2
But the job fails with an HTTP connection error, as seen in the JobTracker UI:
INFO org.apache.hadoop.tools.DistCp: FAIL test1.dat : java.io.IOException: HTTP_OK expected, received 503
at org.apache.hadoop.hdfs.HftpFileSystem$RangeHeaderUrlOpener.connect(HftpFileSystem.java:376)
at org.apache.hadoop.hdfs.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:119)
at org.apache.hadoop.hdfs.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:103)
at org.apache.hadoop.hdfs.ByteRangeInputStream.read(ByteRangeInputStream.java:187)
at java.io.DataInputStream.read(DataInputStream.java:83)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:424)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:547)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:314)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Most files in folder1 are copied to folder2, but some files fail with the exception above.
Has anyone had the same problem, and how did you solve it?
Thanks in advance.
HFTP uses the HTTP web server on the DataNodes to get data. Check whether this HTTP web server is working on all the DataNodes. I got this exact error, and after debugging I found that the web server on some DataNodes wasn't started due to a corrupt jar file.
This web server is started when you start a DataNode. You can check the first 500 lines of the DataNode log to see whether the web server is starting.
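A quick way to check from the shell (a hedged sketch; 50075 is the usual default for dfs.datanode.http.address on these releases, so adjust if your cluster overrides it, and replace the placeholder host):

# expect 200 from each DataNode's HTTP server; a 503 or a refused connection
# points at the node whose web server did not start
curl -s -o /dev/null -w "%{http_code}\n" http://<datanode-host>:50075/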
Are your clusters cluster1 and cluster2 running the same version of Hadoop? What are the exact release versions?
Have you enabled any security settings on Hadoop?
HTTP return code 503 means the server is temporarily unavailable; did any network issue occur during your copy?
