Oozie Jobs NOT Running - Getting SUSPENDED

I am running Hadoop with Oozie in pseudo-distributed mode (plain Apache Hadoop, not a distribution such as CDH or Hortonworks). My setup is: Fedora 22 VM running on VirtualBox, 4 GB RAM allocated, Hadoop 2.7, Oozie 4.2.
After I submit the example MapReduce job to Oozie, it gets SUSPENDED with the job error below:
2015-10-29 15:44:59,048 WARN ActionStartXCommand:523 - SERVER[hadoop] USER[hadoop] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000000-151029154441128-OOZIE-VB-W] ACTION[0000000-151029154441128-OOZIE-VB-W#mr-node] Error starting action [mr-node]. ErrorType [TRANSIENT], ErrorCode [JA009], Message [JA009: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=2048, maxMemory=1024
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:281)
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:580)
at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:218)
at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:419)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)]
org.apache.oozie.action.ActionExecutorException: JA009: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=2048, maxMemory=1024
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:281)
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:580)
at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:218)
at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:419)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:456)
at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:440)
at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1132)
at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1286)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:250)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:64)
at org.apache.oozie.command.XCommand.call(XCommand.java:286)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:321)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:250)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I think this has something to do with the memory allocated to the MapReduce jobs, but I am not able to figure out the exact math behind it. Any help is much appreciated.
mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.jobtracker.address</name>
<value>http://localhost:50031</value>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>http://localhost:50030</value>
</property>
<property>
<name>mapreduce.jobtracker.jobhistory.location</name>
<value>/home/osboxes/hadoop/logs/jobhistory</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>http://localhost:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/home/osboxes/hadoop/mr-history/temp</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/home/osboxes/hadoop/mr-history/done</value>
</property>
<property>
<name>mapreduce.cluster.local.dir</name>
<value>/home/osboxes/hadoop/dfs/local</value>
</property>
<property>
<name>mapreduce.jobtracker.system.dir</name>
<value>/home/osboxes/hadoop/dfs/system</value>
</property>
yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description> Execution Framework </description>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1024</value>
</property>
EDITED 30-Oct-2015
core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop-${user.name}</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>

This looks like a user/group permissions issue. Try running it as the Oozie user.
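Independently of permissions, the log itself reports requestedMemory=2048 against maxMemory=1024, and the yarn-site.xml in the question sets yarn.nodemanager.resource.memory-mb to 1024, so the request cannot fit on the only node. The 2048 is probably the launcher's ApplicationMaster request (yarn.app.mapreduce.am.resource.mb, 1536 by default) rounded up to a multiple of the default 1024 MB minimum allocation; that is an assumption, not something the question confirms. A minimal sketch of a yarn-site.xml change, with illustrative values assumed for a 4 GB VM:

```xml
<!-- yarn-site.xml: illustrative values only, assumed for a 4 GB VM -->
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<!-- must be at least the 2048 MB that the log shows being requested -->
<value>3072</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<!-- smaller rounding step, so the 512 MB map/reduce containers are not rounded up to 1024 MB -->
<value>512</value>
</property>
```

After a change like this, the ResourceManager and NodeManager would need a restart before the workflow is resubmitted.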

Related

Hadoop HA ERROR: Exception in doCheckpoint (IOException) Exception during image upload doCheckpoint

I am using Hadoop 3.2.2 on a Windows 10 cluster, with HDFS high availability configured through the Quorum Journal Manager.
The system works fine and I can transition nodes between active and standby state without issues, but I often get the following error message:
java.io.IOException: Exception during image upload
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:315)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1300(StandbyCheckpointer.java:64)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.doWork(StandbyCheckpointer.java:480)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.access$600(StandbyCheckpointer.java:383)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread$1.run(StandbyCheckpointer.java:403)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:502)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.run(StandbyCheckpointer.java:399)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Error writing request body to server
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:295)
... 6 more
Caused by: java.io.IOException: Error writing request body to server
at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3597)
at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3580)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:377)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:321)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:295)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:230)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
My cluster setup is the following:
A: Namenode, Zookeeper, ZKFC, Journal
B: Namenode, Zookeeper, ZKFC, Journal
C: Namenode, Zookeeper, ZKFC
D: Journal, Datanode
E,F,G....: Datanode
Here is my hdfs-site configuration:
<configuration>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
<description>Logical name for this new nameservice</description>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>A,B,C</value>
<description>Unique identifiers for each NameNode in the
nameservice</description>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.A</name>
<value>A:8020</value>
<description>RPC address for NameNode 1; it is necessary to use the real host name of the machine instead of an alias</description>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.B</name>
<value>B:8020</value>
<description>RPC address for NameNode 2</description>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.C</name>
<value>C:8020</value>
<description>RPC address for NameNode 3</description>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.A</name>
<value>A:9870</value>
<description>HTTP address for NameNode 1</description>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.B</name>
<value>B:9870</value>
<description>HTTP address for NameNode 2</description>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.C</name>
<value>C:9870</value>
<description>HTTP address for NameNode 3</description>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://A:8485;B:8485;D:8485/mycluster</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(C:/mylocation/stop-namenode.bat $target_host)</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>C:/hadoop-3.2.2/data/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>A:2181,B:2181,C:2181</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///C:/hadoop-3.2.2/data/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///C:/hadoop-3.2.2/data/dfs/datanode</value>
</property>
<property>
<name>dfs.namenode.safemode.threshold-pct</name>
<value>0.5f</value>
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>true</value>
</property>
</configuration>
Has anyone run into the same issue? Am I missing something here?

Not sure if this issue has been resolved. It may be caused by this change: https://issues.apache.org/jira/browse/HADOOP-16886. The solution would be to set hadoop.http.idle_timeout.ms to the desired value in core-site.xml.
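For reference, such an entry in core-site.xml could look like the sketch below. The 60000 ms value is only an assumed example, not a figure taken from the question or the JIRA; the idea is to pick a timeout long enough for the fsimage upload to finish.

```xml
<!-- core-site.xml on the NameNodes: the value is an assumed example, tune it to your image size -->
<property>
<name>hadoop.http.idle_timeout.ms</name>
<value>60000</value>
</property>
```

The NameNodes would have to be restarted for the new timeout to take effect.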

In Hadoop 3.1.0 namenode is working but datanode is not working

In Hadoop 3.1.0 the namenode is working, but the datanode is not. It shows the message below:
STARTUP_MSG: build = https://github.com/apache/hadoop -r 16b70619a24cdcf5d3b0fcf4b58ca77238ccbe6d; compiled by 'centos' on 2018-03-30T00:00Z
STARTUP_MSG: java = 1.8.0_231
************************************************************/
2019-11-13 20:58:38,398 INFO checker.ThrottledAsyncChecker: Scheduling a check for [DISK]file:/C:/Appliacation/hadoop-3.1.0/data/datanode
2019-11-13 20:58:38,436 WARN checker.StorageLocationChecker: Exception checking StorageLocation [DISK]file:/C:/Appliacation/hadoop-3.1.0/data/datanode
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Ljava/lang/String;)Lorg/apache/hadoop/io/nativeio/NativeIO$POSIX$Stat;
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.getStat(NativeIO.java:455)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfoByNativeIO(RawLocalFileSystem.java:796)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:710)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:678)
at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:191)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:98)
at org.apache.hadoop.hdfs.server.datanode.StorageLocation.check(StorageLocation.java:239)
at org.apache.hadoop.hdfs.server.datanode.StorageLocation.check(StorageLocation.java:52)
at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker$1.call(ThrottledAsyncChecker.java:142)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-11-13 20:58:38,436 ERROR datanode.DataNode: Exception in secureMain
org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 0, volumes configured: 1, volumes failed: 1, volume failures tolerated: 0
at org.apache.hadoop.hdfs.server.datanode.checker.StorageLocationChecker.check(StorageLocationChecker.java:220)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2762)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2677)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2719)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2863)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2887)
2019-11-13 20:58:38,436 INFO util.ExitUtil: Exiting with status 1: org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 0, volumes configured: 1, volumes failed: 1, volume failures tolerated: 0
2019-11-13 20:58:38,451 INFO datanode.DataNode: SHUTDOWN_MSG:

I had the same issue. I had to replace some binaries in the bin folder (reference: "Hadoop-3.1.2: Datanode and Nodemanager shuts down"). I also made some changes to the configuration files, as follows:
1. Edit file core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://0.0.0.0:19000</value>
</property>
</configuration>
2. Edit file hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
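<property>
<!-- the standard property name is dfs.namenode.name.dir -->
<name>dfs.namenode.name.dir</name>
<value>file:///C:/hadoop-3.1.0/data/namenode</value>
</property>
<property>
<!-- the standard property name is dfs.datanode.data.dir -->
<name>dfs.datanode.data.dir</name>
<value>file:///C:/hadoop-3.1.0/data/datanode</value>
</property>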
</configuration>
3. Edit file workers
localhost
4. Edit file mapred-site.xml
<configuration>
<property>
<name>mapreduce.job.user.name</name>
<value>%USERNAME%</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.apps.stagingDir</name>
<value>/user/%USERNAME%/staging</value>
</property>
<property>
<name>mapreduce.jobtracker.address</name>
<value>local</value>
</property>
</configuration>
5. Edit file yarn-site.xml
<configuration>
<property>
<name>yarn.server.resourcemanager.address</name>
<value>0.0.0.0:8020</value>
</property>
<property>
<name>yarn.server.resourcemanager.application.expiry.interval</name>
<value>60000</value>
</property>
<property>
<name>yarn.server.nodemanager.address</name>
<value>0.0.0.0:45454</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.server.nodemanager.remote-app-log-dir</name>
<value>/app-logs</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/dep/logs/userlogs</value>
</property>
<property>
<name>yarn.server.mapreduce-appmanager.attempt-listener.bindAddress</name>
<value>0.0.0.0</value>
</property>
<property>
<name>yarn.server.mapreduce-appmanager.client-service.bindAddress</name>
<value>0.0.0.0</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>-1</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>%HADOOP_CONF_DIR%,%HADOOP_COMMON_HOME%/share/hadoop/common/*,%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*</value>
</property>
</configuration>

Connection refused on a MapReduce job in a Hadoop cluster environment

I've set up a 4-node Hadoop cluster with a master node and three data nodes. It all seems to run fine until I try to execute a MapReduce job.
Jps (master-node):
[root@master logs]# jps
26967 SecondaryNameNode
25720 JobHistoryServer
26778 NameNode
27115 ResourceManager
27839 Jps
Jps (data-nodes):
[root@localhost ~]# jps
21872 DataNode
22257 Jps
21974 NodeManager
The yarn log file on the master node gives the following exception:
2018-05-22 21:59:10,376 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1527018750538_0001 failed 2 times due to Error launching appattempt_1527018750538_0001_000002. Got exception: java.net.ConnectException: Call From NameNode/193.198.139.50 to localhost:41227 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor47.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1480)
at org.apache.hadoop.ipc.Client.call(Client.java:1413)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy83.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy84.startContainers(Unknown Source)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:250)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:615)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:713)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:376)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1529)
at org.apache.hadoop.ipc.Client.call(Client.java:1452)
... 15 more
. Failing the application.
As far as I can see, the problem is with localhost:41227, since I've never specified anything like that in any of the configuration files, and the port number is a new one every time I try to run a new job. But I'm not sure. Any advice or help is appreciated. Thanks.
core-site.xml
<configuration>
<!-- core-site.xml -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://NameNode:9000/</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>NameNode:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>NameNode:19888</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<!-- hdfs-site.xml -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_work/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_work/hdfs/datanode</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:/usr/local/hadoop_work/hdfs/namesecondary</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>NameNode</value>
</property>
<property>
<name>yarn.resourcemanager.bind-host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>yarn.nodemanager.bind-host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:/usr/local/hadoop_work/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>file:/usr/local/hadoop_work/yarn/log</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://NameNode:9000/var/log/hadoop-yarn/apps</value>
</property>
</configuration>

The problem is the hostname of the Datanodes.
Give the Datanodes a meaningful hostname other than localhost and restart the processes.
Call From NameNode/193.198.139.50 to localhost:41227
means the Namenode is trying to reach a random port on a Datanode that is registered as localhost. On every machine, localhost resolves to its own loopback address (127.0.0.1). The call is supposed to reach the data node, but with your config it goes to the master's own machine instead, where nothing is listening on that port.
Can you also post your slaves file?
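The fix described above is at the OS level (renaming the hosts). A possible alternative at the YARN level, offered as an untested sketch rather than what the answer proposes, is to set the name each NodeManager advertises in its own yarn-site.xml. yarn.nodemanager.hostname is a standard YARN property; datanode1 is a hypothetical name and must resolve to the worker's real IP from the master (for example through /etc/hosts or DNS).

```xml
<!-- yarn-site.xml on one worker; "datanode1" is a hypothetical, resolvable host name -->
<property>
<name>yarn.nodemanager.hostname</name>
<value>datanode1</value>
</property>
```

Each worker would need its own distinct value, followed by a NodeManager restart.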

oozie run example error: IllegalArgumentException: Wrong FS: hdfs://**/user/ubuntu/share/lib, expected: file:///

Recently, I have been investigating Oozie and trying very hard to build a local Oozie system. After reading the official web page again and again, I finally made it. But when I try to run an example from the examples/ directory, I always get this error:
2016-07-21 21:55:17,936 WARN ActionStartXCommand:523 - SERVER[namenode] USER[ubuntu] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000001-160720220441466-oozie-ubun-W] ACTION[0000001-160720220441466-oozie-ubun-W#mr-node] Error starting action [mr-node]. ErrorType [ERROR], ErrorCode [IllegalArgumentException], Message [IllegalArgumentException: Wrong FS: hdfs://166.111.81.254:9000/user/ubuntu/share/lib, expected: file:///]
org.apache.oozie.action.ActionExecutorException: IllegalArgumentException: Wrong FS: hdfs://166.111.81.254:9000/user/ubuntu/share/lib, expected: file:///
at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:445)
at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1132)
at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1286)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:250)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:64)
at org.apache.oozie.command.XCommand.call(XCommand.java:286)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:321)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:250)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://166.111.81.254:9000/user/ubuntu/share/lib, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:372)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1485)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1525)
at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:570)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1485)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1525)
at org.apache.oozie.service.ShareLibService.getLatestLibPath(ShareLibService.java:687)
at org.apache.oozie.service.ShareLibService.updateShareLib(ShareLibService.java:551)
at org.apache.oozie.service.ShareLibService.getShareLibJars(ShareLibService.java:346)
at org.apache.oozie.service.ShareLibService.getSystemLibJars(ShareLibService.java:412)
at org.apache.oozie.action.hadoop.JavaActionExecutor.addSystemShareLibForAction(JavaActionExecutor.java:721)
at org.apache.oozie.action.hadoop.JavaActionExecutor.addAllShareLibs(JavaActionExecutor.java:818)
at org.apache.oozie.action.hadoop.JavaActionExecutor.setLibFilesArchives(JavaActionExecutor.java:809)
at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1037)
... 10 more
2016-07-21 21:55:17,939 WARN ActionStartXCommand:523 - SERVER[namenode] USER[ubuntu] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000001-160720220441466-oozie-ubun-W] ACTION[0000001-160720220441466-oozie-ubun-W#mr-node] Setting Action Status to [DONE]
This error has bothered me for a few days and I have been trying to solve it. If you have any suggestions, please let me know. Thank you very much!
My configuration is as follows:
hadoop
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://166.111.81.254:9000</value>
</property>
<property>
<name>hadoop.proxyuser.ubuntu.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.ubuntu.groups</name>
<value>*</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>166.111.81.254</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.jobtracker.address</name>
<value>166.111.81.254:54311</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop-2.6.4/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
oozie
oozie-site.xml
<configuration>
<property>
<name>oozie.service.HadoopAccessorService.root.configurations</name>
<value>*=/usr/local/hadoop-2.6.4/etc/hadoop/</value>
<description>
Comma separated AUTHORITY=HADOOP_CONF_DIR, where AUTHORITY is the HOST:PORT of
the Hadoop service (JobTracker, HDFS). The wildcard '*' configuration is
used when there is no exact match for an authority. The HADOOP_CONF_DIR contains
the relevant Hadoop *-site.xml files. If the path is relative is looked within
the Oozie configuration directory; though the path can be absolute (i.e. to point
to Hadoop client conf/ directories in the local filesystem.
</description>
</property>
<property>
<name>oozie.service.WorkflowAppService.system.libpath</name>
<value>hdfs://166.111.81.254:9000/user/${user.name}/share/lib</value>
<description>
System library path to use for workflow applications.
This path is added to workflow application if their job properties sets
the property 'oozie.use.system.libpath' to true.
</description>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.ubuntu.hosts</name>
<value>*</value>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.ubuntu.groups</name>
<value>*</value>
</property>

Scheduling an HBase Hadoop MR job with input parameters

I am able to run the job using the hadoop jar command, but when I try to schedule the same job with Oozie, it fails.
Please also let me know whether the error is caused by the data in the HBase table or by the XML file.
The workflow XML file is as follows:
<workflow-app xmlns="uri:oozie:workflow:0.1" name="java-main-wf">
<start to="java-node"/>
<action name="java-node">
<java>
<job-tracker>00.00.00.116:00000</job-tracker>
<name-node>hdfs://00.00.000.116:00000</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>aaaaaa0000002d:2888:3888,bbbbbb000000d:2888:3888,bbbbbb000000d:2888:3888</value>
</property>
<property>
<name>hbase.master</name>
<value>aaaaaa000000d:60000</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://aaaa000000d:54310/hbase</value>
</property>
</configuration>
<main-class>com.cf.mapreduce.nord.GetSuggestedItemsForViewsCarts</main-class>
</java>
<map-reduce>
<job-tracker>1000.0000.00.000</job-tracker>
<name-node>hdfs://10.00.000.000:00000</name-node>
<configuration>
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>mapreduce.map.class</name>
<value>mahout.cf.mapreduce.nord.GetSuggestedItemsForViewsCarts$GetSuggestedItemsForViewsCartsMapper</value>
</property>
<property>
<name>mapreduce.reduce.class</name>
<value>mahout.cf.mapreduce.nord.GetSuggestedItemsForViewsCarts$GetSuggestedItemsForViewsCartsReducer</value>
</property>
<property>
<name>hbase.mapreduce.inputtable</name>
<value>${MAPPER_INPUT_TABLE}</value>
</property>
<property>
<name>hbase.mapreduce.scan</name>
<value>${wf:actionData('get-scanner')['scan']}</value>
</property>
<property>
<name>mapreduce.inputformat.class</name>
<value>org.apache.hadoop.hbase.mapreduce.TableInputFormat</value>
</property>
<property>
<name>mapreduce.outputformat.class</name>
<value>org.apache.hadoop.mapreduce.lib.output.NullOutputFormat</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>1</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>10</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>aaa000,aaaa0000,aaaa00000</value>
</property>
<property>
<name>hbase.master</name>
<value>blrkec242032d:60000</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://aaaa0000:00010/hbase</value>
</property>
</configuration>
</map-reduce>
The error log of the mapper is:
Submitting Oozie action Map-Reduce job
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.MapReduceMain], main() threw exception, No table was provided.
java.io.IOException: No table was provided.
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:130)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:891)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:818)
at org.apache.oozie.action.hadoop.MapReduceMain.submitJob(MapReduceMain.java:91)
at org.apache.oozie.action.hadoop.MapReduceMain.run(MapReduceMain.java:57)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:37)
at org.apache.oozie.action.hadoop.MapReduceMain.main(MapReduceMain.java:40)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:454)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:393)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher ends
syslog logs
2012-12-11 10:21:18,472 WARN org.apache.hadoop.mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2012-12-11 10:21:18,586 ERROR org.apache.hadoop.hbase.mapreduce.TableInputFormat: java.lang.NullPointerException
at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:404)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:153)
at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:91)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:70)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:959)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:891)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:818)
at org.apache.oozie.action.hadoop.MapReduceMain.submitJob(MapReduceMain.java:91)
at org.apache.oozie.action.hadoop.MapReduceMain.run(MapReduceMain.java:57)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:37)
at org.apache.oozie.action.hadoop.MapReduceMain.main(MapReduceMain.java:40)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:454)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:393)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.Child.main(Child.java:264)

When you call TableMapReduceUtil.initTableMapperJob(..), the utility method configures a number of job properties, one of which is the HBase table to scan.
Looking through the code (on GrepCode), I can see the following properties being set by this method:
<property>
<name>hbase.mapreduce.inputtable</name>
<value>CUSTOMER_INFO</value>
</property>
<property>
<name>hbase.mapreduce.scan</name>
<value>...</value>
</property>
The input table should be the name of your table; the scan property is a serialized form of the scan information (a Base64-encoded string). Your best bet, in my opinion, is to run the job manually and inspect its job.xml via the job tracker to see which values are set.
Note that you'll also need to set the properties for the reducer (see the source of the initTableReducerJob method). Again, inspecting the job.xml of a manually submitted job may be your best bet.
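
If you save the job.xml of a manually submitted job to a file, a small script can pull out the properties you need to copy into the workflow. The following is only a minimal sketch in Python: the helper name hadoop_properties and the sample XML are made up for illustration (the scan value is a placeholder, not a real serialized scan), and it assumes the file uses the usual Hadoop <configuration>/<property> layout.

```python
import xml.etree.ElementTree as ET


def hadoop_properties(job_xml_text, prefix=""):
    """Return {name: value} for <property> entries whose name starts with prefix.

    Hypothetical helper, not part of Hadoop, HBase, or Oozie.
    """
    root = ET.fromstring(job_xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None and name.startswith(prefix):
            props[name] = prop.findtext("value", default="")
    return props


# Made-up stand-in for a real job.xml; the scan value is only a placeholder.
SAMPLE_JOB_XML = """<configuration>
  <property><name>hbase.mapreduce.inputtable</name><value>CUSTOMER_INFO</value></property>
  <property><name>hbase.mapreduce.scan</name><value>PLACEHOLDER_SCAN</value></property>
  <property><name>mapred.reduce.tasks</name><value>10</value></property>
</configuration>"""

# Keep only the properties in the hbase.mapreduce.* namespace.
hbase_props = hadoop_properties(SAMPLE_JOB_XML, prefix="hbase.mapreduce.")
for name, value in sorted(hbase_props.items()):
    print(f"{name} = {value}")
```

To use it on a real job, read the saved job.xml into a string and pass it in place of SAMPLE_JOB_XML; the reducer-side properties may live under other prefixes, so calling the helper with an empty prefix and reviewing the full list is the safer route.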

Oozie Jobs NOT Running - Getting SUSPENDED - hadoop

This looks like a user/group permissions issue. Try running the job as the Oozie user.
