"Resources are low on NN" after deactivation of safemode and added space - hadoop

I'm trying mapreduce with python in accord with http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
I've received a "Resources are low on NN" continously when I'm trying my mapreduce job in hadoop. I've removed files from hdfs and also attached a volume to the instance on the cloud. I've deactivated safemode but the problem is that it activates instantly again.
Filesystem Size Used Available Use%
hdfs://localhost:9000 38.8 G 9.3 G 26.4 G 24%
I should have enough space...
The python files has been tested locally and works fine. In fact, this has worked on the cloud before but for some reason it doesn't anymore.
ubuntu#lesolab2:~$ hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.2.jar -file mapper.py -mapper mapper.py -file reducer.py -reducer reducer.py -input /user/python/tweets/files/* -output /user/python/result
16/04/25 14:22:09 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [mapper.py, reducer.py] [] /tmp/streamjob5539719163695874514.jar tmpDir=null
16/04/25 14:22:10 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/04/25 14:22:10 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/04/25 14:22:10 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/04/25 14:22:11 INFO mapred.FileInputFormat: Total input paths to process : 21
16/04/25 14:22:11 INFO net.NetworkTopology: Adding a new node: /default-rack/127.0.0.1:50010
16/04/25 14:22:11 INFO mapreduce.JobSubmitter: number of splits:86
16/04/25 14:22:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1741181014_0001
16/04/25 14:22:12 INFO mapred.LocalDistributedCacheManager: Localized file:/home/ubuntu/mapper.py as file:/tmp/hadoop-ubuntu/mapred/local/1461594132014/mapper.py
16/04/25 14:22:12 INFO mapred.LocalDistributedCacheManager: Localized file:/home/ubuntu/reducer.py as file:/tmp/hadoop-ubuntu/mapred/local/1461594132015/reducer.py
16/04/25 14:22:12 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/04/25 14:22:12 INFO mapreduce.Job: Running job: job_local1741181014_0001
16/04/25 14:22:12 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/04/25 14:22:12 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
16/04/25 14:22:12 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/04/25 14:22:12 INFO mapred.LocalJobRunner: Error cleaning up job:job_local1741181014_0001
16/04/25 14:22:12 WARN mapred.LocalJobRunner: job_local1741181014_0001
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /user/python/result/_temporary/0. Name node is in safe mode.
Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1327)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3893)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:983)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:622)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
at org.apache.hadoop.ipc.Client.call(Client.java:1475)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy9.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:558)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3000)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2970)
at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1047)
at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1043)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1043)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1036)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1877)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.setupJob(FileOutputCommitter.java:305)
at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:131)
at org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:233)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:511)
16/04/25 14:22:13 INFO mapreduce.Job: Job job_local1741181014_0001 running in uber mode : false
16/04/25 14:22:13 INFO mapreduce.Job: map 0% reduce 0%
16/04/25 14:22:13 INFO mapreduce.Job: Job job_local1741181014_0001 failed with state FAILED due to: NA
16/04/25 14:22:13 INFO mapreduce.Job: Counters: 0
16/04/25 14:22:13 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
Any ideas to why the streaming fails?

Related

HDFS Data Transfer from One cluster to Another is not working with distcp

I need to transfer HDFS data from one cluster to another. I see "distcp" command to be helpful for this case. But it was not. Both cluster Namenode is privately interconnected with other datanodes. So I have two proxy machines to connect publically with the namenode. Say, I made namenode's 8070 port to run under 20000 in haproxy. Now I can ping both clusters namenode. So, I went for distcp option. There the mapreduce job starts executing for the data transfer but it is not completing.
[hdfs#ip-20-0-42-252 ~]$ hadoop distcp hdfs://YY.YY.YY.YY:20000/user/ce_prasith/filter.txt hdfs://xx.xx.xx.xx:20000/user/gl_qauser
18/10/09 10:12:15 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='uniformsize', preserveStatus=[], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[hdfs:/user/ce_prasith/filter.txt], targetPath=hdfs://xx.xx.xx.xx:20000/user/gl_qauser, targetPathExists=true, filtersFile='null'}
18/10/09 10:12:16 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 1; dirCnt = 0
18/10/09 10:12:16 INFO tools.SimpleCopyListing: Build file listing completed.
18/10/09 10:12:16 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
18/10/09 10:12:16 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
18/10/09 10:12:16 INFO tools.DistCp: Number of paths in the copy list: 1
18/10/09 10:12:16 INFO tools.DistCp: Number of paths in the copy list: 1
18/10/09 10:12:16 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm97
18/10/09 10:12:16 INFO mapreduce.JobSubmitter: number of splits:1
18/10/09 10:12:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1539063069030_0003
18/10/09 10:12:16 INFO impl.YarnClientImpl: Submitted application application_1539063069030_0003
18/10/09 10:12:17 INFO mapreduce.Job: The url to track the job: http://ip-20-0-21-94.ec2.internal:8088/proxy/application_1539063069030_0003/
18/10/09 10:12:17 INFO tools.DistCp: DistCp job-id: job_1539063069030_0003
18/10/09 10:12:17 INFO mapreduce.Job: Running job: job_1539063069030_0003
18/10/09 10:12:22 INFO mapreduce.Job: Job job_1539063069030_0003 running in uber mode : true
18/10/09 10:12:22 INFO mapreduce.Job: map 0% reduce 0%
18/10/09 10:13:22 INFO mapreduce.Job: map 100% reduce 0%
For your information I have taken few logs of the job
2018-10-09 12:01:42,715 WARN [CommitterEvent Processor #2] org.apache.hadoop.tools.mapred.CopyCommitter: Unable to cleanup temp files
org.apache.hadoop.net.ConnectTimeoutException: Call From ip-YY.YY.YY.YY.ec2.internal/YY.YY.YY.YY to ec2-xx.xx.xx.xx.ap-south-1.compute.amazonaws.com:20000 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=ec2-xx.xx.xx.xx.ap-south-1.compute.amazonaws.com/xx.xx.xx.xx:20000]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:750)
at org.apache.hadoop.ipc.Client.call(Client.java:1508)
at org.apache.hadoop.ipc.Client.call(Client.java:1441)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy10.getListing(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:573)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy11.getListing(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2101)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2084)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:731)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:110)
at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:796)
at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:792)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:792)
at org.apache.hadoop.fs.Globber.listStatus(Globber.java:76)
at org.apache.hadoop.fs.Globber.doGlob(Globber.java:237)
at org.apache.hadoop.fs.Globber.glob(Globber.java:151)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1714)
at org.apache.hadoop.tools.mapred.CopyCommitter.deleteAttemptTempFiles(CopyCommitter.java:145)
at org.apache.hadoop.tools.mapred.CopyCommitter.cleanupTempFiles(CopyCommitter.java:131)
at org.apache.hadoop.tools.mapred.CopyCommitter.abortJob(CopyCommitter.java:118)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobAbort(CommitterEventHandler.java:298)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:240)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=ec2-xx.xx.xx.xx.ap-south-1.compute.amazonaws.com/xx.xx.xx.xx:20000]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:648)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:744)
at org.apache.hadoop.ipc.Client$Connection.access$3000(Client.java:396)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1557)
at org.apache.hadoop.ipc.Client.call(Client.java:1480)
... 31 more
2018-10-09 12:01:42,716 INFO [CommitterEvent Processor #2] org.apache.hadoop.tools.mapred.CopyCommitter: Cleaning up temporary work folder: /user/hdfs/.staging/_distcp1087004350
I get stuck over here. Does somebody have any idea to overcome this?
All nodes in source cluster should see all nodes in destination cluster.

ERROR streaming.StreamJob: Job not successful

I install Hadoop 2.9.0 and I have 4 nodes. The namenode and resourcemanager services are running on master and datanodes and nodemanagers are running on slaves. Now I wanna run a python MapReduce job. But Job not successful!
Please tell me what should I do?
Log of running of job in terminal:
hadoop#hadoopmaster:/usr/local/hadoop$ bin/hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.9.0.jar -file mapper.py -mapper mapper.py -file reducer.py -reducer reducer.py -input /user/hadoop/* -output /user/hadoop/output
18/06/17 04:26:28 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
18/06/17 04:26:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
packageJobJar: [mapper.py, /tmp/hadoop-unjar3316382199020742755/] [] /tmp/streamjob4930230269569102931.jar tmpDir=null
18/06/17 04:26:28 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.111.175:8050
18/06/17 04:26:29 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.111.175:8050
18/06/17 04:26:29 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1249)
at java.lang.Thread.join(Thread.java:1323)
at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:980)
at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:630)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:807)
18/06/17 04:26:29 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1249)
at java.lang.Thread.join(Thread.java:1323)
at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:980)
at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:630)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:807)
18/06/17 04:26:29 INFO mapred.FileInputFormat: Total input files to process : 4
18/06/17 04:26:29 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1249)
at java.lang.Thread.join(Thread.java:1323)
at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:980)
at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:630)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:807)
18/06/17 04:26:29 INFO mapreduce.JobSubmitter: number of splits:4
18/06/17 04:26:29 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
18/06/17 04:26:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1529233655437_0004
18/06/17 04:26:30 INFO impl.YarnClientImpl: Submitted application application_1529233655437_0004
18/06/17 04:26:30 INFO mapreduce.Job: The url to track the job: http://hadoopmaster.png.com:8088/proxy/application_1529233655437_0004/
18/06/17 04:26:30 INFO mapreduce.Job: Running job: job_1529233655437_0004
18/06/17 04:45:12 INFO mapreduce.Job: Job job_1529233655437_0004 running in uber mode : false
18/06/17 04:45:12 INFO mapreduce.Job: map 0% reduce 0%
18/06/17 04:45:12 INFO mapreduce.Job: Job job_1529233655437_0004 failed with state FAILED due to: Application application_1529233655437_0004 failed 2 times due to Error launching appattempt_1529233655437_0004_000002. Got exception: org.apache.hadoop.net.ConnectTimeoutException: Call From hadoopmaster.png.com/192.168.111.175 to hadoopslave1.png.com:40569 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=hadoopslave1.png.com/192.168.111.173:40569]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
at sun.reflect.GeneratedConstructorAccessor38.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:774)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
at org.apache.hadoop.ipc.Client.call(Client.java:1439)
at org.apache.hadoop.ipc.Client.call(Client.java:1349)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy82.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy83.startContainers(Unknown Source)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:122)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:307)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=hadoopslave1.png.com/192.168.111.173:40569]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:687)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:790)
at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:411)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1554)
at org.apache.hadoop.ipc.Client.call(Client.java:1385)
... 19 more
. Failing the application.
18/06/17 04:45:13 INFO mapreduce.Job: Counters: 0
18/06/17 04:45:13 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
OK. I found the reason of problem. In fact, the following error should be resolved:
hadoopmaster.png.com/192.168.111.175 to hadoopslave1.png.com:40569
failed on socket timeout exception
So just did:
sudo ufw allow 40569

hadoop mahout:run org.apache.classifier.df.mapreduce.TestForest error

I'm new to Mahout and Random-Forest.I wanna classify my dataset and have built the random-forest on my three virtual Hadoop nodes.
As we know,first of all,I made the descriptor(/des.info).And I built the classifier(/user/hadoop/forest) .There was an error but it completed successfully.However I was stuck when I tried to test it.
My systems are all CentOS7 with hadoop-3.0.0.
Here is the HDFS:
[hadoop#hadoop1 ~]$ hadoop fs -ls /
Found 5 items
-rw-r--r-- 1 hadoop supergroup 8807688 2018-04-29 19:59 /des.info
-rw-r--r-- 1 hadoop supergroup 79736192 2018-04-29 19:55 /features.txt
-rw-r--r-- 1 hadoop supergroup 278 2018-05-01 03:50 /test.txt
drwx------ - hadoop supergroup 0 2018-04-29 20:05 /tmp
drwxr-xr-x - hadoop supergroup 0 2018-04-29 20:05 /user
Here is the code of the third step:
[hadoop#hadoop1 ~]$hadoop jar /opt/mahout-distribution-0.9/mahout-examples-0.9-job.jar org.apache.mahout.classifier.df.mapreduce.TestForest -i /test.txt -ds /des.info -m /user/hadoop/forest -a -mr -o prediction
2018-05-01 03:53:21,595 INFO mapreduce.Classifier: Adding the dataset to the DistributedCache
2018-05-01 03:53:21,597 INFO mapreduce.Classifier: Adding the decision forest to the DistributedCache
2018-05-01 03:53:21,600 INFO mapreduce.Classifier: Configuring the job...
2018-05-01 03:53:21,605 INFO mapreduce.Classifier: Running the job...
2018-05-01 03:53:21,702 INFO client.RMProxy: Connecting to ResourceManager at hadoop1/192.168.80.100:8032
2018-05-01 03:53:22,073 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1525056498669_0002
2018-05-01 03:53:22,507 INFO input.FileInputFormat: Total input files to process : 1
2018-05-01 03:53:23,009 INFO mapreduce.JobSubmitter: number of splits:1
2018-05-01 03:53:23,170 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-05-01 03:53:23,353 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1525056498669_0002
2018-05-01 03:53:23,355 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-05-01 03:53:23,663 INFO conf.Configuration: resource-types.xml not found
2018-05-01 03:53:23,663 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-05-01 03:53:23,813 INFO impl.YarnClientImpl: Submitted application application_1525056498669_0002
2018-05-01 03:53:23,870 INFO mapreduce.Job: The url to track the job: http://hadoop1:8088/proxy/application_1525056498669_0002/
2018-05-01 03:53:23,871 INFO mapreduce.Job: Running job: job_1525056498669_0002
2018-05-01 03:53:31,029 INFO mapreduce.Job: Job job_1525056498669_0002 running in uber mode : false
2018-05-01 03:53:31,030 INFO mapreduce.Job: map 0% reduce 0%
2018-05-01 03:53:34,089 INFO mapreduce.Job: Task Id : attempt_1525056498669_0002_m_000000_0, Status : FAILED
Error: java.lang.ArrayIndexOutOfBoundsException: 946827879
at org.apache.mahout.classifier.df.node.Node.read(Node.java:58)
at org.apache.mahout.classifier.df.DecisionForest.readFields(DecisionForest.java:197)
at org.apache.mahout.classifier.df.DecisionForest.read(DecisionForest.java:203)
at org.apache.mahout.classifier.df.DecisionForest.load(DecisionForest.java:225)
at org.apache.mahout.classifier.df.mapreduce.Classifier$CMapper.setup(Classifier.java:209)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:794)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
2018-05-01 03:53:46,822 INFO mapreduce.Job: Task Id : attempt_1525056498669_0002_m_000000_1, Status : FAILED
2018-05-01 03:53:51,342 INFO mapreduce.Job: Task Id : attempt_1525056498669_0002_m_000000_2, Status : FAILED
Error: java.lang.ArrayIndexOutOfBoundsException: 946827879
at org.apache.mahout.classifier.df.node.Node.read(Node.java:58)
at org.apache.mahout.classifier.df.DecisionForest.readFields(DecisionForest.java:197)
at org.apache.mahout.classifier.df.DecisionForest.read(DecisionForest.java:203)
at org.apache.mahout.classifier.df.DecisionForest.load(DecisionForest.java:225)
at org.apache.mahout.classifier.df.mapreduce.Classifier$CMapper.setup(Classifier.java:209)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:794)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
2018-05-01 03:54:05,463 INFO mapreduce.Job: map 100% reduce 0%
2018-05-01 03:54:06,479 INFO mapreduce.Job: Job job_1525056498669_0002 failed with state FAILED due to: Task failed task_1525056498669_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0 killedMaps:0 killedReduces: 0
2018-05-01 03:54:06,556 INFO mapreduce.Job: Counters: 12
Job Counters
Failed map tasks=4
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=28056
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=28056
Total vcore-milliseconds taken by all map tasks=28056
Total megabyte-milliseconds taken by all map tasks=28729344
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Exception in thread "main" java.lang.IllegalStateException: Job failed!
at org.apache.mahout.classifier.df.mapreduce.Classifier.run(Classifier.java:127)
at org.apache.mahout.classifier.df.mapreduce.TestForest.mapreduce(TestForest.java:188)
at org.apache.mahout.classifier.df.mapreduce.TestForest.testForest(TestForest.java:174)
at org.apache.mahout.classifier.df.mapreduce.TestForest.run(TestForest.java:146)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.mahout.classifier.df.mapreduce.TestForest.main(TestForest.java:315)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Now I'm confused very much.
Because this is incompatible with hadoop and mahout versions. The Mahout 0.9 random forest algorithm can only run on hadoop 1.x.

Hadoop Mapreduce Job: Input path does not exist: hdfs://localhost:9000/user/******/grep-temp-1208171489

I am running an example mapreduce job that comes with Hadoop 2.8.1
I am using these commands:
bin/hdfs dfs -copyFromLocal etc/hadoop/core-site.xml .
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep ./core-site.xml output ‘configuration’
However, when I run this, the job exits (the main error appears in the title):
***********s-mbp-2:hadoop-2.8.1 ***********$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.1.jar grep ./core-site.xml output ‘configuration’
17/09/12 14:30:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/09/12 14:30:29 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/09/12 14:30:29 INFO input.FileInputFormat: Total input files to process : 1
17/09/12 14:30:30 INFO mapreduce.JobSubmitter: number of splits:1
17/09/12 14:30:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1505251091124_0001
17/09/12 14:30:30 INFO impl.YarnClientImpl: Submitted application application_1505251091124_0001
17/09/12 14:30:30 INFO mapreduce.Job: The url to track the job: http://***********s-mbp-2.lan:8088/proxy/application_1505251091124_0001/
17/09/12 14:30:30 INFO mapreduce.Job: Running job: job_1505251091124_0001
17/09/12 14:30:33 INFO mapreduce.Job: Job job_1505251091124_0001 running in uber mode : false
17/09/12 14:30:33 INFO mapreduce.Job: map 0% reduce 0%
17/09/12 14:30:33 INFO mapreduce.Job: Job job_1505251091124_0001 failed with state FAILED due to: Application application_1505251091124_0001 failed 2 times due to AM Container for appattempt_1505251091124_0001_000002 exited with exitCode: 127
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1505251091124_0001_02_000001
Exit code: 127
Stack trace: ExitCodeException exitCode=127:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
at org.apache.hadoop.util.Shell.run(Shell.java:869)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:236)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:305)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:84)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 127
For more detailed output, check the application tracking page: http://***********s-mbp-2.lan:8088/cluster/app/application_1505251091124_0001 Then click on links to logs of each attempt.
. Failing the application.
17/09/12 14:30:33 INFO mapreduce.Job: Counters: 0
17/09/12 14:30:33 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/09/12 14:30:33 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/***********/.staging/job_1505251091124_0002
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/***********/grep-temp-576807334
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:329)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:271)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:393)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:303)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:198)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1359)
at org.apache.hadoop.examples.Grep.run(Grep.java:94)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.examples.Grep.main(Grep.java:103)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
What is going wrong, and how do I fix it?
It looks like the program can't access the path in the exception. You may try to manually create that directory with mkdir in hdfs system before running the code. Hope it will help.

Hadoop: NullPointerException when redirecting to job history server

I have a Hadoop cluster (HDP 2.1). Everything has been working for a long time, but suddenly jobs have started to return the following recurrent error:
16/10/13 16:21:11 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
16/10/13 16:21:12 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
16/10/13 16:21:12 INFO impl.TimelineClientImpl: Timeline service address: http://dev-fiwr-bignode-12.hi.inet:8188/ws/v1/timeline/
16/10/13 16:21:13 INFO client.RMProxy: Connecting to ResourceManager at dev-fiwr-bignode-12.hi.inet/10.95.76.79:8050
16/10/13 16:21:13 INFO input.FileInputFormat: Total input paths to process : 2
16/10/13 16:21:13 INFO mapreduce.JobSubmitter: number of splits:2
16/10/13 16:21:13 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
16/10/13 16:21:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1476366871137_0003
16/10/13 16:21:14 INFO impl.YarnClientImpl: Submitted application application_1476366871137_0003
16/10/13 16:21:14 INFO mapreduce.Job: The url to track the job: http://dev-fiwr-bignode-12.hi.inet:8088/proxy/application_1476366871137_0003/
16/10/13 16:21:14 INFO mapreduce.Job: Running job: job_1476366871137_0003
16/10/13 16:21:19 INFO mapreduce.Job: Job job_1476366871137_0003 running in uber mode : false
16/10/13 16:21:19 INFO mapreduce.Job: map 0% reduce 0%
16/10/13 16:21:23 INFO mapreduce.Job: map 50% reduce 0%
16/10/13 16:21:24 INFO mapreduce.Job: map 100% reduce 0%
16/10/13 16:21:28 INFO mapreduce.Job: map 100% reduce 100%\
6/10/13 16:21:29 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
16/10/13 16:21:29 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
16/10/13 16:21:29 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
Exception in thread \"main\" java.io.IOException:
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getTaskAttemptCompletionEvents(HistoryClientService.java:277)
org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBServiceImpl.java:173)
org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:283)
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:415)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:334)
org.apache.hadoop.mapred.ClientServiceDelegate.getTaskCompletionEvents(ClientServiceDelegate.java:386)
org.apache.hadoop.mapred.YARNRunner.getTaskCompletionEvents(YARNRunner.java:539)
org.apache.hadoop.mapreduce.Job$5.run(Job.java:668)
org.apache.hadoop.mapreduce.Job$5.run(Job.java:665)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:415)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:665)
org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1366)
org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1306)
dijkstra.adjacencylist.AdjacencyListDriver.jobRun(AdjacencyListDriver.java:53)
dijkstra.adjacencylist.AdjacencyListDriver.run(AdjacencyListDriver.java:31)
org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
dijkstra.launch.LaunchClass.launchAdjMatrix(LaunchClass.java:226)
dijkstra.launch.LaunchClass.main(LaunchClass.java:199)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by:
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException):
java.lang.NullPointerException
org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getTaskAttemptCompletionEvents(HistoryClientService.java:277)
...
Googling a bit, I've seen these issues:
https://issues.apache.org/jira/browse/MAPREDUCE-5703
https://issues.apache.org/jira/browse/MAPREDUCE-5547
They seem to be related. Nevertheless, why was the cluster running properly until now? Nothing was changed in the configuration, the clsuter is not in safe mode, the HDFS space usage is around 0.03%... Any clues? And in the case this is related to the issues above mentioned, any workaround?
Many thanks, I'll stay tuned for your answers or additional info requirements.
Your issues is similar to 5703, judging by the stack trace, and as stated in that bug:
"The method GetTaskAttemptCompletionEventsResponse() fetched a Job by calling verifyAndGetJob(), but it never checked if job was null or not, which was the root cause of this issue."
There is a job lookup using a job id, the job is not found.
In that bug it lists a scenario in which a job history server (JHS) is queried about a finished job but JHS failed to receive the info for that job.
There seems to be open issues regarding job termination and job history uploads that allow this exception to happen when job history upload fails. In the bug the issue was triggered by restarting the node writing the history before the history upload is complete, or by that node having no good nodes to write the history to.
Unfortunately, there is nothing else here that might help identify what caused the history upload to fail in your case, but that appears to be the underlying source of the issue. Your job history server has no record of the job that successfully completed.

Resources