Why did the word count job fail on Hadoop?

I ran a word count job on Hadoop. Before starting it, I checked that HDFS and YARN were both running. I started the job with:
hadoop jar /usr/local/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar wordcount /input /output
The job failed; it seems the master cannot be reached:
2021-12-17 03:33:37,196 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at hadoop-master/172.18.0.2:8032
2021-12-17 03:33:38,366 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-12-17 03:33:39,371 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-12-17 03:33:40,379 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-12-17 03:33:41,381 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-12-17 03:33:42,384 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-12-17 03:33:43,388 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-12-17 03:33:44,392 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-12-17 03:33:45,395 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-12-17 03:33:46,403 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-12-17 03:33:47,406 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-12-17 03:33:48,429 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-12-17 03:33:49,430 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-12-17 03:33:50,433 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-12-17 03:33:51,439 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-12-17 03:33:52,442 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-12-17 03:33:53,444 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-12-17 03:33:54,453 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-12-17 03:33:55,458 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-12-17 03:33:56,462 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-12-17 03:33:57,465 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-12-17 03:33:57,470 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From hadoop-master/172.18.0.2 to hadoop-master:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over null after 1 failover attempts. Trying to failover after sleeping for 34878ms.
Then I ran jps and found that the ResourceManager had exited:
^Croot@hadoop-master:~# jps
8224 NodeManager
631 SecondaryNameNode
423 DataNode
8634 Jps
255 NameNode
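
As a rough illustration of the check done by eye above, the sketch below parses jps output and reports which daemons are missing. The helper name and the set of expected daemons are assumptions for a single-master layout like this one, not part of Hadoop.

```python
# Hypothetical helper: the expected-daemon set is an assumption for a node
# that runs both the HDFS and YARN master and worker daemons.
EXPECTED = {"NameNode", "SecondaryNameNode", "DataNode",
            "ResourceManager", "NodeManager"}


def missing_daemons(jps_output, expected=EXPECTED):
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # A jps line looks like "<pid> <main class name>".
        if len(parts) == 2 and parts[0].isdigit():
            running.add(parts[1])
    return sorted(set(expected) - running)


sample = """8224 NodeManager
631 SecondaryNameNode
423 DataNode
8634 Jps
255 NameNode"""

print(missing_daemons(sample))  # ['ResourceManager']
```

If ResourceManager is reported missing, its log file under the Hadoop logs directory is the next place to look for the reason it exited.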

Related

Inability to connect to Yarn leads to infinite loop in retries

I am using a Yarn EMR cluster through AWS. I am trying to write some tests for my codebase to ensure that I have test coverage for a Yarn connection.
Right now I am purposely making my connection fail by messing with the variable YARN_CONF_DIR. If the variable is set properly, I can connect to Yarn with no issue and run my job. However, when YARN_CONF_DIR is set but not to a path that has the expected xml, cfg, and bash files, I get the following output:
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
Now, I expected an error, of course. The weird thing is that you can see that the number of retries is looping, which I find odd. Look at the part of each line that says Already tried (x) time(s) and you can see it goes 0,1,2,3...9,0,1,2,3...9,0,1. I figured after retry #9 it would simply error and yield some sort of connection refused error. Instead, I am stuck in an infinite loop.
The command I am running is:
spark-submit --master yarn --deploy-mode cluster simple_spark_app_test.py
I also tried setting spark.yarn.maxAppAttempts to be 10 explicitly, but it doesn't affect the infinite looping. E.g. the command looked like
spark-submit --master yarn --deploy-mode cluster --conf spark.yarn.maxAppAttempts=10 simple_spark_app_test.py
Please note this only occurs if YARN_CONF_DIR is set but not to the right directory. If it is not set at all then it gives me an error that YARN_CONF_DIR must be set, as I would expect.
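
To make the looping counter described above easier to verify, here is a minimal sketch that groups the "Already tried N time(s)" counters from a captured log into cycles. The function name is hypothetical; it only reads the log text and makes no claim about how the client implements its retries.

```python
import re

# Matches the attempt counter in lines such as
# "... Already tried 3 time(s); retry policy is ..."
ATTEMPT = re.compile(r"Already tried (\d+) time\(s\)")


def retry_cycles(log_text):
    """Group attempt counters into cycles; a new cycle starts on each reset."""
    cycles = []
    previous = None
    for match in ATTEMPT.finditer(log_text):
        attempt = int(match.group(1))
        if previous is None or attempt <= previous:
            cycles.append([])  # counter went back down: a new cycle began
        cycles[-1].append(attempt)
        previous = attempt
    return cycles


# Synthetic log with the same shape as the output in the question.
counters = list(range(10)) + list(range(10)) + list(range(4))
sample_log = "\n".join(
    f"INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. "
    f"Already tried {n} time(s); retry policy is "
    f"RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)"
    for n in counters
)

print([len(cycle) for cycle in retry_cycles(sample_log)])  # [10, 10, 4]
```

More than one full cycle means the 10-attempt policy was restarted, which is consistent with (but does not prove) an outer retry layer wrapping the inner one.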

MapReduce client retrying to connect after job completion

Running on Hadoop 2.6.0-cdh5.7.0 and issuing a simple Pig script.
After a successful job completion I get the messages shown below.
It seems like the workers are trying to communicate with each other (with a maximum of 3 retries), but I'm not sure why, or where this behavior is configured.
Does anyone know how to solve this issue?
Output(s):
Successfully stored 46933 records (12822705 bytes) in: "/profile/main_output_merged"
Counters:
Total records written : 46933
Total bytes written : 12822705
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1469941650260_0002 -> job_1469941650260_0011,
job_1469941650260_0003 -> job_1469941650260_0011,
job_1469941650260_0001 -> job_1469941650260_0005,job_1469941650260_0006,
job_1469941650260_0005 -> job_1469941650260_0006,
job_1469941650260_0006 -> job_1469941650260_0007,
job_1469941650260_0007 -> job_1469941650260_0008,job_1469941650260_0009,
job_1469941650260_0004 -> job_1469941650260_0008,
job_1469941650260_0008 -> job_1469941650260_0010,
job_1469941650260_0010 -> job_1469941650260_0011,
job_1469941650260_0009 -> job_1469941650260_0011,
job_1469941650260_0011
2016-07-31 05:28:54,418 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker-p1.c.project.internal/10.240.0.22:38762. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:28:55,419 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker-p1.c.project.internal/10.240.0.22:38762. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:28:56,420 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker-p1.c.project.internal/10.240.0.22:38762. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:28:56,527 [MainThread] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-07-31 05:28:57,626 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker-p2.c.project.internal/10.240.0.17:35325. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:28:58,628 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker-p2.c.project.internal/10.240.0.17:35325. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:28:59,629 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker-p2.c.project.internal/10.240.0.17:35325. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:28:59,732 [MainThread] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-07-31 05:29:00,833 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker3.c.project.internal/10.240.0.25:45573. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:01,834 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker3.c.project.internal/10.240.0.25:45573. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:02,835 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker3.c.project.internal/10.240.0.25:45573. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:02,939 [MainThread] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-07-31 05:29:04,051 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker2.c.project.internal/10.240.0.24:36934. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:05,052 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker2.c.project.internal/10.240.0.24:36934. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:06,053 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker2.c.project.internal/10.240.0.24:36934. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:06,157 [MainThread] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-07-31 05:29:07,244 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker2.c.project.internal/10.240.0.24:43862. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:08,245 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker2.c.project.internal/10.240.0.24:43862. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:09,246 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker2.c.project.internal/10.240.0.24:43862. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:09,350 [MainThread] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-07-31 05:29:10,643 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker3.c.project.internal/10.240.0.25:38481. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:11,644 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker3.c.project.internal/10.240.0.25:38481. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:12,645 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker3.c.project.internal/10.240.0.25:38481. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:12,749 [MainThread] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-07-31 05:29:13,832 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker-p2.c.project.internal/10.240.0.17:34431. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:14,833 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker-p2.c.project.internal/10.240.0.17:34431. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:15,834 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker-p2.c.project.internal/10.240.0.17:34431. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:15,937 [MainThread] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-07-31 05:29:17,045 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker1.c.project.internal/10.240.0.27:38757. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:18,046 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker1.c.project.internal/10.240.0.27:38757. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:19,047 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker1.c.project.internal/10.240.0.27:38757. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:19,149 [MainThread] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-07-31 05:29:20,230 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker3.c.project.internal/10.240.0.25:37952. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:21,231 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker3.c.project.internal/10.240.0.25:37952. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:22,232 [MainThread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker3.c.project.internal/10.240.0.25:37952. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:22,335 [MainThread] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-07-31 05:29:22,417 [MainThread] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!

Hadoop not listing any slaves

I've followed this guide to set up a simple Hadoop cluster.
However, I can't see my slave node at http://master:50070.
I have followed the guide until Start the Yarn MapReduce Job tracker and everything is listed as it should when running jps on both master and slave node.
In the hadoop-hadoopuser-datanode-slave-1.log I see these messages over and over.
2016-05-25 13:26:11,884 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2016-05-25 13:26:11,886 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2016-05-25 13:26:13,028 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.0.1.32:54310. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-25 13:26:14,029 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.0.1.32:54310. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-25 13:26:15,031 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.0.1.32:54310. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-25 13:26:16,032 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.0.1.32:54310. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-25 13:26:17,033 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.0.1.32:54310. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-25 13:26:18,034 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.0.1.32:54310. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-25 13:26:19,035 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.0.1.32:54310. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-25 13:26:20,036 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.0.1.32:54310. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-25 13:26:21,037 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.0.1.32:54310. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-25 13:26:22,038 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.0.1.32:54310. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-25 13:26:22,040 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: master/10.0.1.32:54310
What am I missing?
This was solved by removing the line for 127.0.1.1 from /etc/hosts on both master and slave nodes.
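
A small sketch of how one might look for such an entry before editing the file. The function name and the sample hosts content are illustrative assumptions; only the 127.0.1.1 address and the master/10.0.1.32 pair come from the post above.

```python
def loopback_alias_lines(hosts_text, address="127.0.1.1"):
    """Return (line number, hostnames) for each active entry using address."""
    hits = []
    for number, line in enumerate(hosts_text.splitlines(), start=1):
        # Drop trailing comments, then split into address and hostnames.
        entry = line.split("#", 1)[0].split()
        if entry and entry[0] == address:
            hits.append((number, entry[1:]))
    return hits


# Illustrative content only; a real check would read /etc/hosts on each node.
sample = """127.0.0.1 localhost
127.0.1.1 master
10.0.1.32 master
# 127.0.1.1 commented-out
"""

print(loopback_alias_lines(sample))  # [(2, ['master'])]
```

When the node's own hostname is listed against 127.0.1.1, a daemon that binds by hostname can end up listening only on the loopback address, so other nodes get connection refused.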

Hadoop environment is not available until I format the NameNode, every time

I am using a single-node Hadoop cluster on Ubuntu 13.10 with Hadoop 1.2.1.
I always run into the same problem whenever I restart my computer and want to get into the Hadoop environment.
I log in to a terminal and type su, and I get an error saying that /bin is not included in PATH.
I run export PATH=/usr/bin:/bin and then su works.
After becoming root, when I type hadoop fs -ls, I get retry messages (tried 0, 1, 2, ...) and in the end it fails.
user@ubuntu1310:~$ su
Command 'su' is available in '/bin/su'
The command could not be located because '/bin' is not included in the PATH environment variable.
su: command not found
user@ubuntu1310:~$ export PATH=/user/bin:bin
user@ubuntu1310:~$ su
Command 'su' is available in '/bin/su'
The command could not be located because '/bin' is not included in the PATH environment variable.
su: command not found
user@ubuntu1310:~$ su
Command 'su' is available in '/bin/su'
The command could not be located because '/bin' is not included in the PATH environment variable.
su: command not found
user@ubuntu1310:~$ export PATH=/usr/bin:bin
user@ubuntu1310:~$ su
Command 'su' is available in '/bin/su'
The command could not be located because '/bin' is not included in the PATH environment variable.
su: command not found
user@ubuntu1310:~$ export PATH=/usr/bin:/bin
user@ubuntu1310:~$ su
Password:
root@ubuntu1310:/home/user# start-all.sh
starting namenode, logging to /usr/lib/hadoop/libexec/../logs/hadoop-root-namenode-ubuntu1310.out
root@localhost's password:
localhost: starting datanode, logging to /usr/lib/hadoop/libexec/../logs/hadoop-root-datanode-ubuntu1310.out
root@localhost's password:
localhost: starting secondarynamenode, logging to /usr/lib/hadoop/libexec/../logs/hadoop-root-secondarynamenode-ubuntu1310.out
starting jobtracker, logging to /usr/lib/hadoop/libexec/../logs/hadoop-root-jobtracker-ubuntu1310.out
root@localhost's password:
localhost: starting tasktracker, logging to /usr/lib/hadoop/libexec/../logs/hadoop-root-tasktracker-ubuntu1310.out
root@ubuntu1310:/home/user# hadoop fs -ls
14/01/09 05:46:56 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:46:57 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:46:58 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:46:59 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:47:00 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:47:01 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:47:02 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:47:03 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:47:04 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:47:05 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
ls: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused
root@ubuntu1310:/home/user# hadoop fs -ls
14/01/09 05:54:58 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:54:59 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:55:00 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:55:01 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:55:02 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:55:03 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:55:04 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:55:05 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:55:06 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:55:07 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
ls: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused
root@ubuntu1310:/home/user#
How can I avoid these two errors? It is very tedious to reformat the NameNode every time.
First of all, su has nothing to do with Hadoop. Next, you are probably getting these errors because you have not specified the hadoop.tmp.dir property in your core-site.xml. The value of this property defaults to a directory under /tmp (/tmp/hadoop-${user.name}), which gets emptied at each restart. Thus you lose all the HDFS metadata and data and have to reformat it.
It is always good practice to set this property. It is also advisable to set the dfs.name.dir and dfs.data.dir properties in your hdfs-site.xml file.
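
One way to confirm whether the property is actually set is to read it back from the configuration file. The sketch below is a minimal example using Python's standard XML parser; the function name and the /var/lib/hadoop/tmp path are assumptions for illustration, so pick a persistent directory that suits your machine.

```python
import xml.etree.ElementTree as ET


def get_property(site_xml, name):
    """Return the value of a property in a Hadoop *-site.xml string, or None."""
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


# Illustrative core-site.xml content; the path is a hypothetical example of a
# directory that survives reboots.
sample = """<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/lib/hadoop/tmp</value>
  </property>
</configuration>"""

print(get_property(sample, "hadoop.tmp.dir"))  # /var/lib/hadoop/tmp
print(get_property(sample, "dfs.name.dir"))    # None
```

A None result for hadoop.tmp.dir means the default under /tmp is in effect, which matches the symptom of losing HDFS state on every reboot.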
It seems you did not set up passwordless SSH login when installing Hadoop; setting it up is recommended.
For a proper installation guide, you can refer to:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

Hadoop error dfs -copyFromLocal

While copying a file into Hadoop from a temp directory, I used the command below:
[Divya@localhost hadoop]$ bin/hadoop dfs -copyFromLocal /tmp/gutenberg /home/Divya/gutenberg
How can I resolve this error?
13/07/03 14:42:28 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/07/03 14:42:29 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/07/03 14:42:30 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/07/03 14:42:31 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/07/03 14:42:32 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/07/03 14:42:33 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/07/03 14:42:34 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/07/03 14:42:35 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/07/03 14:42:36 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/07/03 14:42:37 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
copyFromLocal: Call to localhost/127.0.0.1:54310 failed on connection exception: java.net.ConnectException: Connection refused
Check whether the NameNode process is running on localhost and listening on port 54310.
Check whether the Hadoop daemons are running properly. Use the jps command as root to see which ones are up, and make sure the cluster is set up correctly.
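
The port check suggested above can be sketched in a few lines. This is a generic TCP probe with a hypothetical function name, not a Hadoop tool; the host and port are the ones from the error message.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # 54310 is the NameNode port shown in the retry messages.
    print(port_is_open("localhost", 54310))
```

False here, together with no NameNode in the jps output, points to the NameNode not having started; its log file should say why.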

Resources