In the newest version of Hadoop MapReduce (called YARN), the JobTracker from previous versions has been replaced by the ResourceManager (RM) and the ApplicationMaster.
The official documentation on the YARN architecture does not say how many RMs a cluster has, and the architecture diagram shows only one RM per cluster.
So what happens if the only RM goes down? If there are several RMs, how do they work together?
Hope someone can explain it to me.
Thanks.
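There is one active ResourceManager per cluster, not one per rack; what runs on every worker node is a NodeManager. Since Hadoop 2.4, ResourceManager high availability lets you configure an Active/Standby pair of RMs, with automatic failover coordinated through ZooKeeper, so the cluster no longer has to depend on a single RM.
If you try to submit a job while the ResourceManager is down, the Hadoop client keeps retrying the connection, because it needs the RM to execute the job.
Here is an example of the client log when the RM is down and you try to submit a job: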
14/06/06 09:39:54 INFO ipc.Client: Retrying connect to server: hadoop01.sii.fr/10.6.6.211:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
14/06/06 09:39:55 INFO ipc.Client: Retrying connect to server: hadoop01.sii.fr/10.6.6.211:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
14/06/06 09:39:56 INFO ipc.Client: Retrying connect to server: hadoop01.sii.fr/10.6.6.211:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
Once the RM is back up, the job is submitted correctly.
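The policy named in the log above (RetryUpToMaximumCountWithFixedSleep with maxRetries=10 and a one-second sleep) can be pictured with a minimal sketch. This is not Hadoop's code: retry_with_fixed_sleep and the connect callable are hypothetical names used only for illustration. It suggests why the submission only goes through if the RM comes back before the retries run out.

```python
import time


def retry_with_fixed_sleep(connect, max_retries=10, sleep_seconds=1.0, sleep=time.sleep):
    """Call connect() until it succeeds, sleeping a fixed time after each failure.

    Hypothetical helper: it mirrors the shape of the policy named in the log
    (bounded number of attempts, fixed sleep), not Hadoop's actual implementation.
    """
    last_error = None
    for _ in range(max_retries):
        try:
            return connect()
        except ConnectionError as err:  # ConnectionRefusedError is a subclass
            last_error = err
            sleep(sleep_seconds)
    # Every attempt failed: surface the last connection error to the caller.
    raise ConnectionError(f"gave up after {max_retries} attempts") from last_error
```

Passing sleep as a parameter is only a convenience so the loop can be exercised without actually waiting.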
I have been following some instructions for setting up a Vagrant single-node cluster and went through them once without issue. However, I am running into several problems when trying to repeat the same instructions. Now I get a connection refused error when trying to run hadoop fs -ls /
$ hadoop fs -ls /
Warning: $HADOOP_HOME is deprecated.
15/01/18 04:09:21 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
15/01/18 04:09:22 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
15/01/18 04:09:23 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
15/01/18 04:09:24 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
15/01/18 04:09:25 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
15/01/18 04:09:26 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
15/01/18 04:09:27 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
15/01/18 04:09:28 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
15/01/18 04:09:29 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
15/01/18 04:09:30 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
ls: Call to localhost/127.0.0.1:54310 failed on connection exception: java.net.ConnectException: Connection refused
When I run jps I get the following:
$ jps
4176 JobTracker
4313 TaskTracker
3970 DataNode
4581 Jps
4094 SecondaryNameNode
I'm really at a loss as to what I have missed to cause this different behavior. Any help would be greatly appreciated.
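It looks like your NameNode is not up. Make sure you format the NameNode (note that formatting erases the existing HDFS metadata, so only do this if you can afford to lose what is stored there):
Stop the cluster and all daemons.
Format the NameNode.
Once it is formatted, start the NameNode first and then the other daemons.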
Looking at the running daemons, there is no NameNode. I would suggest restarting all the daemons. The commands below restart them, after which the NameNode should be up and running. Hope this helps!
sudo service hadoop-master stop
sudo service hadoop-master start
hadoop dfsadmin -safemode leave
sudo jps
The Hadoop cluster started normally, and jps shows the DataNodes and TaskTrackers running correctly.
When I copy a file into HDFS, this is the error message I get:
hduser@nn:~$ hadoop fs -put gettysburg.txt /user/hduser/getty/gettysburg.txt
Warning: $HADOOP_HOME is deprecated.
14/08/24 21:12:50 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:51 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:52 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:53 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:54 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:55 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:56 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:57 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:58 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:59 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Bad connection to FS. command aborted. exception: Call to nn/10.10.1.1:54310 failed on connection exception: java.net.ConnectException: Connection refused
hduser@nn:~$
I am able to ssh from the NN to the DNs and vice versa, and between DNs.
I have changed /etc/hosts on the NN and all DNs as below.
#127.0.0.1 localhost loghost localhost.project1.ch-geni-net.emulab.net
#10.10.1.1 NN-Lan NN-0 NN
#10.10.1.2 DN1-Lan DN1-0 DN1
#10.10.1.3 DN2-Lan DN2-0 DN2
#10.10.1.5 DN4-Lan DN4-0 DN4
#10.10.1.4 DN3-Lan DN3-0 DN3
10.10.1.1 nn
10.10.1.2 dn1
10.10.1.3 dn2
10.10.1.4 dn3
10.10.1.5 dn4
My mapred-site.xml looks like this.
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://nn:54310</value>
<description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHE$
</property>
</configuration>
I configured the masters file in /usr/local/hadoop/conf:
hduser@nn:/usr/local/hadoop/conf$ vi masters
#localhost
nn
hduser@dn1:~$ jps
9975 DataNode
10186 Jps
10070 TaskTracker
hduser@dn1:~$
hduser@nn:~$ jps
5979 JobTracker
5891 SecondaryNameNode
6159 Jps
hduser@nn:~$
What is the problem?
Check the fs.default.name property in your core-site.xml file. The value should be hdfs://NN:port.
Check the following:
core-site.xml: the HDFS URL should have the form hdfs://ip:port
Format the NameNode
Check whether safe mode is on
I am using a single-node Hadoop cluster on Ubuntu 13.10 with Hadoop 1.2.1.
I always run into the same problem: whenever I restart my computer and want to get back into the Hadoop environment, I open a terminal and type su, and I get an error saying that su cannot be found because /bin is not in PATH.
After I run export PATH=/usr/bin:/bin, it works.
Then, once I am root, when I type hadoop fs -ls I get the retry messages (Already tried 1, 2, ... time(s)), and in the end it fails.
user@ubuntu1310:~$ su
Command 'su' is available in '/bin/su'
The command could not be located because '/bin' is not included in the PATH environment variable.
su: command not found
user@ubuntu1310:~$ export PATH=/user/bin:bin
user@ubuntu1310:~$ su
Command 'su' is available in '/bin/su'
The command could not be located because '/bin' is not included in the PATH environment variable.
su: command not found
user@ubuntu1310:~$ su
Command 'su' is available in '/bin/su'
The command could not be located because '/bin' is not included in the PATH environment variable.
su: command not found
user@ubuntu1310:~$ export PATH=/usr/bin:bin
user@ubuntu1310:~$ su
Command 'su' is available in '/bin/su'
The command could not be located because '/bin' is not included in the PATH environment variable.
su: command not found
user@ubuntu1310:~$ export PATH=/usr/bin:/bin
user@ubuntu1310:~$ su
Password:
root@ubuntu1310:/home/user# start-all.sh
starting namenode, logging to /usr/lib/hadoop/libexec/../logs/hadoop-root-namenode-ubuntu1310.out
root@localhost's password:
localhost: starting datanode, logging to /usr/lib/hadoop/libexec/../logs/hadoop-root-datanode-ubuntu1310.out
root@localhost's password:
localhost: starting secondarynamenode, logging to /usr/lib/hadoop/libexec/../logs/hadoop-root-secondarynamenode-ubuntu1310.out
starting jobtracker, logging to /usr/lib/hadoop/libexec/../logs/hadoop-root-jobtracker-ubuntu1310.out
root@localhost's password:
localhost: starting tasktracker, logging to /usr/lib/hadoop/libexec/../logs/hadoop-root-tasktracker-ubuntu1310.out
root@ubuntu1310:/home/user# hadoop fs -ls
14/01/09 05:46:56 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:46:57 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:46:58 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:46:59 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:47:00 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:47:01 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:47:02 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:47:03 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:47:04 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:47:05 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
ls: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused
root@ubuntu1310:/home/user# hadoop fs -ls
14/01/09 05:54:58 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:54:59 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:55:00 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:55:01 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:55:02 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:55:03 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:55:04 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:55:05 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:55:06 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/09 05:55:07 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
ls: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused
root@ubuntu1310:/home/user#
How can I avoid these two errors? It is very tedious for me to reformat the NameNode every time.
First of all, su has nothing to do with Hadoop. Next, you are probably getting these errors because you have not specified the hadoop.tmp.dir property in your core-site.xml. This property defaults to a directory under /tmp (/tmp/hadoop-${user.name}), and /tmp gets emptied at each restart. Thus you lose all the HDFS metadata and data and have to reformat it.
It is always good practice to set this property. It is also advisable to add the dfs.name.dir and dfs.data.dir properties to your hdfs-site.xml file.
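A minimal sketch of how one might check this before rebooting, assuming the file follows the usual configuration/property/name/value layout of a Hadoop *-site.xml. The function names are hypothetical and the check is only a heuristic (it treats anything under /tmp as volatile); /app/hadoop/tmp in the usage note is just an example path.

```python
import xml.etree.ElementTree as ET


def read_site_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def tmp_dir_survives_reboot(props):
    """True if hadoop.tmp.dir is set explicitly and does not live under /tmp."""
    tmp_dir = props.get("hadoop.tmp.dir")
    if not tmp_dir:
        return False  # unset: Hadoop falls back to a directory under /tmp
    return not (tmp_dir == "/tmp" or tmp_dir.startswith("/tmp/"))
```

For example, calling tmp_dir_survives_reboot(read_site_properties(text)) on a core-site.xml that sets hadoop.tmp.dir to /app/hadoop/tmp returns True, while a file that omits the property returns False.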
It seems that you did not set up passwordless SSH login while installing Hadoop; setting it up is recommended.
For a proper installation guide, you can refer to:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
I am trying to copy a file quangle.txt from my local system to Hadoop using the command below:
testuser@ubuntu:~/Downloads/hadoop/bin$ ./hadoop fs -copyFromLocal Desktop/quangle.txt hdfs://localhost/testuser/quangle.txt
13/11/28 06:35:50 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:51 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:52 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:53 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:54 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:55 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:56 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:57 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:58 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:59 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
copyFromLocal: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused
I tried to ping 127.0.0.1 and got a response. Please advise.
Just add the correct port (the one configured in fs.default.name) after localhost in the file path, for example:
hdfs://localhost:9000/testuser/quangle.txt
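To see why the port matters, here is a small sketch of how an hdfs:// URI without a port ends up at 8020, which is the port the client was retrying in the log above. namenode_endpoint is a hypothetical helper, not part of Hadoop, and the default of 8020 is taken from that log.

```python
from urllib.parse import urlparse

DEFAULT_NAMENODE_PORT = 8020  # the port the client fell back to in the log above


def namenode_endpoint(uri, default_port=DEFAULT_NAMENODE_PORT):
    """Return the (host, port) an hdfs:// URI points at, filling in the default port."""
    parsed = urlparse(uri)
    port = parsed.port if parsed.port is not None else default_port
    return parsed.hostname, port
```

With the URI from the question, namenode_endpoint("hdfs://localhost/testuser/quangle.txt") gives ("localhost", 8020). If the NameNode was configured on another port, nothing is listening on 8020 and the connection is refused.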
It looks like your NameNode isn't running. Try running the jps command and see whether NameNode is listed among the running services (you might have to run ps axww | grep NameNode instead if the NameNode was started by a different user).
Does sudo netstat -atnp | grep 8020 yield any results?
If the NameNode refuses to start, copy your NameNode logs into your original question (or post a new question, after first searching for the error to see whether someone else has had the same problem).
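The netstat check above asks whether anything is listening on port 8020. A rough equivalent from the client side, as a minimal sketch (port_is_open is a hypothetical helper name), is to try a plain TCP connection to the host and port shown in the log:

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused, timeouts and unreachable hosts
        return False
```

If port_is_open("localhost", 8020) returns False while the other daemons are up, that is consistent with the NameNode not running or listening on a different port.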
Try running jps to see the currently running Java processes.
Are all Hadoop processes running, especially the NameNode?
If yes, you should get this output (with different process ids):
10015 JobTracker
9670 TaskTracker
9485 DataNode
10380 Jps
9574 SecondaryNameNode
9843 NameNode
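Comparing a jps listing against this expected set by eye is error-prone, since SecondaryNameNode contains the string NameNode. A minimal sketch of the comparison follows, assuming a Hadoop 1.x single-node setup with the five daemons listed above; missing_daemons and EXPECTED_DAEMONS are hypothetical names.

```python
# Daemons expected on a Hadoop 1.x single-node setup, per the listing above.
EXPECTED_DAEMONS = {"NameNode", "SecondaryNameNode", "DataNode", "JobTracker", "TaskTracker"}


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the sorted names of expected daemons that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            # Each jps line is "<pid> <name>"; match the name exactly so that
            # SecondaryNameNode is not mistaken for NameNode.
            running.add(parts[1])
    return sorted(set(expected) - running)
```

Fed a listing that shows only JobTracker, TaskTracker, DataNode, Jps and SecondaryNameNode, it returns ["NameNode"].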
I think you can use hadoop fs -put ~/Desktop/quangle.txt /testuser. After it is copied, you can look it up in the /testuser directory with hadoop fs -ls /testuser.
First create the directories (Desktop and the others) with the command hadoop fs -mkdir testuser and then try again; it worked for me that way.
Maybe there is something wrong with your settings for pseudo-distributed mode.
It should be configured in this order:
Fill in the configuration files: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml.
Configuring SSH
Formatting the HDFS filesystem
Starting and stopping the daemons
I got all my settings right and I am able to run Hadoop (1.1.2) on a single node. However, after making the changes to the relevant files (/etc/hosts, *-site.xml), I am not able to add a DataNode to the cluster, and I keep getting the following error on the slave.
Does anybody know how to rectify this?
2013-05-13 15:36:10,135 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-05-13 15:36:11,137 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-05-13 15:36:12,140 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Check the value of fs.default.name in your core-site.xml conf file (on each node in your cluster). This needs to be the network name of the NameNode, and I suspect you have it set to hdfs://localhost:54310.
Failing that, check for any mention of localhost in your Hadoop configuration files on all nodes in your cluster:
grep localhost $HADOOP_HOME/conf/*.xml
Try replacing localhost with the NameNode's IP address or network name.
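The grep above also matches comments and descriptions. As a sketch of a narrower check, the snippet below lists only the property values that mention localhost in the *.xml files of a configuration directory. find_localhost_values is a hypothetical helper, and it assumes the usual configuration/property/name/value layout.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_localhost_values(conf_dir):
    """Return (file name, property name, value) for each value mentioning localhost."""
    hits = []
    for path in sorted(Path(conf_dir).glob("*.xml")):
        root = ET.parse(path).getroot()
        for prop in root.iter("property"):
            name = (prop.findtext("name") or "").strip()
            value = (prop.findtext("value") or "").strip()
            if "localhost" in value:
                hits.append((path.name, name, value))
    return hits
```

Run on each node's conf directory, an entry such as ("core-site.xml", "fs.default.name", "hdfs://localhost:54310") on a slave would explain why its DataNode keeps trying to reach a NameNode on its own loopback address.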