I have a Vagrant machine running a local Hadoop installation. Hadoop was working fine until today. Today Vagrant's insecure SSH key stopped working so I had to replace it. Now Hadoop is not working. In the logs I see:
17/09/18 09:35:41 INFO ipc.Client: Retrying connect to server: mymachine/192.168.33.10:8020. Already tried 0 time(s); maxRetries=45
17/09/18 09:36:01 INFO ipc.Client: Retrying connect to server: mymachine/192.168.33.10:8020. Already tried 1 time(s); maxRetries=45
17/09/18 09:36:21 INFO ipc.Client: Retrying connect to server: mymachine/192.168.33.10:8020. Already tried 2 time(s); maxRetries=45
17/09/18 09:36:41 INFO ipc.Client: Retrying connect to server: mymachine/192.168.33.10:8020. Already tried 3 time(s); maxRetries=45
The claim here is that it's a datanode -> namenode communication issue. core-site.xml contains:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mymachine:8020</value>
</property>
</configuration>
That is correct. Running getent hosts mymachine yields 192.168.33.10, so the hostname resolves fine. I tried sudo netstat -antp | grep 8020 and got:
tcp 0 1 10.0.2.15:42002 192.168.33.10:8020 SYN_SENT 2630/java
tcp 0 1 10.0.2.15:42004 192.168.33.10:8020 SYN_SENT 2772/java
tcp 0 1 10.0.2.15:41998 192.168.33.10:8020 SYN_SENT 3312/java
So it appears that the port is also ok. However, when I do curl http://mymachine:8020 I get no reply. I checked on an identical machine, and the correct reply should be: "It looks like you are making an HTTP request to a Hadoop IPC port. This is not the correct port for the web interface on this daemon."
Any ideas?
Here are a few things to check, in my opinion:
1. Check whether you can ssh to localhost without a password.
2. Check which user, and with what permissions, you start Hadoop as.
3. For a local Hadoop install it should be 127.0.0.1:8020, because then Hadoop can keep running correctly even while the network is disconnected.
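One way to narrow this down: the SYN_SENT state in the netstat output means the TCP handshake has not completed, so those clients are still waiting for any answer from 192.168.33.10:8020. Below is a minimal Python sketch that tells "nothing is listening" apart from "packets are dropped or unroutable". The function name probe_port and the 3-second timeout are my own choices for illustration, not anything provided by Hadoop.

```python
import socket


def probe_port(host, port, timeout=3.0):
    """Classify a TCP connection attempt to host:port.

    Returns "open" if the handshake completes, "refused" if the host
    answers but nothing listens on the port, "timeout" if no answer
    arrives in time (firewall, wrong interface, or host down), and
    "error: ..." for anything else, such as a name that fails to resolve.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return "error: %s" % exc
```

Calling probe_port("mymachine", 8020) inside the VM and getting "timeout" would be consistent with the SYN_SENT lines and would suggest checking which address the NameNode is actually listening on (for example with netstat -tlnp). A "refused" result would rather suggest that the NameNode is not running at all.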
Should I be able to run the command:
hadoop dfs -ls
from slave node?
Currently I cannot; I get the following error:
13/11/01 14:58:03 INFO ipc.Client: Retrying connect to server: ec2-54-200-245-100.us-west-2.compute.amazonaws.com/172.31.17.66:9000. Already tried 0 time(s).
13/11/01 14:58:04 INFO ipc.Client: Retrying connect to server: ec2-54-200-245-100.us-west-2.compute.amazonaws.com/172.31.17.66:9000. Already tried 1 time(s).
13/11/01 14:58:05 INFO ipc.Client: Retrying connect to server: ec2-54-200-245-100.us-west-2.compute.amazonaws.com/172.31.17.66:9000. Already tried 2 time(s).
13/11/01 14:58:06 INFO ipc.Client: Retrying connect to server: ec2-54-200-245-100.us-west-2.compute.amazonaws.com/172.31.17.66:9000. Already tried 3 time(s).
13/11/01 14:58:07 INFO ipc.Client: Retrying connect to server: ec2-54-200-245-100.us-west-2.compute.amazonaws.com/172.31.17.66:9000. Already tried 4 time(s).
13/11/01 14:58:08 INFO ipc.Client: Retrying connect to server: ec2-54-200-245-100.us-west-2.compute.amazonaws.com/172.31.17.66:9000. Already tried 5 time(s).
13/11/01 14:58:09 INFO ipc.Client: Retrying connect to server: ec2-54-200-245-100.us-west-2.compute.amazonaws.com/172.31.17.66:9000. Already tried 6 time(s).
13/11/01 14:58:10 INFO ipc.Client: Retrying connect to server: ec2-54-200-245-100.us-west-2.compute.amazonaws.com/172.31.17.66:9000. Already tried 7 time(s).
13/11/01 14:58:11 INFO ipc.Client: Retrying connect to server: ec2-54-200-245-100.us-west-2.compute.amazonaws.com/172.31.17.66:9000. Already tried 8 time(s).
13/11/01 14:58:12 INFO ipc.Client: Retrying connect to server: ec2-54-200-245-100.us-west-2.compute.amazonaws.com/172.31.17.66:9000. Already tried 9 time(s).
Bad connection to FS. command aborted.
On your slave nodes, check the fs.default.name property in core-site.xml and make sure it points to your namenode.
Since you seem to be on EC2, it should be something like:
<property>
<name>fs.default.name</name>
<value>hdfs://namenode.ec2.demdex.com:9000</value>
</property>
I got all my settings right and I am able to run Hadoop (1.1.2) on a single node. However, after making the changes to the relevant files (/etc/hosts, *-site.xml), I am not able to add a DataNode to the cluster and I keep getting the following error on the slave.
Does anybody know how to rectify this?
2013-05-13 15:36:10,135 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-05-13 15:36:11,137 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-05-13 15:36:12,140 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Check the value of fs.default.name in your core-site.xml conf file (on each node in your cluster). This needs to be the network name of the name node, and I suspect you have it set to hdfs://localhost:54310.
Failing that, check for any mention of localhost in your Hadoop configuration files on all nodes in your cluster:
grep localhost $HADOOP_HOME/conf/*.xml
Try replacing localhost with the namenode's IP address or network name.
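If you would rather see which properties are affected than raw grep matches, here is a small Python sketch that parses a *-site.xml file and lists the properties whose value mentions localhost or 127.0.0.1. The function name loopback_properties is made up for this example, and the sample XML (including the master:54311 value) is illustrative only, not taken from the question.

```python
import xml.etree.ElementTree as ET

LOOPBACK_MARKERS = ("localhost", "127.0.0.1")


def loopback_properties(xml_text):
    """Return (name, value) pairs from a Hadoop *-site.xml document
    whose value mentions localhost or 127.0.0.1."""
    root = ET.fromstring(xml_text)
    hits = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if any(marker in value for marker in LOOPBACK_MARKERS):
            hits.append((name, value))
    return hits


# Illustrative sample, not the asker's real configuration.
sample = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:54311</value>
  </property>
</configuration>"""

for name, value in loopback_properties(sample):
    print(name, "->", value)
```

On a real node you would read each file under $HADOOP_HOME/conf and pass its text to the function; any property it reports on a slave is a candidate for the "Retrying connect to server: localhost/127.0.0.1" symptom.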
I am working with a 2 node fully distributed hadoop cluster. I am trying to connect tasktracker to run on the slave node but it is not able to connect to my 9000/9001 ports. Below are the config files so if anyone spots something then please holler!
Error message from Tasktracker (ran using start-all on master)
2012-12-19 09:33:03,161 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-12-19 09:33:03,316 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-12-19 09:33:03,320 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-12-19 09:33:03,320 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: TaskTracker metrics system started
2012-12-19 09:33:03,888 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-12-19 09:33:04,502 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2012-12-19 09:33:04,755 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2012-12-19 09:33:04,799 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-12-19 09:33:04,807 INFO org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with owner as hadoop
2012-12-19 09:33:04,813 INFO org.apache.hadoop.mapred.TaskTracker: Good mapred local directories are: /tmp/hadoop-hadoop/mapred/local
2012-12-19 09:33:04,826 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2012-12-19 09:33:04,856 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2012-12-19 09:33:04,857 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source TaskTrackerMetrics registered.
2012-12-19 09:33:04,920 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2012-12-19 09:33:04,923 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort38644 registered.
2012-12-19 09:33:04,926 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort38644 registered.
2012-12-19 09:33:04,929 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2012-12-19 09:33:04,931 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 38644: starting
2012-12-19 09:33:04,931 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 38644: starting
2012-12-19 09:33:04,932 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 38644: starting
2012-12-19 09:33:04,932 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 38644: starting
2012-12-19 09:33:04,933 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 38644: starting
2012-12-19 09:33:04,935 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:38644
2012-12-19 09:33:04,935 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_10.77.26.116:localhost/127.0.0.1:38644
2012-12-19 09:33:05,980 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).
2012-12-19 09:33:06,982 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 1 time(s).
2012-12-19 09:33:07,985 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 2 time(s).
2012-12-19 09:33:08,987 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 3 time(s).
2012-12-19 09:33:09,989 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 4 time(s).
2012-12-19 09:33:10,991 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 5 time(s).
2012-12-19 09:33:11,994 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 6 time(s).
2012-12-19 09:33:12,996 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 7 time(s).
2012-12-19 09:33:13,998 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 8 time(s).
2012-12-19 09:33:15,001 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 9 time(s).
2012-12-19 09:33:15,004 INFO org.apache.hadoop.ipc.RPC: Server at ipdiscovermaster.cloudapp.net/168.63.72.148:9001 not available yet, Zzzzz...
2012-12-19 09:33:17,009 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).
2012-12-19 09:33:18,011 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 1 time(s).
2012-12-19 09:33:19,013 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 2 time(s).
2012-12-19 09:33:20,015 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 3 time(s).
2012-12-19 09:33:21,018 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 4 time(s).
2012-12-19 09:33:22,020 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 5 time(s).
2012-12-19 09:33:23,022 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 6 time(s).
2012-12-19 09:33:24,026 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 7 time(s).
2012-12-19 09:33:25,033 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 8 time(s).
2012-12-19 09:33:26,036 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 9 time(s).
2012-12-19 09:33:26,039 INFO org.apache.hadoop.ipc.RPC: Server at ipdiscovermaster.cloudapp.net/168.63.72.148:9001 not available yet, Zzzzz...
2012-12-19 09:33:28,044 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).
2012-12-19 09:33:29,045 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 1 time(s).
2012-12-19 09:33:30,048 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 2 time(s).
2012-12-19 09:33:31,051 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 3 time(s).
2012-12-19 09:33:32,055 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 4 time(s).
2012-12-19 09:33:33,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 5 time(s).
2012-12-19 09:33:34,060 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 6 time(s).
2012-12-19 09:33:35,063 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 7 time(s).
2012-12-19 09:33:36,071 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 8 time(s).
2012-12-19 09:33:37,073 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 9 time(s).
2012-12-19 09:33:37,083 INFO org.apache.hadoop.ipc.RPC: Server at ipdiscovermaster.cloudapp.net/168.63.72.148:9001 not available yet, Zzzzz...
2012-12-19 09:33:39,086 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).
2012-12-19 09:33:40,094 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 1 time(s).
2012-12-19 09:33:41,097 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 2 time(s).
2012-12-19 09:33:42,101 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 3 time(s).
2012-12-19 09:33:43,104 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 4 time(s).
2012-12-19 09:33:44,107 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 5 time(s).
2012-12-19 09:33:45,113 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 6 time(s).
2012-12-19 09:33:46,118 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 7 time(s).
2012-12-19 09:33:47,122 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 8 time(s).
2012-12-19 09:33:48,131 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 9 time(s).
2012-12-19 09:33:48,134 INFO org.apache.hadoop.ipc.RPC: Server at ipdiscovermaster.cloudapp.net/168.63.72.148:9001 not available yet, Zzzzz...
2012-12-19 09:33:50,137 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).
2012-12-19 09:33:51,140 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 1 time(s).
2012-12-19 09:33:52,143 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 2 time(s).
2012-12-19 09:33:53,145 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 3 time(s).
2012-12-19 09:33:54,148 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 4 time(s).
2012-12-19 09:33:55,151 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 5 time(s).
2012-12-19 09:33:56,154 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 6 time(s).
2012-12-19 09:33:57,158 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 7 time(s).
2012-12-19 09:33:58,161 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 8 time(s).
2012-12-19 09:33:59,167 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 9 time(s).
2012-12-19 09:33:59,169 INFO org.apache.hadoop.ipc.RPC: Server at ipdiscovermaster.cloudapp.net/168.63.72.148:9001 not available yet, Zzzzz...
2012-12-19 09:34:01,173 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).
2012-12-19 09:34:02,175 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 1 time(s).
2012-12-19 09:34:03,178 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 2 time(s).
2012-12-19 09:34:04,181 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 3 time(s).
2012-12-19 09:34:05,183 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 4 time(s).
2012-12-19 09:34:06,189 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 5 time(s).
2012-12-19 09:34:07,191 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 6 time(s).
2012-12-19 09:34:08,193 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 7 time(s).
2012-12-19 09:34:09,195 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 8 time(s).
2012-12-19 09:34:10,196 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 9 time(s).
2012-12-19 09:34:10,199 INFO org.apache.hadoop.ipc.RPC: Server at ipdiscovermaster.cloudapp.net/168.63.72.148:9001 not available yet, Zzzzz...
2012-12-19 09:34:12,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).
MASTER hosts file
#127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
#10.77.42.2 ipdiscovermaster.cloudapp.net
ipdiscoverreg1.cloudapp.net
#10.76.174.108 ipdiscoverreg1.cloudapp.net
ipdiscovermaster.cloudapp.net
MASTER core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://ipdiscovermaster.cloudapp.net:9000</value>
</property>
</configuration>
MASTER mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>ipdiscovermaster.cloudapp.net:9001</value>
</property>
</configuration>
MASTER masters file
ipdiscovermaster.cloudapp.net
MASTER slaves file
ipdiscovermaster.cloudapp.net
ipdiscoverreg1.cloudapp.net
SLAVE hosts file
#127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
#10.77.42.2 ipdiscovermaster.cloudapp.net
ipdiscoverreg1.cloudapp.net
ipdiscovermaster.cloudapp.net
#10.76.174.108 ipdiscoverreg1.cloudapp.net
SLAVE core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://ipdiscovermaster.cloudapp.net:9000</value>
</property>
</configuration>
SLAVE mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>ipdiscovermaster.cloudapp.net:9001</value>
</property>
</configuration>
SLAVE masters file
ipdiscovermaster.cloudapp.net
You need to check the following possibilities.
I am assuming you have checked the log on the DataNode (192.168.135.111 slave01), which is the best way to get the exact error.
If you have formatted the NameNode:
i) delete the temp data folder
ii) recreate it
iii) give all permissions on the temp folder
iv) format the namenode
v) start the hadoop cluster
Add the IP and hostname of the slave to the /etc/hosts file of the master machine, and vice versa. Also, add the dfs.data.dir and dfs.name.dir properties to your hdfs-site.xml file. By default these point to locations under /tmp, which gets emptied at restart; as a result you may lose data and run into problems after a machine restart. Make sure name resolution works properly, as this is really important for Hadoop to function.
I had a similar problem. The logs just showed "retrying connect to server XXX". Here is what I did to solve it: modify the /etc/hosts files on the master and slave nodes, in particular each node's entry for its own hostname and corresponding IP. Don't bind the hostname to 127.0.0.1.
Original hosts file on master:
127.0.0.1 master
192.168.135.111 slave01
Original hosts file on slave:
192.168.135.110 master
127.0.0.1 slave01
Resolved hosts file on master (the master entry now uses the real IP):
192.168.135.110 master
192.168.135.111 slave01
Resolved hosts file on slave (the slave01 entry now uses the real IP):
192.168.135.110 master
192.168.135.111 slave01
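To spot this kind of entry quickly on each node, something like the following Python sketch could help. It lists hostnames that a hosts file maps to a loopback address. The function name names_on_loopback is hypothetical, and skipping names that contain "localhost" or "loopback" is a simple heuristic of mine, not a rule from Hadoop. The sample data mirrors the master's hosts files shown above.

```python
def names_on_loopback(hosts_text):
    """Return hostnames (other than localhost/loopback aliases) that a
    hosts file maps to a loopback address (127.x.x.x or ::1)."""
    flagged = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if address.startswith("127.") or address == "::1":
            for name in names:
                if "localhost" not in name and "loopback" not in name:
                    flagged.append(name)
    return flagged


original_master = """127.0.0.1 master
192.168.135.111 slave01
"""
fixed_master = """192.168.135.110 master
192.168.135.111 slave01
"""

print(names_on_loopback(original_master))  # ['master']
print(names_on_loopback(fixed_master))     # []
```

Any name reported for a node's own /etc/hosts means daemons on that node may bind to 127.0.0.1, where other machines cannot reach them.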
I was following a tutorial to install hadoop: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Now I am stuck at the "Copy local example data to HDFS" step.
The connection error I get:
12/10/26 17:29:16 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 0 time(s).
12/10/26 17:29:17 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 1 time(s).
12/10/26 17:29:18 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 2 time(s).
12/10/26 17:29:19 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 3 time(s).
12/10/26 17:29:20 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 4 time(s).
12/10/26 17:29:21 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 5 time(s).
12/10/26 17:29:22 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 6 time(s).
12/10/26 17:29:23 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 7 time(s).
12/10/26 17:29:24 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 8 time(s).
12/10/26 17:29:25 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 9 time(s).
Bad connection to FS. command aborted. exception: Call to localhost/127.0.0.1:54310 failed on connection exception: java.net.ConnectException: Connection refused
This is pretty much the same as an existing question:
Errors while running hadoop
The point is, I have disabled IPv6, as described there and in the tutorial above, but it doesn't help. Is there something I have missed?
EDIT:
I repeated the tutorial on a second machine with a freshly installed ubuntu and compared it step by step. It turned out, there was some bug in the bashrc configuration of the hduser. Afterwards it worked fine...
I get the exact same error message if I run hadoop fs <anything> when the DataNode/NameNode aren't running, so I would guess the same is happening for you.
Type jps in your terminal. If everything is running, it should look like:
16022 DataNode
16524 Jps
15434 TaskTracker
15223 JobTracker
15810 NameNode
16229 SecondaryNameNode
I would wager that your DataNode or NameNode isn't running. If anything is missing from the jps output, start it again.
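If you want to script that check, here is a rough Python sketch that compares jps output against the daemons in the listing above. The set of expected names assumes a Hadoop 1.x single-node setup like the one in the tutorial, and missing_daemons is a name I made up for this example.

```python
EXPECTED_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode",
                    "JobTracker", "TaskTracker"}


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemon names absent from jps output, sorted."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # each line looks like "<pid> <class name>"
            running.add(parts[1])
    return sorted(set(expected) - running)


# Same listing as above, but with the NameNode line removed.
sample = """16022 DataNode
16524 Jps
15434 TaskTracker
15223 JobTracker
16229 SecondaryNameNode"""

print(missing_daemons(sample))  # ['NameNode']
```

To run it against a live machine, the text could come from subprocess.run(["jps"], capture_output=True, text=True).stdout, assuming jps is on the PATH of the user running Hadoop.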
After the whole configuration is done, run this command (note that formatting erases any existing HDFS data):
hadoop namenode -format
Then start all services with this command:
start-all.sh
This will solve your problem.
Go to your etc/hadoop/core-site.xml and check the value of fs.default.name.
It should be as shown below.
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
After the whole configuration is done, run this command:
hadoop namenode -format
Then start all services with this command:
start-all.sh
This will solve your problem.
Your namenode may be in safe mode. Run bin/hdfs dfsadmin -safemode leave or bin/hadoop dfsadmin -safemode leave,
then follow step 2 and step 3.