I set up a Hadoop cluster with two nodes: hadoop01 (master, 10.0.0.151) and hadoop02 (slave, 10.0.0.152).
When I run start-dfs.sh and then visit my_ip:50070 (my_ip is 10.0.0.151, as above), it works.
But when I run start-yarn.sh and then visit my_ip:8088, it fails.
My yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop01:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop01:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop01:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop01:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop01:8088</value>
</property>
</configuration>
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop01:8020</value>
</property>
</configuration>
The same settings are used on hadoop02 (slave).
My hadoop-2.2.0/etc/hadoop/slaves file on hadoop01 contains:
hadoop01
hadoop02
After running start-dfs.sh and start-yarn.sh, I run jps:
hadoop01:
21594 NameNode
22345 NodeManager
22007 SecondaryNameNode
22171 ResourceManager
23147 Jps
21762 DataNode
hadoop02:
29861 NodeManager
30358 Jps
29665 DataNode
My /etc/hosts on hadoop01:
localhost hadoop01
10.0.0.151 hadoop01
10.0.0.152 hadoop02
My /etc/hosts on hadoop02:
localhost hadoop02
10.0.0.151 hadoop01
10.0.0.152 hadoop02
The link below is my yarn-nodemanager.log, which I uploaded to Google Drive:
https://drive.google.com/file/d/0B7nCJ_XJWSrQN1BZVTVyOEgxd1E/edit?usp=sharing
The yarn-nodemanager.log doesn't show any ERROR, unless I missed something.
Please help me figure out why I can't visit http://10.0.0.151:8088
If you need other information (such as hdfs-site.xml), just tell me and I'll update the question.
netstat -tunalp | grep LISTEN
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 17442/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 17442/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 17442/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 17693/java
tcp 0 0 10.0.0.151:8020 0.0.0.0:* LISTEN 17267/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 17267/java
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN -
tcp6 0 0 :::13562 :::* LISTEN 21061/java
tcp6 0 0 10.0.0.151:8030 :::* LISTEN 20881/java
tcp6 0 0 10.0.0.151:8031 :::* LISTEN 20881/java
tcp6 0 0 10.0.0.151:8032 :::* LISTEN 20881/java
tcp6 0 0 10.0.0.151:8033 :::* LISTEN 20881/java
tcp6 0 0 :::33762 :::* LISTEN 21061/java
tcp6 0 0 :::8040 :::* LISTEN 21061/java
tcp6 0 0 :::8042 :::* LISTEN 21061/java
tcp6 0 0 :::22 :::* LISTEN -
tcp6 0 0 10.0.0.151:8088 :::* LISTEN 20881/java
After disabling IPv6, I ran netstat -tunalp | grep LISTEN again:
tcp 0 0 0.0.0.0:13562 0.0.0.0:* LISTEN 30608/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 29967/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 29967/java
tcp 0 0 10.0.0.151:8030 0.0.0.0:* LISTEN 30424/java
tcp 0 0 10.0.0.151:8031 0.0.0.0:* LISTEN 30424/java
tcp 0 0 0.0.0.0:52992 0.0.0.0:* LISTEN 30608/java
tcp 0 0 10.0.0.151:8032 0.0.0.0:* LISTEN 30424/java
tcp 0 0 10.0.0.151:8033 0.0.0.0:* LISTEN 30424/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 29967/java
tcp 0 0 0.0.0.0:8040 0.0.0.0:* LISTEN 30608/java
tcp 0 0 0.0.0.0:8042 0.0.0.0:* LISTEN 30608/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 30222/java
tcp 0 0 10.0.0.151:8020 0.0.0.0:* LISTEN 29790/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 29790/java
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN -
tcp 0 0 10.0.0.151:8088 0.0.0.0:* LISTEN 30424/java
tcp6 0 0 :::22 :::* LISTEN -
The web UI is bound to 10.0.0.151:8088; you should change that to 0.0.0.0:8088 so that it listens on all interfaces. To do so, open yarn-site.xml and add:
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>0.0.0.0:8088</value>
</property>
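To confirm which address a port is bound to before and after this change, the netstat listing from the question can be parsed. The following is a minimal Python sketch and not part of the original answer: the function name `bind_addresses` is hypothetical, and it assumes the `netstat -tunalp` column layout shown in the question, where the fourth column is the local address.

```python
def bind_addresses(netstat_text, port):
    """Return the set of local addresses that listen on `port`.

    Assumes the `netstat -tunalp` layout from the question, where the
    fourth column is the local address in `address:port` form.
    """
    addresses = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or "LISTEN" not in fields:
            continue
        # rpartition keeps IPv6 wildcards such as ":::8040" intact ("::", "8040").
        address, _, local_port = fields[3].rpartition(":")
        if local_port == str(port):
            addresses.add(address)
    return addresses


# Two lines copied from the listing in the question.
sample = """\
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 29790/java
tcp 0 0 10.0.0.151:8088 0.0.0.0:* LISTEN 30424/java
"""
print(bind_addresses(sample, 8088))   # {'10.0.0.151'}
print(bind_addresses(sample, 50070))  # {'0.0.0.0'}
```

A result of `0.0.0.0` (or `::`) means the service accepts connections on every interface; a single specific address means it only answers on that one.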
The problem here is that the ResourceManager is running, but the ports it occupies (8030, 8031, 8032, 8033, 8088) use tcp6 instead of tcp (see the leftmost column of the netstat output). You have two options. Either disable IPv6 on the Linux system and then restart the YARN services,
or
modify yarn-site.xml on the master node only, as follows. Don't modify yarn-site.xml on the slave nodes:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Specifying the hostname causes the ports to be opened as tcp6. Even if you don't specify those addresses, they take their default values. Have a look at the default ports here:
http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
After modifying yarn-site.xml, restart your YARN services.
What worked for me was this, from "Hadoop: binding multiple IP addresses to a cluster NameNode":
In hdfs-site.xml, set the value of dfs.namenode.rpc-bind-host to 0.0.0.0 and Hadoop will listen on both the private and public network interfaces, allowing remote access and datanode access.
together with opening port 8088 in the firewall settings.
This is related to https://issues.apache.org/jira/browse/HADOOP-605. The Hadoop configuration scripts add the -Djava.net.preferIPv4Stack=true flag to force IPv4 binding, but the flag is missing from the YARN script. You can fix this by adding the following at the end of bin/yarn (before the exec):
YARN_OPTS="$YARN_OPTS -Djava.net.preferIPv4Stack=true"
Related
I set up a Hadoop cluster on two machines, and on the master node, when I tried to put a file using:
hadoop fs -put test.txt /mydata/
I got the following error:
put: File /mydata/test.txt._COPYING_ could only be written to 0 of the 1 minReplication nodes. There are 0 datanode(s) running and 0 node(s) are excluded in this operation.
When I ran hdfs dfsadmin -report, it gave me the following information:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: 0.00%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
Then, when I tried to access HDFS from the datanode with hadoop fs -ls /, it gave me the following:
INFO ipc.Client: Retrying connect to server: master/172.31.81.91:10001. Already tried 0 time(s); maxRetries=45
I set up the cluster on 2 AWS Ubuntu instances and opened all TCP/IPv4 ports. I have the following configuration.
On both machines:
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://172.31.81.91:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/namenode </value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/datanode </value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
</configuration>
/etc/hosts
127.0.0.1 localhost
172.31.81.91 master
172.31.45.232 slave-1
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
workers
172.31.45.232
When I run jps, I get:
master
12532 NameNode
12847 SecondaryNameNode
13599 Jps
datanode
5172 Jps
4810 DataNode
When I run sudo netstat -ntlp, I get:
master:
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:9870 0.0.0.0:* LISTEN 12532/java
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 696/systemd-resolve
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1106/sshd
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 809/cupsd
tcp 0 0 172.31.81.91:9000 0.0.0.0:* LISTEN 12532/java
tcp 0 0 0.0.0.0:9868 0.0.0.0:* LISTEN 12847/java
tcp6 0 0 :::80 :::* LISTEN 1176/apache2
tcp6 0 0 :::22 :::* LISTEN 1106/sshd
tcp6 0 0 ::1:631 :::* LISTEN 809/cupsd
datanode:
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:9864 0.0.0.0:* LISTEN 4810/java
tcp 0 0 0.0.0.0:9866 0.0.0.0:* LISTEN 4810/java
tcp 0 0 0.0.0.0:9867 0.0.0.0:* LISTEN 4810/java
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 691/systemd-resolve
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1142/sshd
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 854/cupsd
tcp 0 0 127.0.0.1:45029 0.0.0.0:* LISTEN 4810/java
tcp6 0 0 :::22 :::* LISTEN 1142/sshd
tcp6 0 0 ::1:631 :::* LISTEN 854/cupsd
I am using Hadoop 3.1.3. Any help would be appreciated! Thanks!
A quick answer from myself, after many tries:
If using AWS, all IPs should be public IPs.
In core-site.xml, use the public DNS name instead of the IP.
Delete the datanode's data directory after formatting. I'm not sure why, but this really solved my problem.
Thanks for the help!
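The answer does not say why deleting the datanode directory helps. One common cause, offered here only as an assumption about this case, is that reformatting the NameNode generates a new clusterID, which then no longer matches the clusterID stored in the DataNode's `current/VERSION` file, so the DataNode fails to register. The Python sketch below compares the two IDs; the function names and the sample file contents are hypothetical.

```python
def read_cluster_id(version_text):
    """Extract the clusterID value from the text of a VERSION file."""
    for line in version_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "clusterID":
            return value.strip()
    return None


def cluster_ids_match(namenode_version, datanode_version):
    """True only if both files carry the same, non-empty clusterID."""
    nn_id = read_cluster_id(namenode_version)
    dn_id = read_cluster_id(datanode_version)
    return nn_id is not None and nn_id == dn_id


# Hypothetical file contents; the real files are expected under
# <dfs.name.dir>/current/VERSION and <dfs.data.dir>/current/VERSION.
namenode_version = "namespaceID=1\nclusterID=CID-aaaa\nstorageType=NAME_NODE\n"
datanode_version = "storageID=DS-1\nclusterID=CID-bbbb\nstorageType=DATA_NODE\n"
print(cluster_ids_match(namenode_version, datanode_version))  # False
```

If the IDs differ, clearing the datanode directory (as the answer did) lets the DataNode pick up the new clusterID on its next start, at the cost of any blocks stored there.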
I am trying to install Cloudera CDH 5.14.0 and am facing the error below during cluster installation.
Installation failed. Failed to receive heartbeat from agent.
Ensure that the host's hostname is configured properly.
Ensure that port 7182 is accessible on the Cloudera Manager Server (check firewall rules).
Ensure that ports 9000 and 9001 are not in use on the host being added.
Check agent logs in /var/log/cloudera-scm-agent/ on the host being added. (Some of the logs can be found in the installation details).
If Use TLS Encryption for Agents is enabled in Cloudera Manager (Administration -> Settings -> Security), ensure that /etc/cloudera-scm-agent/config.ini has use_tls=1 on the host being added. Restart the corresponding agent and click the Retry link here.
Detail Log:
>>[28/Jan/2018 10:30:44 +0000] 12771 MainThread agent INFO Neither verify_cert_file nor verify_cert_dir are configured. Not performing validation of server certificates in HTTPS communication. These options can be configured in this agent's config.ini file to enable certificate validation.
>>[28/Jan/2018 10:30:44 +0000] 12771 MainThread agent INFO Agent Logging Level: INFO
>>[28/Jan/2018 10:30:44 +0000] 12771 MainThread agent INFO No command line vars
>>[28/Jan/2018 10:30:44 +0000] 12771 MainThread agent INFO Found database jar: /usr/share/java/mysql-connector-java.jar
>>[28/Jan/2018 10:30:44 +0000] 12771 MainThread agent INFO Missing database jar: /usr/share/java/oracle-connector-java.jar (normal, if you're not using this database type)
>>[28/Jan/2018 10:30:44 +0000] 12771 MainThread agent INFO Found database jar: /usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar
>>[28/Jan/2018 10:30:44 +0000] 12771 MainThread agent INFO Agent starting as pid 12771 user root(0) group root(0).
END (0)
end of agent logs.
scm agent restarted
Local ubuntu host details:
chaithu@chaithu:~$ hostname
chaithu
chaithu@chaithu:~$ hostname -f
chaithu
chaithu@chaithu:~$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 chaithu
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
chaithu@chaithu:~$ sudo netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:7180 0.0.0.0:* LISTEN 1353/java
tcp 0 0 0.0.0.0:7182 0.0.0.0:* LISTEN 1353/java
tcp 0 0 0.0.0.0:4433 0.0.0.0:* LISTEN 12930/python2.7
tcp 0 0 127.0.1.1:53 0.0.0.0:* LISTEN 1162/dnsmasq
tcp 0 0 127.0.0.1:7190 0.0.0.0:* LISTEN 12930/python2.7
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1031/sshd
tcp 0 0 0.0.0.0:7191 0.0.0.0:* LISTEN 12930/python2.7
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 5112/cupsd
tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN 1077/postgres
tcp 0 0 127.0.0.1:19001 0.0.0.0:* LISTEN 1632/python
tcp 0 0 127.0.1.1:9000 0.0.0.0:* LISTEN 12771/python2.7
tcp 0 0 0.0.0.0:7432 0.0.0.0:* LISTEN 1434/postgres
tcp 0 0 127.0.0.1:3306 0.0.0.0:* LISTEN 1045/mysqld
tcp6 0 0 :::80 :::* LISTEN 1441/apache2
tcp6 0 0 :::4434 :::* LISTEN 12930/python2.7
tcp6 0 0 :::22 :::* LISTEN 1031/sshd
tcp6 0 0 :::7191 :::* LISTEN 12930/python2.7
tcp6 0 0 ::1:631 :::* LISTEN 5112/cupsd
tcp6 0 0 :::7432 :::* LISTEN 1434/postgres
udp 0 0 0.0.0.0:57656 0.0.0.0:* 902/avahi-daemon: r
udp 0 0 0.0.0.0:631 0.0.0.0:* 5113/cups-browsed
udp 0 0 0.0.0.0:5353 0.0.0.0:* 2703/chrome
udp 0 0 0.0.0.0:5353 0.0.0.0:* 902/avahi-daemon: r
udp 0 0 0.0.0.0:39330 0.0.0.0:* 1162/dnsmasq
udp 0 0 0.0.0.0:7191 0.0.0.0:* 12930/python2.7
udp 0 0 127.0.1.1:53 0.0.0.0:* 1162/dnsmasq
udp 0 0 0.0.0.0:68 0.0.0.0:* 1117/dhclient
udp6 0 0 :::5353 :::* 2703/chrome
udp6 0 0 :::5353 :::* 902/avahi-daemon: r
udp6 0 0 :::46382 :::* 902/avahi-daemon: r
udp6 0 0 :::7191 :::* 12930/python2.7
I am trying to install CDH on a single-node local machine only. Let me know if any more information is required.
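The first hint in the error message is to check that the host's hostname is configured properly. In the listing above, the hostname is mapped to 127.0.1.1 in /etc/hosts and the agent listens on 127.0.1.1:9000. Whether that is the cause of the missing heartbeat is not confirmed by the post, but the mapping can be checked with a short Python sketch. The function name `hostname_report` and the stub resolver are assumptions for illustration.

```python
import ipaddress
import socket


def hostname_report(hostname=None, resolver=socket.gethostbyname):
    """Resolve `hostname` and report whether it maps to a loopback address.

    `resolver` is injectable so the logic can be exercised without DNS.
    """
    hostname = hostname or socket.gethostname()
    address = resolver(hostname)
    return {
        "hostname": hostname,
        "address": address,
        "loopback": ipaddress.ip_address(address).is_loopback,
    }


# Example with a stub resolver that mimics the /etc/hosts shown above.
print(hostname_report("chaithu", resolver=lambda name: "127.0.1.1"))
# {'hostname': 'chaithu', 'address': '127.0.1.1', 'loopback': True}
```

Calling `hostname_report()` with no arguments uses the machine's own hostname and the system resolver.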
I'm a total Hadoop newbie. I set up Hadoop on two machines, master and slave, following this tutorial (I got the same error following this other tutorial).
Problem: After starting dfs and yarn, the only node appearing on localhost:50070 is the master, even if the right processes are running on the master (NameNode, DataNode, SecondaryNameNode, ResourceManager) and on the slave (DataNode).
The nodemanager log of the slave reports: INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.0.14:8031. Already tried 10 times.
Note that:
I have already edited yarn-site.xml following this thread.
I have disabled the firewall on the master
I ran netstat -anp | grep 8031 on the master and confirmed that there are a couple of processes listening on port 8031 using tcp.
I ran into the same problem, and these steps may help.
Make sure your namenodes can communicate with your datanodes.
hadoop@namenode-01:~$ jps
8678 NameNode
9530 WebAppProxyServer
9115 ResourceManager
8940 SecondaryNameNode
9581 Jps
hadoop@datanode-01:~$ jps
8592 NodeManager
8715 Jps
8415 DataNode
Check what your nodemanager-datanode-01.log in $HADOOP_HOME/logs reports.
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode-01/192.168.4.2:8031. Already tried 8 time(s);
Check whether your yarn.resourcemanager.resource-tracker.address is listening on IPv6 instead of IPv4.
hadoop@namenode-01:~$ netstat -natp
tcp6 0 0 127.0.2.1:8088 :::* LISTEN 9115/java
tcp6 0 0 127.0.2.1:8030 :::* LISTEN 9115/java
tcp6 0 0 127.0.2.1:8031 :::* LISTEN 9115/java
tcp6 0 0 127.0.2.1:8032 :::* LISTEN 9115/java
tcp6 0 0 127.0.2.1:8033 :::* LISTEN 9115/java
If your YARN addresses listen on IPv6, you may need to disable IPv6 first.
sysctl -w net.ipv6.conf.all.disable_ipv6=1
sysctl -w net.ipv6.conf.default.disable_ipv6=1
Finally, the default YARN addresses look like this:
yarn.resourcemanager.resource-tracker.address ${yarn.resourcemanager.hostname}:8031
Check your /etc/hosts to avoid misconfigurations.
hadoop@namenode-01:~$ cat /etc/hosts
127.0.2.1 namenode-01 namenode-01 (for example, this line maps the hostname to 127.0.2.1; delete this line)
192.168.4.2 namenode-01
192.168.4.3 datanode-01
192.168.4.4 datanode-02
192.168.4.5 datanode-03
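The /etc/hosts check in the last step can be automated. The sketch below, written in Python as an illustration rather than as part of the original answer, flags any hostname other than the usual localhost aliases that is mapped to a loopback address. The function name `loopback_hostnames` and the set of ignored names are assumptions.

```python
import ipaddress

# Names that are expected to point at loopback (an assumed, minimal list).
LOOPBACK_NAMES = {"localhost", "ip6-localhost", "ip6-loopback"}


def loopback_hostnames(hosts_text):
    """Return (address, name) pairs where a non-localhost name maps to loopback."""
    flagged = []
    for line in hosts_text.splitlines():
        entry = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(entry) < 2:
            continue
        try:
            address = ipaddress.ip_address(entry[0])
        except ValueError:
            continue  # first column is not an IP address
        if not address.is_loopback:
            continue
        for name in entry[1:]:
            pair = (entry[0], name)
            if name not in LOOPBACK_NAMES and pair not in flagged:
                flagged.append(pair)
    return flagged


hosts = """\
127.0.0.1 localhost
127.0.2.1 namenode-01 namenode-01
192.168.4.2 namenode-01
192.168.4.3 datanode-01
"""
print(loopback_hostnames(hosts))  # [('127.0.2.1', 'namenode-01')]
```

Any pair it returns is a candidate for the kind of line the answer says to delete, since services that bind to that hostname end up listening on loopback only.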
I installed a fresh hadoop-2.2.0 today and found that, after HDFS is started (using /sbin/start-dfs.sh), the namenode and datanode both always listen on 0.0.0.0 at a random port. I cannot find any related configuration at http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml.
The port is not one of 50070, 50470, 50090, 50010, 50020, 50075, 50475, etc., which are listed in hdfs-default.xml; it is just a random port.
8369 Jps
8109 DataNode
7936 NameNode
The namenode listens on the following:
tcp 0 0 0.0.0.0:46628 0.0.0.0:* LISTEN 7936/java <==
tcp 0 0 10.173.130.119:9000 0.0.0.0:* LISTEN 7936/java
tcp 0 0 10.173.130.119:50070 0.0.0.0:* LISTEN 7936/java
The datanode listens on the following:
tcp 0 0 10.173.130.119:50020 0.0.0.0:* LISTEN 8109/java
tcp 0 0 0.0.0.0:35114 0.0.0.0:* LISTEN 8109/java <==
tcp 0 0 10.173.130.119:50010 0.0.0.0:* LISTEN 8109/java
tcp 0 0 10.173.130.119:50075 0.0.0.0:* LISTEN 8109/java
Thanks for any advice.
Yes, it assigns a random port every time a namenode or datanode is restarted. But notice that all the namenode listeners run under the same process ID (7936 in this case) and all the datanode listeners run under the same process ID (8109), so internally it is the same process.
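The observation that the random port belongs to the same process as the configured ports can be checked by grouping the netstat lines by PID. This is a minimal Python sketch under the assumption that the PID appears as a `pid/program` token, as in the listing from the question; the function name `ports_by_pid` is hypothetical.

```python
import re
from collections import defaultdict

# Matches the "PID/Program name" token, e.g. "7936/java".
PID_PROGRAM = re.compile(r"^(\d+)/\S+$")


def ports_by_pid(netstat_text):
    """Map each PID to the sorted list of ports it listens on."""
    ports = defaultdict(set)
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or "LISTEN" not in fields:
            continue
        port = fields[3].rpartition(":")[2]
        if not port.isdigit():
            continue
        # Search all fields, since trailing markers like "<==" may follow the PID.
        for field in fields:
            match = PID_PROGRAM.match(field)
            if match:
                ports[int(match.group(1))].add(int(port))
                break
    return {pid: sorted(found) for pid, found in ports.items()}


# Lines copied from the listing in the question.
listing = """\
tcp 0 0 0.0.0.0:46628 0.0.0.0:* LISTEN 7936/java <==
tcp 0 0 10.173.130.119:9000 0.0.0.0:* LISTEN 7936/java
tcp 0 0 10.173.130.119:50070 0.0.0.0:* LISTEN 7936/java
tcp 0 0 0.0.0.0:35114 0.0.0.0:* LISTEN 8109/java <==
"""
print(ports_by_pid(listing))  # {7936: [9000, 46628, 50070], 8109: [35114]}
```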
I have successfully set up a Hadoop cluster with 6 nodes (master, slave<1-5>).
Formatted the namenode -> done
Starting up and shutting down the cluster -> works fine
Executing "hadoop dfs -ls /" gives this error -> Error: INFO ipc.Client: Retrying connect to server: localhost
I tried to see which services are running using:
sudo netstat -plten | grep java
hduser@ubuntu:~$ sudo netstat -plten | grep java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 1000 93307 11384/java
tcp 0 0 0.0.0.0:44440 0.0.0.0:* LISTEN 1000 92491 11571/java
tcp 0 0 0.0.0.0:40633 0.0.0.0:* LISTEN 1000 92909 11758/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 1000 93449 11571/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 1000 93673 11571/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 1000 93692 11571/java
tcp 0 0 127.0.0.1:40485 0.0.0.0:* LISTEN 1000 93666 12039/java
tcp 0 0 0.0.0.0:44582 0.0.0.0:* LISTEN 1000 93013 11852/java
tcp 0 0 10.42.43.1:54310 0.0.0.0:* LISTEN 1000 92471 11384/java
tcp 0 0 10.42.43.1:54311 0.0.0.0:* LISTEN 1000 93290 11852/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 1000 93460 11758/java
tcp 0 0 0.0.0.0:34154 0.0.0.0:* LISTEN 1000 92179 11384/java
tcp 0 0 0.0.0.0:50060 0.0.0.0:* LISTEN 1000 94200 12039/java
tcp 0 0 0.0.0.0:50030 0.0.0.0:* LISTEN 1000 93550 11852/java
It's the master IP that is bound to ports 54310 and 54311, not localhost (loopback).
The core-site.xml has been properly configured:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/hadoop/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
</property>
</configuration>
Why is it expecting localhost to be bound to 54310 rather than the master, which I have configured here? Help appreciated. How do I resolve this?
Cheers
Apparently, someone had added the older Hadoop (1.0.3) bin directory to the PATH variable before I added the new Hadoop (1.0.4) bin directory. Thus, whenever I ran "hadoop" from the CLI, it executed the binaries of the older Hadoop rather than the new one.
Solution:
Remove the entire bin path of the older Hadoop
Shut down the cluster - Exit the terminal
Log in with a new terminal session
Start up the node
Tried hadoop dfs -ls / -> Works fine! Good lesson learnt.
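A quick way to spot this kind of shadowing is to list every `hadoop` executable on the PATH in lookup order; the first one listed is the one the shell runs. The following Python sketch illustrates the idea and is not part of the original answer; the function name `find_all_on_path` is hypothetical.

```python
import os


def find_all_on_path(name, path=None):
    """Return every executable called `name` on PATH, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


# Print every match; more than one line means an earlier entry shadows a later one.
for location in find_all_on_path("hadoop"):
    print(location)
```

Unlike `shutil.which`, which stops at the first match, this keeps going so that a second, shadowed installation becomes visible.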
It looks like many people have run into this problem.
There might be no need to change /etc/hosts; just make sure that the master and slave can reach each other, and that your core-site.xml files are the same and point to the right master node and port number.
Then run $HADOOP/bin/stop-all.sh and $HADOOP/bin/start-all.sh on the master node ONLY (running them on a slave might lead to problems). Use jps to check whether all services are there, as follows.
On master node:
4353 DataNode
4640 JobTracker
4498 SecondaryNameNode
4788 TaskTracker
4989 Jps
4216 NameNode
On slave node:
3143 Jps
2827 DataNode
2960 TaskTracker
In addition, check your firewall rules between the namenode and datanode.
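Reachability between the nodes, including the effect of firewall rules, can be tested from the slave with a plain TCP connection attempt. This is a minimal Python sketch, assuming only the standard `socket` module; the function name `can_connect` is hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, running `can_connect("master", 54310)` on a slave node checks the namenode port from the core-site.xml shown earlier. A `False` result points to a firewall rule, a wrong bind address, or a hostname that does not resolve.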