Apache Hadoop multi-node cluster not showing remote DataNode as live - hadoop

I'm having a hard time setting up a multi-node cluster. I have a Razer running Ubuntu 20.04 and an iMac running OSX Catalina. The Razer is the namenode host, and both the Razer and the iMac are set up as datanodes (slave workers). Both computers have SSH keys exchanged so they can connect over SSH without a password. However, the remote datanode on the iMac does not show up as Live on my Hadoop dashboard, while the datanode on the Razer does. I think it has something to do with my remote Mac not being able to connect to HDFS, which I set in core-site.xml as hdfs://hadoopmaster:9000.
RAZER = Hostname: Hadoopmaster
IMAC = Hostname: Hadoopslave
Based on some troubleshooting, I reviewed the datanode logs on the iMac and saw that it is being refused a connection to hadoopmaster on port 9000.
2020-06-01 13:44:33,193 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
hadoopmaster/192.168.1.191:8070. Already tried 6 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:35,550 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
hadoopmaster/192.168.1.191:8070. Already tried 7 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:36,574 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
hadoopmaster/192.168.1.191:8070. Already tried 8 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:37,597 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
hadoopmaster/192.168.1.191:8070. Already tried 9 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:37,619 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem
connecting to server: hadoopmaster/192.168.1.191:8070
2020-06-01 13:44:44,660 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
hadoopmaster/192.168.1.191:8070. Already tried 0 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:45,534 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED
SIGNAL 15: SIGTERM
2020-06-01 13:44:45,537 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
Here are my settings:
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoopmaster:8070</value>
  </property>
</configuration>
So I think there is a problem connecting to port 9000 on my machine. My next step was to test the SSH connection from the iMac's terminal:
IMAC Command: ssh username@hadoopmaster -p 9000
Results:
Refused to connect
Next, I ran the same SSH command locally on my Razer machine:
Razer Command: ssh hadoopmaster -p 9000
Results:
Refused to connect
So, on my Razer, I modified the UFW firewall to open port 9000 (allow from any to hadoopmaster, all ports), and still no luck.
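A caveat on the ssh tests above: `ssh -p 9000` only checks whether an SSH daemon is listening on port 9000, and the NameNode's RPC service is not SSH, so "refused to connect" there does not by itself distinguish "nothing listening" from "wrong protocol". A plain TCP probe is more informative. Here is a rough sketch using bash's built-in /dev/tcp; the hostname and port are the ones from the question:

```shell
#!/usr/bin/env bash
# Probe a TCP port without assuming the service behind it speaks SSH.
# Exits 0 if something accepts the connection, non-zero otherwise.
check_port() {
  local host=$1 port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Run from the iMac against the namenode in the question:
if check_port hadoopmaster 9000; then
  echo "something is listening on hadoopmaster:9000"
else
  echo "nothing listening on hadoopmaster:9000 (or it is firewalled)"
fi
```

If the probe fails from the iMac but succeeds on the Razer itself, the NameNode is likely bound to a loopback-only address or blocked by the firewall; if it fails on both, the NameNode is probably not listening on that port at all.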
Please help me get my remote iMac to connect to port 9000 on the Razer, so I can create the Hadoop cluster on my network and see the remote slave machines as live datanodes on the dashboard.
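One thing worth double-checking in the settings above: the prose mentions hdfs://hadoopmaster:9000, but the posted core-site.xml (and the DataNode's retry log) use port 8070. Whichever port is intended, the filesystem URI on every node must match the port the NameNode actually listens on. A minimal sketch of a consistent core-site.xml, assuming 9000 is the intended port (fs.defaultFS is the current name for the deprecated fs.default.name):

```xml
<!-- core-site.xml, identical on hadoopmaster and hadoopslave.
     Assumes 9000 is the intended NameNode RPC port. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoopmaster:9000</value>
  </property>
</configuration>
```

It is also worth confirming that /etc/hosts on the Razer maps hadoopmaster to its LAN address (192.168.1.191 per the log) rather than 127.0.0.1; if the name resolves to loopback, the NameNode binds where remote datanodes cannot reach it.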

Related

Hadoop datanode -> namenode communication issue

I have a Vagrant machine running a local Hadoop installation. Hadoop was working fine until today. Today Vagrant's insecure SSH key stopped working so I had to replace it. Now Hadoop is not working. In the logs I see:
17/09/18 09:35:41 INFO ipc.Client: Retrying connect to server: mymachine/192.168.33.10:8020. Already tried 0 time(s); maxRetries=45
17/09/18 09:36:01 INFO ipc.Client: Retrying connect to server: mymachine/192.168.33.10:8020. Already tried 1 time(s); maxRetries=45
17/09/18 09:36:21 INFO ipc.Client: Retrying connect to server: mymachine/192.168.33.10:8020. Already tried 2 time(s); maxRetries=45
17/09/18 09:36:41 INFO ipc.Client: Retrying connect to server: mymachine/192.168.33.10:8020. Already tried 3 time(s); maxRetries=45
The claim here is that it's a datanode -> namenode communication issue. core-site.xml contains:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mymachine:8020</value>
  </property>
</configuration>
Which is correct. Trying getent hosts mymachine yields 192.168.33.10, which means the host is ok. I tried sudo netstat -antp | grep 8020 and got:
tcp 0 1 10.0.2.15:42002 192.168.33.10:8020 SYN_SENT 2630/java
tcp 0 1 10.0.2.15:42004 192.168.33.10:8020 SYN_SENT 2772/java
tcp 0 1 10.0.2.15:41998 192.168.33.10:8020 SYN_SENT 3312/java
So it appears that the port is also ok. However, when I do curl http://mymachine:8020 I get no reply. I checked on an identical machine and the correct reply should be It looks like you are making an HTTP request to a Hadoop IPC port. This is not the correct port for the web interface on this daemon..
Any ideas?
In my opinion, there are a few things to check:
1. Check that you can ssh to localhost without a password;
2. Check the permissions of the user that starts Hadoop;
3. It should be 127.0.0.1:8020 if you are running a local Hadoop on your machine, because Hadoop can then keep running even while the network is disconnected.

Not able to run datanode in multinode hadoop cluster setup, need suggestion

I am trying to set up a multi-node Hadoop cluster, but the datanode is failing to start; I need help with this. Below are the details. No other setup has been done apart from this. I have only one datanode and one namenode set up as of now.
NAMENODE setup -
core-site.xml
  <property>
    <name>fs.defult.name</name>
    <value>hdfs://192.168.1.7:9000</value>
  </property>
hdfs-site.xml
  <property>
    <name>dfs.name.dir</name>
    <value>/data/namenode</value>
  </property>
DATANODE setup -
core-site.xml
  <property>
    <name>fs.defult.name</name>
    <value>hdfs://192.168.1.7:9000</value>
  </property>
hdfs-site.xml
  <property>
    <name>dfs.data.dir</name>
    <value>/data/datanode</value>
  </property>
The namenode runs fine, but when I try to run the datanode on the other machine, whose IP is 192.168.1.8, it fails and the log says:
2017-05-13 21:26:27,744 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2017-05-13 21:26:27,862 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2017-05-13 21:26:32,908 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/192.168.1.7:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2017-05-13 21:26:34,979 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/192.168.1.7:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2017-05-13 21:26:36,041 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/192.168.1.7:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2017-05-13 21:26:37,093 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/192.168.1.7:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2017-05-13 21:26:38,162 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/192.168.1.7:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2017-05-13 21:26:39,238 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/192.168.1.7:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
and the datanode dies.
Is there anything else to set up? Let me know if any other details are required. Are there any other files to change? I am using CentOS 7 for the environment. I have formatted the namenode more than 2-3 times, and the permissions are proper. It seems to be only a connectivity issue, yet when I scp from master to slave (namenode to datanode) it works fine.
Please suggest if there is any other setup to be done to make this successful!
There is a typo in the property name of your configuration. An 'a' is missing: fs.defult.name (vs. fs.default.name).
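For reference, this is a sketch of how the property would read with the typo fixed, the value copied from the question; the same correction applies on both the namenode and datanode machines:

```xml
<!-- core-site.xml, with the property name spelled correctly -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://192.168.1.7:9000</value>
</property>
```

With the misspelled name, Hadoop silently falls back to its default filesystem, which explains why the datanode log shows it retrying localhost instead of 192.168.1.7.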

Error when copying the file into HDFS

Hadoop cluster started normally and JPS shows datanodes and tasktracker running correctly.
When I copy a file into HDFS, this is the error message I get:
hduser@nn:~$ hadoop fs -put gettysburg.txt /user/hduser/getty/gettysburg.txt
Warning: $HADOOP_HOME is deprecated.
14/08/24 21:12:50 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:51 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:52 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:53 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:54 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:55 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:56 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:57 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:58 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:59 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Bad connection to FS. command aborted. exception: Call to nn/10.10.1.1:54310 failed on connection exception: java.net.ConnectException: Connection refused
hduser@nn:~$
I am able to ssh from the NN to the DNs and vice versa, and between the DNs.
I have changed /etc/hosts on the NN and all DNs as below:
#127.0.0.1 localhost loghost localhost.project1.ch-geni-net.emulab.net
#10.10.1.1 NN-Lan NN-0 NN
#10.10.1.2 DN1-Lan DN1-0 DN1
#10.10.1.3 DN2-Lan DN2-0 DN2
#10.10.1.5 DN4-Lan DN4-0 DN4
#10.10.1.4 DN3-Lan DN3-0 DN3
10.10.1.1 nn
10.10.1.2 dn1
10.10.1.3 dn2
10.10.1.4 dn3
10.10.1.5 dn4
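After editing /etc/hosts, it can be worth verifying that each name actually resolves the way the system resolver (and therefore Hadoop) will see it, rather than trusting the file by eye. A small sketch, using the node names from the table above:

```shell
#!/usr/bin/env bash
# Resolve a hostname to its first IPv4 address through the normal
# resolver path (/etc/hosts, nsswitch), the same path Hadoop uses.
resolve4() {
  getent ahostsv4 "$1" | awk 'NR==1 {print $1}'
}

# Node names mirror the /etc/hosts table in the question:
for node in nn dn1 dn2 dn3 dn4; do
  echo "$node -> $(resolve4 "$node")"
done
```

Each line should print the 10.10.1.x address from the table; an empty or loopback result on any node points at a hosts-file or resolver problem on that machine.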
My mapred-site.xml looks like this:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://nn:54310</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHE$
  </property>
</configuration>
Configured /usr/local/hadoop/conf/masters:
hduser@nn:/usr/local/hadoop/conf$ vi masters
#localhost
nn
hduser@dn1:~$ jps
9975 DataNode
10186 Jps
10070 TaskTracker
hduser@dn1:~$
hduser@nn:~$ jps
5979 JobTracker
5891 SecondaryNameNode
6159 Jps
hduser@nn:~$
What is the problem?
Check your fs.default.name property in the core-site.xml file. The value should be hdfs://NN:port.
Check the following:
core-site.xml - the HDFS URL mentioned - hdfs://ip:port
Format the namenode
Check whether safemode is on

Hadoop-1.2.1 in Solaris 11.1 VM: Call to name-node failed on connection exception

Hi, I am following the guide linked below for a VirtualBox Solaris Zones Hadoop installation.
Oracle Solaris Zones Hadoop Setup
I was able to follow it successfully up to step 10. When I tried to check the report, I got this error:
hadoop@name-node:~$ hadoop dfsadmin -report
14/05/17 16:45:12 INFO ipc.Client: Retrying connect to server: name-node/192.168.1.1:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/05/17 16:45:13 INFO ipc.Client: Retrying connect to server: name-node/192.168.1.1:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
....
14/05/17 16:45:21 INFO ipc.Client: Retrying connect to server: name-node/192.168.1.1:8020. Already tried 9 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
report: Call to name-node/192.168.1.1:8020 failed on connection exception: java.net.ConnectException: Connection refused
hadoop@name-node:~$
Can someone kindly suggest a resolution?
Also, netstat shows this:
name-node.8021 . 0 0 128000 0 LISTEN
*.50030 . 0 0 128000 0 LISTEN
How do I configure dfsadmin to use port 8021 instead?
Step by step to configure Hadoop cluster on Oracle Solaris 11.1 using zones --- http://hashprompt.blogspot.com/2014/05/multi-node-hadoop-cluster-on-oracle.html
This is probably too old a question and you might have already solved it, but just in case anyone is wondering:
In core-site.xml, make the following change:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://192.168.1.1:8021/</value>
</property>
This will configure the NameNode server port.

Unable to add a datanode to Hadoop

I got all my settings right and I am able to run Hadoop (1.1.2) on a single node. However, after making the changes to the relevant files (/etc/hosts, *-site.xml), I am not able to add a datanode to the cluster, and I keep getting the following error on the slave.
Does anybody know how to rectify this?
2013-05-13 15:36:10,135 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-05-13 15:36:11,137 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-05-13 15:36:12,140 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Check the value of fs.default.name in your core-site.xml conf file (on each node in your cluster). This needs to be the network name of the namenode, and I suspect you have it as hdfs://localhost:54310.
Failing that, check for any mention of localhost in your Hadoop configuration files on all nodes in your cluster:
grep localhost $HADOOP_HOME/conf/*.xml
Try replacing localhost with the namenode's IP address or network name.
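The grep above can be wrapped into a small helper that reports file names and line numbers, so every remaining localhost reference can be reviewed and repointed by hand; the conf path shown in the usage comment is the Hadoop 1.x layout assumed in this answer:

```shell
#!/usr/bin/env bash
# List every *.xml config file under a conf directory that still
# mentions "localhost", with file name and line number for each match.
find_localhost_refs() {
  local confdir=$1
  grep -Hn localhost "$confdir"/*.xml 2>/dev/null
}

# Usage with the Hadoop 1.x layout from this answer:
# find_localhost_refs "$HADOOP_HOME/conf"
```

Run it on every node; after replacing the stray localhost entries with the namenode's address, the function should print nothing.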
