I am trying to setup a multi-node hadoop cluster, however datanode is failing to start, need hep on this. Below are the details. No other setups done apart from this. I have only one data node and one name node setup as of now.
NAMENODE setup -
CORE-SITE.xml
<property>
<name>fs.defult.name</name>
<value>hdfs://192.168.1.7:9000</value>
</property>
HDFS-SITE.XML
<property>
<name>dfs.name.dir</name>
<value>/data/namenode</value>
</property>
DATANODE SETUP:
NAMENODE setup -
CORE-SITE.xml
<property>
<name>fs.defult.name</name>
<value>hdfs://192.168.1.7:9000</value>
</property>
HDFS-SITE.XML
<property>
<name>dfs.data.dir</name>
<value>/data/datanode</value>
</property>
When I run namenode it runs fine however when I try to run data node on other machine whos IP is 192.168.1.8 it fails and log says
2017-05-13 21:26:27,744 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2017-05-13 21:26:27,862 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2017-05-13 21:26:32,908 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/192.168.1.7:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2017-05-13 21:26:34,979 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/192.168.1.7:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2017-05-13 21:26:36,041 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/192.168.1.7:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2017-05-13 21:26:37,093 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/192.168.1.7:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2017-05-13 21:26:38,162 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/192.168.1.7:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2017-05-13 21:26:39,238 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/192.168.1.7:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
#
and datanode dies
Is anything there to setup ?
let me any other details required. Is there any other files to change? I am using centos7 to setup the env. I did formatting of namenode also more than 2-3 times, and also permissions are proper. Only connectivity issue however when I try to scp from master to slave (namenode to datanode) its works fine.
Suggest if there are any other setup to be done to make it successful!
There is a typo in the property name of your configuration. A 'a' is missing : fs.defult.name (vs fs.default.name).
Related
I'm having a hard time with setting up a multi-node cluster. I have a Razer running Ubuntu 20.04 and a IMAC running OSX Catalina. Razer is the host namenode and both the Razer and IMAC are set are the datanodes (slave workers). Both computers have SSH-keys replicated so they can SSH connect without any password. However, I'm having problems with showing the remote datanode from the IMAC as Live on my Hadoop dashboard. I can see the datanode live from the Razer I think it has something to do with my remote machine MAC not being able to connect to the HDFS which I set in the core-site.xml as hfds://hadoopmaster:9000.
RAZER = Hostname: Hadoopmaster
IMAC = Hostname: Hadoopslave
Based on some troubleshooting, I reviewed the datanode logs in the IMAC and saw that it is refusing to connect to hadoopmaster on port 9000.
2020-06-01 13:44:33,193 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
hadoopmaster/192.168.1.191:8070. Already tried 6 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:35,550 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
hadoopmaster/192.168.1.191:8070. Already tried 7 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:36,574 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
hadoopmaster/192.168.1.191:8070. Already tried 8 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:37,597 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
hadoopmaster/192.168.1.191:8070. Already tried 9 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:37,619 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem
connecting to server: hadoopmaster/192.168.1.191:8070
2020-06-01 13:44:44,660 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
hadoopmaster/192.168.1.191:8070. Already tried 0 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-06-01 13:44:45,534 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED
SIGNAL 15: SIGTERM
2020-06-01 13:44:45,537 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
Here are my settings:
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
<description>A base for other temporary directories</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoopmaster:8070</value>
</property>
</configuration>
So I think there's issues with connecting to port 9000 on my machine. So my next step was testing out the ssh connections in my terminal command window:
IMAC Command: ssh username#hadoopmaster -p 9000
Results:
Refused to connect
So my next step was performing the SSH command on my Razer machine:
Razer Command: ssh hadoopmaster -p 9000
Results:
Refused to connect
So I tried on my Razer to modify the UFW firewall to open port 9000, any to hadoopmaster, all ports, and still no luck.
Please help me have my remote machine IMAC connect to port 9000 on the Razer so I can create the hadoop cluster in my network and view the remote slave machines as live datanodes on the dashboard.
I have been following some instructions on setting-up a vagrant single-node cluster and have been through the instrustions once without issue. However, I am running into several problems when trying to repeat the same instructions. Now I am getting a connection refused when trying to run hadoop fs -ls /
$ hadoop fs -ls /
Warning: $HADOOP_HOME is deprecated.
15/01/18 04:09:21 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
15/01/18 04:09:22 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
15/01/18 04:09:23 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
15/01/18 04:09:24 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
15/01/18 04:09:25 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
15/01/18 04:09:26 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
15/01/18 04:09:27 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
15/01/18 04:09:28 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
15/01/18 04:09:29 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
15/01/18 04:09:30 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
ls: Call to localhost/127.0.0.1:54310 failed on connection exception: java.net.ConnectException: Connection refused
When I run jps I get the following:
$jps
4176 JobTracker
4313 TaskTracker
3970 DataNode
4581 Jps
4094 SecondaryNameNode
I'm really at a loss as to what I have missed to cause this different behavior. Any help would be greatly appreciated.
Looks like your namenode is not up. Make sure u format the namenode
stop the cluster and stop all daemons
format the namenode
Once you format, try starting namenode first and other daemons.
By looking at the daemons running there is no name node. I would suggest you go ahead and restart all the daemons. Below are the commands which just restart the daemons and name node will be up and running. Hope this helps!
sudo service hadoop-master stop
sudo service hadoop-master start
hadoop dfsadmin -safemode leave
sudo jps
Hadoop cluster started normally and JPS shows datanodes and tasktracker running correctly.
When i copy a file into HDFS this is the error message i am getting.
hduser#nn:~$ hadoop fs -put gettysburg.txt /user/hduser/getty/gettysburg.txt
Warning: $HADOOP_HOME is deprecated.
14/08/24 21:12:50 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:51 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:52 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:53 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:54 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:55 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:56 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:57 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:58 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/24 21:12:59 INFO ipc.Client: Retrying connect to server: nn/10.10.1.1:54310. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Bad connection to FS. command aborted. exception: Call to nn/10.10.1.1:54310 failed on connection exception: java.net.ConnectException: Connection refused
hduser#nn:~$
I am able to do ssh from NN to DNs and Viceverssa and between DNs.
I have changed the cd /etc/hosts in all NNs and DNs as below.
#127.0.0.1 localhost loghost localhost.project1.ch-geni-net.emulab.net
#10.10.1.1 NN-Lan NN-0 NN
#10.10.1.2 DN1-Lan DN1-0 DN1
#10.10.1.3 DN2-Lan DN2-0 DN2
#10.10.1.5 DN4-Lan DN4-0 DN4
#10.10.1.4 DN3-Lan DN3-0 DN3
10.10.1.1 nn
10.10.1.2 dn1
10.10.1.3 dn2
10.10.1.4 dn3
10.10.1.5 dn4
My mapredsite.xml looks like this.
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://nn:54310</value>
<description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHE$
</property>
</configuration>
Configured cd /usr/local/hadoop/conf/master
hduser#nn:/usr/local/hadoop/conf$ vi masters
#localhost
nn
hduser#dn1:~$ jps
9975 DataNode
10186 Jps
10070 TaskTracker
hduser#dn1:~$
hduser#nn:~$ jps
5979 JobTracker
5891 SecondaryNameNode
6159 Jps
hduser#nn:~$
What is the problem?
Check your fs.default.name property in core-site.xml file. The value should be hdfs://NN:port.
Check the following :
core-site.xml - the hdfs url mentioned - hdfs://ip:port
Format namenode
Check if safemode is on
I am trying to copy a file quangle.txt from my localsystem to Hadoop using the command below:
testuser#ubuntu:~/Downloads/hadoop/bin$ ./hadoop fs -copyFromLocal Desktop/quangle.txt hdfs://localhost/testuser/quangle.txt
13/11/28 06:35:50 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:51 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:52 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:53 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:54 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:55 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:56 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:57 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:58 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:59 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
copyFromLocal: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused
I tried to ping 127.0.0.1 and I got the response. Please advice
just add correct port to the filepath after localhost:
hdfs://localhost:9000/testuser/quangle.txt
Looks like your Name node isn't running - try running the jps cmd and see if NameNode is listed in the running services (or you might have to run ps axww | grep NameNode if the NameNode was started by/under a different user)
Does sudo netstat -atnp | grep 8020 yield any results?
If the Name Node is refusing to start then copy in your Name Node logs into to your original question (or post a new question - after searching for the error first of all to see if someone else has had this problem)
Try running jps to see the currently running Java processes.
Are all Hadoop processes running, especially the Namemode?
If yes, you should get this output (with different process ids):
10015 JobTracker
9670 TaskTracker
9485 DataNode
10380 Jps
9574 SecondaryNameNode
9843 NameNode
I think you can use hadoop fs -put ~/Desktop/quangle.txt /testuser, after copied, you can look up it via hadoop fs -ls /testuser in the /testuser directory
you create Desktop and others with the command hadoop fs -mkdir testuser and then try, it worked for me that way
Maybe there is something wrong with your setting for Pseudodistributed Mode.
It should be configured in this order:
fill up the configuration files:core-site.xml, hdfs-site.xml,
mapred-site.xml, yar-site.xml.
Configuring SSH
Formatting the
HDFS filesystem
Starting and stopping the daemons
I got all my settings right and I am able to run Hadoop ( 1.1.2 ) on a single-Node. However, after making the changes to the relevant files ( /etc/hosts, *-site.xml ), I am not able to add a Datanode to the cluster and I keep getting the following error on the Slave.
Anybody knows how to rectify this?
2013-05-13 15:36:10,135 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-05-13 15:36:11,137 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-05-13 15:36:12,140 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Check the value of fs.default.name in your core-site.xml conf file (on each node in your cluster). This needs to be the network name of the name node and i suspect you have this as hdfs://localhost:54310).
Failing that check for any mention of localhost in your hadoop configuration files on all nodes in your cluster:
grep localhost $HADOOP_HOME/conf/*.xml
try relpacing localhost with the namenode's ip address or network name