Hadoop UI shows only one Datanode - hadoop

I've started a Hadoop cluster composed of one master and 4 slave nodes.
Configuration seems ok:
hduser#ubuntu-amd64:/usr/local/hadoop$ ./bin/hdfs dfsadmin -report
When I enter the NameNode UI (http://10.20.0.140:50070/) the Overview card seems ok - for example, the total capacity of all nodes sums up.
The problem is that in the Datanodes card I see only one datanode.

I came across the same problem and, fortunately, solved it. I guess it is caused by 'localhost'.
Configure a different hostname for each IP in /etc/hosts.
Remember to restart all the machines; things will then go well.
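For example, /etc/hosts on every machine might look like this (the IPs and hostnames are illustrative, not from the original post):
192.168.1.140  hadoop-master
192.168.1.141  hadoop-slave1
192.168.1.142  hadoop-slave2
192.168.1.143  hadoop-slave3
192.168.1.144  hadoop-slave4
Also make sure a node's own hostname does not resolve to 127.0.1.1 or 127.0.0.1, otherwise it may register with the namenode as 'localhost'.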

It's because of the same hostname on both datanodes.
In your case both datanodes are registering with the namenode under the same hostname, i.e. 'localhost'. Try different hostnames and it will fix your problem.
In the UI it will show only one entry per hostname.
In the "hdfs dfsadmin -report" output you can see both.

The following tips may help you (a quick shell sketch follows):
Check the core-site.xml and ensure that the namenode hostname is correct
Check the firewall rules in namenode and datanodes and ensure that the required ports are open
Check the logs of datanodes
Ensure that all the datanodes are up and running
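A quick way to work through these checks from a shell, assuming a typical Hadoop 2.x layout (the paths, hostnames and port numbers below are examples and may differ on your cluster):
# 1. Confirm the namenode address the datanodes are configured to use
grep -A1 fs.defaultFS $HADOOP_HOME/etc/hadoop/core-site.xml
# 2. Check that the namenode RPC/HTTP ports are reachable from a datanode
#    (9000 and 50070 are common defaults)
nc -zv namenode-host 9000
nc -zv namenode-host 50070
# 3. Look for registration errors in the datanode logs
tail -n 100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log
# 4. Verify the DataNode process is actually running on each slave
jps | grep DataNode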

As @Rahul said, the problem is because of the same hostname.
Change your hostname in the /etc/hostname file and give each host a different hostname,
and resolve each hostname to its IP address in the /etc/hosts file.
Then restart your cluster and you will see all datanodes in the Datanode information tab in the browser.

I had the same trouble because I used an IP instead of a hostname: [hdfs dfsadmin -report] was correct even though the UI showed only one entry [localhost]. Finally, I solved it like this:
<property>
    <name>dfs.datanode.hostname</name>
    <value>the name you want to show</value>
</property>
You can hardly find this in any document...

Sorry, it feels like it's been a while, but I'd still like to share my answer:
the root cause is in hadoop/etc/hadoop/hdfs-site.xml:
the xml file has a property named dfs.datanode.data.dir. If you set all the datanodes to the same name, then Hadoop assumes the cluster has only one datanode. So the proper way of doing it is naming every datanode with a unique name:
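For illustration only, a sketch of what this answer suggests, assuming each datanode is given its own data directory (the paths below are made up):
On datanode 1, in hdfs-site.xml:
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/hdfs/data-node1</value>
</property>
On datanode 2 use a different path, e.g. file:///usr/local/hadoop/hdfs/data-node2, and so on for the remaining nodes.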
Regards,
YUN HANXUAN

Your admin report looks absolutely fine. Please run the below to check the HDFS disk space details.
"hdfs dfs -df /"
If you still see the size being good, it's just a UI glitch.

My problem: I have 1 master node and 3 slave nodes. When I started all nodes with start-all.sh and accessed the dashboard of the master node, I was able to see only one data node on the web UI.
My Solution:
Try to stop the firewall temporarily with sudo systemctl stop firewalld. If you do not want to stop your firewalld service, then allow the data node ports with
sudo firewall-cmd --permanent --add-port={PORT_Number/tcp,PORT_number2/tcp} ; sudo firewall-cmd --reload
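For example, with the default Hadoop 2.x DataNode ports (an assumption - check dfs.datanode.address, dfs.datanode.http.address and dfs.datanode.ipc.address in your hdfs-site.xml for the actual values):
sudo firewall-cmd --permanent --add-port={50010/tcp,50075/tcp,50020/tcp}
sudo firewall-cmd --reload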
If you are using a separate user for Hadoop (in my case the hadoop user manages the Hadoop daemons), then change the owner of your DataNode and NameNode directories with sudo chown hadoop:hadoop /opt/data -R
My hdfs-site.xml config is as given in the image.
Check the daemons on the data node with the jps command; it should show them as in the image below.
jps output

Related

Cannot connect slave1:8088 in hadoop 2.7.2

I am new to hadoop and I have installed hadoop 2.7.2 on two machines, master and slave1. I have followed this tutorial. It was not mentioned in the tutorial, but I have also edited the JAVA_HOME and HADOOP_CONF_DIR variables in hadoop-env.sh. In the end I have hadoop installed on both machines. On master, NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager are running, and on slave1, DataNode and NodeManager are running.
I am able to go to master:8088 in the browser, and when I go to http://master:8088/cluster/nodes there is only the master node there. I am not able to go to isci17:8088 and it is not a live node. Why could that be?
Port 8088 is the resource manager web UI port, so if it is running on master you probably won't have it on the slave.
You should also be able to go to the name node web UI on port 50070 on your name node to see status, e.g. http://master:50070/, and to the MapReduce JobHistory Server web UI at http://hostname:19888/.
If you have access to a terminal session, run the following command on each server as a root/sudo user to see which ports are listening on which server:
sudo lsof -i tcp | grep -i LISTEN
You can also run hadoop CLI commands that will give you info.
You can run the following to check hadoop's ports in a terminal session:
hdfs portmap
Other health checks on the command line:
hdfs classpath
hdfs getconf -namenodes
hdfs dfsadmin -report -live
hdfs dfsadmin -report -dead
hdfs dfsadmin -printTopology
Depending on whether the hadoop CLI command works automatically, you might have to find the executable and run ./hdfs. Also, depending on the distro/version, you might have to replace the command hdfs with the command hadoop.
If you want to see your cluster configuration, check your /etc/hadoop/conf folder along with /etc/hadoop/hive. You will find about 5-10 *-site.xml files. These configuration files contain your cluster's configuration with the hostnames and ports.
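If you just want to see which hostnames and ports the cluster is configured with, a quick grep over those files works (the path below assumes the /etc/hadoop/conf layout mentioned above):
grep -E -A1 "address|defaultFS|hostname" /etc/hadoop/conf/*-site.xml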

Hadoop: ...be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation

I'm getting the following error when attempting to write to HDFS as part of my multi-threaded application
could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
I've tried the top-rated answer here around reformatting but this doesn't work for me: HDFS error: could only be replicated to 0 nodes, instead of 1
What is happening is this:
My application consists of 2 threads each one configured with their own Spring Data PartitionTextFileWriter
Thread 1 is the first to process data and this can successfully write to HDFS
However, once Thread 2 starts to process data I get this error when it attempts to flush to a file
Thread 1 and 2 will not be writing to the same file, although they do share a parent directory at the root of my directory tree.
There are no problems with disk space on my server.
I also see this in my name-node logs, but not sure what it means:
2016-03-15 11:23:12,149 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 1 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2016-03-15 11:23:12,150 WARN org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough replicas: expected size is 1 but only 0 storage types can be selected (replication=1, selected=[], unavailable=[DISK], removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
2016-03-15 11:23:12,150 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 1 (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All required storage types are unavailable: unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
2016-03-15 11:23:12,151 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 10.104.247.78:52004 Call#61 Retry#0
java.io.IOException: File /metrics/abc/myfile could only be replicated to 0 nodes instead of [2016-03-15 13:34:16,663] INFO [Group Metadata Manager on Broker 0]: Removed 0 expired offsets in 1 milliseconds. (kafka.coordinator.GroupMetadataManager)
What could be the cause of this error?
Thanks
This error is caused by the block replication system of HDFS, since it could not manage to make any copies of a specific block within the targeted file. Common reasons for that:
Only a NameNode instance is running and it's not in safe mode
There are no DataNode instances up and running, or some are dead. (Check the servers.)
NameNode and DataNode instances are both running, but they cannot communicate with each other, which means there is a connectivity issue between the DataNode and NameNode instances.
Running DataNode instances are not able to talk to the server because of some networking or Hadoop-based issues (check logs that include datanode info).
There is no hard disk space specified in the configured data directories for DataNode instances, or the DataNode instances have run out of space. (Check dfs.data.dir // delete old files if any.)
The reserved space specified for DataNode instances in dfs.datanode.du.reserved is more than the free space, which makes DataNode instances conclude there is not enough free space.
There are not enough threads for DataNode instances (check the datanode logs and the dfs.datanode.handler.count value).
Make sure dfs.data.transfer.protection is not equal to "authentication" and dfs.encrypt.data.transfer is equal to true.
Also please:
Verify the status of NameNode and DataNode services and check the related logs
Verify that core-site.xml has the correct fs.defaultFS value and that hdfs-site.xml has a valid value.
Verify that hdfs-site.xml has dfs.namenode.http-address.. specified for all NameNode instances in case of a PHD HA configuration.
Verify that the permissions on the directories are correct
Ref: https://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo
Ref: https://support.pivotal.io/hc/en-us/articles/201846688-HDFS-reports-Configured-Capacity-0-0-B-for-datanode
Also, please check: Writing to HDFS from Java, getting "could only be replicated to 0 nodes instead of minReplication"
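A few standard HDFS CLI calls cover the most common of the reasons above and are a reasonable starting point:
hdfs dfsadmin -report          # how many live/dead datanodes and their remaining space
hdfs dfsadmin -safemode get    # the namenode must not be stuck in safe mode
hdfs dfs -df -h /              # overall HDFS capacity and usage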
Another reason could be that your Datanode machine hasn't exposed the port (50010 by default). In my case, I was trying to write a file from Machine1 to HDFS running in a Docker container C1 which was hosted on Machine2.
For the host machine to forward the requests to the services running in the container, port forwarding has to be taken care of. I could resolve the issue after forwarding port 50010 from the host machine to the guest machine.
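For example, if the container were started with Docker's port publishing (a hypothetical invocation, not from the original post), the DataNode transfer port 50010 would need to be included alongside the NameNode ports:
docker run -d -p 50070:50070 -p 9000:9000 -p 50010:50010 my-hadoop-image
Depending on the setup, the client may also need dfs.client.use.datanode.hostname=true so that it connects via the advertised hostname rather than the container-internal IP.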
Check whether the jps command on the computers which run the datanodes shows that the datanodes are running. If they are running, then it means that they could not connect to the namenode, and hence the namenode thinks there are no datanodes in the hadoop system.
In such a case, after running start-dfs.sh, run netstat -ntlp on the master node. 9000 is the port number most tutorials tell you to specify in core-site.xml. So if you see a line like this in the output of netstat
tcp 0 0 127.0.1.1:9000 0.0.0.0:* LISTEN 4209/java
then you have a problem with the host alias. I had the same problem, so I'll state how it was resolved.
This is the contents of my core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://vm-sm:9000</value>
</property>
</configuration>
So the vm-sm alias on the master computer maps to 127.0.1.1. This is because of the setup of my /etc/hosts file.
127.0.0.1 localhost
127.0.1.1 vm-sm
192.168.1.1 vm-sm
192.168.1.2 vm-sw1
192.168.1.3 vm-sw2
It looks like the core-site.xml of the master system mapped to 127.0.1.1:9000 while the worker nodes were trying to connect through 192.168.1.1:9000.
So I had to change the alias of the master node for the hadoop system (I just removed the hyphen) in the /etc/hosts file
127.0.0.1 localhost
127.0.1.1 vm-sm
192.168.1.1 vmsm
192.168.1.2 vm-sw1
192.168.1.3 vm-sw2
and reflected the change in the core-site.xml, mapred-site.xml, and slaves files (wherever the old alias of the master occurred).
After deleting the old hdfs files from the hadoop location as well as the tmp folder and restarting all nodes, the issue was solved.
Now, netstat -ntlp after starting DFS returns
tcp 0 0 192.168.1.1:9000 0.0.0.0:* LISTEN ...
...
I had the same error; restarting the HDFS services solved the issue, i.e. I restarted the NameNode and DataNode services.
In my case it was a storage policy of output path set to COLD.
How to check settings of your folder:
hdfs storagepolicies -getStoragePolicy -path my_path
In my case it returned
The storage policy of my_path
BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}
I dumped the data elsewhere (to HOT storage) and the issue went away.
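Alternatively, the policy on the path can be changed in place with the standard counterpart of the command above, then existing blocks migrated with the mover tool:
hdfs storagepolicies -setStoragePolicy -path my_path -policy HOT
hdfs mover -p my_path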
You may leave HDFS safe mode:
hdfs dfsadmin -safemode forceExit
I had a similar issue recently. As my datanodes (only) had SSDs for storage, I put [SSD]file:///path/to/data/dir for the dfs.datanode.data.dir configuration. Due to the logs containing unavailableStorages=[DISK] I removed the [SSD] tag, which solved the problem.
Apparently, Hadoop uses [DISK] as the default storage type, and does not 'fall back' (or rather 'fall up') to using SSD if no [DISK]-tagged storage location is available. I could not find any documentation on this behaviour though.
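To illustrate the change (the path is the placeholder from above, not a real location):
Before:
<property>
    <name>dfs.datanode.data.dir</name>
    <value>[SSD]file:///path/to/data/dir</value>
</property>
After (tag removed):
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///path/to/data/dir</value>
</property>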
I too had the same error; then I changed the block size, which resolved the problem.
In my case the problem was hadoop temporary files
Logs were showing the following error:
2019-02-27 13:52:01,079 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /tmp/hadoop-i843484/dfs/data/in_use.lock acquired by nodename 28111#slel00681841a
2019-02-27 13:52:01,087 WARN org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: Incompatible clusterIDs in /tmp/hadoop-i843484/dfs/data: namenode clusterID = CID-38b0104b-d3d2-4088-9a54-44b71b452006; datanode clusterID = CID-8e121bbb-5a08-4085-9817-b2040cd399e1
I solved by removing hadoop tmp files
sudo rm -r /tmp/hadoop-*
I got this error because the Data Node was not running. To resolve this on the VM:
Removed the Name/Data Node directories
Re-created the directories
Formatted the name node and data node (the latter is not required): hadoop namenode -format
Restarted the service: start-dfs.sh
Now jps shows both Name and Data nodes and the Sqoop job worked successfully.
Maybe the number of your DataNodes is too small (less than 3). I put 3 IP addresses in hadoop/etc/hadoop/slaves, and it works!
1. Check your firewall status; you can simply stop the firewall on both master and slaves: systemctl stop firewalld. This fixed my problem.
2. Delete the namenode and reformat it: delete both the namenode dir and the datanode dir (as my slave computer didn't shut down normally, my datanode was broken), then call hdfs namenode -format.
Call jps on both master and slaves; make sure the master has a namenode and the slaves have datanodes.

Hadoop datanode services is not starting in the slaves in hadoop

I am trying to configure a hadoop-1.0.3 multinode cluster with one master and two slaves on my laptop using VMware Workstation.
When I ran start-all.sh from the master, all daemon processes were running on the master node (namenode, datanode, tasktracker, jobtracker, secondarynamenode), but the DataNode and TaskTracker did not start on the slave nodes. Passwordless ssh is enabled and I can ssh to both master and slave from my master node without a password.
Please help me resolve this.
Stop the cluster.
If you have specifically defined a tmp directory location in core-site.xml, then remove all files under that directory.
If you have specifically defined data node and namenode directories in hdfs-site.xml, then delete all the files under those directories.
If you have not defined anything in core-site.xml or hdfs-site.xml, then please remove all the files under /tmp/hadoop-*nameofyourhadoopuser.
Format the namenode.
It should work!
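As a concrete sketch of that sequence on Hadoop 1.x (the tmp path is an example - use whatever your core-site.xml and hdfs-site.xml actually point to):
stop-all.sh                              # stop the cluster
rm -rf /tmp/hadoop-yourhadoopuser/*      # or your configured tmp/name/data directories
hadoop namenode -format                  # format the namenode
start-all.sh                             # bring the cluster back up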

hadoop 2 node cluster setup

Can anyone please tell me how to create a hadoop 2 node cluster? I have 2 virtual machines, both with a hadoop single node setup installed properly. Please tell me how to form a 2 node cluster from that, and mention the proper network setup for it. Thanks in advance. I am using hadoop-1.0.3.
Hi, can you do one thing for the network configuration of the VMs:
Make sure the network connection in the virtual machine settings is set to host-only.
And add the two hostnames and IPs in the /etc/hosts file.
Add the two hostnames in the $HADOOP_HOME/conf/slaves file.
Make sure the secondary namenode hostname is in the $HADOOP_HOME/conf/masters file.
Then make sure of the same configuration on the two machines (an example layout is sketched after the commands below).
hadoop namenode -format
start-all.sh
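As an illustration of those steps, with made-up hostnames and IPs:
/etc/hosts (on both machines):
192.168.56.101  master
192.168.56.102  slave1
$HADOOP_HOME/conf/slaves:
master
slave1
$HADOOP_HOME/conf/masters (secondary namenode):
master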
Best of luck.

Multi Node Cluster Hadoop Setup

Pseudo-Distributed single node cluster implementation
I am using Windows 7 with Cygwin and have installed hadoop-1.0.3 successfully. I can still start the job tracker, task tracker and namenode services on their ports (localhost:50030, localhost:50060 and localhost:50070). I have completed the single node implementation.
Now I want to implement a pseudo-distributed multiple node cluster. I don't understand how to divide it into master and slave systems through network IPs.
For your ssh problem, just follow the single node cluster link:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
And yes, you need to specify the IPs of the master and slave in the conf files;
for that you can refer to this URL:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
I hope this helps.
Try to create the number of VMs you want to add to your cluster. Make sure those VMs have the same hadoop version.
Figure out the IP of each VM.
You will find files named masters and slaves in $HADOOP_HOME/conf. Mention the IP of the VM which you want to treat as master in conf/masters, and do the same with conf/slaves
with the slave nodes' IPs.
Make sure these nodes have a passwordless-ssh connection (a sketch follows this list).
Format your namenode and then run start-all.sh.
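For the passwordless-ssh step, a typical sketch (the user name and slave address are placeholders):
ssh-keygen -t rsa -P ""           # on the master, accept the default key path
ssh-copy-id hadoopuser@slave-ip   # repeat for every slave (and for the master itself)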
Thanks,
