I want to acces cloudera from a distant machine via Talend for big data. In order to do that i changed the ip of the host in cloudera by editing the file /etc/hosts and /etc/sysconfig/network.
I can acces cloudera from Talend. However the problem is that my datanode and Namenode seems to be not connected. When i check the log details of my Datanode i get the following errors :
Block pool BP-1183376682-127.0.0.1-1433878467265 (Datanode Uuid null) service to quickstart.cloudera/127.0.0.1:8022 beginning handshake with NN
Initialization failed for Block pool BP-1183376682-127.0.0.1-1433878467265 (Datanode Uuid null) service to quickstart.cloudera/127.0.0.1:8022 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(127.0.0.1, datanodeUuid=5802ab81-2c28-4beb-b155-cac31567d23a, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-83500942-6c65-4ecd-a0c2-a448be86bcb0;nsid=975591446;c=0)
the datanode still uses the wrong ip adress ( 127.0.0.1 ) even though i made the modifications in core-site.xml, hdfs-site.xml and mapred-site.xml by editing the previous ip adress by the new one.
I followed the steps given in this tutorial to do so :
https://www.youtube.com/watch?v=fSGpYHjGIRY
How can i fix this error ?
On Debian 8, the /etc/hosts will contain entry for 127.0.1.1 with your hostname you gave during Linux installation. Cloudera will use this IP-address for some of its services.
A regular HDFS will contain multiple servers with different hostnames/IP-addresses and will list those IPs as allowed. As your log says, the traffic is originating from 127.0.0.1, which is not the IP-address of your hostname.
For Cloudera single-server setup, the only way I found was to do the initial setup so, that /etc/hosts doesn't have the 127.0.1.1 entry in it.
Related
I'm using the cloudera distribution of Hadoop and recently had to change the IP addresses of a few nodes in the cluster. After the change, on one of the nodes (Old IP:10.88.76.223, New IP: 10.88.69.31) the following error comes up when I try to start the data node service.
Initialization failed for block pool Block pool BP-77624948-10.88.65.174-13492342342 (storage id DS-820323624-10.88.76.223-50010-142302323234) service to hadoop-name-node-01/10.88.65.174:6666
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(10.88.69.31, storageID=DS-820323624-10.88.76.223-50010-142302323234, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=cluster25;nsid=1486084428;c=0)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:656)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3593)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:899)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:91), I was unable to start the datanode service due to the following error:
Has anyone had success with changing the IP address of a hadoop data node and join it back to the cluster without data loss?
CHANGE HOST IP IN CLOUDERA MANAGER
Change Host IP on all node
sudo nano /etc/hosts
Edit the ip cloudera config.ini on all node if the master node ip changes
sudo nano /etc/cloudera-scm-agent/config.ini
Change IP in PostgreSQL Database
For the password Open PostgreSQL password
cat /etc/cloudera-scm-server/db.properties
Find the password lines
Example. com.cloudera.cmf.db.password=gUHHwvJdoE
Open PostgreSQL
psql -h localhost -p 7432 -U scm
Select table in PostgreSQL
select name,host_id,ip_address from hosts;
Update table IP
update hosts set ip_address = 'xxx.xxx.xxx.xxx' where host_id=x;
Exit the tool
\q
Restart the service on all node
service cloudera-scm-agent restart
Restart the service on master node
service cloudera-scm-server restart
Turns out its better to:
Decommission the server from the cluster to ensure that all blocks are replicated to other nodes in the cluster.
Remove the server from the cluster
Connect to the server and change the IP address then restart the cloudera agent
Notice that cloudera manager now shows two entries for this server. Delete the entry with the old IP and longest heartbeat time
Add the server to the required cluster and add required roles back to the server (e.g. HDFS datanode, HBASE RS, Yarn)
HDFS will read all data disks and recognize the block pool and cluster IDs, then register the datanode.
All data will be available and the process will be transparent to any client.
NOTE: If you run into name resolution errors from HDFS clients, the application has likely cached the old IP and will most likely need be restarted. Particularly Java clients that previously referenced this server e.g. HBASE clients must be restarted due to the JVM caching IPs indefinitely. Java based clients will likely throw errors relating to connectivity to the server with changed IP because they have the old IP cached until they are restarted.
I have a ubuntu server VM in virtual box(in Mac OSX). And I configured a Hadoop Cluster via docker: 1 master(172.17.0.3), 2 slave nodes(172.17.0.4, 172.17.0.6). After run "./sbin/start-dfs.sh" under Hadoop home folder, I found below error in datanode machine:
Datanode denied communication with namenode because hostname cannot be
resolved (ip=172.17.0.4, hostname=172.17.0.4): DatanodeRegistration(0.0.0.0,
datanodeUuid=4c613e35-35b8-41c1-a027-28589e007e78, infoPort=50075,
ipcPort=50020, storageInfo=lv=-55;cid=CID-9bac5643-1f9f-4bc0-abba-
34dba4ddaff6;nsid=1748115706;c=0)
Because docker does not support bidirectional name linking and further more, my docker version does not allow editing /etc/hosts file, So I use IP address to set name node and slaves. Following is my slaves file:
172.17.0.4
172.17.0.6
After searching on google and stackoverflow, no solution works for my problem. However I guess that Hadoop Namenode regard 172.17.0.4 as a "hostname", so it reports "hostname can not be resolved" where "hostname=172.17.0.4".
Any Suggestions?
Finally I got a solution, which proved my suppose:
1.upgrade my docker to 1.4.1, following instructions from: https://askubuntu.com/questions/472412/how-do-i-upgrade-docker.
2.write IP=>hostname mappings of master and slaves into /etc/hosts
3.use hostname instead of ip address in Hadoop slaves file.
4."run ./sbin/start-dfs.sh"
5.Done!
I configured properly two node cluster environment for Hadoop, and Master is also configured for datanode as well.
So currently I have two data nodes, without any issue I am able to start all the services in Master.
Slave datanode is also able to stop start from Master Node.
But when I am checking the health by using the url http://<IP>:50070/dfshealth.jsp Live node count is always showing only one not two.
Master Process:
~/hadoop-1.2.0$ jps
9112 TaskTracker
8805 SecondaryNameNode
9182 Jps
8579 DataNode
8887 JobTracker
8358 NameNode
Slave Process:
~/hadoop-1.2.0$ jps
18130 DataNode
18380 Jps
18319 TaskTracker
Please help me to know what I am doing wrong.
The second DataNode is running but not connecting to the NameNode. Chances are you re-formatted the NameNode and now have different version numbers in the NameNode and DataNode.
A fix is to manually delete the directory where the DataNode keeps its data (dfs.datanode.data.dir) and then reformat the NameNode. A less extreme one is to manually edit the version but for study purposes you can just axe the whole directory.
Finally I got the solution,
After #charles input I checked the Datanode logs and got the below error.
org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.157.132:8020. Already tried 9 time(s);
I was able to do the ssh but there was issue with telnet from datanode to master for 8020 port.
>telnet 192.168.157.132 8020
Trying 192.168.157.132...
telnet: connect to address 192.168.157.132: No route to host
I just added in iptables to allow the port 8020 by using below command and restarted the hadoop services and everything worked fine.
iptables -I INPUT 5 -p tcp --dport 8020 -j ACCEPT
It was the issue with firewall only.
Thanks to all for valuable inputs.
I'm using Hadoop MapReduce paradigm, and i need to get the NameNode IP address from the DataNode, can any one give me an idea how to do this?
Thanks.
Easiest way would be to quickly open the core-site.xml file under HADOOP_HOME/conf directory. The value of fs.default.name property will tell you the host and port where NN is running.
Delete the line 127.0.0.1 localhost in your /etc/host and put your IP and the name of all your machines. Hadoop is resolving all the IPs and names of machines on the cluster as 127.0.0.1 localhost if you leave the file as default.
I have a network with some weird (as I understand) DNS server which causes Hadoop or HBase to malfunction.
It resolves my hostname to some address my machine doesn't know about (i.e. there is no such interface).
Hadoop does work if I have following entries in /etc/hosts:
127.0.0.1 localhost
127.0.1.1 myhostname
If entry "127.0.1.1 myhostname" is not present uploading file to HDFS fails and complains that it can replicate the file only to 0 datanodes instead of 1.
But in this case HBase does not work: creating a table from HBase shell causes NotAllMetaRegionsOnlineException (caused actually by HMaster trying to bind to wrong address returned by DNS server for myhostname).
In other network, I am using following /etc/hosts:
127.0.0.1 localhost
192.168.1.1 myhostname
And both Hadoop and HBase work.
The problem is that in second network the address is dynamic and I can't list it into /etc/hosts to override result returned by weird DNS.
Hadoop is run in pseudo-distributed mode. HBase also runs on single node.
Changing behavior of DNS server is not an option.
Changing "localhost" to 127.0.0.1 in hbase/conf/regionservers doesn't change anything.
Can somebody suggest a way how can I override its behavior while retaining internet connection (I actually work at client's machine through Teamviewer). Or some way to configure HBase (or Zookeeper it is managing) not to use hostname to determine address to bind?
Luckily, I've found the workaround to this DNS server problem.
DNS server returned invalid address when queried by local hostname.
HBase by default does reverse DNS lookup on local hostname to determine where to bind.
Because the address returned by DNS server was invalid, HMaster wasn't able to bind.
Workaround:
In hbase/conf/hbase-site.xml explicitly specify interfaces that will be used for master and regionserver:
<configuration>
<property>
<name>hbase.master.dns.interface</name>
<value>lo</value>
</property>
<property>
<name>hbase.regionserver.dns.interface</name>
<value>lo</value>
</property>
</configuration>
In this case, I specified loopback interface (lo) to be used for both master and regionserver.
a simple tool I wrote to check for DNS issues:
https://github.com/sujee/hadoop-dns-checker