HDFS showing heartbeat lost for nodes - hadoop

I am new to the HDP sandbox. I installed the HDP sandbox in a virtual machine. After installation, when I open http://localhost:1080 it connects to Ambari, but Ambari shows several alerts. One such error is Connection failed: [Errno 111] Connection refused to sandbox-hdp.hortonworks.com:50010. None of the nodes are showing a heartbeat in Ambari.

The sandbox intentionally does not start any of the Hadoop services; only the Ambari web server and the SSH server are started by default. You need to manually start any other services you want, such as the NameNode, DataNodes, YARN, HiveServer, etc.
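You can do this from the Ambari UI (Services > Start), or through Ambari's REST API. A minimal sketch, assuming Ambari is reachable on port 8080, the sandbox's default cluster name "Sandbox", and whatever admin credentials you have set (adjust all three if yours differ):
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Start HDFS"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' \
  http://localhost:8080/api/v1/clusters/Sandbox/services/HDFS
Repeat with YARN, HIVE, etc. in place of HDFS for the other services you need.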

Related

ResourceManager does not start

I installed Hadoop (HDP 2.5.3) on 4 VMs with Ambari (1 Ambari Server and 3 Ambari Clients; with the DNS entries server, node0, node1, node2) with HDFS, YARN, MapReduce and Zookeeper.
However, YARN doesn't want to start. When starting the Resource Manager on node1 I get the following error:
resource_management.core.exceptions.ExecutionFailed: Execution of 'curl -sS -L -w '%{http_code}' -X GET 'http://node0:50070/webhdfs/v1/ats/done/?op=GETFILESTATUS&user.name=hdfs' 1>/tmp/tmpgsiRLj 2>/tmp/tmpMENUFa' returned 7. curl: (7) Failed to connect to node0 port 50070: connection refused 000
App Timeline Server and History Server on node1 don't want to start either. Zookeeper, NameNode, DataNode and NodeManager on node0 are up. The nodes can reach each other (tried with ping), so that shouldn't be the problem.
Hopefully someone can help me. I'm really new to this topic and not very familiar with the system.
You should check the hosts file (/etc/hosts): verify the hostname and FQDN, and look for any duplicate names or IP addresses.
Could you also confirm the firewall state:
sudo ufw status
And also check the ports in iptables (or allow the required TCP/UDP ports through the firewall).
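For example, a minimal check sequence, assuming an Ubuntu-style host with ufw (50070 is the NameNode web port from the error above):
hostname -f                                      # should print the FQDN you registered in Ambari
getent hosts "$(hostname -f)"                    # should resolve to the node's real address, with no duplicate /etc/hosts entries
sudo ufw allow 50070/tcp                         # open the NameNode web port, or with iptables:
sudo iptables -I INPUT -p tcp --dport 50070 -j ACCEPT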

Datanode is not connected to the Namenode cloudera

I want to access Cloudera from a remote machine via Talend for Big Data. In order to do that, I changed the IP of the host in Cloudera by editing /etc/hosts and /etc/sysconfig/network.
I can access Cloudera from Talend. However, the problem is that my DataNode and NameNode do not seem to be connected. When I check the log details of my DataNode I get the following errors:
Block pool BP-1183376682-127.0.0.1-1433878467265 (Datanode Uuid null) service to quickstart.cloudera/127.0.0.1:8022 beginning handshake with NN
Initialization failed for Block pool BP-1183376682-127.0.0.1-1433878467265 (Datanode Uuid null) service to quickstart.cloudera/127.0.0.1:8022 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(127.0.0.1, datanodeUuid=5802ab81-2c28-4beb-b155-cac31567d23a, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-83500942-6c65-4ecd-a0c2-a448be86bcb0;nsid=975591446;c=0)
The DataNode still uses the wrong IP address (127.0.0.1) even though I made the modifications in core-site.xml, hdfs-site.xml and mapred-site.xml, replacing the previous IP address with the new one.
I followed the steps given in this tutorial to do so :
https://www.youtube.com/watch?v=fSGpYHjGIRY
How can I fix this error?
On Debian 8, /etc/hosts will contain an entry for 127.0.1.1 with the hostname you gave during the Linux installation. Cloudera will use this IP address for some of its services.
A regular HDFS cluster contains multiple servers with different hostnames/IP addresses and lists those IPs as allowed. As your log says, the traffic is originating from 127.0.0.1, which is not the IP address of your hostname.
For a Cloudera single-server setup, the only way I found was to do the initial setup so that /etc/hosts does not have the 127.0.1.1 entry in it.
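To illustrate, the change is roughly this (a sketch; 192.168.1.50 is a made-up address, use the machine's real, routable IP):
# /etc/hosts -- before (Debian/Ubuntu default):
127.0.0.1    localhost
127.0.1.1    quickstart.cloudera quickstart
# /etc/hosts -- after (the hostname resolves to the real interface address):
127.0.0.1    localhost
192.168.1.50 quickstart.cloudera quickstart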

Connecting to a Mesos development cluster on an Ubuntu 14 machine via OpenVPN

Can someone who has successfully set up a Mesos development cluster on Google Cloud help me out? I have the cluster running, but I am having a hard time creating a VPN connection to it, even though I have OpenVPN installed on my machine and have downloaded the OpenVPN file provided by Mesosphere. I am using Ubuntu 14. Basically, I followed the instructions to create the cluster, but in order to access Mesos and Marathon I need to configure a VPN connection using the OpenVPN file provided by Mesosphere, and I do not know how to do that on Ubuntu 14.
Have you tried openvpn --config <file.ovpn>?
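For example, assuming the file Mesosphere gave you is named cluster.ovpn (the name is just a placeholder) and sits in the current directory:
sudo openvpn --config cluster.ovpn
# keep this terminal open; the tunnel stays up only while the process is running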

Change IP address of a Hadoop HDFS data node server and avoid Block pool errors

I'm using the Cloudera distribution of Hadoop and recently had to change the IP addresses of a few nodes in the cluster. After the change, on one of the nodes (old IP: 10.88.76.223, new IP: 10.88.69.31) the following error comes up when I try to start the DataNode service.
Initialization failed for block pool Block pool BP-77624948-10.88.65.174-13492342342 (storage id DS-820323624-10.88.76.223-50010-142302323234) service to hadoop-name-node-01/10.88.65.174:6666
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(10.88.69.31, storageID=DS-820323624-10.88.76.223-50010-142302323234, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=cluster25;nsid=1486084428;c=0)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:656)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3593)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:899)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:91)
Has anyone had success with changing the IP address of a Hadoop DataNode and joining it back to the cluster without data loss?
CHANGE HOST IP IN CLOUDERA MANAGER
Change the host IP on all nodes
sudo nano /etc/hosts
Edit the Cloudera agent config.ini on all nodes if the master node IP changed (see the config.ini sketch after these steps)
sudo nano /etc/cloudera-scm-agent/config.ini
Change the IP in the PostgreSQL database
To find the password, open the Cloudera Manager database properties
cat /etc/cloudera-scm-server/db.properties
Find the password line
Example: com.cloudera.cmf.db.password=gUHHwvJdoE
Connect to PostgreSQL
psql -h localhost -p 7432 -U scm
Query the hosts table
select name,host_id,ip_address from hosts;
Update the IP address
update hosts set ip_address = 'xxx.xxx.xxx.xxx' where host_id=x;
Exit the tool
\q
Restart the agent on all nodes
service cloudera-scm-agent restart
Restart the server on the master node
service cloudera-scm-server restart
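For reference, the setting that has to point at the (possibly new) Cloudera Manager address in /etc/cloudera-scm-agent/config.ini is server_host; a sketch with a placeholder value:
# /etc/cloudera-scm-agent/config.ini
[General]
server_host=xxx.xxx.xxx.xxx   # IP or hostname of the Cloudera Manager server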
It turns out it's better to:
Decommission the server from the cluster to ensure that all blocks are replicated to other nodes in the cluster.
Remove the server from the cluster
Connect to the server, change the IP address, then restart the Cloudera agent
Notice that Cloudera Manager now shows two entries for this server. Delete the entry with the old IP and the longest heartbeat time
Add the server back to the required cluster and add the required roles back to it (e.g. HDFS DataNode, HBase RS, YARN)
HDFS will read all data disks and recognize the block pool and cluster IDs, then register the datanode.
All data will be available and the process will be transparent to any client.
NOTE: If you run into name resolution errors from HDFS clients, the application has likely cached the old IP and will most likely need to be restarted. In particular, Java clients that previously referenced this server (e.g. HBase clients) must be restarted, because the JVM can cache IP addresses indefinitely; they will keep throwing connectivity errors against the old IP until they are restarted.
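If you want to limit how long a Java client can hold on to a stale address in the future, the JVM's positive DNS cache TTL can be bounded with the standard networkaddress.cache.ttl security property; a sketch, with a file path typical for a JDK 8 layout:
# $JAVA_HOME/jre/lib/security/java.security
networkaddress.cache.ttl=60   # cache successful hostname lookups for at most 60 seconds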

TaskTracker on hadoop slave cluster won't start. Can't connect to master

I set up a 2-node Hadoop cluster on AWS with the NameNode and the JobTracker running on the master, and the TaskTracker and DataNode running on both the master and the slave. When I start DFS, it tells me that it starts the NameNode, the DataNode on both nodes, and the secondary NameNode. When I start MapReduce it also tells me that the JobTracker was started, as well as the TaskTracker on both nodes. I started to run an example to make sure it was working, but the NameNode web interface showed that only one TaskTracker was being used. I checked the logs, and both the DataNode and TaskTracker logs on the slave had something along the lines of
2013-08-08 21:31:04,196 INFO org.apache.hadoop.ipc.RPC: Server at ip-10-xxx-xxx-xxx/10.xxx.xxx.xxx:9000 not available yet, Zzzzz...
2013-08-08 21:31:06,202 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-xxx-xxx-xxx/10.xxx.xxx.xxx:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
The NameNode is running on port 9000; this was in the DataNode log. The TaskTracker log had the same thing, except with port 9001, where the JobTracker is running. I was able to find something on the Apache wiki about this error: http://wiki.apache.org/hadoop/ServerNotAvailable
but I couldn't find any of the possible problems it lists. Since I'm running both nodes on AWS, I also made sure that both ports are allowed.
In summary:
The TaskTracker and DataNode on the slave node won't connect to the master
I know the IP addresses are right; I've checked multiple times
I can SSH without a passphrase from each instance into the other and into itself
The ports are allowed in the AWS security group
Based on the logs, both the NameNode and the JobTracker are running fine
I put the IPs of the master and slave in the config files rather than hostnames, because when I used hostnames and edited /etc/hosts accordingly, they couldn't be resolved
Does anybody know of any other possible reasons?
Per the original poster:
OK, so apparently it's because the NameNode is listening on 127.0.0.1:9000 rather than on ip-10.x.x.IpOfMaster:9000. See Hadoop Datanodes cannot find NameNode.
I just replaced localhost:9000 in the config files with ip-10.x.x.x:9000 and it worked.
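Concretely, the properties involved for this Hadoop 1.x layout are fs.default.name and mapred.job.tracker; a sketch, with the poster's ip-10.x.x.x placeholder standing in for the master's actual private address:
<!-- core-site.xml on the master: bind to the master's address, not localhost -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://ip-10.x.x.x:9000</value>   <!-- was hdfs://localhost:9000 -->
</property>
<!-- mapred-site.xml on the master: same idea for the JobTracker -->
<property>
  <name>mapred.job.tracker</name>
  <value>ip-10.x.x.x:9001</value>
</property>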
