HMaster fails to run in pseudo-distributed mode - hadoop

I have installed Hadoop 1.2.0 on Ubuntu. All the services (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker) are running well.
I then installed hbase-0.94.8, and I believe the configuration is okay as well. But HMaster fails to stay up: it starts, keeps retrying a connection to localhost:9000 (see the log below), and then goes down.
SSH to myself (localhost) is set up and working.
My /etc/hosts entries:
127.0.0.1 localhost prakashl
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
kandabap@prakashl:/usr/lib/hbase/hbase-0.94.8/conf$ jps
2735 HQuorumPeer
3017 HRegionServer
2270 TaskTracker
3715 Jps
2100 JobTracker
1845 DataNode
2009 SecondaryNameNode
1688 NameNode
hbase-master.log
>
2014-05-14 09:28:37,015 INFO org.apache.hadoop.hbase.master.ActiveMasterManager: Master=localhost,60000,1400023716583
2014-05-14 09:28:38,108 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
2014-05-14 09:28:39,109 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
2014-05-14 09:28:40,109 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
2014-05-14 09:28:41,110 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
2014-05-14 09:28:42,111 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
2014-05-14 09:28:43,111 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
2014-05-14 09:28:44,112 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).
2014-05-14 09:28:45,112 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
2014-05-14 09:28:46,113 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).
2014-05-14 09:28:47,113 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried
<<<<<
I copied the jar files from the Hadoop folders into the hbase/lib folder to rule out any version incompatibilities.
kandabap@prakashl:/usr/lib/hbase/hbase-0.94.8/logs$ netstat -ntla
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:48575 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:58304 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:2181 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:47238 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:42987 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50060 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50030 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:60020 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:8020 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:8021 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:35255 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN

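Your netstat output shows nothing listening on port 9000, while 127.0.0.1:8020 is open (presumably the NameNode), so HMaster is pointed at the wrong port. Change the value of hbase.rootdir from hdfs://localhost:9000/hbase to hdfs://localhost:8020/hbase in hbase-site.xml, and restart HBase.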
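The check behind this answer can be sketched in Python: read fs.default.name from core-site.xml and hbase.rootdir from hbase-site.xml, then compare host and port. The helper names (get_property, hdfs_endpoint, rootdir_matches_namenode) are made up for illustration, and the core-site.xml value in the usage note is an assumption, since the question does not show that file.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def get_property(xml_text, name):
    """Return the <value> of the Hadoop-style <property> called `name`, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def hdfs_endpoint(xml_text, name):
    """Return (host, port) parsed from an hdfs:// URI stored in property `name`."""
    value = get_property(xml_text, name)
    if value is None:
        raise KeyError(name)
    parsed = urlparse(value.strip())
    return parsed.hostname, parsed.port


def rootdir_matches_namenode(core_site_xml, hbase_site_xml):
    """True if hbase.rootdir uses the same host and port as fs.default.name."""
    namenode = hdfs_endpoint(core_site_xml, "fs.default.name")
    rootdir = hdfs_endpoint(hbase_site_xml, "hbase.rootdir")
    return namenode == rootdir
```

Called with the text of the two files, rootdir_matches_namenode returns False for the setup in the question if core-site.xml uses hdfs://localhost:8020 (assumed here) while hbase.rootdir still says port 9000.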

Related

Cloudera: Installation failed. Failed to receive heartbeat on CDH 5.14 cluster installation on singleNode from agents

I am trying to install Cloudera CDH 5.14.0 and am facing the error below during cluster installation.
Installation failed. Failed to receive heartbeat from agent.
Ensure that the host's hostname is configured properly.
Ensure that port 7182 is accessible on the Cloudera Manager Server (check firewall rules).
Ensure that ports 9000 and 9001 are not in use on the host being added.
Check agent logs in /var/log/cloudera-scm-agent/ on the host being added. (Some of the logs can be found in the installation details).
If Use TLS Encryption for Agents is enabled in Cloudera Manager (Administration -> Settings -> Security), ensure that /etc/cloudera-scm-agent/config.ini has use_tls=1 on the host being added. Restart the corresponding agent and click the Retry link here.
Detail Log:
>>[28/Jan/2018 10:30:44 +0000] 12771 MainThread agent INFO Neither verify_cert_file nor verify_cert_dir are configured. Not performing validation of server certificates in HTTPS communication. These options can be configured in this agent's config.ini file to enable certificate validation.
>>[28/Jan/2018 10:30:44 +0000] 12771 MainThread agent INFO Agent Logging Level: INFO
>>[28/Jan/2018 10:30:44 +0000] 12771 MainThread agent INFO No command line vars
>>[28/Jan/2018 10:30:44 +0000] 12771 MainThread agent INFO Found database jar: /usr/share/java/mysql-connector-java.jar
>>[28/Jan/2018 10:30:44 +0000] 12771 MainThread agent INFO Missing database jar: /usr/share/java/oracle-connector-java.jar (normal, if you're not using this database type)
>>[28/Jan/2018 10:30:44 +0000] 12771 MainThread agent INFO Found database jar: /usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar
>>[28/Jan/2018 10:30:44 +0000] 12771 MainThread agent INFO Agent starting as pid 12771 user root(0) group root(0).
END (0)
end of agent logs.
scm agent restarted
Local ubuntu host details:
chaithu@chaithu:~$ hostname
chaithu
chaithu@chaithu:~$ hostname -f
chaithu
chaithu@chaithu:~$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 chaithu
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
chaithu@chaithu:~$ sudo netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:7180 0.0.0.0:* LISTEN 1353/java
tcp 0 0 0.0.0.0:7182 0.0.0.0:* LISTEN 1353/java
tcp 0 0 0.0.0.0:4433 0.0.0.0:* LISTEN 12930/python2.7
tcp 0 0 127.0.1.1:53 0.0.0.0:* LISTEN 1162/dnsmasq
tcp 0 0 127.0.0.1:7190 0.0.0.0:* LISTEN 12930/python2.7
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1031/sshd
tcp 0 0 0.0.0.0:7191 0.0.0.0:* LISTEN 12930/python2.7
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 5112/cupsd
tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN 1077/postgres
tcp 0 0 127.0.0.1:19001 0.0.0.0:* LISTEN 1632/python
tcp 0 0 127.0.1.1:9000 0.0.0.0:* LISTEN 12771/python2.7
tcp 0 0 0.0.0.0:7432 0.0.0.0:* LISTEN 1434/postgres
tcp 0 0 127.0.0.1:3306 0.0.0.0:* LISTEN 1045/mysqld
tcp6 0 0 :::80 :::* LISTEN 1441/apache2
tcp6 0 0 :::4434 :::* LISTEN 12930/python2.7
tcp6 0 0 :::22 :::* LISTEN 1031/sshd
tcp6 0 0 :::7191 :::* LISTEN 12930/python2.7
tcp6 0 0 ::1:631 :::* LISTEN 5112/cupsd
tcp6 0 0 :::7432 :::* LISTEN 1434/postgres
udp 0 0 0.0.0.0:57656 0.0.0.0:* 902/avahi-daemon: r
udp 0 0 0.0.0.0:631 0.0.0.0:* 5113/cups-browsed
udp 0 0 0.0.0.0:5353 0.0.0.0:* 2703/chrome
udp 0 0 0.0.0.0:5353 0.0.0.0:* 902/avahi-daemon: r
udp 0 0 0.0.0.0:39330 0.0.0.0:* 1162/dnsmasq
udp 0 0 0.0.0.0:7191 0.0.0.0:* 12930/python2.7
udp 0 0 127.0.1.1:53 0.0.0.0:* 1162/dnsmasq
udp 0 0 0.0.0.0:68 0.0.0.0:* 1117/dhclient
udp6 0 0 :::5353 :::* 2703/chrome
udp6 0 0 :::5353 :::* 902/avahi-daemon: r
udp6 0 0 :::46382 :::* 902/avahi-daemon: r
udp6 0 0 :::7191 :::* 12930/python2.7
I am trying to install CDH on a single-node local machine only. Let me know if any more information is required.

DataNode and NameNode run but are not reflected in the UI

I have hit a problem configuring my Hadoop master and slave. The NameNode on the master and the DataNode on the slave are both up and running.
However, the Live Nodes count in the web UI does not reflect the running nodes.
I have already tried disabling the firewall and reformatting the nodes, but that did not resolve it.
Any help would be highly appreciated!
Here are the snippets :
Master:
jps command output :
5088 Jps
4446 NameNode
4681 SecondaryNameNode
Slave :
jps command output:
2478 Jps
2410 DataNode
ubuntu@hadoop-master:/usr/local/hadoop/etc/hadoop$ $HADOOP_HOME/bin/hdfs dfsadmin -refreshNodes
16/04/28 02:22:37 WARN ipc.Client: Address change detected. Old: hadoop-master/52.200.230.29:50077 New: hadoop-master/127.0.0.1:50077
refreshNodes: Call From hadoop-master/127.0.0.1 to hadoop-master:50077 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Log file of hadoop-slave-1:
2016-04-28 21:23:07,248 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop-master/52.200.230.29:9000
2016-04-28 21:23:12,257 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop-master/52.200.230.29:9000
2016-04-28 21:23:17,265 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop-master/52.200.230.29:9000
2016-04-28 21:23:22,273 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop-master/52.200.230.29:9000
2016-04-28 21:23:27,282 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop-master/52.200.230.29:9000
2016-04-28 21:23:32,291 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop-master/52.200.230.29:9000
Log File of Hadoop-master:
2016-04-28 21:21:04,002 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 127.0.0.1
2016-04-28 21:21:04,002 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
2016-04-28 21:21:04,002 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 407
2016-04-28 21:21:04,002 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 22
2016-04-28 21:21:04,003 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 23
2016-04-28 21:21:04,003 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /usr/local/hadoop/hadoop_data/hdfs/namenode/current/edits_inprogress_0000000000000000407 -> /usr/local/hadoop/hadoop_data/hdfs/namenode/current/edits_0000000000000000407-0000000000000000408
2016-04-28 21:21:04,004 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 409
netstat -pant command on my master:
ubuntu@hadoop-master:/usr/local/hadoop/etc/hadoop$ netstat -pant
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 21491/java
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:50077 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:50078 0.0.0.0:* LISTEN 21491/java
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 21726/java
tcp 0 0 172.31.63.189:50070 128.235.8.68:57225 ESTABLISHED 21491/java
tcp 0 0 127.0.0.1:41471 127.0.0.1:50078 TIME_WAIT -
tcp 0 124 172.31.63.189:22 128.235.8.68:56950 ESTABLISHED -
tcp 0 0 172.31.63.189:50070 128.235.8.68:57224 ESTABLISHED 21491/java
tcp 0 0 172.31.63.189:50070 128.235.8.68:57223 ESTABLISHED 21491/java
tcp 0 0 172.31.63.189:22 128.235.8.68:57084 ESTABLISHED -
tcp 0 0 172.31.63.189:22 58.218.204.215:39402 ESTABLISHED -
tcp 0 0 172.31.63.189:50070 128.235.8.68:57227 ESTABLISHED 21491/java
tcp 0 0 172.31.63.189:50070 128.235.8.68:57228 ESTABLISHED 21491/java
tcp 0 0 172.31.63.189:50070 128.235.8.68:57226 ESTABLISHED 21491/java
tcp6 0 0 :::22 :::* LISTEN -
tcp6 0 0 :::50077 :::* LISTEN -
tcp6 0 0 :::9000 :::* LISTEN -
Connection refused
I can see this error in your post. I would check three things:
Make sure a process is listening on port 50077, and that it is your Hadoop process.
Make sure the port is reachable, using a tool such as telnet.
Besides the firewall, SELinux can also block access. Disable it, restart your service, and try again.
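The telnet-style reachability test mentioned above can also be done with a few lines of Python, which helps on machines where telnet is not installed. This is a minimal sketch; the function name port_is_open is made up for illustration, and the host and port you pass in are whatever your own configuration uses.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

For the setup in the question, port_is_open("hadoop-master", 50077) run on the master and port_is_open("hadoop-master", 9000) run on the slave would show which side cannot be reached. A False result only says the connection failed; it does not distinguish a firewall from a service bound to a different address.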

hadoop slave cannot connect to master:8031

I'm a total Hadoop newbie. I set up Hadoop on two machines, master and slave, following this tutorial (I obtained the same error following this other tutorial).
Problem: After starting dfs and yarn, the only node appearing on localhost:50070 is the master, even if the right processes are running on the master (NameNode, DataNode, SecondaryNameNode, ResourceManager) and on the slave (DataNode).
The nodemanager log of the slave reports: INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.0.14:8031. Already tried 10 times.
Note that:
I have already edited yarn-site.xml following this thread.
I have disabled the firewall on the master
I ran netstat -anp | grep 8031 on the master and confirmed that there are a couple of processes listening on port 8031 using tcp.
I suffered from the same problem, and these steps may help.
Make sure your namenodes can negotiate with your datanodes.
hadoop@namenode-01:~$ jps
8678 NameNode
9530 WebAppProxyServer
9115 ResourceManager
8940 SecondaryNameNode
9581 Jps
hadoop@datanode-01:~$ jps
8592 NodeManager
8715 Jps
8415 DataNode
Check what the nodemanager-datanode-01.log in $HADOOP_HOME/logs reports.
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode-01/192.168.4.2:8031. Already tried 8 time(s);
Check whether your yarn.resourcemanager.resource-tracker.address is listening on IPv6 instead of IPv4.
hadoop@namenode-01:~$ netstat -natp
tcp6 0 0 127.0.2.1:8088 :::* LISTEN 9115/java
tcp6 0 0 127.0.2.1:8030 :::* LISTEN 9115/java
tcp6 0 0 127.0.2.1:8031 :::* LISTEN 9115/java
tcp6 0 0 127.0.2.1:8032 :::* LISTEN 9115/java
tcp6 0 0 127.0.2.1:8033 :::* LISTEN 9115/java
If your YARN addresses are listening on IPv6, you may want to disable IPv6 first.
sysctl -w net.ipv6.conf.all.disable_ipv6=1
sysctl -w net.ipv6.conf.default.disable_ipv6=1
Finally, the default yarn addresses are like this:
yarn.resourcemanager.resource-tracker.address ${yarn.resourcemanager.hostname}:8031
Check your /etc/hosts to avoid misconfigurations.
hadoop@namenode-01:~$ cat /etc/hosts
127.0.2.1 namenode-01 namenode-01 (for example, this line maps the hostname to 127.0.2.1; delete this line)
192.168.4.2 namenode-01
192.168.4.3 datanode-01
192.168.4.4 datanode-02
192.168.4.5 datanode-03
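The /etc/hosts check above can be automated with a short script that lists every hostname mapped to a loopback address. This is only a sketch: the function name loopback_hostnames and the default ignore list are my own choices, not part of Hadoop.

```python
import ipaddress


def loopback_hostnames(hosts_text,
                       ignore=("localhost", "ip6-localhost", "ip6-loopback")):
    """Return {name: address} for every host name mapped to a loopback address."""
    found = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # first field is not an IP address; skip the line
        if is_loopback:
            for name in names:
                if name not in ignore:
                    found[name] = address
    return found
```

Running it on the contents of /etc/hosts from the master, for example loopback_hostnames(open("/etc/hosts").read()), should return an empty dict on a multi-node cluster. Any cluster hostname it reports is a candidate for the line to delete.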

Why do the HDFS namenode and datanode both always listen on 0.0.0.0 at a random port?

I installed a new hadoop-2.2.0 today and found that, after HDFS is started (using /sbin/start-dfs.sh), the namenode and datanode each always listen on 0.0.0.0 at a random port. I cannot find a related configuration property at http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml.
The port is not 50070, 50470, 50090, 50010, 50020, 50075, 50475, etc., which are listed in hdfs-default.xml; it is just a random port.
8369 Jps
8109 DataNode
7936 NameNode
The namenode listens on the following:
tcp 0 0 0.0.0.0:46628 0.0.0.0:* LISTEN 7936/java <==
tcp 0 0 10.173.130.119:9000 0.0.0.0:* LISTEN 7936/java
tcp 0 0 10.173.130.119:50070 0.0.0.0:* LISTEN 7936/java
The datanode listens on the following:
tcp 0 0 10.173.130.119:50020 0.0.0.0:* LISTEN 8109/java
tcp 0 0 0.0.0.0:35114 0.0.0.0:* LISTEN 8109/java <==
tcp 0 0 10.173.130.119:50010 0.0.0.0:* LISTEN 8109/java
tcp 0 0 10.173.130.119:50075 0.0.0.0:* LISTEN 8109/java
Thanks for any advice.
Yes, it assigns a random port every time a namenode or datanode is restarted. But if you look closely, all the namenode listeners belong to the same process ID (7936 in this case), and all the datanode listeners belong to the same process ID (8109). So internally it is the same process.

Hadoop dfs error : INFO ipc.Client: Retrying connect to server: localhost

I have successfully set up a Hadoop cluster with 6 nodes (master, slave<1-5>)
Formatted the namenode -> done
Starting up and shutting down cluster -> works fine
Executing "hadoop dfs -ls /" gives this error -> Error: INFO ipc.Client: Retrying connect to server: localhost
I tried to see the services running using:
sudo netstat -plten | grep java
hduser@ubuntu:~$ sudo netstat -plten | grep java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 1000 93307 11384/java
tcp 0 0 0.0.0.0:44440 0.0.0.0:* LISTEN 1000 92491 11571/java
tcp 0 0 0.0.0.0:40633 0.0.0.0:* LISTEN 1000 92909 11758/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 1000 93449 11571/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 1000 93673 11571/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 1000 93692 11571/java
tcp 0 0 127.0.0.1:40485 0.0.0.0:* LISTEN 1000 93666 12039/java
tcp 0 0 0.0.0.0:44582 0.0.0.0:* LISTEN 1000 93013 11852/java
tcp 0 0 10.42.43.1:54310 0.0.0.0:* LISTEN 1000 92471 11384/java
tcp 0 0 10.42.43.1:54311 0.0.0.0:* LISTEN 1000 93290 11852/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 1000 93460 11758/java
tcp 0 0 0.0.0.0:34154 0.0.0.0:* LISTEN 1000 92179 11384/java
tcp 0 0 0.0.0.0:50060 0.0.0.0:* LISTEN 1000 94200 12039/java
tcp 0 0 0.0.0.0:50030 0.0.0.0:* LISTEN 1000 93550 11852/java
It is the master IP that is bound to ports 54310 and 54311, not localhost (loopback).
The core-site.xml has been properly configured:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/hadoop/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
</property>
</configuration>
Why is it expecting localhost to be bound to 54310 rather than the master, which I have configured here? Help appreciated. How do I resolve this?
Cheers
Apparently, someone had added the older Hadoop (1.0.3) bin directory to the PATH variable before I added the new Hadoop (1.0.4) bin directory. So whenever I ran "hadoop" from the CLI, it executed the binaries of the older Hadoop rather than the new one.
Solution:
Remove the entire bin path of the older Hadoop
Shut down the cluster and exit the terminal
Log in to a new terminal session
Start up the node
Tried hadoop dfs -ls / -> Works fine! Good lesson learnt.
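To spot this kind of PATH shadowing, it helps to list every "hadoop" executable on the PATH in lookup order; the shell runs the first one. Here is a small Python sketch of that check. The function name find_on_path is made up for illustration.

```python
import os


def find_on_path(program, path=None):
    """Return every executable named `program` on PATH, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches
```

If find_on_path("hadoop") returns more than one entry, the first entry is the binary that actually runs, and the remaining entries are shadowed installations.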
It looks like many people have run into this problem.
There might be no need to change /etc/hosts. Make sure the master and the slaves can reach each other, and that core-site.xml is the same on every node and points to the right master node and port number.
Then run $HADOOP/bin/stop-all.sh and $HADOOP/bin/start-all.sh on the master node ONLY (running them on a slave might lead to problems). Use jps to check whether all the services are there, as follows.
On master node:
4353 DataNode
4640 JobTracker
4498 SecondaryNameNode
4788 TaskTracker
4989 Jps
4216 NameNode
On slave node:
3143 Jps
2827 DataNode
2960 TaskTracker
In addition, check your firewall rules between the namenode and the datanodes.
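The jps check described above can be scripted so that a missing daemon stands out immediately. This is a minimal sketch that works on the text printed by jps; the function name missing_daemons is made up, and the expected daemon list is something you supply for each node.

```python
def missing_daemons(jps_output, expected):
    """Return the expected daemon names that do not appear in `jps` output."""
    running = set()
    for line in jps_output.splitlines():
        fields = line.split()
        # A jps line looks like "<pid> <main class name>".
        if len(fields) >= 2 and fields[0].isdigit():
            running.add(fields[1])
    return sorted(set(expected) - running)
```

On a Hadoop 1.x slave, for example, you would pass the jps output together with ["DataNode", "TaskTracker"]. An empty result means every expected daemon is running.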
