Regarding HBase running on Hadoop in distributed mode

Hadoop version: 2.4.1
HBase version: 0.98.6
I have Hadoop up and running perfectly fine with the following configuration:
107.108.86.119: Hadoop NameNode, SecondaryNameNode
107.109.155.100: DataNode 1
107.109.155.102: DataNode 2
Now I have installed HBase with the following configuration:
107.108.86.114: HMaster, HQuorumPeer
107.109.155.100: RegionServer 1
107.109.155.102: RegionServer 2
When I run jps, the following processes are running:
107.109.155.102: HRegionServer, DataNode
107.109.155.100: HRegionServer, DataNode
107.108.86.119: NameNode, SecondaryNameNode
107.108.86.114: HMaster
But running status in the HBase shell shows "0 servers, 0 dead, NaN average load".
Entering any command in the HBase shell shows: ERROR: java.io.IOException: Table Namespace Manager not ready yet, try again later
The logs on the regionservers show:
regionserver.HRegionServer: reportForDuty to master=localhost,60000,1415007213689 with port=60020, startcode=1415007215055
regionserver.HRegionServer: error telling master we are up
My hbase-site.xml:
<property>
<name>hbase.master</name>
<value>107.108.86.114:60000</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://push-mcd2:54310/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>107.108.86.114</value>
</property>
The /etc/hosts file on the HMaster node is:
127.0.0.1 localhost arpita-ubuntu
127.0.1.1 arpita-ubuntu
107.109.155.100 push-ws1
107.109.155.102 push-ws2
107.108.86.114 push-mcd1
107.108.86.119 push-mcd2
The /etc/hosts files on the slaves are almost the same as the one above.
conf/hbase-env.sh:
export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.22
export HBASE_CLASSPATH=/home/hadoop/hadoop-0.20.2/conf
export HBASE_MANAGES_ZK=true
What changes should I make so that HBase runs on the cluster above?

Why does your regionserver log mention that it is looking for the HBase Master on localhost?
From the information above, you have set up the Master on a node different from both regionservers, so please check that your config is correct on each node.
Your regionserver log shows:
regionserver.HRegionServer: reportForDuty to master=localhost,60000,1415007213689 with port=60020, startcode=1415007215055
regionserver.HRegionServer: error telling master we are up
Also, in /etc/hosts on each node, please update the first two lines from
127.0.0.1 localhost arpita-ubuntu
127.0.1.1 arpita-ubuntu
to
127.0.0.1 localhost
<Actual_IP_Address_for_Host> arpita-ubuntu
This is necessary if you don't have automatic DNS name resolution in place.
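A quick way to spot the loopback mapping described above is to scan /etc/hosts for hostnames bound to a 127.x.x.x (or ::1) address. The following is a minimal Python sketch; the function name `loopback_hostnames` is illustrative and not part of any Hadoop or HBase tool, and since it only parses the file's text it does not account for DNS or other name-service settings.

```python
import ipaddress
from typing import List

# Names that are expected to map to loopback and are therefore not flagged.
LOCAL_NAMES = {"localhost", "localhost.localdomain", "ip6-localhost", "ip6-loopback"}


def loopback_hostnames(hosts_text: str) -> List[str]:
    """Return hostnames (other than the usual localhost aliases) mapped to a loopback IP."""
    flagged: List[str] = []
    for raw_line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) < 2:
            continue  # blank, comment-only or malformed line
        address, names = fields[0], fields[1:]
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # first field is not an IP address
        if is_loopback:
            for name in names:
                if name not in LOCAL_NAMES and name not in flagged:
                    flagged.append(name)
    return flagged


if __name__ == "__main__":
    with open("/etc/hosts") as fh:
        print(loopback_hostnames(fh.read()))
```

For the HMaster's /etc/hosts shown in the question, this reports `arpita-ubuntu`, which is exactly the entry the two-line change above moves to a real IP address.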
Also, please use IP addresses instead of localhost in all configuration settings.
If you still face issues, check whether the relevant ports are open.
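To check whether a port is reachable from a given node, a plain TCP connect is usually enough (it is essentially what telnet does). Below is a minimal Python sketch; the host/port list reuses the IPs from the question together with the HBase 0.98 default ports (60000 for the master, 60020 for region servers, 2181 for ZooKeeper), which is an assumption if your hbase-site.xml overrides them.

```python
import socket
from typing import Iterable, Tuple

# Hosts from the question with HBase 0.98 default ports (assumed, not overridden).
CLUSTER_PORTS = [
    ("107.108.86.114", 60000),   # HMaster RPC
    ("107.108.86.114", 2181),    # ZooKeeper client port (HQuorumPeer)
    ("107.109.155.100", 60020),  # RegionServer RPC on regionserver1
    ("107.109.155.102", 60020),  # RegionServer RPC on regionserver2
]


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and name-resolution failures all land here.
        return False


def report(targets: Iterable[Tuple[str, int]]) -> None:
    """Print one open/closed line per (host, port) pair."""
    for host, port in targets:
        state = "open" if port_is_open(host, port) else "closed or unreachable"
        print(f"{host}:{port} {state}")
```

Running `report(CLUSTER_PORTS)` from each regionserver shows whether the master and ZooKeeper ports are reachable from that node; a "closed" result while the daemon is running usually points at a firewall or a daemon bound to the wrong interface.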
Hope this helps you.

Related

Hbase installation in three node hadoop cluster

I have installed a three-node Hadoop cluster (master, slave1 and slave2).
I would like to install HBase in fully distributed mode. I am thinking of installing the HBase Master and ZooKeeper on my Hadoop cluster's MASTER machine (i.e. the NameNode), and the Region Servers on the SLAVE1 and SLAVE2 machines (i.e. the DataNodes). Is this the correct approach?
Sorry, this may be a simple question, but I am new to NoSQL systems and want to do this installation.
I would really appreciate it if someone could share a reference document for this installation.
Thanks in advance.
In order to configure HBase and ZooKeeper on three nodes, i.e. 1 master and 2 slave nodes, you will need to edit hbase-site.xml, regionservers and hbase-env.sh (found in $HBASE_HOME/conf) and zoo.cfg (found in $ZOOKEEPER_HOME/conf).
Let us name your master node master and your slave nodes slave1 and slave2, and assume your Hadoop, HBase and ZooKeeper folders are in the /usr/local/cluster/ folder. Change the following files:
1. hbase-site.xml:
<configuration>
<property>
<name>hbase.master</name>
<value>master:60000</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/usr/local/cluster/zk-tmp</value>
</property>
</configuration>
2. hbase-env.sh:
--add these lines--
export JAVA_HOME=/usr/lib/jvm/default-java
export HBASE_HOME=/usr/local/cluster/hbase
export HADOOP_HOME=/usr/local/cluster/hadoop
--modify these lines--
export HBASE_PID_DIR=/usr/local/cluster/zk-tmp
export HBASE_MANAGES_ZK=false
3. regionservers:
(delete localhost and add these lines if you want your region servers on slave1 and slave2 only)
slave1
slave2
4. zoo.cfg:
--modify these lines--
dataDir=/usr/local/cluster/zk-tmp
--add these lines (since you start the ZooKeeper server on the master node)--
server.0=master:2888:3888
5. /etc/hosts:
Edit the /etc/hosts file and comment the line with 127.0.1.1 (to avoid loopback address problems)
--add these lines--
your-master-node-ip master
your-slave1-node-ip slave1
your-slave2-node-ip slave2
Note: Do steps 1 to 5 in master, slave1 and slave2 nodes.
6. Start zookeeper server in master node:
$ZOOKEEPER_HOME/bin/zkServer.sh start
7. Start hbase processes in master node:
$HBASE_HOME/bin/start-hbase.sh
8. Check your HBase and ZooKeeper processes. The jps output on each node should contain:
--master--
QuorumPeerMain
HMaster
HRegionServer (only if master is also listed in regionservers)
--slave1--
HRegionServer
--slave2--
HRegionServer
9. Stopping HBase and ZooKeeper (stop HBase first, then ZooKeeper):
$HBASE_HOME/bin/stop-hbase.sh
$ZOOKEEPER_HOME/bin/zkServer.sh stop
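Step 8 can also be scripted. Here is a minimal Python sketch that compares `jps` output against the daemons the steps above expect on each node; the node names follow the answer, and HRegionServer is left out for master on the assumption that conf/regionservers lists only slave1 and slave2.

```python
from typing import Dict, List, Set

# Daemons the answer expects per node. HRegionServer on "master" is omitted,
# assuming conf/regionservers lists only slave1 and slave2.
EXPECTED: Dict[str, Set[str]] = {
    "master": {"QuorumPeerMain", "HMaster"},
    "slave1": {"HRegionServer"},
    "slave2": {"HRegionServer"},
}


def running_daemons(jps_output: str) -> Set[str]:
    """Collect JVM main-class names from `jps` output ("<pid> <name>" per line)."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(node: str, jps_output: str) -> List[str]:
    """Return the expected daemons for `node` that are not running, sorted."""
    return sorted(EXPECTED[node] - running_daemons(jps_output))


sample = "4211 QuorumPeerMain\n4390 HMaster\n4721 Jps\n"
print(missing_daemons("master", sample))        # []
print(missing_daemons("slave1", "4721 Jps\n"))  # ['HRegionServer']
```

On a real node you could feed it the live output, for example `subprocess.run(["jps"], capture_output=True, text=True).stdout`.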

Getting "ERROR: Can't get master address from ZooKeeper; znode data == null" when using the HBase shell

I installed Hadoop 2.2.0 and HBase 0.98.0, and here is what I do:
$ ./bin/start-hbase.sh
$ ./bin/hbase shell
2.0.0-p353 :001 > list
then I got this:
ERROR: Can't get master address from ZooKeeper; znode data == null
Why am I getting this error? Another question:
Do I need to run ./sbin/start-dfs.sh and ./sbin/start-yarn.sh before I run HBase?
Also, what are ./sbin/start-dfs.sh and ./sbin/start-yarn.sh used for?
Here are some of my configuration files:
hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://127.0.0.1:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/Users/apple/Documents/tools/hbase-tmpdir/hbase-data</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/Users/apple/Documents/tools/hbase-zookeeper/zookeeper</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system.</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/micmiu/tmp/hadoop</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>io.native.lib.available</name>
<value>false</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
If you just want to run standalone HBase without dealing with ZooKeeper management, remove all the property blocks from hbase-site.xml except the one named hbase.rootdir.
Now run ./bin/start-hbase.sh. HBase comes with its own ZooKeeper, which is started when you run ./bin/start-hbase.sh; that will suffice while you are getting to know things for the first time. Later you can add distributed-mode configuration for ZooKeeper.
You only need to run ./sbin/start-dfs.sh for HBase, because hbase.rootdir is set to hdfs://127.0.0.1:9000/hbase in your hbase-site.xml. If you change it to a location on the local filesystem using file:///some_location_on_local_filesystem, then you don't even need to run ./sbin/start-dfs.sh.
hdfs://127.0.0.1:9000/hbase says the root directory is on HDFS, and ./sbin/start-dfs.sh starts the NameNode and DataNode, which provide the underlying API for accessing the HDFS file system. To learn about YARN, please look at http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/YARN.html.
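The rule in this answer (HDFS must be running only when hbase.rootdir is an hdfs:// URI) can be expressed as a small check. This is a minimal Python sketch; `hdfs_endpoint` is an illustrative helper name, and it also returns the NameNode host and port so you can compare them with fs.defaultFS in core-site.xml.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def hdfs_endpoint(rootdir: str) -> Optional[Tuple[str, Optional[int]]]:
    """Return (namenode_host, port) if hbase.rootdir is on HDFS, else None."""
    parsed = urlparse(rootdir)
    if parsed.scheme != "hdfs":
        return None  # e.g. file:///..., so start-dfs.sh is not required
    return parsed.hostname, parsed.port


print(hdfs_endpoint("hdfs://127.0.0.1:9000/hbase"))  # ('127.0.0.1', 9000)
print(hdfs_endpoint("file:///tmp/hbase-data"))       # None
```

In the question, hbase.rootdir uses 127.0.0.1 and fs.defaultFS uses localhost, both on port 9000, so they agree as long as localhost resolves to 127.0.0.1.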
This can also happen if the VM or the host machine is put to sleep, because ZooKeeper will not stay live.
Restarting the VM should solve the problem.
You need to start ZooKeeper and then run the HBase shell:
{HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
You may also want to check this property in hbase-env.sh:
# Tell HBase whether it should manage its own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false
Refer to Source - Zookeeper
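Before starting the HBase shell, you can confirm that a ZooKeeper server is answering on its client port with the `ruok` four-letter command, to which a healthy server replies `imok` (the same check is often done with `echo ruok | nc localhost 2181`). Below is a minimal Python sketch; host and port default to localhost:2181 as an assumption. Note that on ZooKeeper 3.5.3 and later, four-letter commands other than `srvr` must be enabled through `4lw.commands.whitelist`, otherwise this check reports failure even when ZooKeeper is up.

```python
import socket


def zookeeper_is_ok(host: str = "localhost", port: int = 2181, timeout: float = 3.0) -> bool:
    """Send ZooKeeper's 'ruok' command and return True if the reply is 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            chunks = []
            # The server closes the connection after answering, so read until EOF.
            while True:
                data = sock.recv(64)
                if not data:
                    break
                chunks.append(data)
    except OSError:
        return False  # nothing listening, firewall drop, timeout, DNS failure, ...
    return b"".join(chunks) == b"imok"


if __name__ == "__main__":
    print("ZooKeeper OK" if zookeeper_is_ok() else "ZooKeeper not answering ruok")
```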
One quick solution could be to restart HBase:
1) stop-hbase.sh
2) start-hbase.sh
I had the exact same error. The Linux firewall was blocking connectivity. You can test ports via telnet. A quick fix is to turn off the firewall and see if that fixes it:
Completely disable the firewall on all of your nodes. Note: this command will not survive a reboot of your machines.
systemctl stop firewalld
The long-term fix is to configure the firewall to allow the HBase ports.
Note that your version of HBase may use different ports:
https://issues.apache.org/jira/browse/HBASE-10123
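To know which ports to open in the firewall, it helps to tabulate the defaults. The sketch below encodes the default ports as I understand them before and after HBASE-10123 (which moved HBase 1.0 off the 600xx range); treat it as a starting point and check your own hbase-site.xml, since every entry can be overridden. The names `DEFAULT_PORTS` and `default_ports` are illustrative.

```python
from typing import Dict

# Default HBase ports before and after HBASE-10123 (HBase 1.0).
# A multi-server ZooKeeper ensemble also uses the peer ports from the
# server.N lines in zoo.cfg (commonly 2888 and 3888).
DEFAULT_PORTS: Dict[str, Dict[str, int]] = {
    "pre-1.0": {
        "hbase.master.port": 60000,
        "hbase.master.info.port": 60010,
        "hbase.regionserver.port": 60020,
        "hbase.regionserver.info.port": 60030,
        "hbase.zookeeper.property.clientPort": 2181,
    },
    "1.0+": {
        "hbase.master.port": 16000,
        "hbase.master.info.port": 16010,
        "hbase.regionserver.port": 16020,
        "hbase.regionserver.info.port": 16030,
        "hbase.zookeeper.property.clientPort": 2181,
    },
}


def default_ports(hbase_version: str) -> Dict[str, int]:
    """Pick the default port table for a version string such as '0.98.6'."""
    major = int(hbase_version.split(".")[0])
    return DEFAULT_PORTS["1.0+" if major >= 1 else "pre-1.0"]


print(sorted(default_ports("0.98.6").values()))  # [2181, 60000, 60010, 60020, 60030]
```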
The output from the HBase shell is so high-level that many different misconfigurations can cause this message. To debug, it is much better to look at the HBase logs in
/var/log/hbase
to figure out the root cause of the issue.
I had the same problem. My root cause was that hadoop-kms was using a port number that conflicted with my HBase Master: both were using port 16000, so my HMaster had not even started when I invoked the hbase shell. After I fixed that, HBase worked.
Again, a KMS port conflict might not be your root cause. I strongly suggest looking in /var/log/hbase to find it.
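Going through /var/log/hbase by hand can be tedious, so here is a minimal Python sketch that pulls out lines that commonly point at a root cause. The pattern list is an assumption (illustrative, not exhaustive), and the log location follows the answer above; tarball installs typically write to $HBASE_HOME/logs instead.

```python
import glob
import re
from typing import List

# Illustrative patterns that often explain the generic shell error.
SUSPICIOUS = [
    re.compile(r"\b(ERROR|FATAL)\b"),
    re.compile(r"BindException|Address already in use"),
    re.compile(r"UnknownHostException"),
    re.compile(r"ConnectionLoss|Connection refused"),
]


def suspicious_lines(log_text: str) -> List[str]:
    """Return the lines of `log_text` that match any SUSPICIOUS pattern, in order."""
    return [
        line for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in SUSPICIOUS)
    ]


if __name__ == "__main__":
    # Adjust the glob if your logs live elsewhere (e.g. $HBASE_HOME/logs).
    for path in sorted(glob.glob("/var/log/hbase/*.log")):
        with open(path, errors="replace") as fh:
            for line in suspicious_lines(fh.read()):
                print(f"{path}: {line}")
```

A port clash like the hadoop-kms one above would typically surface as a BindException in the master's log.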
In my case, I got the same error when running HBase. I had not included the ZooKeeper properties in hbase-site.xml (according to the Apache HBase guide, only two properties, rootdir and distributed, are essential), and I still got the above error message.
Looking back at the output of the jps command, I found that my HRegionServer and HMaster were indeed not properly up and running.
After a stop and start (like a reset), both were up and running and I could run HBase properly.
If this is happening in VMware or VirtualBox, please restart Cloudera with the command init1 (check that you have root privileges) and retry. Hope it helps :)

Accessing HBase from Eclipse: java.net.UnknownHostException

When I write Java code to access HBase from Eclipse, the message "java.net.UnknownHostException" is always shown. But the hbase shell works well.
I installed Hadoop and HBase on a single Linux node in pseudo-distributed mode, and my hostname is yzd. Here are my /etc/hosts and hbase-site.xml:
/etc/hosts:
127.0.0.1 localhost yzd
hbase-site.xml:
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
Error message:
INFO [main] (HBaseRPC.java:117) - Using org.apache.hadoop.hbase.ipc.WritableRpcEngine for org.apache.hadoop.hbase.ipc.HMasterInterface
INFO [main] (HConnectionManager.java:596) - getMaster attempt 0 of 10 failed; retrying after sleep of 1000
java.net.UnknownHostException: unknown host: � 13846#yzdlocalhost
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.<init>(HBaseClient.java:224)
at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:954)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:816)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:141)
at com.sun.proxy.$Proxy4.getProtocolVersion(Unknown Source)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:174)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:295)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:272)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:324)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:579)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:94)
at com.hbasebook.hush.schema.SchemaManager.process(SchemaManager.java:126)
at com.hbasebook.hush.HushMain.main(HushMain.java:57)
Check that the version of your local HBase matches the one you are using as a dependency in your pom. This should solve your issue. I was facing the same issue while using HBase in standalone mode. I hope this helps you.
First of all, yzd is not a host name, it's a domain name (you should prefer an FQDN). Now this line
java.net.UnknownHostException: unknown host: � 13846#yzdlocalhost
clearly says that the host 13846#yzdlocalhost cannot be found. You can do the following:
1. Use IP addresses instead of hostnames in both hbase-site.xml and core-site.xml, and check again.
2. Then use FQDNs in the /etc/hosts file, separating the values with tabs; you can then replace the IPs with FQDNs.
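Before switching between IPs and FQDNs as suggested, it is worth checking what each name actually resolves to on the machine running Eclipse. Below is a minimal Python sketch; the helper `resolve_all` is hypothetical and simply maps each name to its addresses, or to None when the lookup fails.

```python
import socket
from typing import Dict, Iterable, List, Optional


def resolve_all(names: Iterable[str]) -> Dict[str, Optional[List[str]]]:
    """Map each hostname to its sorted resolved addresses, or None if lookup fails."""
    results: Dict[str, Optional[List[str]]] = {}
    for name in names:
        try:
            infos = socket.getaddrinfo(name, None)
        except (OSError, UnicodeError):
            # gaierror (a subclass of OSError) for unknown hosts; UnicodeError for
            # names that cannot be IDNA-encoded.
            results[name] = None
            continue
        # info[4] is the sockaddr tuple; its first element is the address string.
        results[name] = sorted({info[4][0] for info in infos})
    return results
```

For example, `resolve_all([socket.gethostname(), "localhost", "yzd"])` on the asker's machine would show whether yzd resolves at all, and whether it maps to the same address as localhost.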

Hive/HBase Integration - Zookeeper Session Closes Immediately

We have an 8-node cluster running CDH3u2, configured using Cloudera Manager. We have a dedicated master node running our only ZooKeeper instance. When I configure Hive to run local Hadoop, executed from the master node, I have no problem retrieving data from HBase. When I run distributed MapReduce via Hive, I get the following error when the slave nodes connect to ZooKeeper:
HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default).
We have tried setting max connections higher (we even tried removing the limit). This is a development cluster with very few users, and I know that the problem is not too many connections (I am able to connect to ZooKeeper from the slave nodes using ./zkCli).
Server side logs indicate that the session was terminated by the client.
Client side hadoop log says:
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
Any idea why I am unable to maintain a connection to ZooKeeper via Hive MapReduce?
Configs for hbase and zookeeper are:
# Autogenerated by Cloudera SCM on Wed Dec 28 08:42:23 CST 2011
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper
clientPort=2181
maxClientCnxns=1000
minSessionTimeout=4000
maxSessionTimeout=40000
The hbase-site.xml is:
<property>
<name>hbase.rootdir</name>
<value>hdfs://alnnimb01:8020/hbase</value>
<description>The directory shared by region servers. Should be fully-qualified to include the filesystem to use. E.g: hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR</description>
</property>
<property>
<name>hbase.master.port</name>
<value>60000</value>
<description>The port master should bind to.</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are false: standalone and pseudo-distributed setups with managed Zookeeper true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)</description>
</property>
<property>
<name>hbase.master.info.port</name>
<value>60010</value>
<description>The port for the hbase master web UI Set to -1 if you do not want the info server to run.</description>
</property>
<property>
<name>zookeeper.znode.parent</name>
<value>/hbase</value>
<description>Root ZNode for HBase in ZooKeeper. All of HBase's ZooKeeper files that are configured with a relative path will go under this node. By default, all of HBase's ZooKeeper file path are configured with a relative path, so they will all go under this directory unless changed.</description>
</property>
<property>
<name>zookeeper.znode.rootserver</name>
<value>root-region-server</value>
<description>Path to ZNode holding root region location. This is written by the master and read by clients and region servers. If a relative path is given, the parent folder will be ${zookeeper.znode.parent}. By default, this means the root location is stored at /hbase/root-region-server.</description>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
<description>The ZooKeeper client port to which HBase clients will connect</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>alnnimb01.aln.experian.com</value>
<description>Comma separated list of servers in the ZooKeeper Quorum. For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".</description>
</property>
It turns out that the MapReduce job submitted by Hive tries to connect to ZooKeeper at 'localhost', regardless of how the ZooKeeper quorum is set up in the config file. I changed /etc/hosts so that the alias 'localhost' points to the IP of my master node, and the connection to ZooKeeper is now maintained. I'm still looking for a better resolution, but this will work for now.
I figured it out. It was a configuration issue (as I suspected all along). The solution was to:
-set ‘hbase.zookeeper.quorum’ within the ‘hive-site.xml’ and place it in the ‘hadoop-conf’ directory
What threw me off was that there is no 'hbase.zookeeper.quorum' in hive-default.xml. I had been playing with 'hive.zookeeper.quorum' which was not the correct configuration to change.
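The fix above amounts to adding one property to hive-site.xml. If you manage that file with scripts, a minimal Python sketch using the standard library's xml.etree.ElementTree could add or update the property; `set_property` is an illustrative helper, and note that ElementTree drops XML comments when it re-serializes the document.

```python
import xml.etree.ElementTree as ET


def set_property(site_xml: str, name: str, value: str) -> str:
    """Add or update a <property> in a Hadoop-style *-site.xml document."""
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing property with that name: append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


print(set_property("<configuration></configuration>",
                   "hbase.zookeeper.quorum", "alnnimb01.aln.experian.com"))
```

Updating in place rather than always appending avoids ending up with two hbase.zookeeper.quorum entries, where it is easy to lose track of which value actually applies.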
I'm sorry for posting a new answer. I wanted to comment on the previous answer but the commenting UI seems to have disappeared >.< ...
Anyway, I wanted to say that I am experiencing the same problem, and it is solved by doing the /etc/hosts hack, but that seems like a very dirty solution...
Has anyone figured out a way of fixing this cleanly?
Thanks :) !
I ran into exactly the same problem. I started the Hive CLI with the following configuration, and it works fine:
hive --hiveconf hbase.zookeeper.quorum={zk-host}
You should configure HBase to use the external ZooKeeper, and replace {zk-host} with the ZooKeeper host.
I'm still looking for how to resolve this when using jdbc to access hive.

HBase binding to an incorrect address

I'm attempting to run HBase in pseudo-distributed mode. I have followed all of the steps in the tutorial.
My hbase-site.xml looks like this:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
My regionservers looks like this (default):
localhost
In the logs, Zookeeper starts OK, MiniZK starts OK, then I get a BindException with this being the culprit:
Caused by: java.net.BindException: Problem binding to /192.168.0.1:0 : Cannot assign requested address
Where in the world did it get the address 192.168.0.1? And why is it trying to bind to port 0? That IP is my NAT gateway. The IP address of the machine it's on is 192.168.0.200.
I have looked in all of the config files but don't see anywhere that I would specify that address.
** UPDATE **
It looks like the problem was that HBase looked up my IP address from my hostname, and because I'm using my router as my DNS server, the hostname resolved to... my router.
When I add my hostname as an alias for 127.0.0.1 in the /etc/hosts file, it resolves just fine.
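The failure mode in this update (the hostname resolving to the router at 192.168.0.1 instead of the machine's own 192.168.0.200) can be checked directly. Below is a minimal Python sketch; `hostname_resolves_locally` is a hypothetical helper, and the resolver is injectable so the logic can be tried without touching DNS.

```python
import ipaddress
import socket
from typing import Callable, Iterable


def hostname_resolves_locally(
    hostname: str,
    local_addresses: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """True if `hostname` resolves to a loopback address or one of `local_addresses`."""
    try:
        address = resolve(hostname)
    except OSError:
        return False  # the name does not resolve at all
    return ipaddress.ip_address(address).is_loopback or address in set(local_addresses)


def router_dns(_name: str) -> str:
    """Fake resolver simulating the question: every name maps to the router."""
    return "192.168.0.1"


print(hostname_resolves_locally("myhost", ["192.168.0.200"], resolve=router_dns))  # False
```

On the real machine you would call `hostname_resolves_locally(socket.gethostname(), ["192.168.0.200"])`; a False result means HBase will try to bind to an address that does not belong to the machine, as in the BindException above.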
@arnon-rotem-gal-oz, I just installed whatever came in the HBase tarball. I'm assuming MiniZK is a scaled-down version of ZooKeeper? I'm not running a separate instance of it.
The code you posted did the trick to resolve the next problem that came up.
Check the ZooKeeper configuration file (zoo.cfg in the zookeeper/conf directory).
Also, why do you have both ZooKeeper and MiniZK?
Also (not directly related to your question), you need to tell HBase where to find ZooKeeper, e.g. by adding the following to your hbase-site.xml:
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
