Following these two tutorials (tutorial 1 and tutorial 2), I was able to set up an HBase cluster in fully distributed mode. Initially, the cluster seemed to work fine.
The jps output on the HMaster/NameNode:
The jps output on the DataNodes/RegionServers:
Nevertheless, whenever I try to execute hbase shell, it seems that the HBase processes are interrupted by a ZooKeeper error. The error is pasted below:
2021-03-13 11:52:26,047 ERROR [main] zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
2021-03-13 11:52:26,048 WARN  [main] zookeeper.ZKUtil: hconnection-0x4375b0130x0, quorum=137.43.49.59:2181,137.43.49.58:2181,137.43.49.50:2181,137.43.49.49:2181, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:221)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:417)
        at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:6
I made several attempts to solve this issue (including trying different compatible HBase/Hadoop versions), but still no progress.
I would like to have your input on this.
Shared below is the other information required:
In the /etc/hosts file:
(I already tried commenting out the HBase-related hosts in /etc/hosts; it still didn't work.)
In hbase-site.xml:
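For readers hitting the same ConnectionLoss error, here is a minimal sketch of the hbase-site.xml entries that matter in this kind of fully distributed setup; the hostnames, port, and paths below are placeholders, not the asker's actual values:

    <configuration>
      <!-- run in fully distributed mode rather than standalone -->
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <!-- placeholder NameNode address; must match fs.defaultFS in core-site.xml -->
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://master-vm:9000/hbase</value>
      </property>
      <!-- placeholder quorum hosts; every client must be able to reach these on 2181 -->
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>master-vm,slave-vm1,slave-vm2,slave-vm3</value>
      </property>
    </configuration>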
After 5 days of hustle, I learned what went wrong. I'm posting my solution here in the hope that it can help other developers too. I would also like to thank @VV_FS for the comments.
In my scenario, I used virtual machines which I borrowed from an external party, so there were certain firewalls and other security measures in place. If you follow a similar experimental setup, these steps might help you.
To set up the HBase cluster, follow these tutorials.
Set up Hadoop in distributed mode.
Notes when setting up Hadoop in fully distributed mode:
Make sure to open all the ports mentioned in the post. For example, use sudo ufw allow 9000 to open port 9000. Repeat the command for every port needed to run Hadoop (see the sketch below).
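As a sketch, the ufw commands look like this; the exact list depends on your Hadoop version and configuration, so treat the ports below (typical Hadoop 2.x defaults) as assumptions to verify against the tutorial:

    # NameNode IPC port used in fs.defaultFS (9000 in this setup)
    sudo ufw allow 9000
    # common Hadoop 2.x defaults -- verify against your own config
    sudo ufw allow 50070   # NameNode web UI
    sudo ufw allow 50010   # DataNode data transfer
    sudo ufw allow 50075   # DataNode web UI
    sudo ufw allow 8088    # YARN ResourceManager web UI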
Set up Zookeeper in distributed mode.
Notes when setting up ZooKeeper in fully distributed mode:
Make sure to open all the ports mentioned in the post. For example, use sudo ufw allow 3888 to open port 3888. Repeat the command for every port needed to run ZooKeeper (typically 2181 for clients, 2888 and 3888 for peers).
DO NOT START THE ZOOKEEPER NODES AFTER INSTALLATION. ZOOKEEPER WILL BE MANAGED BY HBASE INTERNALLY, SO DON'T START ZOOKEEPER AT THIS STAGE.
Set up HBase in distributed mode.
When setting values in hbase-site.xml, use port 60000 for the hbase.master property, not 60010 (thanks @VV_FS for pointing this out in the earlier discussion). A sketch of this entry follows these notes.
Make sure to open all the ports mentioned in the post. For example, use sudo ufw allow 60000 to open port 60000. Repeat the command for every port needed to run HBase.
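The hbase-site.xml entry in question, sketched with a placeholder hostname:

    <!-- 60000 is the master RPC port; 60010 is only the master web UI -->
    <property>
      <name>hbase.master</name>
      <value>master-vm:60000</value>
    </property>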
[Important thoughts]: If you encounter errors, always refer to the HBase logs. In my case, hbase-master-xxxxx.log and zookeeper-master-xxx.log helped me track down the exact errors.
Related
I have a 5-node Hadoop cluster running HDP 2.3.0. I set up an H2O cluster on YARN as described here.
On running the following command:
hadoop jar h2odriver_hdp2.2.jar water.hadoop.h2odriver -libjars ../h2o.jar -mapperXmx 512m -nodes 3 -output /user/hdfs/H2OTestClusterOutput
I get the following output:
H2O cluster (3 nodes) is up
(Note: Use the -disown option to exit the driver after cluster formation)
(Press Ctrl-C to kill the cluster)
Blocking until the H2O cluster shuts down...
When I try to execute the command
h2o.init(ip="10.113.57.98", port=54321)
The process remains stuck at this stage. On trying to connect to the web UI at ip:54321, the browser endlessly tries to load the H2O admin page, but nothing ever displays.
On forcefully terminating the init process, I get the following error:
No instance found at ip and port: 10.113.57.98:54321. Trying to start local jar...
However, if I use H2O from Python without setting up an H2O cluster, everything runs fine.
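For context, this is roughly what the two cases look like from Python; h2o.init() with no arguments starts a local single-node instance, while the ip/port form (taken from above) attaches to the YARN-launched cluster:

    import h2o

    # works: spins up a local single-node H2O instance
    h2o.init()

    # hangs in my setup: tries to attach to the cluster launched on YARN
    h2o.init(ip="10.113.57.98", port=54321)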
I executed all commands as the root user. The root user has permission to read and write in the /user/hdfs HDFS directory.
I'm not sure whether this is a permissions error or whether the port is simply not accessible.
Any help would be greatly appreciated.
It looks like you are using H2O2 (H2O Classic). I recommend upgrading your H2O to the latest (H2O 3). There is a build specifically for HDP2.3 here: http://www.h2o.ai/download/h2o/hadoop
Running H2O3 is a little cleaner too:
hadoop jar h2odriver.jar -nodes 1 -mapperXmx 6g -output hdfsOutputDirName
Also, 512 MB per node is tiny. What is your use case? I would give the nodes more memory.
We are trying to install Phoenix 4.4.0 on HBase 1.0.0-cdh5.4.4 (a CDH 5.4.4 four-node cluster) via this installation document: Phoenix installation
Based on that, we copied phoenix-server-4.4.0-HBase-1.0.jar into the HBase lib folder on the master and each region server, i.e., into /opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hbase/lib on the master and the three region servers.
After that, we restarted the HBase service via Cloudera Manager.
Everything seems to be OK, but when we try to access the Phoenix shell via the ./sqlline.py localhost command, we get a ZooKeeper error like this:
15/09/09 14:20:51 WARN client.ZooKeeperRegistry: Can't retrieve clusterId from Zookeeper
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
So we are not sure that the installation was done properly. Is any further configuration necessary?
We are not even sure whether we are using the sqlline command properly.
Any help will be appreciated.
After reinstalling the four-node cluster on AWS, Phoenix is now working properly.
It's a pity that we don't know exactly what was happening, but we think that after several changes to our config, we broke something that prevented Phoenix from working.
One thing to take into consideration is that the sqlline command has to be executed against an IP that is in the ZooKeeper quorum, and this is something we were doing wrong: we were trying to run it from the namenode, which wasn't in the ZooKeeper quorum. Once we ran sqlline.py from a datanode, everything worked fine.
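In other words, something like this, where the hostname is a placeholder for a node that actually belongs to the ZooKeeper quorum:

    # point sqlline at a ZooKeeper quorum member, not an arbitrary node
    ./sqlline.py datanode1.example.com:2181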
Btw, the installation guide that we finally followed is Phoenix Installation
Launching a cluster using the spark-ec2 script results in:
Setting up ganglia
RSYNC'ing /etc/ganglia to slaves... <...>
Shutting down GANGLIA gmond: [FAILED]
Starting GANGLIA gmond: [ OK ]
Shutting down GANGLIA gmond: [FAILED]
Starting GANGLIA gmond: [ OK ]
Connection to <...> closed. <...>
Stopping httpd: [FAILED]
Starting httpd: httpd: Syntax error on line 199 of /etc/httpd/conf/httpd.conf: Cannot load modules/libphp-5.5.so into server: /etc/httpd/modules/libphp-5.5.so: cannot open shared object file: No such file or directory [FAILED]
[timing] ganglia setup: 00h 00m 03s
Connection to <...> closed.
Spark standalone cluster started at <...>:8080
Ganglia started at <...>:5080/ganglia
Done!
However, when I run netstat, nothing is listening on port 5080.
Is this related to the httpd error above, or is it something else?
EDIT:
So the issue has been found (see the answer below), and the fix can be applied locally on the instance, after which Ganglia works fine. However, the question is how to fix this issue at the root, so that the spark-ec2 script can start Ganglia normally without intervention.
The fact that Ganglia is not available is related to these errors: Ganglia is a PHP application, and it won't run without the PHP module for Apache.
Which version of Spark are you using to start the cluster?
It is a weird error; this file should be present in the AMI image.
Just traced the error: /etc/httpd/conf/httpd.conf is trying to load the libphp-5.5 library, while modules/ contains the libphp-5.6 version...
Changing httpd.conf fixes the issue; however, it would be good to know a permanent fix within the spark-ec2 script.
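The local workaround is a one-liner along these lines (paths taken from the error above; run on the master instance):

    # make httpd load the PHP module version that actually exists in modules/
    sudo sed -i 's/libphp-5\.5\.so/libphp-5.6.so/g' /etc/httpd/conf/httpd.conf
    sudo service httpd restart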
This is because httpd fails to launch. As you have noted, httpd.conf is trying to load modules and failing. You can reproduce the problem via apachectl start and examine exactly which modules fail to load.
In my case there was one involving "auth" and "core". The last four (maybe five) listed will also fail to load. I did not encounter anything related to PHP, so maybe our cases are different. Anyway, the hacky solution is to comment out the problem lines; I did so and am running Ganglia without issue. A sketch follows below.
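A sketch of that hacky approach; auth_digest_module is a placeholder for whichever LoadModule lines apachectl reports as failing on your instance:

    # surface the failing LoadModule lines
    sudo apachectl start
    # comment out an offending module (placeholder name), then retry
    sudo sed -i 's/^LoadModule auth_digest_module/#&/' /etc/httpd/conf/httpd.conf
    sudo apachectl start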
I have installed Hadoop 2.2.0 and HBase 0.94.18 on Ubuntu 12.04. When I try to run the command
create 't1','c1'
in the hbase shell, I get the following error:
ERROR client.HConnectionManager$HConnectionImplementation:
Check the value configured in 'zookeeper.znode.parent'.
There could be a mismatch with the one configured in the master.
What's wrong?
A few things, in no particular order:
To start with, let the error display continue. It will try 7 times and then exit; before it exits, it will show the name of the exception that occurred. Try to look it up. It probably says MasterNotRunningException.
Verify that the master is indeed running with sudo jps. You should see an entry for HMaster; if not, start the hbase-master service.
Assuming you're going for pseudo-distributed mode, you may also want to check your /etc/hosts to make sure the entries point to 127.0.0.1 and not 127.0.1.1 (a sketch follows after this list).
For Cloudera installs, here is a guide on how to set up HBase in pseudo-distributed mode. It also includes instructions to install hbase-master and zookeeper correctly.
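The /etc/hosts check from the list above, sketched with a placeholder hostname:

    127.0.0.1   localhost
    127.0.0.1   myhost    # must be 127.0.0.1, not Ubuntu's default 127.0.1.1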
Maybe you should check whether zookeeper.znode.parent in hbase-site.xml is right; its default value is /hbase.
Mine was set by default to /hbase-unsecure (in hbase-site.xml).
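For reference, the property looks like this in hbase-site.xml; /hbase is the stock default, while HDP-style distributions commonly use /hbase-unsecure:

    <property>
      <name>zookeeper.znode.parent</name>
      <value>/hbase</value>
    </property>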
I am trying to set up a multi-node HBase cluster. When I do jps on the slave, I get:
5780 Jps
5558 HQuorumPeer
5684 HRegionServer
1963 DataNode
2093 TaskTracker
Similarly, on the master I get:
4254 SecondaryNameNode
15226 Jps
14982 HMaster
3907 NameNode
14921 HQuorumPeer
4340 JobTracker
Everything is running properly, but when I try to create a table in the hbase shell, it gives an error:
ERROR: org.apache.hadoop.hbase.PleaseHoldException: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
RegionServer log from my slave (where the region server is running):
2013-06-11 13:09:53,119 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at localhost,60000,137093$
2013-06-11 13:10:53,190 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: localhost/127.0.0.1:60000
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425)
at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
at $Proxy8.getProtocolVersion(Unknown Source)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138)
at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:2037)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2083)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:744)
at java.lang.Thread.run(Thread.java:722)
2013-06-11 13:10:53,391 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at localhost,60000,137093$
FYI, I have also taken care of the /etc/hosts file on both master and slave:
127.0.0.1 localhost
127.0.0.1 naresh-PC
I made another change in the /etc/hosts file, mapping 127.0.1.1 to naresh-PC, but I am still getting this error:
2013-06-11 14:51:17,781 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at naresh-pc,60000,137094$
2013-06-11 14:52:17,817 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
java.net.UnknownHostException: unknown host: naresh-pc
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.<init>(HBaseClient.java:276)
at org.apache.hadoop.hbase.ipc.HBaseClient.createConnection(HBaseClient.java:255)
at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1111)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
at $Proxy8.getProtocolVersion(Unknown Source)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138)
at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:2037)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2083)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:744)
at java.lang.Thread.run(Thread.java:722)
Try clearing all the state in ZooKeeper:
Stop ZooKeeper
Wipe the ZooKeeper data directory
Start ZooKeeper
I was getting the same issue, followed this approach, and it worked fine. A sketch of these steps follows below.
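A hedged sketch of those three steps; the data directory is whatever dataDir points to in your zoo.cfg (/var/lib/zookeeper is just a common default), and the zkServer.sh commands assume a standalone ZooKeeper rather than an HBase-managed HQuorumPeer:

    # stop ZooKeeper, wipe its state, start it again
    zkServer.sh stop
    rm -rf /var/lib/zookeeper/*   # use the dataDir from your zoo.cfg
    zkServer.sh start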
You need to change the configuration on the slave node so that it points at the master. It is currently pointing at localhost and not connecting to the actual master:
"org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This
server is in the failed servers list: localhost/127.0.0.1:60000 at "
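A sketch of the kind of hbase-site.xml change this implies on the slave; the hostname is a placeholder, and the key point is to name the real master host instead of localhost:

    <property>
      <name>hbase.zookeeper.quorum</name>
      <!-- the actual master hostname, not localhost -->
      <value>naresh-PC</value>
    </property>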
I'm hosting my own cluster inside Docker. Here's what worked in my case. I grepped the HBase log file for errors and found "Master passed us a different hostname to use":
[root@docker-iop bin]# grep ERROR /var/log/hbase/hbase-hbase-regionserver-bi-mgmt01.local.log
2016-10-06 00:05:29,816 ERROR [regionserver/bi-mgmt01.local/111.11.2.3:16020] regionserver.HRegionServer: Master passed us a different hostname to use; was=my-host-name, but now=111.22.33.444
I mapped my-host-name to 111.22.33.444 in my hosts file, restarted HBase, and it worked.
I also had the same issue with a fully distributed HBase cluster, with the configuration below:
Master node (Node-A)
Backup masters ($HBASE_HOME/conf/backup-masters): Node-B & Node-C
3 RegionServers (Node-A, Node-B & Node-C)
RCA:
The backup-master nodes were being started when the cluster started.
Solution
I removed the backup masters by emptying $HBASE_HOME/conf/backup-masters on all HBase nodes (a one-liner follows below).
So I had a cluster running without backup masters.
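That step is a one-liner run on each node, assuming $HBASE_HOME is set:

    # empty (but keep) the backup-masters file so no backup masters start
    > $HBASE_HOME/conf/backup-masters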
I wonder whether the master node and backup-master nodes must not also function as regionservers? The HBase documentation says otherwise, though.
I came across the same issue and could not find anything; it turned out I was copy-pasting from the HBase documentation (https://hbase.apache.org/book.html#shell_exercises). I believe some character in there may be causing the error, so try entering the command manually:
create 'test', 'cf'
We resolved this issue. The solution is to:
stop HBase
log into zookeeper-client as root
execute the command rmr /hbase-unsecure/meta-region-server
start HBase
We stop/start HBase through the Ambari UI and delete /hbase-unsecure/meta-region-server through a bash shell on the server:
[root@s1 ~]# zookeeper-client
Connecting to localhost:2181
.......
[zk: localhost:2181(CONNECTED) 0] rmr /hbase-unsecure/meta-region-server
I use Docker / docker-compose to set up my distributed HBase; after I make changes, I cannot create a table in the hbase shell.
I removed all the related containers and images (docker rm / docker rmi) and rebuilt them, and it works. Note that simply rebuilding the images doesn't work... A sketch of the reset follows below.
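A hedged sketch of that reset, assuming a docker-compose based setup (these are standard docker-compose flags; adjust service names as needed):

    # tear down containers and named volumes, discarding stale state
    docker-compose down -v
    # rebuild the images from scratch, then start fresh containers
    docker-compose build --no-cache
    docker-compose up -d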