ZooKeeper exists failed after 3 retries - hadoop

I am running Hadoop-1.2.1 and HBase-0.94.11 in pseudo-distributed mode.
Due to a power failure, the Hadoop and HBase setup went down. When I restarted my machine and brought the pseudo-distributed setup back up, HBase stopped working with the following errors in the HBase shell:
13/11/27 13:53:27 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
13/11/27 13:53:27 WARN zookeeper.ZKUtil: hconnection Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:172)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:450)
at org.apache.hadoop.hbase.zookeeper.ClusterId.readClusterIdZNode(ClusterId.java:61)
at org.apache.hadoop.hbase.zookeeper.ClusterId.getId(ClusterId.java:50)
at org.apache.hadoop.hbase.zookeeper.ClusterId.hasId(ClusterId.java:44)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.ensureZookeeperTrackers(HConnectionManager.java:720)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:789)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:129)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
These are the running processes:
hduser@user-ubuntu:~$ jps
16914 NameNode
19955 Jps
29460 Main
17728 TaskTracker
19776 HMaster
17490 JobTracker
17392 SecondaryNameNode

Are you sure your ZooKeeper process is running? Your jps listing doesn't show an entry for QuorumPeerMain, and jps may not show all running Java processes - try ps axww | grep QuorumPeerMain.
If your ZooKeeper refuses to start, check its logs for stack-trace clues.
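A quick way to check both from the shell (a sketch; assumes nc is installed and ZooKeeper listens on the default client port 2181):
# Look for a ZooKeeper JVM directly, in case jps misses it
# (HBase-managed ZooKeeper runs as HQuorumPeer, a standalone one as QuorumPeerMain):
ps axww | grep -E 'QuorumPeerMain|HQuorumPeer'
# Probe the client port; a healthy server replies "imok":
echo ruok | nc localhost 2181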

Clearly the ZooKeeper quorum process is not running - if it were, there would be another Java process in the listing:
hduser@user-ubuntu:~$ jps
16914 NameNode
19955 Jps
29460 Main
17728 TaskTracker
19776 HMaster
17490 JobTracker
17392 SecondaryNameNode
xxxxx HQuorumPeer
ZooKeeper is required for an HBase cluster, since it coordinates the cluster.
Possible solutions:
By default HBase manages ZooKeeper itself, i.e. it starts and stops the ZooKeeper quorum (the cluster of ZooKeeper nodes). To verify the setting, look in the file conf/hbase-env.sh (in your HBase directory); there must be a line:
export HBASE_MANAGES_ZK=true
This tells HBase whether it should manage its own instance of ZooKeeper or not. If it is set to false, change it to true.
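A quick way to check and flip the setting from your HBase directory (a sketch; the in-place sed assumes GNU sed, so adjust on macOS):
# Show the current setting:
grep HBASE_MANAGES_ZK conf/hbase-env.sh
# If it prints HBASE_MANAGES_ZK=false, switch it to true in place:
sed -i 's/HBASE_MANAGES_ZK=false/HBASE_MANAGES_ZK=true/' conf/hbase-env.sh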
Also verify the HBase configuration in conf/hbase-site.xml.
The minimal configuration that should work for pseudo-distributed mode is:
<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/<yourusername>/zookeeper</value>
</property>
</configuration>
Now stop HBase, if it's running:
$ ./bin/stop-hbase.sh
make the necessary changes and start it again:
$ ./bin/start-hbase.sh
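After the restart, it may help to verify that both daemons came up and that HBase created its root directory in HDFS (a sketch; the hadoop command uses Hadoop 1.x syntax and assumes you run it from your Hadoop directory):
# The master and the managed quorum peer should both be listed:
jps | grep -E 'HMaster|HQuorumPeer'
# The hbase.rootdir from hbase-site.xml should now exist in HDFS:
bin/hadoop fs -ls /hbase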

Related

hbase master not starting

I am running HBase on Hadoop in standalone mode. I have successfully installed Hadoop, ZooKeeper, and HBase, but the HBase master is not starting. Below is my hbase-site.xml:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/kumar/hdata/zookeeper</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
</configuration>
I have started the Hadoop and ZooKeeper services:
start-all.sh
zkServer.sh start
start-hbase.sh
and these are the processes shown by the jps command:
2133 DataNode
1974 NameNode
2679 NodeManager
2365 SecondaryNameNode
3917 QuorumPeerMain
2527 ResourceManager
3935 Jps
The HBase shell starts successfully, but when I run any command in the shell, such as 'list', I get the error below:
ERROR: KeeperErrorCode = NoNode for /hbase/master
After that I tried to run the master with the command below:
hbase master start
and I got the following error:
2018-10-15 18:51:51,380 ERROR [main] server.ZooKeeperServer: ZKShutdownHandler is not registered, so ZooKeeper server won't take any action on ERROR or SHUTDOWN server state changes
2018-10-15 18:51:51,437 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182] server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:34034
2018-10-15 18:51:51,479 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182] server.ServerCnxn: The list of known four letter word commands is : [{1936881266=srvr, 1937006964=stat, 2003003491=wchc, 1685417328=dump, 1668445044=crst, 1936880500=srst, 1701738089=envi, 1668247142=conf, 2003003507=wchs, 2003003504=wchp, 1668247155=cons, 1835955314=mntr, 1769173615=isro, 1920298859=ruok, 1735683435=gtmk, 1937010027=stmk}]
2018-10-15 18:51:51,479 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182] server.ServerCnxn: The list of enabled four letter word commands is : [[wchs, stat, stmk, conf, ruok, mntr, srvr, envi, srst, isro, dump, gtmk, crst, cons]]
2018-10-15 18:51:51,479 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182] server.NIOServerCnxn: Processing stat command from /127.0.0.1:34034
2018-10-15 18:51:51,485 INFO [Thread-2] server.NIOServerCnxn: Stat command output
2018-10-15 18:51:51,491 INFO [main] zookeeper.MiniZooKeeperCluster: Started MiniZooKeeperCluster and ran successful 'stat' on client port=2182
Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.
2018-10-15 18:51:51,497 ERROR [main] master.HMasterCommandLine: Master exiting
java.io.IOException: Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:217)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2983)
2018-10-15 18:51:51,500 INFO [Thread-2] server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:34034 (no session established for client)
Also, I am not getting any response from the local HBase web URL:
localhost:60010
HBase ships with a built-in ZooKeeper instance intended only for development environments, and by default, when you start HBase with the command start-hbase.sh, it starts that ZooKeeper daemon too. The error occurs because you had already started a standalone ZooKeeper that uses port 2181; when HBase then tried to start its built-in ZooKeeper on port 2181 as well, it failed.
If you want to use the standalone ZooKeeper, first edit the file hbase-env.sh and add the line: export HBASE_MANAGES_ZK=false (you can also search for the HBASE_MANAGES_ZK variable in the file and set it to false). Now when you start HBase, it starts only the HBase daemons and no longer ZooKeeper. Remember to start the ZooKeeper daemon before HBase.
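The resulting startup order, using the same scripts as in the question (a sketch; assumes both scripts are on your PATH):
# In hbase-env.sh, once:
export HBASE_MANAGES_ZK=false
# Then always bring up the standalone ZooKeeper before HBase:
zkServer.sh start
start-hbase.sh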
I solved this by adding the following to hbase-site.xml:
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
In my case I was working with HBase 2.2.0 in standalone mode.

HBase fails to start in single node cluster mode on Mac OSX

I am trying to get a personal HBase development environment set up. I have hdfs and yarn running, but cannot get HBase to start.
I have started up Hadoop 2.7.1 by running start-dfs.sh and start-yarn.sh. I have verified these are running by testing hdfs dfs -mkdir /test and running a sample MR job bundled in the examples, and I have browsed HDFS at port 50070.
I have started zookeeper 3.4.6 on port 2181 and set its dataDir. My zoo.cfg has:
dataDir=/Users/.../tools/hd/zookeeper_data
clientPort=2181
I observe its zookeeper_server.PID file in the dataDir I chose, and when I run jps I see the following:
51074 NodeManager
50743 DataNode
50983 ResourceManager
50856 SecondaryNameNode
57848 QuorumPeerMain
58731 Jps
50653 NameNode
QuorumPeerMain above matches the PID in zookeeper_server.PID, as I would expect. Is this expectation correct? Given what I have done so far, should any more processes be showing here?
I installed hbase-1.1.2 and configured hbase-site.xml. I set hbase.rootdir to hdfs://localhost:8200/hbase; my HDFS is running at localhost:8200. I set hbase.zookeeper.property.dataDir to my ZooKeeper's dataDir, with the expectation that HBase will use this property to find the PID of a running ZooKeeper. Is this expectation correct, or have I misunderstood? The config in hbase-site.xml is:
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>Users/.../tools/hd/zookeeper_data</value>
</property>
When I run start-hbase.sh my server fails to start. I see this log message:
2015-09-26 19:32:43,617 ERROR [main] master.HMasterCommandLine: Master exiting
To investigate I ran hbase master start and got more detail:
2015-09-26 19:41:26,403 INFO [Thread-1] server.NIOServerCnxn: Stat command output
2015-09-26 19:41:26,405 INFO [Thread-1] server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:63334 (no session established for client)
2015-09-26 19:41:26,406 INFO [main] zookeeper.MiniZooKeeperCluster: Started MiniZooKeeperCluster and ran successful 'stat' on client port=2182
Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.
2015-09-26 19:41:26,406 ERROR [main] master.HMasterCommandLine: Master exiting
java.io.IOException: Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:214)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:139)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2304)
So I have a few questions:
Should I be trying to set up a zookeeper before running HBase?
Why when I have started a zookeeper and told HBase where its dataDir is, does HBase try to start its own zookeeper?
Anything obviously stupid/misguided in the above?
The script you are using to start HBase, start-hbase.sh, will try to start the following components, in order:
zookeeper
hbase master
hbase regionserver
hbase master-backup
So you could either stop the ZooKeeper you started yourself, or start the daemons individually:
# start hbase master
bin/hbase-daemon.sh --config ${HBASE_CONF_DIR} start master
# start region server
bin/hbase-daemons.sh --config ${HBASE_CONF_DIR} --hosts ${HBASE_CONF_DIR}/regionservers start regionserver
Standalone HBase starts its own ZooKeeper (if you run start-hbase.sh), but if that fails to start or keep running, the other HBase daemons that need it won't work.
Make sure you explicitly set the properties for your loopback interface lo0 in the hbase-site.xml file:
<property>
<name>hbase.zookeeper.dns.interface</name>
<value>lo0</value>
</property>
<property>
<name>hbase.regionserver.dns.interface</name>
<value>lo0</value>
</property>
<property>
<name>hbase.master.dns.interface</name>
<value>lo0</value>
</property>
I found that when my Wi-Fi was on and these entries were missing, ZooKeeper failed to start.
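If you want to confirm the loopback interface name on your machine before setting these (it is lo0 on macOS, usually lo on Linux), a quick check:
ifconfig lo0   # prints the interface details if it exists, errors otherwise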

hadoop Protocol message tag had invalid wire type

I set up a Hadoop 2.6 cluster using two nodes of 8 cores each on Ubuntu 12.04. sbin/start-dfs.sh and sbin/start-yarn.sh both succeed, and I can see the following after running jps on the master node:
22437 DataNode
22988 ResourceManager
24668 Jps
22748 SecondaryNameNode
23244 NodeManager
The jps output on the slave node is:
19693 DataNode
19966 NodeManager
I then run the PI example.
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 30 100
This gives me the error log below:
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.; Host Details : local host is: "Master-R5-Node/xxx.ww.y.zz"; destination host is: "Master-R5-Node":54310;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
The problem seems to be with the HDFS file system, since the command bin/hdfs dfs -mkdir /user fails with a similar exception:
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.; Host Details : local host is: "Master-R5-Node/xxx.ww.y.zz"; destination host is: "Master-R5-Node":54310;
where xxx.ww.y.zz is the IP address of Master-R5-Node.
I have checked and followed all the recommendations of ConnectionRefused on Apache and on this site.
Despite a week-long effort, I cannot get it fixed.
Thanks.
There are many possible causes for the problem I faced, but I finally fixed it with some of the following changes.
Make sure that you have the needed permissions on /hadoop and the HDFS temporary files (you have to figure out where those are for your particular case).
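A quick way to inspect and fix ownership once you know the directory (a sketch; the path here is only an example - use the hadoop.tmp.dir value from your core-site.xml):
# Check who owns the Hadoop temp dir:
ls -ld /tmp/hadoop-$(whoami)
# If another user owns it, take ownership (adjust path and user to your setup):
sudo chown -R $(whoami): /tmp/hadoop-$(whoami)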
Remove the port number from fs.defaultFS in $HADOOP_CONF_DIR/core-site.xml. It should look like this:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://my.master.ip.address/</value>
<description>NameNode URI</description>
</property>
</configuration>
Add the following two properties to $HADOOP_CONF_DIR/hdfs-site.xml:
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
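After editing, restart HDFS so the new properties take effect (run from the Hadoop install directory, as with the start scripts above):
sbin/stop-dfs.sh
sbin/start-dfs.sh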
Voila! You should now be up and running!

failed on connection exception: java.net.ConnectException: Connection refused in hive

When I am running the following query in hive:
hive> select count(*) from testsql;
I am getting the following error:
Error
FAILED: RuntimeException java.net.ConnectException: Call From impetus-1466/192.168.49.77 to impetus-1466:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
The jps output looks like:
[impadmin@impetus-1466 hadoop-1.0.3.15]$ jps
26380 TaskTracker
26709 Jps
26230 JobTracker
25943 NameNode
I started the daemons with:
$ start-all.sh
$ start-dfs.sh
$ start-mapred.sh
How could this be solved?
Thanks
If you can open http://localhost:8088/cluster but can't open http://localhost:50070/, then maybe the DataNode didn't start up or the NameNode wasn't formatted.
Also check hadoop.tmp.dir in core-site.xml; if it is not set, it defaults to /tmp, so set hadoop.tmp.dir in core-site.xml:
<property>
<name>hadoop.tmp.dir</name>
<value>/path/to/hadoop/tmp</value>
</property>
Then stop Hadoop, reformat with hdfs namenode -format, and restart Hadoop.
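The sequence spelled out (a sketch; assumes the Hadoop scripts are on your PATH - and note that formatting erases HDFS metadata, so existing data is lost):
stop-all.sh            # stop all Hadoop daemons
hdfs namenode -format  # WARNING: wipes HDFS metadata; on Hadoop 1.x use: hadoop namenode -format
start-all.sh           # start everything again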
Similar question: http://localhost:50070 does not work HADOOP
The reason for this is that either there are no DataNodes in your cluster, or the DataNodes do not know their NameNode. This can happen when the NameNode has been formatted more than once: the cluster ID of the NameNode changed, but the change was not propagated to the DataNodes.
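One way to confirm the mismatch (a sketch; the paths are examples - use the name/data directory values from your hdfs-site.xml; Hadoop 1.x records namespaceID, Hadoop 2.x clusterID):
grep -E 'clusterID|namespaceID' /path/to/namenode/current/VERSION
grep -E 'clusterID|namespaceID' /path/to/datanode/current/VERSION
# If the IDs differ, the DataNodes are refusing to register with this NameNode.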
The links below might be helpful:
Datanode not starts correctly
http://hortonworks.com/community/forums/topic/clusterid-mismatch-for-namenode-and-datanodes-in-fully-distributed-cluster/

How to setup titan over hbase in a single node hadoop

I have a single-node Hadoop setup and have also installed HBase on my Ubuntu 12.04. Now I want to install Titan over HBase. I have set up hadoop-1.0.3, hbase-0.94.18, and titan/hbase-0.4.2.
I have added a user mnit. My /usr/local/ folder contains hadoop2, hbase2, and titan2. First I start Hadoop with bin/start-all.sh, then I start HBase with bin/start-hbase.sh. After that, when I run jps, I see the following:
mnit@aman:/usr/local$ jps
9921 DataNode
11386 HRegionServer
11041 HQuorumPeer
11537 Jps
11115 HMaster
10153 SecondaryNameNode
10252 JobTracker
9691 NameNode
10483 TaskTracker
Now I start the Gremlin shell in titan2 using the command bin/gremlin.sh.
I ran the following commands:
mnit@aman:/usr/local/titan2$ bin/gremlin.sh
gremlin> conf = new BaseConfiguration();
==>org.apache.commons.configuration.BaseConfiguration@19288c2
gremlin> conf.setProperty("storage.backend","hbase");
==>null
gremlin> conf.setProperty("storage.hostname","127.0.0.1");
==>null
gremlin> g = TitanFactory.open(conf);
WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
When I searched for this problem, I found references to a file named pom.xml, but the Titan I downloaded does not contain pom.xml. Please tell me whether this is a problem due to pom.xml, whether I am doing something wrong, or whether there is some other issue.
Thanks in advance
ZooKeeper is managed by HBase on my system; I have added the following line in conf/hbase-env.sh:
export HBASE_MANAGES_ZK=true
The content of my hbase-site.xml is as follows:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:54310/user/hbase</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2222</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>hbase.zookeeper.property.datadir</name>
<value>/app/hadoop/tmp/zookeeper</value>
</property>
</configuration>
Your Titan and HBase configurations appear to be inconsistent. Your hbase-site.xml overrides the default ZK port (2181) to 2222, but it seems you haven't told Titan to use this non-default ZK port by setting storage.port in your Titan config file. Naturally, they can't talk to each other in that state. This doesn't have anything to do with pom.xml.
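Applied to the Gremlin session above, that would look something like this (a sketch; storage.port is the property named in this answer, and 2222 matches hbase.zookeeper.property.clientPort from the question's hbase-site.xml):
gremlin> conf = new BaseConfiguration();
gremlin> conf.setProperty("storage.backend","hbase");
gremlin> conf.setProperty("storage.hostname","127.0.0.1");
gremlin> conf.setProperty("storage.port","2222");  // must match the ZK clientPort HBase uses
gremlin> g = TitanFactory.open(conf);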
By the way, please don't simultaneously crosspost to SO and the aureliusgraphs Google Group. They're both good venues with slightly different purposes, but you seem to have just copy-pasted between this SO question and your subjectless thread on the aureliusgraphs list.
