HBase master is immediately getting down - hadoop

My hadoop servers are got down because of disk space issue,then we increased disk space then HDFS,zookeeper,kafka started working but HBase is not working.
It is throwing below exception while restarting Hbase from Ambari.
org.apache.hadoop.hbase.util.FileSystemVersionException: HBase file layout needs to be upgraded. You have version null and I want version 8. Consult http://hbase.apache.org/book.html for further information about upgrading HBase. Is your hbase.rootdir valid? If so, you may need to run 'hbase hbck -fixVersionFile'.
Based on the suggestion I ran the command hbase hbck -fixVersionFile as a hbase user, then I am getting error like this:
2019-12-10 19:04:59,535 INFO [ReadOnlyZKClient-slave01.testiot.cloud:2181,slave02.testiot.cloud:2181,slave03.testiot.cloud:2181#0x619bfe29-SendThread(slave02.testiot.cloud:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /10.1.0.5:39250, server: slave02.testiot.cloud/10.1.0.7:2181
2019-12-10 19:04:59,560 INFO [ReadOnlyZKClient-slave01.testiot.cloud:2181,slave02.testiot.cloud:2181,slave03.testiot.cloud:2181#0x619bfe29-SendThread(slave02.testiot.cloud:2181)] zookeeper.ClientCnxn: Session establishment complete on server slave02.testiot.cloud/10.1.0.7:2181, sessionid = 0x26ef0e604f530e3, negotiated timeout = 60000
2019-12-10 19:05:03,908 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=6, retries=36, started=4163 ms ago, cancelled=false, msg=java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master, details=
2019-12-10 19:05:07,945 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=7, retries=36, started=8200 ms ago, cancelled=false, msg=java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master, details=
2019-12-10 19:05:17,964 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=8, retries=36, started=18219 ms ago, cancelled=false, msg=java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master, details=
2019-12-10 19:05:28,024 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=9, retries=36, started=28279 ms ago, cancelled=false, msg=java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master, details=
I am running a cluster of three nodes. When I checked hbase.root.dir hbase.version file was not there.
HbaseVersion -2.0.2
zookeeper - 3.4.6

I included the hbase version file from another server. Now its working fine.

Related

Zookeeper or Kafka connection Error, Showing Kazoo::VersionNotSupported Error

I am using Kafka and zookeeper, and creating connection between them but the connection is getting dropped again and again when I try to create new Kafka::Consumer
ZOOKEEPER = '127.0.0.1:2181'
CLIENT_ID = '************'
TOPICS = ['*****']
#consumer = Kafka::Consumer.new(CLIENT_ID, TOPICS, zookeeper: ZOOKEEPER, logger: nil)
I also checked the zookeeper and kafka log file and got that my kafka to zookeeper connection is dropped when I try to create new Kafka::Consumer
Kafka Log:
...
[2016-03-04 16:14:47,553] INFO [Group Metadata Manager on Broker 0]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2016-03-04 16:16:11,419] INFO Unable to read additional data from server sessionid 0x1533ff65f850003, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2016-03-04 16:16:11,520] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)
[2016-03-04 16:16:13,128] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2016-03-04 16:16:13,129] WARN Session 0x1533ff65f850003 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
...
Zookeeper Log:
...
2016-04-04 10:30:30,577 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#839] - Client attempting to establish new session at /127.0.0.1:51152
2016-04-04 10:30:30,579 - INFO [SyncThread:0:FileTxnLog#199] - Creating new log file: log.725
2016-04-04 10:30:30,668 - INFO [SyncThread:0:ZooKeeperServer#595] - Established session 0x153df9fc2a70000 with negotiated timeout 6000 for client /127.0.0.1:51152
2016-04-04 10:30:31,714 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor#627] - Got user-level KeeperException when processing sessionid:0x153df9fc2a70000 type:delete cxid:0x26 zxid:0x728 txntype:-1 reqpath:n/a Error Path:/admin/preferred_replica_election Error:KeeperErrorCode = NoNode for /admin/preferred_replica_election
2016-04-04 10:30:31,883 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor#627] - Got user-level KeeperException when processing sessionid:0x153df9fc2a70000 type:create cxid:0x2d zxid:0x729 txntype:-1 reqpath:n/a Error Path:/brokers Error:KeeperErrorCode = NodeExists for /brokers
...
Installed Gems
Using ione 1.2.3
Using json 1.8.3
Using thor 0.19.1
Using zookeeper 1.4.11
Using poseidon 0.0.5
Using bundler 1.11.2
Using cassandra-driver 2.1.5
Using kazoo-ruby 0.4.0
Using kafka-consumer 0.1.2
I am exactly not able to get where is the version problem
Getting error:
~/../kazoo-ruby-0.4.0/lib/kazoo/broker.rb:83:in `from_json': Kazoo::VersionNotSupported
~/../kazoo-ruby-0.4.0/lib/kazoo/cluster.rb:38:in `block (3 levels) in brokers'
Got the solution, I was using Kafka version 0.9.0.1 or 0.8.0 Beta which is creating some version issue. Now, I downloaded and installed Kafka 0.8.2.2 with scala version 2.11 Release which is working for me.

Information Issues while connecting Hbase remotely

This is the output file of running the java program remotely.
> Opening socket connection to server, Will not attempt to authenticate using SASL error, I'm facing this error, when i'm connecting remotely only.
15/10/01 17:00:21 INFO zookeeper.ClientCnxn: Opening socket connection to server quickstart.cloudera/192.168.0.106:2181. Will not attempt to authenticate using SASL (unknown error)
15/10/01 17:00:21 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.0.105:63654, server: quickstart.cloudera/192.168.0.106:2181
15/10/01 17:00:21 INFO zookeeper.ClientCnxn: Session establishment complete on server quickstart.cloudera/192.168.0.106:2181, sessionid = 0x150220e67060034, negotiated timeout = 60000
15/10/01 17:00:21 WARN util.DynamicClassLoader: Failed to identify the fs of dir hdfs://quickstart.cloudera:8020/hbase/lib, ignored
java.io.IOException: No FileSystem for scheme: hdfs
This is the output file of running the java program locally.
15/10/01 13:22:36 INFO zookeeper.ClientCnxn: Opening socket connection to server /127.0.0.1:2181
15/10/01 13:22:36 INFO client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
15/10/01 13:22:36 INFO zookeeper.ClientCnxn: Socket connection established to quickstart.cloudera/127.0.0.1:2181, initiating session
15/10/01 13:22:36 INFO zookeeper.ClientCnxn: Session establishment complete on server quickstart.cloudera/127.0.0.1:2181, sessionid = 0x150220e67060028, negotiated timeout = 60000
15/10/01 13:22:36 WARN util.DynamicClassLoader: Failed to identify the fs of dir hdfs://quickstart.cloudera:8020/hbase/lib, ignored
java.io.IOException: No FileSystem for scheme: hdfs
My questions are:
Am I not able to create a new table in hbase remotely?
Is there any issue from output file related to my question?
Yes, you are able to create new tables and perform any other operation with
HBase remotely.
Looks like your error means that you don't add Hadoop or HBase libraries
properly to your project. As a result your program couldn't find classes of HDFS file system.
Do you add jars to your project directly or use Maven/Gradle? What dependencies your project have in addition to HBase?

Squirrel Client Connecting to Phoenix - Timeout Exception

I am trying to connect to Phoenix via Squirrel client. I am receiving the following logs in the Squirrel logs. The logs suggests that the ClientConnection to zooperkeeper is established however it fails when a SQLClient Connection is being established with a Timeout exception.
I have copied the phoenix client jar into the lib directory of Squirrel and the driver is registered succesfully. Also when I run the SQLLine.py utility in the localhost it loads the SQL commandline to Phoenix succesfully and I can run the commands. Added the phoenix core jars to the $HBASE_HOME/lib folders as well.
2015-06-15 12:48:53,766 [pool-7-thread-1] INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - Process identifier=hconnection-0x776a1002 connecting to ZooKeeper ensemble=10.58.126.245:2181
2015-06-15 12:48:53,766 [pool-7-thread-1] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=10.58.126.245:2181 sessionTimeout=90000 watcher=hconnection-0x776a10020x0, quorum=10.58.126.245:2181, baseZNode=/hbase
2015-06-15 12:48:58,287 [pool-7-thread-1-SendThread(10.58.126.245:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.58.126.245/10.58.126.245:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-15 12:48:58,301 [pool-7-thread-1-SendThread(10.58.126.245:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to 10.58.126.245/10.58.126.245:2181, initiating session
2015-06-15 12:48:58,314 [pool-7-thread-1-SendThread(10.58.126.245:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server 10.58.126.245/10.58.126.245:2181, sessionid = 0x14df5b87b120040, negotiated timeout = 90000
2015-06-15 12:49:58,100 [pool-7-thread-1] INFO org.apache.hadoop.hbase.client.RpcRetryingCaller - Call exception, tries=10, retries=35, started=59774 ms ago, cancelled=false, msg=
2015-06-15 12:50:20,456 [pool-7-thread-1] INFO org.apache.hadoop.hbase.client.RpcRetryingCaller - Call exception, tries=11, retries=35, started=82130 ms ago, cancelled=false, msg=
2015-06-15 12:50:36,114 [AWT-EventQueue-1] ERROR net.sourceforge.squirrel_sql.client.gui.db.ConnectToAliasCallBack - Unexpected Error occurred attempting to open an SQL connection.
java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:201)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand.awaitConnection(OpenConnectionCommand.java:132)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand.access$100(OpenConnectionCommand.java:45)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand$2.run(OpenConnectionCommand.java:115)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
I have the same problem, don't find the solution yet, but I managed to use the "thin" client instead.
Start queryserver https://phoenix.apache.org/server.html should listen on port 8765
Copy JAR phoenix-4.6.0-HBase-1.1-thin-client to Squirel lib folder
Create new driver, the class name is "org.apache.phoenix.queryserver.client.Driver"
Connect with this driver (my URI: jdbc:phoenix:thin:url=http://docker:8765)

Hbase managed zookeeper suddenly trying to connect to localhost instead of zookeeper quorum

I was running some tests with table mappers and reducers on large scale problems. After a certain point my reducers started failing when the job was 80% done. From what I can tell when looking at the syslogs the problem is that one of my zookeepers is attempting to connect to the localhost as opposed to the other zookeepers in the quorum
Oddly it seems to do just fine connecting to the other nodes when mapping is going on, its reducing that it has a problem with. Here are selected portions of the syslog which might be relevant to figuring out whats going on
2014-06-27 09:44:01,599 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hdev02:5181,hdev01:5181,hdev03:5181 sessionTimeout=10000 watcher=hconnection-0x4aee260b, quorum=hdev02:5181,hdev01:5181,hdev03:5181, baseZNode=/hbase
2014-06-27 09:44:01,612 INFO [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x4aee260b connecting to ZooKeeper ensemble=hdev02:5181,hdev01:5181,hdev03:5181
2014-06-27 09:44:01,614 INFO [main-SendThread(hdev02:5181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server hdev02/172.17.43.36:5181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2014-06-27 09:44:01,615 INFO [main-SendThread(hdev02:5181)] org.apache.zookeeper.ClientCnxn: Socket connection established to hdev02/172.17.43.36:5181, initiating session
2014-06-27 09:44:01,617 INFO [main-SendThread(hdev02:5181)] org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2014-06-27 09:44:01,723 WARN [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=hdev02:5181,hdev01:5181,hdev03:5181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-06-27 09:44:01,723 INFO [main] org.apache.hadoop.hbase.util.RetryCounter: Sleeping
***
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 1 on-disk map-outputs
2014-06-27 09:55:12,012 INFO [main] org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
2014-06-27 09:55:12,013 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 33206049 bytes
2014-06-27 09:55:12,208 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merged 1 segments, 33206079 bytes to disk to satisfy reduce memory limit
2014-06-27 09:55:12,209 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 2 files, 265119413 bytes from disk
2014-06-27 09:55:12,209 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2014-06-27 09:55:12,210 INFO [main] org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
2014-06-27 09:55:12,212 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 265119345 bytes
2014-06-27 09:55:12,279 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x65afdbbb, quorum=localhost:2181, baseZNode=/hbase
2014-06-27 09:55:12,281 INFO [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x65afdbbb connecting to ZooKeeper ensemble=localhost:2181
2014-06-27 09:55:12,282 INFO [main-SendThread(localhost.localdomain:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost.localdomain/127.0.0.1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2014-06-27 09:55:12,283 WARN [main-SendThread(localhost.localdomain:2181)] org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2014-06-27 09:55:12,384 WARN [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-06-27 09:55:12,384 INFO [main] org.apache.hadoop.hbase.util.RetryCounter: Sleeping 1000ms before retry #0...
2014-06-27 09:55:13,385 INFO [main-SendThread(localhost.localdomain:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost.localdomain/127.0.0.1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2014-06-27 09:55:13,385 WARN [main-SendThread(localhost.localdomain:2181)] org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing
***
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-06-27 09:55:13,486 ERROR [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 1 attempts
2014-06-27 09:55:13,486 WARN [main] org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection-0x65afdbbb, quorum=localhost:2181, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
I'm pretty sure its configured correctly, here is the relevant portion of my hbase-site.xml.
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>5181</value>
<description>Property from ZooKeeper's config zoo.cfg.
The port at which the clients will connect.
</description>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>10000</value>
<description></description>
</property>
<property>
<name>hbase.client.retries.number</name>
<value>10</value>
<description></description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hdev01,hdev02,hdev03</value>
<description></description>
</property>
So far as I can tell hdev03 is the only server that has any problem with this. Netstating all relevant ports doesn't show me anything strange.
I've had same problem when running HBase through Spark on Yarn. Everything was fine until suddenly it started to trying to connect to localhost instead of quorum. Setting port and quorum programmatically before HBase call fixed the issue
conf.set("hbase.zookeeper.quorum","my.server")
conf.set("hbase.zookeeper.property.clientPort","5181")
I'm using MapR, and it has "unusual" (5181) zookeeper port
Hard to say what is happening with the information, given. I have found the Hadoop Stack (HBase especially) to be quite hostile to even the slightest bit of misconfiguration in DNS or the hosts file.
As the quorum in your hbase-site.xml looks good, I'd start checking with network/hostname resolution related configurations:
Has the nodename slipped into the localhost entry in /etc/hosts on hdev03?
Is there an entry for the host itself in hdev03s /etc/hosts (there should)?
Has Reverse DNS been correctly configured in case you are using DNS for name resolution instead of the hosts file?
These are just a few pointers in the direction I'd look with this kind of issue. Hope it helps!
Add '--driver-class-path ~/hbase-1.1.2/conf' into spark-submit command, so that the task can find the configured zookeeper servers instead of 127.0.0.1.

FATAL master.HMaster: Unexpected state : .. Cannot transit it to OFFLINE

I've got a serious Hbase crash problem. I'm using HBase 0.94.7 with one master and two region servers. The HBase master crashed regularly, I can't even get it restarted. I've got the master logs as following:
DEBUG master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=master,60020,1374506461230, region=46c2333f401964bf877254be19c2cc8c
DEBUG handler.ClosedRegionHandler: Handling CLOSED event for 6423df864603aa6e8c45c726ab3ae62f
DEBUG master.AssignmentManager: Forcing OFFLINE; was=LogDetail,\x00\x00\x01\xE8\x00\x00\x01?\xF8\xB3\x8F\x17\xCE\xE2g\x84,1374498065657.6423df864603aa6e8c45c726ab3ae62f. state=CLOSED, ts=1374508769672, server=slave,60020,1374506460892
DEBUG zookeeper.ZKAssign: master:60000-0x14006f52f3f000e Creating (or updating) unassigned node for 6423df864603aa6e8c45c726ab3ae62f with OFFLINE state
FATAL master.HMaster: Unexpected state : LogDetail,\x00\x00\x01\xE8\x00\x00\x01?\xF6\xC17p&c\x8F\x14,1374498085655.c2f4143750eb1559a1dd92e937ea712d. state=PENDING_OPEN, ts=1374508769697, server=master,60020,1374506461230 .. Cannot transit it to OFFLINE.
java.lang.IllegalStateException: Unexpected state : LogDetail,\x00\x00\x01\xE8\x00\x00\x01?\xF6\xC17p&c\x8F\x14,1374498085655.c2f4143750eb1559a1dd92e937ea712d. state=PENDING_OPEN, ts=1374508769697, server=master,60020,1374506461230 .. Cannot transit it to OFFLINE.
at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879)
at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688)
at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
INFO master.HMaster: Aborting
DEBUG handler.ClosedRegionHandler: Handling CLOSED event for 0710b486dcb3d51465695b51db376255
....
DEBUG master.AssignmentManager: The znode of region LogDetail,\x00\x00\x01\xE8\x00\x00\x01?\xF6\xC17p&c\x8F\x14,1374498085655.c2f4143750eb1559a1dd92e937ea712d. has been deleted.
INFO master.AssignmentManager: The master has opened the region LogDetail,\x00\x00\x01\xE8\x00\x00\x01?\xF6\xC17p&c\x8F\x14,1374498085655.c2f4143750eb1559a1dd92e937ea712d. that was online on master,60020,1374506461230
DEBUG master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=master,60000,1374508461536, region=c9cfdd360c09b292412ba5ad88815e6f
DEBUG catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker#5c061cd2
INFO client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x14006f52f3f000f
INFO zookeeper.ZooKeeper: Session: 0x14006f52f3f000f closed
INFO zookeeper.ClientCnxn: EventThread shut down
INFO master.AssignmentManager$TimerUpdater: master,60000,1374508461536.timerUpdater exiting
INFO master.SplitLogManager$TimeoutMonitor: master,60000,1374508461536.splitLogManagerTimeoutMonitor exiting
INFO master.AssignmentManager$TimeoutMonitor: master,60000,1374508461536.timeoutMonitor exiting
INFO zookeeper.ZooKeeper: Session: 0x14006f52f3f000e closed
INFO zookeeper.ClientCnxn: EventThread shut down
INFO master.HMaster: HMaster main thread exiting
ERROR master.HMasterCommandLine: Failed to start master
I also found something unusual in the ZK log:
INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /master:37856
INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /master:37856
INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x140100dda0300e1 with negotiated timeout 180000 for client /master:37856
WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x140100dda0300e1, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:662)
INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /master:37856 which had sessionid 0x140100dda0300e1
Can anybody help to see what the problem is? Is it related to the unassigned region or something like this? I've tried the bin/hbase hbck -repair and bin/hbase hbck -fix, but it doesn't help.
Thanks
After checked the log of my region server very carefully, I got the answer.
Cause
It turns out that there is one library called 'SNAPPY' for the compression of the hbase table is not well installed on the region server. And all my tables are created using this compression algorithm. When the master tries to balance the region to the region server, it failed. Eventually the master aborted.
Solution
Install and configure the SNAPPY on EVERY NODE as following:
apt-get install libsnappy1
su hbase
mkdir /home/hbase/hbase-0.94.7/lib/native/Linux-amd64-64
ln -s /usr/lib/libsnappy.so.1.1.2 /home/hbase/hbase-0.94.7/lib/native/Linux-amd64-64/libsnappy.so
exit (-> root)
ln -s /usr/lib/libsnappy.so.1.1.2 /usr/lib64/libsnappy.so.1.1.2
ln -s /usr/lib/libsnappy.so.1.1.2 /usr/lib64/libsnappy.so.1
ln -s /usr/lib/libsnappy.so.1.1.2 /usr/lib64/libsnappy.so
ln -s /usr/lib/libsnappy.so.1 /usr/lib/libsnappy.so
Now everything is OK! The regions are well balanced over region servers.
Check the region server log, if it is caused by LZO compressor missing and you are using Cloudera Hadoop,you can install lzo easily according to the following instruction:
http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/v1/v1-0-1/Installing-and-Using-Impala/ciiu_lzo.html

Resources