HBase-managed ZooKeeper suddenly trying to connect to localhost instead of the ZooKeeper quorum

I was running some tests with table mappers and reducers on large-scale problems. After a certain point my reducers started failing when the job was 80% done. From what I can tell from the syslogs, the problem is that one of my ZooKeeper clients is attempting to connect to localhost as opposed to the other ZooKeepers in the quorum.
Oddly, it seems to do just fine connecting to the other nodes while mapping is going on; it's the reducing that it has a problem with. Here are selected portions of the syslog which might be relevant to figuring out what's going on:
2014-06-27 09:44:01,599 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hdev02:5181,hdev01:5181,hdev03:5181 sessionTimeout=10000 watcher=hconnection-0x4aee260b, quorum=hdev02:5181,hdev01:5181,hdev03:5181, baseZNode=/hbase
2014-06-27 09:44:01,612 INFO [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x4aee260b connecting to ZooKeeper ensemble=hdev02:5181,hdev01:5181,hdev03:5181
2014-06-27 09:44:01,614 INFO [main-SendThread(hdev02:5181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server hdev02/172.17.43.36:5181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2014-06-27 09:44:01,615 INFO [main-SendThread(hdev02:5181)] org.apache.zookeeper.ClientCnxn: Socket connection established to hdev02/172.17.43.36:5181, initiating session
2014-06-27 09:44:01,617 INFO [main-SendThread(hdev02:5181)] org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2014-06-27 09:44:01,723 WARN [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=hdev02:5181,hdev01:5181,hdev03:5181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-06-27 09:44:01,723 INFO [main] org.apache.hadoop.hbase.util.RetryCounter: Sleeping
***
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 1 on-disk map-outputs
2014-06-27 09:55:12,012 INFO [main] org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
2014-06-27 09:55:12,013 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 33206049 bytes
2014-06-27 09:55:12,208 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merged 1 segments, 33206079 bytes to disk to satisfy reduce memory limit
2014-06-27 09:55:12,209 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 2 files, 265119413 bytes from disk
2014-06-27 09:55:12,209 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2014-06-27 09:55:12,210 INFO [main] org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
2014-06-27 09:55:12,212 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 265119345 bytes
2014-06-27 09:55:12,279 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x65afdbbb, quorum=localhost:2181, baseZNode=/hbase
2014-06-27 09:55:12,281 INFO [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x65afdbbb connecting to ZooKeeper ensemble=localhost:2181
2014-06-27 09:55:12,282 INFO [main-SendThread(localhost.localdomain:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost.localdomain/127.0.0.1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2014-06-27 09:55:12,283 WARN [main-SendThread(localhost.localdomain:2181)] org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2014-06-27 09:55:12,384 WARN [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-06-27 09:55:12,384 INFO [main] org.apache.hadoop.hbase.util.RetryCounter: Sleeping 1000ms before retry #0...
2014-06-27 09:55:13,385 INFO [main-SendThread(localhost.localdomain:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost.localdomain/127.0.0.1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2014-06-27 09:55:13,385 WARN [main-SendThread(localhost.localdomain:2181)] org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing
***
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-06-27 09:55:13,486 ERROR [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 1 attempts
2014-06-27 09:55:13,486 WARN [main] org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection-0x65afdbbb, quorum=localhost:2181, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
I'm pretty sure it's configured correctly. Here is the relevant portion of my hbase-site.xml:
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>5181</value>
  <description>Property from ZooKeeper's config zoo.cfg.
  The port at which the clients will connect.
  </description>
</property>
<property>
  <name>zookeeper.session.timeout</name>
  <value>10000</value>
  <description></description>
</property>
<property>
  <name>hbase.client.retries.number</name>
  <value>10</value>
  <description></description>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hdev01,hdev02,hdev03</value>
  <description></description>
</property>
As far as I can tell, hdev03 is the only server that has any problem with this. Running netstat on all the relevant ports doesn't show me anything strange.

I've had the same problem when running HBase through Spark on YARN. Everything was fine until it suddenly started trying to connect to localhost instead of the quorum. Setting the port and quorum programmatically before the HBase call fixed the issue:
conf.set("hbase.zookeeper.quorum","my.server")
conf.set("hbase.zookeeper.property.clientPort","5181")
I'm using MapR, which has an "unusual" ZooKeeper port (5181).
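For reference, a minimal sketch of that workaround, assuming an HBase 1.x client (the ConnectionFactory API); the quorum, port, and table name below are placeholders to adapt:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class QuorumWorkaround {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Set the quorum and port explicitly so the client does not fall back
        // to the localhost:2181 default when hbase-site.xml is missing from
        // the task's classpath.
        conf.set("hbase.zookeeper.quorum", "my.server");
        conf.set("hbase.zookeeper.property.clientPort", "5181");
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("my_table"))) {
            // ... gets/puts/scans against the table go here ...
        }
    }
}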

Hard to say what is happening with the given information. I have found the Hadoop stack (HBase especially) to be quite hostile to even the slightest bit of misconfiguration in DNS or the hosts file.
As the quorum in your hbase-site.xml looks good, I'd start by checking the network/hostname-resolution configuration:
Has the node name slipped into the localhost entry in /etc/hosts on hdev03? (See the example below.)
Is there an entry for the host itself in hdev03's /etc/hosts (there should be)?
Has reverse DNS been correctly configured, in case you are using DNS for name resolution instead of the hosts file?
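For example, here is the bad pattern from the first question versus what you would want on hdev03 (the 172.17.43.38 address is hypothetical; only the shape matters):

# Problematic: the node's own hostname sits on the loopback line,
# so hdev03 resolves to 127.0.0.1
127.0.0.1 localhost hdev03

# Expected: loopback and the node's real address kept separate
127.0.0.1 localhost
172.17.43.38 hdev03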
These are just a few pointers in the direction I'd look with this kind of issue. Hope it helps!

Add '--driver-class-path ~/hbase-1.1.2/conf' to the spark-submit command, so that the task can find the configured ZooKeeper servers instead of 127.0.0.1.
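For example (the class and jar names here are placeholders, and the conf path assumes the layout mentioned above):

spark-submit \
  --class com.example.MyHBaseJob \
  --master yarn \
  --driver-class-path ~/hbase-1.1.2/conf \
  my-hbase-job.jar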

Related

HBase master goes down immediately

My Hadoop servers went down because of a disk space issue. We then increased the disk space, after which HDFS, ZooKeeper, and Kafka started working again, but HBase is not working.
It throws the exception below while restarting HBase from Ambari:
org.apache.hadoop.hbase.util.FileSystemVersionException: HBase file layout needs to be upgraded. You have version null and I want version 8. Consult http://hbase.apache.org/book.html for further information about upgrading HBase. Is your hbase.rootdir valid? If so, you may need to run 'hbase hbck -fixVersionFile'.
Following that suggestion, I ran the command hbase hbck -fixVersionFile as the hbase user, and then I get errors like this:
2019-12-10 19:04:59,535 INFO [ReadOnlyZKClient-slave01.testiot.cloud:2181,slave02.testiot.cloud:2181,slave03.testiot.cloud:2181#0x619bfe29-SendThread(slave02.testiot.cloud:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /10.1.0.5:39250, server: slave02.testiot.cloud/10.1.0.7:2181
2019-12-10 19:04:59,560 INFO [ReadOnlyZKClient-slave01.testiot.cloud:2181,slave02.testiot.cloud:2181,slave03.testiot.cloud:2181#0x619bfe29-SendThread(slave02.testiot.cloud:2181)] zookeeper.ClientCnxn: Session establishment complete on server slave02.testiot.cloud/10.1.0.7:2181, sessionid = 0x26ef0e604f530e3, negotiated timeout = 60000
2019-12-10 19:05:03,908 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=6, retries=36, started=4163 ms ago, cancelled=false, msg=java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master, details=
2019-12-10 19:05:07,945 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=7, retries=36, started=8200 ms ago, cancelled=false, msg=java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master, details=
2019-12-10 19:05:17,964 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=8, retries=36, started=18219 ms ago, cancelled=false, msg=java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master, details=
2019-12-10 19:05:28,024 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=9, retries=36, started=28279 ms ago, cancelled=false, msg=java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master, details=
I am running a cluster of three nodes. When I checked hbase.rootdir, the hbase.version file was not there.
HBase version: 2.0.2
ZooKeeper version: 3.4.6
I copied the hbase.version file from another server, and now it's working fine.
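For reference, a rough sketch of that copy, assuming the usual Ambari/HDP hbase.rootdir of /apps/hbase/data on both clusters (adjust the path to your hbase.rootdir, and do this with HBase stopped):

hdfs dfs -get /apps/hbase/data/hbase.version /tmp/hbase.version   # on the healthy cluster
hdfs dfs -put /tmp/hbase.version /apps/hbase/data/hbase.version   # on the broken cluster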

Unable to close file because the last block does not have enough number of replicas

I got an error like this: The health test result for HDFS_CANARY_HEALTH has become bad: Canary test failed to write file in directory /tmp/.cloudera_health_monitoring_canary_files.
2019-07-14 00:02:13,312 INFO hive.metastore: Trying to connect to metastore with URI xxxx:abc
2019-07-14 00:02:13,322 INFO hive.metastore: Opened a connection to metastore, current connections: 1
2019-07-14 00:02:13,322 INFO hive.metastore: Connected to metastore.
2019-07-14 00:02:14,255 INFO hive.metastore: Closed a connection to metastore, current connections: 0
2019-07-14 00:02:21,444 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT21.280S, numStreamsChecked=607071, numStreamsRolledUp=199173
2019-07-14 00:03:04,172 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x2659d409 connecting to ZooKeeper ensemble=___
2019-07-14 00:03:04,182 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=ReplicationAdmin connecting to ZooKeeper ensemble=____
2019-07-14 00:03:04,198 INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x16b95f7c324bcf3
2019-07-14 00:03:08,346 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x2fc2fce5 connecting to ZooKeeper ensemble=___
2019-07-14 00:03:08,354 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=ReplicationAdmin connecting to ZooKeeper ensemble=___
2019-07-14 00:03:08,365 INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x26b8ea7e7c6cfde
2019-07-14 00:03:11,897 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/.cloudera_health_monitoring_canary_files/.canary_file_2019_07_14-00_03_03 retrying...
2019-07-14 00:03:18,388 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/.cloudera_health_monitoring_canary_files/.canary_file_2019_07_14-00_03_03 retrying...
2019-07-14 00:03:18,562 ERROR com.cloudera.cmon.firehose.polling.hdfs.HdfsCanary: com.cloudera.cmon.firehose.polling.hdfs.HdfsCanary#5b4e2d83 for hdfs://hdfs-cluster_name Failed to write to /tmp/.cloudera_health_monitoring_canary_files/.canary_file_2019_07_14-00_03_03. Error: java.io.IOException: Unable to close file because the last block BP-972893351-10.14.208.100-1438206436439:blk_4349006744_3275530539 does not have enough number of replicas.
My question is: which file is it referring to in the error "Unable to close file because the last block does not have enough number of replicas"?

HBase distributed mode

I'm trying to run HBase (0.94.11) in distributed mode on a 3-node Hadoop (1.0.4) cluster, but I wish to utilize only two of the nodes for HBase.
Master/Namenode: cldx-1230-1116 (IP: 172.25.38.245)
Regionserver/Slave: cldx-1229-1117 (IP: 172.25.39.7)
HBase starts, but no region server shows up. The logs show the following errors.
Master/namenode log:
2013-09-03 14:52:23,683 DEBUG org.apache.hadoop.hbase.master.HMaster: Started service threads
2013-09-03 14:52:23,684 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2013-09-03 14:52:24,587 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=172.25.39.7:2222 sessionTimeout=180000 watcher=hconnection
2013-09-03 14:52:24,607 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 31222@cldx-1230-1116
2013-09-03 14:52:24,610 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server slave/172.25.39.7:2222. Will not attempt to authenticate using SASL (unknown error)
2013-09-03 14:52:24,615 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to slave/172.25.39.7:2222, initiating session
2013-09-03 14:52:24,631 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server slave/172.25.39.7:2222, sessionid = 0x140e363f8090002, negotiated timeout = 180000
2013-09-03 14:52:25,230 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 1546 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2013-09-03 14:52:26,753 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 3068 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2013-09-03 14:52:28,266 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 4582 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
Regionserver/slave log:
2013-09-03 16:05:18,307 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=172.25.39.7:2222 sessionTimeout=180000 watcher=regionserver:60020
2013-09-03 16:05:18,333 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/172.25.39.7:2222. Will not attempt to authenticate using SASL (unknown error)
2013-09-03 16:05:18,336 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 14384@cldx-1229-1117
2013-09-03 16:05:18,348 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to localhost/172.25.39.7:2222, initiating session
2013-09-03 16:05:18,426 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server localhost/172.25.39.7:2222, sessionid = 0x140e363f8090000, negotiated timeout = 180000
2013-09-03 16:05:18,452 DEBUG org.apache.hadoop.hbase.catalog.CatalogTracker: Starting catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker#3a9cfedf
2013-09-03 16:05:18,517 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/online-snapshot/acquired already exists and this is not a retry
2013-09-03 16:05:18,557 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: globalMemStoreLimit=393.4m, globalMemStoreLimitLowMark=344.2m, maxHeap=983.4m
2013-09-03 16:05:18,561 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Runs every 2hrs, 46mins, 40sec
2013-09-03 16:05:18,621 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at localhost,60000,1378199761324
2013-09-03 16:05:28,697 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:390)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:436)
at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1127)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
at com.sun.proxy.$Proxy8.getProtocolVersion(Unknown Source)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138)
at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:2030)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2076)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:744)
at java.lang.Thread.run(Thread.java:722)
Slave's ZooKeeper log:
2013-09-03 16:05:18,345 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /172.25.39.7:48173
2013-09-03 16:05:18,392 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /172.25.39.7:48173
2013-09-03 16:05:18,395 INFO org.apache.zookeeper.server.persistence.FileTxnLog: Creating new log file: log.5a
2013-09-03 16:05:18,422 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x140e363f8090000 with negotiated timeout 180000 for client /172.25.39.7:48173
2013-09-03 16:05:18,508 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x140e363f8090000 type:create cxid:0x8 zxid:0x5b txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired
2013-09-03 16:05:33,933 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /172.25.38.245:50879
2013-09-03 16:05:33,972 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /172.25.38.245:50879
2013-09-03 16:05:33,975 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x140e363f8090001 with negotiated timeout 180000 for client /172.25.38.245:50879
2013-09-03 16:05:42,358 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x140e363f8090001 type:create cxid:0xb zxid:0x5d txntype:-1 reqpath:n/a Error Path:/hbase/master Error:KeeperErrorCode = NodeExists for /hbase/master
2013-09-03 16:05:47,934 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x140e363f8090001 type:create cxid:0x1f zxid:0x63 txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired
2013-09-03 16:05:49,037 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /172.25.38.245:50889
2013-09-03 16:05:49,042 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /172.25.38.245:50889
2013-09-03 16:05:49,050 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x140e363f8090002 with negotiated timeout 180000 for client /172.25.38.245:50889
2013-09-03 16:08:15,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x140e35e60460000, timeout of 180000ms exceeded
2013-09-03 16:08:15,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x140d02920860000, timeout of 180000ms exceeded
2013-09-03 16:08:15,002 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x140e35e60460001, timeout of 180000ms exceeded
2013-09-03 16:08:15,002 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x140e35e60460000
2013-09-03 16:08:15,002 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x140d02920860000
2013-09-03 16:08:15,002 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x140e35e60460001
The regionservers file has only one entry, viz. 172.25.39.7.
hbase-site.xml:
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://172.25.38.245:9000/hbase</value>
    <description>The directory shared by RegionServers.</description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2222</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>172.25.39.7</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/bigdata/hadoop_ecosystem_dir/zookeeper</value>
  </property>
</configuration>
The Hadoop masters file on the namenode (172.25.38.245) has 172.25.38.245.
The Hadoop slaves file on the namenode (172.25.38.245) has 172.25.38.245, 172.25.39.7 and 172.25.36.73.
The Hadoop masters file on the slave (172.25.39.7) has 172.25.38.245.
The Hadoop slaves file on the slave (172.25.39.7) has 172.25.39.7.
hosts file on master:
#127.0.0.1 localhost
#172.25.38.245 localhost
172.25.38.245 cldx-1230-1116
172.17.88.75 cloudx
172.25.38.245 master
172.25.39.7 slave
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
hosts file on slave:
#127.0.0.1 localhost
#172.25.39.7 localhost
172.25.39.7 cldx-1229-1117 cldx-1229-1117
172.25.38.245 cldx-1230-1116 cldx-1230-1116
172.17.88.75 cloudx
172.25.38.245 master
172.25.39.7 slave
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
I'm clueless as to why the regionserver/slave is trying to connect to the master on localhost rather than 172.25.38.245!
Add the IP and hostname of the HMaster to the /etc/hosts file of the RS and restart the HBase daemons. One possible reason could be that your HMaster is assuming the RS has the IP 127.0.0.1 (which implies localhost) and hence resolves it to its own localhost.
And yes, JD is absolutely correct: hbase.master is an obsolete property now.

"getMaster attempt 1 of 1 failed; no more retrying. com.google.protobuf.ServiceException: java.io.IOException: Broken pipe" when connecting

I'm trying to connect to HBase installed on the local system (using Hortonworks 1.1.1.16) from a small Java program, which executes the following call:
HBaseAdmin.checkHBaseAvailable(conf);
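For context, a minimal sketch of such a test program, matching the TestHBase.main frame in the trace below and assuming the HBase 0.94/0.95-era client API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class TestHBase {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath, if present.
        Configuration conf = HBaseConfiguration.create();
        // Throws an exception (e.g. MasterNotRunningException) if the
        // master cannot be reached.
        HBaseAdmin.checkHBaseAvailable(conf);
        System.out.println("HBase is available");
    }
}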
It is worth saying that there is no problem at all when connecting to HBase from the command line using the hbase command.
The content of the hosts file is the following (where example.com stands in for the actual host name):
127.0.0.1 localhost example.com
HBase is configured to work in standalone mode:
hbase.cluster.distributed=false
When executing the program, the following exception is thrown:
13/05/13 15:18:29 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
13/05/13 15:18:29 INFO zookeeper.ZooKeeper: Client environment:host.name=localhost
13/05/13 15:18:29 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_19
13/05/13 15:18:29 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
13/05/13 15:18:29 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.19.x86_64/jre
13/05/13 15:18:29 INFO zookeeper.ZooKeeper: Client environment:java.class.path=[...]
13/05/13 15:18:29 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
13/05/13 15:18:29 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
13/05/13 15:18:29 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
13/05/13 15:18:29 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
13/05/13 15:18:29 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
13/05/13 15:18:29 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-358.2.1.el6.x86_64
13/05/13 15:18:29 INFO zookeeper.ZooKeeper: Client environment:user.name=root
13/05/13 15:18:29 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
13/05/13 15:18:29 INFO zookeeper.ZooKeeper: Client environment:user.dir=/root/git/project
13/05/13 15:18:29 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=example.com:2181 sessionTimeout=60000 watcher=hconnection-0x678e4593
13/05/13 15:18:29 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is hconnection-0x678e4593
13/05/13 15:18:29 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
13/05/13 15:18:29 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
13/05/13 15:18:29 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x13e9d6851af0046, negotiated timeout = 40000
13/05/13 15:18:29 INFO client.HConnectionManager$HConnectionImplementation: ClusterId is cccadf06-f6bf-492e-8a39-e8beac521ce6
13/05/13 15:18:29 INFO client.HConnectionManager$HConnectionImplementation: getMaster attempt 1 of 1 failed; no more retrying.
com.google.protobuf.ServiceException: java.io.IOException: Broken pipe
at org.apache.hadoop.hbase.ipc.ProtobufRpcClientEngine$Invoker.invoke(ProtobufRpcClientEngine.java:149)
at com.sun.proxy.$Proxy5.isMasterRunning(Unknown Source)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.createMasterInterface(HConnectionManager.java:732)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.createMasterWithRetries(HConnectionManager.java:764)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterProtocol(HConnectionManager.java:1724)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterMonitor(HConnectionManager.java:1757)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isMasterRunning(HConnectionManager.java:837)
at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:2010)
at TestHBase.main(TestHBase.java:37)
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:94)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:450)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.writeConnectionHeader(HBaseClient.java:896)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:847)
at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1414)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1299)
at org.apache.hadoop.hbase.ipc.ProtobufRpcClientEngine$Invoker.invoke(ProtobufRpcClientEngine.java:131)
... 8 more
13/05/13 15:18:29 INFO client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x13e9d6851af0046
13/05/13 15:18:29 INFO zookeeper.ZooKeeper: Session: 0x13e9d6851af0046 closed
13/05/13 15:18:29 INFO zookeeper.ClientCnxn: EventThread shut down
org.apache.hadoop.hbase.exceptions.MasterNotRunningException: com.google.protobuf.ServiceException: java.io.IOException: Broken pipe
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.createMasterWithRetries(HConnectionManager.java:793)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterProtocol(HConnectionManager.java:1724)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterMonitor(HConnectionManager.java:1757)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isMasterRunning(HConnectionManager.java:837)
at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:2010)
at TestHBase.main(TestHBase.java:37)
Caused by: com.google.protobuf.ServiceException: java.io.IOException: Broken pipe
at org.apache.hadoop.hbase.ipc.ProtobufRpcClientEngine$Invoker.invoke(ProtobufRpcClientEngine.java:149)
at com.sun.proxy.$Proxy5.isMasterRunning(Unknown Source)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.createMasterInterface(HConnectionManager.java:732)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.createMasterWithRetries(HConnectionManager.java:764)
... 5 more
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:94)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:450)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.writeConnectionHeader(HBaseClient.java:896)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:847)
at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1414)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1299)
at org.apache.hadoop.hbase.ipc.ProtobufRpcClientEngine$Invoker.invoke(ProtobufRpcClientEngine.java:131)
... 8 more
This trace provides some evidence of what may actually be happening. It seems that the connection to ZooKeeper is established, but something fails when it tries to access the master.
Though I've spent hours trying to find a solution on Google, I haven't seen this exact exception. In particular, it differs in two ways from most of those reported elsewhere:
Everybody else seems to get the error getMaster attempt 0 of 1 failed rather than getMaster attempt 1 of 1 failed. I don't know whether this matters at all, but I find it somewhat odd.
I can't find other people getting the Broken pipe error.
By the way, the master is actually running, as far as I can see in the Hortonworks Management Console.
When looking at the most recent logs, this is the output:
2013-05-13 15:30:07,192 WARN org.apache.hadoop.ipc.HBaseServer: Incorrect header or version mismatch from 127.0.0.1:40788 got version 0 expected version 3
As it is a warning rather than an error, I don't know whether it has something to do with the actual problem. The port varies in each execution.
Finally we found the problem and solved it. It turned out to be a dependency problem. We were using hbase-0.95.0 and hbase-client-0.95.0. Using hbase-0.94.7 or hbase-0.94.9 seemed to work.
Yet some problems occurred under certain circumstances even with those versions of the HBase library. In particular, some problems arose when running it within an application server (JBoss AS7). In the end, all problems seemed to be solved by removing the hbase-client-0.95.0 dependency and replacing it with hadoop-core-1.1.2, as some classes not contained in the HBase libraries were required.
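For reference, a sketch of the dependency change, assuming a Maven build (the versions are the ones mentioned above; remove the hbase-client-0.95.0 dependency first):

<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase</artifactId>
  <version>0.94.9</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>1.1.2</version>
</dependency>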
I'd recommend first checking that your HBase Master / RegionServer ports are really bound, using netstat -n -a. I once had a situation where the HBase Master IPC was bound only to the external IP (this was Cloudera CDH) and was not reachable through 127.0.0.1. This looks like the most probable case for you; the hbase shell would still work in that scenario.
Another possible reason could be a previous cluster crash that left some HDFS data corrupted. In that case HBase does not actually start; it waits for HDFS to exit safe mode. But this does not look like your case. If it is, you can manually force HDFS to exit safe mode from the console, then run fsck for Hadoop and the similar procedure for HBase (see the commands below).
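For reference, the usual commands for that recovery path on a Hadoop 1.x installation like the one in the question:

hadoop dfsadmin -safemode leave   # force HDFS out of safe mode
hadoop fsck /                     # check HDFS for corrupt blocks
hbase hbck                        # then check HBase consistency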

Running MapReduce on HBase gives ZooKeeper error

I am doing a test project with Hadoop and HBase. Currently the cluster has 2 Ubuntu VMs hosted on a Windows machine.
I am able to perform PUT, QUERY, and DELETE operations remotely (from my host machine) using the following HBase Java API configuration:
config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "192.168.56.90");
config.set("hbase.zookeeper.property.clientPort", "2222");
When I try to run an HBase MapReduce job on Windows with the same config as above, I get the following error:
13/03/24 06:11:03 ERROR security.UserGroupInformation: PriviledgedActionException as:Joel cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Joel\mapred\staging\Joel290889388\.staging to 0700
java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Joel\mapred\staging\Joel290889388\.staging to 0700
From what I have read on the web, there seems to be a problem with running MapReduce jobs on Windows, so I tried running the MapReduce job on Linux using "java -jar MR.jar".
On Linux, I can't connect to ZooKeeper. For some unknown reason, the ZooKeeper host and port are getting reset on the client side:
13/03/24 05:59:33 INFO zookeeper.ZooKeeper: Client environment:os.version=3.5.0-23-generic
13/03/24 05:59:33 INFO zookeeper.ZooKeeper: Client environment:user.name=hduser
13/03/24 05:59:33 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hduser
13/03/24 05:59:33 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/hduser/testes
13/03/24 05:59:33 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=192.168.56.90:2222 sessionTimeout=180000 watcher=hconnection
13/03/24 05:59:33 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 11552@node01
13/03/24 05:59:33 INFO zookeeper.ClientCnxn: Opening socket connection to server node01/192.168.56.90:2222. Will not attempt to authenticate using SASL (unknown error)
13/03/24 05:59:33 INFO zookeeper.ClientCnxn: Socket connection established to node01/192.168.56.90:2222, initiating session
13/03/24 05:59:33 INFO zookeeper.ClientCnxn: Session establishment complete on server node01/192.168.56.90:2222, sessionid = 0x13d9afaa1a30006, negotiated timeout = 180000
13/03/24 05:59:33 INFO client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x13d9afaa1a30006
13/03/24 05:59:33 INFO zookeeper.ZooKeeper: Session: 0x13d9afaa1a30006 closed
13/03/24 05:59:33 INFO zookeeper.ClientCnxn: EventThread shut down
13/03/24 05:59:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/03/24 05:59:33 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/03/24 05:59:33 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
13/03/24 05:59:33 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 11552@node01
13/03/24 05:59:33 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
13/03/24 05:59:33 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
Judging from the log above, it connects correctly to node01:2222 (node01 resolves to 192.168.56.90). But for some reason it then changes to localhost:2181, which gives a connection-refused error.
How can I fix this issue so I can get MR jobs running on Linux, on the same machine where ZooKeeper is running?
Versions: HBase 0.94.5 / Hadoop 1.1.2
Thanks.
You may need to set hbase.master as well.
Also, check the /etc/hosts file and see if it is correct. Are you able to telnet to ZooKeeper using that connection info?
config.set("hbase.zookeeper.quorum", "192.168.56.90");
config.set("hbase.zookeeper.property.clientPort", "2222");
config.set("hbase.master", "some.host.com:60000")
