NoClassDefFoundError while running HBase, no error in zookeeper - hadoop

I've created a standalone Hadoop cluster using this tutorial. Then I installed HBase on top of Hadoop by following this tutorial.
I started Hadoop with:
cd /usr/local/hadoop/sbin/
./start-all.sh
And HBase with:
cd /usr/local/hbase/bin
./start-hbase.sh
Then, when I run jps, I get:
3761 Jps
835 NameNode
966 DataNode
3480 HMaster
3608 HRegionServer
1465 ResourceManager
1610 NodeManager
3418 HQuorumPeer
1150 SecondaryNameNode
But after some time it shows:
1779 SecondaryNameNode
1557 DataNode
2870 HQuorumPeer
2200 NodeManager
2061 ResourceManager
3246 Jps
1423 NameNode
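You can compare the two jps listings by eye, but a small check makes the difference explicit. The sketch below is a hypothetical Bash helper (`missing_daemons` is not part of Hadoop or HBase). It assumes the expected daemons are the ones in the first listing, i.e. a single-node setup with HBase-managed ZooKeeper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every expected daemon that is absent from `jps` output.
# The expected names are the daemons from the first jps listing in the question;
# edit the list to match your own setup.
EXPECTED_DAEMONS="NameNode DataNode SecondaryNameNode ResourceManager NodeManager HQuorumPeer HMaster HRegionServer"

missing_daemons() {
  local running d
  # jps prints "PID Name" on each line; keep only the name column.
  running=$(awk '{print $2}')
  for d in $EXPECTED_DAEMONS; do
    # -x matches the whole line, -F treats the daemon name as a fixed string.
    if ! printf '%s\n' "$running" | grep -qxF -- "$d"; then
      echo "$d"
    fi
  done
}

# Usage: jps | missing_daemons
```

Fed the second listing above, it reports HMaster and HRegionServer.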
HMaster and HRegionServer are gone, which is a pretty clear sign that something is wrong. I checked the ZooKeeper log in /usr/local/hbase/logs/hbase-hduser-zookeeper-stal.log and it showed:
2019-04-29 07:54:45,677 INFO [main] server.ZooKeeperServer: Server environment:java.io.tmpdir=/tmp
2019-04-29 07:54:45,677 INFO [main] server.ZooKeeperServer: Server environment:java.compiler=<NA>
2019-04-29 07:54:45,677 INFO [main] server.ZooKeeperServer: Server environment:os.name=Linux
2019-04-29 07:54:45,678 INFO [main] server.ZooKeeperServer: Server environment:os.arch=amd64
2019-04-29 07:54:45,678 INFO [main] server.ZooKeeperServer: Server environment:os.version=4.15.0-47-generic
2019-04-29 07:54:45,678 INFO [main] server.ZooKeeperServer: Server environment:user.name=hduser
2019-04-29 07:54:45,678 INFO [main] server.ZooKeeperServer: Server environment:user.home=/home/hduser
2019-04-29 07:54:45,678 INFO [main] server.ZooKeeperServer: Server environment:user.dir=/home/hduser
2019-04-29 07:54:45,782 INFO [main] server.ZooKeeperServer: tickTime set to 3000
2019-04-29 07:54:45,782 INFO [main] server.ZooKeeperServer: minSessionTimeout set to -1
2019-04-29 07:54:45,782 INFO [main] server.ZooKeeperServer: maxSessionTimeout set to 90000
2019-04-29 07:54:46,780 INFO [main] server.NIOServerCnxnFactory: binding to port 0.0.0.0/0.0.0.0:2181
None of that looks like an error.
So I checked the HBase master log in /usr/local/hbase/logs/hbase-hduser-master-stal.log and found:
2019-04-29 07:55:11,513 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster.
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:3100)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:236)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3111)
Caused by: java.lang.NoClassDefFoundError: org/apache/htrace/SamplerBuilder
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:644)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:628)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2701)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2683)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:372)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.hbase.util.CommonFSUtils.getRootDir(CommonFSUtils.java:362)
at org.apache.hadoop.hbase.util.CommonFSUtils.isValidWALRootDir(CommonFSUtils.java:411)
at org.apache.hadoop.hbase.util.CommonFSUtils.getWALRootDir(CommonFSUtils.java:387)
at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeFileSystem(HRegionServer.java:704)
at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:613)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:489)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:3093)
... 5 more
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 25 more
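The innermost cause is a ClassNotFoundException for org.apache.htrace.SamplerBuilder, raised from Hadoop's DFSClient. Before moving jars around, it can help to confirm which jar, if any, actually provides that class. Below is a rough Bash sketch that assumes GNU find, xargs and grep. `jars_containing_class` is a hypothetical name. It relies on jar (zip) entry names being stored uncompressed, so it is a heuristic; running `unzip -l` or `jar tf` on each jar gives an exact listing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list jars under a directory (recursively) that appear to
# contain a given class. Heuristic: zip entry names are stored uncompressed, so a
# fixed-string grep over the jar bytes finds them; `unzip -l JAR` is exact.
jars_containing_class() {
  local dir=$1 class=$2 entry
  # Turn the dotted class name into the path used inside the jar.
  entry="$(printf '%s' "$class" | tr . /).class"
  find "$dir" -type f -name '*.jar' -print0 |
    xargs -0 -r grep -lF -- "$entry" 2>/dev/null | sort
}

# Usage: jars_containing_class /usr/local/hbase/lib org.apache.htrace.SamplerBuilder
```

Because the search is recursive, it also covers lib/client-facing-thirdparty. Empty output means no jar under that directory provides the class.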
A similar question was answered as follows:
HBase 2.1.0 release uses HTrace, that is an incubating Apache
Foundation project.
There is a folder for 3rd-party libraries in HBase lib folder,
client-facing-thirdparty. You need to copy
htrace-core-3.1.0-incubating.jar from there to the HBase lib
directory. (see reference)
There is also another solution at Cloudera Community that changes a
configuration instead of adding the library manually.
The first solution includes:
The HMaster refuses to start due to the error below:
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder
This is because HBase 2.0 ships two different versions of htrace-core-x.x.x-incubating.jar:
cd /usr/local/hbase/lib/client-facing-thirdparty/
htrace-core-3.1.0-incubating.jar
htrace-core-4.2.0-incubating.jar
Currently, only version 3.1.0 has the required class SamplerBuilder.
We need to remove version 4.2.0:
mv htrace-core-4.2.0-incubating.jar htrace-core-4.2.0-incubating.jar.bak
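The first quoted answer copies htrace-core-3.1.0-incubating.jar from client-facing-thirdparty into lib. That step can be scripted with a guard for the case where the jar is missing, which is the situation the asker describes next. This is a hypothetical Bash sketch (`copy_htrace3_to_lib` is not an HBase tool); its default HBase home, /usr/local/hbase, comes from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the first quoted answer: copy the htrace 3.1.0 jar
# from lib/client-facing-thirdparty into lib.
# Argument: HBase home directory (defaults to /usr/local/hbase, as in the question).
copy_htrace3_to_lib() {
  local hbase_home=${1:-/usr/local/hbase}
  local jar=htrace-core-3.1.0-incubating.jar
  local src="$hbase_home/lib/client-facing-thirdparty/$jar"
  local dest="$hbase_home/lib/$jar"
  if [ ! -f "$src" ]; then
    echo "not found: $src" >&2
    return 1
  fi
  if [ -f "$dest" ]; then
    echo "already present: $dest"
    return 0
  fi
  cp "$src" "$dest" && echo "copied to $dest"
}
```

Restart HBase afterwards, because each daemon's classpath is fixed when its JVM starts.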
But when I cd into /usr/local/hbase/lib/client-facing-thirdparty and run ls -a, I get:
.
..
audience-annotations-0.5.0.jar
commons-logging-1.2.jar
findbugs-annotations-1.3.9-1.jar
htrace-core4-4.2.0-incubating.jar
log4j-1.2.17.jar
slf4j-api-1.7.25.jar
slf4j-log4j12-1.7.25.jar
As you can see, there is only one htrace jar, not two. So I downloaded htrace-3.1.0 from here, copied it into /usr/local/hbase/lib/client-facing-thirdparty, and renamed htrace-core4-4.2.0-incubating.jar to htrace-core4-4.2.0-incubating.jar.bak. Then I restarted Hadoop and HBase. Nothing changed: jps still doesn't show HMaster or HRegionServer.
HBase configuration file (hbase-site.xml):
<configuration>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
<property>
<name>zookeeper.znode.parent</name>
<value>/hbase</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/user/hduser/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>hbase.master</name>
<value>localhost:60010</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>hdfs://localhost:9000/user/hduser/zookeeper</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/hbase/tmp</value>
<description>Temporary directory on the local filesystem.</description>
</property>
</configuration>
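To double-check what a file like this one declares, here is a rough Bash/awk sketch that prints each property as name=value. `hbase_site_props` is a hypothetical name. It assumes, as in the file above, that each <name> and <value> element sits on a single line, so it is not a general XML parser. For the value HBase itself resolves (including defaults), the HBase launcher scripts use `hbase org.apache.hadoop.hbase.util.HBaseConfTool <key>`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print name=value for each <property> in an hbase-site.xml.
# Assumes one <name> and one <value> element per line; not a general XML parser.
hbase_site_props() {
  awk '
    /<name>/ {
      name = $0
      sub(/.*<name>/, "", name)      # drop everything up to the opening tag
      sub(/<\/name>.*/, "", name)    # drop the closing tag and anything after it
    }
    /<value>/ {
      value = $0
      sub(/.*<value>/, "", value)
      sub(/<\/value>.*/, "", value)
      print name "=" value
    }
  ' "$1"
}

# Usage: hbase_site_props /usr/local/hbase/conf/hbase-site.xml
```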
And hbase-env.sh looks like:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HBASE_REGIONSERVERS=/usr/local/hbase/conf/regionservers
export HBASE_MANAGES_ZK=true
export HBASE_PID_DIR=/var/hbase/pids
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC"
So, what should I do now? Any help is appreciated.

Related

jps does not show HMaster, only <no information available>

I configured HBase today, and at first it worked correctly. However, when I ran start-all.sh again, HMaster no longer appeared anywhere. jps just shows:
[root@master bin]# jps
25164 QuorumPeerMain
83447 HRegionServer
44542 NameNode
44789 DataNode
45098 SecondaryNameNode
45378 ResourceManager
45536 NodeManager
56678 <no information available>
56949 Jps
When I run jps again, I get: [screenshot not preserved]
And the log shows:
[root@master bin]# cd /home/hadoop/hbase-2.2.3/logs
[root@master logs]# ls
hbase-root-master-master.log hbase-root-regionserver-master.out.1
hbase-root-master-master.out hbase-root-regionserver-master.out.2
hbase-root-master-master.out.1 hbase-root-regionserver-master.out.3
hbase-root-regionserver-master.log hbase-root-regionserver-master.out.4
hbase-root-regionserver-master.out SecurityAuth.audit
[root@master logs]# tail hbase-root-master-master.log
2022-04-28 17:29:56,674 INFO [master/master:16000] zookeeper.ZooKeeper: Session: 0x100000e4a0d0020 closed
2022-04-28 17:29:56,674 INFO [master/master:16000] regionserver.HRegionServer: Exiting; stopping=master,16000,1651138191876; zookeeper connection closed.
2022-04-28 17:29:56,674 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x100000e4a0d0020
2022-04-28 17:29:56,674 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: HMaster Aborted
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:244)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2940)
[root@master logs]#
I solved the problem by adding the following property to hbase-site.xml:
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
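You can add the property above with a small script that is safe to re-run. This is a hypothetical Bash sketch (`add_stream_capability_property` is not an HBase tool). It inserts the block just before the closing </configuration> tag, and only if the property name is not already declared. The first question's hbase-site.xml already sets this property, so for that setup the script does nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add hbase.unsafe.stream.capability.enforce=false to an
# hbase-site.xml unless the property is already declared. Argument: path to the file.
add_stream_capability_property() {
  local site=$1 tmp
  if grep -qF '<name>hbase.unsafe.stream.capability.enforce</name>' "$site"; then
    echo "already set in $site"
    return 0
  fi
  tmp=$(mktemp) || return 1
  # Emit the new <property> block right before the closing </configuration> tag.
  awk '
    /<\/configuration>/ {
      print "  <property>"
      print "    <name>hbase.unsafe.stream.capability.enforce</name>"
      print "    <value>false</value>"
      print "  </property>"
    }
    { print }
  ' "$site" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back through cat so the original file keeps its owner and permissions.
  cat "$tmp" > "$site"
  rm -f "$tmp"
}
```

Restart HBase after editing the file.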
I don't know why, but it works.

ERROR in datanode execution while running Hadoop first time in Windows 10

I am trying to run Hadoop 3.1.1 on my Windows 10 machine. I modified all of these files:
hdfs-site.xml
mapred-site.xml
core-site.xml
yarn-site.xml
Then, I executed the following command:
C:\hadoop-3.1.1\bin> hdfs namenode -format
The format completed successfully, so I changed to C:\hadoop-3.1.1\sbin and executed the following command:
C:\hadoop-3.1.1\sbin> start-dfs.cmd
The command prompt opens two new windows: one for the datanode and one for the namenode.
The namenode window keeps running:
2018-09-02 21:37:06,232 INFO ipc.Server: IPC Server Responder: starting
2018-09-02 21:37:06,232 INFO ipc.Server: IPC Server listener on 9000: starting
2018-09-02 21:37:06,247 INFO namenode.NameNode: NameNode RPC up at: localhost/127.0.0.1:9000
2018-09-02 21:37:06,247 INFO namenode.FSNamesystem: Starting services required for active state
2018-09-02 21:37:06,247 INFO namenode.FSDirectory: Initializing quota with 4 thread(s)
2018-09-02 21:37:06,247 INFO namenode.FSDirectory: Quota initialization completed in 3 milliseconds
name space=1
storage space=0
storage types=RAM_DISK=0, SSD=0, DISK=0, ARCHIVE=0, PROVIDED=0
2018-09-02 21:37:06,279 INFO blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 30000 milliseconds
Meanwhile, the datanode fails with the following error:
ERROR: datanode.DataNode: Exception in secureMain
org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 0, volumes configured: 1, volumes failed: 1, volume failures tolerated: 0
at org.apache.hadoop.hdfs.server.datanode.checker.StorageLocationChecker.check(StorageLocationChecker.java:220)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2762)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2677)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2719)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2863)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2887)
2018-09-02 21:37:04,250 INFO util.ExitUtil: Exiting with status 1: org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 0, volumes configured: 1, volumes failed: 1, volume failures tolerated: 0
2018-09-02 21:37:04,250 INFO datanode.DataNode: SHUTDOWN_MSG:
And then the datanode shuts down! I have tried several ways to get past this error, but this is the first time I am installing Hadoop on Windows and I can't work out what to do next.
I got things working after I removed the file-system reference from the datanode directory setting in hdfs-site.xml. That let the software create and initialise its own datanode directory, which then appeared in sbin. After that I could use HDFS without a hitch. Here is what worked for me with Hadoop 3.1.3 on Windows:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///C:/Users/myusername/hadoop/hadoop-3.1.3/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>datanode</value>
</property>
</configuration>
Cheers,
MV
I had the same problem and what worked for me was editing hdfs-site.xml as follows:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///C:/Hadoop/hadoop-3.1.2/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/C:/Hadoop/hadoop-3.1.2/data/datanode</value>
</property>

HBase setup configuration: HMaster is not running

I am trying to set up HBase in fully distributed mode, with 1 master and 2 region servers. I have set HBASE_MANAGES_ZK=true in hbase-env.sh. The Hadoop cluster runs on the following nodes:
Master: node-master
Regionserver1: node1
Regionserver2: node2
When I start HBase, I can see the RegionServers starting, and HQuorumPeer on the master as well, but HMaster does not show up.
The configuration and logs are below:
Master hbase-site.xml
<configuration>
<property>
<name>hbase.master</name>
<value>nodemaster.hbasecluster.com:60000</value>
<description>The host and port that the HBase master runs at. A value of ‘local’ runs the master and a regionserver in a single </description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://nodemaster.hbasecluster.com:9000/hbase</value>
<description>The directory shared by region servers.</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are false: standalone and pseudo-distributed setups with managed Zookeeper true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh) </description>
</property>
<property>
<name>hbase.zookeeper.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/zookeeper</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
<description>Property from ZooKeeper’s config zoo.cfg. The port at which the clients will connect. </description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>nodemaster.hbasecluster.com</value>
<description>Comma separated list of servers in the ZooKeeper Quorum. </description>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/hbase/tmp</value>
<description>Temporary directory on the local filesystem.</description>
</property>
</configuration>
/etc/hosts on master
127.0.0.1 localhost
192.168.2.154 nodemaster.hbasecluster.com node-master
192.168.2.186 node1.hbasecluster.com node1
192.168.2.187 node2.hbasecluster.com node2
Logs on regionserver1
Fri Aug 17 12:32:15 IST 2018 Starting regionserver on node1.hbasecluster.com
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 15701
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 10000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 15701
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
2018-08-17 12:32:15,420 INFO [main] regionserver.HRegionServer: STARTING executorService HRegionServer
2018-08-17 12:32:15,422 INFO [main] util.VersionInfo: HBase 2.1.0
2018-08-17 12:32:15,422 INFO [main] util.VersionInfo: Source code repository git://zhangduo-Gen8/home/zhangduo/hbase/code revision=e1673bb0bbfea21d6e5dba73e013b09b8b49b89b
2018-08-17 12:32:15,422 INFO [main] util.VersionInfo: Compiled by zhangduo on Tue Jul 10 17:26:48 CST 2018
2018-08-17 12:32:15,422 INFO [main] util.VersionInfo: From source with checksum c8fb98abf2988c0490954e15806337d7
2018-08-17 12:32:15,703 INFO [main] util.ServerCommandLine: hbase.tmp.dir: /tmp/hbase-root
2018-08-17 12:32:15,703 INFO [main] util.ServerCommandLine: hbase.rootdir: hdfs://nodemaster.hbasecluster.com:9000/hbase
2018-08-17 12:32:15,703 INFO [main] util.ServerCommandLine: hbase.cluster.distributed: true
2018-08-17 12:32:15,703 INFO [main] util.ServerCommandLine: hbase.zookeeper.quorum: nodemaster.hbasecluster.com
2018-08-17 12:32:15,703 INFO [main] util.ServerCommandLine: env:HBASE_LOGFILE=hbase-root-regionserver-node1.hbasecluster.com.log
2018-08-17 12:32:15,704 INFO [main] util.ServerCommandLine: env:PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
2018-08-17 12:32:15,704 INFO [main] util.ServerCommandLine: env:JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
2018-08-17 12:32:15,704 INFO [main] util.ServerCommandLine: env:LANG=en_US.UTF-8
2018-08-17 12:32:15,704 INFO [main] util.ServerCommandLine: env:XDG_SESSION_ID=182
2018-08-17 12:32:15,704 INFO [main] util.ServerCommandLine: env:MAIL=/var/mail/root
2018-08-17 12:32:15,704 INFO [main] util.ServerCommandLine: env:LOGNAME=root
2018-08-17 12:32:15,704 INFO [main] util.ServerCommandLine: env:HBASE_REST_OPTS=
2018-08-17 12:32:15,704 INFO [main] util.ServerCommandLine: env:PWD=/root
2018-08-17 12:32:15,704 INFO [main] util.ServerCommandLine: env:HBASE_ROOT_LOGGER=INFO,RFA
2018-08-17 12:32:15,704 INFO [main] util.ServerCommandLine: env:SHELL=/bin/bash
2018-08-17 12:32:15,704 INFO [main] util.ServerCommandLine: env:HBASE_ENV_INIT=true
2018-08-17 12:32:15,704 INFO [main] util.ServerCommandLine: env:HBASE_IDENT_STRING=root
2018-08-17 12:32:15,704 INFO [main] util.ServerCommandLine: env:HBASE_ZNODE_FILE=/tmp/hbase-root-regionserver.znode
2018-08-17 12:32:15,704 INFO [main] util.ServerCommandLine: env:SSH_CLIENT=192.168.2.154 46760 22
2018-08-17 12:32:15,704 INFO [main] util.ServerCommandLine: env:HBASE_LOG_PREFIX=hbase-root-regionserver-node1.hbasecluster.com
2018-08-17 12:32:15,704 INFO [main] util.ServerCommandLine: env:HBASE_LOG_DIR=/root/install/hbase-2.1.0/bin/../logs
2018-08-17 12:32:15,704 INFO [main] util.ServerCommandLine: env:USER=root
2018-08-17 12:32:15,705 INFO [main] util.ServerCommandLine: root/install/hbase-2.1.0/bin/../lib/spymemcached-2.12.2.jar:/root/install/hbase-2.1.0/bin/../lib/validation-api-1.1.0.Final.jar:/root/install/hbase-2.1.0/bin/../lib/xmlenc-0.52.jar:/root/install/hbase-2.1.0/bin/../lib/xz-1.0.jar:/root/install/hbase-2.1.0/bin/../lib/zookeeper-3.4.10.jar:/root/install/hbase-2.1.0/bin/../lib/client-facing-thirdparty/audience-annotations-0.5.0.jar:/root/install/hbase-2.1.0/bin/../lib/client-facing-thirdparty/commons-logging-1.2.jar:/root/install/hbase-2.1.0/bin/../lib/client-facing-thirdparty/findbugs-annotations-1.3.9-1.jar:/root/install/hbase-2.1.0/bin/../lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar:/root/install/hbase-2.1.0/bin/../lib/client-facing-thirdparty/log4j-1.2.17.jar:/root/install/hbase-2.1.0/bin/../lib/client-facing-thirdparty/slf4j-api-1.7.25.jar:/root/install/hbase-2.1.0/bin/../lib/client-facing-thirdparty/htrace-core-3.1.0-incubating.jar:/root/install/hbase-2.1.0/bin/../lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar
2018-08-17 12:32:15,705 INFO [main] util.ServerCommandLine: env:HBASE_MANAGES_ZK=true
2018-08-17 12:32:15,705 INFO [main] util.ServerCommandLine: env:SSH_CONNECTION=192.168.2.154 46760 192.168.2.186 22
2018-08-17 12:32:15,705 INFO [main] util.ServerCommandLine: env:HBASE_AUTOSTART_FILE=/tmp/hbase-root-regionserver.autostart
2018-08-17 12:32:15,705 INFO [main] util.ServerCommandLine: env:HBASE_NICENESS=0
2018-08-17 12:32:15,705 INFO [main] util.ServerCommandLine: env:HBASE_OPTS= -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/root/install/hbase-2.1.0/bin/../logs -Dhbase.log.file=hbase-root-regionserver-node1.hbasecluster.com.log -Dhbase.home.dir=/root/install/hbase-2.1.0/bin/.. -Dhbase.id.str=root -Dhbase.root.logger=INFO,RFA -Dhbase.security.logger=INFO,RFAS
2018-08-17 12:32:15,705 INFO [main] util.ServerCommandLine: env:HBASE_SECURITY_LOGGER=INFO,RFAS
2018-08-17 12:32:15,705 INFO [main] util.ServerCommandLine: env:XDG_RUNTIME_DIR=/run/user/0
2018-08-17 12:32:15,705 INFO [main] util.ServerCommandLine: env:HBASE_THRIFT_OPTS=
2018-08-17 12:32:15,705 INFO [main] util.ServerCommandLine: env:HBASE_HOME=/root/install/hbase-2.1.0/bin/..
2018-08-17 12:32:15,705 INFO [main] util.ServerCommandLine: env:SHLVL=3
2018-08-17 12:32:15,705 INFO [main] util.ServerCommandLine: env:HOME=/root
2018-08-17 12:32:15,705 INFO [main] util.ServerCommandLine: env:MALLOC_ARENA_MAX=4
2018-08-17 12:32:15,706 INFO [main] util.ServerCommandLine: vmName=OpenJDK 64-Bit Server VM, vmVendor=Oracle Corporation, vmVersion=25.171-b11
2018-08-17 12:32:15,707 INFO [main] util.ServerCommandLine: vmInputArguments=[-Dproc_regionserver, -XX:OnOutOfMemoryError=kill -9 %p, -XX:+UseConcMarkSweepGC, -Dhbase.log.dir=/root/install/hbase-2.1.0/bin/../logs, -Dhbase.log.file=hbase-root-regionserver-node1.hbasecluster.com.log, -Dhbase.home.dir=/root/install/hbase-2.1.0/bin/.., -Dhbase.id.str=root, -Dhbase.root.logger=INFO,RFA, -Dhbase.security.logger=INFO,RFAS]
2018-08-17 12:32:21,194 INFO [main] metrics.MetricRegistries: Loaded MetricRegistries class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
2018-08-17 12:32:21,245 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-08-17 12:32:21,489 INFO [main] regionserver.RSRpcServices: regionserver/node1:16020 server-side Connection retries=45
2018-08-17 12:32:21,503 INFO [main] ipc.RpcExecutor: Instantiated default.FPBQ.Fifo with queueClass=class java.util.concurrent.LinkedBlockingQueue; numCallQueues=3, maxQueueLength=300, handlerCount=30
2018-08-17 12:32:21,505 INFO [main] ipc.RpcExecutor: Instantiated priority.FPBQ.Fifo with queueClass=class java.util.concurrent.LinkedBlockingQueue; numCallQueues=2, maxQueueLength=300, handlerCount=20
2018-08-17 12:32:21,505 INFO [main] ipc.RpcExecutor: Instantiated replication.FPBQ.Fifo with queueClass=class java.util.concurrent.LinkedBlockingQueue; numCallQueues=1, maxQueueLength=300, handlerCount=3
2018-08-17 12:32:21,639 INFO [main] ipc.RpcServerFactory: Creating org.apache.hadoop.hbase.ipc.NettyRpcServer hosting hbase.pb.ClientService, hbase.pb.AdminService
2018-08-17 12:32:21,832 INFO [main] io.ByteBufferPool: Created with bufferSize=64 KB and maxPoolSize=1.88 KB
2018-08-17 12:32:21,937 ERROR [main] regionserver.HRegionServer: Failed construction RegionServer
java.lang.UnsupportedOperationException: Constructor threw an exception for org.apache.hadoop.hbase.ipc.NettyRpcServer
at org.apache.hadoop.hbase.util.ReflectionUtils.instantiate(ReflectionUtils.java:66)
at org.apache.hadoop.hbase.util.ReflectionUtils.instantiateWithCustomCtor(ReflectionUtils.java:45)
at org.apache.hadoop.hbase.ipc.RpcServerFactory.createRpcServer(RpcServerFactory.java:66)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.createRpcServer(RSRpcServices.java:1271)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:1238)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:1191)
at org.apache.hadoop.hbase.regionserver.HRegionServer.createRpcServices(HRegionServer.java:733)
at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:571)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2991)
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:63)
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:3009)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.util.ReflectionUtils.instantiate(ReflectionUtils.java:58)
... 17 more
Caused by: org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address already in use
at org.apache.hbase.thirdparty.io.netty.channel.unix.Errors.newIOException(Errors.java:117)
at org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.bind(Socket.java:285)
at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel.doBind(AbstractEpollChannel.java:714)
at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollServerSocketChannel.doBind(EpollServerSocketChannel.java:70)
at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558)
at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1283)
at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501)
at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486)
at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:989)
at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254)
at org.apache.hbase.thirdparty.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:364)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:309)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
2018-08-17 12:32:21,940 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting
java.lang.RuntimeException: Failed construction of Regionserver: class org.apache.hadoop.hbase.regionserver.HRegionServer
at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2994)
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:63)
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:3009)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2991)
... 5 more
Caused by: java.lang.UnsupportedOperationException: Constructor threw an exception for org.apache.hadoop.hbase.ipc.NettyRpcServer
at org.apache.hadoop.hbase.util.ReflectionUtils.instantiate(ReflectionUtils.java:66)
at org.apache.hadoop.hbase.util.ReflectionUtils.instantiateWithCustomCtor(ReflectionUtils.java:45)
at org.apache.hadoop.hbase.ipc.RpcServerFactory.createRpcServer(RpcServerFactory.java:66)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.createRpcServer(RSRpcServices.java:1271)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:1238)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:1191)
at org.apache.hadoop.hbase.regionserver.HRegionServer.createRpcServices(HRegionServer.java:733)
at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:571)
... 10 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.util.ReflectionUtils.instantiate(ReflectionUtils.java:58)
... 17 more
Caused by: org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address already in use
at org.apache.hbase.thirdparty.io.netty.channel.unix.Errors.newIOException(Errors.java:117)
at org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.bind(Socket.java:285)
at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel.doBind(AbstractEpollChannel.java:714)
at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollServerSocketChannel.doBind(EpollServerSocketChannel.java:70)
at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558)
at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1283)
at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501)
at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486)
at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:989)
at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254)
at org.apache.hbase.thirdparty.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:364)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:309)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
regionserver hbase-site.xml
<configuration>
<property>
<name>hbase.master</name>
<value>nodemaster.hbasecluster.com:60000</value>
<description>The host and port that the HBase master runs at. A value of ‘local’ runs the master and a regionserver in a single process.</description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://nodemaster.hbasecluster.com:9000/hbase</value>
<description>The directory shared by region servers.</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are false: standalone and pseudo-distributed setups with managed Zookeeper true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh) </description>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
<description>Property from ZooKeeper’s config zoo.cfg. The port at which the clients will connect. </description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>nodemaster.hbasecluster.com</value>
<description>Property from ZooKeeper’s config zoo.cfg. The port at which the clients will connect. </description>
</property>
<property>
<name>hbase.zookeeper.distributed</name>
<value>true</value>
</property>
</configuration>
/etc/hosts file in regionserver1
127.0.0.1 localhost
192.168.2.154 nodemaster.hbasecluster.com node-master
192.168.2.186 node1.hbasecluster.com node1
192.168.2.187 node2.hbasecluster.com node2
Master node jps output:
19717 SecondaryNameNode
20441 HQuorumPeer
20781 Jps
19470 NameNode
19887 ResourceManager
regionserver jps output:
28404 NodeManager
28185 DataNode
28844 Jps
28687 HRegionServer
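It is easy to miss a daemon when comparing jps listings by eye. Below is a minimal Python sketch that reports which expected daemons are absent from a jps listing. The expected daemon sets are assumptions for the layout described here: HMaster on the master node and HRegionServer on the region servers. The helper names are hypothetical.

```python
# Hypothetical helper: compare `jps` output against the daemons you expect on a node.

def jps_daemons(jps_output: str) -> set[str]:
    """Return the JVM main-class names from `jps` output ("<pid> <name>" per line)."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # skip blank lines and pid-only entries
            names.add(parts[1])
    names.discard("Jps")  # jps always lists itself
    return names


def missing_daemons(jps_output: str, expected: set[str]) -> set[str]:
    """Daemons expected on this node but absent from the jps listing."""
    return expected - jps_daemons(jps_output)


if __name__ == "__main__":
    master_jps = """19717 SecondaryNameNode
20441 HQuorumPeer
20781 Jps
19470 NameNode
19887 ResourceManager"""
    # Assumed expectations for the master node in this setup.
    expected_master = {"NameNode", "SecondaryNameNode", "ResourceManager",
                       "HQuorumPeer", "HMaster"}
    print(sorted(missing_daemons(master_jps, expected_master)))  # ['HMaster']
```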
EDIT: I was originally running ./bin/start-hbase.sh. When I use the command ./bin/hbase-daemon.sh start master instead, I get the following error in my master log:
2018-08-20 11:50:42,742 ERROR [main] regionserver.HRegionServer: Failed construction RegionServer
java.lang.NoClassDefFoundError: org/apache/htrace/SamplerBuilder
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:635)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.hbase.util.CommonFSUtils.getRootDir(CommonFSUtils.java:358)
at org.apache.hadoop.hbase.util.CommonFSUtils.isValidWALRootDir(CommonFSUtils.java:407)
at org.apache.hadoop.hbase.util.CommonFSUtils.getWALRootDir(CommonFSUtils.java:383)
at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeFileSystem(HRegionServer.java:691)
at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:600)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:484)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2965)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:236)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2983)
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 25 more
2018-08-20 11:50:42,744 ERROR [main] master.HMasterCommandLine: Master exiting
ZooKeeper was able to create connections to the slaves, and the RegionServers are running on each slave.
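Here the NoClassDefFoundError is caused by a ClassNotFoundException, which means the class could not be found on the classpath when it was needed. Below is a minimal Python sketch that lists which jars in a directory contain org/apache/htrace/SamplerBuilder.class. HBase builds its classpath from more than its lib/ directory (bin/hbase classpath prints the full list), so an empty result for one directory is only a hint. The helper name and the default path are hypothetical.

```python
import sys
import zipfile
from pathlib import Path


def jars_providing(class_name: str, lib_dir: str) -> list[str]:
    """Return the .jar files directly under lib_dir that contain class_name.

    class_name is in dotted form, e.g. "org.apache.htrace.SamplerBuilder".
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(str(jar))
        except zipfile.BadZipFile:
            continue  # skip files that end in .jar but are not valid zip archives
    return hits


if __name__ == "__main__":
    # Default path is an assumption based on the /usr/local/hbase install in the question.
    lib = sys.argv[1] if len(sys.argv) > 1 else "/usr/local/hbase/lib"
    found = jars_providing("org.apache.htrace.SamplerBuilder", lib)
    print("\n".join(found) if found else f"no jar in {lib} provides the class")
```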
I assume you are using bin/hbase-daemon.sh start master to start the master. If so, there should be more log output describing the actual problem, with ERROR/FATAL lines just before the master shuts down. You should also see a line similar to "master.HMaster: STARTING service HMaster" when the master starts up.
The log line below, from the regionserver, says that the regionserver port (16020) is already in use by another regionserver or another application. You probably saw this while starting the regionserver again.
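The answer above suggests looking for ERROR/FATAL lines around the master's startup. Below is a minimal Python sketch that returns every ERROR or FATAL line logged after the most recent start marker. The marker text "STARTING service" is an assumption based on the line quoted above, and the wording may differ slightly between HBase versions. The function name is hypothetical.

```python
import re
import sys

LEVEL = re.compile(r"\b(ERROR|FATAL)\b")
START_MARKER = "STARTING service"  # assumption: wording may vary by HBase version


def problems_since_last_start(log_lines):
    """Return ERROR/FATAL lines logged after the most recent start marker.

    If no start marker is found, the whole log is scanned.
    """
    start = 0
    for i, line in enumerate(log_lines):
        if START_MARKER in line:
            start = i
    return [line for line in log_lines[start:] if LEVEL.search(line)]


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: scan_master_log.py <path to hbase-*-master-*.log>")
    else:
        with open(sys.argv[1], errors="replace") as f:
            for line in problems_since_last_start(f.read().splitlines()):
                print(line)
```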
Caused by: org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address already in use
at org.apache.hbase.thirdparty.io.netty.channel.unix.Errors.newIOException(Errors.java:117)
at org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.bind(Socket.java:285)
at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel.doBind(AbstractEpollChannel.java:714)
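One way to confirm this before restarting is to try binding the port yourself. Below is a minimal Python sketch that assumes the default RegionServer port 16020, which is configurable through hbase.regionserver.port. If the bind fails, some other process still holds the address, and tools such as ss or netstat can show which one. Because the sketch binds without SO_REUSEADDR, it may also report a port as busy for a short while after its previous owner exits. The function name is hypothetical.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently bind host:port.

    False means the bind failed, e.g. with "Address already in use"
    (privileged ports below 1024 can also fail for permission reasons).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    # 16020: default HBase RegionServer RPC port (hbase.regionserver.port).
    print("16020 free:", port_is_free(16020))
```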

Mapreduce job ipc.Client retrying to connect

I am testing my hadoop cluster which consists of 4 docker containers:
Datanode
Secondary Namenode
Namenode
Resource Manager
When I submit a MapReduce job, I notice connection issues once both map and reduce reach 100%. The client then reaches the maximum number of retries before erroring out with a stack trace. The weird thing is that the job finishes and provides an answer; however, the node manager web interface shows a failed job. None of the questions/answers I have found so far fix my particular issue.
All my machines have exposed the port range 50100:50200 to comply with the 'yarn.app.mapreduce.am.job.client.port-range' property.
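Hadoop reads this property as an inclusive range of ports. Below is a minimal Python sketch that parses the single low-high form used here and checks whether a port falls inside it, for example the 50100 that appears in the retry messages below. Hadoop's own range parser accepts more forms, such as comma-separated lists, which this sketch does not handle. The function name is hypothetical.

```python
def parse_port_range(value: str) -> range:
    """Parse a single 'low-high' range (e.g. the value of
    yarn.app.mapreduce.am.job.client.port-range) into an inclusive range."""
    low_s, high_s = value.strip().split("-", 1)
    low, high = int(low_s), int(high_s)
    if not (0 < low <= high <= 65535):
        raise ValueError(f"bad port range: {value!r}")
    return range(low, high + 1)  # range() excludes its end, so add 1


if __name__ == "__main__":
    am_ports = parse_port_range("50100-50200")  # value from mapred-site.xml
    print(len(am_ports), 50100 in am_ports, 50201 in am_ports)  # 101 True False
```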
The job I submit is
sudo -u hdfs hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.1.jar pi 1 1
This is the output:
Number of Maps = 1
Samples per Map = 1
Wrote input for Map #0
Starting Job
16/06/18 19:14:07 INFO client.RMProxy: Connecting to ResourceManager at resource-manager/172.19.0.2:8032
16/06/18 19:14:08 INFO input.FileInputFormat: Total input paths to process : 1
16/06/18 19:14:08 INFO mapreduce.JobSubmitter: number of splits:1
16/06/18 19:14:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1466277178029_0001
16/06/18 19:14:08 INFO impl.YarnClientImpl: Submitted application application_1466277178029_0001
16/06/18 19:14:08 INFO mapreduce.Job: The url to track the job: http://resource-manager:8088/proxy/application_1466277178029_0001/
16/06/18 19:14:08 INFO mapreduce.Job: Running job: job_1466277178029_0001
16/06/18 19:14:15 INFO mapreduce.Job: Job job_1466277178029_0001 running in uber mode : false
16/06/18 19:14:15 INFO mapreduce.Job: map 0% reduce 0%
16/06/18 19:14:19 INFO mapreduce.Job: map 100% reduce 0%
16/06/18 19:14:26 INFO mapreduce.Job: map 100% reduce 100%
16/06/18 19:14:32 INFO ipc.Client: Retrying connect to server: 01d3c03f829a/172.19.0.4:50100. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
16/06/18 19:14:33 INFO ipc.Client: Retrying connect to server: 01d3c03f829a/172.19.0.4:50100. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
16/06/18 19:14:34 INFO ipc.Client: Retrying connect to server: 01d3c03f829a/172.19.0.4:50100. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
16/06/18 19:14:36 INFO mapreduce.Job: map 0% reduce 0%
16/06/18 19:14:36 INFO mapreduce.Job: Job job_1466277178029_0001 failed with state FAILED due to: Application application_1466277178029_0001 failed 2 times due to AM Container for appattempt_1466277178029_0001_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://resource-manager:8088/proxy/application_1466277178029_0001/AThen, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1466277178029_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:561)
at org.apache.hadoop.util.Shell.run(Shell.java:478)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
16/06/18 19:14:36 INFO mapreduce.Job: Counters: 0
Job Finished in 28.862 seconds
Estimated value of Pi is 4.00000000000000000000
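The "Retrying connect to server" lines above follow the policy printed in the log, RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS). Below is a minimal Python sketch of that behavior as its name describes it; it is not Hadoop's RetryPolicies code. It retries a failing call at most maxRetries times with a fixed pause, then gives up and surfaces the last error. Treating only ConnectionError as retryable is an assumption of the sketch, and the function name is hypothetical.

```python
import time


def retry_up_to_max_count_with_fixed_sleep(op, max_retries=3, sleep_s=1.0, sleep=time.sleep):
    """Call op() until it succeeds, retrying at most max_retries times.

    Total attempts are therefore at most max_retries + 1. `sleep` is
    injectable so the pause can be skipped or recorded in tests.
    """
    tries = 0
    while True:
        try:
            return op()
        except ConnectionError:
            if tries >= max_retries:
                raise  # give up: the caller sees the last failure
            print(f"Retrying connect. Already tried {tries} time(s)")
            tries += 1
            sleep(sleep_s)
```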
The container log contains the following:
2016-06-18 19:14:32,273 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1466277178029_0001_000002
2016-06-18 19:14:32,443 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-06-18 19:14:32,475 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
2016-06-18 19:14:32,477 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (org.apache.hadoop.yarn.security.AMRMTokenIdentifier#3514a4c0)
2016-06-18 19:14:32,515 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Using mapred newApiCommitter.
2016-06-18 19:14:33,060 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Attempt num: 2 is last retry: true because a commit was started.
2016-06-18 19:14:33,061 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$NoopEventHandler
2016-06-18 19:14:33,067 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.jobhistory.EventType for class org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler
2016-06-18 19:14:33,068 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
2016-06-18 19:14:33,118 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system is set solely by core-default.xml therefore - ignoring
2016-06-18 19:14:33,141 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system is set solely by core-default.xml therefore - ignoring
2016-06-18 19:14:33,162 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system is set solely by core-default.xml therefore - ignoring
2016-06-18 19:14:33,183 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Emitting job history data to the timeline server is not enabled
2016-06-18 19:14:33,185 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Will not try to recover. recoveryEnabled: true recoverySupportedByCommitter: false numReduceTasks: 1 shuffleKeyValidForRecovery: true ApplicationAttemptID: 2
2016-06-18 19:14:33,210 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system is set solely by core-default.xml therefore - ignoring
2016-06-18 19:14:33,212 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Previous history file is at hdfs://namenode:9000/user/hdfs/.staging/job_1466277178029_0001/job_1466277178029_0001_1.jhist
2016-06-18 19:14:33,621 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler
2016-06-18 19:14:33,640 WARN [main] org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-mrappmaster.properties,hadoop-metrics2.properties
2016-06-18 19:14:33,689 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-06-18 19:14:33,689 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MRAppMaster metrics system started
2016-06-18 19:14:33,708 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: nodeBlacklistingEnabled:true
2016-06-18 19:14:33,708 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: maxTaskFailuresPerNode is 3
2016-06-18 19:14:33,708 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: blacklistDisablePercent is 33
2016-06-18 19:14:33,739 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at resource-manager/172.19.0.2:8030
2016-06-18 19:14:33,814 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: maxContainerCapability: <memory:4096, vCores:4>
2016-06-18 19:14:33,814 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: queue: root.hdfs
2016-06-18 19:14:33,837 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system is set solely by core-default.xml therefore - ignoring
2016-06-18 19:14:33,840 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryCopyService: History file is at hdfs://namenode:9000/user/hdfs/.staging/job_1466277178029_0001/job_1466277178029_0001_1.jhist
2016-06-18 19:14:33,894 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1466277178029_0001, File: hdfs://namenode:9000/user/hdfs/.staging/job_1466277178029_0001/job_1466277178029_0001_2.jhist
2016-06-18 19:14:33,959 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Was asked to shut down.
2016-06-18 19:14:33,959 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.io.IOException: Was asked to shut down.
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1546)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1540)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1473)
2016-06-18 19:14:33,962 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1
Several times it says 'Cannot locate configuration' or 'Default file system is set solely by core-default.xml'. Is this significant? In case it matters, I am using the Cloudera repo to install the various Hadoop services instead of unpacking a .tar.gz.
My config files are:
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode:9000</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.hosts</name>
<value>*</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>resource-manager</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>resource-manager:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>resource-manager:8030</value>
</property>
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///data/1/yarn/local,file:///data/2/yarn/local,file:///data/3/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>file:///data/1/yarn/logs,file:///data/2/yarn/logs,file:///data/3/yarn/logs</value>
</property>
<property>
<name>yarn.log.aggregation-enable</name>
<value>true</value>
</property>
<property>
<description>Where to aggregate logs</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://namenode:8020/var/log/hadoop-yarn/apps</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>resource-manager:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>resource-manager:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>resource-manager:8033</value>
</property>
<property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>600</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
<description>Amount of physical memory, in MB, that can be allocated for containers.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1000</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>namenode:8021</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>history-server:10020</value>
<description>Enter your JobHistoryServer hostname.</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>history-server:19888</value>
<description>Enter your JobHistoryServer hostname.</description>
</property>
<property>
<name>yarn.app.mapreduce.am.job.client.port-range</name>
<value>50100-50200</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
<property>
<name>dfs.name.dir or dfs.namenode.name.dir</name>
<value>file:///data/1/dfs/nn,file:///nfsmount/dfs/nn</value>
</property>
<property>
<name>dfs.data.dir or dfs.datanode.data.dir</name>
<value>file:///data/1/dfs/dn,file:///data/2/dfs/dn,file:///data/3/dfs/dn,file:///data/4/dfs/dn</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>namenode:50070</value>
<description>
The address and the base port on which the dfs NameNode Web UI will listen.
</description>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
Thanks for reading.
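The question asks about messages such as 'Default file system is set solely by core-default.xml'. To check which value a site file actually sets, for example fs.defaultFS in core-site.xml, below is a minimal Python sketch that reads the name/value pairs of a Hadoop-style *-site.xml. It ignores Hadoop features such as ${var} substitution, <final> and XInclude, so it is only a quick inspection aid. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def site_properties(root: ET.Element) -> dict[str, str]:
    """Map property name -> value for a parsed Hadoop-style <configuration>.

    Later duplicates overwrite earlier ones; <description> is ignored and
    values are whitespace-stripped.
    """
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def load_site_file(path: str) -> dict[str, str]:
    """Parse a *-site.xml file from disk (e.g. /etc/hadoop/conf/core-site.xml)."""
    return site_properties(ET.parse(path).getroot())


if __name__ == "__main__":
    core_site = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://namenode:9000</value></property>
</configuration>"""
    print(site_properties(ET.fromstring(core_site)).get("fs.defaultFS"))  # hdfs://namenode:9000
```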
For anyone who has the same issue, the solution is to add the following to hdfs-site.xml:
<property>
<name>dfs.safemode.threshold.pct</name>
<value>0</value>
</property>
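dfs.safemode.threshold.pct is the older, deprecated name of dfs.namenode.safemode.threshold-pct. According to the HDFS defaults documentation, it is the fraction of blocks that must meet the minimum replication requirement before the NameNode leaves safe mode. Values at or below 0 mean the NameNode does not wait for any particular fraction, and values above 1 make safe mode permanent. Below is a minimal Python model of that rule only. The NameNode also honors other settings, such as dfs.namenode.safemode.extension, which the model ignores. The function name is hypothetical.

```python
def safemode_threshold_met(safe_blocks: int, total_blocks: int, threshold_pct: float) -> bool:
    """Model of the documented meaning of dfs.namenode.safemode.threshold-pct.

    threshold_pct <= 0 : do not wait for any particular fraction of blocks.
    threshold_pct > 1  : never met (safe mode is permanent).
    otherwise          : the fraction of safe blocks must reach threshold_pct.
    """
    if threshold_pct <= 0:
        return True
    if threshold_pct > 1:
        return False
    if total_blocks == 0:
        return True  # assumption of this model: an empty namespace counts as met
    return safe_blocks / total_blocks >= threshold_pct


if __name__ == "__main__":
    print(safemode_threshold_met(0, 1000, 0))        # True: the value 0 used above
    print(safemode_threshold_met(500, 1000, 0.999))  # False: 0.999 is the default
```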

HBase master keeps dying, claims hbase:namespace already exists

In today's episode of "HBase is bringing me to my wits' end", we have an issue where the HBase master starts and then dies very quickly. My master log is as follows:
2014-06-20 12:52:40,469 FATAL [master:hdev01:60000] master.HMaster: Master server abort: loaded coprocessors are: []
2014-06-20 12:52:40,470 FATAL [master:hdev01:60000] master.HMaster: Unhandled exception. Starting shutdown.
org.apache.hadoop.hbase.TableExistsException: hbase:namespace
at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:120)
at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:232)
at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1062)
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:926)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:615)
at java.lang.Thread.run(Thread.java:662)
2014-06-20 12:52:40,473 INFO [master:hdev01:60000] master.HMaster: Aborting
2014-06-20 12:52:40,473 DEBUG [master:hdev01:60000] master.HMaster: Stopping service threads
2014-06-20 12:52:40,473 INFO [master:hdev01:60000] ipc.RpcServer: Stopping server on 60000
2014-06-20 12:52:40,473 INFO [CatalogJanitor-hdev01:60000] master.CatalogJanitor: CatalogJanitor-hdev01:60000 exiting
2014-06-20 12:52:40,473 INFO [hdev01,60000,1403283149823-BalancerChore] balancer.BalancerChore: hdev01,60000,1403283149823-BalancerChore exiting
2014-06-20 12:52:40,474 INFO [RpcServer.listener,port=60000] ipc.RpcServer: RpcServer.listener,port=60000: stopping
2014-06-20 12:52:40,474 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopped
2014-06-20 12:52:40,474 INFO [master:hdev01:60000] master.HMaster: Stopping infoServer
2014-06-20 12:52:40,474 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopping
2014-06-20 12:52:40,474 INFO [master:hdev01:60000.oldLogCleaner] cleaner.LogCleaner: master:hdev01:60000.oldLogCleaner exiting
2014-06-20 12:52:40,475 INFO [hdev01,60000,1403283149823-ClusterStatusChore] balancer.ClusterStatusChore: hdev01,60000,1403283149823-ClusterStatusChore exiting
2014-06-20 12:52:40,476 INFO [master:hdev01:60000.oldLogCleaner] master.ReplicationLogCleaner: Stopping replicationLogCleaner-0x246ba2ab1e4001c, quorum=hdev02:5181,hdev01:5181,hdev03:5181, baseZNode=/hbase
2014-06-20 12:52:40,479 INFO [master:hdev01:60000] mortbay.log: Stopped SelectChannelConnector#0.0.0.0:16010
2014-06-20 12:52:40,478 INFO [master:hdev01:60000.archivedHFileCleaner] cleaner.HFileCleaner: master:hdev01:60000.archivedHFileCleaner exiting
2014-06-20 12:52:40,483 INFO [master:hdev01:60000.oldLogCleaner] zookeeper.ZooKeeper: Session: 0x246ba2ab1e4001c closed
2014-06-20 12:52:40,484 INFO [master:hdev01:60000-EventThread] zookeeper.ClientCnxn: EventThread shut down
2014-06-20 12:52:40,589 DEBUG [master:hdev01:60000] catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker#f3f348b
2014-06-20 12:52:40,591 INFO [master:hdev01:60000] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x246ba2ab1e4001b
2014-06-20 12:52:40,592 INFO [master:hdev01:60000] zookeeper.ZooKeeper: Session: 0x246ba2ab1e4001b closed
2014-06-20 12:52:40,592 INFO [master:hdev01:60000-EventThread] zookeeper.ClientCnxn: EventThread shut down
2014-06-20 12:52:40,695 INFO [hdev01,60000,1403283149823.splitLogManagerTimeoutMonitor] master.SplitLogManager$TimeoutMonitor: hdev01,60000,1403283149823.splitLogManagerTimeoutMonitor exiting
2014-06-20 12:52:40,696 INFO [master:hdev01:60000] zookeeper.ZooKeeper: Session: 0x246ba2ab1e4001a closed
2014-06-20 12:52:40,696 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2014-06-20 12:52:40,696 INFO [master:hdev01:60000] master.HMaster: HMaster main thread exiting
2014-06-20 12:52:40,697 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: HMaster Aborted
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:194)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:135)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2803)
I thought this might be a remnant of an old run, so I deleted the files in HBase's data directory, ZooKeeper's data directory and my HDFS. I still got the same error. Strangely, my HMaster popped back up temporarily when I ran stop-hbase.sh, although there wasn't much I could do with it.
My HBase version is 0.98.3 and my Hadoop version is 2.2.0. My hbase-site.xml is:
<configuration>
<property>
<name>hbase.master</name>
<value>hdev01:60000</value>
<description>The host and port that the HBase master runs at.
A value of 'local' runs the master and a regionserver
in a single process.
</description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hdev01:9000/hbase</value>
<description>The directory shared by region servers.</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed
Zookeeper true: fully-distributed with unmanaged Zookeeper
Quorum (see hbase-env.sh)
</description>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>5181</value>
<description>Property from ZooKeeper's config zoo.cfg.
The port at which the clients will connect.
</description>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>10000</value>
<description></description>
</property>
<property>
<name>hbase.client.retries.number</name>
<value>10</value>
<description></description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hdev01,hdev02,hdev03</value>
<description>Comma separated list of servers in the ZooKeeper Quorum. For example, "host1.mydomain.com,host2.mydomain.com". By default this is set to localhost for local and pseudo-distributed modes of operation. For a fully-distributed setup, this should be set to a full list of ZooKeeper quorum servers. If
HBASE_MANAGES_ZK is set in hbase-env.sh
this is the list of servers which we will start/stop
ZooKeeper on.
</description>
</property>
</configuration>
EDIT
I attempted hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair; my error is now: "HBase file layout needs to be upgraded. You have version null and I want version 8. Is your hbase.rootdir valid? If so, you may need to run 'hbase hbck -fixVersionFile'"
This is unhelpful, since hbck will not actually run without a master.
Edited edit
I nuked and restarted my DFS, then tried repairing and starting things again; I am now back where I started.
The hbase namespace is the internal namespace HBase uses for its own management tables. Try running the offline repair tool
from the $HBASE_HOME directory:
./bin/hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
su - hdfs
hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
(Restart the HBase master. If you are still facing the issue, do the following:)
zookeeper-client (enter)
rmr /hbase
quit
Then restart the HBase master service.
#shash:
When HBase manages ZooKeeper (i.e. HBASE_MANAGES_ZK=true), the command to access and clean HBase's data in ZooKeeper is:
hbase zkcli. Inside it, clean up HBase's state with rmr /hbase, then quit.

Resources