HBase Region servers will not start in Hadoop HA environment - hadoop

I've created an HBase cluster in a Hadoop HA cluster. My region servers are failing to start with the following exception in the logs:
2017-09-12 11:41:32,116 ERROR [regionserver/my.hostname.com/10.10.30.28:16020] regionserver.HRegionServer: Failed init
java.io.IOException: Failed on local exception: java.net.SocketException: Invalid argument; Host Details : local host is: "my.hostname.com/10.10.30.28"; destination host is: "0.0.0.1":8020;
I'm pretty sure the problem is caused by the Hadoop HA configuration: it looks like HBase doesn't understand the nameservice and treats it as an IP address.
excerpt from core-site.xml:
<property>
<name>fs.defaultFS</name>
<value>hdfs://001</value>
<description>NameNode URI</description>
</property>
excerpt from hdfs-site.xml:
<property>
<name>dfs.nameservices</name>
<value>001</value>
</property>
my hbase-site.xml:
<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://001/hbase</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>zk1:2181,zk2:2181,zk3:2181</value>
</property>
</configuration>
Help?

It was a silly mistake: HBase was missing the path to the Hadoop configuration files. I simply added HADOOP_CONF_DIR to hbase-env.sh.
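For reference, a minimal sketch of the change in hbase-env.sh; the directory below is an assumption, so point it at wherever your core-site.xml and hdfs-site.xml actually live:
# hbase-env.sh: put Hadoop's client configs on HBase's classpath so the HA nameservice resolves
export HADOOP_CONF_DIR=/etc/hadoop/conf
With the Hadoop configs visible to HBase, hdfs://001/hbase is resolved through dfs.nameservices instead of being treated as a host named 001.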

Related

Not able to create tables in hbase

I have installed HBase in my local Ubuntu VM, which already has Hadoop running in pseudo-distributed mode. The Hadoop version is 3.1.2 and the HBase version is 2.1.2.
I'm able to get the HBase shell, but when I try to create a table I get the following error:
hbase(main):001:0> create 'test', 'data'
ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2969)
at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1972)
at org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:630)
at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
On further investigation I found this error in the region server logs:
2019-02-12 06:21:21,484 ERROR [RS_OPEN_META-regionserver/ubuntu:16020-0] handler.OpenRegionHandler: Failed open of region=hbase:meta,,1.1588230740
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper
at org.apache.hadoop.hbase.io.asyncfs.AsyncFSOutputHelper.createOutput(AsyncFSOutputHelper.java:51)
at org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.initOutput(AsyncProtobufLogWriter.java:167)
at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:166)
at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:113)
at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:612)
at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:124)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:756)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:486)
at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.<init>(AsyncFSWAL.java:251)
at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:73)
at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:48)
at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:152)
at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:60)
at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:284)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:2104)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:284)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Are these two errors related? Can somebody provide a solution for this, please?
This is my hbase-site.xml:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/bithu/hbase-temp</value>
</property>
<property>
<name>hbase.master.port</name>
<value>7000</value>
</property>
<property>
<name>hbase.regionserver.port</name>
<value>7010</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/home/bithu/hbase-temp/tmp</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
</configuration>
I have faced the same issue running HBase 2.1.5 on top of Hadoop 3.1.3.
Adding the property below to hbase-site.xml solved it; it switches the WAL from the asynchronous provider (whose FanOutOneBlockAsyncDFSOutputHelper fails to initialize, as in the region server log above) to the plain filesystem-based provider:
<property>
<name>hbase.wal.provider</name>
<value>filesystem</value>
</property>
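After changing the property, restart HBase and retry the original create; a minimal sketch, assuming the standard scripts are on your PATH:
stop-hbase.sh
start-hbase.sh
hbase shell
# inside the shell: create 'test', 'data'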

unable to connect to hdfs on localhost

I am unable to connect to HDFS on port 9000; I keep getting this error:
localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused
hdfs-site.xml file is this:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/localhdfs/datanode</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>dfs.namenode.rpc-bind-host</name>
<value>0.0.0.0</value>
</property>
</configuration>
and core-site.xml file is this:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
I have restarted the cluster multiple times, but I keep getting connection errors.
This is what my /etc/hosts file looks like:
127.0.0.1 localhost
What am I missing?
Why are you using port 9000 in your config? fs.defaultFS should contain something like: hdfs://nameofcluster
Is this a single node instance? Sandbox? Are you running the command hdfs dfs -ls /?
I would first check the following (a quick diagnostic sketch follows this list):
Remove the port from fs.default.name
iptables or firewalls
hadoop.proxyuser.hdfs.hosts
hadoop.proxyuser.hdfs.groups
Ranger
Logs exceeding 80% of the disk
/etc/hosts contains the FQDN and your machine's public IP; get the IP with the ip a command and set it like: 192.166.6.6 abc.xxx.com
Delete the fs.default.name property from hdfs-site.xml
Configure a single-node cluster
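A minimal sketch of the first checks, assuming the hdfs CLI and standard Linux tools are available; adjust the port to whatever fs.defaultFS ends up being:
hdfs getconf -confKey fs.defaultFS        # the URI the client actually resolves
jps | grep -i namenode                    # is the NameNode process running at all?
ss -tlnp | grep -E ':9000|:8020'          # is anything listening on the expected RPC port?
hdfs dfs -ls /                            # does a basic HDFS command succeed?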

Failed to start Hadoop namenode in cloudera CDH5 on debian OS

I am installing Cloudera on Debian.
This is my core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-test-1:8020</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>0</value>
</property>
<property>
<name>fs.trash.checkpoint.interval</name>
<value>0</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.hosts</name>
<value>*</value>
</property>
</configuration>
When I try to start the NameNode, I get the following errors.
# sudo service hadoop-hdfs-namenode start
This is the error I receive when I execute the above command:
FAIL - Failed to start Hadoop namenode. Return value: 1 ... failed!
In the hadoop-hdfs-namenode-hadoop-test-1.log file:
Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority.
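That message typically means the service resolved fs.defaultFS to the built-in default file:/// rather than to hdfs://hadoop-test-1:8020, i.e. it is not picking up this core-site.xml. A hedged diagnostic sketch, assuming the CDH packages select their config directory through the hadoop-conf alternative (adjust paths if your layout differs):
hdfs getconf -confKey fs.defaultFS          # should print hdfs://hadoop-test-1:8020
update-alternatives --display hadoop-conf   # which conf directory the packages point at
ls -l /etc/hadoop/conf/core-site.xml        # confirm the edited file lives in that directory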

Getting connection refused while reading file from hdfs using pyspark

I installed Hadoop 2.7, set the paths, and set the configurations in core-site.xml and hdfs-site.xml as follows:
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://<ip_addr>:9000/</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/kavya/hdfs/data</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/kavya/hdfs/name</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://<ip_addr>:9000/</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/kavya/hdfs/data</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/kavya/hdfs/name</value>
</property>
</configuration>
I also started HDFS using start-dfs.sh. In spite of specifying the IP address in the configuration, I get a connection refused error like:
Call From spark/<ip_addr> to localhost:8020 failed on connection exception: java.net.ConnectException:Connection refused
I stored a file onto HDFS from my VM using:
hadoop fs -put /opt/TestLogs/traffic_log.log /usr/local/hadoop/TestLogs
This is a part of my code in pyspark to read file from hdfs and then extract the fields:
file = sc.textFile("hdfs://<ip_addr>/usr/local/hadoop/TestLogs/traffic_log.log")
result = file.filter(lambda x: len(x)>0)
result = result.map(lambda x: x.split("\n"))
print(result) # PythonRDD[2] at RDD at PythonRDD.scala
lines = result.map(func1).collect() #this is where I get the connection refused error.
print(lines)
func1 is a function containing regular expressions that extract the fields from my logs, and the result is returned to lines. This program works perfectly fine when reading the text file directly from the VM.
Spark version: spark-2.0.2-bin-hadoop2.7
VM: CentOS
How do I resolve this error? Am I missing something?
Two things need to be set:
1) In hdfs-site.xml, make sure permissions are disabled:
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
2) In core-site.xml, set fs.defaultFS to the IP address of the master:
<property>
<name>fs.defaultFS</name>
<value>hdfs://<MASTER IP ADDRESS>:8020</value>
</property>
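One more thing worth checking: the hdfs:// URI passed to sc.textFile names a host but no port, and in that case the HDFS client falls back to the default NameNode RPC port 8020, which lines up with the :8020 in the error. Keep the port in the URI in sync with fs.defaultFS. A small sketch, assuming fs.defaultFS is left at port 9000 as in the question (<ip_addr> is the question's own placeholder, so substitute the real NameNode address):
file = sc.textFile("hdfs://<ip_addr>:9000/usr/local/hadoop/TestLogs/traffic_log.log")
result = file.filter(lambda x: len(x) > 0)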

Unable to start-hbase.sh (throws error)

When I run start-hbase.sh, I get the following error:
localhost: starting zookeeper, logging to /usr/lib/HBase/bin/../logs/hbase-hduser-zookeeper-nkhl.out
localhost: java.io.FileNotFoundException: /home/hduser/zookeeperpropertydataDir/myid (Permission denied)
localhost: at java.io.FileOutputStream.open0(Native Method)
localhost: at java.io.FileOutputStream.open(FileOutputStream.java:270)
localhost: at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
localhost: at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
localhost: at java.io.PrintWriter.<init>(PrintWriter.java:263)
localhost: at org.apache.hadoop.hbase.zookeeper.HQuorumPeer.writeMyID(HQuorumPeer.java:162)
localhost: at org.apache.hadoop.hbase.zookeeper.HQuorumPeer.main(HQuorumPeer.java:70)
starting master, logging to /usr/lib/HBase/logs/hbase-hduser-master-nkhl.out
OpenJDK 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
regionserver running as process 25123. Stop it first.
After this, when I run hbase shell it does open up, but when I run list it throws this error:
ERROR: Can't get master address from ZooKeeper; znode data == null
Here is some help for this command:
List all tables in hbase. Optional regular expression parameter could
be used to filter the output. Examples:
hbase> list
hbase> list 'abc.*'
hbase> list 'ns:abc.*'
hbase> list 'ns:.*'
This is jps:
25123 HRegionServer
23975 SecondaryNameNode
23767 DataNode
24168 ResourceManager
26456 HMaster
26665 Jps
24297 NodeManager
23613 NameNode
Zookeeper starts fine:
ZooKeeper JMX enabled by default Using config:
/usr/lib/zookeeper/conf/zoo.cfg Starting zookeeper ... STARTED
My hbase-site.xml configuration:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:54433/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hduser/zookeeperpropertydataDir</value>
</property>
<property >
<name>hbase.master.port</name>
<value>60010</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
<description> The port at which the clients will connect.</description>
</property>
</configuration>
This is my hbase-env.sh configuration:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/
export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers
export HBASE_MANAGES_ZK=true
Any help in this will be appreciated.
The HBase ZooKeeper daemon HQuorumPeer is not running. One possible reason is that the directory below doesn't exist, as shown in the logs:
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hduser/zookeeperpropertydataDir</value>
</property>
Make sure the path that ZooKeeper is using has been created and has the right permissions.
Use chmod to grant access to all users; this fixed my problem:
chmod -R 777 path/file_name
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hduser/zookeeperpropertydataDir</value>
</property>
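A tighter alternative to chmod 777 is to create the directory and hand it to the user that runs HBase; a minimal sketch, with the path and the hduser account taken from the question (adjust to your setup):
sudo mkdir -p /home/hduser/zookeeperpropertydataDir
sudo chown -R hduser:hduser /home/hduser/zookeeperpropertydataDir
chmod 755 /home/hduser/zookeeperpropertydataDir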
