HBase is not starting in Cloudera Hadoop

The service did not start successfully when starting the HBase master: not all the required roles started. The service has only 0 master roles running instead of the minimum required 1. This is HBase on Cloudera Hadoop CDH 5.3.
It worked fine a few months ago. Can corrupted blocks in HDFS prevent HBase from starting?

Related

Hadoop HDFS start up fails requires formatting

I have a multi-node standalone Hadoop cluster for HDFS. I am able to load data into HDFS; however, every time I reboot my computer and start the cluster with start-dfs.sh, I don't see the dashboard until I run hdfs namenode -format, which erases all my data.
How do I start the Hadoop cluster without having to run hdfs namenode -format?
You need to shut down HDFS and the namenode cleanly (stop-dfs.sh) before you shut down your computer. Otherwise you can corrupt the namenode, which forces you to format it to get back to a clean state.
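The clean-stop advice above can be sketched as a small wrapper to run before powering off. This is only an illustration: the function name `clean_stop` and the `STOP_DFS` variable are made up here, and the sketch assumes `stop-dfs.sh` (shipped in Hadoop's sbin directory) is on the PATH.

```shell
# Hypothetical helper: stop HDFS before shutting the machine down.
# STOP_DFS is an illustrative override; by default it assumes stop-dfs.sh is on PATH.
STOP_DFS="${STOP_DFS:-stop-dfs.sh}"

clean_stop() {
  # Only report success if the stop script itself exited cleanly.
  if "$STOP_DFS"; then
    echo "HDFS stopped; safe to power off"
  else
    echo "stop script failed; check the namenode before powering off" >&2
    return 1
  fi
}
```

Calling `clean_stop` and powering off only when it succeeds keeps the reboot from interrupting a running namenode.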

Apache HAWQ installation built on top of HDFS

I would like to install Apache HAWQ on top of Hadoop.
Before installing HAWQ, I should install Hadoop and configure all my nodes.
I have the four nodes listed below, and my question is as follows.
Should I install a Hadoop distribution on hawq-master?
1. hadoop-master //namenode, Secondary Namenode, ResourceManager, HAWQ Standby
2. hawq-master //HAWQ Master
3. datanode01 //Datanode, HAWQ Segment
4. datanode02 //Datanode, HAWQ Segment
I wrote the role of each node next to its name above.
In my opinion, I should install Hadoop on hadoop-master, datanode01 and datanode02, setting hadoop-master as the namenode (master) and the others as datanodes (slaves). Then I will install Apache HAWQ on all the nodes, with hawq-master as the master node, hadoop-master as the HAWQ standby, and the other two nodes as HAWQ segments.
What I want is to install HAWQ on top of Hadoop. So, I think the hawq-master should be built on top of hadoop, but there is no connection with hadoop-master.
If I follow the procedure above, I think I don't have to install a Hadoop distribution on hawq-master. Is my thinking right for successfully installing HAWQ on top of Hadoop?
If Hadoop should be installed on hawq-master, which of these is correct?
1. `hawq-master` should be set as `namenode`.
2. `hawq-master` should be set as `datanode`.
Any help will be appreciated.
Honestly, there are no strict constraints on how Hadoop and HAWQ are installed, as long as they are configured correctly.
Regarding your concern, "I think the hawq-master should be built on top of hadoop, but there is no connection with hadoop-master": IMO, it should be "HAWQ should be built on top of Hadoop". We configure the hawq-master conf file (hawq-site.xml) so that HAWQ can connect to Hadoop.
Usually each master component (HAWQ master, Hadoop master) can go on its own node, but some of them can share a node to save machines. The HDFS datanode and HAWQ segment are often installed together. Taking the workload of each machine into account, we could install them as below:
node            hadoop              hawq
hadoop-master   namenode            hawq standby
hawq-master     secondarynamenode   hawq master
other node      datanode            segment
If you configure HAWQ with YARN integration, there will also be a resourcemanager and nodemanagers in the cluster:
node            hadoop role                  hawq role
hadoop-master   namenode                     hawq standby
hawq-master     snamenode, resourcemanager   hawq master
other node      datanode, nodemanager        segment
Installing them together does not mean they are connected; it is your config files that let them reach each other.
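As a rough illustration of the point that configuration, not co-location, links HAWQ to HDFS, the sketch below builds the kind of location string that hawq-site.xml would carry. Treat the details as assumptions to verify against your own cluster: the `hawq_dfs_url` property name and its `host:port/directory` form are as I understand them from the HAWQ documentation, port 8020 and the directory `hawq_default` are placeholder values, and `dfs_url` is a made-up helper name.

```shell
# Build an HDFS location string of the form host:port/directory.
# The helper name dfs_url is illustrative, not part of HAWQ.
dfs_url() {
  printf '%s:%s/%s\n' "$1" "$2" "$3"
}

# Assumed values: namenode on hadoop-master, RPC port 8020, data directory hawq_default.
dfs_url hadoop-master 8020 hawq_default
```

The resulting string would go into the `hawq_dfs_url` property in hawq-site.xml on hawq-master, which is what lets the HAWQ master reach the namenode even though no Hadoop role runs on that node.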
You can install all the master components together, but that may be too heavy for one machine. Read more about Apache HAWQ at http://incubator.apache.org/projects/hawq.html and read the docs at http://hdb.docs.pivotal.io/211/hdb/index.html.
Besides, you could subscribe to the dev and user mailing lists: send an email to dev-subscribe@hawq.incubator.apache.org / user-subscribe@hawq.incubator.apache.org to subscribe, and send emails to dev@hawq.incubator.apache.org / user@hawq.incubator.apache.org to ask questions.

How to install impala on an already running hadoop cluster

I have a 5-node Hadoop cluster that is already up and running. I want to install Impala on the HDFS cluster without Cloudera Manager. Can anyone guide me through the process or point me to a link that covers it?
Thanks.

Start multiple datanode in hadoop 2.x cluster

I was able to start up the Hadoop cluster with the default configuration in Hadoop 2.2.0. How do I start an additional DataNode pointing to the existing cluster?

NoServerForRegionException while running Hadoop MapReduce job on HBase

I am executing a simple Hadoop MapReduce program with HBase as an input and output.
I am getting the error:
java.lang.RuntimeException: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for OutPut,,99999999999999 after 10 tries.
We saw this exception when the HBase client and server versions differed.
Our code was built and run against the 0.94.X HBase jars, whereas the HBase server was running 0.90.3.
When we changed our pom file to the right version (0.90.3) of the HBase jars, it started working fine.
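A quick guard against this kind of mismatch is to compare the client and server version strings before submitting a job. The sketch below is only an illustration: `same_release_line` is a made-up helper, and comparing just the first two version components is a simplification (the fix above matched the exact server version).

```shell
# Succeeds when two dotted version strings share the same major.minor prefix.
same_release_line() {
  a=$(printf '%s' "$1" | cut -d. -f1,2)
  b=$(printf '%s' "$2" | cut -d. -f1,2)
  [ "$a" = "$b" ]
}

# Versions from the answer above: client jars 0.94.X versus server 0.90.3.
if same_release_line "0.94.X" "0.90.3"; then
  echo "client and server are on the same release line"
else
  echo "version mismatch: align the HBase client jars with the server"
fi
```

In practice the two strings could come from `hbase version` on the server and from the HBase dependency version in the pom file.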
Run bin/hbase hbck and find out which machine the region server is running on.
Make sure that all your region servers are up and running.
Use bin/hbase-daemon.sh start regionserver to start a region server.
Even if the region server on the machine has been started, it may fail because its clock is out of sync.
Make sure you have NTP installed on all region server nodes and on the HBase master node.
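HBase stores key-value pairs versioned by timestamp, so the master rejects region servers whose clocks drift too far from its own. The allowed skew is set by hbase.master.maxclockskew (30 seconds by default), so keep the clocks well within that limit.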
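The time-sync check described above can be sketched as a comparison of two clock readings. This is a minimal illustration, not an HBase tool: `skew_within` is a made-up helper, the timestamps are fixed example values, and the limit is passed in as an argument because the right value depends on your cluster's configuration.

```shell
# Succeeds when two epoch-second timestamps differ by at most the given limit in seconds.
skew_within() {
  diff=$(( $1 - $2 ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( -diff ))
  fi
  [ "$diff" -le "$3" ]
}

# Fixed example timestamps two seconds apart, with an arbitrary example limit of 5 seconds.
if skew_within 1700000000 1700000002 5; then
  echo "clock skew within limit"
else
  echo "clock skew too large; check NTP"
fi
```

On a real cluster the first reading could come from `date +%s` on the master and the second from running the same command on each region server host.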
Deleting the master procedure WAL logs (or moving them to /tmp) helped in our case:
hdfs dfs -mv /apps/hbase/data/MasterProcWALs/state-*.log /tmp

Resources