Start multiple datanodes in a Hadoop 2.x cluster - hadoop

I was able to start the Hadoop cluster with the default configuration in Hadoop 2.2.0. How do I start an additional DataNode pointing to the existing cluster?

Related

Adding a node to hadoop cluster without restarting master

I have created a Hadoop cluster and want to add a new node to the cluster, running as a slave, without restarting the master node.
How can this be achieved?
DataNodes and NodeManagers can be added without restarting the namenode(s) or resource manager(s).
More specifically, these commands need to be run on the machines hosting the respective services:
Namenode
hdfs dfsadmin -refreshNodes
ResourceManager
yarn rmadmin -refreshNodes
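For the original question (starting an additional DataNode against an existing cluster), a minimal sketch for Hadoop 2.x follows; the master hostname master-host and port 9000 are assumptions here, use whatever fs.defaultFS your cluster already uses:
<!-- core-site.xml on the new slave: point it at the existing NameNode -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master-host:9000</value>
</property>
# then start the worker daemons on the new slave
$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager
Adding the new hostname to the slaves file on the master only matters for the next full start-dfs.sh/start-yarn.sh; the running NameNode accepts the new DataNode as soon as it registers.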

Hbase is not starting in cloudera hadoop

Service did not start successfully while starting the HBase master: not all of the required roles started. The service has only 0 master roles running instead of the minimum required 1 for HBase in Cloudera Hadoop CDH 5.3.
A few months back it worked fine. Will corrupted blocks in HDFS affect HBase while it is starting?
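No answer was recorded here, but the corrupted-blocks theory is easy to check from the shell; a minimal sketch, assuming HBase's data lives under the default /hbase directory in HDFS:
hdfs fsck /hbase
hdfs fsck / -list-corruptfileblocks
If fsck reports CORRUPT blocks under /hbase, the master may indeed fail to start because it cannot read its own metadata files.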

Apache HAWQ installation built on top of HDFS

I would like to install Apache HAWQ on top of Hadoop.
Before installing HAWQ, I should install Hadoop and configure all my nodes.
I have four nodes, as below, and my question follows.
Should I install a Hadoop distribution on hawq-master?
1. hadoop-master //NameNode, Secondary NameNode, ResourceManager, HAWQ Standby
2. hawq-master //HAWQ Master
3. datanode01 //DataNode, HAWQ Segment
4. datanode02 //DataNode, HAWQ Segment
I have written the role of each node next to it in the list above.
In my opinion, I should install Hadoop on hadoop-master, datanode01, and datanode02, setting hadoop-master as the namenode (master) and the others as datanodes (slaves). Then I will install Apache HAWQ on all the nodes, setting hawq-master as the master node, hadoop-master as the HAWQ standby, and the other two nodes as HAWQ segments.
What I want is to install HAWQ on top of Hadoop. So I think hawq-master should be built on top of Hadoop, but there is no connection with hadoop-master.
If I proceed with the above procedure, then I think I don't have to install the Hadoop distribution on hawq-master. Is my reasoning right for successfully installing HAWQ on top of Hadoop?
If Hadoop should be installed on hawq-master, then which one is correct?
1. `hawq-master` should be set as `namenode`.
2. `hawq-master` should be set as `datanode`.
Any help will be appreciated.
Honestly, there are no strict constraints on how Hadoop and HAWQ are installed, as long as they are configured correctly.
Regarding your concern, "I think the hawq-master should be built on top of hadoop, but there are no connection with hadoop-master": IMO it should read "HAWQ should be built on top of Hadoop", and it is the hawq-master configuration files (hawq-site.xml) that give HAWQ its connection to Hadoop.
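For instance, hawq_dfs_url in hawq-site.xml is what ties HAWQ to the HDFS namenode; a minimal sketch using the node names from the question (the 8020 port is an assumption, match it to your fs.defaultFS):
<!-- hawq-site.xml -->
<property>
  <name>hawq_master_address_host</name>
  <value>hawq-master</value>
</property>
<property>
  <name>hawq_dfs_url</name>
  <value>hadoop-master:8020/hawq_default</value>
</property>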
Usually each master component (the HAWQ master and the Hadoop master daemons) could get its own node, but some of them can share a node to save machines. The HDFS datanode and the HAWQ segment, however, are usually installed together. Taking the workload of each machine into account, we could lay them out as below:
node            hadoop role          hawq role
hadoop-master   namenode             HAWQ standby
hawq-master     secondarynamenode    HAWQ master
other nodes     datanode             HAWQ segment
If you configure HAWQ with YARN integration, there will also be a ResourceManager and NodeManagers in the cluster:
node            hadoop role                          hawq role
hadoop-master   namenode                             HAWQ standby
hawq-master     secondarynamenode, resourcemanager   HAWQ master
other nodes     datanode, nodemanager                HAWQ segment
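If I recall correctly, the switch for YARN integration is hawq_global_rm_type in hawq-site.xml (the default, none, means standalone resource management):
<property>
  <name>hawq_global_rm_type</name>
  <value>yarn</value>
</property>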
Installing them together does not mean they are connected; it is your configuration files that let them reach each other.
You can install all the master components together, but that may be too heavy for one machine. Read more about Apache HAWQ at http://incubator.apache.org/projects/hawq.html and find the docs at http://hdb.docs.pivotal.io/211/hdb/index.html.
Besides, you can subscribe to the dev and user mailing lists: send an email to dev-subscribe@hawq.incubator.apache.org / user-subscribe@hawq.incubator.apache.org to subscribe, and send emails to dev@hawq.incubator.apache.org / user@hawq.incubator.apache.org to ask questions.

Unable to see TaskTracker and JobTracker after Hadoop 2.5.1 single node installation

I am new to Hadoop 2.5.1. As I had already installed Hadoop 1.0.4 previously, I thought the installation process would be the same, so I followed this tutorial:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Everything was fine; I even gave these settings in core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>
But I have seen this value given as 9000 on several sites.
I also made changes in yarn-site.xml.
Still, everything works fine when I run a MapReduce job. But my question is:
when I run the command jps, it gives me this output:
hduser#secondmaster:~$ jps
5178 ResourceManager
5038 SecondaryNameNode
4863 DataNode
5301 NodeManager
4719 NameNode
6683 Jps
I don't see the TaskTracker and JobTracker in jps. Where are these daemons running?
And without these daemons, how am I able to run a MapReduce job?
Thanks,
Sreelatha K.
From Hadoop 2.0 onwards, the default processing framework has changed from classic MapReduce to YARN. You are using YARN, and in YARN you cannot see a JobTracker or TaskTracker: the JobTracker and TaskTracker are replaced by the ResourceManager and NodeManager respectively.
You still have the option to use the classic MapReduce framework instead of YARN.
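The choice of framework is made in mapred-site.xml via mapreduce.framework.name; a minimal sketch for Hadoop 2.x:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
Setting the value to local instead runs jobs inside a single JVM without submitting to YARN at all.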
In Hadoop 2 there is an alternative method to run MapReduce jobs, called YARN. Since you have made changes in yarn-site.xml, MapReduce processing happens using YARN, not the traditional MapReduce framework. That is probably the reason why you don't see TaskTracker and JobTracker listed after executing the jps command. Note that ResourceManager and NodeManager are the daemons for YARN.
YARN is the next-generation resource manager, which can integrate with Apache Spark, Storm, and many more tools you can use to write MapReduce jobs.

Is Zookeeper part of Hadoop or a separate configuration?

As I read in various tutorials, ZooKeeper helps to coordinate and synchronize the services of a Hadoop cluster.
I have currently installed Hadoop 2.5.0. When I run jps it displays:
4494 SecondaryNameNode
8683 Jps
4679 ResourceManager
3921 NameNode
4174 DataNode
4943 NodeManager
There is no process for ZooKeeper.
I am in doubt whether ZooKeeper is part of HDFS or whether we need to install it manually.
If you use Hadoop only, ZooKeeper is not required. Other tools in the Hadoop ecosystem, e.g. HBase, do depend on ZooKeeper, but you don't need to install it separately: HBase bundles ZooKeeper, and when you start HBase, ZooKeeper starts up at the same time.
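That bundled behaviour is controlled by a flag in hbase-env.sh; a minimal sketch:
# hbase-env.sh
export HBASE_MANAGES_ZK=true   # HBase starts and stops its own ZooKeeper
Set it to false when you point HBase at an externally managed ZooKeeper ensemble.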
