Adding more data nodes in Cloudera - hadoop

I have just installed Cloudera 4.7 and was wondering how I can add more data nodes to a cluster in Cloudera (either through the GUI or through Ubuntu terminal). Also, I was wondering if Cloudera automatically does this for me.
Thanks in advance.

If you are using Cloudera Manager, you can add new hosts to the cluster through the Cloudera Manager UI:
Go to the Hosts tab at the top --> click "Add New Hosts to Cluster".
If you are not using Cloudera Manager, follow the steps below:
1) Add the new node's DNS name to the conf/slaves file on the master node.
2) Then log in to the new slave node and execute:
$ cd path/to/hadoop
$ bin/hadoop-daemon.sh start datanode
$ bin/hadoop-daemon.sh start tasktracker
You don't need to restart the cluster; nodes can be added dynamically.
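As a sketch of step 1, the slaves file is just a list of worker hostnames, one per line. The hostnames below are hypothetical, and the demo writes to a temporary file rather than the real conf/slaves so it can be run safely anywhere:

```shell
# Stand-in for $HADOOP_HOME/conf/slaves (a temp file, for demonstration only)
SLAVES_FILE=$(mktemp)
printf 'slave1.example.com\n' > "$SLAVES_FILE"   # existing datanode

# Step 1: on the master, append the new node's DNS name
echo 'slave2.example.com' >> "$SLAVES_FILE"

# The file now lists every worker, one hostname per line
cat "$SLAVES_FILE"
```

On a real cluster you would edit `$HADOOP_HOME/conf/slaves` directly, then start the datanode and tasktracker daemons on the new node as shown above.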

Related

Which node is running Cloudera Manager out of N hadoop nodes?

I have a large hadoop cluster (24 nodes) and CLI access to these nodes. The first few are not running Cloudera Manager (cloudera-scm-server).
How can I find out which node is running Cloudera Manager?
Any help is appreciated.
Cloudera Manager has two components: the server and the agents.
Since you have CLI access to all the nodes, run the command below on each node to find which one is the server (the server runs on only one machine):
sudo service cloudera-scm-server status
Another simple way to find the Cloudera Manager server address:
ssh to any node and go to /etc/cloudera-scm-agent. There you will find a config.ini file, and in it the server_host address.
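A minimal sketch of that lookup. The file normally lives at /etc/cloudera-scm-agent/config.ini; here a stand-in copy is created (with a hypothetical hostname) so the parsing step can be shown end to end:

```shell
# Stand-in for /etc/cloudera-scm-agent/config.ini (contents are hypothetical)
CONFIG_INI=$(mktemp)
cat > "$CONFIG_INI" <<'EOF'
[General]
server_host=cm-server.example.com
server_port=7182
EOF

# Pull out the server_host value -- the same line you would look for on a real agent
SERVER_HOST=$(grep '^server_host=' "$CONFIG_INI" | cut -d= -f2)
echo "Cloudera Manager server: $SERVER_HOST"
```

On a real node you would simply run `grep '^server_host' /etc/cloudera-scm-agent/config.ini`.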

How to run hadoop balancer from client node?

I want to ask how I can run the hadoop balancer. I've tried running the hadoop balancer command on the namenode before, but it had no effect at all (my new datanode is still empty). I've also read that the hadoop balancer is not run on the namenode but on a client node. So what is the client node, how can I configure it, and how can the client node access the hadoop file system?
Thanks all, I need your suggestions.
A client node is also known as an edge node. Usually not all developers in an organization have access to every node in the cluster, so a client node is provided for them to access it. You need to install the hadoop-client packages on the client node. If you are using a Cloudera RPM-based installation, you can use the command below.
sudo yum install hadoop-client
After installing the client node, update your configuration files such as core-site.xml, hdfs-site.xml, and any other required files. Hadoop CLI commands you run there will then be executed against the cluster.
The balancer can be run from any machine: a client node or any node in the cluster.
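As a sketch, the key setting on the client is the filesystem URI pointing at the namenode. A minimal core-site.xml might look like this (the hostname is a placeholder for your actual namenode):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
```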
sudo -u hdfs hdfs balancer
Regarding the newly added datanode: check in the namenode web UI whether your node has been added. If it appears there, check the datanode logs.

CDH4 : Add new node to existing cluster

I have successfully created a hadoop cluster with CDH4 on Ubuntu, with one master (master) and one slave (slave1). Now I want to add one more node. For this I just cloned slave2 and updated hosts and ssh accordingly. Then I updated the conf/slaves file with all datanode DNS names on all nodes and restarted everything. But it's not detecting the new datanode; it only shows the old one (slave1), not slave2. Can anyone please help me with this?
I have used cdh4-repository_1.0_all.deb
@user2009755, you need to create the masters and slaves files only on the master. In the configuration files in $HADOOP_HOME/etc/hadoop, make the necessary changes to the URI pointing to the master node. NOTE: Try formatting the namenode and deleting the tmp files (usually /tmp/*); if you changed the location in core-site.xml, clear that directory on all nodes. Then start all the daemons. That worked for me.
There are several possible reasons:
Have you changed the dfs.replication value to 3 in conf/hdfs-site.xml?
Check from the master with the command hduser@master:~$ ssh slave. It should open a shell on the slave; if not, execute hduser@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave
To understand this fully, see this link:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/

hadoop 2 node cluster setup

Can anyone please tell me how to create a hadoop 2 node cluster? I have 2 virtual machines, both with a hadoop single-node setup installed properly. Please tell me how to form a 2 node cluster from that, and mention the proper network setup for it. Thanks in advance. I am using hadoop-1.0.3.
For the network configuration of the VMs:
Make sure the virtual machines' network adapters are set to host-only.
Add both hostnames and IPs to the /etc/hosts file.
Add both hostnames to the $HADOOP_HOME/conf/slaves file.
Make sure the secondary namenode's hostname is in the $HADOOP_HOME/conf/masters file.
Then make sure the same configuration is on both machines.
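As a sketch, the /etc/hosts entries on both VMs might look like this (the IPs and hostnames are placeholders for your own):

```
192.168.56.101   master
192.168.56.102   slave1
```

With host-only networking, both VMs can then resolve each other by name before you format the namenode and start the daemons.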
hadoop namenode -format
start-all.sh
Best of luck.

Multi Node Cluster Hadoop Setup

Pseudo-Distributed single node cluster implementation
I am using Windows 7 with Cygwin and installed hadoop-1.0.3 successfully. I can start the jobtracker, tasktracker, and namenode services (localhost:50030, localhost:50060, and localhost:50070). I have completed the single-node implementation.
Now I want to implement a multi-node cluster. I don't understand how to divide it into master and slave systems via network IPs.
For your ssh problem just follow the link of single node cluster :
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
and yes, you need to specify the IPs of the master and slaves in the conf files
for that you can refer this url :
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
I hope this helps.
Create the number of VMs you want to add to your cluster, and make sure those VMs have the same hadoop version.
Figure out the IP of each VM.
You will find files named masters and slaves in $HADOOP_HOME/conf. Add the IP of the VM you want to treat as master to conf/masters, and add the slave nodes' IPs to conf/slaves.
Make sure these nodes have passwordless ssh connections to each other.
Format your namenode and then run start-all.sh.
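As a sketch, the two conf files described above might contain the following (the IPs are placeholders; with this layout the master also runs a datanode):

```
# $HADOOP_HOME/conf/masters
192.168.56.101

# $HADOOP_HOME/conf/slaves
192.168.56.101
192.168.56.102
```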
Thanks,
