Adding an external SSD to a single-node Hadoop cluster

I have a running single-node Apache Hadoop cluster, and I need to add an external SSD to it and use that disk for DataNode storage, where my blocks will be stored.
Can the disk be used as another DataNode directory, and if so, how?
Thanks in advance
Shakir

Yes, you can add it as a DataNode. Install Hadoop on the new node, set up passwordless SSH, and copy the configuration to the new node.
Read this: Steps to add a node in Hadoop cluster
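If the goal is only to have the existing DataNode store blocks on the SSD, rather than to add a second machine, a common approach is to mount the disk and append a directory on it to the DataNode's storage list in hdfs-site.xml. The property is dfs.datanode.data.dir in Hadoop 2 and later (dfs.data.dir in Hadoop 1.x) and takes a comma-separated list of local directories. The sketch below only builds that comma-separated value; the helper name append_data_dir and both paths are assumptions for illustration, not taken from the question.

```shell
#!/bin/sh
# Hypothetical helper: append a new directory to a comma-separated
# dfs.datanode.data.dir value, skipping it if it is already listed.
append_data_dir() {
    current="$1"   # existing value of the property (may be empty)
    new_dir="$2"   # directory on the newly mounted SSD
    case ",$current," in
        *",$new_dir,"*) printf '%s\n' "$current" ;;            # already present
        ",,")           printf '%s\n' "$new_dir" ;;            # property was empty
        *)              printf '%s\n' "$current,$new_dir" ;;   # append to the list
    esac
}

# Example paths are assumptions; substitute the real mount point of the SSD.
append_data_dir "/data/dfs/dn" "/mnt/ssd/dfs/dn"
```

The resulting value goes into the property in hdfs-site.xml. The new directory must exist and be writable by the user that runs the DataNode, and restarting the DataNode makes it pick up the change.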

Related

How to run hadoop balancer from client node?

How can I run the Hadoop balancer? I tried running the hadoop balancer command on the NameNode, but it had no effect at all (my new DataNode is still empty). I have also read that the balancer is run not on the NameNode but on a client node. So what is a client node, how do I configure it, and how can a client node access the Hadoop file system?
Thanks all, I would appreciate your suggestions.
A client node is also known as an edge node. Usually not all developers in an organization have access to every node in the cluster, so a client node is set up for developers to access the cluster. You need to install the hadoop-client packages on the client node. If you are using a Cloudera RPM-based installation, you can use the command below.
sudo yum install hadoop-client
After installing the client node, update its configuration files (core-site.xml, hdfs-site.xml and any other required files) to match the cluster. Hadoop CLI commands executed on the client node will then run against the cluster.
The balancer can be run from any node in the cluster, whether that is a client machine or a cluster node.
sudo -u hdfs hdfs balancer
As for the newly added DataNode, check the NameNode web UI to confirm that the node has joined the cluster. If it is listed there, check the logs.
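To confirm from the command line that the new DataNode has registered and then spread blocks onto it, something like the following could be run from any machine with a configured Hadoop client. hdfs dfsadmin -report and hdfs balancer -threshold are standard HDFS commands; the wrapper function name rebalance_hdfs and the threshold of 5 percent are assumptions for illustration, and the hdfs superuser name simply follows the command above.

```shell
#!/bin/sh
# Sketch: report cluster state, then run the balancer.
# Assumes the HDFS superuser is called "hdfs", as in the command above.
rebalance_hdfs() {
    threshold="${1:-10}"   # allowed deviation from average utilization, in percent

    # List DataNodes with their capacity and usage; the new node should appear here.
    sudo -u hdfs hdfs dfsadmin -report || return 1

    # Move blocks until every DataNode is within $threshold percent of the cluster average.
    sudo -u hdfs hdfs balancer -threshold "$threshold"
}

# Only run when an HDFS client is actually installed on this machine.
if command -v hdfs >/dev/null 2>&1; then
    rebalance_hdfs 5
else
    echo "hdfs client not found on PATH; install and configure it first"
fi
```

If every DataNode is already within the threshold of the average utilization, the balancer exits without moving anything, which can look as if it had no effect.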

hadoop 2 node cluster setup

Can anyone please tell me how to create a two-node Hadoop cluster? I have two virtual machines, and both have a working single-node Hadoop setup. Please explain how to form a two-node cluster from them, including the proper network setup. I am using Hadoop 1.0.3. Thanks in advance.
For the network configuration of the VMs, do the following.
Set the network adapter of each virtual machine to host-only.
Add both hostnames and their IPs to the /etc/hosts file.
Add both hostnames to the $HADOOP_HOME/conf/slaves file.
Make sure the secondary NameNode hostname is in the $HADOOP_HOME/conf/masters file.
Make sure the configuration is the same on both machines, then format the NameNode and start the cluster:
hadoop namenode -format
start-all.sh
Best of luck.
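The masters and slaves files from the steps above can be sketched as follows. This is only an illustration for a Hadoop 1.x layout: the hostnames master and slave1, the addresses in the comments, and the helper name write_cluster_files are all placeholders, and the files are written to a temporary directory rather than to $HADOOP_HOME/conf.

```shell
#!/bin/sh
# Sketch: generate Hadoop 1.x conf/masters and conf/slaves files.
#
# /etc/hosts on both VMs would carry lines like these (placeholder addresses):
#   192.168.56.101 master
#   192.168.56.102 slave1
write_cluster_files() {
    conf_dir="$1"; master="$2"; shift 2
    mkdir -p "$conf_dir" || return 1
    # conf/masters: host that runs the secondary NameNode
    printf '%s\n' "$master" > "$conf_dir/masters"
    # conf/slaves: one DataNode/TaskTracker host per line
    printf '%s\n' "$@" > "$conf_dir/slaves"
}

# Demo in a scratch directory; both hosts are listed as slaves, as in the steps above.
demo_dir="$(mktemp -d)"
write_cluster_files "$demo_dir" master master slave1
cat "$demo_dir/slaves"
```

Once the output looks right, the same two files would be copied into $HADOOP_HOME/conf on both machines.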

Hadoop fuse on multinode

I need to use hadoop fuse to mount HDFS on a multi-node cluster. How can I achieve that?
I have successfully deployed FUSE on a single-node cluster, but I doubt it would work on a multi-node one. Can anyone please shed some light on this?
It doesn't matter whether your cluster is single-node or multi-node. If you want to mount HDFS on a remote machine, make sure that machine has access to the cluster network. Set up a Hadoop client (with the same Hadoop version as the cluster) on the node where you plan to mount HDFS using FUSE.
The only difference when mounting is the NameNode URL.
(dfs://NAMENODEHOST:NN-IPC-PORT/)
On a single-node cluster the NameNode host would be localhost (0.0.0.0/127.0.0.1), but on a multi-node cluster you have to give the NameNode's hostname or IP address instead of localhost. HDFS can be mounted on any Linux machine that can access the Hadoop cluster.
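A rough sketch of building that URL and mounting with it is shown below. The hostname nn1.example.com, port 8020, and mount point /mnt/hdfs are placeholders; the real port is whatever the cluster's fs.default.name (fs.defaultFS in newer versions) specifies. The hadoop-fuse-dfs command is the one shipped with Cloudera's FUSE package; other builds provide a fuse_dfs binary or wrapper script instead, so treat the command name as an assumption about your installation.

```shell
#!/bin/sh
# Build the dfs:// URL that the FUSE mount expects.
namenode_url() {
    host="$1"          # NameNode hostname or IP (not localhost on a multi-node cluster)
    port="${2:-8020}"  # NameNode IPC port; 8020 is only a common default
    printf 'dfs://%s:%s/\n' "$host" "$port"
}

url="$(namenode_url nn1.example.com 8020)"   # hypothetical hostname
echo "$url"

# Mount only if the FUSE helper is installed; the mount point is an assumption.
if command -v hadoop-fuse-dfs >/dev/null 2>&1; then
    sudo mkdir -p /mnt/hdfs && sudo hadoop-fuse-dfs "$url" /mnt/hdfs
fi
```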
Trying to use Fuse to mount HDFS. Can't compile libhdfs

hadoop: different datanodes configuration in shared directory

I am trying to run Hadoop on a cluster of server machines.
The problem is that the servers use shared directories, although the directories are not physically on one disk. So I guess that if I configure a different DataNode directory on each machine (slave), I can run Hadoop without a disk/storage bottleneck.
How do I configure the DataNode differently on each slave?
Or how do I set up the master node to find Hadoop installations that live in different directories on the slave nodes when starting the NameNode and DataNodes with "start-dfs.sh"?
Or is there some better way to handle this environment?
Thanks!

Multi Node Cluster Hadoop Setup

Pseudo-Distributed single node cluster implementation
I am using Windows 7 with Cygwin and have installed hadoop-1.0.3 successfully. I can start the JobTracker, TaskTracker and NameNode services (localhost:50030, localhost:50060 and localhost:50070), so the single-node implementation is complete.
Now I want to implement a multi-node cluster. I don't understand how to divide the machines into master and slave systems using their network IPs.
For your SSH problem, just follow the single-node cluster tutorial:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
And yes, you need to specify the IPs of the master and slaves in the conf files.
For that, you can refer to this URL:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
I hope this helps.
Create as many VMs as you want to add to your cluster, and make sure they all have the same Hadoop version.
Find the IP address of each VM.
In $HADOOP_HOME/conf you will find files named masters and slaves. Put the IP of the VM you want to treat as the master in conf/masters, and put the IPs of the slave nodes in conf/slaves.
Make sure these nodes have passwordless SSH connections between them.
Format your NameNode and then run start-all.sh.
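The passwordless SSH step could look roughly like this when run on the master. It is a sketch rather than a tested recipe for Cygwin: the user name hduser, the slave hostnames, and the function name setup_passwordless_ssh are placeholders, and it assumes the standard OpenSSH tools ssh-keygen and ssh-copy-id are installed.

```shell
#!/bin/sh
# Sketch: give the master passwordless SSH access to each slave.
# "hduser" and the slave hostnames used below are placeholders.
setup_passwordless_ssh() {
    user="$1"; shift
    key="$HOME/.ssh/id_rsa"
    mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh" || return 1
    # Create a key pair with an empty passphrase unless one already exists.
    [ -f "$key" ] || ssh-keygen -t rsa -N "" -f "$key" || return 1
    for host in "$@"; do
        # Appends the public key to ~/.ssh/authorized_keys on the remote host.
        ssh-copy-id -i "$key.pub" "$user@$host" || return 1
    done
}

# Usage (run on the master): setup_passwordless_ssh hduser slave1 slave2
```

After this, ssh hduser@slave1 from the master should log in without prompting for a password, which is what start-all.sh relies on to launch the daemons on the slaves.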
Thanks,
