Hadoop: different datanode configuration in a shared directory

I am trying to run Hadoop on clustered server machines.
The problem is that the server machines use shared directories, but the file directories are not physically on one disk. So I guess that if I configure a different datanode directory on each machine (slave), I can run Hadoop without a disk/storage bottleneck.
How do I configure the datanode differently on each slave, or
how do I configure the master node to find a Hadoop installation that lives in a different directory on each slave node when starting the namenode and datanodes with "start-dfs.sh"?
Or is there some fancy way to handle this environment?
Thanks!

Related

Hadoop datanode services are not starting on the slaves

I am trying to configure a hadoop-1.0.3 multi-node cluster with one master and two slaves on my laptop using VMware Workstation.
When I run start-all.sh from the master, all daemon processes run on the master node (namenode, datanode, tasktracker, jobtracker, secondarynamenode), but the datanode and tasktracker do not start on the slave nodes. Passwordless SSH is enabled and I can ssh to both master and slave from my master node without a password.
Please help me resolve this.
Stop the cluster.
If you have explicitly defined a tmp directory location in core-site.xml, remove all files under that directory.
If you have explicitly defined datanode and namenode directories in hdfs-site.xml, delete all files under those directories.
If you have not defined anything in core-site.xml or hdfs-site.xml, remove all files under /tmp/hadoop-<your hadoop user name>.
Format the namenode.
It should work!
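A minimal shell sketch of those cleanup steps for Hadoop 1.x, assuming the start/stop scripts are on the PATH; the /app/hadoop/tmp path is a hypothetical example of a custom hadoop.tmp.dir, not a value from the question:

    # Stop every daemon on the master and the slaves
    stop-all.sh

    # If core-site.xml sets a custom hadoop.tmp.dir (e.g. /app/hadoop/tmp -- hypothetical),
    # clear it on every node; otherwise clear the default location
    rm -rf /app/hadoop/tmp/*
    rm -rf /tmp/hadoop-"$USER"/*

    # Rebuild the HDFS metadata, then bring the cluster back up
    hadoop namenode -format
    start-all.sh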

Hadoop 2-node cluster setup

Can anyone please tell me how to create a Hadoop 2-node cluster? I have two virtual machines, both with a Hadoop single-node setup installed properly. Please tell me how to form a 2-node cluster from them, and mention the proper network setup for that. Thanks in advance. I am using Hadoop 1.0.3.
Hi, one thing you can do for the network configuration of the VMs:
Make sure the virtual machines' network setting is host-only.
Add the two hostnames and IPs to the /etc/hosts file.
Add the two hostnames to the $HADOOP_HOME/conf/slaves file.
Make sure the secondary namenode hostname is in the $HADOOP_HOME/conf/masters file.
Then make sure the configuration is the same on both machines.
hadoop namenode -format
start-all.sh
Best of luck.
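A sketch of the files those steps touch; the hostnames and IPs (master-vm, slave-vm, 192.168.56.x) are made-up examples for a host-only network, not values from the question:

    # /etc/hosts on BOTH machines (example IPs for a host-only network)
    192.168.56.101   master-vm
    192.168.56.102   slave-vm

    # $HADOOP_HOME/conf/masters on the master (host of the secondary namenode)
    master-vm

    # $HADOOP_HOME/conf/slaves on the master (hosts that run datanode/tasktracker)
    master-vm
    slave-vm

With the same conf/ directory copied to both machines (fs.default.name and mapred.job.tracker typically pointing at master-vm rather than localhost), run hadoop namenode -format once on the master and then start-all.sh.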

Hadoop FUSE on a multi-node cluster

I need to use Hadoop FUSE to mount HDFS on a multi-node cluster. How can I achieve that?
I have successfully deployed FUSE on a single-node cluster, but I doubt it would work on a multi-node one. Can anyone please shed some light on this?
It doesn't matter whether your cluster is single-node or multi-node. If you want to mount HDFS on a remote machine, make sure that machine has access to the cluster network. Set up a Hadoop client (with the same Hadoop version as the cluster) on the node where you plan to mount HDFS using FUSE.
The only difference when mounting is the namenode URL
(dfs://NAMENODEHOST:NN-IPC-PORT/).
On a single-node cluster the namenode URL would be localhost (127.0.0.1 / 0.0.0.0), but in a multi-node cluster you have to give the namenode's hostname/IP address instead of localhost. It is possible to mount HDFS on any Linux machine that can reach the Hadoop cluster.
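For example, with the hadoop-fuse-dfs wrapper (the name used by CDH packaging; a locally built fuse_dfs binary is invoked the same way), only the host in the dfs:// URL changes; 8020 is just a common namenode IPC port and /mnt/hdfs a made-up mount point:

    # Single-node cluster: the namenode runs locally
    hadoop-fuse-dfs dfs://localhost:8020 /mnt/hdfs

    # Multi-node cluster: point at the namenode's hostname/IP and IPC port
    hadoop-fuse-dfs dfs://namenode.example.com:8020 /mnt/hdfs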

[hdfs] How to configure a different dfs.datanode.data.dir for each datanode?

I used Ambari to set up a Hadoop cluster,
but when I configure HDFS I find that if I modify dfs.datanode.data.dir, the change takes effect on all datanodes...
How can I configure different values for each datanode?
For example, there are two disks in machine A, mounted at /data1 and /data2,
but there is only one disk in machine B, mounted at /data1.
So I want to set dfs.datanode.data.dir to "/data1,/data2" for machine A,
but to only "/data1" for machine B.
HDFS directories that don't exist will be ignored, so you can put them all in; it won't matter.
Remember that each Hadoop node in the cluster also has its own set of configuration files (under the usual conf/ directory), so you can log in to that datanode machine and change its config files.
The local configuration on a datanode takes effect for that datanode.
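A sketch of what the local hdfs-site.xml could look like on each machine, using the mount points from the question (whether Ambari preserves such hand edits on its managed hosts is a separate concern):

    <!-- hdfs-site.xml on machine A (two disks) -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/data1,/data2</value>
    </property>

    <!-- hdfs-site.xml on machine B (one disk) -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/data1</value>
    </property>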

Multi Node Cluster Hadoop Setup

Pseudo-distributed single-node cluster implementation:
I am using Windows 7 with Cygwin and installed hadoop-1.0.3 successfully. I can start the jobtracker, tasktracker and namenode services (localhost:50030, localhost:50060 and localhost:50070). I have completed the single-node implementation.
Now I want to go from pseudo-distributed to a multi-node cluster. I don't understand how to divide the machines into master and slave systems through their network IPs.
For your SSH problem, just follow the single-node cluster guide:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
And yes, you need to specify the IPs of the master and slaves in the conf files.
For that you can refer to this URL:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
I hope this helps.
Create the number of VMs you want to add to your cluster. Make sure those VMs have the same Hadoop version.
Figure out the IP of each VM.
You will find files named masters and slaves in $HADOOP_HOME/conf. Add the IP of the VM you want to treat as the master to conf/masters, and do the same with conf/slaves for the slave nodes' IPs.
Make sure these nodes have passwordless SSH connections (see the key-setup sketch after this answer).
Format your namenode and then run start-all.sh.
Thanks,
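A minimal sketch of that passwordless-SSH step, run as the hadoop user on the master; the hadoop user name and VM hostnames are made-up examples:

    # Generate a key pair on the master (empty passphrase)
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

    # Copy the public key to every node listed in conf/masters and conf/slaves
    ssh-copy-id hadoop@master-vm
    ssh-copy-id hadoop@slave-vm

    # Verify that no password prompt appears
    ssh hadoop@slave-vm exit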
