Path on various slave nodes - hadoop

I have installed Hadoop on 3 nodes: 1 master and 2 slave nodes.
The master node and one of the slave nodes have the same Hadoop path, i.e. /home/hduser/hadoop, but on the other slave node it is different, i.e. /usr/hadoop.
So while running ./start-all.sh from the master, the namenode and jobtracker started, and the datanode started on the slave that has the same Hadoop path as the master node, but the other slave node gives an error like:
ngs-dell: bash: line 0: cd: /home/hduser/hadoop/libexec/..: No such file or directory
This means it is looking for the same path as on the master, but that slave has a different path.
Please tell me how to solve this issue.
And one more doubt: is it compulsory that all Hadoop nodes (master & slave) have the same username? In my case it is hduser. If I change it on one node of the cluster, I get an error.

I think you may not have changed the 'hadoop.tmp.dir' setting in core-site.xml on the slave node.
You can check the answer in this post.
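For reference, a hadoop.tmp.dir entry in core-site.xml looks like the sketch below; the path shown is just an example and must exist and be writable by your Hadoop user on that node:

    <!-- core-site.xml: base for Hadoop's temporary/storage directories -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/home/hduser/hadoop/tmp</value>
    </property>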

Related

Hadoop datanode services are not starting on the slaves in hadoop

I am trying to configure a hadoop-1.0.3 multinode cluster with one master and two slaves on my laptop using VMware Workstation.
When I run start-all.sh from the master, all daemon processes run on the master node (namenode, datanode, tasktracker, jobtracker, secondarynamenode), but the DataNode and TaskTracker do not start on the slave nodes. Passwordless ssh is enabled and I can ssh to both master and slaves from my master node without a password.
Please help me resolve this.
Stop the cluster.
If you have specifically defined a tmp directory location in core-site.xml, then remove all files under that directory.
If you have specifically defined the datanode and namenode directories in hdfs-site.xml, then delete all files under those directories.
If you have not defined anything in core-site.xml or hdfs-site.xml, then remove all files under /tmp/hadoop-<name of your hadoop user>.
Format the namenode.
It should work! (A command sketch of these steps follows below.)
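A minimal sketch of those steps, assuming the default storage location under /tmp and a Hadoop user named hduser (adjust the paths to whatever hadoop.tmp.dir / dfs.name.dir / dfs.data.dir you actually configured; note that formatting erases all HDFS data):

    # stop everything (run on the master)
    $HADOOP_HOME/bin/stop-all.sh
    # on every node: clear the storage/tmp directories
    rm -rf /tmp/hadoop-hduser/*
    # on the master: reformat HDFS
    $HADOOP_HOME/bin/hadoop namenode -format
    # restart the cluster
    $HADOOP_HOME/bin/start-all.sh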

Should we change the master and slaves config file for a Hadoop cluster in all nodes?

I know that we should put the IP address of the master node in the conf/masters file and put the IP addresses of all slave nodes in the conf/slaves file, one per line. My question is: should we do this only on the master node, or should we also change these two files on all slave nodes? Furthermore, if I want the master node to act as a DataNode and TaskTracker as well, should I put the IP address of the master in the slaves file too?
The conf/slaves and conf/masters configuration files should be maintained only on the master node, not on the slave nodes. The conf/masters file is used for specifying the secondarynamenode host. start-all.sh consists of start-mapred.sh and start-dfs.sh. If you want to start the JobTracker on a node, then the start-mapred.sh script should be executed on that node, and based on its conf/slaves file the TaskTracker services will be started on the listed nodes.
Similarly, the start-dfs.sh script should be executed on the NameNode machine; based on the values of conf/masters and conf/slaves, the secondarynamenode and DataNodes will be started on the corresponding nodes respectively. (See the example files below.)
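For illustration, the two files on the master might look like the sketch below (the host names are made up; if you want the master to also run a DataNode and TaskTracker, list it in conf/slaves as well):

    # conf/masters  -- host that runs the SecondaryNameNode
    master-host
    # conf/slaves   -- hosts that run DataNode/TaskTracker, one per line
    master-host
    slave-host-1
    slave-host-2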

DataNode doesn't start on one of the slaves

I am trying to configure Hadoop with 5 slaves. After I run start-dfs.sh on the master, there is one slave node that doesn't run the DataNode. I tried looking for some difference in the configuration files on that node but I didn't find anything.
There WAS a difference in the configuration files! In core-site.xml the hadoop.tmp.dir variable was set to an invalid directory, so it couldn't be created when the DataNode was started. Lesson learned: look in the logs (thanks Chris).

Hadoop master cannot start slave with different $HADOOP_HOME

On the master, $HADOOP_HOME is /home/a/hadoop; on the slave, $HADOOP_HOME is /home/b/hadoop.
On the master, when I try to use start-all.sh, the master's name node starts successfully, but it fails to start the slave's data node with the following message:
b#192.068.0.2: bash: line 0: cd: /home/b/hadoop/libexec/..: No such file or directory
b#192.068.0.2: bash: /home/b/hadoop/bin/hadoop-daemon.sh: No such file or directory
Any idea on how to specify $HADOOP_HOME for the slave in the master's configuration?
I don't know of a way to configure different home directories for the various slaves from the master, but the Hadoop FAQ says that the Hadoop framework does not require ssh and that the DataNode and TaskTracker daemons can be started manually on each node.
I would suggest writing your own scripts to start things, taking into account the specific environments of your nodes. However, make sure to include all the slaves in the master's slaves file. It seems that this is necessary and that the heartbeats alone are not enough for the master to add slaves.
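As a rough sketch, each slave can start its own daemons with the hadoop-daemon.sh script from its local installation (the /home/b/hadoop path below is taken from the question; substitute each node's own $HADOOP_HOME):

    # run on the slave whose Hadoop lives in /home/b/hadoop
    /home/b/hadoop/bin/hadoop-daemon.sh start datanode
    /home/b/hadoop/bin/hadoop-daemon.sh start tasktracker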

How to remove a hadoop node from DFS but not from Mapred?

I am fairly new to Hadoop. For running some benchmarks, I need a variety of Hadoop configurations for comparison.
I want to know a method to remove a Hadoop slave from DFS (so it no longer runs the datanode daemon) but not from MapReduce (it keeps running the tasktracker), or vice versa.
AFAIK, there is a single slaves file for such Hadoop nodes, not separate slaves files for DFS and MapReduce.
Currently, I am trying to start both DFS and MapReduce on the slave node, and then killing the datanode on the slave. But it takes a while for that node to show up under 'dead nodes' in the HDFS web UI. Can any parameter be tuned to make this timeout quicker?
Thanks!
Try using dfs.hosts and dfs.hosts.exclude in hdfs-site.xml, and mapred.hosts and mapred.hosts.exclude in mapred-site.xml. These are for allowing/excluding hosts to connect to the NameNode and the JobTracker.
Once the lists of nodes in those files have been updated appropriately, the NameNode and the JobTracker have to be refreshed using the hadoop dfsadmin -refreshNodes and hadoop mradmin -refreshNodes commands respectively.
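A minimal sketch for the HDFS side, assuming a made-up exclude file at /home/hduser/hadoop/conf/dfs.exclude; the MapReduce side works the same way with mapred.hosts.exclude in mapred-site.xml and hadoop mradmin -refreshNodes:

    <!-- hdfs-site.xml: hosts listed in this file are decommissioned from HDFS -->
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/home/hduser/hadoop/conf/dfs.exclude</value>
    </property>

    # add the slave's hostname to dfs.exclude, then tell the NameNode to re-read it
    hadoop dfsadmin -refreshNodes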
Instead of using the slaves file to start all processes on your cluster, you can start only the required daemons on each machine if you have only a few nodes.
