Hadoop datanode services are not starting on the slaves - hadoop

I am trying to configure a hadoop-1.0.3 multi-node cluster with one master and two slaves on my laptop using VMware Workstation.
When I run start-all.sh from the master, all daemon processes start on the master node (namenode, datanode, tasktracker, jobtracker, secondarynamenode), but the DataNode and TaskTracker do not start on the slave nodes. Passwordless SSH is enabled, and I can ssh to both the master and the slaves from my master node without a password.
Please help me resolve this.

Stop the cluster.
If you have specifically defined a tmp directory location in core-site.xml, remove all files under that directory.
If you have specifically defined the datanode and namenode directories in hdfs-site.xml, delete all files under those directories.
If you have not defined anything in core-site.xml or hdfs-site.xml, remove all files under /tmp/hadoop-<name of your hadoop user>.
Format the namenode.
It should work! A rough command sequence is sketched below.
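For example, on a Hadoop 1.x install that uses the default /tmp locations, the sequence would look roughly like this (the user name hduser and the paths are placeholders; adjust them to your own settings, and keep in mind that reformatting wipes any existing HDFS data):

stop-all.sh
# on every node: clear the old datanode/namenode state (use your own hadoop.tmp.dir / dfs.data.dir paths)
rm -rf /tmp/hadoop-hduser/*
# on the master only: reformat the HDFS metadata
hadoop namenode -format
start-all.sh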

Related

Problem setting up a multi-node Hadoop cluster on Windows

I am trying to set up a multi-node Hadoop cluster between two Windows devices, using Hadoop 3.3.1. How can I achieve that, please?
(note: I was able to set this up successfully as a single node)
Of course, I did these steps:
My network will be as follows:
master IP: 192.168.81.144
slave IP: 192.168.81.145
1) add the lines below to C:\Windows\System32\drivers\etc\hosts on both the master node and the slave node
192.168.81.144 hadoopMaster
192.168.81.145 hadoopSlave
in the master, pinging the slave IP worked
in the slave, pinging the master IP worked
2) add the lines below to the workers file
hadoopMaster
hadoopSlave
3) format the HDFS file system
hdfs namenode -format
4) run the script below on the master node
start-dfs
but the slave node is not working
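A quick way to narrow this down (a sketch, assuming a default Hadoop 3.x layout on the slave) is to check whether a DataNode JVM is actually running on the worker, and to try launching it by hand so the error surfaces directly:

jps
hdfs --daemon start datanode

If the daemon still dies, the newest hadoop-*-datanode-*.log file under %HADOOP_HOME%\logs usually names the misconfigured property or the address it failed to bind.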

Should we change the masters and slaves config files for a Hadoop cluster on all nodes?

I know that we should put the IP address of the master node in the conf/masters file and the IP addresses of all slave nodes in the conf/slaves file, one per line. My question is: should we do this only on the master node, or should we also change these two files on all slave nodes? Furthermore, if I want the master node to act as a DataNode and TaskTracker as well, should I put the master's IP address in the slaves file too?
The conf/slaves and conf/masters configuration files need to be maintained only on the master node, not on the slave nodes. The conf/masters file is used for specifying the SecondaryNameNode host. start-all.sh consists of start-mapred.sh and start-dfs.sh. If you want to start the JobTracker on a node, then run the start-mapred.sh script on that node; based on its conf/slaves file, TaskTracker services will be started on the listed nodes.
Similarly, the start-dfs.sh script should be executed on the NameNode machine; based on the values in conf/masters and conf/slaves, the SecondaryNameNode and DataNodes will be started on the corresponding nodes.
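As a concrete illustration (hostnames are made up), the two files on the master of a three-node Hadoop 1.x cluster, where the master should also run a DataNode and TaskTracker, might look like this:

# conf/masters -- host that runs the SecondaryNameNode
master
# conf/slaves -- hosts that run a DataNode and TaskTracker; list the master here too if it should run them
master
slave1
slave2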

hadoop: different datanode configuration in a shared directory

I am trying to run Hadoop on clustered server machines.
The problem is that the server machines use shared directories, but the file directories are not physically on one disk. So I guess that if I configure a different datanode directory on each machine (slave), I can run Hadoop without a disk/storage bottleneck.
How do I configure the datanode differently on each slave, or
how do I configure the master node to find a Hadoop installation that lives in a different directory on each slave node when starting the namenode and datanodes with "start-dfs.sh"?
Or is there some better way to handle this environment?
Thanks!
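One detail that helps here: every datanode reads its own local copy of hdfs-site.xml, so the storage path can simply be set differently on each slave. A sketch of the relevant property (dfs.data.dir is the Hadoop 1.x name, dfs.datanode.data.dir on 2.x and later; the path is a placeholder for that machine's local disk):

<!-- hdfs-site.xml on an individual slave -->
<property>
  <name>dfs.data.dir</name>
  <value>/local/disk1/hdfs/data</value>
</property>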

DataNode doesn't start in one of the slaves

I am trying to configure Hadoop with 5 slaves. After I run start-dfs.sh on the master, there is one slave node on which the DataNode doesn't run. I tried looking for some difference in the configuration files on that node, but I didn't find anything.
There WAS a difference in the configuration files! In core-site.xml the hadoop.tmp.dir variable was set to an invalid directory, so it couldn't be created when the DataNode was started. Lesson learned: look in the logs (thanks Chris).
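For reference, the property in question lives in core-site.xml and looks like this (the path is only an example; it must be creatable and writable by the hadoop user on every node):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/lib/hadoop/tmp</value>
</property>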

Hadoop master cannot start slave with different $HADOOP_HOME

On the master, $HADOOP_HOME is /home/a/hadoop; on the slave, $HADOOP_HOME is /home/b/hadoop.
On the master, when I run start-all.sh, the master's namenode starts successfully, but it fails to start the slave's datanode with the following message:
b#192.068.0.2: bash: line 0: cd: /home/b/hadoop/libexec/..: No such file or directory
b#192.068.0.2: bash: /home/b/hadoop/bin/hadoop-daemon.sh: No such file or directory
Any idea how to specify the slave's $HADOOP_HOME in the master's configuration?
I don't know of a way to configure different home directories for the various slaves from the master, but the Hadoop FAQ says that the Hadoop framework does not require ssh and that the DataNode and TaskTracker daemons can be started manually on each node.
I would suggest writing your own scripts to start things, taking the specific environment of each node into account. However, make sure to include all the slaves in the master's slaves file. This seems to be necessary; the heartbeats alone are not enough for the master to add slaves.
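A minimal sketch of such a per-node start, run on the slave itself (the path matches the slave's $HADOOP_HOME from the question; hadoop-daemon.sh is the stock Hadoop 1.x single-daemon launcher):

# run on the slave, using its own installation
/home/b/hadoop/bin/hadoop-daemon.sh start datanode
/home/b/hadoop/bin/hadoop-daemon.sh start tasktracker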
