Hadoop Multi-Node Cluster Installation on Ubuntu Issue

I have three Ubuntu 12.04 LTS computers that I want to install Hadoop on in a Master/Slave configuration as described here. It says to first install Hadoop as a single node and then proceed to multi-node. The single node installation works perfectly fine. I made the required changes to the /etc/hosts file and configured everything just as the guide says, but when I start the Hadoop cluster on the master, I get an error.
My computers are aptly named ironman, superman and batman, with batman (who else?) being the master node. When I do sudo bin/start-dfs.sh, the following shows up.
When I enter the password, I get this:
When I try sudo bin/start-all.sh, I get this:
I can ssh into the different machines, but something is not quite right. I checked the logs on the superman/slave machine and it says that it can't connect to batman:54310, along with some 'zzz' message. I figured my /etc/hosts was wrong, but in fact, it is:
I tried to open port 54310 by changing iptables, but the output screens shown here are after I made the changes. I'm at my wit's end. Please tell me where I'm going wrong. Please do let me know if you need any more information and I will update the question accordingly. Thanks!
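(For reference, the iptables change I mean is roughly of the following form; this is an illustrative sketch rather than a dump of my exact rules:)
sudo iptables -A INPUT -p tcp --dport 54310 -j ACCEPT   # NameNode RPC port
sudo iptables -A INPUT -p tcp --dport 54311 -j ACCEPT   # JobTracker port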
UPDATE: Here are my conf files.
core-site.xml
Please note that I originally had batman:54310 instead of the IP address; I only changed it to the IP because I thought that would make the binding more explicit.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://130.65.153.195:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>130.65.153.195:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
My conf/masters file is simply batman and my conf/slaves file is just:
batman
superman
ironman
Hope this clarifies things.

First things first: make sure you can ping the master from the slaves and the slaves from the master. Log in to each machine individually and ping the other two hosts, and make sure they are reachable via their hostnames. It is possible that you have not added the /etc/hosts entries on the slaves.
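To make that concrete, a rough sketch (the IP addresses below are placeholders, not taken from the question) is to give every node the same three entries in /etc/hosts and then ping by name:
192.168.0.1    batman
192.168.0.2    superman
192.168.0.3    ironman
ping -c 3 batman      # run this from superman and ironman
ping -c 3 superman    # run this from batman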
Secondly, you need to set up passwordless SSH access. You can use ssh-keygen -t rsa and ssh-copy-id for this; it will remove the password prompts. It is a good idea to create a separate user for this (and not use root).
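A minimal sketch of that, assuming a dedicated hduser account already exists on all three machines (the user name is only an example):
# run as hduser on the master (batman)
ssh-keygen -t rsa -P ""            # key pair with an empty passphrase
ssh-copy-id hduser@batman          # the master also needs to ssh into itself
ssh-copy-id hduser@superman
ssh-copy-id hduser@ironman
ssh superman                       # should now log in without a password prompt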
If this doesn't help, please post your log output.

Related

How to access my Namenode GUI in hadoop outside the GCP instance in browser

I just set up a single-node Hadoop installation on a GCP instance. Running the jps command shows that all the processes are running fine.
I want to access the GUI of my namenode. I am using http://localhost:50070/ in my laptop browser.
It shows "This site can’t be reached".
core-site.xml
hduser@laptop:~$ vi /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description></description>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>
</description>
</property>
</configuration>
Solution attempted:
I have tried replacing the values in the <value> tags with the public DNS of the GCP instance, but then the namenode stopped working.
Does anyone have any idea what I am doing wrong?
I found the answer to this problem:
You need to use your public IP and the port number in the browser.
Check your firewall settings: the inbound rules must allow the traffic (inbound rules in AWS, firewall settings in GCP).
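For example, on GCP something along these lines opens the NameNode web UI port (the rule name is made up, and restricting --source-ranges to your own IP is safer than 0.0.0.0/0):
gcloud compute firewall-rules create allow-namenode-ui --allow=tcp:50070 --source-ranges=0.0.0.0/0
# then browse to http://<external IP of the instance>:50070/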

“WARN hdfs.DFSUtil: Namenode for null remains unresolved for ID null.”

I want to test whether my Hadoop installation works after configuration, but when I run start-all.sh, the terminal shows the error below:
WARN hdfs.DFSUtil: Namenode for null remains unresolved for ID null.
Check your hdfs-site.xml file to ensure namenodes are configured
properly.
Starting namenodes on [master]
master: ssh: Could not resolve hostname master: Name or service not known
I checked my hdfs-site.xml file and configured it as others have suggested, like this:
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/lidekanfa/tools/hadoop-2.7.7/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/lidekanfa/tools/hadoop-2.7.7/hdfs/data</value>
</property>
</configuration>
It still doesn't work. I then checked my hosts file: the IP and hostname are listed there, and I can also log in to the slave without a password. What is the problem?
Thanks a lot!
I have got the answer. There are two points.
First, my master's hostname is lidekanfa, not master, but in hdfs-site.xml and the other configuration files, where I should have used the master's real name (lidekanfa), I wrote master instead. That is why it warns "Namenode for null remains unresolved for ID null".
Second, there was another hidden problem for me. The beginner installation tutorials use the same user name (such as root) on every machine, but I didn't notice that. So after I fixed the problem above, I was prompted for a password, but the user names and keys didn't match, so Hadoop still didn't work. To solve this, I regenerated the keys and started Hadoop as root; you may also need to edit sshd_config to allow root login. Alternatively, you can solve it by using the same user name on all machines.
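A minimal sketch of the two fixes, using the hostname from my setup (adapt the value, the slave name, and the paths to yours):
<!-- hdfs-site.xml: refer to the master by its real hostname -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>lidekanfa:50090</value>
</property>
# /etc/ssh/sshd_config, if you start Hadoop as root
PermitRootLogin yes
# regenerate and distribute the keys as root (slave1 is a placeholder)
ssh-keygen -t rsa
ssh-copy-id root@slave1
service ssh restart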
I too had the same problem. The problem was with my core-site.xml; after correcting the localhost entry, it worked fine and the namenode was able to connect to localhost.
In my case
error core-site.xml : <value>hdfs://localhosts:9000</value>
corrected core-site.xml : <value>hdfs://localhost:9000</value>

Raspberry Pi Hadoop Cluster Configuration

I've recently been trying to build and configure an (8-Pi) Raspberry Pi 3 Hadoop cluster (as a personal project over the summer). Please bear with me (unfortunately I am a little new to Hadoop). I am using Hadoop version 2.9.2. I think it's important to note that right now I am trying to get just one Namenode and one Datanode completely functional with one another, before moving ahead and replicating the same procedure on the remaining seven Pis.
The issue: my Namenode (alias: master) is the only node that is displayed as a 'Live Datanode', both in the dfs-health interface and in the output of:
dfsadmin -report
This is despite the Datanode being displayed as an 'Active Node' (within the Nodes section of the cluster Hadoop UI) and 'master' not being listed in the slaves file. The configuration I am aiming for is that the Namenode should not perform any Datanode operations. Additionally, I am trying to configure the cluster in such a way that the command above displays my Datanode (alias: slave-01) as a 'Live Datanode'.
I suspect that my issue is caused by the fact that both my Namenode and Datanode use the same hostname (raspberrypi), but I am unsure of the configuration changes required to correct the issue. After looking into the documentation, I unfortunately couldn't find a conclusive answer as to whether this is allowed or not.
If someone could please help me solve this issue it would be extremely appreciated! I have provided the relevant file information below (which I thought may be useful for solving the issue). Thank you :)
PS: All files are identical within the Namenode and Datanode unless otherwise specified.
===========================================================================
Update 1
I have removed localhost from the slaves file on both the Namenode and Datanode, and changed their respective hostnames to 'master' and 'slave-01' as well.
After running jps, I noticed that all of the correct processes are running on the master node; however, I am getting an error on the Datanode, for which the log shows:
ExitCodeException exitCode=1: chmod: changing permissions of '/opt/hadoop_tmp/hdfs/datanode': Operation not permitted.
Unfortunately, the issue persists despite changing permissions using 'chmod 777'. If someone could please help me solve this it would be extremely appreciated! Thanks in advance :)
===========================================================================
Hosts File
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
127.0.1.1 raspberrypi
192.168.1.2 master
192.168.1.3 slave-01
Master File
master
Slaves File
localhost
slave-01
Core-Site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000/</value>
</property>
<property>
<name>fs.default.FS</name>
<value>hdfs://master:9000/</value>
</property>
</configuration>
HDFS-Site.xml
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/hadoop_tmp/hdfs/datanode</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/hadoop_tmp/hdfs/namenode</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>master:50070</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Mapred-Site.xml
<configuration>
<property>
<name>mapreduce.job.tracker</name>
<value>master:5431</value>
</property>
<property>
<name>mapred.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Yarn-Site.xml
<configuration>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8035</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8050</value>
</property>
</configuration>
You could let your local router serve up the host names rather than manipulating /etc/hosts yourself, but to change each Pi's name, edit /etc/hostname and reboot.
Before and after the reboot, check by running hostname -f.
Note: "master" is really meaningless once you have a "YARN master", an "HDFS master", a "Hive master", etc. It is best to literally say namenode, data{1,2,3}, yarn-rm, and so on.
Regarding the permissions issues, you could run everything as root, but that's insecure outside a homelab, so you'd want to run a few adduser commands, at least for hduser (as documented elsewhere, but it can be anything else) and yarn, and then run commands as those users, after chown -R'ing the data and log directories so that they are owned by those users and the Unix groups they belong to.
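A minimal sketch of that, assuming the /opt/hadoop_tmp layout from the question and a shared hadoop group (the group name and the install/log paths are assumptions):
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo adduser --ingroup hadoop yarn
sudo chown -R hduser:hadoop /opt/hadoop_tmp/hdfs    # HDFS name and data directories
sudo chown -R yarn:hadoop /opt/hadoop/logs          # log directory path is an example
sudo -u hduser /opt/hadoop/sbin/start-dfs.sh        # start HDFS as the new user (install path is an example)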

How to setup hadoop without changing `/etc/hosts`?

In order to test the network performance in our cluster, I have to deploy Hadoop on the nodes. In all the setup guides I can find, there is a step that changes the /etc/hosts file. The problem is that the network I'm testing is not the frequently used one, so if I edit this file directly, it may cause the existing programs to fail.
I've tried to use IP addresses instead of host names in the Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml). For example, in core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://10.1.0.50:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadooptmp</value>
</property>
</configuration>
But this does not work without changing /etc/hosts.
Is there any way to specify a host file only for hadoop?

Failed to get system directory - hadoop

I am using a Hadoop multi-node setup (1 master, 1 slave).
After starting start-mapred.sh on the master, I found the error below in the TaskTracker (TT) logs on the slave:
org.apache.hadoop.mapred.TaskTracker: Failed to get system directory
Can someone help me understand what can be done to avoid this error?
I am using
Hadoop 1.2.0
jetty-6.1.26
java version "1.6.0_23"
mapred-site.xml file
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
<property>
<name>mapred.map.tasks</name>
<value>1</value>
<description>
define mapred.map tasks to be number of slave hosts
</description>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>1</value>
<description>
define mapred.reduce tasks to be number of slave hosts
</description>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/workspace</value>
</property>
</configuration>
It seems that you just added hadoop.tmp.dir and started the job. You need to restart the Hadoop daemons after adding any property to the configuration files. You specified in your comment that you added this property at a later stage, which means that all the data and metadata, along with other temporary files, are still in the /tmp directory. Copy all of that from there into your /home/hduser/workspace directory, restart Hadoop, and re-run the job.
Do let me know the result. Thank you.
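A rough sketch of those steps, assuming hadoop.tmp.dir previously fell back to the Hadoop 1.x default of /tmp/hadoop-<username> (verify the exact directory name on your machine):
bin/stop-all.sh                                      # stop all Hadoop 1.x daemons
cp -r /tmp/hadoop-hduser/* /home/hduser/workspace/   # carry over the old data and metadata
bin/start-all.sh                                     # restart with the new hadoop.tmp.dir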
If it is a Windows PC and you are using Cygwin to run Hadoop, then the TaskTracker will not work.
