MapReduce tasks only run on the NameNode - Hadoop

I have built a Hadoop cluster on three machines; these are their characteristics:
OS: Ubuntu 14.04 LTS
Hadoop: 2.6.0
NameNode and ResourceManager IP: namenode/192.168.0.100
DataNodes (also acting as NodeManagers) IPs: data1/192.168.0.101, data2/192.168.0.102
I have configured all the XML files as in the official docs. When I execute the WordCount example program from Eclipse, I want to show which machine is running each map or reduce task, so here is my code snippet:
// get the host this task is running on
InetAddress mLocalHost = InetAddress.getLocalHost();
System.out.println("Task on " + mLocalHost);
The snippet above was placed in both the map and reduce functions, and the job was run on Hadoop. Nevertheless, the console always shows:
Task on namenode/192.168.0.100
From my perspective, these tasks should run on data1 or data2. Can you explain this puzzle? What is wrong with my cluster?
What's more:
the JobHistory server (namenode:19888) records nothing,
and the WebAppProxy (namenode:8088) just shows "active nodes: 2", with no further information about the job.
Can you help me? Really appreciated.
The namenode's further info is below; jps shows:
12647 Jps
11426 SecondaryNameNode
11217 NameNode
11585 ResourceManager
12033 JobHistoryServer
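(As an aside, the hostname lookup itself can be checked outside Hadoop. A minimal stdlib-only sketch — the class name HostReport is made up — showing that an InetAddress prints as hostname/IP, which is why the output reads namenode/192.168.0.100:)

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostReport {
    // Returns the same string the map/reduce snippet prints.
    static String taskLocation() {
        try {
            // An InetAddress prints as "hostname/ip", e.g. "data1/192.168.0.101"
            InetAddress localHost = InetAddress.getLocalHost();
            return "Task on " + localHost;
        } catch (UnknownHostException e) {
            return "Task on unknown host";
        }
    }

    public static void main(String[] args) {
        System.out.println(taskLocation());
    }
}
```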

Where did you put that code? Is it in your driver class? You need to have it in your mapper or reducer so that you can see which node is processing.
Instead of that, you can look at the ResourceManager web UI at rmipaddress:8088, which will give you more details on which node is executing mappers, along with other logs.

I have found what was wrong. "Run on Hadoop" in Eclipse just starts the job locally, so I should modify the MyHadoopXML.xml file under the Eclipse plugin's sub-directory. Alternatively, I can develop and debug the MapReduce job locally, export the project into a jar, and then run the jar with "hadoop jar" on the cluster to verify whether the job executes successfully.
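A likely cluster-side counterpart of that fix, assuming Hadoop 2.6 with YARN: if mapred-site.xml does not select the YARN framework, submitted jobs fall back to the local runner. A sketch:

```xml
<!-- mapred-site.xml: without this, jobs may run in the local runner -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```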

Related

Job Tracker web interface

I followed the tutorial at http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/SingleCluster.html and installed Hadoop 2.4.1 as a pseudo-distributed cluster. I created an Ubuntu VM using OracleVM and installed Hadoop as described in the link. It was set up fine and able to run the examples. However, the job tracker URL is not working: :50030 gives "page not found". I also tried netstat on the server, and there is no process listening on port 50030. Do I need to start any other service? What are the possible reasons?
You need to execute this:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
Or JobTracker won't start.
(In my case, $HADOOP_HOME is in /usr/local/hadoop)
Check the value of mapred.job.tracker.http.address in mapred-site.xml; if the port there is different, use that one.
Also check whether the JobTracker is actually running, and look at the JobTracker logs.
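For reference, a sketch of how that property typically appears in mapred-site.xml on Hadoop 1.x (0.0.0.0:50030 is the default, so the UI listens on port 50030 unless overridden):

```xml
<!-- mapred-site.xml (Hadoop 1.x): JobTracker web UI bind address -->
<property>
  <name>mapred.job.tracker.http.address</name>
  <value>0.0.0.0:50030</value>
</property>
```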
Then open the Job Tracker web UI at:
http://localhost:50030/

Hadoop 2.2.0 Web UI not showing Job Progress

I have installed single-node Hadoop 2.2.0 from this link. When I run a job from the terminal, it works fine and produces output. The web UIs I used:
- Resource Manager: http://localhost:8088
- Namenode daemon: http://localhost:50070
But from the Resource Manager's web UI (above) I can't see job progress, such as submitted jobs, running jobs, etc.
My /etc/hosts file is as follows:
127.0.0.1 localhost
127.0.1.1 meitpict
My system's IP is 192.168.2.96 (I tried removing this IP, but it still didn't work).
The only host:port I mention is in core-site.xml, and that is:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:54310</value>
</property>
I even get these problems while executing a MapReduce job.
The best approach I found: while executing a MapReduce job, the console from which you submitted it prints a link for checking the job's progress, something like:
http://node1:19888/jobhistory/job/job_1409462238335_0002/mapreduce/app
Here node1 is mapped to 192.168.56.101 in my hosts file, and that is the NameNode box.
So while your MapReduce job is running, you can go to the UI link provided by the framework.
And once it is open, do not close it; there you can also find details about other jobs, when they started, when they finished, etc.
So next time, check your PuTTY console output after submitting the MapReduce job; you will definitely see a link for the current job to check its status from the browser UI.
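If that link does not resolve, a sketch of the history server addresses in mapred-site.xml (Hadoop 2.x property names; the host here is an assumption to match the node1 example):

```xml
<!-- mapred-site.xml: where the job history server and its web UI listen -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>node1:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>node1:19888</value>
</property>
```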
In Hadoop 2.x this problem can also be related to memory issues; see "MapReduce in Hadoop 2.2.0 not working".

Jobtracker is not up

I have installed Hadoop on CentOS. I modified the hostname to slave2, and I also modified core-site.xml and the mapred files, but the JobTracker, DataNode, and TaskTracker are not starting. Please advise.
Check your daemon logs. They should point you to where the problem is.
When you start hadoop, you see messages like this:
starting tasktracker, logging to /home/username/hadoop/logs/hadoop-user-tasktracker-user-desktop.out
Open the .log file, it should give you a clue.

Need help adding multiple DataNodes in pseudo-distributed mode (one machine), using Hadoop-0.18.0

I am a student, interested in Hadoop and started to explore it recently.
I tried adding an additional DataNode in the pseudo-distributed mode but failed.
I am following the Yahoo! developer tutorial, so the version of Hadoop I am using is hadoop-0.18.0.
I tried to start it up using two methods I found online:
Method 1 (link)
I have a problem with this line:
bin/hadoop-daemon.sh --script bin/hdfs $1 datanode $DN_CONF_OPTS
--script bin/hdfs doesn't seem to be valid in the version I am using. I changed it to --config $HADOOP_HOME/conf2, with all the configuration files in that directory, but when the script is run it gives the error:
Usage: Java DataNode [-rollback]
Any idea what this error means? The log files are created, but the DataNode did not start.
Method 2 (link)
Basically, I duplicated the conf folder into a conf2 folder, making the necessary changes documented on the website to hadoop-site.xml and hadoop-env.sh. Then I ran the command:
./hadoop-daemon.sh --config ..../conf2 start datanode
It gives the error:
datanode running as process 4190. stop it first.
So I guess this is the first DataNode that was started, and the command failed to start another DataNode.
Is there anything I can do to start an additional DataNode in the Yahoo VM Hadoop environment? Any help/advice would be greatly appreciated.
Hadoop's start/stop scripts use /tmp as the default directory for storing the PIDs of already-started daemons. In your situation, when you start the second datanode, the startup script finds the /tmp/hadoop-someuser-datanode.pid file left by the first datanode and assumes that a datanode daemon is already running.
The plain solution is to set the HADOOP_PID_DIR environment variable to something other than /tmp. Also, do not forget to update all the network port numbers in conf2.
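A sketch of the plain solution as commands (the directory name is arbitrary; only the last, commented line actually touches Hadoop):

```shell
# Give the second datanode its own PID directory, so hadoop-daemon.sh does not
# find the first datanode's /tmp/hadoop-<user>-datanode.pid and refuse to start.
export HADOOP_PID_DIR="$HOME/hadoop-dn2-pids"
mkdir -p "$HADOOP_PID_DIR"
# Then start the second datanode with its own config directory:
# ./hadoop-daemon.sh --config "$HADOOP_HOME/conf2" start datanode
```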
The smarter solution is to start a second VM with a Hadoop environment and join the two machines into a single cluster; that is how Hadoop is intended to be used.

JobTracker doesn't show completed tasks

I am running hadoop-1.1.2 on my laptop in pseudo-distributed mode. I am able to run a simple WordCount program, reading from and writing back to HDFS. I am also able to see JobTracker running at http://localhost:50030/jobtracker.jsp. However when I run the WordCount job from Eclipse, there is no entry, either under running or completed jobs.
Am I missing an additional property setting in one of the configuration files?
Thanks.
This is happening because when you run your job through Eclipse, it runs the job inside Eclipse itself rather than submitting it to the JobTracker, since it does not know where to find the JobTracker. You need to tell Eclipse. Add the following lines to your code and it should work:
// point the job at the local cluster instead of the in-process runner
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://localhost:9000"); // NameNode
conf.set("mapred.job.tracker", "localhost:9001");     // JobTracker
Alternatively, copy your hdfs-site.xml and mapred-site.xml into your 'src' folder (making them available on the classpath); the job will then pick up all the configuration from them.
