I am trying to add the capacity scheduler to Hadoop 0.23.0 and to run the sample pi and randomwriter programs.
All the daemons are up and working fine, but the job hangs and no more output is displayed.
I am not able to find where the logs are accumulated. Can anyone please tell me why the job hangs, and where the logs are stored?
2012-06-08 18:41:06,118 INFO mapred.YARNRunner (YARNRunner.java:createApplicationSubmissionContext(355)) - Command to launch container for ApplicationMaster is : $JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=<LOG_DIR> -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1536m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
2012-06-08 18:41:06,251 INFO mapred.ResourceMgrDelegate (ResourceMgrDelegate.java:submitApplication(304)) - Submitted application application_1339151256291_0003 to ResourceManager
2012-06-08 18:41:06,355 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1207)) - Running job: job_1339151256291_0003
2012-06-08 18:41:07,366 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1227)) - map 0% reduce 0%
I followed the instructions on http://www.thecloudavenue.com/search?q=0.23 and successfully ran Hadoop-0.23.4 on a small 3-node cluster.
Check the log files for any errors on the master and the slaves, in the folder
$HADOOP_HOME/logs
Check the NameNode web console:
//master:50070/dfshealth.jsp
Check if the number of DataNodes is correct in
//master:50070/dfsnodelist.jsp?whatNodes=LIVE
Check if the number of NodeManagers is correctly reported in
//master:8088/cluster
The number of NodeManagers should also be listed correctly in
//master:8088/cluster/nodes
Verify that the output folder with the proper contents has been created through the NameNode Web console
//master:50070/nn_browsedfscontent.jsp
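The same checks can also be done from the command line; a quick sketch, assuming the Hadoop binaries are on the master's PATH:
# live and dead DataNodes as reported by the NameNode
hdfs dfsadmin -report
# NodeManagers registered with the ResourceManager (0.23+/2.x CLI)
yarn node -list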
Related
I have followed a tutorial to set up Apache Hadoop for Windows, which can be found here. I am now having an issue where the Datanode, Resource Manager, and Yarn cmd windows all shut down seconds after opening, with only the Namenode continuing to run. Here is the process I have tried so far:
run CMD as admin
use the command start-all.cmd (this opens the Namenode, Datanode, Yarn, and Resourcemanager cmd windows)
the Datanode, Yarn, and Resource Manager all give shutdown messages almost immediately after they start:
SHUTDOWN_MSG: Shutting down ResourceManager at thood-alienware/...
SHUTDOWN_MSG: Shutting down NodeManager at thood-alienware/...
SHUTDOWN_MSG: Shutting down DataNode at thood-alienware/...
Interestingly enough, only the Datanode window gives an error as a reason for shutting down:
2019-03-26 00:07:03,382 INFO util.ExitUtil: Exiting with status 1: org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 0, volumes configured: 1, volumes failed: 1, volume failures tolerated: 0
I know that I can edit the number of tolerated failures, but I'd like to actually fix whatever is causing this disk failure. When I open the datanode directory, it is an empty folder; however, my namenode directory has files present within it that were created by `start-all.cmd`. Has anyone worked with Hadoop on Windows before? I'm totally at a loss for where to go from here, because most online help is for Linux systems.
Did you get the following bin files into your HADOOP directory?
https://github.com/s911415/apache-hadoop-3.1.0-winutils
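On Windows the DataNode also needs the native helpers from that repository (in particular winutils.exe and hadoop.dll) in the bin folder; if hadoop.dll is missing, the volume check can fail with exactly the DiskChecker error above. A quick check, assuming HADOOP_HOME is set:
:: list the bin folder and confirm winutils.exe and hadoop.dll are present
dir "%HADOOP_HOME%\bin"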
When my MapReduce job finished, I could go to the job history URL and see the individual reducer/mapper logs there. However, I have a lot of mappers and reducers, and I need to download them all to my local drive to analyze them. I don't know the location of those log files in HDFS. Do you know where they are?
I presume what you need is a unix command:
yarn logs -applicationId <applicationId>
The application id is revealed during MR application startup, e.g.:
...
15/07/13 10:52:23 INFO input.FileInputFormat: Total input paths to process : 4
15/07/13 10:52:23 INFO mapreduce.JobSubmitter: number of splits:4
15/07/13 10:52:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1424784903733_0762
15/07/13 10:52:24 INFO impl.YarnClientImpl: Submitted application application_1424784903733_0762
...
Or you can look it up on the job history web page.
What the command does is dump all the logs from the MR processing to stdout.
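Since you want the logs on your local drive, you can simply redirect that output to a file, e.g. for the application id from the startup output above:
yarn logs -applicationId application_1424784903733_0762 > application_1424784903733_0762.log
Note that the command reads the aggregated logs, so YARN log aggregation has to be enabled (see the next answer).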
Actually, the user logs are stored only on the local machines where the NodeManager service runs, in the directories that the property yarn.nodemanager.log-dirs points to.
These logs will not be saved to HDFS. If you want to save these logs in HDFS, then you have to enable Log Aggregation in YARN.
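For reference, a minimal sketch of turning log aggregation on in yarn-site.xml (the remote directory shown is just the usual default; adjust as needed):
<!-- copy finished containers' logs from the NodeManagers' local disks into HDFS -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <!-- HDFS directory the aggregated logs are written to (example path) -->
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>
Once this is on, yarn logs -applicationId <applicationId> can read the logs back from HDFS.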
Check the below links for more information
Simplifying user logs
YARN Log Aggregation
Similar questions
Where does Hadoop store the logs of YARN applications?
The logs can be found at localhost:50070, under the Utilities options.
I have built a Hadoop cluster on three machines; these are the characteristics:
OS: Ubuntu 14.04 LTS
Hadoop: 2.6.0
NameNode and ResourceManager IP: namenode/192.168.0.100
DataNodes, also acting as NodeManagers, IP: data1/192.168.0.101, data2/192.168.0.102
I have configured all the XML files as in the official docs. When I execute the wordcount example program in Eclipse, I want to show which machine is running the map task or reduce task, so here is my code snippet.
// get the host this task runs on (requires java.net.InetAddress;
// InetAddress.getLocalHost() may throw UnknownHostException)
InetAddress mLocalHost = InetAddress.getLocalHost();
System.out.println("Task on " + mLocalHost);
The snippet above was put into the map and reduce functions and run on Hadoop. Nevertheless, the console always shows:
Task on namenode/192.168.0.100
From my perspective, these tasks should run on data1 or data2. Can you explain the puzzle? What's wrong with my cluster?
What's more:
the JobHistory server (namenode:19888) records nothing,
and the web app proxy (namenode:8088) just shows 2 active nodes, but no further information about the job.
Can you help me? Really appreciated.
Further info from the namenode is below;
the jps command shows:
12647 Jps
11426 SecondaryNameNode
11217 NameNode
11585 ResourceManager
12033 JobHistoryServer
Where did you put that code? Is it in your Driver class? You need to have it in your mapper or reducer so that you can see which node is processing.
Instead of that, you can have a look at the ResourceManager web UI at rmipaddress:8088, which will give you more details on which node is executing the mappers, along with other logs.
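For reference, a minimal sketch of putting that check into the mapper itself (the word-count key/value types and the class name are only illustrative):
import java.io.IOException;
import java.net.InetAddress;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HostReportingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // setup() runs once per task attempt on the node that executes the task,
    // so the hostname printed here ends up in that task's stdout log
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        InetAddress host = InetAddress.getLocalHost();
        System.out.println("Map task running on " + host);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // normal word-count map logic goes here
    }
}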
I have found what's wrong. "Run on hadoop" in Eclipse just starts the job locally, so I should modify the MyHadoopXML.xml file which is under the Eclipse plugin's sub-directory. Otherwise, I can just develop and debug the MapReduce job locally, export the project into a jar, and then run the jar with the "hadoop jar" command on the cluster to verify whether the job executes successfully.
I'm taking my first steps mastering Hadoop. I've set up CDH4.5 in distributed mode (on two virtual machines). I'm having problems running MapReduce jobs with YARN. I could successfully launch a DistributedShell application (from the CDH examples), but once I run a MapReduce job, it just hangs there forever.
This is what I'm trying to launch:
sudo -uhdfs yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 1 1
These are the resource manager's last log lines:
13/12/10 23:30:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1386714123362_0001
13/12/10 23:30:02 INFO client.YarnClientImpl: Submitted application application_1386714123362_0001 to ResourceManager at master/192.168.122.175:8032
13/12/10 23:30:02 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1386714123362_0001/
13/12/10 23:30:02 INFO mapreduce.Job: Running job: job_1386714123362_0001
The node manager's log doesn't get any new messages once I run the job.
This is what I see on resource manager's web page regarding the job:
State - ACCEPTED
FinalStatus - UNDEFINED
Progress - (progress bar in 0%)
Tracking UI - UNASSIGNED
Apps Submitted - 1
Apps Pending - 1
Apps Running - 0
I found this at http://hadoop.apache.org/docs/r2.0.6-alpha/hadoop-project-dist/hadoop-common/releasenotes.html:
YARN-300. Major bug reported by shenhong and fixed by Sandy Ryza (resourcemanager , scheduler)
After YARN-271, fair scheduler can infinite loop and not schedule any application.
After yarn-271, when yarn.scheduler.fair.max.assign<=0, when a node was been reserved, fairScheduler will infinite loop and not schedule any application.
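If you are on the fair scheduler and cannot upgrade, a workaround sketch is to make sure that setting is positive in yarn-site.xml (the value below is only an example):
<property>
  <!-- max containers the fair scheduler assigns per heartbeat; per the note
       above, a value <= 0 combined with a reserved node can loop forever -->
  <name>yarn.scheduler.fair.max.assign</name>
  <value>1</value>
</property>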
Try with a newer version, i.e. 2.0 or above.
Probably caused by a system resource issue; I fixed it by restarting my system.
I have installed CDH3U5 on a 2-node cluster. Everything seems to run fine, such as all the services, the web UI, MR jobs, and HDFS shell commands. However, interestingly, when I started the datanode service, it gave me an OK message that the datanode is running as, say, process X. But when I run jps, I do not see the label "DataNode" for the process. So the output looks like -
17153 TaskTracker
18908 Jps
16267
The process ID 16267 is the Datanode process. All the other checkpoints have passed, so this seems weird. The same thing happens on the other node in the cluster. Any insight into this behavior, and whether it is something that needs fixing, would be helpful.
Can you check the following and reply?
- the web interface for the namenode and what it shows there for live nodes
- the log files for the datanode, to see if there are any exceptions
- whether the datanode is pingable/reachable via ssh from the namenode and vice versa
If all of the above look OK, I'm not sure what the problem is, but to fix it you can (see the command sketch after this list):
- stop all Hadoop daemons
- delete the temp directory pointed to in conf/core-site.xml on both the NN and the DN
- format the namenode
- start the daemons
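A sketch of that sequence with the stock scripts, assuming they are on the PATH (CDH's packaged service scripts differ, and note that formatting the namenode wipes the existing HDFS metadata):
stop-all.sh                      # stop all Hadoop daemons
rm -rf /path/to/hadoop-tmp       # hypothetical temp dir from hadoop.tmp.dir in core-site.xml, on both NN and DN
hadoop namenode -format          # re-format the namenode (destroys existing HDFS metadata)
start-all.sh                     # start the daemons again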