Where are the mapper and reducer runtime logs in HDFS? - hadoop

When my MapReduce job finishes, I can go to the job history URL and see the individual reducer/mapper logs there. However, I have a lot of mappers and reducers, and I need to download them all to my local drive to analyze them. I don't know the location of those log files in HDFS. Do you know where it is?

I presume what you need is a Unix command:
yarn logs -applicationId <applicationId>
The application ID is printed during MR application startup, e.g.
...
15/07/13 10:52:23 INFO input.FileInputFormat: Total input paths to process : 4
15/07/13 10:52:23 INFO mapreduce.JobSubmitter: number of splits:4
15/07/13 10:52:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1424784903733_0762
15/07/13 10:52:24 INFO impl.YarnClientImpl: Submitted application application_1424784903733_0762
...
or you can look it up on the job history web page.
What the command does is dump all the logs from the MR run to stdout.
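Since the goal is offline analysis, here is a minimal sketch of pulling one application's logs down to the local drive (requires log aggregation to be enabled; the ID below is the example from the startup output above, substitute your own):
# Dump all container logs for the application to a local file
yarn logs -applicationId application_1424784903733_0762 > application_1424784903733_0762.log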

Actually, the user logs are stored only on the local machine where the NodeManager service runs, in the directories that the property yarn.nodemanager.log-dirs points to.
These logs are not saved to HDFS by default. If you want to save them in HDFS, then you have to enable log aggregation in YARN (see the sketch after the links below).
Check the below links for more information
Simplifying user logs
YARN Log Aggregation
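As a rough illustration of enabling it, a minimal yarn-site.xml sketch; the remote directory value shown is the common default, not something taken from this question:
<!-- yarn-site.xml: copy finished containers' logs into HDFS -->
<configuration>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <!-- HDFS directory the aggregated logs are written under -->
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
  </property>
</configuration>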
Similar questions
Where does Hadoop store the logs of YARN applications?

The logs can be found at localhost:50070, under the Utilities menu options.
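If log aggregation is enabled, the aggregated files can also be listed straight from HDFS; this sketch assumes the default /tmp/logs layout, with <user> and <applicationId> as placeholders:
# Aggregated logs live under <remote-app-log-dir>/<user>/logs/<applicationId>
hdfs dfs -ls /tmp/logs/<user>/logs/<applicationId>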

Related

Application (job) list empty on Hadoop 2.x

I have a Hadoop 2.8.1 installation on macOS Sierra (Darwin kernel version 16.7.0) and it's working fine, except for the application/task tracking.
1) At first, I thought it was a problem with the Resource Manager web interface. So:
I've copied the yarn-site.xml template to the etc/yarn-site.xml file, but it didn't help.
I've tried to change the default 'dr. who' user to my Hadoop user on the Resource Manager (http://localhost:18088/cluster/apps/RUNNING?user.name=myUser), but it didn't help either.
2) I can't track my applications (jobs) on the command line either: yarn application -list always returns an empty list.
3) One more piece of information: the application's INFO output shows the following lines, but I can't access the URL.
INFO mapreduce.Job: The url to track the job: http://localhost:8080/
INFO mapreduce.Job: Running job: job_local2009332672_0001
Is it a YARN problem? Should I change another settings file? Thanks!
Look at mapreduce.framework.name in mapred-site.xml, in your HADOOP_CONF_DIR.
Set its value to yarn.
If you don't have a mapred-site.xml, then copy and rename the mapred-default XML file (a sketch of the result is just below).
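The job_local2009332672_0001 ID in the output above is the tell-tale sign of the local runner. A minimal sketch of the fix in mapred-site.xml:
<!-- mapred-site.xml: submit MapReduce jobs to YARN, not the local runner -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>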
Thanks for the answer, I had been looking for this without success. I changed etc/hosts for nothing.
The answer is to set mapreduce.framework.name in mapred-site.xml to yarn, as stated by cricket_007.
This sets yarn as the default framework for MapReduce operations.

How to print logs directly onto the console in yarn-cluster mode using Spark

I am new to Spark and I want to print logs on the console using Apache Spark in yarn-cluster mode.
You need to check the value in the log4j.properties file. In my case I have this file in the /etc/spark/conf.dist directory:
log4j.rootCategory=INFO,console
INFO prints all the logs to the console. You can change the value to ERROR or WARN to limit the information you see on the console, as Spark's logs can be overwhelming (a fuller sketch of the file is below).
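For context, a minimal sketch of the relevant lines in that file; the console appender settings mirror Spark's stock log4j.properties template and are assumptions, not quoted from this answer:
# Send only WARN and above to the console (stderr)
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n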

MapReduce tasks only run on the namenode

I have built a Hadoop cluster on three machines; these are the characteristics:
OS: Ubuntu 14.04 LTS
Hadoop: 2.6.0
NameNode and ResourceManager IP: namenode/192.168.0.100
DataNodes, also acting as NodeManagers, IPs: data1/192.168.0.101, data2/192.168.0.102
I have configured all the XML files as in the official docs. When I execute the wordcount example program from Eclipse, I want to show which machine is running the map or reduce task, so here is my code snippet.
// Get the host this task is running on; InetAddress.getLocalHost()
// returns the local machine's name/address (import java.net.InetAddress)
InetAddress mLocalHost = InetAddress.getLocalHost();
System.out.println("Task on " + mLocalHost);
The snippet above was put into the map and reduce functions and run on Hadoop. Nevertheless, the console always shows:
Task on namenode/192.168.0.100
From my perspective, these tasks should run on data1 or data2. Can you explain the puzzle? What's wrong with my cluster?
What's more,
the jobHistory (namenode:19888) records nothing,
and the webAppProxy (namenode:8088) just shows "active nodes: 2", but no more information about the job.
Can you help me? Really appreciated.
The namenode's further info is below;
the jps command shows:
12647 Jps
11426 SecondaryNameNode
11217 NameNode
11585 ResourceManager
12033 JobHistoryServer
Where did you put that code? Is it in your driver class? You need to have it in your mapper or reducer so that you can see which node is processing.
Instead of that, you can have a look at the Resource Manager web UI at rmipaddress:8088, which will give you more details on which node is executing the mappers, plus other logs.
I have found what's wrong. "Run on Hadoop" in Eclipse just starts the job locally, so I should modify the MyHadoopXML.xml file under the Eclipse plugin's sub-directory. Otherwise, I just develop and debug the MapReduce job locally, export the project into a jar, and then run the jar with the "hadoop jar" command on the cluster to verify whether the job executes successfully, as sketched below.
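A minimal sketch of that submission workflow; the jar name, main class, and paths are placeholders, not from the question:
# Run the exported jar on the cluster instead of Eclipse's local runner
hadoop jar wordcount.jar com.example.WordCount /input /output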

Can't run a MapReduce job with YARN

I'm taking my first steps mastering Hadoop. I've set up CDH 4.5 in distributed mode (on two virtual machines). I'm having problems running MapReduce jobs with YARN. I could successfully launch a DistributedShell application (from the CDH examples), but once I run a MapReduce job, it just hangs there forever.
This is what I'm trying to launch:
sudo -uhdfs yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 1 1
These are the resource manager's last log lines:
13/12/10 23:30:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1386714123362_0001
13/12/10 23:30:02 INFO client.YarnClientImpl: Submitted application application_1386714123362_0001 to ResourceManager at master/192.168.122.175:8032
13/12/10 23:30:02 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1386714123362_0001/
13/12/10 23:30:02 INFO mapreduce.Job: Running job: job_1386714123362_0001
The node manager's log doesn't get any new messages once I run the job.
This is what I see on resource manager's web page regarding the job:
State - ACCEPTED
FinalStatus - UNDEFINED
Progress - (progress bar in 0%)
Tracking UI - UNASSIGNED
Apps Submitted - 1
Apps Pending - 1
Apps Running - 0
I found this at http://hadoop.apache.org/docs/r2.0.6-alpha/hadoop-project-dist/hadoop-common/releasenotes.html:
YARN-300. Major bug reported by shenhong and fixed by Sandy Ryza (resourcemanager , scheduler)
After YARN-271, fair scheduler can infinite loop and not schedule any application.
After yarn-271, when yarn.scheduler.fair.max.assign<=0, when a node was been reserved, fairScheduler will infinite loop and not schedule any application.
Try with a newer version, i.e. 2.0 or above.
Probably caused by a system resource issue; I fixed it by restarting my system.
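When a job sticks in the ACCEPTED state like this, one quick sanity check (a suggestion, not from the answers above) is whether any NodeManagers have registered and report free resources:
# List every NodeManager with its state and resource usage; an empty or
# unhealthy list explains why no container is ever assigned
yarn node -list -all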

Job hanging when example is run on Hadoop 0.23.0

I am trying to add the capacity scheduler to Hadoop 0.23.0 and trying to run the sample pi and randomwriter programs.
All the daemons are up and working fine, but the job hangs and no more output is displayed.
I wasn't able to find where the logs are accumulated. Can anyone please tell me the reason the job hangs, and the location where the logs are stored?
2012-06-08 18:41:06,118 INFO mapred.YARNRunner (YARNRunner.java:createApplicationSubmissionContext(355)) - Command to launch container for ApplicationMaster is : $JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=<LOG_DIR> -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1536m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
2012-06-08 18:41:06,251 INFO mapred.ResourceMgrDelegate (ResourceMgrDelegate.java:submitApplication(304)) - Submitted application application_1339151256291_0003 to ResourceManager
2012-06-08 18:41:06,355 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1207)) - Running job: job_1339151256291_0003
2012-06-08 18:41:07,366 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1227)) - map 0% reduce 0%
I followed the instructions at http://www.thecloudavenue.com/search?q=0.23 and successfully ran Hadoop 0.23.4 on a small 3-node cluster.
Check the log files in the
$HADOOP_HOME/logs
folder for any errors on the master and the slaves (see the sketch after this checklist).
Check the following console:
http://master:50070/dfshealth.jsp
Check if the number of DataNodes is correct at
http://master:50070/dfsnodelist.jsp?whatNodes=LIVE
Check if the number of NodeManagers is correctly reported at
http://master:8088/cluster
Also, the number of NodeManagers should be correctly listed at
http://master:8088/cluster/nodes
Verify that the output folder with the proper contents has been created through the NameNode web console at
http://master:50070/nn_browsedfscontent.jsp
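A minimal sketch of the first checklist item; the *.log filename pattern is an assumption about the default log layout:
# Scan the daemon logs on the master and each slave for errors
grep -iE "error|exception" $HADOOP_HOME/logs/*.log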
