I'm taking my first steps with Hadoop. I've set up CDH4.5 in distributed mode (on two virtual machines). I'm having trouble running MapReduce jobs with YARN. I can successfully launch a DistributedShell application (from the CDH examples), but as soon as I run a MapReduce job, it just hangs forever.
This is what I'm trying to launch:
sudo -uhdfs yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 1 1
These are the last lines from the resource manager's log:
13/12/10 23:30:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1386714123362_0001
13/12/10 23:30:02 INFO client.YarnClientImpl: Submitted application application_1386714123362_0001 to ResourceManager at master/192.168.122.175:8032
13/12/10 23:30:02 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1386714123362_0001/
13/12/10 23:30:02 INFO mapreduce.Job: Running job: job_1386714123362_0001
The node manager's log doesn't get any new messages once I run the job.
This is what I see on the resource manager's web page for the job:
State - ACCEPTED
FinalStatus - UNDEFINED
Progress - (progress bar at 0%)
Tracking UI - UNASSIGNED
Apps Submitted - 1
Apps Pending - 1
Apps Running - 0
I found this at http://hadoop.apache.org/docs/r2.0.6-alpha/hadoop-project-dist/hadoop-common/releasenotes.html:
YARN-300. Major bug reported by shenhong and fixed by Sandy Ryza (resourcemanager , scheduler)
After YARN-271, the fair scheduler can loop infinitely and not schedule any application.
After YARN-271, when yarn.scheduler.fair.max.assign <= 0 and a node has been reserved, the FairScheduler will loop infinitely and not schedule any application.
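If you are on an affected version, it may be worth making sure yarn.scheduler.fair.max.assign is not set to a non-positive value, since the note says that is what triggers the loop. A quick check, assuming the usual CDH configuration path (adjust to your install):
grep -A1 'yarn.scheduler.fair.max.assign' /etc/hadoop/conf/yarn-site.xml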
Try with a newer version, i.e. 2.0 or above.
This was probably caused by a system resource issue; I fixed it by restarting my system.
Related
I have a Hadoop cluster running version 2.7.4. For some reason, I have to restart my cluster, and I need the job IDs of the jobs that were executed on the cluster before the reboot. The command mapred job -list only provides details of currently running or waiting jobs.
You can see a list of all jobs in the YARN ResourceManager web UI.
In your browser, go to http://ResourceManagerIPAddress:8088/
This is how the history looks on the YARN cluster I am currently testing on (and I have restarted the services several times).
See more info here
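As an alternative to the web UI, you can also list past applications from the command line, assuming the ResourceManager (or the history server behind it) still has the records after the restart:
yarn application -list -appStates ALL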
The job I've submitted to the Spark cluster is not finishing. It stays pending forever, even though the logs say that the Spark Jetty connector has already been shut down:
17/05/23 11:53:39 INFO org.spark_project.jetty.server.ServerConnector: Stopped ServerConnector@4f67e3df{HTTP/1.1}{0.0.0.0:4041}
I'm running the latest Cloud Dataproc v1.1 (Spark 2.0.2) on YARN. I submit the Spark job via the gcloud API:
gcloud dataproc jobs submit spark --project stage --cluster datasys-stg \
--async --jar hdfs:///apps/jdbc-job/jdbc-job.jar --labels name=jdbc-job -- --dbType=test
Submitting the standard Spark Pi example the same way finishes correctly:
gcloud dataproc jobs submit spark --project stage --cluster datasys-stg --async \
--class org.apache.spark.examples.SparkPi --jars file:///usr/lib/spark/examples/jars/spark-examples.jar -- 100
When I visit the Hadoop application manager interface, I see it finished with a successful result:
The Google Cloud console job list shows it as still running until it is killed (the job ran for 20 hours before being killed, while Hadoop says it ran for 19 seconds):
Is there something I can monitor to see what is preventing gcloud from finishing the job?
I couldn't find anything to monitor that would show why my application was not finishing, but I did find the actual problem and fixed it. It turns out I had abandoned threads in my application: I had a connection to RabbitMQ, and it seemed to create threads that prevented the application from finally being stopped by gcloud.
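If you hit something similar and want to see what is keeping the driver JVM alive, one rough check (assuming you can reach the node running the driver and know its PID, shown here as a placeholder) is to dump the threads and filter out the daemon ones:
jstack <driver-pid> | grep '^"' | grep -v daemon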
When my MapReduce job finishes, I can go to the job history URL and see the individual mapper/reducer logs there. However, I have a lot of mappers and reducers, and I need to download them all to my local drive to analyze them. I don't know the location of those log files in HDFS. Do you know where they are?
I presume what you need is this command:
yarn logs -applicationId <applicationId>
The application ID is printed during MR application startup, e.g.:
...
15/07/13 10:52:23 INFO input.FileInputFormat: Total input paths to process : 4
15/07/13 10:52:23 INFO mapreduce.JobSubmitter: number of splits:4
15/07/13 10:52:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1424784903733_0762
15/07/13 10:52:24 INFO impl.YarnClientImpl: Submitted application application_1424784903733_0762
...
Or you can look it up on the job history web page.
What the command does is dump all the logs from the MR processing to stdout.
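Since it writes to stdout, you can redirect the output to a local file if you want to keep and analyze it, e.g. with the application ID from the startup output above:
yarn logs -applicationId application_1424784903733_0762 > application_1424784903733_0762.log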
Actually, the user logs are stored only on the local machines where the NodeManager service runs, in the directories that the property yarn.nodemanager.log-dirs points to.
These logs are not saved to HDFS. If you want to save them in HDFS, you have to enable log aggregation in YARN.
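Once log aggregation is enabled (yarn.log-aggregation-enable set to true in yarn-site.xml and the NodeManagers restarted), the container logs are copied into HDFS under the directory named by yarn.nodemanager.remote-app-log-dir, typically /tmp/logs unless overridden. A quick check, assuming that default:
hadoop fs -ls /tmp/logs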
Check the links below for more information:
Simplifying user logs
YARN Log Aggregation
Similar questions
Where does Hadoop store the logs of YARN applications?
The logs can be found at localhost:50070, under the Utilities menu.
I have written a Hadoop 1.0.4 application that runs fine locally in pseudo-distributed mode. I have also installed Cloudera Hadoop 4 on my cluster. I thought that CDH4 ran Hadoop 1.0.4, since it is listed as stable on the Hadoop site, but that seems not to be the case. When I run the application on my cluster, I get the following errors:
12/11/27 16:14:38 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/11/27 16:14:38 INFO input.FileInputFormat: Total input paths to process : 16
12/11/27 16:14:39 INFO mapred.JobClient: Running job: job_201211271520_0004
12/11/27 16:14:40 INFO mapred.JobClient: map 0% reduce 0%
12/11/27 16:14:50 INFO mapred.JobClient: Task Id : attempt_201211271520_0004_m_000013_0, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
12/11/27 16:14:50 INFO mapred.JobClient: Task Id : attempt_201211271520_0004_m_000000_0, Status : FAILED
... and so on...
Am I right in my assumption that this is because CDH4 is not compatible with Hadoop 1.0.4? And if so, does anyone know which version is compatible with Hadoop 1.0.4? I'd rather switch Cloudera software than rewrite my application.
You are correct: CDH3 uses version 0.20.2, and CDH4 uses version 2.0.0. The nomenclature for Hadoop versions is a mess, and I don't pretend to understand it. But it looks like you may be able to use CDH3, based on the following statement in this blog post by Cloudera:
"The CDH3 distribution incorporated the 0.20.2 Apache Hadoop release plus the features of the 0.20.append and 0.20.security branches that collectively are now known as “1.0.” The Apache Hadoop in CDH3 has been the equivalent of the recently announced Apache Hadoop 1.0 for approximately a year now."
If this is the case, I would give CDH3 a try. If it doesn't work, you may just have to look for something besides Cloudera's installation.
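Whichever distribution you try, a quick way to confirm which Hadoop version the cluster is actually running is:
hadoop version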
I am trying to add the capacity scheduler in Hadoop 0.23.0 and to run the sample pi and randomwriter programs.
All the daemons are up and working fine, but the job hangs and no further output is displayed.
I couldn't find where the logs are accumulated. Can anyone please tell me why the job hangs, and where the logs are stored?
2012-06-08 18:41:06,118 INFO mapred.YARNRunner (YARNRunner.java:createApplicationSubmissionContext(355)) - Command to launch container for ApplicationMaster is : $JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=<LOG_DIR> -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1536m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
2012-06-08 18:41:06,251 INFO mapred.ResourceMgrDelegate (ResourceMgrDelegate.java:submitApplication(304)) - Submitted application application_1339151256291_0003 to ResourceManager
2012-06-08 18:41:06,355 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1207)) - Running job: job_1339151256291_0003
2012-06-08 18:41:07,366 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1227)) - map 0% reduce 0%
I followed the instructions at http://www.thecloudavenue.com/search?q=0.23 and successfully ran Hadoop 0.23.4 on a small 3-node cluster.
Check the log files in the
$HADOOP_HOME/logs
folder for any errors on the master and the slaves.
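For example, a quick scan on each node (assuming HADOOP_HOME is set and the default .log file names) might look like:
grep -iE 'error|exception' $HADOOP_HOME/logs/*.log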
Check the following console:
http://master:50070/dfshealth.jsp
Check if the number of DataNodes is correct in
http://master:50070/dfsnodelist.jsp?whatNodes=LIVE
Check if the number of NodeManagers is correctly reported in
http://master:8088/cluster
The number of NodeManagers should also be correctly listed in
http://master:8088/cluster/nodes
Verify that the output folder with the proper contents has been created through the NameNode Web console
http://master:50070/nn_browsedfscontent.jsp
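Alternatively, you can list the output directory from the command line (the path below is only a placeholder; use whatever output path you passed to the job):
hadoop fs -ls /user/<username>/output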