Map reduce job getting stuck at map 0% reduce 0% - hadoop

I am running the famous wordcount example. I have a local and a prod Hadoop setup. The same example works in prod, but it's not working locally. Can someone tell me what I should look for?
The job is getting stuck. The task logs are:
~/tmp$ hadoop jar wordcount.jar WordCount /testhistory /outputtest/test
Warning: $HADOOP_HOME is deprecated.
13/08/29 16:12:34 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/08/29 16:12:35 INFO input.FileInputFormat: Total input paths to process : 3
13/08/29 16:12:35 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/08/29 16:12:35 WARN snappy.LoadSnappy: Snappy native library not loaded
13/08/29 16:12:35 INFO mapred.JobClient: Running job: job_201308291153_0015
13/08/29 16:12:36 INFO mapred.JobClient: map 0% reduce 0%
Locally, Hadoop is running in pseudo-distributed mode. All three processes, namenode, datanode, and jobtracker, are running. Let me know if any extra information is required.

The tasktracker seems to be missing.
Try:
hadoop tasktracker &

In Hadoop 2.x this problem can be related to memory issues; see MapReduce in Hadoop 2.2.0 not working.

I had the same problem and this page helped me:
http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide/
Basically, I solved my problem using the following three steps (the same settings are shown in XML form after the list). The fact is that I had to configure much more memory than I really have.
1) yarn-site.xml
yarn.resourcemanager.hostname = hostname_of_the_master
yarn.nodemanager.resource.memory-mb = 4000
yarn.nodemanager.resource.cpu-vcores = 2
yarn.scheduler.minimum-allocation-mb = 4000
2) mapred-site.xml
yarn.app.mapreduce.am.resource.mb = 4000
yarn.app.mapreduce.am.command-opts = -Xmx3768m
mapreduce.map.cpu.vcores = 2
mapreduce.reduce.cpu.vcores = 2
3) Send these files across all nodes
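For reference, here is roughly how those same entries look in XML form; the hostname is a placeholder for whatever your master node is called, and the numbers are simply the values listed above, so tune them to your own hardware.
In yarn-site.xml:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hostname_of_the_master</value>
</property>
<!-- total memory and vcores the NodeManager may hand out on this node -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4000</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>2</value>
</property>
<!-- smallest container YARN will allocate -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>4000</value>
</property>
In mapred-site.xml:
<!-- memory for the MapReduce ApplicationMaster container -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>4000</value>
</property>
<!-- AM JVM heap, kept below the container size -->
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx3768m</value>
</property>
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>2</value>
</property>
<property>
  <name>mapreduce.reduce.cpu.vcores</name>
  <value>2</value>
</property>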

Apart from the missing tasktracker (hadoop tasktracker &) and any other configuration issues, please check your code and make sure there is no infinite loop or other bug. The problem may simply be a bug in your code!

If this problem shows up when running Hive queries, check whether you are joining two very big tables without leveraging partitions. Not using partitions can lead to long-running full table scans, and hence a job stuck at map 0% reduce 0%.

Related

hadoop stuck at “running job”

I want to run the Hadoop word count program from the docs. However, the program gets stuck at "Running job":
16/09/02 10:51:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/09/02 10:51:13 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/09/02 10:51:13 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/09/02 10:51:14 INFO input.FileInputFormat: Total input paths to process : 1
16/09/02 10:51:14 INFO mapreduce.JobSubmitter: number of splits:2
16/09/02 10:51:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1472783047951_0003
16/09/02 10:51:14 INFO impl.YarnClientImpl: Submitted application application_1472783047951_0003
16/09/02 10:51:14 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1472783047951_0003/
16/09/02 10:51:14 INFO mapreduce.Job: Running job: job_1472783047951_0003
The tracking page is at http://hadoop-master:8088/proxy/application_1472783047951_0003/, and an AppMaster is running on http://hadoop-slave2:8042.
However, since it gets stuck on WordCount, it also gets stuck on Hive:
hive (default)> select a, b, count(1) as cnt from newtb group by a, b;
Query ID = hadoop_20160902110124_d2b2680b-c493-4986-aa84-f65794bfd8e4
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1472783047951_0004, Tracking URL = http://hadoop-master:8088/proxy/application_1472783047951_0004/
Kill Command = /opt/hadoop-2.6.4/bin/hadoop job -kill job_1472783047951_0004
There is nothing wrong with select *.
hive (default)> select * from newtb;
OK
1 2 3
1 3 4
2 3 4
5 6 7
8 9 0
1 8 3
Time taken: 0.101 seconds, Fetched: 6 row(s)
So I think there is something wrong with MapReduce. There is enough disk and memory. How can I solve this?
You are having issues because the application master is unable to start containers and run the job. First try restarting your system, and if that doesn't help, you have to change the memory allocations in yarn-site.xml and mapred-site.xml. Go with basic memory settings.
Use the following link:
http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide/#yarn-configuration_1
As I was running Hadoop on Ubuntu as a guest on VMware, I just increased the amount of RAM that I had allocated to the Ubuntu guest. I allocated 4 GB of RAM instead of 2 GB, and the job finally continued and finished.

MapReduce job is stuck on a multi node Hadoop-2.7.1 cluster

I have successfully run Hadoop 2.7.1 on a multi-node cluster (1 namenode and 4 datanodes). But when I run a MapReduce job (the WordCount example from the Hadoop website), it always gets stuck at this point:
[~#~ hadoop-2.7.1]$ bin/hadoop jar WordCount.jar WordCount /user/inputdata/ /user/outputdata
15/09/30 17:54:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/09/30 17:54:57 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/09/30 17:54:58 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/09/30 17:54:59 INFO input.FileInputFormat: Total input paths to process : 1
15/09/30 17:55:00 INFO mapreduce.JobSubmitter: number of splits:1
15/09/30 17:55:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1443606819488_0002
15/09/30 17:55:00 INFO impl.YarnClientImpl: Submitted application application_1443606819488_0002
15/09/30 17:55:00 INFO mapreduce.Job: The url to track the job: http://~~~~:8088/proxy/application_1443606819488_0002/
15/09/30 17:55:00 INFO mapreduce.Job: Running job: job_1443606819488_0002
Do I have to specify memory settings for YARN?
NOTE: The DataNode hardware is really old (each node has 1 GB of RAM).
Appreciate your help.
Thank you.
The data nodes' memory (1 GB) is really too scarce to prepare even one container to run a mapper/reducer/AM in it.
You could try lowering the container memory allocation values below in yarn-site.xml to much smaller values so that containers can be created on those nodes.
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
Also try to reduce the values of the properties below in your job configuration (a sketch with example values follows the list):
mapreduce.map.memory.mb
mapreduce.reduce.memory.mb
mapreduce.map.java.opts
mapreduce.reduce.java.opts
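As a rough sketch only: on nodes with about 1 GB of RAM, the above properties might be lowered to something like the values below. The numbers here are illustrative assumptions, not values from the question, so adjust them to whatever actually fits on your nodes.
In yarn-site.xml:
<!-- smallest and largest container YARN is allowed to allocate -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>128</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>900</value>
</property>
In mapred-site.xml or the job configuration:
<!-- per-task container sizes -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>256</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>256</value>
</property>
<!-- JVM heaps, kept below the container sizes above -->
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx200m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx200m</value>
</property>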

Hadoop YARN reducer/shuffle stuck

I was migrating from Hadoop 1 to Hadoop 2 YARN. The source code was recompiled against the MRv2 jars and didn't have any compatibility issues. When I tried to run the job under YARN, map worked fine and went to 100%, but reduce got stuck at ~6-7%. There was no performance issue as such: when I checked CPU usage, it turned out that while reduce was stuck there seemed to be no computation going on, because the CPU was mostly 100% idle. The same job runs successfully on Hadoop 1.2.1.
I checked the log messages from the ResourceManager and found that after map finished, no more containers were allocated, so no reduce task was running in any container. What caused this situation?
I'm wondering whether it is related to the yarn.nodemanager.aux-services property setting. Following the official tutorial (http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html), this property has to be set to mapreduce_shuffle, which indicates that MR will still use the default shuffle method instead of other shuffle plugins (http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html). I tried leaving this property unset, but Hadoop wouldn't let me.
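For reference, the shuffle setting that the tutorial asks for usually looks like the snippet below in yarn-site.xml; the second property is commonly shown alongside it but is normally optional, since ShuffleHandler is the default handler class (this is the standard form, not taken from this particular cluster).
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<!-- usually optional: ShuffleHandler is the default class for mapreduce_shuffle -->
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>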
Here's the log from userlogs/applicationfolder/containerfolder/syslog when reduce is about to reach 7%. After that, the log didn't update any more and reduce stopped as well.
2014-11-26 09:01:04,104 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1416988910568_0001_m_002988_0 decomp: 129587 len: 129591 to MEMORY
2014-11-26 09:01:04,104 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 129587 bytes from map-output for attempt_1416988910568_0001_m_002988_0
2014-11-26 09:01:04,104 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 129587, inMemoryMapOutputs.size() -> 2993, commitMemory -> 342319024, usedMemory ->342448611
2014-11-26 09:01:04,105 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1416988910568_0001_m_002989_0 decomp: 128525 len: 128529 to MEMORY
2014-11-26 09:01:04,105 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 128525 bytes from map-output for attempt_1416988910568_0001_m_002989_0
2014-11-26 09:01:04,105 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 128525, inMemoryMapOutputs.size() -> 2994, commitMemory -> 342448611, usedMemory ->342577136
2014-11-26 09:01:04,105 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: datanode03:13562 freed by fetcher#1 in 13ms
Was this a common issue when migrating from Hadoop 1 to 2? Was the strategy of running map-shuffle-sort-reduce changed in Hadoop 2? What caused this problem? Thanks so much. Any comments will help!
Major environment setup:
Hadoop version: 2.5.2
6-node cluster with 8-core CPU, 15 GB memory on each node
Related properties settings:
yarn.scheduler.maximum-allocation-mb: 14336
yarn.scheduler.minimum-allocation-mb: 2500
yarn.nodemanager.resource.memory-mb: 14336
yarn.nodemanager.aux-services: mapreduce_shuffle
mapreduce.task.io.sort.factor: 100
mapreduce.task.io.sort.mb: 1024
I finally solved the problem after googling around and discovering that I had already posted this question three months ago.
It was caused by data skew.

Wordcount program is stuck in hadoop-2.3.0

I installed hadoop-2.3.0 and tried to run the wordcount example, but it starts the job and then sits idle:
hadoop#ubuntu:~$ $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar wordcount /myprg outputfile1
14/04/30 13:20:40 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/04/30 13:20:51 INFO input.FileInputFormat: Total input paths to process : 1
14/04/30 13:20:53 INFO mapreduce.JobSubmitter: number of splits:1
14/04/30 13:21:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1398885280814_0004
14/04/30 13:21:07 INFO impl.YarnClientImpl: Submitted application application_1398885280814_0004
14/04/30 13:21:09 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1398885280814_0004/
14/04/30 13:21:09 INFO mapreduce.Job: Running job: job_1398885280814_0004
With previous versions I did not get such an issue; I was able to run the Hadoop wordcount example there.
I followed these steps for installing hadoop-2.3.0
Please suggest.
I had the exact same situation a while back while switching to YARN. Basically there was the concept of task slots in MRv1 and containers in MRv2. Both of these differ very much in how the tasks are scheduled and run on the nodes.
The reason that your job is stuck is that it is unable to find/start a container. If you go into the full logs of Resource Manager/Application Master etc daemons, you may find that it is doing nothing after it starts to allocate a new container.
To solve the problem, you have to tweak the memory settings in yarn-site.xml and mapred-site.xml. While doing the same myself, I found these two tutorials especially helpful. I would suggest you try the very basic memory settings and optimize them later on. First check with a word count example, then move on to more complex ones.
I was facing the same issue. I added the following property to my yarn-site.xml and it solved the problem.
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Hostname-of-your-RM</value>
<description>The hostname of the RM.</description>
</property>
Without the resource manager hostname, things go awry in a multi-node setup, as each node would then default to trying to find a local resource manager and would never announce its resources to the master node. So your MapReduce execution request probably didn't find any mappers to run in, because the request was being sent to the master and the master didn't know about the slave slots.
Reference : http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide/

Hadoop. Restart Map

After 36 hours of work, Hadoop 1.0.3 said:
INFO mapred.JobClient: map 42% reduce 0%
mapred.JobClient: Job Failed: # of failed Map Tasks exceeded allowed limit. FailedCount: 1.
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
and stopped.
Is it possible to restart Hadoop jobs from somewhere other than the very beginning (map 0% reduce 0%)?
There doesn't seem to be a good way to restart a failed job. A couple of things to keep in mind:
It looks like your mapred config has [mapreduce.map.maxattempts=1], while the default is typically 4:
mapred.JobClient: Job Failed: # of failed Map Tasks
exceeded allowed limit. FailedCount: 1.
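If the allowed attempts really were lowered to 1, a minimal sketch of restoring them in mapred-site.xml (or in the job configuration) would be the snippet below; note that in Hadoop 1.x the equivalent property name is mapred.map.max.attempts, and 4 is just the usual default rather than a value taken from this job.
<!-- how many times a single map task may be retried before the job fails -->
<property>
  <name>mapreduce.map.maxattempts</name>
  <value>4</value>
</property>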
You would typically want to understand why it failed. (not sure from your post if you identified the issue)
It may have failed for a bogus reason, in which case you can handle the exception in your MapReduce program by providing failure traps. You can implement that concept using the Hadoop API.
Check out this answer here: https://stackoverflow.com/a/9742235/1515370
