Wordcount program is stuck in hadoop-2.3.0

I installed hadoop-2.3.0 and tried to run the wordcount example, but the job starts and then sits idle:
hadoop@ubuntu:~$ $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar wordcount /myprg outputfile1
14/04/30 13:20:40 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/04/30 13:20:51 INFO input.FileInputFormat: Total input paths to process : 1
14/04/30 13:20:53 INFO mapreduce.JobSubmitter: number of splits:1
14/04/30 13:21:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1398885280814_0004
14/04/30 13:21:07 INFO impl.YarnClientImpl: Submitted application application_1398885280814_0004
14/04/30 13:21:09 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1398885280814_0004/
14/04/30 13:21:09 INFO mapreduce.Job: Running job: job_1398885280814_0004
I did not have this issue with previous versions; I was able to run the Hadoop wordcount example there.
I followed these steps for installing hadoop-2.3.0.
Please suggest.

I had the exact same situation a while back when switching to YARN. MRv1 used the concept of task slots, while MRv2 uses containers, and the two differ considerably in how tasks are scheduled and run on the nodes.
Your job is stuck because it cannot find or start a container. If you look at the full logs of the ResourceManager, ApplicationMaster, and other daemons, you may find that nothing happens after the job starts trying to allocate a new container.
To solve the problem, you have to tweak the memory settings in yarn-site.xml and mapred-site.xml. While doing the same myself, I found this and this tutorial especially helpful. I suggest starting with very basic memory settings and optimizing them later. Check first with the word count example, then move on to more complex jobs.
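As a rough sketch of what "very basic memory settings" could look like, the fragment below sets the properties that most directly decide whether a container can be allocated. The numeric values are assumptions chosen for illustration (a single node that can spare about 2 GB for YARN); they are not taken from the original post and should be adapted to the machine.

```xml
<!-- yarn-site.xml: illustrative values, adjust to the memory actually available -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
  <description>Total memory the NodeManager may hand out to containers.</description>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>
  <description>Smallest container the scheduler will grant.</description>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
  <description>Largest container the scheduler will grant.</description>
</property>

<!-- mapred-site.xml: container sizes requested by the MapReduce job -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>512</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>512</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>512</value>
</property>
```

With these numbers, the ApplicationMaster, one map task, and one reduce task together request 1536 MB, which fits inside the 2048 MB the NodeManager offers. If the sum of the requests exceeds what any node offers, the job waits at "Running job" as described above.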

I was facing the same issue. Adding the following property to my yarn-site.xml solved it:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Hostname-of-your-RM</value>
<description>The hostname of the RM.</description>
</property>
Without the ResourceManager hostname, things go awry in a multi-node setup: each node defaults to looking for a local ResourceManager and never announces its resources to the master node. Your MapReduce request probably found nowhere to run its mappers, because the request was sent to the master and the master did not know about the resources on the slave nodes.
Reference : http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide/

Related

wordcount not running in Cloudera

I have installed Cloudera 5.8 in a Linux RHEL 7.2 instance of Amazon EC2. I have logged in with SSH and I am trying to run the wordcount example for testing mapreduce operation with the following command:
hadoop jar /opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount archivo.txt output
The problem is that the wordcount program blocks and does not produce any output. Only the following is printed:
16/08/11 13:10:02 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-22-226.ec2.internal/172.31.22.226:8032
16/08/11 13:10:03 INFO input.FileInputFormat: Total input paths to process : 1
16/08/11 13:10:03 INFO mapreduce.JobSubmitter: number of splits:1
16/08/11 13:10:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470929244097_0007
16/08/11 13:10:04 INFO impl.YarnClientImpl: Submitted application application_1470929244097_0007
16/08/11 13:10:04 INFO mapreduce.Job: The url to track the job: http://ip-172-31-22-226.ec2.internal:8088/proxy/application_1470929244097_0007/
16/08/11 13:10:04 INFO mapreduce.Job: Running job: job_1470929244097_0007
The job then stays blocked at "Running job". I have to press Ctrl+C to get out, and no output is produced.
Does anyone know why? I think it is probably a configuration issue, and I am new to DataNodes and so on.
Thanks a lot.
It looks like there are no free resources (no capacity for map or reduce tasks), so the job is waiting for them. You can check the job status in the ResourceManager web UI at:
http://ip-172-31-22-226.ec2.internal:8088

MapReduce job is stuck on a multi node Hadoop-2.7.1 cluster

I have successfully set up Hadoop 2.7.1 on a multi-node cluster (1 namenode and 4 datanodes). But when I run a MapReduce job (the WordCount example from the Hadoop website), it always gets stuck at this point:
[~#~ hadoop-2.7.1]$ bin/hadoop jar WordCount.jar WordCount /user/inputdata/ /user/outputdata
15/09/30 17:54:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/09/30 17:54:57 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/09/30 17:54:58 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/09/30 17:54:59 INFO input.FileInputFormat: Total input paths to process : 1
15/09/30 17:55:00 INFO mapreduce.JobSubmitter: number of splits:1
15/09/30 17:55:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1443606819488_0002
15/09/30 17:55:00 INFO impl.YarnClientImpl: Submitted application application_1443606819488_0002
15/09/30 17:55:00 INFO mapreduce.Job: The url to track the job: http://~~~~:8088/proxy/application_1443606819488_0002/
15/09/30 17:55:00 INFO mapreduce.Job: Running job: job_1443606819488_0002
Do I have to specify memory settings for YARN?
NOTE: The DataNode hardware is really old (each node has 1 GB RAM).
Appreciate your help.
Thank you.
With only 1 GB of memory, the data nodes have barely enough to create even one container for a mapper, reducer, or ApplicationMaster.
Try setting the following container memory allocation properties in yarn-site.xml to much lower values, so that containers can be created on those nodes:
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
Also try reducing the values of the following properties in your job configuration:
mapreduce.map.memory.mb
mapreduce.reduce.memory.mb
mapreduce.map.java.opts
mapreduce.reduce.java.opts
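A minimal sketch of how these properties might be lowered for nodes with 1 GB of RAM follows. All numbers are assumptions for illustration, not tested values from this thread; the idea is only that each container request stays well below what a node can offer and that the JVM heap (`-Xmx`) stays below its container size. The `yarn.nodemanager.resource.memory-mb` and ApplicationMaster entries are not in the list above and are added here as related settings that would usually need the same treatment.

```xml
<!-- yarn-site.xml: illustrative values for a node with 1 GB RAM -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>768</value>
  <description>Leave some memory for the OS and the Hadoop daemons.</description>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>128</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>768</value>
</property>

<!-- mapred-site.xml (or the job configuration) -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>256</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx200m</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>256</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx200m</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>256</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx200m</value>
</property>
```

Whether heaps this small are enough depends on the job; the point is to get a container allocated at all and then raise the values as far as the hardware allows.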

Stuck at command prompt while Running hadoop mapreduce jobs on windows 8

I have gone through detailed videos on installing Hadoop on Windows 8 without Cygwin or alternatives such as the Hortonworks Sandbox.
Note that I have not yet installed Eclipse Kepler, and that I followed this guide:
http://www.codeproject.com/Articles/757934/Apache-Hadoop-for-Windows-Platform
Everything completed successfully, but my command prompt gets stuck like this:
C:\hworks>hadoop jar c:\hworks\Recipe.jar Recipe /in /out
15/07/23 11:23:13 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/07/23 11:23:16 INFO input.FileInputFormat: Total input paths to process : 1
15/07/23 11:23:18 INFO mapreduce.JobSubmitter: number of splits:1
15/07/23 11:23:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1437627735863_0001
15/07/23 11:23:19 INFO impl.YarnClientImpl: Submitted application application_1437627735863_0001
15/07/23 11:23:19 INFO mapreduce.Job: The url to track the job: http://SkyneT:8088/proxy/application_1437627735863_0001/
15/07/23 11:23:19 INFO mapreduce.Job: Running job: job_1437627735863_0001
I experienced a similar issue.
You can try these steps, which worked for me:
1) Open the command prompt as Administrator.
2) Delete your c:\tmp directory (a new one will be created automatically).
3) Run \etc\hadoop\hadoop-env.cmd to initialize the environment variables.
4) Run \bin\hdfs namenode -format
5) Run \sbin\start-all.cmd
Then try running your process again, and post here if you see any new errors.

Why Mapreduce with YARN stuck on CDH 5.3?

MapReduce with YARN fails to move beyond 0% map and 0% reduce. I am using Cloudera CDH on a Google Compute high-memory instance (13 GB RAM), and 8 GB of RAM is free on the machine. Can you please help me fix it?
sunny@hadoop-m:~$ hadoop jar /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/jars/hadoop-mapreduce-examples-2.5.0-cdh5.3.0.jar grep input output 'dfs[a-z.]+'
14/12/24 00:13:53 INFO client.RMProxy: Connecting to ResourceManager at hadoop-m.c.sunny-hadoop-trial.internal/10.240.253.233:8032
14/12/24 00:13:53 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
14/12/24 00:13:54 INFO input.FileInputFormat: Total input paths to process : 5
14/12/24 00:13:54 INFO mapreduce.JobSubmitter: number of splits:5
14/12/24 00:13:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1419360146634_0001
14/12/24 00:13:54 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
14/12/24 00:13:54 INFO impl.YarnClientImpl: Submitted application application_1419360146634_0001
14/12/24 00:13:55 INFO mapreduce.Job: The url to track the job: http://hadoop-m.c.sunny-hadoop-trial.internal:8088/proxy/application_1419360146634_0001/
14/12/24 00:13:55 INFO mapreduce.Job: Running job: job_1419360146634_0001
Resource Manager Output
Some more info about job
yarn-site.xml: http://pastebin.mozilla.org/8113782
mapred-site.xml: http://pastebin.mozilla.org/8113813
The server's IP address changed because of the DHCP service, so the client configuration for HDFS and YARN became stale. I updated the client configuration with Cloudera Manager, and the cluster is now running fine.

Map reduce job getting stuck at map 0% reduce 0%

I am running the famous wordcount example. I have a local and a prod Hadoop setup. The same example works in prod, but it does not work locally. Can someone tell me what I should look for?
The job gets stuck. The task logs are:
~/tmp$ hadoop jar wordcount.jar WordCount /testhistory /outputtest/test
Warning: $HADOOP_HOME is deprecated.
13/08/29 16:12:34 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/08/29 16:12:35 INFO input.FileInputFormat: Total input paths to process : 3
13/08/29 16:12:35 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/08/29 16:12:35 WARN snappy.LoadSnappy: Snappy native library not loaded
13/08/29 16:12:35 INFO mapred.JobClient: Running job: job_201308291153_0015
13/08/29 16:12:36 INFO mapred.JobClient: map 0% reduce 0%
Locally, Hadoop is running in pseudo-distributed mode. All 3 processes (namenode, datanode, jobtracker) are running. Let me know if any extra information is required.
The tasktracker seems to be missing.
Try:
hadoop tasktracker &
In Hadoop 2.x this problem can be related to memory issues; see the question "MapReduce in Hadoop 2.2.0 not working".
I had the same problem and this page helped me:
http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide/
Basically, I solved my problem with the following 3 steps. The fact is that I had to configure much more memory than I really have.
1) yarn-site.xml
yarn.resourcemanager.hostname = hostname_of_the_master
yarn.nodemanager.resource.memory-mb = 4000
yarn.nodemanager.resource.cpu-vcores = 2
yarn.scheduler.minimum-allocation-mb = 4000
2) mapred-site.xml
yarn.app.mapreduce.am.resource.mb = 4000
yarn.app.mapreduce.am.command-opts = -Xmx3768m
mapreduce.map.cpu.vcores = 2
mapreduce.reduce.cpu.vcores = 2
3) Send these files across all nodes
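For reference, the settings from steps 1 and 2 above can be written in the property form that Hadoop's configuration files expect. This is a direct transcription of the values listed in the answer; `hostname_of_the_master` is the answer's own placeholder and must be replaced with the real ResourceManager host.

```xml
<!-- 1) yarn-site.xml -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hostname_of_the_master</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4000</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>2</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>4000</value>
</property>

<!-- 2) mapred-site.xml -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>4000</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx3768m</value>
</property>
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>2</value>
</property>
<property>
  <name>mapreduce.reduce.cpu.vcores</name>
  <value>2</value>
</property>
```

Each group of properties goes inside the `<configuration>` element of the file named in its comment. Note that with a minimum allocation of 4000 MB and 4000 MB per NodeManager, each node can host only one container at a time, so these values are a starting point rather than a tuned setup.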
Apart from the missing tasktracker (hadoop tasktracker &) and the other issues mentioned, check your code and make sure it contains no infinite loop or other bug. The problem may simply be a bug in your own code.
If this problem occurs when running Hive queries, check whether you are joining two very big tables without leveraging partitions. Not using partitions may lead to long-running full table scans, which leave the job stuck at map 0% reduce 0%.

Resources