Running Hadoop MapReduce word count for the first time fails? - hadoop

The Hadoop word count example fails the first time I run it. Here's what I'm doing:
1. Format namenode: $HADOOP_HOME/bin/hdfs namenode -format
2. Start HDFS/YARN:
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager
3. Run wordcount: hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount input output
(Assume the input folder is already in HDFS; I'm not going to list every single command here.)
Output:
16/07/17 01:04:34 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/172.20.0.2:8032
16/07/17 01:04:35 INFO input.FileInputFormat: Total input paths to process : 2
16/07/17 01:04:35 INFO mapreduce.JobSubmitter: number of splits:2
16/07/17 01:04:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1468688654488_0001
16/07/17 01:04:36 INFO impl.YarnClientImpl: Submitted application application_1468688654488_0001
16/07/17 01:04:36 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1468688654488_0001/
16/07/17 01:04:36 INFO mapreduce.Job: Running job: job_1468688654488_0001
16/07/17 01:04:46 INFO mapreduce.Job: Job job_1468688654488_0001 running in uber mode : false
16/07/17 01:04:46 INFO mapreduce.Job: map 0% reduce 0%
Terminated
And then HDFS crashes, so I can't access http://localhost:50070/
Then I restart everything (repeat step 2), rerun the example, and everything's fine.
How can I fix it for the first run? My HDFS obviously has no data the first time around; maybe that's the problem?
UPDATE:
Running an even simpler example fails as well:
hadoop@8f98bf86ceba:~$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples*.jar pi 3 3
Number of Maps = 3
Samples per Map = 3
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Starting Job
16/07/17 03:21:28 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/172.20.0.3:8032
16/07/17 03:21:29 INFO input.FileInputFormat: Total input paths to process : 3
16/07/17 03:21:29 INFO mapreduce.JobSubmitter: number of splits:3
16/07/17 03:21:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1468696855031_0001
16/07/17 03:21:31 INFO impl.YarnClientImpl: Submitted application application_1468696855031_0001
16/07/17 03:21:31 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1468696855031_0001/
16/07/17 03:21:31 INFO mapreduce.Job: Running job: job_1468696855031_0001
16/07/17 03:21:43 INFO mapreduce.Job: Job job_1468696855031_0001 running in uber mode : false
16/07/17 03:21:43 INFO mapreduce.Job: map 0% reduce 0%
Same problem, HDFS terminates

Your post doesn't include enough detail to deduce what is wrong here. My guess is that hadoop-mapreduce-examples-2.7.2-sources.jar is not what you want. More likely you need hadoop-mapreduce-examples-2.7.2.jar, which contains the .class files rather than the sources.

HDFS has to be restarted once after its first start before MapReduce jobs can run successfully. HDFS creates some data on its first run, and stopping it seems to clean up its state, so that MapReduce jobs can then be run through YARN.
So my solution was:
Start HDFS: $HADOOP_HOME/sbin/start-dfs.sh
Stop HDFS: $HADOOP_HOME/sbin/stop-dfs.sh
Start HDFS again: $HADOOP_HOME/sbin/start-dfs.sh
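The start/stop/start sequence above could be wrapped in a small script. This is only a sketch of the workaround described in this answer: the `run` helper, the `DRY_RUN` switch, and the default `HADOOP_HOME` path are illustrative assumptions, not part of Hadoop. The final `hdfs dfsadmin -safemode wait` call is an optional extra that the answer does not mention; it blocks until the NameNode has left safe mode.

```shell
#!/bin/sh
# Sketch of the start/stop/start workaround from the answer above.
# Assumptions: the HADOOP_HOME default, DRY_RUN, and run() are illustrative.
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start HDFS, stop it, start it again, then (optionally) wait for the
# NameNode to leave safe mode. Stops at the first failing command.
restart_hdfs_once() {
    run "$HADOOP_HOME/sbin/start-dfs.sh" &&
        run "$HADOOP_HOME/sbin/stop-dfs.sh" &&
        run "$HADOOP_HOME/sbin/start-dfs.sh" &&
        run "$HADOOP_HOME/bin/hdfs" dfsadmin -safemode wait
}

restart_hdfs_once
```

By default the script only prints what it would do; run it with `DRY_RUN=0` to actually execute the commands.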

Related

Hadoop Installation in Windows 7

I am working on a Hadoop installation on Windows 7.
I tried to untar the tar files from the Apache site, but that was unsuccessful.
I searched the internet and found the link below.
http://toodey.com/2015/08/10/hadoop-installation-on-windows-without-cygwin-in-10-mints/
With it I was able to install Hadoop. But when I tried to run the examples, I got the errors below.
Command executed :
C:\Users\hadoop\hadoop-2.7.1\bin\hadoop.cmd jar C:\Users\hadoop\hadoop-2.7.1\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.7.1.jar wordcount /hadoop/input /hadoop/output
Error :
C:\Users\hadoop\hadoop-2.7.1>C:\Users\hadoop\hadoop-2.7.1\bin\hadoop.cmd jar C:\Users\hadoop\hadoop-2.7.1\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.7.1.jar wordcount /hadoop/input /hadoop/output
16/11/14 17:05:28 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:0000
16/11/14 17:05:30 INFO input.FileInputFormat: Total input paths to process : 3
16/11/14 17:05:30 INFO mapreduce.JobSubmitter: number of splits:3
16/11/14 17:05:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1479122512555_0003
16/11/14 17:05:31 INFO impl.YarnClientImpl: Submitted application application_1479122512555_0003
16/11/14 17:05:32 INFO mapreduce.Job: The url to track the job: http://MachineName:8088/proxy/application_1479122512555_0003/
16/11/14 17:05:32 INFO mapreduce.Job: Running job: job_1479122512555_0003
16/11/14 17:05:36 INFO mapreduce.Job: Job job_1479122512555_0003 running in uber mode : false
16/11/14 17:05:36 INFO mapreduce.Job: map 0% reduce 0%
16/11/14 17:05:36 INFO mapreduce.Job: Job job_1479122512555_0003 failed with state FAILED due to: Application application_1479122512555_0003 failed 2 times due to AM Container for appattempt_1479122512555_0003_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://MachineName:8088/cluster/app/application_1479122512555_0003Then, click on links to logs of each attempt.
Diagnostics: null
Failing this attempt. Failing the application.
16/11/14 17:05:36 INFO mapreduce.Job: Counters: 0
Thanks in advance...
This is due to permission issues on some of the YARN local directories. Here is the solution:
Identify the YARN directories set by the yarn.nodemanager.local-dirs parameter in /etc/gphd/hadoop/conf/yarn-site.xml on the YARN NodeManager.
Delete the files and folders under usercache in every directory listed there, on all NodeManagers.
e.g.
rm -rf path/to/yarn/nm/usercache/*
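When yarn.nodemanager.local-dirs lists several directories, the cleanup has to be repeated for each of them. A minimal sketch, assuming the comma-separated value of that parameter has been copied by hand into a `LOCAL_DIRS` variable (the variable name, the `clear_usercache` function, and the default path are hypothetical):

```shell
#!/bin/sh
# Sketch: empty the usercache directory under each NodeManager local dir.
# LOCAL_DIRS mirrors the comma-separated value of yarn.nodemanager.local-dirs;
# the default below is a placeholder, not a real path.
LOCAL_DIRS="${LOCAL_DIRS:-/path/to/yarn/nm}"

clear_usercache() {
    # $1: comma-separated list of NodeManager local directories
    old_ifs=$IFS
    IFS=,
    for dir in $1; do
        IFS=$old_ifs
        cache="$dir/usercache"
        if [ -d "$cache" ]; then
            echo "clearing $cache"
            # Remove everything inside usercache but keep the directory itself.
            find "$cache" -mindepth 1 -delete
        else
            echo "skipping $dir (no usercache directory)"
        fi
    done
    IFS=$old_ifs
}

clear_usercache "$LOCAL_DIRS"
```

The script has to be run on every NodeManager host, by a user allowed to delete those files.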

wordcount not running in Cloudera

I have installed Cloudera 5.8 on a RHEL 7.2 Linux instance on Amazon EC2. I have logged in over SSH and I am trying to run the wordcount example to test MapReduce with the following command:
hadoop jar /opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount archivo.txt output
The problem is that the wordcount program hangs and does not produce any output. Only the following is printed:
16/08/11 13:10:02 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-22-226.ec2.internal/172.31.22.226:8032
16/08/11 13:10:03 INFO input.FileInputFormat: Total input paths to process : 1
16/08/11 13:10:03 INFO mapreduce.JobSubmitter: number of splits:1
16/08/11 13:10:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470929244097_0007
16/08/11 13:10:04 INFO impl.YarnClientImpl: Submitted application application_1470929244097_0007
16/08/11 13:10:04 INFO mapreduce.Job: The url to track the job: http://ip-172-31-22-226.ec2.internal:8088/proxy/application_1470929244097_0007/
16/08/11 13:10:04 INFO mapreduce.Job: Running job: job_1470929244097_0007
It then blocks at "Running job". After this I have to press Ctrl+C to get the prompt back, and no output is produced.
Does anyone know why? I think it is probably a configuration issue; I am new to DataNodes and so on.
Thanks a lot.
It looks like there are no free resources (map or reduce slots), so the job is waiting for resources. You can check the job status at:
http://ip-172-31-22-226.ec2.internal:8088
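The same information is also exposed by the ResourceManager's REST endpoint `/ws/v1/cluster/metrics`, which can be checked from the shell. The following is a rough sketch, not a robust JSON parser: `json_number` and `summarize_metrics` are hypothetical helper names, `RM_URL` is an assumed address, and the sed pattern only works for simple numeric fields such as `availableMB`, `appsPending`, and `activeNodes`.

```shell
#!/bin/sh
# Sketch: summarize free resources from the ResourceManager cluster metrics.
# RM_URL is an assumption; replace it with your ResourceManager web address.
RM_URL="${RM_URL:-http://localhost:8088}"

# Print the numeric value of JSON field $1 from the text on stdin.
json_number() {
    sed -n "s/.*\"$1\" *: *\([0-9][0-9]*\).*/\1/p" | head -n 1
}

# Read the metrics JSON on stdin and print a one-line summary,
# plus a hint when no memory or no NodeManager is available.
summarize_metrics() {
    metrics=$(cat)
    available=$(printf '%s\n' "$metrics" | json_number availableMB)
    pending=$(printf '%s\n' "$metrics" | json_number appsPending)
    nodes=$(printf '%s\n' "$metrics" | json_number activeNodes)
    echo "availableMB=$available appsPending=$pending activeNodes=$nodes"
    if [ "${nodes:-0}" -eq 0 ] || [ "${available:-0}" -eq 0 ]; then
        echo "no free resources: submitted jobs are likely to keep waiting"
    fi
}

# Usage (requires curl and a reachable ResourceManager):
#   curl -s "$RM_URL/ws/v1/cluster/metrics" | summarize_metrics
```

If `activeNodes` is 0, no NodeManager has registered with the ResourceManager; if `availableMB` is 0, the running NodeManagers have no memory left to allocate.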

Hadoop program stuck at "Running job:"

I was running a Hadoop program (wordcount) in the Hortonworks sandbox and ran into the situation below. I have run this program successfully many times on exactly the same virtual machine, but this time it "failed" without any notification: it just got stuck. I tried other MapReduce programs and the results were similar. Normally the "Running job: ..." line is followed by "running in uber mode : false", but this time that second line never appears, for no apparent reason.
[root@sandbox ~]# hadoop jar testWC.jar testWC.WCdriver /data/input/pg103.txt /data/output/WC
WARNING: Use "yarn jar" to launch YARN applications.
16/03/11 19:20:01 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
16/03/11 19:20:01 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
16/03/11 19:20:01 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/03/11 19:20:02 INFO input.FileInputFormat: Total input paths to process : 1
16/03/11 19:20:02 INFO mapreduce.JobSubmitter: number of splits:1
16/03/11 19:20:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1457723341319_0002
16/03/11 19:20:03 INFO impl.YarnClientImpl: Submitted application application_1457723341319_0002
16/03/11 19:20:03 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1457723341319_0002/
16/03/11 19:20:03 INFO mapreduce.Job: Running job: job_1457723341319_0002
The program just could not move on anymore.

Job submitting but map reduce not working

I tried to run the example program present in Hadoop. However, I'm not successful in getting the output.
I have included my logs below. Please help in solving the issue.
hdfs@localhost:~$ hadoop jar '/opt/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar' wordcount /README.txt /ooo
15/08/21 09:48:26 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8050
15/08/21 09:48:28 INFO input.FileInputFormat: Total input paths to process : 1
15/08/21 09:48:28 INFO mapreduce.JobSubmitter: number of splits:1
15/08/21 09:48:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1440130528838_0001
15/08/21 09:48:29 INFO impl.YarnClientImpl: Submitted application application_1440130528838_0001
15/08/21 09:48:29 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1440130528838_0001/
15/08/21 09:48:29 INFO mapreduce.Job: Running job: job_1440130528838_0001
MapReduce seems to be working; no errors appear in the logs.
1/ Can you post more of your logs?
2/ Was your output folder /ooo created? If yes, what does it contain?
3/ Please verify that your input file is not empty.

Why Mapreduce with YARN stuck on CDH 5.3?

MapReduce with YARN fails to move beyond 0% map and 0% reduce. I am using Cloudera CDH on a Google Compute Engine high-memory instance (13 GB RAM); 8 GB of RAM is free on the machine. Can you please help me fix it?
sunny@hadoop-m:~$ hadoop jar /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/jars/hadoop-mapreduce-examples-2.5.0-cdh5.3.0.jar grep input output 'dfs[a-z.]+'
14/12/24 00:13:53 INFO client.RMProxy: Connecting to ResourceManager at hadoop-m.c.sunny-hadoop-trial.internal/10.240.253.233:8032
14/12/24 00:13:53 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
14/12/24 00:13:54 INFO input.FileInputFormat: Total input paths to process : 5
14/12/24 00:13:54 INFO mapreduce.JobSubmitter: number of splits:5
14/12/24 00:13:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1419360146634_0001
14/12/24 00:13:54 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
14/12/24 00:13:54 INFO impl.YarnClientImpl: Submitted application application_1419360146634_0001
14/12/24 00:13:55 INFO mapreduce.Job: The url to track the job: http://hadoop-m.c.sunny-hadoop-trial.internal:8088/proxy/application_1419360146634_0001/
14/12/24 00:13:55 INFO mapreduce.Job: Running job: job_1419360146634_0001
Resource Manager Output
Some more info about job
yarn-site.xml: http://pastebin.mozilla.org/8113782
mapred-site.xml: http://pastebin.mozilla.org/8113813
The server's IP address changed because of DHCP, so the client configuration for HDFS and YARN became stale. I updated the client configuration with Cloudera Manager, and now the cluster is running fine.
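One way to spot this kind of stale configuration is to compare the address stored in the client's yarn-site.xml with the server's current IP. The sketch below is an illustration only: `conf_value` and `address_is_stale` are hypothetical helpers, the comparison is only meaningful when the property holds a raw IP rather than a hostname, and the awk parsing assumes each `<name>` and `<value>` element sits on a single line.

```shell
#!/bin/sh
# Sketch: detect a client config that still points at an old server IP.

# Print the value of property $2 from Hadoop XML config file $1.
# Assumes <name>...</name> and <value>...</value> each fit on one line.
conf_value() {
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1 }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Succeed (exit 0) when the host part of $1 (host:port) differs from IP $2.
address_is_stale() {
    [ "${1%%:*}" != "$2" ]
}

# Example usage (the path and the IP are placeholders):
#   configured=$(conf_value /etc/hadoop/conf/yarn-site.xml yarn.resourcemanager.address)
#   address_is_stale "$configured" "10.0.0.9" && echo "client config looks stale"
```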
