Hadoop - MapReduce job fails to run in Windows (Cygwin)

I have installed Cygwin on Windows and configured the Hadoop 0.20.0 setup on it. I was able to run the word count project in Eclipse successfully, but when I run wordcount from hadoop-..*-example.jar, it throws the following error:
13/06/28 07:32:51 INFO input.FileInputFormat: Total input paths to process : 1
13/06/28 07:32:52 INFO mapred.JobClient: Running job: job_201306280622_0002
13/06/28 07:32:53 INFO mapred.JobClient: map 0% reduce 0%
13/06/28 07:32:57 INFO mapred.JobClient: Task Id : attempt_201306280622_0002_m_000002_0, Status : FAILED
Error initializing attempt_201306280622_0002_m_000002_0:
org.apache.hadoop.util.Shell$ExitCodeException: //job.jar: invalid mode: `jar'
Try `//job.jar --help' for more information.
at org.apache.hadoop.util.Shell.runCommand(Shell.java:195)
at org.apache.hadoop.util.Shell.run(Shell.java:134)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
What could be the problem? Please assist.

Your command looks fine to me. Try giving the input path with a file name; I hope that will solve your problem.
bin/hadoop jar hadoop-0.20.0-examples.jar wordcount /user/input/input_file.txt /user/output
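If that doesn't help, it may be worth confirming that the input actually exists in HDFS before submitting the job; a quick check (the paths below mirror the command above and are only illustrative):
bin/hadoop fs -ls /user/input
# If the path is missing, create it and upload the file first:
bin/hadoop fs -mkdir /user/input
bin/hadoop fs -put input_file.txt /user/input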

Related

Hadoop Installation in Windows 7

I am working on a Hadoop installation on Windows 7.
I tried to untar the tar files from the Apache site, but it was unsuccessful.
I searched the internet and found the link below:
http://toodey.com/2015/08/10/hadoop-installation-on-windows-without-cygwin-in-10-mints/
I was able to install. But when I tried to execute the examples, I encountered the errors below.
Command executed :
C:\Users\hadoop\hadoop-2.7.1\bin\hadoop.cmd jar C:\Users\hadoop\hadoop-2.7.1\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.7.1.jar wordcount /hadoop/input /hadoop/output
Error :
C:\Users\hadoop\hadoop-2.7.1>C:\Users\hadoop\hadoop-2.7.1\bin\hadoop.cmd jar C:\Users\hadoop\hadoop-2.7.1\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.7.1.jar wordcount /hadoop/input /hadoop/output
16/11/14 17:05:28 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:0000
16/11/14 17:05:30 INFO input.FileInputFormat: Total input paths to process : 3
16/11/14 17:05:30 INFO mapreduce.JobSubmitter: number of splits:3
16/11/14 17:05:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1479122512555_0003
16/11/14 17:05:31 INFO impl.YarnClientImpl: Submitted application application_1479122512555_0003
16/11/14 17:05:32 INFO mapreduce.Job: The url to track the job: http://MachineName:8088/proxy/application_1479122512555_0003/
16/11/14 17:05:32 INFO mapreduce.Job: Running job: job_1479122512555_0003
16/11/14 17:05:36 INFO mapreduce.Job: Job job_1479122512555_0003 running in uber mode : false
16/11/14 17:05:36 INFO mapreduce.Job: map 0% reduce 0%
16/11/14 17:05:36 INFO mapreduce.Job: Job job_1479122512555_0003 failed with state FAILED due to: Application application_1479122512555_0003 failed 2 times due to AM Container for appattempt_1479122512555_0003_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://MachineName:8088/cluster/app/application_1479122512555_0003Then, click on links to logs of each attempt.
Diagnostics: null
Failing this attempt. Failing the application.
16/11/14 17:05:36 INFO mapreduce.Job: Counters: 0
Thanks in advance...
This is actually due to permission issues on some of the YARN local directories. Here is the solution:
Identify the YARN local directories specified by the yarn.nodemanager.local-dirs parameter in yarn-site.xml (here /etc/gphd/hadoop/conf/yarn-site.xml) on the node manager.
Delete the files/folders under usercache in all the directories listed in yarn-site.xml, on all the node managers.
e.g.
rm -rf path/to/yarn/nm/usercache/*
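From a shell, both steps look roughly like this (paths are illustrative; the yarn-site.xml location varies by distribution):
# Find the configured local dirs:
grep -A1 'yarn.nodemanager.local-dirs' /etc/gphd/hadoop/conf/yarn-site.xml
# On every node manager, clear the usercache under each listed directory:
rm -rf /data/yarn/nm-local-dir/usercache/*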

class not found exception in mapreduce

When I tried the below query
[cloudera@localhost ~]$ hadoop jar /home/cloudera/workspace/para.jar word.Paras examples/wordcount examples/output3;
15/11/05 10:13:04 INFO mapred.JobClient: map 0% reduce 0%
15/11/05 10:13:18 INFO mapred.JobClient: Task Id : attempt_201511050944_0005_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class word.Paras$Map not found.
Here, word is the package name and Paras is the class name. I have checked the logs; the output file is created.
How can I fix this issue?
Thanks,
Anbu k
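A common cause of a Class ...$Map not found error is that the inner Map class never made it into the jar, or that the driver does not call job.setJarByClass(Paras.class), so Hadoop cannot ship the classes to the task JVMs. A quick way to rule out the first possibility (jar path and class names taken from the question):
jar tf /home/cloudera/workspace/para.jar | grep Paras
# Expect both word/Paras.class and word/Paras$Map.class in the listing;
# if the inner classes are missing, rebuild the jar so they are included.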

Stuck at command prompt while Running hadoop mapreduce jobs on windows 8

I have gone through detailed videos on installing Hadoop on Windows 8 without Cygwin or alternatives like the Hortonworks sandbox.
My problem is that although everything was done successfully, I am stuck with the dilemma below:
my command prompt gets stuck like this --
Note that I have not yet installed Eclipse Kepler, and I followed this video --
[http://www.codeproject.com/Articles/757934/Apache-Hadoop-for-Windows-Platform]
C:\hworks>hadoop jar c:\hworks\Recipe.jar Recipe /in /out
15/07/23 11:23:13 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/07/23 11:23:16 INFO input.FileInputFormat: Total input paths to process : 1
15/07/23 11:23:18 INFO mapreduce.JobSubmitter: number of splits:1
15/07/23 11:23:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1437627735863_0001
15/07/23 11:23:19 INFO impl.YarnClientImpl: Submitted application application_1437627735863_0001
15/07/23 11:23:19 INFO mapreduce.Job: The url to track the job: http://SkyneT:8088/proxy/application_1437627735863_0001/
15/07/23 11:23:19 INFO mapreduce.Job: Running job: job_1437627735863_0001
I experienced a similar issue.
You can try these steps, which worked for me:
Open the command prompt as Administrator.
Delete your c:\tmp directory (a new one will be created automatically).
Run \etc\hadoop\hadoop-env.cmd to initialize the environment variables.
Run \bin\hdfs namenode -format
Run \sbin\start-all.cmd
Then try running your job again, and post here if you see any new errors.
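Put together, the sequence looks like this from an elevated prompt (this assumes the commands are run from the Hadoop install directory; adjust the paths to your layout):
cd %HADOOP_HOME%
rmdir /s /q c:\tmp
etc\hadoop\hadoop-env.cmd
bin\hdfs namenode -format
sbin\start-all.cmd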

In Hadoop, How can I find which slave node is executing an attempt N?

I'm using Hadoop 1.2.1, and my Hadoop application fails during the reduce phase. From the Hadoop run I see messages like the following:
15/05/22 18:14:15 INFO mapred.JobClient: map 0% reduce 0%
15/05/22 18:14:25 INFO mapred.JobClient: map 100% reduce 0%
15/05/22 18:24:25 INFO mapred.JobClient: map 0% reduce 0%
15/05/22 18:24:26 INFO mapred.JobClient: Task Id : attempt_201505221804_0013_m_000000_0, Status : FAILED
Task attempt_201505221804_0013_m_000000_0 failed to report status for 600 seconds. Killing!
15/05/22 18:24:35 INFO mapred.JobClient: map 100% reduce 0%
I'd like to see the log of attempt_201505221804_0013_m_000000_0, but it is too time-consuming to find out which slave executed it.
Someone told me to use the Hadoop web pages to find it, but there is a firewall on this cluster and I can't change that, because the cluster is not owned by our group.
Is there any way to find out where this attempt was executed?
You should be able to find this information in the JobTracker logs, which are by default under HADOOP_HOME/logs. They contain entries looking similar to this:
INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201503262103_0001_m_000000_0' to tip task_201503262103_0001_m_000000, for tracker 'host'
You can search the file for the specific attempt id.
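For example, a grep over the JobTracker log pulls out the tracker that ran the attempt (the file name below follows the usual hadoop-<user>-jobtracker-<host>.log pattern and may differ on your cluster):
grep 'attempt_201505221804_0013_m_000000_0' $HADOOP_HOME/logs/hadoop-*-jobtracker-*.log
# The matching 'Adding task ... for tracker' line names the slave host.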

hadoop MapReduce Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out

I am facing the error below while trying to run a MapReduce job with more than one input file, although I am able to run the job with only one input file.
I went through some posts, and almost everyone says there is a firewall issue, or the hostnames are not set up properly in the /etc/hosts file.
Even if that were the case, my MapReduce job would fail whether the input is a single file or a directory (multiple files).
Below is the output from console.
INFO input.FileInputFormat: Total input paths to process : 2
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
WARN snappy.LoadSnappy: Snappy native library not loaded
INFO mapred.JobClient: Running job: job_201505201700_0005
INFO mapred.JobClient: map 0% reduce 0%
INFO mapred.JobClient: map 50% reduce 0%
INFO mapred.JobClient: map 100% reduce 0%
INFO mapred.JobClient: map 100% reduce 16%
INFO mapred.JobClient: map 100% reduce 0%
INFO mapred.JobClient: Task Id : attempt_201505201700_0005_r_000000_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
WARN mapred.JobClient: Error reading task outputAMR-DEV02.local
WARN mapred.JobClient: Error reading task outputAMR-DEV02.local
INFO mapred.JobClient: map 100% reduce 16%
INFO mapred.JobClient: map 100% reduce 0%
INFO mapred.JobClient: Task Id : attempt_201505201700_0005_r_000000_1, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
WARN mapred.JobClient: Error reading task outputEmbeddedQASrv.local
WARN mapred.JobClient: Error reading task outputEmbeddedQASrv.local
INFO mapred.JobClient: map 100% reduce 16%
Note: EmbeddedQASrv.local (IP address 192.168.115.80) and AMR-DEV02.local (IP address 192.168.115.79) are my slave node hostnames.
My Hadoop cluster consists of 1 master and 2 slaves.
This is the command I am running from the console (emp_dept_data is a directory containing the empData and deptData files):
hadoop jar testdata/joindevice.jar JoinDevice emp_dept_data output15
However, if I run this command with a single file as input, the MapReduce job succeeds:
hadoop jar testdata/joindevice.jar JoinDevice emp_dept_data/empData output16
Here is my /etc/hosts file as set up on the master node. The same entries were also copied to my slave nodes.
127.0.0.1 amr-dev01.local amr-dev01 localhost
::1 localhost6.localdomain6 localhost6
#Hadoop Configurations
192.168.115.78 master
192.168.115.79 slave01
192.168.115.80 slave02
I am clueless as to what is wrong and where to check for the exact root cause.
The actual problem was with the /etc/hosts file. I commented out my localhost configuration:
#127.0.0.1 amr-dev01.local amr-dev01 localhost
and, instead of specifying different names like master, slave01, slave02, I used the actual hostnames:
192.168.115.78 amr-dev01
192.168.115.79 amr-dev02
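A quick way to verify the fix (hostnames taken from the question; 50060 is the default Hadoop 1.x TaskTracker HTTP port that reducers fetch map output from):
# On each node, every hostname should resolve to its LAN address, not 127.0.0.1:
getent hosts amr-dev01 amr-dev02
# And the fetch port should be reachable from the other nodes:
curl -s -o /dev/null http://amr-dev02:50060/ && echo reachable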
