Hadoop: Input path does not exist

I am trying to get hadoop set up on my laptop. I have followed a few tutorials on setting up hadoop.
I ran this command:
bin/hdfs dfs -mkdir /user/<username>
If I run it again it says already exists.
I try to run the test jar file with this command:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep input output 'dfs[a-z.]+'
and receive this exception
16/01/22 15:11:06 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/<username>/.staging/job_1453492366595_0006
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/<username>/grep-temp-891167560
I did not realize at first that I receive this output before the error above:
16/01/22 15:51:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/22 15:51:51 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/01/22 15:51:51 INFO input.FileInputFormat: Total input paths to process : 33
16/01/22 15:51:52 INFO mapreduce.JobSubmitter: number of splits:33
16/01/22 15:51:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1453492366595_0009
16/01/22 15:51:52 INFO impl.YarnClientImpl: Submitted application application_1453492366595_0009
16/01/22 15:51:52 INFO mapreduce.Job: The url to track the job: http://Marys-MacBook-Pro.local:8088/proxy/application_1453492366595_0009/
16/01/22 15:51:52 INFO mapreduce.Job: Running job: job_1453492366595_0009
16/01/22 15:51:56 INFO mapreduce.Job: Job job_1453492366595_0009 running in uber mode : false
16/01/22 15:51:56 INFO mapreduce.Job: map 0% reduce 0%
16/01/22 15:51:56 INFO mapreduce.Job: Job job_1453492366595_0009 failed with state FAILED due to: Application application_1453492366595_0009 failed 2 times due to AM Container for appattempt_1453492366595_0009_000002 exited with exitCode: 127
For more detailed output, check application tracking page:http://Marys-MacBook-Pro.local:8088/cluster/app/application_1453492366595_0009Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1453492366595_0009_02_000001
Exit code: 127
Stack trace: ExitCodeException exitCode=127:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 127
Failing this attempt. Failing the application.
A stack trace follows this.
I am on a Mac.

I use Hadoop 2.7.2, and I also ran into this problem at first while following the official docs.
The reason was that I had skipped the "Prepare to Start the Hadoop Cluster" section.
I solved it by setting JAVA_HOME in etc/hadoop/hadoop-env.sh.
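Exit code 127 is what a shell returns for "command not found", which fits a container launch script that cannot find java. Below is a minimal sketch of a pre-flight check, assuming a POSIX shell. check_java_home is a hypothetical helper, not part of Hadoop, and the JDK path in the comment is a placeholder.

```shell
# Line to add to etc/hadoop/hadoop-env.sh (the path is a placeholder, not a real default):
#   export JAVA_HOME=/path/to/your/jdk
# On macOS, running /usr/libexec/java_home prints the path of the default JDK.

# Hypothetical helper (not part of Hadoop): verify that a JAVA_HOME value is usable.
check_java_home() {
  if [ -z "$1" ]; then
    echo "JAVA_HOME is not set"
    return 1
  fi
  if [ ! -x "$1/bin/java" ]; then
    echo "no executable java under $1/bin"
    return 1
  fi
  echo "JAVA_HOME looks usable: $1"
}

# Check the value from the current environment before starting YARN.
check_java_home "${JAVA_HOME:-}" || echo "fix etc/hadoop/hadoop-env.sh before starting YARN"
```

The daemons read hadoop-env.sh when they start, so they presumably need a restart after the file is changed.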

For me, the cause was using the wrong JDK version with Hadoop. I used Hadoop 2.6.5. At first I started Hadoop with Oracle JDK 1.8.0_131, ran the example jar, and the error occurred. After I switched to JDK 1.7.0_80, the example worked like a charm.
There is a page about this: HadoopJavaVersions.

Related

Hadoop Installation in Windows 7

I am working on a Hadoop installation on Windows 7.
I tried to untar the tar files from the Apache site, but that was unsuccessful.
I searched the internet and found the link below.
http://toodey.com/2015/08/10/hadoop-installation-on-windows-without-cygwin-in-10-mints/
With it I was able to install. But when I tried to execute the examples, I ran into the errors below.
Command executed:
C:\Users\hadoop\hadoop-2.7.1\bin\hadoop.cmd jar C:\Users\hadoop\hadoop-2.7.1\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.7.1.jar wordcount /hadoop/input /hadoop/output
Error:
C:\Users\hadoop\hadoop-2.7.1>C:\Users\hadoop\hadoop-2.7.1\bin\hadoop.cmd jar C:\Users\hadoop\hadoop-2.7.1\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.7.1.jar wordcount /hadoop/input /hadoop/output
16/11/14 17:05:28 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:0000
16/11/14 17:05:30 INFO input.FileInputFormat: Total input paths to process : 3
16/11/14 17:05:30 INFO mapreduce.JobSubmitter: number of splits:3
16/11/14 17:05:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1479122512555_0003
16/11/14 17:05:31 INFO impl.YarnClientImpl: Submitted application application_1479122512555_0003
16/11/14 17:05:32 INFO mapreduce.Job: The url to track the job: http://MachineName:8088/proxy/application_1479122512555_0003/
16/11/14 17:05:32 INFO mapreduce.Job: Running job: job_1479122512555_0003
16/11/14 17:05:36 INFO mapreduce.Job: Job job_1479122512555_0003 running in uber mode : false
16/11/14 17:05:36 INFO mapreduce.Job: map 0% reduce 0%
16/11/14 17:05:36 INFO mapreduce.Job: Job job_1479122512555_0003 failed with state FAILED due to: Application application_1479122512555_0003 failed 2 times due to AM Container for appattempt_1479122512555_0003_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://MachineName:8088/cluster/app/application_1479122512555_0003Then, click on links to logs of each attempt.
Diagnostics: null
Failing this attempt. Failing the application.
16/11/14 17:05:36 INFO mapreduce.Job: Counters: 0
Thanks in advance...
This is due to permission issues on some of the YARN local directories. Here is the solution:
Identify the YARN directories set by the yarn.nodemanager.local-dirs parameter in /etc/gphd/hadoop/conf/yarn-site.xml on the YARN NodeManager.
Delete the files and folders under usercache in every directory listed there, on all the NodeManagers.
For example:
rm -rf path/to/yarn/nm/usercache/*
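The cleanup described above can be sketched as a small shell function. This is only a sketch: clean_usercache is a hypothetical helper, and the directories passed to it are assumed to be the entries of yarn.nodemanager.local-dirs, copied by hand from yarn-site.xml. Stopping the NodeManager first is presumably safer, although the answer does not say so.

```shell
# Hypothetical helper: empty the usercache directory under each NodeManager local dir.
# Arguments: the directories listed in yarn.nodemanager.local-dirs (copied by hand).
clean_usercache() {
  for dir in "$@"; do
    if [ -d "$dir/usercache" ]; then
      # Delete everything below usercache but keep the usercache directory itself.
      find "$dir/usercache" -mindepth 1 -delete
      echo "cleaned $dir/usercache"
    else
      echo "skipped $dir (no usercache directory)"
    fi
  done
}

# Usage (placeholder path taken from the example above):
#   clean_usercache path/to/yarn/nm
```

Keeping the usercache directory itself and removing only its contents leaves the directory's ownership and mode untouched.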

Hadoop program stuck at "Running job:"

I was running a Hadoop program (wordcount) in the Hortonworks sandbox, and the situation below occurred. I have run this program successfully many times on exactly the same virtual machine, but this time it "failed" without any notification: it just got stuck. I tried other MapReduce programs, with similar results. Normally the output continues past Running job... with a line reporting uber mode : false, but this time it does not, for no apparent reason.
[root@sandbox ~]# hadoop jar testWC.jar testWC.WCdriver /data/input/pg103.txt /data/output/WC
WARNING: Use "yarn jar" to launch YARN applications.
16/03/11 19:20:01 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
16/03/11 19:20:01 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
16/03/11 19:20:01 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/03/11 19:20:02 INFO input.FileInputFormat: Total input paths to process : 1
16/03/11 19:20:02 INFO mapreduce.JobSubmitter: number of splits:1
16/03/11 19:20:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1457723341319_0002
16/03/11 19:20:03 INFO impl.YarnClientImpl: Submitted application application_1457723341319_0002
16/03/11 19:20:03 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1457723341319_0002/
16/03/11 19:20:03 INFO mapreduce.Job: Running job: job_1457723341319_0002
The program never gets past this point.

Hadoop 2.6.0 wordcount example not running

I was following the instructions found here and here.
All the web URLs open properly, so I then tried to run the wordcount example.
It went into the ACCEPTED state but never ran.
[root@localhost hadoop-2.6.0]# yarn jar /usr/local/deployment/WordCount.jar input output
14/12/05 19:15:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/05 19:15:22 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/12/05 19:15:22 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/12/05 19:15:23 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
14/12/05 19:15:24 INFO mapred.FileInputFormat: Total input paths to process : 30
14/12/05 19:15:25 INFO mapreduce.JobSubmitter: number of splits:30
14/12/05 19:15:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1417787106330_0001
14/12/05 19:15:26 INFO impl.YarnClientImpl: Submitted application application_1417787106330_0001
14/12/05 19:15:26 INFO mapreduce.Job: The url to track the job: http://local:8088/proxy/application_1417787106330_0001/
14/12/05 19:15:26 INFO mapreduce.Job: Running job: job_1417787106330_0001
The web interface shows the following:
User: root
Name: wordcount
Application Type: MAPREDUCE
Application Tags:
State: ACCEPTED
FinalStatus: UNDEFINED
Can someone tell me a possible reason for this?

Hadoop mapreduce container exited with a non-zero exit code 1

I'm trying to run a Hadoop program that extracts keywords from some abstracts, on Ubuntu. When I run my program with Hadoop, I get the following error.
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO input.FileInputFormat: Total input paths to process : 1
INFO mapreduce.JobSubmitter: number of splits:1
INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1404812840999_0001
INFO impl.YarnClientImpl: Submitted application application_1404812840999_0001
INFO mapreduce.Job: The url to track the job: http://shiva-VirtualBox:8088/proxy/application_1404812840999_0001/
INFO mapreduce.Job: Running job: job_1404812840999_0001
INFO mapreduce.Job: Job job_1404812840999_0001 running in uber mode : false
INFO mapreduce.Job: map 0% reduce 0%
INFO mapreduce.Job: Job job_1404812840999_0001 failed with state FAILED due to: Application application_1404812840999_0001 failed 2 times due to AM Container for appattempt_1404812840999_0001_000002 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Container exited with a non-zero exit code 1
.Failing this attempt.. Failing the application.
14/07/08 14:21:44 INFO mapreduce.Job: Counters: 0
What's the cause of this error?
Note that I converted my MapReduce project to a Maven project in order to use the Lucene library in my code.
Is your ResourceManager really at 0.0.0.0:8032? It also seems you are not using ToolRunner, so try rewriting your MapReduce driver along the lines of "Hadoop: Implementing the Tool interface for MapReduce driver".
Hope it helps.
The number of threads increased, and the JVM memory and CPU are fully utilised. Increase the JVM heap size and the memory limit of the mapper and reducer tasks:
conf.set("mapreduce.map.memory.mb", "4096");
conf.set("mapreduce.map.java.opts", "-Xmx3500m");
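The same two settings can also be passed on the command line as -D generic options. Hadoop only parses these when the driver goes through ToolRunner, which is what the "command-line option parsing not performed" warning in the log refers to. Below is a minimal shell sketch. map_memory_opts is a hypothetical helper, and the 85% heap-to-container ratio is an assumption that roughly mirrors the 3500m / 4096 MB pair above, leaving headroom for non-heap JVM memory.

```shell
# Hypothetical helper: print -D options for a map container of the given size in MB.
# Assumption: the JVM heap is set to about 85% of the container size.
map_memory_opts() {
  container_mb="$1"
  heap_mb=$(( container_mb * 85 / 100 ))
  echo "-Dmapreduce.map.memory.mb=${container_mb} -Dmapreduce.map.java.opts=-Xmx${heap_mb}m"
}

# Print the options for a 4096 MB map container.
map_memory_opts 4096

# Usage with a ToolRunner-based driver (jar and class names are placeholders):
#   yarn jar myjob.jar MyDriver $(map_memory_opts 4096) input output
```

If the heap is set equal to or above the container size, YARN may kill the container for exceeding its memory limit, so the heap is kept below it.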

Hadoop error in shuffle in fetcher: Exceeded MAX_FAILED_UNIQUE_FETCHES

I am new to Hadoop. I have a Kerberos-secured Hadoop cluster (a master and one slave) set up in VirtualBox. I am trying to run the 'pi' job from the Hadoop examples. The job terminates with the error Exceeded MAX_FAILED_UNIQUE_FETCHES. I searched for this error, but the solutions I found on the internet do not seem to work for me. Perhaps I am missing something obvious. I even tried removing the slave from the etc/hadoop/slaves file to see whether the job could run on the master alone, but that fails with the same error. The log is below. I am running this on 64-bit Ubuntu 14.04 in VirtualBox. Any help appreciated.
montauk@montauk-vmaster:/usr/local/hadoop$ sudo -u yarn bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar pi 2 10
Number of Maps = 2
Samples per Map = 10
OpenJDK 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/06/05 12:04:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
14/06/05 12:04:49 INFO client.RMProxy: Connecting to ResourceManager at /192.168.0.29:8040
14/06/05 12:04:50 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 17 for yarn on 192.168.0.29:54310
14/06/05 12:04:50 INFO security.TokenCache: Got dt for hdfs://192.168.0.29:54310; Kind: HDFS_DELEGATION_TOKEN, Service: 192.168.0.29:54310, Ident: (HDFS_DELEGATION_TOKEN token 17 for yarn)
14/06/05 12:04:50 INFO input.FileInputFormat: Total input paths to process : 2
14/06/05 12:04:51 INFO mapreduce.JobSubmitter: number of splits:2
14/06/05 12:04:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1401975262053_0007
14/06/05 12:04:51 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: 192.168.0.29:54310, Ident: (HDFS_DELEGATION_TOKEN token 17 for yarn)
14/06/05 12:04:53 INFO impl.YarnClientImpl: Submitted application application_1401975262053_0007
14/06/05 12:04:53 INFO mapreduce.Job: The url to track the job: http://montauk-vmaster:8088/proxy/application_1401975262053_0007/
14/06/05 12:04:53 INFO mapreduce.Job: Running job: job_1401975262053_0007
14/06/05 12:05:29 INFO mapreduce.Job: Job job_1401975262053_0007 running in uber mode : false
14/06/05 12:05:29 INFO mapreduce.Job: map 0% reduce 0%
14/06/05 12:06:04 INFO mapreduce.Job: map 50% reduce 0%
14/06/05 12:06:06 INFO mapreduce.Job: map 100% reduce 0%
14/06/05 12:06:34 INFO mapreduce.Job: map 100% reduce 100%
14/06/05 12:06:34 INFO mapreduce.Job: Task Id : attempt_1401975262053_0007_r_000000_0, Status : FAILED
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#4
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)
I ran into the same problem when I installed CDH 5.1.0 with Kerberos security from the tarball. The solutions I found via Google blame insufficient memory, but I don't think that applies to me, since my input is very small (52K).
After digging for several days, I found the cause in this link.
To sum up, the solutions in that link are:
Add the following property to yarn-site.xml, even though it is already the default in yarn-default.xml:
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Remove the yarn.nodemanager.local-dirs property so that the default location under /tmp is used. Then run the following commands:
mkdir -p /tmp/hadoop-yarn/nm-local-dir
chown yarn:yarn /tmp/hadoop-yarn/nm-local-dir
To summarize the problem:
After the yarn.nodemanager.local-dirs property is set, the yarn.nodemanager.aux-services.mapreduce_shuffle.class property from yarn-default.xml no longer takes effect.
I have not found the underlying reason for that.
I had the same issue. My MapReduce job had no reducer, and I solved it by calling job.setNumReduceTasks(0);
Change the property below in yarn-site.xml and create the directory:
yarn.nodemanager.local-dirs=/tmp
mkdir -p /tmp/hadoop-yarn/nm-local-dir
chown yarn:yarn /tmp/hadoop-yarn/nm-local-dir
Tune the resource properties in mapred-site.xml:
mapreduce.reduce.shuffle.input.buffer.percent=0.50
mapreduce.reduce.shuffle.memory.limit.percent=0.2
mapreduce.reduce.shuffle.parallelcopies=4
Restart the ResourceManager and NodeManager on their respective nodes.
