Stuck at command prompt while running Hadoop MapReduce jobs on Windows 8 - hadoop

I have gone through the detailed installation videos for installing Hadoop on Windows 8 without Cygwin or anything else like the Hortonworks sandbox.
Although everything was done successfully, I am running into the dilemma below:
my command prompt gets stuck like this --
Note that I have not yet installed Eclipse Kepler; I followed this video --
[http://www.codeproject.com/Articles/757934/Apache-Hadoop-for-Windows-Platform]
C:\hworks>hadoop jar c:\hworks\Recipe.jar Recipe /in /out
15/07/23 11:23:13 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/07/23 11:23:16 INFO input.FileInputFormat: Total input paths to process : 1
15/07/23 11:23:18 INFO mapreduce.JobSubmitter: number of splits:1
15/07/23 11:23:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1437627735863_0001
15/07/23 11:23:19 INFO impl.YarnClientImpl: Submitted application application_1437627735863_0001
15/07/23 11:23:19 INFO mapreduce.Job: The url to track the job: http://SkyneT:8088/proxy/application_1437627735863_0001/
15/07/23 11:23:19 INFO mapreduce.Job: Running job: job_1437627735863_0001

I experienced a similar issue.
You can try these steps, which worked for me:
Open the command prompt as Administrator.
Delete your c:\tmp directory (a new one will be created automatically).
Run \etc\hadoop\hadoop-env.cmd to initialize the environment variables.
Run \bin\hdfs namenode -format
Run \sbin\start-all.cmd
Then try running your job again, and post here if you see any new errors.

Related

Hadoop Installation in Windows 7

I am working on a Hadoop installation in Windows 7.
I tried to untar the tar files from the Apache site, but that was unsuccessful.
I searched the internet and found the link below:
http://toodey.com/2015/08/10/hadoop-installation-on-windows-without-cygwin-in-10-mints/
With it I was able to install. But when I tried to execute the examples, I ran into the errors below.
Command executed :
C:\Users\hadoop\hadoop-2.7.1\bin\hadoop.cmd jar C:\Users\hadoop\hadoop-2.7.1\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.7.1.jar wordcount /hadoop/input /hadoop/output
Error :
C:\Users\hadoop\hadoop-2.7.1>C:\Users\hadoop\hadoop-2.7.1\bin\hadoop.cmd jar C:\Users\hadoop\hadoop-2.7.1\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.7.1.jar wordcount /hadoop/input /hadoop/output
16/11/14 17:05:28 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:0000
16/11/14 17:05:30 INFO input.FileInputFormat: Total input paths to process : 3
16/11/14 17:05:30 INFO mapreduce.JobSubmitter: number of splits:3
16/11/14 17:05:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1479122512555_0003
16/11/14 17:05:31 INFO impl.YarnClientImpl: Submitted application application_1479122512555_0003
16/11/14 17:05:32 INFO mapreduce.Job: The url to track the job: http://MachineName:8088/proxy/application_1479122512555_0003/
16/11/14 17:05:32 INFO mapreduce.Job: Running job: job_1479122512555_0003
16/11/14 17:05:36 INFO mapreduce.Job: Job job_1479122512555_0003 running in uber mode : false
16/11/14 17:05:36 INFO mapreduce.Job: map 0% reduce 0%
16/11/14 17:05:36 INFO mapreduce.Job: Job job_1479122512555_0003 failed with state FAILED due to: Application application_1479122512555_0003 failed 2 times due to AM Container for appattempt_1479122512555_0003_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://MachineName:8088/cluster/app/application_1479122512555_0003Then, click on links to logs of each attempt.
Diagnostics: null
Failing this attempt. Failing the application.
16/11/14 17:05:36 INFO mapreduce.Job: Counters: 0
Thanks in advance...
Actually, this is due to permission issues on some of the YARN local directories. So here is the solution:
Identify the YARN directories specified by the yarn.nodemanager.local-dirs parameter in /etc/gphd/hadoop/conf/yarn-site.xml on the YARN node manager.
Delete the files/folders under usercache in all the directories listed in yarn-site.xml, on all the node managers.
e.g.
rm -rf path/to/yarn/nm/usercache/*
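The two steps above can be sketched as a small script. This is only a sketch under assumptions: the config path and the adjacent-line XML layout (`<name>` immediately followed by `<value>`) are taken from the answer above, and `clean_usercache` is a name I am introducing; adjust both to your cluster.

```shell
# clean_usercache CONF: read yarn.nodemanager.local-dirs from the given
# yarn-site.xml and remove everything under each <dir>/usercache.
# Assumes the <name> and <value> elements sit on adjacent lines.
clean_usercache() {
    conf="$1"
    dirs=$(grep -A1 '<name>yarn.nodemanager.local-dirs</name>' "$conf" \
        | sed -n 's:.*<value>\(.*\)</value>.*:\1:p')
    # The property may hold a comma-separated list of directories.
    for d in $(printf '%s\n' "$dirs" | tr ',' ' '); do
        rm -rf "$d"/usercache/*   # YARN repopulates usercache on the next job
    done
}

# Run on each node manager (config path assumed from the answer above):
# clean_usercache /etc/gphd/hadoop/conf/yarn-site.xml
```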

wordcount not running in Cloudera

I have installed Cloudera 5.8 on a Linux RHEL 7.2 instance in Amazon EC2. I have logged in with SSH and am trying to run the wordcount example to test MapReduce with the following command:
hadoop jar /opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount archivo.txt output
The problem is that the wordcount program blocks and produces no output. Only the following is printed:
16/08/11 13:10:02 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-22-226.ec2.internal/172.31.22.226:8032
16/08/11 13:10:03 INFO input.FileInputFormat: Total input paths to process : 1
16/08/11 13:10:03 INFO mapreduce.JobSubmitter: number of splits:1
16/08/11 13:10:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470929244097_0007
16/08/11 13:10:04 INFO impl.YarnClientImpl: Submitted application application_1470929244097_0007
16/08/11 13:10:04 INFO mapreduce.Job: The url to track the job: http://ip-172-31-22-226.ec2.internal:8088/proxy/application_1470929244097_0007/
16/08/11 13:10:04 INFO mapreduce.Job: Running job: job_1470929244097_0007
It then hangs at "Running job", and I have to press Ctrl+C to get the prompt back; no output is produced.
Does anyone know why? I think it is probably a configuration issue, and I am new to DataNodes and so on.
Thanks a lot.
It looks like there are no resources available (map or reduce slots) and the job is waiting for resources. You can check the job status at:
http://ip-172-31-22-226.ec2.internal:8088
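A job that sits at "Running job:" usually means no NodeManager is offering containers. Besides the web UI, a quick check from the shell is to count healthy nodes in `yarn node -list` output. The parser below is a sketch, and it assumes the usual column layout in which each node row carries a RUNNING/LOST/UNHEALTHY state:

```shell
# count_running_nodes: count NodeManagers reported in the RUNNING state.
# Reads `yarn node -list` output on stdin; the header line mentions
# "Number-of-Running-Containers" but not the all-caps RUNNING token,
# so only node rows are counted.
count_running_nodes() {
    awk '/RUNNING/ { n++ } END { print n + 0 }'
}

# On the cluster: yarn node -list 2>/dev/null | count_running_nodes
# If this prints 0, the job has nowhere to run; check the NodeManager logs.
```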

Running Hadoop MapReduce word count for the first time fails?

When running the Hadoop word count example the first time it fails. Here's what I'm doing:
Format namenode: $HADOOP_HOME/bin/hdfs namenode -format
Start HDFS/YARN:
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager
Run wordcount: hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount input output
(let's say the input folder is already in HDFS; I'm not going to list every single command here)
Output:
16/07/17 01:04:34 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/172.20.0.2:8032
16/07/17 01:04:35 INFO input.FileInputFormat: Total input paths to process : 2
16/07/17 01:04:35 INFO mapreduce.JobSubmitter: number of splits:2
16/07/17 01:04:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1468688654488_0001
16/07/17 01:04:36 INFO impl.YarnClientImpl: Submitted application application_1468688654488_0001
16/07/17 01:04:36 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1468688654488_0001/
16/07/17 01:04:36 INFO mapreduce.Job: Running job: job_1468688654488_0001
16/07/17 01:04:46 INFO mapreduce.Job: Job job_1468688654488_0001 running in uber mode : false
16/07/17 01:04:46 INFO mapreduce.Job: map 0% reduce 0%
Terminated
And then HDFS crashes, so I can't access http://localhost:50070/.
Then I restart everything (repeat step 2), rerun the example, and everything is fine.
How can I fix it for the first run? My HDFS obviously has no data the first time around; maybe that's the problem?
UPDATE:
Running an even simpler example fails as well:
hadoop#8f98bf86ceba:~$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples*.jar pi 3 3
Number of Maps = 3
Samples per Map = 3
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Starting Job
16/07/17 03:21:28 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/172.20.0.3:8032
16/07/17 03:21:29 INFO input.FileInputFormat: Total input paths to process : 3
16/07/17 03:21:29 INFO mapreduce.JobSubmitter: number of splits:3
16/07/17 03:21:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1468696855031_0001
16/07/17 03:21:31 INFO impl.YarnClientImpl: Submitted application application_1468696855031_0001
16/07/17 03:21:31 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1468696855031_0001/
16/07/17 03:21:31 INFO mapreduce.Job: Running job: job_1468696855031_0001
16/07/17 03:21:43 INFO mapreduce.Job: Job job_1468696855031_0001 running in uber mode : false
16/07/17 03:21:43 INFO mapreduce.Job: map 0% reduce 0%
Same problem, HDFS terminates
Your post looks incomplete, so it is hard to deduce what is wrong here. My guess is that hadoop-mapreduce-examples-2.7.2-sources.jar is not what you want; more likely you need hadoop-mapreduce-examples-2.7.2.jar, which contains the .class files rather than the sources.
HDFS has to be restarted once before MapReduce jobs can be run successfully. This is because HDFS creates some data on the first run, and stopping it cleans up that state so MapReduce jobs can then be run through YARN.
So my solution was:
Start Hadoop: $HADOOP_HOME/sbin/start-dfs.sh
Stop Hadoop: $HADOOP_HOME/sbin/stop-dfs.sh
Start Hadoop again: $HADOOP_HOME/sbin/start-dfs.sh
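If the stop/start cycle is really papering over a timing issue (HDFS not fully up when the first job is submitted, which is an assumption on my part), another sketch is to poll a health probe before submitting. The `wait_until` helper below is a hypothetical name; the probe shown in the comment uses the standard `hdfs dfsadmin -safemode get` command:

```shell
# wait_until TRIES CMD...: run CMD once per second until it succeeds,
# giving up (exit status 1) after TRIES failed attempts.
wait_until() {
    tries="$1"; shift
    i=0
    until "$@"; do
        i=$((i+1))
        [ "$i" -ge "$tries" ] && return 1
        sleep 1
    done
}

# Before the first job (assumes $HADOOP_HOME is set):
# wait_until 30 sh -c '"$HADOOP_HOME"/bin/hdfs dfsadmin -safemode get | grep -q OFF'
```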

Hadoop program stuck at "Running job:"

I was running a Hadoop program (wordcount) in the Hortonworks sandbox when the situation below occurred. Notably, this is a program I had run successfully many times on exactly the same virtual machine, but this time it "failed" without any notification and just hung there. I tried other MapReduce programs, with similar results. Normally, the console follows Running job... with running in uber mode : false and the progress lines, but this time it doesn't, for no apparent reason.
[root#sandbox ~]# hadoop jar testWC.jar testWC.WCdriver /data/input/pg103.txt /data/output/WC
WARNING: Use "yarn jar" to launch YARN applications.
16/03/11 19:20:01 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
16/03/11 19:20:01 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
16/03/11 19:20:01 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/03/11 19:20:02 INFO input.FileInputFormat: Total input paths to process : 1
16/03/11 19:20:02 INFO mapreduce.JobSubmitter: number of splits:1
16/03/11 19:20:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1457723341319_0002
16/03/11 19:20:03 INFO impl.YarnClientImpl: Submitted application application_1457723341319_0002
16/03/11 19:20:03 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1457723341319_0002/
16/03/11 19:20:03 INFO mapreduce.Job: Running job: job_1457723341319_0002
The program just could not move on anymore.

Job submitting but map reduce not working

I tried to run the example program that ships with Hadoop, but I have not succeeded in getting any output.
I have included my logs below. Please help me solve the issue.
hdfs#localhost:~$ hadoop jar '/opt/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar' wordcount /README.txt /ooo
15/08/21 09:48:26 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8050
15/08/21 09:48:28 INFO input.FileInputFormat: Total input paths to process : 1
15/08/21 09:48:28 INFO mapreduce.JobSubmitter: number of splits:1
15/08/21 09:48:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1440130528838_0001
15/08/21 09:48:29 INFO impl.YarnClientImpl: Submitted application application_1440130528838_0001
15/08/21 09:48:29 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1440130528838_0001/
15/08/21 09:48:29 INFO mapreduce.Job: Running job: job_1440130528838_0001
The MapReduce job seems to be running, and no error logs appear.
1/ Can you give more detail from your logs?
2/ Was your output folder /ooo created? If yes, what are its contents?
3/ Please verify that your input file is not empty.
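The second and third checks can be scripted. This is a sketch: `check_wordcount_io` and the `DFS` override are hypothetical names I am introducing so the logic can be exercised without a live cluster, while `hdfs dfs -test -s` (succeeds only for an existing, non-empty path) and `-ls` are standard HDFS shell commands:

```shell
# check_wordcount_io IN OUT: fail if the input is missing or empty, and
# report whether the output directory exists yet. DFS defaults to the real
# HDFS client but can be overridden (e.g. with a stub for a dry run).
DFS="${DFS:-hdfs dfs}"
check_wordcount_io() {
    in="$1"; out="$2"
    # -test -s: true only if the path exists and is non-empty
    $DFS -test -s "$in" || { echo "input $in missing or empty"; return 1; }
    $DFS -ls "$out" 2>/dev/null || echo "output $out not created yet"
}

# On the cluster: check_wordcount_io /README.txt /ooo
```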
