Hadoop job not working

I'm following these instructions to run Hadoop:
http://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_(Single-Node_Cluster)
However, I couldn't get this command to work:
hadoop-*/bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
All I get is:
Exception in thread "main" java.io.IOException: Error opening job jar: /Users/hadoop/hadoop-1.0.1/hadoop-examples-1.0.1.jargrep
at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:127)
at java.util.jar.JarFile.<init>(JarFile.java:135)
at java.util.jar.JarFile.<init>(JarFile.java:72)
at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
I added this to my hadoop-env.sh:
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
but I still get the same error.
Any clue, guys?

When you run the following command:
hadoop-*/bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
grep is the Hadoop example program being run;
input is the folder where your source data lives, which you should already have created in HDFS;
output is the folder that will be created to hold the results;
'dfs[a-z.]+' is the regular expression passed to the grep program.
Because the error message shows the jar name fused with "grep" (hadoop-examples-1.0.1.jargrep), it looks like the example jar isn't being resolved correctly when the Hadoop command runs. You would need to check that first, and also verify that the regular expression applies to your input data.

I know this is old, but in case anyone else has the same problem and sees this SO question, I want to put up what I did to solve this, as it's very simple.
It looks like it's a typo in the example's instructions. If you look in the Hadoop distribution directory you will notice that the example file being referred to is called hadoop-examples-1.0.4.jar, or whatever version you are using.
So instead of:
hadoop-*/bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
try:
bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
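If in doubt, you can confirm the jar's actual name and location before running the job; a minimal check, assuming the Hadoop 1.x layout used above:
ls hadoop-*/hadoop-examples-*.jar
# e.g. hadoop-1.0.4/hadoop-examples-1.0.4.jar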

Related

Installing and setting up hadoop 2.7.2 in stand-alone mode

I'm installing Hadoop now using the following link:
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation
I have a question about installing and setting up the Hadoop platform in stand-alone mode.
First, to create the input files for standalone operation, the site gives the following commands:
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
$ cat output/*
What is this processing? Is it running an example?
When I issue those commands, I get the error displayed in the image below:
What is the problem?
What is this processing? Is it running an example?
Those commands don't process anything substantial; they just execute a predefined example shipped with the Hadoop jar file, to confirm that you have installed and configured the setup properly.
Assuming you were in the directory "/" while executing the commands:
1) $ mkdir input : creates a directory called input under the root directory /
2) $ cp etc/hadoop/*.xml input : copies the Hadoop conf files (*.xml) from /etc/hadoop to /input
3) $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+' :
executes a built-in example class shipped with the Hadoop libraries. This example extracts the parameters starting with dfs from all the Hadoop xml conf files under the directory /input and writes the result to the directory /output (created implicitly by Hadoop as part of the execution).
4) $ cat output/* : prints the contents of every file under the directory /output to the terminal.
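As a reference point, with the stock 2.7.2 configuration files the job usually finds a single matching property, so step 4 prints something like this (the exact output depends on your conf files; this is just an illustrative run):
$ cat output/*
1       dfsadmin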
What is the problem?
The problem you are facing here is the input path: the relative path is ambiguous and was not resolved by Hadoop. Make sure you are running Hadoop in standalone mode, and then execute the example with absolute paths (for both input and output) as follows:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep /input /output 'dfs[a-z.]+'
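The result can then be printed as before, adjusting the path to the absolute output directory:
$ cat /output/*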

Could you please tell me where I would find the output of the MapReduce program wordmedian in Hadoop? Is it stored in a directory in HDFS?

Is this the right command?
hadoop jar /usr/jars/hadoop-examples.jar wordmedian [input.txt] out
The format for submitting a job is
$ bin/hadoop jar wc.jar WordCount input output
So when you submit any job like the above, you need to specify your output directory along with the input directory.
Yes, the final output of the job will be saved in HDFS, in the directory you specified.
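For example, once the job finishes you can list and read the result directly from HDFS; a minimal sketch, assuming the output directory was named out as in the format above:
$ hadoop fs -ls out                  # shows _SUCCESS plus part files such as part-r-00000
$ hadoop fs -cat out/part-r-00000    # prints the job's output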

What exactly does this Hadoop command executed in the shell do?

I am absolutely new to Apache Hadoop and I am following a video course.
I have correctly installed Hadoop 1.2.1 on a Linux Ubuntu virtual machine.
After the installation, the instructor performs this command in the shell:
bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
to show that Hadoop is working and correctly installed.
But what exactly does this command do?
This command runs the grep job defined inside the Hadoop examples jar file (which contains the map, reduce, and driver code). input is the folder in HDFS to search, output is the folder where the results of the search are written, and dfs[a-z.]+ is the regular expression you are telling grep to look for in the input.
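To see the result of that check, you can read the output folder back from HDFS; a small sketch, assuming the command above completed successfully:
bin/hadoop fs -cat output/*
# each line holds a count and a matched string, e.g. "1  dfsadmin" with the default conf files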

How to execute fs commands in Hadoop Pig

I want to get the output files from HDFS to my local storage, so I ran this code in my Pig script:
Fs -get user/miner/adhoc/results/mine1.txt /home/miner/jeweler/results
Unfortunately, executing the code returns ERROR 2997: encountered IOException.
I also saw: default bootup file /var/lib/hadoop-yarn/.pigbootup not found
Do I need to import something, or do I need to set certain properties in my Pig script?
It seems your path is incorrect, which causes the IOException: the root slash is missing. The correct path is /user/miner/adhoc/results/mine1.txt
You can try this also:
fs -copyToLocal /user/miner/adhoc/results/mine1.txt /home/miner/jeweler/results
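For completeness, the same command can live inside the Pig script itself; a minimal sketch using the paths from the question (note the lowercase fs and the trailing semicolon):
-- run an HDFS shell command from within a Pig Latin script
fs -copyToLocal /user/miner/adhoc/results/mine1.txt /home/miner/jeweler/results;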

Error in generating Behemoth corpus

I am new to Hadoop and Behemoth, and I followed the tutorial at https://github.com/DigitalPebble/behemoth/wiki/tutorial to generate a Behemoth corpus for a text document, using the following command:
sudo bin/hadoop jar /home/madhumita/behemoth/core/target/behemoth-core-*-job.jar com.digitalpebble.behemoth.util.CorpusGenerator -i /home/madhumita/Documents/testFile -o /home/madhumita/behemoth/testGateOpCorpus
I am getting the error:
ERROR util.CorpusGenerator: Input does not exist : /home/madhumita/Documents/testFile
every time I run the command, though I have checked with gedit that the path is correct. I searched online for any similar issues, but I could not find any.
Any ideas as to why it may be happening? If .txt file format is not acceptable, what is the required file format?
Okay, I managed to solve the problem. The input path required was the path to the file on the Hadoop distributed file system, not on the local machine.
So first I copied the local file to /docs/test.txt on HDFS and gave that path as the input parameter. The commands are as follows:
sudo bin/hadoop fs -copyFromLocal /home/madhumita/Documents/testFile/test.txt /docs/test.txt
sudo bin/hadoop jar /home/madhumita/behemoth/core/target/behemoth-core-*-job.jar com.digitalpebble.behemoth.util.CorpusGenerator -i /docs/test.txt -o /docs/behemoth/test
This solves the issue. Thanks to everyone who tried to solve the problem.
To generate a Behemoth corpus directly from the local filesystem, refer to the input using the file protocol (file:///):
hadoop jar core/target/behemoth-core-*-job.jar com.digitalpebble.behemoth.util.CorpusGenerator -i "file:///home/madhumita/Documents/testFile/test.txt" -o "/docs/behemoth/test"
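In either case the generated corpus ends up in HDFS, so a plain listing will confirm it was written; the path is taken from the commands above:
hadoop fs -ls /docs/behemoth/test    # verify the corpus files were created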
