Example Jar in Hadoop release - hadoop

I am learning Hadoop with the book 'Hadoop in Action' by Chuck Lam. In the first chapter, the book says that the Hadoop installation will have an examples jar, and that running 'hadoop jar hadoop-*-examples.jar' will show all the examples. But when I run the command, it throws the error 'Could not find or load main class org.apache.hadoop.util.RunJar'. My guess is that the installed Hadoop doesn't have the examples jar. I have installed 'hadoop-2.1.0-beta.tar.gz' on Cygwin on a Win 7 laptop. Please suggest how to get the examples jar.

Run the following command:
hadoop jar PathToYourJarFile wordcount inputPath outputPath
You can find the examples jar file in your Hadoop installation directory.
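For a Hadoop 2.x tarball such as hadoop-2.1.0-beta, the examples jar typically sits under share/hadoop/mapreduce rather than at the top level. A minimal sketch, assuming that layout and that HADOOP_HOME points at your unpacked installation:
# Run the wordcount example from a Hadoop 2.x install (file name assumed; check with ls)
cd $HADOOP_HOME
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.1.0-beta.jar wordcount /user/me/input /user/me/output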

What I can suggest here is that you manually go to the Hadoop installation directory and look for a jar with a name similar to hadoop-examples.jar yourself. Different distributions can have different names for the jar.
If you are in Cygwin, while in the Hadoop installation directory you can also do an ls *examples*.jar to find it, narrowing the file listing down to any jar whose name contains the string examples.
You can then use the jar file name directly:
hadoop jar <exampleJarYouFound.jar>
Hope this takes you to a solution.
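If you are not sure where the jar was unpacked, a recursive search from the installation root should also turn it up (a sketch, assuming HADOOP_HOME points at your installation):
# List every jar under the install tree whose name contains "examples"
find $HADOOP_HOME -name "*examples*.jar"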

Related

Is maven JAR runnable on hadoop?

To produce a jar from my Hadoop MapReduce program (the MapReduce wordcount example), I used Maven.
The 'clean' and 'install' goals completed successfully.
It also builds and runs successfully as a Java application when I include the arguments (input and output), and it produces the expected result.
The problem is that it does not run on Hadoop.
It gives the following error:
Exception in thread "main" java.lang.ClassNotFoundException: WordCount
Is maven JAR runnable on hadoop?
Maven is a build tool that creates a Java artifact. Any jar that contains the Hadoop dependencies and names the class with the main() method in its manifest file should work with Hadoop.
Try running your jar with the command below:
hadoop jar your-jar.jar wordcount input output
where wordcount is the name of the class with the main method, and input and output are the arguments.
Two things:
1) I think you are missing the package details before the class name. Copy your package name and put it before the class name, and it should work.
hadoop jar /home/user/examples.jar com.test.examples.WordCount /home/user/inputfolder /home/user/outputfolder
PS: If you are using a jar for which you don't have the source code, you can do
jar -tvf /home/user/examples.jar
and it will print all the classes with their folder names. Replace the "/" with "." (dot) and you get the package name. But this needs the JDK (not the JRE) in the PATH.
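For illustration, a listing might look like the sketch below (the entry shown is invented); the directory part of each .class entry is the package path:
jar -tvf /home/user/examples.jar
# ... 1234 Mon Jul 01 10:00:00 2013 com/test/examples/WordCount.class
# The entry com/test/examples/WordCount.class means the name to pass
# to hadoop jar is com.test.examples.WordCount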
2) You are trying to run a MapReduce program from a Windows prompt. Are you sure you have Hadoop installed on Windows?

Running a hadoop job

This is the first time I'm running a job on Hadoop, and I started from the WordCount example. To run my job, I'm using this command:
hduser#ubuntu:/usr/local/hadoop$ bin/hadoop jar hadoop*examples*.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
and I think we should copy the jar file to /usr/local/hadoop. My first question is: what is the meaning of hadoop*examples*? And if we want to keep our jar file in another location, for example /home/user/WordCountJar, what should I do? Thanks in advance for your help.
I think we should copy the jar file to /usr/local/hadoop
It is not mandatory. But if you have your jar at some other location, you need to specify the complete path while running your job.
My first question is: what is the meaning of hadoop*examples*?
hadoop*examples* is the name of the jar package that contains your MR job along with other dependencies. Here, the * signifies that it can be any version, not specifically 0.19.2 or anything else. But I feel it should be hadoop-examples-*.jar, not hadoop*examples*.jar.
and if we want to keep our jar file in another location, for example /home/user/WordCountJar, what should I do?
If your jar is present in a directory other than the directory from where you are executing the command, you need to specify the complete path to your jar. Say,
bin/hadoop jar /home/user/WordCountJar/hadoop-*-examples.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
The *examples* is just wildcard expansion to account for different version numbers in the file name. For example: hadoop-0.19.2-examples.jar
You can use the full path to your jar like so:
bin/hadoop jar /home/user/hadoop-0.19.2-examples.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
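You can confirm that it is the shell, not Hadoop, doing the expansion (assuming exactly one matching file in the current directory):
# echo shows what the glob expands to before hadoop ever sees it
echo hadoop*examples*.jar
# prints: hadoop-0.19.2-examples.jar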

Unable to execute Map/Reduce job

I've been trying to figure out how to execute my Map/Reduce job for almost 2 days now. I keep getting a ClassNotFoundException.
I've installed a Hadoop cluster on Ubuntu using Cloudera CDH4.3.0. The .java file (DemoJob.java, which is not inside any package) is inside a folder called inputs, and all required jar files are inside inputs/lib.
I followed http://www.cloudera.com/content/cloudera-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial/ht_topic_5_2.html for reference.
I compile the .java file using:
javac -cp "inputs/lib/hadoop-common.jar:inputs/lib/hadoop-map-reduce-core.jar" -d Demo inputs/DemoJob.java
(In the link, it says -cp should be "/usr/lib/hadoop/:/usr/lib/hadoop/client-0.20/". But I don't have those folders in my system at all)
Create the jar file using:
jar cvf Demo.jar Demo
Move the 2 input files to HDFS
(Now this is where I'm confused. Do I need to move the jar file to HDFS as well? It doesn't say so in the link. But if it is not in HDFS, then how does the hadoop jar .. command work? I mean, how does it combine the jar file, which is on the Linux system, with the input files, which are in HDFS?)
I run my code using:
hadoop jar Demo.jar DemoJob /Inputs/Text1.txt /Inputs/Text2.txt /Outputs
I keep getting ClassNotFoundException : DemoJob.
Somebody please help.
The ClassNotFoundException only means that some class wasn't found when class DemoJob was loaded. The missing class could be a class referenced (imported, for example) by DemoJob. I think the problem is that you don't have the /usr/lib/hadoop/:/usr/lib/hadoop/client-0.20/ folders (classes) in your classpath. It's the classes that should be there but aren't that are probably triggering the ClassNotFoundException.
Finally figured out what the problem was. Instead of creating a jar file from a folder, I created the jar file directly from the .class files using jar -cvf Demo.jar *.class
This resolved the ClassNotFound error. But I don't understand why it was not working earlier. Even when I created the jar file from a folder, I did mention the folder name when executing the class file, as: hadoop jar Demo.jar Demo.DemoJob /Inputs/Text1.txt /Inputs/Text2.txt /Outputs
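A likely explanation, though the thread doesn't confirm it: the class loader maps package names to directory paths inside the jar, so a class with no package declaration must sit at the jar root. Jarring the enclosing folder adds a Demo/ prefix that looks like a package the class never declared:
# Jar built from loose .class files: entry is DemoJob.class -> loadable as DemoJob
jar -cvf Demo.jar *.class
# Jar built from the folder: entry is Demo/DemoJob.class -> only loadable if the
# source had declared "package Demo;", which it did not
jar -cvf Demo.jar Demo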

How to run GIS codes through hadoop's prompt?

I am running GIS code through Hadoop's prompt in the following manner:
Wrote the GIS code in Eclipse, including all the relevant GIS jars.
Went into the directory where my Eclipse workspace is.
Compiled the code, adding all the relevant jars to the classpath. (The compilation was successful.)
Built the jar.
Now running the same jar using hadoop: bin/hadoop jar my_jar_file_name.jar my_pkg_structure.Main_class_file
Now, in spite of the code being error free, when I try to execute it through Hadoop's prompt, it gives me multiple issues.
Is there a workable alternative way to do the same without any hassles?
Also note that the GIS code runs beautifully in Eclipse. Since I have to do geoprocessing over Hadoop, I need to run it through Hadoop's prompt.
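Since the question doesn't show the actual errors, one hedged guess: if the failures are ClassNotFoundException/NoClassDefFoundError for the GIS classes, the third-party jars must be visible both to the client JVM and to the tasks on the cluster. A sketch with placeholder paths (-libjars only works if the driver parses generic options, e.g. via ToolRunner):
# Make the GIS jars visible to the local client JVM
export HADOOP_CLASSPATH=/path/to/gis/lib/*
# Ship them to the cluster for the map and reduce tasks
bin/hadoop jar my_jar_file_name.jar my_pkg_structure.Main_class_file -libjars /path/to/gis/lib/geometry-api.jar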

java.io.IOException: error=2, No such file or directory error in Hadoop streaming

Please help with the "-file" option issue of Hadoop streaming (mentioned in the link below). Just to update: I know that the jar is already there. I am trying this after I tried Hadoop streaming with a different class file, which failed, to identify whether there is something wrong with the class file itself or with the way I am using it. If you need the stderr file, please let me know.
Problem with Hadoop Streaming -file option for Java class files.
You can't really use -file to send over jars, as Hadoop doesn't support multiple jars (that were not already in the CLASSPATH); check the streaming docs:
At least as late as version 0.14, Hadoop does not support multiple jar files. So, when specifying your own custom classes you will have to pack them along with the streaming jar and use the custom jar instead of the default hadoop streaming jar.
To add more than one jar file to the CLASSPATH, you could use the -libjars option, as specified in the Hadoop tutorial (search for the word "libjar" on the page).
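A sketch of what that can look like for a streaming job; the jar names, paths, and classes below are placeholders, and generic options such as -libjars must come before the streaming-specific options:
# Ship an extra jar of custom classes alongside the streaming jar
bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
    -libjars /path/to/custom-classes.jar \
    -input /user/me/input -output /user/me/output \
    -mapper org.example.MyMapper \
    -reducer org.apache.hadoop.mapred.lib.IdentityReducer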
