I'm new to Hadoop MapReduce. When I try to run my MapReduce code using the following command:
vishal#XXXX bin/hadoop jar /user/vishal/WordCount com.WordCount.java /user/vishal/file01 /user/vishal/output.
It displays the following output:
Exception in thread "main" java.io.IOException: Error opening job jar: /user/vishal/WordCount.jar
at org.apache.hadoop.util.RunJar.main(RunJar.java:130)
Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:131)
at java.util.jar.JarFile.<init>(JarFile.java:150)
at java.util.jar.JarFile.<init>(JarFile.java:87)
at org.apache.hadoop.util.RunJar.main(RunJar.java:128)
How can I fix this error?
Your command asks Hadoop to run a JAR, but it specifies a directory instead.
You have also appended '.java' to the class name, which is not required (this assumes you have written the package name, com.WordCount, correctly).
First build the jar at /user/vishal/WordCount.jar (make sure this is a local path, not HDFS), then run the command without the '.java' at the end of the class name. Also, you have a dot at the end of the command in your question; I hope that isn't there in the real command.
bin/hadoop jar /user/vishal/WordCount.jar com.WordCount /user/vishal/file01 /user/vishal/output
See the Hadoop tutorial's 'Usage' section for more.
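For reference, a minimal sketch of packaging the compiled classes and submitting the job (assuming the compiled .class files sit under a local classes/ directory; adjust paths to your layout):
jar cf /user/vishal/WordCount.jar -C classes .   # package the compiled classes into a local jar
bin/hadoop jar /user/vishal/WordCount.jar com.WordCount /user/vishal/file01 /user/vishal/output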
I tried to execute the following command to copy data from HBase to another cluster, from an HBase client environment. The command I ran is:
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=[destination zk]:/hbase [source table name]
I got this error:
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://servername:8020/opt/hbase-1.2.10/lib/metrics-core-2.2.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1072)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
The file /opt/hbase-1.2.10/lib/metrics-core-2.2.0.jar is on my local path, but it does not exist in HDFS. It seems the CopyTable utility is submitting a MapReduce job without the dependency jars. I read a few articles, and it seems the only solution is to upload the jar to HDFS at the same path. This is a really ugly solution.
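For completeness, a sketch of that workaround, mirroring the local jar path into HDFS (paths taken from the error above):
hdfs dfs -mkdir -p /opt/hbase-1.2.10/lib   # create the matching directory in HDFS
hdfs dfs -put /opt/hbase-1.2.10/lib/metrics-core-2.2.0.jar /opt/hbase-1.2.10/lib/   # upload the missing dependency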
Please kindly advise. Thanks!
I configured a MapReduce job to save its output as a Sequence File compressed with Snappy. The MR job executes successfully, but in HDFS the output file appears as part-r-00000, with no extension.
I expected the file to have a .snappy extension, i.e. to be part-r-00000.snappy. Now I think this may be why the file is not readable when I try to read it from the local file system with this pattern: hadoop fs -libjars /path/to/jar/myjar.jar -text /path/in/HDFS/to/my/file
So I'm getting the –libjars: Unknown command when executing the command:
hadoop fs –libjars /root/hd/metrics.jar -text /user/maria_dev/hd/output/part-r-00000
And when I'm using this command hadoop fs -text /user/maria_dev/hd/output/part-r-00000, I'm getting the error:
18/02/15 22:01:57 INFO compress.CodecPool: Got brand-new decompressor [.snappy]
-text: Fatal internal error
java.lang.RuntimeException: java.io.IOException: WritableName can't load class: com.hd.metrics.IpMetricsWritable
Caused by: java.lang.ClassNotFoundException: Class com.hd.ipmetrics.IpMetricsWritable not found
Could it be that the absence of the .snappy extension causes the problem? What other command should I try to read the compressed file?
The jar is in my local file system at /root/hd/. Where should I place it to avoid the ClassNotFoundException? Or how should I modify the command?
Instead of hadoop fs –libjars (which actually has the wrong hyphen character and should be -libjars; copy that exactly and you won't see Unknown command), you should be using the HADOOP_CLASSPATH environment variable:
export HADOOP_CLASSPATH=/root/hd/metrics.jar:${HADOOP_CLASSPATH}
hadoop fs -text /user/maria_dev/hd/output/part-r-*
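As an aside, -libjars (with a plain ASCII hyphen) is a generic Hadoop option intended for job submission rather than for hadoop fs. A hedged sketch, assuming a driver that runs through ToolRunner so generic options are parsed (the jar and driver names here are hypothetical):
hadoop jar /root/hd/metrics-job.jar com.hd.MetricsDriver -libjars /root/hd/metrics.jar in out   # hypothetical jar/driver names, for illustration only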
The error clearly says ClassNotFoundException: Class com.hd.ipmetrics.IpMetricsWritable not found.
It means that a required library is missing from the classpath.
To clarify your doubts:
MapReduce outputs files as part-* by default, and an extension has no meaning there. Remember that an extension is just metadata, usually required by the Windows operating system to determine a suitable program for the file. It has no meaning in Linux/Unix, and the system's behavior is not going to change even if you rename the file to .snappy (you may actually try this).
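If you do want to try that rename, a quick sketch (reading behaves the same either way, since the codec is recorded in the SequenceFile header rather than in the file name):
hadoop fs -mv /user/maria_dev/hd/output/part-r-00000 /user/maria_dev/hd/output/part-r-00000.snappy   # the rename changes nothing but the name
hadoop fs -text /user/maria_dev/hd/output/part-r-00000.snappy   # -text still finds the Snappy codec from the header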
The command looks absolutely fine for inspecting the Snappy file, but it seems some required jar files are missing, which is causing the ClassNotFoundException.
EDIT 1:
By default, Hadoop picks up jar files from the path emitted by the command below:
$ hadoop classpath
By default it lists all the Hadoop core jars.
You can add your own jar by executing the command below at the prompt:
export HADOOP_CLASSPATH=/path/to/my/custom.jar
After executing this, check the classpath again with the hadoop classpath command; you should see your jar listed along with the Hadoop core jars.
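For example, after exporting as shown above, you can verify:
hadoop classpath | tr ':' '\n' | grep metrics.jar   # the jar should now appear in the listing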
I created the input text file test.txt and put it into HDFS as /user/yogesh/Input/test.txt.
I created the output path on HDFS as /user/yogesh/Output.
I created the jar file locally at /home/yogesh/WordCount.jar and submitted the MR job from the local machine, like this: hadoop jar /home/yogesh/WordCount.jar WordCount /user/yogesh/Input/test.txt /user/yogesh/Output/output1
I got the following error:
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://host/user/yogesh/WordCount
hdfs://host/user/yogesh/ is my HDFS directory. I cannot understand why this MR job is looking for the code in HDFS, or how to solve this error.
This happens because the jar's manifest most likely already names a main class, so the extra WordCount argument is passed through to the program as its first argument and is treated as the input path; that is why the job looked for hdfs://host/user/yogesh/WordCount. Try giving the package name of the WordCount class as its prefix, or just skip the class and pass only the jar, input, and output, like this:
hadoop jar /home/yogesh/WordCount.jar /user/yogesh/Input /user/yogesh/Output/output1
Also, make sure that /user/yogesh/Output/output1 does not exist before you run this command. Note also that you should give an input directory, not an input file: Hadoop will take as input all the files in the specified directory.
For an example, see how the WordCount example is run in the Hadoop MapReduce tutorial.
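Alternatively, if you keep the class argument, a sketch with the fully qualified name (the package com.example is hypothetical; substitute the one from your source):
hdfs dfs -rm -r /user/yogesh/Output/output1   # clear any stale output directory first
hadoop jar /home/yogesh/WordCount.jar com.example.WordCount /user/yogesh/Input /user/yogesh/Output/output1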
I am a complete beginner with Hadoop. I developed a jar and tried to execute it with the command below, but I got this error: Mkdirs failed to create D:...\META-INF\license
I checked all permissions and gave full access, but it did not work.
command: hadoop jar wiki-stats.jar example/data/stats.txt example/results/
Thanks in advance
I have a school project that uses Hadoop and will be hosted on Amazon EMR.
At first, I'm trying to understand things with a simple wordcount program, and it runs fine in the Eclipse IDE.
But if I try to run it from the command line, I get the error below.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
at counter.WordCount.main(WordCount.java:56)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
Do you have any suggestions for this error, and any resources for understanding Hadoop and EMR?
Thanks,
myat
Don't run your Job from the IDE or with the java command. Instead, use the hadoop script in the bin/ directory of the Hadoop installation.
Example: if your Job's entry point is the mrjob.MyJob class and you have a jar (job.jar) containing your Job class, you should run it like this:
path/to/bin/hadoop jar job.jar mrjob.MyJob inputFolder outputFolder
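This works because the hadoop script puts the Hadoop core jars, including the one that contains org.apache.hadoop.conf.Configuration, on the classpath before launching your class; running plain java -cp job.jar omits them, which is exactly what the NoClassDefFoundError is telling you. You can inspect what the script adds:
hadoop classpath   # prints the jars and directories the hadoop script prepends to the classpath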