Hadoop Custom Java Program

I have a simple Java program called PutMerge that I am trying to execute. I have been at it for about six hours and have researched many places on the web, but could not find a solution. Basically, I first compile the source against all the class libraries with the following command:
javac -classpath *:lib/* -d playground/classes playground/src/PutMerge.java
And then I build the jar with the following command.
jar -cvf playground/putmerge.jar -C playground/classes/ .
And then I try to execute it with the following command:
bin/hadoop jar playground/putmerge.jar org.scd.putmerge "..inputPath.." "..outPath"
..
Exception in thread "main" java.lang.ClassNotFoundException: com.scd.putmerge
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
I have tried every permutation and combination to run this simple jar; however, I always get some kind of exception, as shown above.
My source code:
package org.scd.putmerge;
import java.io.IOException;
import java.util.Scanner;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
/**
 *
 * @author Anup V. Saumithri
 *
 */
public class PutMerge
{
    public static void main(String[] args) throws IOException
    {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);
        FileSystem local = FileSystem.getLocal(conf);
        Path inputDir = new Path(args[0]);
        Path hdfsFile = new Path(args[1]);
        try
        {
            FileStatus[] inputFiles = local.listStatus(inputDir);
            FSDataOutputStream out = hdfs.create(hdfsFile);
            for(int i=0; i<inputFiles.length; i++)
            {
                System.out.println(inputFiles[i].getPath().getName());
                FSDataInputStream in = local.open(inputFiles[i].getPath());
                byte buffer[] = new byte[256];
                int bytesRead = 0;
                while((bytesRead = in.read(buffer)) > 0)
                {
                    out.write(buffer, 0, bytesRead);
                }
                in.close();
            }
            out.close();
        }
        catch(IOException ex)
        {
            ex.printStackTrace();
        }
    }
}

The way you are putting your PutMerge class inside the jar may be a little incorrect.
If you run jar tf putmerge.jar, you should see the PutMerge class under the path that matches the package declared in your code (org.scd.putmerge), i.e. org/scd/putmerge.
If you don't, try the following to achieve that. Make sure you have copied PutMerge.class into an org/scd/putmerge/ directory first.
jar -cvf playground/putmerge.jar org/scd/putmerge/PutMerge.class
Next, verify again with jar tf putmerge.jar to check that you now see org/scd/putmerge/PutMerge.class in the output.
If everything's fine, you can try to run the hadoop jar command again. But looking at the error, I also see that you haven't included the class name along with the package in your command; you passed only org.scd.putmerge. You should use the fully qualified name org.scd.putmerge.PutMerge. So the correct way should be something like:
bin/hadoop jar playground/putmerge.jar org.scd.putmerge.PutMerge "..inputPath.." "..outPath"
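For reference, the whole sequence with the layout from your question would then look something like this (just a sketch; it assumes, as in your commands, that the Hadoop jars and a lib/ directory sit in the directory you run these from):
javac -classpath *:lib/* -d playground/classes playground/src/PutMerge.java
jar -cvf playground/putmerge.jar -C playground/classes/ .
jar tf playground/putmerge.jar
bin/hadoop jar playground/putmerge.jar org.scd.putmerge.PutMerge "..inputPath.." "..outPath"
The -d flag makes javac write the class into playground/classes/org/scd/putmerge/, the -C option packs that directory tree into the jar unchanged, and jar tf lets you confirm that org/scd/putmerge/PutMerge.class is really there before you hand the fully qualified name to bin/hadoop.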

Related

spark job fails in yarn cluster

I have a Spark job which runs without any issue in spark-shell. I am currently trying to submit this job to YARN using Spark's API.
I am using the class below to run the Spark job:
import java.util.ResourceBundle;
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;
public class SubmitSparkJobToYARNFromJavaCode {
    public static void main(String[] arguments) throws Exception {
        ResourceBundle bundle = ResourceBundle.getBundle("device_compare");
        String accessKey = bundle.getString("accessKey");
        String secretKey = bundle.getString("secretKey");
        String[] args = new String[] {
            // path to my application's JAR file
            // required in yarn-cluster mode
            "--jar",
            "my_s3_path_to_jar",
            // name of my application's main class (required)
            "--class", "com.abc.SampleIdCount",
            // comma separated list of local jars that want
            // SparkContext.addJar to work with
            // "--addJars", arguments[1]
        };
        // create a Hadoop Configuration object
        Configuration config = new Configuration();
        // identify that I will be using Spark as YARN mode
        System.setProperty("SPARK_YARN_MODE", "true");
        System.setProperty("spark.local.dir", "/tmp");
        // create an instance of SparkConf object
        SparkConf sparkConf = new SparkConf();
        sparkConf.set("fs.s3n.awsAccessKeyId", accessKey);
        sparkConf.set("fs.s3n.awsSecretAccessKey", secretKey);
        sparkConf.set("spark.local.dir", "/tmp");
        // create ClientArguments, which will be passed to Client
        ClientArguments cArgs = new ClientArguments(args);
        // create an instance of yarn Client client
        Client client = new Client(cArgs, config, sparkConf);
        // submit Spark job to YARN
        client.run();
    }
}
This is the Spark job I am trying to run:
package com.abc;
import java.util.ResourceBundle;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
public class SampleIdCount {
    private static String accessKey;
    private static String secretKey;

    public SampleIdCount() {
        ResourceBundle bundle = ResourceBundle.getBundle("device_compare");
        accessKey = bundle.getString("accessKey");
        secretKey = bundle.getString("secretKey");
    }

    public static void main(String[] args) {
        System.out.println("Started execution");
        SampleIdCount sample = new SampleIdCount();
        System.setProperty("SPARK_YARN_MODE", "true");
        System.setProperty("spark.local.dir", "/tmp");
        SparkConf conf = new SparkConf();
        {
            conf = new SparkConf().setAppName("SampleIdCount").setMaster("yarn-cluster");
        }
        JavaSparkContext sc = new JavaSparkContext(conf);
        sc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", accessKey);
        sc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", secretKey);
        JavaRDD<String> installedDeviceIdsRDD = sc.emptyRDD();
        installedDeviceIdsRDD = sc.textFile("my_s3_input_path");
        installedDeviceIdsRDD.saveAsTextFile("my_s3_output_path");
        sc.close();
    }
}
When I run my Java code, the Spark job is submitted to YARN, but I face the error below:
Diagnostics: File file:/mnt/tmp/spark-1b86d806-5c8f-4ae6-a486-7b68d46c759a/__spark_libs__8257948728364304288.zip does not exist
java.io.FileNotFoundException: File file:/mnt/tmp/spark-1b86d806-5c8f-4ae6-a486-7b68d46c759a/__spark_libs__8257948728364304288.zip does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:616)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:829)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:606)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:431)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I thought the problem was that the /mnt folder was not available on the slave nodes, so I tried to change the Spark local directory to /tmp in the following ways:
I set the system property to "/tmp" in my Java code
I set a system env variable by doing export LOCAL_DIRS=/tmp
I set a system env variable by doing export SPARK_LOCAL_DIRS=/tmp
None of these had any effect and I still face the very same error. None of the suggestions in other links helped me either. I am really stuck on this. Any help would be much appreciated. Thanks in advance. Cheers!

How to add Hadoop jars to the classpath?

Hadoop 2.7.3 is installed on my Mac at:
/usr/local/Cellar/hadoop/2.7.3
I wrote a demo to read a file from HDFS using Java:
import java.io.*;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
public class HDFSTest {
    public static void main(String[] args) throws IOException, URISyntaxException {
        String file = "hdfs://localhost:9000/hw1/customer.tbl";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(file), conf);
        Path path = new Path(file);
        FSDataInputStream in_stream = fs.open(path);
        BufferedReader in = new BufferedReader(new InputStreamReader(in_stream));
        String s;
        while ((s = in.readLine()) != null) {
            System.out.println(s);
        }
        in.close();
        fs.close();
    }
}
When I compile the Java file, I get the errors shown below:
hero:Documents yaopan$ javac HDFSTest.java
HDFSTest.java:8: error: package org.apache.hadoop.conf does not exist
import org.apache.hadoop.conf.Configuration;
^
HDFSTest.java:10: error: package org.apache.hadoop.fs does not exist
import org.apache.hadoop.fs.FSDataInputStream;
^
HDFSTest.java:12: error: package org.apache.hadoop.fs does not exist
import org.apache.hadoop.fs.FSDataOutputStream;
^
HDFSTest.java:14: error: package org.apache.hadoop.fs does not exist
import org.apache.hadoop.fs.FileSystem;
^
I know the reason is that the Hadoop jars cannot be found. How do I configure that?
Locate the jar file named "hadoop-common-2.7.3.jar" under your installation (i.e. /usr/local/Cellar/hadoop/2.7.3) and put it on the classpath, or pass it directly on the command line along with javac.
javac -cp "/PATH/hadoop-common-2.7.3.jar" HDFSTest.java
(replace PATH with the appropriate path)
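If the hadoop command itself is on your PATH, another option (just a sketch, assuming that command belongs to the same 2.7.3 installation) is to let it hand you the full classpath, which also covers the transitive dependencies of hadoop-common:
javac -cp "$(hadoop classpath)" HDFSTest.java
java -cp ".:$(hadoop classpath)" HDFSTest
hadoop classpath simply prints the classpath the Hadoop scripts use, so both compiling and running the demo against HDFS pick up the same jars.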
Just add the Hadoop jars to the classpath.
I installed HBase using Homebrew at /usr/local/Cellar/hbase/1.2.2 and added all jars under /usr/local/Cellar/hbase/1.2.2/libexec/lib to the classpath:
1. Edit .bash_profile:
sudo vim ~/.bash_profile
2. Add the classpath:
#set hbase lib path
export CLASSPATH=$CLASSPATH:/usr/local/Cellar/hbase/1.2.2/libexec/lib/*
3. Save and exit:
:wq
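The same idea should work for the Hadoop jars from the question. A sketch, assuming the Homebrew layout where the actual distribution lives under libexec/share/hadoop (check the exact directories on your machine before relying on them):
export CLASSPATH=$CLASSPATH:/usr/local/Cellar/hadoop/2.7.3/libexec/share/hadoop/common/*:/usr/local/Cellar/hadoop/2.7.3/libexec/share/hadoop/common/lib/*:/usr/local/Cellar/hadoop/2.7.3/libexec/share/hadoop/hdfs/*
Then source ~/.bash_profile (or open a new terminal) and compile again.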

Having trouble running the Hadoop word count program

I am trying to run the word count program given in the PUMA benchmark.
The WordCount.java file is as follows:
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.examples;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text,IntWritable,Text,IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
                           ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        List<String> other_args = new ArrayList<String>();
        for (int i = 0; i < args.length; ++i) {
            try {
                if ("-r".equals(args[i])) {
                    job.setNumReduceTasks(Integer.parseInt(args[++i]));
                } else {
                    other_args.add(args[i]);
                }
            } catch (NumberFormatException except) {
                System.out.println("ERROR: Integer expected instead of " + args[i]);
                System.err.println("Usage: wordcount <numReduces> <in> <out>");
                System.exit(2);
            } catch (ArrayIndexOutOfBoundsException except) {
                System.out.println("ERROR: Required parameter missing from " +
                    args[i-1]);
                System.err.println("Usage: wordcount <numReduces> <in> <out>");
                System.exit(2);
            }
        }
        // Make sure there are exactly 2 parameters left.
        if (other_args.size() != 2) {
            System.out.println("ERROR: Wrong number of parameters: " +
                other_args.size() + " instead of 2.");
            System.err.println("Usage: wordcount <numReduces> <in> <out>");
            System.exit(2);
        }
        FileInputFormat.addInputPath(job, new Path(other_args.get(0)));
        FileOutputFormat.setOutputPath(job, new Path(other_args.get(1)));
        Date startIteration = new Date();
        Boolean waitforCompletion = job.waitForCompletion(true);
        Date endIteration = new Date();
        System.out.println("The iteration took "
            + (endIteration.getTime() - startIteration.getTime()) / 1000
            + " seconds.");
        System.exit(waitforCompletion ? 0 : 1);
    }
}
I used the following commands:
#javac -cp /opt/local/share/java/hadoop-1.2.1/hadoop-core-1.2.1.jar -d wordcount_classes WordCount.java
#jar -cvf wordcount.jar -C wordcount_classes/ .
and the output that I got is:
added manifest
adding: org/(in = 0) (out= 0)(stored 0%)
adding: org/apache/(in = 0) (out= 0)(stored 0%)
adding: org/apache/hadoop/(in = 0) (out= 0)(stored 0%)
adding: org/apache/hadoop/examples/(in = 0) (out= 0)(stored 0%)
adding: org/apache/hadoop/examples/WordCount$IntSumReducer.class(in = 1793) (out= 750)(deflated 58%)
adding: org/apache/hadoop/examples/WordCount$TokenizerMapper.class(in = 1790) (out= 764)(deflated 57%)
adding: org/apache/hadoop/examples/WordCount.class(in = 3131) (out= 1682)(deflated 46%)
adding: org/myorg/(in = 0) (out= 0)(stored 0%)
adding: org/myorg/WordCount$IntSumReducer.class(in = 1759) (out= 745)(deflated 57%)
adding: org/myorg/WordCount$TokenizerMapper.class(in = 1756) (out= 759)(deflated 56%)
adding: org/myorg/WordCount.class(in = 3080) (out= 1676)(deflated 45%)
#hadoop jar wordcount.jar WordCount ../input/file01.txt ../output/
I got the following output:
Exception in thread "main" java.lang.NoClassDefFoundError: WordCount (wrong name: org/apache/hadoop/examples/WordCount)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.main(RunJar.java:205)
I applied all the procedures described on this site before, but nothing is working for me.
I would be very thankful if anyone could tell me how to solve this problem.
Change the package statement to
package org.myorg;
And run the program with the full class name.
Looking at your output, you seem to include the WordCount class twice in different paths (= packages), but when you run the program, you don't specify any package.
hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount ../input/file01.txt ../output/
I think the problem is there, because you are not using the full class name.
Your wordcount.jar has two WordCount classes; specify, with the full qualifier, which one you want to run.
e.g.
hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount ../input/file01.txt ../output/
or
hadoop jar wordcount.jar org.myorg.WordCount ../input/file01.txt ../output/
Your WordCount class has two nested classes inside it, i.e. TokenizerMapper and IntSumReducer.
You need to make sure that these classes are included in the jar file you are generating. Try this:
jar cvf WordCount.jar WordCount.class WordCount\$TokenizerMapper.class WordCount\$IntSumReducer.class
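If you compiled with -d wordcount_classes as in the question, the nested classes are already there together with the package directories, so a sketch that keeps that layout intact is to rebuild the jar from the classes directory and run it with the fully qualified name:
jar -cvf wordcount.jar -C wordcount_classes/ .
jar tf wordcount.jar
hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount ../input/file01.txt ../output/
jar tf should now list org/apache/hadoop/examples/WordCount.class along with the $TokenizerMapper and $IntSumReducer classes.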

Override hadoop

I'm running an EMR Activity inside a Data Pipeline analyzing log files and I get the following error when my Pipeline fails:
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://10.211.146.177:9000/home/hadoop/temp-output-s3copy-2013-05-24-00 already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:944)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:905)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:905)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:879)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1316)
at com.valtira.datapipeline.stream.CloudFrontStreamLogProcessors.main(CloudFrontStreamLogProcessors.java:216)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
I've tried to delete that folder by adding:
FileSystem fs = FileSystem.get(getConf());
fs.delete(new Path("path/to/file"), true); // delete file, true for recursive
but it does not work. Is there a way to override the FileOutputFormat method from Hadoop in Java? Is there a way to ignore this error in Java?
The path to the file to be deleted changes, as the output directory is named using the date.
There are two ways to delete it.
Over the shell, try this:
hadoop dfs -rmr hdfs://127.0.0.1:9000/home/hadoop/temp-output-s3copy-*
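On newer Hadoop releases the hadoop dfs form is deprecated; if your cluster has it, the equivalent would be:
hadoop fs -rm -r hdfs://127.0.0.1:9000/home/hadoop/temp-output-s3copy-*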
To do it via Java code:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.mortbay.log.Log;
public class FSDeletion {

    public static void main(String[] args) {
        try {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            String fsName = conf.get("fs.default.name", "localhost:9000");
            String baseDir = "/home/hadoop/";
            String outputDirPattern = fsName + baseDir + "temp-output-s3copy-";
            Path[] paths = new Path[1];
            paths[0] = new Path(baseDir);
            FileStatus[] status = fs.listStatus(paths);
            Path[] listedPaths = FileUtil.stat2Paths(status);
            for (Path p : listedPaths) {
                if (p.toString().startsWith(outputDirPattern)) {
                    Log.info("Attempting to delete : " + p);
                    boolean result = fs.delete(p, true);
                    Log.info("Deleted ? : " + result);
                }
            }
            fs.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

How to use Hadoop FS API inside Storm Bolt in java

I want to store the data emitted by a Storm spout in HDFS. I have added Hadoop FS API code in the bolt class, but it throws an error when I run the topology with Storm.
Following is the Storm bolt class:
package bolts;
import java.io.*;
import java.util.*;
import java.net.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
public class DataNormalizer extends BaseBasicBolt {

    public void execute(Tuple input, BasicOutputCollector collector) {
        String sentence = input.getString(0);
        String[] process = sentence.split(" ");
        int n = 1;
        String rec = "";
        try {
            String source = "/root/data/top_output.csv";
            String dest = "hdfs://localhost:9000/user/root/nishu/top_output/top_output_1.csv";
            Configuration conf = new Configuration();
            FileSystem fileSystem = FileSystem.get(conf);
            System.out.println(fileSystem);
            Path srcPath = new Path(source);
            Path dstPath = new Path(dest);
            String filename = source.substring(source.lastIndexOf('/') + 1,
                    source.length());
            try {
                if (!(fileSystem.exists(dstPath))) {
                    FSDataOutputStream out = fileSystem.create(dstPath, true);
                    InputStream in = new BufferedInputStream(
                            new FileInputStream(new File(source)));
                    byte[] b = new byte[1024];
                    int numBytes = 0;
                    while ((numBytes = in.read(b)) > 0) {
                        out.write(b, 0, numBytes);
                    }
                    in.close();
                    out.close();
                } else {
                    fileSystem.copyFromLocalFile(srcPath, dstPath);
                }
            } catch (Exception e) {
                System.err.println("Exception caught! :" + e);
                System.exit(1);
            } finally {
                fileSystem.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
I have also added the Hadoop jars to the CLASSPATH.
Following is the value of the classpath:
$STORM_HOME/storm-0.8.1.jar:$JAVA_HOME/lib/:$HADOOP_HOME/hadoop-core-1.0.4.jar:$HADOOP_HOME/lib/:$STORM_HOME/lib/
I also copied the Hadoop libraries hadoop-core-1.0.4.jar, commons-collections-3.2.1.jar and commons-cli-1.2.jar into the Storm/lib directory.
When I run this project, it throws the following error:
3006 [Thread-16] ERROR backtype.storm.daemon.executor -
java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:37)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:34)
at org.apache.hadoop.security.UgiInstrumentation.create(UgiInstrumentation.java:51)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:216)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:184)
at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:236)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:466)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:452)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1494)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1395)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
at bolts.DataNormalizer.execute(DataNormalizer.java:67)
at backtype.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:32)
......................
The error message tells you that Apache Commons Configuration is missing. You have to add it to the classpath.
More generally, you should add all Hadoop dependencies to your classpath. You can find them using a dependency manager (Maven, Ivy, Gradle, etc.) or look into /usr/lib/hadoop/lib on a machine on which Hadoop is installed.
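For the Storm setup in the question specifically, one way (a sketch, assuming the jar ships in $HADOOP_HOME/lib as it does in Hadoop 1.x distributions; the exact file name and version may differ) is to copy it next to the other Hadoop jars you already placed in Storm's lib directory and resubmit the topology:
cp $HADOOP_HOME/lib/commons-configuration-*.jar $STORM_HOME/lib/
If further NoClassDefFoundErrors show up afterwards, repeat the same step for whichever library the next missing class belongs to.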
