Not able to run WordCount example in MapReduce 2.x - Hadoop

I am trying to execute the MapReduce word count example in MapReduce 2.x in Java. I have created the jar, but while executing it reports that the WordMapper class is not found, even though I have declared it in my package. Please help me solve the issue.
This is my WordCount driver code:
package com.mapreduce2.x;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Word_Count");
        job.setMapperClass(WordMapper.class);
        job.setReducerClass(WordReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
And this is my WordMapper class:

package com.mapreduce2.x;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
WordReducer code:

package com.mapreduce2.x;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
It shows the following error while executing:
15/05/29 10:12:26 INFO mapreduce.Job: map 0% reduce 0%
15/05/29 10:12:33 INFO mapreduce.Job: Task Id : attempt_1432876892622_0005_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.mapreduce2.x.WordMapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2076)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:742)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.mapreduce2.x.WordMapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1982)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
... 8 more

Include the class name when running the JAR file, or specify the main class name when creating the JAR file.
If the jar was built without a main class, specify the class name when running it:
hadoop jar word.jar com.mapreduce2.x.WordCount /input /output
Here word.jar is the JAR file name.
OR
You can also set the main class name while creating the jar file.
Steps (in Eclipse):
File --> Export --> JAR --> choose a location --> click Next --> when it asks you to select the main class, select WordCount and click OK.
After that you can run the jar file with the command
hadoop jar word.jar /input /output
Hope this solves your issue.

Try adding job.setJarByClass after creating the Job:
Job job = new Job(conf, "wordcount");
job.setJarByClass(WordCount.class);
It worked for me.
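For context: setJarByClass is what tells the MapReduce framework which jar to ship to the cluster; it locates the jar on the client classpath that contains the given class and submits that jar with the job. Without it, the task JVMs spawned by YARN have nothing to load com.mapreduce2.x.WordMapper from, which matches the ClassNotFoundException in the trace above.

// Locates the jar containing WordCount.class on the client classpath and
// submits it with the job, so cluster-side task JVMs can load the
// mapper and reducer classes from it.
job.setJarByClass(WordCount.class);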

You can try this one (in Linux/Unix):
1. Remove the package name in the Java code.
2. Inside the directory which contains the Java program, create a new directory called classes. e.g. Hadoop-WordCount -> classes, WordCount.java
3. Compile: javac -classpath $HADOOP_HOME/hadoop-common-2.7.1.jar:$HADOOP_HOME/hadoop-mapreduce-client-core-2.7.1.jar:$HADOOP_HOME/hadoop-annotations-2.7.1.jar:$HADOOP_HOME/commons-cli-1.2.jar -d ./classes WordCount.java
4. Create a jar: jar -cvf wordcount.jar -C ./classes/ .
5. Run: bin/hadoop jar $HADOOP_HOME/Hadoop-WordCount/wordcount.jar WordCount input output
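Before submitting, it is worth verifying that the classes actually landed at the root of the jar; since the package declaration was removed, there should be no directory prefix on the entries. A quick check (the exact class names depend on which classes your WordCount.java defines):

jar -tvf wordcount.jar
# expect entries at the jar root, e.g.:
#   WordCount.class
#   WordMapper.class
#   WordReducer.class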

Related

Hadoop mapreduce - mapping NullPointerException

I need to write a simple map-reduce program that, given a directed graph represented as a list of edges as input, produces the same graph where each edge (x,y) with x>y is replaced by (y,x), with no repeated edges in the output graph.
INPUT
1;3
2;1
0;1
3;1
2;0
1;1
2;1
OUTPUT
1;3
1;2
0;1
0;2
1;1
This is the code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ExamGraph {

    // mapper class
    public static class MyMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            value = new Text(value.toString());
            String[] campi = value.toString().split(";");
            if (Integer.getInteger(campi[0]) > Integer.getInteger(campi[1]))
                context.write(new Text(campi[1] + ";" + campi[0]), NullWritable.get());
            else
                context.write(new Text(campi[0] + ";" + campi[1]), NullWritable.get());
        }
    }

    // reducer class
    public static class MyReducer extends Reducer<Text, NullWritable, Text, NullWritable> {

        @Override
        protected void reduce(Text key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        // create new job
        Job job = Job.getInstance(new Configuration());
        // job is based on jar containing this class
        job.setJarByClass(ExamGraph.class);
        // for logging purposes
        job.setJobName("ExamGraph");
        // set input path in HDFS
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // set output path in HDFS (destination must not exist)
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // set mapper and reducer classes
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        // An InputFormat for plain text files. Files are broken into lines.
        // Either linefeed or carriage-return are used to signal end of line.
        // Keys are the position in the file, and values are the line of text.
        job.setInputFormatClass(TextInputFormat.class);
        // set type of output keys and values for both mappers and reducers
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        // start job
        job.waitForCompletion(true);
    }
}
When I run the jar file using:
hadoop jar path/jar JOBNAME /inputlocation /outputlocation
I got this error:
18/05/22 02:13:11 INFO mapreduce.Job: Task Id : attempt_1526979627085_0001_m_000000_1, Status : FAILED
Error: java.lang.NullPointerException
at ExamGraph$MyMapper.map(ExamGraph.java:38)
at ExamGraph$MyMapper.map(ExamGraph.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
But I cannot find the error in the code.
Found the problem: I had confused the method getInteger() with parseInt() in the mapper.
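For reference, a sketch of the corrected comparison: Integer.getInteger(String) looks up a system property with that name and returns null for an ordinary numeric string such as "1", and unboxing that null is what threw the NullPointerException; Integer.parseInt(String) parses the string itself.

// corrected mapper body: parse the edge endpoints instead of reading system properties
String[] campi = value.toString().split(";");
if (Integer.parseInt(campi[0]) > Integer.parseInt(campi[1]))
    context.write(new Text(campi[1] + ";" + campi[0]), NullWritable.get());
else
    context.write(new Text(campi[0] + ";" + campi[1]), NullWritable.get());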

Hadoop WordCount Tutorial java.lang.ClassNotFoundException

I'm relatively new to Hadoop and I'm struggling a little bit to understand the ClassNotFoundException I get when trying to run the job. I'm using the standard tutorial found here, and here is my WordCount class (running on Ubuntu 16.04, Hadoop 2.7.3, distributed cluster mode):
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
                           ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
To try and remain organized, I added a couple of paths to my ~/.bashrc file:
hduser@mynode:~$ cd $HADOOP_CODE
hduser@mynode:/usr/local/hadoop/code$
This is one directory down from the $HADOOP_HOME directory. To compile the WordCount.java file, I ran:
hduser@mynode:/usr/local/hadoop$ hadoop com.sun.tools.javac.Main $HADOOP_CODE/WordCount.java
hduser@mynode:/usr/local/hadoop$ jar cf wc.jar $HADOOP_CODE/WordCount*.class
I then tried:
hduser@mynode:/usr/local/hadoop$ hadoop jar $HADOOP_CODE/wc.jar $HADOOP_CODE/WordCount /home/hduser/input /home/hduser/output/wordcount
which bombed with the following error:
Exception in thread "main" java.lang.ClassNotFoundException: /usr/local/hadoop/code/WordCount
EDIT
This gave me the same error:
hduser@mynode:/usr/local/hadoop/code$ hadoop jar $HADOOP_CODE/wc.jar WordCount /home/hduser/input /home/hduser/output/wordcount
To get it to run without error, I moved the WordCount.java file up one directory to the default hadoop ($HADOOP_HOME) folder. I also know from here and here that the solution is to add a package to the file.
What I'm trying to understand is why that is the solution. With no package name, where is hadoop looking for the specified class, and why can't I pass it a full path to get it to run correctly? This may be a basic Java question (apologies - I'm from the Python world), but what does the package name do during the compile process that lets me run without a path name, while leaving off the package name means the file has to be in that default directory? I'd prefer not to have to add a package name to every job I run. An explanation would be greatly appreciated!
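A sketch of what is going on, with an illustrative package name of com.example (my choice, not from the tutorial): the argument after the jar in hadoop jar is a fully qualified Java class name resolved against the entries inside the jar, not a filesystem path. javac mirrors the package into directories, so a class declared in package com.example becomes the jar entry com/example/WordCount.class and is addressed as com.example.WordCount; with no package, the entry is the bare WordCount at the jar root, which is why a path like /usr/local/hadoop/code/WordCount can never match. Note also that jar cf wc.jar $HADOOP_CODE/WordCount*.class stores the entries under their directory path (usr/local/hadoop/code/...) rather than at the jar root, which jar's -C flag avoids:

# hypothetical walk-through, assuming WordCount.java begins with: package com.example;
mkdir -p build
hadoop com.sun.tools.javac.Main -d build WordCount.java   # writes build/com/example/WordCount*.class
jar cf wc.jar -C build .                                  # jar entries keep the com/example/ prefix
hadoop jar wc.jar com.example.WordCount /home/hduser/input /home/hduser/output/wordcount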

Map-reduce job giving ClassNotFound exception even though mapper is present when running with yarn?

I am running a hadoop job which works fine when I run it without yarn in pseudo-distributed mode, but it gives me a class not found exception when running with yarn:
16/03/24 01:43:40 INFO mapreduce.Job: Task Id : attempt_1458775953882_0002_m_000003_1, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.hadoop.keyword.count.ItemMapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.keyword.count.ItemMapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
... 8 more
Here is the source code for the job:
Configuration conf = new Configuration();
conf.set("keywords", args[2]);
Job job = Job.getInstance(conf, "item count");
job.setJarByClass(ItemImpl.class);
job.setMapperClass(ItemMapper.class);
job.setReducerClass(ItemReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
Here is the command I am running
hadoop jar ~/itemcount.jar /user/rohit/tweets /home/rohit/outputs/23mar-yarn13 vodka,wine,whisky
Edit: code after the suggestions
package com.hadoop.keyword.count;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Mapper.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

public class ItemImpl {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("keywords", args[2]);

        Job job = Job.getInstance(conf, "item count");
        job.setJarByClass(ItemImpl.class);
        job.setMapperClass(ItemMapper.class);
        job.setReducerClass(ItemReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    public static class ItemMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        JSONParser parser = new JSONParser();

        @Override
        public void map(Object key, Text value, Context output) throws IOException,
                InterruptedException {
            JSONObject tweetObject = null;
            String[] keywords = this.getKeyWords(output);
            try {
                tweetObject = (JSONObject) parser.parse(value.toString());
            } catch (ParseException e) {
                e.printStackTrace();
            }
            if (tweetObject != null) {
                String tweetText = (String) tweetObject.get("text");
                if (tweetText == null) {
                    return;
                }
                tweetText = tweetText.toLowerCase();
                /* StringTokenizer st = new StringTokenizer(tweetText);
                   ArrayList<String> tokens = new ArrayList<String>();
                   while (st.hasMoreTokens()) {
                       tokens.add(st.nextToken());
                   } */
                for (String keyword : keywords) {
                    keyword = keyword.toLowerCase();
                    if (tweetText.contains(keyword)) {
                        output.write(new Text(keyword), one);
                    }
                }
                output.write(new Text("count"), one);
            }
        }

        String[] getKeyWords(Mapper<Object, Text, Text, IntWritable>.Context context) {
            Configuration conf = (Configuration) context.getConfiguration();
            String param = conf.get("keywords");
            return param.split(",");
        }
    }

    public static class ItemReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context output)
                throws IOException, InterruptedException {
            int wordCount = 0;
            for (IntWritable value : values) {
                wordCount += value.get();
            }
            output.write(key, new IntWritable(wordCount));
        }
    }
}
Running in fully distributed mode, your TaskTracker/NodeManager (the thing running your mapper) runs in a separate JVM, and it sounds like your class is not making it onto that JVM's classpath.
Try using the -libjars <csv,list,of,jars> command line arg on job invocation. This will have Hadoop distribute the jar to the TaskTracker JVM and load your classes from that jar. (Note, this copies the jar out to each node in your cluster and makes it available only for that specific job. If you have common libraries that would need to be invoked for a lot of jobs, you'd want to look into using the Hadoop distributed cache.)
You may also want to try yarn jar ... when launching your job versus hadoop jar ..., since that's the new/preferred way to launch YARN jobs.
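For illustration, a sketch of such an invocation (the json-simple jar path is an assumption, chosen because the mapper imports org.json.simple; note that -libjars is only parsed when the driver goes through ToolRunner/GenericOptionsParser, as a later answer shows):

hadoop jar ~/itemcount.jar com.hadoop.keyword.count.ItemImpl \
    -libjars /path/to/json-simple-1.1.jar \
    /user/rohit/tweets /home/rohit/outputs/23mar-yarn13 vodka,wine,whisky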
Can you check the contents of your itemcount.jar (jar -tvf itemcount.jar)? I faced this issue once, only to find that the .class file was missing from the jar.
I had the same error a few days ago. Changing the map and reduce classes to static fixed my problem.
1. Make your map and reduce classes inner classes.
2. Check the constructors of your map and reduce classes (i/o values and the override statement).
3. Check your jar command.
Old:
hadoop jar ~/itemcount.jar /user/rohit/tweets /home/rohit/outputs/23mar-yarn13 vodka,wine,whisky
New:
hadoop jar ~/itemcount.jar com.hadoop.keyword.count.ItemImpl /user/rohit/tweets /home/rohit/outputs/23mar-yarn13 vodka,wine,whisky
Add packageName.MainClass after you specify the .jar file.
4. Use a try-catch:
try {
    tweetObject = (JSONObject) parser.parse(value.toString());
} catch (Exception e) { // change ParseException to Exception if you don't only expect parse errors
    e.printStackTrace();
    return; // return from the method in case of any error
}
5. Extend Configured and implement Tool:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ItemImpl extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new ItemImpl(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "ItemImpl");
        job.setJarByClass(ItemImpl.class);
        job.setMapperClass(ItemMapper.class);
        job.setReducerClass(ItemReducer.class);
        job.setMapOutputKeyClass(Text.class);          // probably not essential, but makes it certain and clear
        job.setMapOutputValueClass(IntWritable.class); // probably not essential, but makes it certain and clear
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    // add your public static map class here
    // add your public static reduce class here
}
I'm not an expert on this topic, but this implementation is from one of my working projects. Try it; if it doesn't work for you, I would suggest checking the libraries you added to your project.
Probably the first step will solve it, but if these steps don't work, share the code with us.

ClassNotFoundException when running HBase map reduce job on cluster

I have been testing a map reduce job on a single node and it seems to work, but now that I am trying to run it on a remote cluster I am getting a ClassNotFoundException. My code is structured as follows:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.mapreduce.Job;

public class Pivot {

    public static class Mapper extends TableMapper<ImmutableBytesWritable, ImmutableBytesWritable> {
        @Override
        public void map(ImmutableBytesWritable rowkey, Result values, Context context) throws IOException {
            // (map code)
        }
    }

    public static class Reducer extends TableReducer<ImmutableBytesWritable, ImmutableBytesWritable, ImmutableBytesWritable> {
        public void reduce(ImmutableBytesWritable key, Iterable<ImmutableBytesWritable> values, Context context) throws IOException, InterruptedException {
            // (reduce code)
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("fs.default.name", "hdfs://hadoop-master:9000");
        conf.set("mapred.job.tracker", "hdfs://hadoop-master:9001");
        conf.set("hbase.master", "hadoop-master:60000");
        conf.set("hbase.zookeeper.quorum", "hadoop-master");
        conf.set("hbase.zookeeper.property.clientPort", "2222");

        Job job = new Job(conf);
        job.setJobName("Pivot");
        job.setJarByClass(Pivot.class);

        Scan scan = new Scan();
        TableMapReduceUtil.initTableMapperJob("InputTable", scan, Mapper.class, ImmutableBytesWritable.class, ImmutableBytesWritable.class, job);
        TableMapReduceUtil.initTableReducerJob("OutputTable", Reducer.class, job);

        job.waitForCompletion(true);
    }
}
The error I am receiving when I try to run this job is the following:
java.lang.RuntimeException: java.lang.ClassNotFoundException: Pivot$Mapper
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:857)
...
Is there something I'm missing? Why is the job having difficulty finding the mapper?
When running a job from Eclipse, it's important to note that Hadoop requires you to launch your job from a jar. Hadoop requires this so it can send your code up to HDFS / the JobTracker.
In your case I imagine you haven't bundled your job classes into a jar and then run the program 'from the jar' - resulting in a CNFE (ClassNotFoundException).
Try building a jar and running it from the command line using hadoop jar myjar.jar ...; once this works, then you can test running from within Eclipse.
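A minimal sketch of that workflow, assuming Eclipse compiled your classes into the default bin folder (adjust to your output folder); since Pivot has no package declaration, the bare class name is passed:

# bundle the compiled classes into a jar
jar cf pivot.jar -C bin .
# submit from the jar so Hadoop can ship your code to the cluster
hadoop jar pivot.jar Pivot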

Error using -libjars while running map reduce job

I am trying to run a map reduce job using the hadoop jar command.
I am trying to include external libraries using the -libjars option.
The command that I am running currently is
hadoop jar mapR.jar com.ms.hadoop.poc.CsvParser -libjars google-gson.jar Test1.txt output
But I am receiving this as the output:
usage: [input] [output]
Can anyone please help me out? I have included the external libraries in my classpath as well.
Can you list the contents of your main(String args[]) method? Are you using the ToolRunner utility to launch your job? The parsing of the -libjars argument is a function of the GenericOptionsParser, which is invoked for you via the ToolRunner utility class:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Driver extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Driver(), args));
    }

    public int run(String[] args) throws Exception {
        Job job = new Job(getConf());
        Configuration conf = job.getConfiguration();
        // other job configuration
        return job.waitForCompletion(true) ? 0 : 1;
    }
}
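With the driver structured this way, the original command should work as written: ToolRunner hands the arguments to GenericOptionsParser, which consumes -libjars google-gson.jar and leaves only Test1.txt and output for your run() method.

hadoop jar mapR.jar com.ms.hadoop.poc.CsvParser -libjars google-gson.jar Test1.txt output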
