Unable to Configure Number of Reducers In WordCount Job in hadoop - hadoop

I m using Single Node Cluster - Hadoop-2.7.0 in my Linum Machine.
My code for WordCount Job is running fine with 1 reducer.
But Not working fine if i increase the reducers.
It is showing the following error:
15/05/25 21:15:10 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/05/25 21:15:10 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/05/25 21:15:10 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/05/25 21:15:10 WARN snappy.LoadSnappy: Snappy native library is available
15/05/25 21:15:10 INFO snappy.LoadSnappy: Snappy native library loaded
15/05/25 21:15:10 INFO mapred.FileInputFormat: Total input paths to process : 1
15/05/25 21:15:10 INFO mapred.JobClient: Running job: job_local_0001
15/05/25 21:15:11 INFO util.ProcessTree: setsid exited with exit code 0
15/05/25 21:15:11 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin#5f1fd699
15/05/25 21:15:11 INFO mapred.MapTask: numReduceTasks: 1
15/05/25 21:15:11 INFO mapred.MapTask: io.sort.mb = 100
15/05/25 21:15:11 INFO mapred.MapTask: data buffer = 79691776/99614720
15/05/25 21:15:11 INFO mapred.MapTask: record buffer = 262144/327680
15/05/25 21:15:11 WARN mapred.LocalJobRunner: job_local_0001
java.io.IOException: Illegal partition for am (1)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
at WordMapper.map(WordMapper.java:24)
at WordMapper.map(WordMapper.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
My getPartition Method Looks like this:
public int getPartition(Text key, IntWritable value, int numRedTasks) {
String s = key.toString();
if(s.length() == 1)
{
return 0;
}
else if(s.length() == 2)
{
return 1;
}
else if(s.length() == 3)
{
return 2;
}
else
return 3;
}
Run Method in WordCount.class File:
if(input.length < 2)
{
System.out.println("Please provide valid input");
return -1;
}
else
{
JobConf config = new JobConf();
FileInputFormat.setInputPaths(config, new Path(input[0]));
FileOutputFormat.setOutputPath(config, new Path(input[1]));
config.setMapperClass(WordMapper.class);
config.setReducerClass(WordReducer.class);
config.setNumReduceTasks(4);
config.setPartitionerClass(MyPartitioner.class);
config.setMapOutputKeyClass(Text.class);
config.setMapOutputValueClass(IntWritable.class);
config.setOutputKeyClass(Text.class);
config.setOutputValueClass(IntWritable.class);
JobClient.runJob(config);
}
return 0;
}
My Mapper and Reducer Code is fine because Wordcount Job with 1 reducer is running fine.
Any One able to figure it out?

This may be due to pig fails in the operation due to high default_parallel could be set in it.
Thanks,
Shailesh.

You need to use tooRunner in your driver class and invoke the toolrunner in your main class. You can do this by using combiner as part of workflow. Below is the driver class code: As you can see from the code below, along with the mapper and reducer calls, there is a combiner call as well. And the exit code in the main runner is " int exitCode = ToolRunner.run(new Configuration(), new WordCountWithCombiner(), args);" which invokes tool runner at run time and you can specify the number of reducers or mappers you would like to use by using the "-D" option when running the wordcount program. A sample command line would look like "-D mapred.reduce.tasks =2 input output"
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
public class WordCountWithCombiner extends Configured
implements Tool{
#Override
public int run(String[] args) throws Exception {
Configuration conf = getConf();
Job job = new Job(conf, "MyJob");
job.setJarByClass(WordCount.class);
job.setJobName("Word Count With Combiners");
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(WordCountMapper.class);
job.setCombinerClass(WordCountReducer.class);
job.setReducerClass(WordCountReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new Configuration(), new WordCountWithCombiner(), args);
System.exit(exitCode);
}
}

Related

Why is map reduce job poinitng to localhost:8080?

I am working with Map Reduce job and executing it using ToolRunner's run method.
Here is my code:
public class MaxTemperature extends Configured implements Tool {
public static void main(String[] args) throws Exception {
System.setProperty("hadoop.home.dir", "/");
int exitCode = ToolRunner.run(new MaxTemperature(), args);
System.exit(exitCode);
}
#Override
public int run(String[] args) throws Exception {
if (args.length != 2) {
System.err.println("Usage: MaxTemperature <input path> <output path>");
System.exit(-1);
}
System.out.println("Starting job");
Job job = new Job();
job.setJarByClass(MaxTemperature.class);
job.setJobName("Max temperature");
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(MaxTemperatureMapper.class);
job.setReducerClass(MaxTemperatureReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
int returnValue = job.waitForCompletion(true) ? 0:1;
if(job.isSuccessful()) {
System.out.println("Job was successful");
} else if(!job.isSuccessful()) {
System.out.println("Job was not successful");
}
return returnValue;
}
}
The job executed well as expected. But when i looked into the logs which displays the information abou the job tracking, I found that the Map reduce is pointing to localhost:8080 for the tracking of the job.
Here is the snapshot of logs:
20521 [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
20670 [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local1454583076_0001
20713 [main] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-KV/mapred/staging/KV1454583076/.staging/job_local1454583076_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
20716 [main] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-KV/mapred/staging/KV1454583076/.staging/job_local1454583076_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
20818 [main] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-KV/mapred/local/localRunner/KV/job_local1454583076_0001/job_local1454583076_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
20820 [main] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-KV/mapred/local/localRunner/KV/job_local1454583076_0001/job_local1454583076_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
**20826 [main] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://localhost:8080/**
20827 [main] INFO org.apache.hadoop.mapreduce.Job - Running job: job_local1454583076_0001
20829 [Thread-10] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter set in config null
So my question is why is map reduce pointing to localhost:8080
The url to track the job: http://localhost:8080/
There is no configuration file or properties file where i manually set this. Also, is it possible that i can change it to some other port? If yes, how can i achieve this?
So the ports are configured in yarn-site.xml : yarn-site.xml
Check : yarn.resourcemanager.webapp.address
We need to change the default configuration and create a Configuration object and set the properties to this configuration object and then create a Job object using this Configuration as follows:
Configuration configuration = getConf();
//configuration.set("fs.defaultFS", "hdfs://192.**.***.2**");
//configuration.set("mapred.job.tracker", "jobtracker:jtPort");
configuration.set("mapreduce.jobtracker.address", "localhost:54311");
configuration.set("mapreduce.framework.name", "yarn");
configuration.set("yarn.resourcemanager.address", "127.0.0.1:8032");
//configuration.set("yarn.resourcemanager.webapp.address", "127.0.0.1:8032");
//Initialize the Hadoop job and set the jar as well as the name of the Job
Job job = new Job(configuration);

NullPointerException in Map Reduce job

I am trying to do bulk upload into Hbase using java api .
When calling Mapper class i am getting following exception .
This i found while debugging my driver code.This error comes when debugger is trying to hit mapper code .
My Hfile is created but its not able to load into Hbase
16/08/10 04:09:56 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin#7363c839
16/08/10 04:09:56 INFO mapred.MapTask: Processing split: file:/home/cloudera/su.txt:0+50
16/08/10 04:09:56 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/08/10 04:09:56 INFO mapred.MapTask: io.sort.mb = 100
16/08/10 04:09:57 INFO mapred.MapTask: data buffer = 79691776/99614720
16/08/10 04:09:57 INFO mapred.MapTask: record buffer = 262144/327680
16/08/10 04:09:57 INFO mapred.LocalJobRunner: Map task executor complete.
16/08/10 04:09:57 WARN mapred.LocalJobRunner: job_local930363008_0001
java.lang.Exception: java.lang.NullPointerException
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:843)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:376)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:85)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:584)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:656)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
16/08/10 04:09:57 INFO mapred.JobClient: map 0% reduce 0%
16/08/10 04:09:57 INFO mapred.JobClient: Job complete: job_local930363008_0001
16/08/10 04:09:57 INFO mapred.JobClient: Counters: 0
This is my code to do that operation
package com.sample.bulkload.hbase;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class HBaseBulkLoad {
public static class BulkLoadMap extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String[] values = value.toString().split(",");
String rowKey = values[0];
// ImmutableBytesWritable HKey = new
// ImmutableBytesWritable(put.getRow());
// context.write(HKey, put);
System.out.println("Entered into Mapper Method");
Put HPut = new Put(Bytes.toBytes(rowKey));
HPut.add(Bytes.toBytes("personalDetails"), Bytes.toBytes("first_name"), Bytes.toBytes(values[1]));
HPut.add(Bytes.toBytes("personalDetails"), Bytes.toBytes("last_name"), Bytes.toBytes(values[2]));
HPut.add(Bytes.toBytes("contactDetails"), Bytes.toBytes("email"), Bytes.toBytes(values[3]));
HPut.add(Bytes.toBytes("contactDetails"), Bytes.toBytes("city"), Bytes.toBytes(values[4]));
context.write(new ImmutableBytesWritable(Bytes.toBytes(rowKey)), HPut);
System.out.println("Written into Context");
}
}
public static void main(String[] args) throws Exception {
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "localhost");
conf.set("hbase.zookeeper.property.clientport", "2181");
Job job = new Job(conf, "HBase_Bulk_loader");
HTable hTable = new HTable(conf, args[2]);
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Put.class);
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(Put.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(HFileOutputFormat.class);
job.setJarByClass(HBaseBulkLoad.class);
job.setMapperClass(HBaseBulkLoad.BulkLoadMap.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
HFileOutputFormat.configureIncrementalLoad(job, hTable);
job.waitForCompletion(true);
}
}
Mapper output key and value class needs to extend from Writable interface

Hadoop - Mappers not emitting anything

I'm running the code below and no output is generated (well, the output folder and the reducer output file are created, but there is nothing wihtin the part-r-00000 file). From the logs, I suspect the mappers are not emitting anything.
The code:
package com.telefonica.iot.tidoop.mrlib;
import com.telefonica.iot.tidoop.mrlib.utils.Constants;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.Logger;
public class Count extends Configured implements Tool {
private static final Logger LOGGER = Logger.getLogger(Count.class);
public static class UnitEmitter extends Mapper<Object, Text, Text, LongWritable> {
private final Text commonKey = new Text("common-key");
#Override
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
context.write(commonKey, new LongWritable(1));
} // map
} // UnitEmitter
public static class Adder extends Reducer<Text, LongWritable, Text, LongWritable> {
#Override
public void reduce(Text key, Iterable<LongWritable> values, Context context)
throws IOException, InterruptedException {
long sum = 0;
for (LongWritable value : values) {
sum += value.get();
} // for
context.write(key, new LongWritable(sum));
} // reduce
} // Adder
public static class AdderWithTag extends Reducer<Text, LongWritable, Text, LongWritable> {
private String tag;
#Override
public void setup(Context context) throws IOException, InterruptedException {
tag = context.getConfiguration().get(Constants.PARAM_TAG, "");
} // setup
#Override
public void reduce(Text key, Iterable<LongWritable> values, Context context)
throws IOException, InterruptedException {
long sum = 0;
for (LongWritable value : values) {
sum += value.get();
} // for
context.write(new Text(tag), new LongWritable(sum));
} // reduce
} // AdderWithTag
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new Filter(), args);
System.exit(res);
} // main
#Override
public int run(String[] args) throws Exception {
// check the number of arguments, show the usage if it is wrong
if (args.length != 3) {
showUsage();
return -1;
} // if
// get the arguments
String input = args[0];
String output = args[1];
String tag = args[2];
// create and configure a MapReduce job
Configuration conf = this.getConf();
conf.set(Constants.PARAM_TAG, tag);
Job job = Job.getInstance(conf, "tidoop-mr-lib-count");
job.setNumReduceTasks(1);
job.setJarByClass(Count.class);
job.setMapperClass(UnitEmitter.class);
job.setCombinerClass(Adder.class);
job.setReducerClass(AdderWithTag.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(LongWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);
FileInputFormat.addInputPath(job, new Path(input));
FileOutputFormat.setOutputPath(job, new Path(output));
// run the MapReduce job
return job.waitForCompletion(true) ? 0 : 1;
} // main
private void showUsage() {
System.out.println("...");
} // showUsage
} // Count
The command executed, and the output logs:
$ hadoop jar target/tidoop-mr-lib-0.0.0-SNAPSHOT-jar-with-dependencies.jar com.telefonica.iot.tidoop.mrlib.Count -libjars target/tidoop-mr-lib-0.0.0-SNAPSHOT-jar-with-dependencies.jar tidoop/numbers tidoop/numbers_count onetag
15/11/05 17:24:52 INFO input.FileInputFormat: Total input paths to process : 1
15/11/05 17:24:52 WARN snappy.LoadSnappy: Snappy native library is available
15/11/05 17:24:53 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/11/05 17:24:53 INFO snappy.LoadSnappy: Snappy native library loaded
15/11/05 17:24:53 INFO mapred.JobClient: Running job: job_201507101501_23002
15/11/05 17:24:54 INFO mapred.JobClient: map 0% reduce 0%
15/11/05 17:25:00 INFO mapred.JobClient: map 100% reduce 0%
15/11/05 17:25:07 INFO mapred.JobClient: map 100% reduce 33%
15/11/05 17:25:08 INFO mapred.JobClient: map 100% reduce 100%
15/11/05 17:25:09 INFO mapred.JobClient: Job complete: job_201507101501_23002
15/11/05 17:25:09 INFO mapred.JobClient: Counters: 25
15/11/05 17:25:09 INFO mapred.JobClient: Job Counters
15/11/05 17:25:09 INFO mapred.JobClient: Launched reduce tasks=1
15/11/05 17:25:09 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5350
15/11/05 17:25:09 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
15/11/05 17:25:09 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
15/11/05 17:25:09 INFO mapred.JobClient: Rack-local map tasks=1
15/11/05 17:25:09 INFO mapred.JobClient: Launched map tasks=1
15/11/05 17:25:09 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8702
15/11/05 17:25:09 INFO mapred.JobClient: FileSystemCounters
15/11/05 17:25:09 INFO mapred.JobClient: FILE_BYTES_READ=6
15/11/05 17:25:09 INFO mapred.JobClient: HDFS_BYTES_READ=1968928
15/11/05 17:25:09 INFO mapred.JobClient: FILE_BYTES_WRITTEN=108226
15/11/05 17:25:09 INFO mapred.JobClient: Map-Reduce Framework
15/11/05 17:25:09 INFO mapred.JobClient: Map input records=598001
15/11/05 17:25:09 INFO mapred.JobClient: Reduce shuffle bytes=6
15/11/05 17:25:09 INFO mapred.JobClient: Spilled Records=0
15/11/05 17:25:09 INFO mapred.JobClient: Map output bytes=0
15/11/05 17:25:09 INFO mapred.JobClient: CPU time spent (ms)=2920
15/11/05 17:25:09 INFO mapred.JobClient: Total committed heap usage (bytes)=355663872
15/11/05 17:25:09 INFO mapred.JobClient: Combine input records=0
15/11/05 17:25:09 INFO mapred.JobClient: SPLIT_RAW_BYTES=124
15/11/05 17:25:09 INFO mapred.JobClient: Reduce input records=0
15/11/05 17:25:09 INFO mapred.JobClient: Reduce input groups=0
15/11/05 17:25:09 INFO mapred.JobClient: Combine output records=0
15/11/05 17:25:09 INFO mapred.JobClient: Physical memory (bytes) snapshot=328683520
15/11/05 17:25:09 INFO mapred.JobClient: Reduce output records=0
15/11/05 17:25:09 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1466642432
15/11/05 17:25:09 INFO mapred.JobClient: Map output records=0
The content of the output file:
$ hadoop fs -cat /user/frb/tidoop/numbers_count/part-r-00000
[frb#cosmosmaster-gi tidoop-mr-lib]$ hadoop fs -ls /user/frb/tidoop/numbers_count/
Found 3 items
-rw-r--r-- 3 frb frb 0 2015-11-05 17:25 /user/frb/tidoop/numbers_count/_SUCCESS
drwxr----- - frb frb 0 2015-11-05 17:24 /user/frb/tidoop/numbers_count/_logs
-rw-r--r-- 3 frb frb 0 2015-11-05 17:25 /user/frb/tidoop/numbers_count/part-r-00000
Any hints about what is happening?
Weird. I'd try using Mapper (identity mapper) with your job.
If the Mapper does not output anything there must be something weird with your hadoop installation, or job configuration.

WARN mapred.JobClient: No job jar file set. User classes may not be found

My code is
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class word_count_new {
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
context.write(word, one);
}
}
}
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
context.write(key, new IntWritable(sum));
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf, "wordcount");
// job.setJarByClass(word_count_new.class);
// conf.setJar(word_count_new.jar);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setJarByClass(word_count_new.class);
job.waitForCompletion(true);
}
}
Below are class files and jars:
-rw-r----- 1 ps993w hyhdev 2236 Apr 3 13:56 word_count_new.java
-rw-r----- 1 ps993w hyhdev 1870 Apr 3 13:58 word_count_new$Map.class
-rw-r----- 1 ps993w hyhdev 1638 Apr 3 13:58 word_count_new$Reduce.class
-rw-r----- 1 ps993w hyhdev 1510 Apr 3 13:58 word_count_new.class
-rw-r----- 1 ps993w hyhdev 2990 Apr 3 13:58 word_count_new.jar
And the error is
[ps993w#hltd413 ~]$ hadoop jar word_count_new.jar word_count_new /user/ps993w/indata/input_line.dat /user/ps993w/wordcount/
14/04/03 15:53:13 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/04/03 15:53:13 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/04/03 15:53:13 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 105404 for ps993w on 130.4.240.48:8020
14/04/03 15:53:13 INFO security.TokenCache: Got dt for hdfs://hltd410.hydc.sbc.com:8020/user/ps993w/.staging/job_201402241341_9518;uri=130.4.240.48:8020;t.service=130.4.240.48:8020
14/04/03 15:53:13 INFO input.FileInputFormat: Total input paths to process : 1
14/04/03 15:53:13 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
14/04/03 15:53:13 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev cf4e7cbf8ed0f0622504d008101c2729dc0c9ff3]
14/04/03 15:53:13 WARN snappy.LoadSnappy: Snappy native library is available
14/04/03 15:53:13 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/04/03 15:53:13 INFO snappy.LoadSnappy: Snappy native library loaded
14/04/03 15:53:13 INFO mapred.JobClient: Running job: job_201402241341_9518
14/04/03 15:53:14 INFO mapred.JobClient: map 0% reduce 0%
14/04/03 15:53:24 INFO mapred.JobClient: Task Id : attempt_201402241341_9518_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: word_count_new$Map
Please suggest
I had the same problem. The issue was that hadoop didn't have access to read the jar file on the local system.
I found this solution from the following:
http://lucene.472066.n3.nabble.com/Trouble-with-Word-Count-example-td4023269.html

Avro Map Reduce - AvroInputFormat not found error

This is what i have understood so far reading from varied sources on the internet.
Avro mapred and Avro are not part of CDH4 (Cloudera Distribution) and i have to set it up manually using HADOOP_CLASSPATH=avro.jar:avro-mapred.jar
I have done that and when i run my job on my pseudo cluster it throws the following exception:
13/12/27 00:47:40 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/12/27 00:47:40 INFO mapred.FileInputFormat: Total input paths to process : 1
13/12/27 00:47:41 INFO mapred.JobClient: Running job: job_201312221245_0017
13/12/27 00:47:42 INFO mapred.JobClient: map 0% reduce 0%
13/12/27 00:47:57 INFO mapred.JobClient: Task Id : attempt_201312221245_0017_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.avro.mapred.AvroInputFormat not found
I'm running the job as follows:
hadoop jar build/libs/hadoop-boilerplate-1.0.jar CustomerMapReduce transactions/input transactions/output1 -libjars /path/to/libs/avro-1.7.4.jar,/path/to/libs/avro-mapred-1.7.4.jar
You should implement Tool and use getConf() for job configuration.
public class SomeClass extends Configured implements Tool {
#Override
public int run(String[] args) throws Exception {
Configuration conf = getConf();
...
}
}

Resources