I am facing a NullPointerException with the code below. It would be great if someone could review it and help me with the program.
The mapper runs fine, but I get an NPE when I try to split the value inside the iterator loop. Please help me figure out my mistake. I have attached the mapper output below.
TopperMain.java
package TopperPackage;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class TopperMain {
// hadoop jar wordcount.jar args[0] args[1]
public static void main(String[] args) throws Exception {
Job myhadoopJob = new Job();
myhadoopJob.setJarByClass(TopperMain.class);
myhadoopJob.setJobName("Finding topper based on subject");
FileInputFormat.addInputPath(myhadoopJob, new Path(args[0]));
FileOutputFormat.setOutputPath(myhadoopJob, new Path(args[1]));
myhadoopJob.setInputFormatClass(TextInputFormat.class);
myhadoopJob.setOutputFormatClass(TextOutputFormat.class);
myhadoopJob.setMapperClass(TopperMapper.class);
myhadoopJob.setReducerClass(TopperReduce.class);
myhadoopJob.setMapOutputKeyClass(Text.class);
myhadoopJob.setMapOutputValueClass(Text.class);
myhadoopJob.setOutputKeyClass(Text.class);
myhadoopJob.setOutputValueClass(Text.class);
System.exit(myhadoopJob.waitForCompletion(true) ? 0 : 1);
}
}
TopperMapper.java
package TopperPackage;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
/*Surender,87,60,50,50,80
Raj,80,70,80,85,60
Anten,81,60,50,70,100
Dinesh,60,90,80,80,70
Priya,80,85,91,60,75
*/
public class TopperMapper extends Mapper<LongWritable, Text, Text, Text>
{
String temp,temp2;
protected void map(LongWritable key, Text value,Context context)
throws IOException, InterruptedException {
String record = value.toString();
String[] parts = record.split(",");
temp=parts[0];
temp2=temp+ "\t" + parts[1];
context.write(new Text("Tamil"),new Text(temp2));
temp2=temp+ "\t" + parts[2];
context.write(new Text("English"),new Text(temp2));
temp2=temp+ "\t" + parts[3];
context.write(new Text("Maths"),new Text(temp2));
temp2=temp+ "\t" + parts[4];
context.write(new Text("Science"),new Text(temp2));
temp2=temp+ "\t" + parts[5];
context.write(new Text("SocialScrience"),new Text(temp2));
}
}
TopperReduce.java
package TopperPackage;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class TopperReduce extends Reducer<Text, Text, Text, Text> {
int temp;
private String[] names;
private int[] marks;
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
String top = "";
int count =0,topmark;
marks = null;
String befsplit;
String[] parts=null;
names = null;
for (Text t : values)
{
befsplit= t.toString();
parts = befsplit.split("\t");
names[count]=parts[0];
marks[count]=Integer.parseInt(parts[1]);
count = count+1;
}
topmark=calcTopper(marks);
top=names[topmark]+ "\t"+marks[topmark] ;
context.write(new Text(key), new Text(top));
}
public int calcTopper(int[] marks)
{
int count=marks.length;
temp=((marks[1]));
int i=0;
for (i=1;i<=(count-2);i++)
{
if(temp < marks[i+1])
{
temp = marks[i+1];
}
}
return i;
}
}
The error is:
cloudera@cloudera-vm:~/Jarfiles$ hadoop jar TopperMain.jar /user/cloudera/inputfiles/topper/topperinput.txt /user/cloudera/outputfiles/topper/
14/08/24 23:17:07 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/24 23:17:08 INFO input.FileInputFormat: Total input paths to process : 1
14/08/24 23:17:09 INFO mapred.JobClient: Running job: job_201408241907_0012
14/08/24 23:17:10 INFO mapred.JobClient: map 0% reduce 0%
14/08/24 23:17:49 INFO mapred.JobClient: map 100% reduce 0%
14/08/24 23:18:03 INFO mapred.JobClient: Task Id : attempt_201408241907_0012_r_000000_0, Status : FAILED
java.lang.NullPointerException
at TopperPackage.TopperReduce.reduce(TopperReduce.java:25)
at TopperPackage.TopperReduce.reduce(TopperReduce.java:1)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:571)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
attempt_201408241907_0012_r_000000_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
attempt_201408241907_0012_r_000000_0: log4j:WARN Please initialize the log4j system properly.
14/08/24 23:18:22 INFO mapred.JobClient: Task Id : attempt_201408241907_0012_r_000000_1, Status : FAILED
java.lang.NullPointerException
at TopperPackage.TopperReduce.reduce(TopperReduce.java:25)
at TopperPackage.TopperReduce.reduce(TopperReduce.java:1)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:571)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
attempt_201408241907_0012_r_000000_1: log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
attempt_201408241907_0012_r_000000_1: log4j:WARN Please initialize the log4j system properly.
I am getting the expected output from the mapper, but the reducer throws an error when splitting the value and storing the parts in a variable.
The mapper output is:
Tamil Surender 87
English Surender 60
Maths Surender 50
Science Surender 50
SocialScrience Surender 80
Tamil Raj 80
English Raj 70
Maths Raj 80
Science Raj 85
SocialScrience Raj 60
Tamil Anten 81
English Anten 60
Maths Anten 50
Science Anten 70
SocialScrience Anten 100
Tamil Dinesh 60
English Dinesh 90
Maths Dinesh 80
Science Dinesh 80
SocialScrience Dinesh 70
Tamil Priya 80
English Priya 85
Maths Priya 91
Science Priya 60
SocialScrience Priya 75
Any advice to point out my mistake is appreciated.
The error occurs because you set the marks and names arrays to null and never allocate them, so names[count] = parts[0] writes into a null array. Please use the reducer class below.
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class TopperReduce extends Reducer<Text, Text, Text, Text> {
int temp;
private String[] names = new String[10];
private int[] marks = new int[10];
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
String top = "";
int count = 0, topmark;
String befsplit;
String[] parts = null;
for (Text t : values) {
befsplit = t.toString();
parts = befsplit.split("\t");
names[count] = parts[0];
marks[count] = Integer.parseInt(parts[1]);
count++;
}
topmark = calcTopper(marks);
top = names[topmark] + "\t" + marks[topmark];
context.write(new Text(key), new Text(top));
}
public int calcTopper(int[] marks) {
int highestMark = Integer.MIN_VALUE;
int highestMarkIndex = 0;
// Remember both the best mark seen so far and its index; unused slots
// hold 0, so they never beat a positive mark.
for (int i = 0; i < marks.length; i++) {
if (marks[i] > highestMark) {
highestMark = marks[i];
highestMarkIndex = i;
}
}
return highestMarkIndex;
}
}
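The fixed-size arrays above break once a subject has more than 10 students. A sketch that avoids arrays entirely, tracking the best name and mark in a single pass, is shown below. It is plain Java so it can be tried without a cluster; TopperLogic and topper are hypothetical names, and it assumes each value has the name<TAB>mark form that TopperMapper emits. In the real reducer you would loop over Iterable<Text> and call t.toString() on each value.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: same "find the topper" logic as the reducer, done in
// one pass so no fixed-size arrays (and no size limit) are needed.
class TopperLogic {

    // Each value is assumed to be "name<TAB>mark", as emitted by TopperMapper.
    // Returns "name<TAB>mark" for the highest mark; the first one wins on ties.
    static String topper(Iterable<String> values) {
        String bestName = null;
        int bestMark = Integer.MIN_VALUE;
        for (String value : values) {
            String[] parts = value.split("\t");
            int mark = Integer.parseInt(parts[1].trim());
            if (mark > bestMark) {
                bestMark = mark;
                bestName = parts[0];
            }
        }
        return bestName + "\t" + bestMark;
    }

    public static void main(String[] args) {
        // The "Tamil" group from the mapper output in the question.
        List<String> tamil = Arrays.asList(
                "Surender\t87", "Raj\t80", "Anten\t81", "Dinesh\t60", "Priya\t80");
        System.out.println(topper(tamil)); // prints Surender, a tab, then 87
    }
}
```

Because only two variables are kept per key, memory use stays constant no matter how many students a subject has.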
The error comes from writing into the names and marks arrays, which you set to null and never allocate (parts itself is assigned by split() before it is used).
Change your code as shown below and it should work:
public class TopperReduce extends Reducer<Text, Text, Text, Text> {
int temp;
private String[] names=new String[20];
private int[] marks= new int[20];
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
String top = "";
int count =0,topmark;
for (Text t : values)
{
String befsplit= t.toString();
String[] parts = befsplit.split("\t");
names[count]=parts[0];
marks[count]=Integer.parseInt(parts[1]);
count = count+1;
}
topmark=calcTopper(marks);
top=names[topmark]+ "\t"+marks[topmark] ;
context.write(new Text(key), new Text(top));
}
// calcTopper(int[] marks) stays here, as in your class
}
Kindly point me in a direction to get my desired output.
Current output:
Albania 3607 ++ Country minPopulation
Albania 418495 ++ Country maxPopulation
Desired Output
country city minPopulation
country city maxPopulation
Reducer Class:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class Handson3Reducer extends Reducer<Text, IntWritable, Text, IntWritable> {
@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int maxValue = Integer.MIN_VALUE;
int minValue = Integer.MAX_VALUE;
String line = key.toString();
String field[] = line.split(",");
for (IntWritable value : values) {
maxValue = Math.max(maxValue, value.get());
minValue = Math.min(minValue, value.get());
}
context.write(key, new IntWritable(minValue));
context.write(key, new IntWritable(maxValue));
}
}
Mapper class:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class handson3Mapper extends Mapper<LongWritable, Text, Text, IntWritable> {
private static final int MISSING = 9999;
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
int populationVal;
String line = value.toString();
String field[] = line.split(",");
String country = field[4].substring(1, field[4].length()-1);
String newString = country.concat(field[0].substring(1, field[0].length()-1));
String population = field[9].substring(1, field[9].length()-1);
String city = field[0].substring(1, field[0].length()-1);
if (!population.matches(".*\\d.*") || population.equals("")||
population.matches("([0-9].*)\\.([0-9].*)") ){
return;
}else{
populationVal = Integer.parseInt(population);
context.write(new Text(country),new IntWritable(populationVal));
}
}
}
Runner Class:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class handsonJobRunner {
public int run(String[] args) throws Exception {
if(args.length !=2) {
System.err.println("Usage: Handson3 <input path> <outputpath>");
System.exit(-1);
}
Job job = new Job();
job.setJarByClass(handsonJobRunner.class);
job.setJobName("Handson 3");
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job,new Path(args[1]));
job.setMapperClass(handson3Mapper.class);
job.setReducerClass(Handson3Reducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
boolean success = job.waitForCompletion(true);
return success ? 0 : 1;
}
public static void main(String[] args) throws Exception {
handsonJobRunner driver = new handsonJobRunner();
// run() returns the status; exit with it here instead of inside run()
System.exit(driver.run(args));
}
}
Thank you in advance, any pointers would be much appreciated.
You should send both the city and the population as the value to the reducer, and in the reducer select the cities with the maximum and minimum population for each country.
Your mapper would be like this:
public class Handson3Mapper extends Mapper<LongWritable, Text, Text, Text> {
private static final int MISSING = 9999;
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
int populationVal;
String line = value.toString();
String field[] = line.split(",");
String country = field[4].substring(1, field[4].length() - 1);
String newString = country.concat(field[0].substring(1, field[0].length() - 1));
String population = field[9].substring(1, field[9].length() - 1);
String city = field[0].substring(1, field[0].length() - 1);
if (!population.matches(".*\\d.*") || population.equals("") ||
population.matches("([0-9].*)\\.([0-9].*)")) {
return;
} else {
populationVal = Integer.parseInt(population);
context.write(new Text(country), new Text(city + "-" + populationVal));
}
}
}
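Because the map output value type (Text) now differs from the final output value type (IntWritable), the runner also needs job.setMapOutputKeyClass(Text.class); and job.setMapOutputValueClass(Text.class);. Your reducer should change to this one: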
public class Handson3Reducer extends Reducer<Text, Text, Text, IntWritable> {
@Override
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
String maxPopulationCityName = "";
String minPopulationCityName = "";
int maxValue = Integer.MIN_VALUE;
int minValue = Integer.MAX_VALUE;
String line = key.toString();
String field[] = line.split(",");
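for (Text value : values) {
String s = value.toString();
// split on the LAST '-' so hyphenated city names still parse
int sep = s.lastIndexOf('-');
String[] array = { s.substring(0, sep), s.substring(sep + 1) };
int population = Integer.parseInt(array[1]);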
if (population > maxValue) {
maxPopulationCityName = array[0];
maxValue = population;
}
if (population < minValue) {
minPopulationCityName = array[0];
minValue = population;
}
}
context.write(new Text(key + " " + minPopulationCityName), new IntWritable(minValue));
context.write(new Text(key + " " + maxPopulationCityName), new IntWritable(maxValue));
}
}
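To check the min/max selection without running a job, the reducer loop can be lifted into plain Java. This is a sketch, assuming values in the "city-population" form written by the mapper above; CityMinMax and minMax are hypothetical names, and the sample cities are made up. It splits on the last '-' so a name such as "Small-Town" does not break the parse.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java version of the Handson3Reducer loop.
class CityMinMax {

    // Returns {minCity, minPopulation, maxCity, maxPopulation} as strings.
    static String[] minMax(Iterable<String> values) {
        String minCity = "", maxCity = "";
        int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
        for (String value : values) {
            // Split on the last '-', so hyphenated city names survive.
            int sep = value.lastIndexOf('-');
            String city = value.substring(0, sep);
            int population = Integer.parseInt(value.substring(sep + 1));
            if (population > max) { max = population; maxCity = city; }
            if (population < min) { min = population; minCity = city; }
        }
        return new String[] {minCity, String.valueOf(min), maxCity, String.valueOf(max)};
    }

    public static void main(String[] args) {
        // Made-up sample values for one country key.
        List<String> values = Arrays.asList("CityA-418495", "Small-Town-3607", "CityB-52000");
        String[] r = minMax(values);
        System.out.println("Albania " + r[0] + "\t" + r[1]); // min line
        System.out.println("Albania " + r[2] + "\t" + r[3]); // max line
    }
}
```

Keeping the city alongside the population in the same value is what lets the reducer report which city produced each extreme.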
I am trying to create a MapReduce program that performs the k-means algorithm. I know MapReduce isn't the best fit for iterative algorithms.
I have created the mapper and reducer classes.
In the mapper I read an input file. When a MapReduce job has completed, I want the results to be stored in that same input file. How do I make the job's output overwrite the file the mapper read as input?
Also, how do I make the MapReduce job iterate until the values in the old and new input files converge, i.e. the difference between them is less than 0.1?
My code is:
import java.io.IOException;
import java.util.StringTokenizer;
import java.util.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.FileReader;
import java.io.BufferedReader;
import java.util.ArrayList;
public class kmeansMapper extends Mapper<Object, Text, DoubleWritable,
DoubleWritable> {
private final static String centroidFile = "centroid.txt";
private List<Double> centers = new ArrayList<Double>();
public void setup(Context context) throws IOException{
BufferedReader br = new BufferedReader(new
FileReader(centroidFile));
String contentLine;
while((contentLine = br.readLine())!=null){
centers.add(Double.parseDouble(contentLine));
}
}
public void map(Object key, Text input, Context context) throws IOException,
InterruptedException {
String[] fields = input.toString().split(" ");
Double rating = Double.parseDouble(fields[2]);
Double distance = Math.abs(centers.get(0) - rating); // absolute distance, like cDistance below
int position = 0;
for(int i=1; i<centers.size(); i++){
Double cDistance = Math.abs(centers.get(i) - rating);
if(cDistance< distance){
position = i;
distance = cDistance;
}
}
Double closestCenter = centers.get(position);
context.write(new DoubleWritable(closestCenter),new
DoubleWritable(rating)); //outputs closestcenter and rating value
}
}
import java.io.IOException;
import java.lang.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Reducer;
import java.util.*;
public class kmeansReducer extends Reducer<DoubleWritable, DoubleWritable,
DoubleWritable, Text> {
public void reduce(DoubleWritable key, Iterable<DoubleWritable> values,
Context context)// get count // get total //get values in a string
throws IOException, InterruptedException {
Iterator<DoubleWritable> v = values.iterator();
double total = 0;
double count = 0;
String value = ""; //value is the rating
while (v.hasNext()){
double i = v.next().get();
value = value + " " + Double.toString(i);
total = total + i;
++count;
}
double nCenter = total/count;
context.write(new DoubleWritable(nCenter), new Text(value));
}
}
import java.util.Arrays;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class run
{
public static void runJob(String[] input, String output) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf);
Path toCache = new Path("input/centroid.txt");
job.addCacheFile(toCache.toUri());
job.setJarByClass(run.class);
job.setMapperClass(kmeansMapper.class);
job.setReducerClass(kmeansReducer.class);
job.setMapOutputKeyClass(DoubleWritable.class);
job.setMapOutputValueClass(DoubleWritable.class);
job.setNumReduceTasks(1);
Path outputPath = new Path(output);
FileInputFormat.setInputPaths(job, StringUtils.join(input, ","));
FileOutputFormat.setOutputPath(job, outputPath);
outputPath.getFileSystem(conf).delete(outputPath,true);
job.waitForCompletion(true);
}
public static void main(String[] args) throws Exception {
runJob(Arrays.copyOfRange(args, 0, args.length-1), args[args.length-1]);
}
}
Thanks
I know you added the disclaimer, but please switch to Spark or another framework that can solve problems in memory. Your life will be much easier.
If you really want to do this, just run the code in runJob iteratively and use a temporary file name for the input. You can see this question on moving files in Hadoop to achieve this. You'll need a FileSystem instance and a temporary path for the input:
FileSystem fs = FileSystem.get(new Configuration());
Path tempInputPath = new Path("/user/th/kmeans/tmp_input");
Broadly speaking, after each iteration has finished, do:
fs.delete(tempInputPath, true);
fs.rename(outputPath, tempInputPath);
Of course, for the very first iteration you must set the input path to the input paths provided when running the job. Subsequent iterations can use tempInputPath, which will hold the output of the previous iteration.
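The convergence part of the question (stop once no centre moves by 0.1 or more) can be sketched in memory as below. Each call to step mirrors one MapReduce pass (assign every rating to its nearest centre, then average per centre); in a real driver each pass would instead be a runJob(...) call followed by the delete/rename above and re-reading the new centres. KMeansLoop, step, converged and run are hypothetical names, and the ratings in main are made up.

```java
import java.util.Arrays;

// In-memory sketch of the iterate-until-converged driver loop.
class KMeansLoop {

    static final double TOLERANCE = 0.1; // threshold from the question

    // One "job": returns the new centres for the given ratings.
    static double[] step(double[] centers, double[] ratings) {
        double[] sum = new double[centers.length];
        int[] count = new int[centers.length];
        for (double r : ratings) {
            int best = 0;
            for (int i = 1; i < centers.length; i++) {
                if (Math.abs(centers[i] - r) < Math.abs(centers[best] - r)) best = i;
            }
            sum[best] += r;
            count[best]++;
        }
        double[] next = new double[centers.length];
        for (int i = 0; i < centers.length; i++) {
            // An empty cluster keeps its previous centre.
            next[i] = count[i] == 0 ? centers[i] : sum[i] / count[i];
        }
        return next;
    }

    // True when every centre moved by less than the tolerance.
    static boolean converged(double[] oldC, double[] newC) {
        for (int i = 0; i < oldC.length; i++) {
            if (Math.abs(oldC[i] - newC[i]) >= TOLERANCE) return false;
        }
        return true;
    }

    // Repeats step() until converged or maxIterations passes have run.
    static double[] run(double[] centers, double[] ratings, int maxIterations) {
        for (int it = 0; it < maxIterations; it++) {
            double[] next = step(centers, ratings);
            boolean done = converged(centers, next);
            centers = next;
            if (done) break;
        }
        return centers;
    }

    public static void main(String[] args) {
        double[] ratings = {1, 1.5, 2, 8, 9, 10};
        System.out.println(Arrays.toString(run(new double[] {1, 2}, ratings, 20)));
    }
}
```

The maxIterations cap is a design choice worth keeping in the Hadoop version too, since each extra pass costs a full job launch.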
I'm trying to compute Pearson's correlation for all pairs of columns.
This is my MapReduce code:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class Pearson
{
public static class MyMapper extends Mapper<LongWritable,Text,IndexPair,ValuePair>{
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String[] tokens = line.split(",");
double[] arr = toDouble(tokens);
for(int i=0; i < arr.length; i++) {
for(int j=i+1; j < arr.length; j++) {
IndexPair k2 = new IndexPair(i, j);
ValuePair v2 = new ValuePair(arr[i], arr[j]);
context.write(k2, v2);
}
}
}
public double[] toDouble(String[] tokens) {
double[] arr = new double[tokens.length];
for(int i=0; i < tokens.length; i++) {
arr[i] = Double.parseDouble(tokens[i]);
}
return arr;
}
}
public static class MyReduce extends Reducer<IndexPair,ValuePair,IndexPair,DoubleWritable>
{
public void reduce(IndexPair key, Iterable<ValuePair> values, Context context) throws IOException, InterruptedException {
double x = 0.0d;
double y = 0.0d;
double xx = 0.0d;
double yy = 0.0d;
double xy = 0.0d;
double n = 0.0d;
for(ValuePair pairs : values) {
x += pairs.v1;
y += pairs.v2;
xx += Math.pow(pairs.v1, 2.0d);
yy += Math.pow(pairs.v2, 2.0d);
xy += (pairs.v1 * pairs.v2);
n += 1.0d;
}
double numerator = xy - ((x * y) / n);
double denominator1 = xx - (Math.pow(x, 2.0d) / n);
double denominator2 = yy - (Math.pow(y, 2.0d) / n);
double denominator = Math.sqrt(denominator1 * denominator2);
double corr = numerator / denominator;
context.write(key, new DoubleWritable(corr));
}
}
public static void main(String[] args) throws Exception
{
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "Pearson's Correlation");
job.setJarByClass(Pearson.class);
job.setMapperClass(MyMapper.class);
job.setCombinerClass(MyReduce.class);
job.setReducerClass(MyReduce.class);
job.setMapOutputKeyClass(IndexPair.class);
job.setMapOutputValueClass(ValuePair.class);
job.setOutputKeyClass(IndexPair.class);
job.setOutputValueClass(DoubleWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
And the code for IndexPair is:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;
public class IndexPair implements WritableComparable<IndexPair>{
public static String[] labels
={"Year","Month","MEI","CO2","CH4","N2O","CFC-11","CFC-12","TSI","Aerosols","Temp"};
public long i,j;
public IndexPair()
{
}
public IndexPair(long i,long j) {
this.i=i;
this.j=j;
}
@Override
public void readFields(DataInput in) throws IOException {
i = in.readLong();
j = in.readLong();
}
@Override
public void write(DataOutput out) throws IOException {
out.writeLong(i);
out.writeLong(j);
}
@Override
public int compareTo(IndexPair o) {
Long i1 = i;
Long j1 = j;
Long i2 = o.i;
Long j2 = o.j;
int result = i1.compareTo(i2);
if (0 == result) {
return j1.compareTo(j2);
}
return result;
}
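@Override
public String toString()
{
return "Correlation between column "+labels[(int) i]+"-->"+ labels[(int)j];
}
// The default HashPartitioner uses hashCode(); without these overrides,
// equal keys can land on different reducers when more than one reducer runs.
@Override
public int hashCode() {
return 31 * Long.hashCode(i) + Long.hashCode(j);
}
@Override
public boolean equals(Object o) {
if (!(o instanceof IndexPair)) return false;
IndexPair p = (IndexPair) o;
return i == p.i && j == p.j;
}
}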
And the code for ValuePair is:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;
public class ValuePair implements WritableComparable<ValuePair>{
public double v1,v2;
public ValuePair()
{
}
public ValuePair(double i,double j)
{
v1=i;
v2=j;
}
@Override
public void readFields(DataInput in) throws IOException {
v1=in.readDouble();
v2=in.readDouble();
}
@Override
public void write(DataOutput out) throws IOException {
out.writeDouble(v1);
out.writeDouble(v2);
}
@Override
public int compareTo(ValuePair o) {
// comparator for value pair is not required....
return 0;
}
}
But when I try to execute this, I get the following error:
17/07/20 13:59:49 INFO mapreduce.Job: map 0% reduce 0%
17/07/20 13:59:53 INFO mapreduce.Job: Task Id : attempt_1500536519279_0007_m_000000_0, Status : FAILED
Error: java.io.IOException: wrong value class: class org.apache.hadoop.io.DoubleWritable is not class ValuePair
at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:194)
at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1411)
at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1728)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
at Pearson$MyReduce.reduce(Pearson.java:66)
at Pearson$MyReduce.reduce(Pearson.java:1)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1749)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1639)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1491)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
The problem is that you use the reducer as a combiner:
job.setCombinerClass(MyReduce.class);
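A combiner's output key and value types must be the same as the mapper's output types (here IndexPair and ValuePair), because the combiner's output goes through the shuffle to the reducer. When you use the reducer as a combiner, it emits DoubleWritable values instead of ValuePair, hence the error. Remove the setCombinerClass line: MyReduce computes a final correlation rather than a partial aggregate, so it could not serve as a combiner even if the types matched.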
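If you do want a combiner later, it has to emit something that can be merged again. A sketch of that idea in plain Java: the six running sums used by MyReduce (n, Σx, Σy, Σx², Σy², Σxy) are additive, so partial sums from different map tasks can be added together and the correlation computed once at the end. PartialSums is a hypothetical class; in Hadoop it would have to be a Writable used as the mapper output value, the combiner input/output value and the reducer input value.

```java
// Hypothetical holder for the additive statistics behind Pearson's r.
class PartialSums {
    double n, x, y, xx, yy, xy;

    void add(double v1, double v2) {
        n += 1; x += v1; y += v2;
        xx += v1 * v1; yy += v2 * v2; xy += v1 * v2;
    }

    // Merging is plain addition, which is what makes these sums combiner-safe.
    void merge(PartialSums o) {
        n += o.n; x += o.x; y += o.y;
        xx += o.xx; yy += o.yy; xy += o.xy;
    }

    // Same formula as MyReduce in the question.
    double correlation() {
        double num = xy - x * y / n;
        double den = Math.sqrt((xx - x * x / n) * (yy - y * y / n));
        return num / den;
    }

    public static void main(String[] args) {
        double[][] rows = {{1, 2}, {2, 4}, {3, 5}, {4, 4}, {5, 6}};
        PartialSums all = new PartialSums(), left = new PartialSums(), right = new PartialSums();
        for (int i = 0; i < rows.length; i++) {
            all.add(rows[i][0], rows[i][1]);
            (i < 2 ? left : right).add(rows[i][0], rows[i][1]);
        }
        left.merge(right); // as if two map tasks had each pre-aggregated their rows
        System.out.println(all.correlation() + " " + left.correlation());
    }
}
```

Both numbers printed by main agree, which is the property a combiner needs and a finished correlation value does not have.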
I am writing MapReduce code with two mapper classes and one reducer, but I don't know why I get Reduce output records=0.
Please tell me how to solve this problem.
package reducesidejoin;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
import java.util.Iterator;
public class ReduceSideJoinReducer extends Reducer<IntWritable,
Text, IntWritable, Text> {
@Override
public void reduce(IntWritable key, Iterable<Text> values,
Context context) throws IOException, InterruptedException {
String output = null;
Text achat;
Text vins;
Text valeur2;
Text valeur1;
Iterator<Text> itr = values.iterator();
valeur1 = itr.next();
if (valeur1.charAt(0) == 1) {
vins = valeur1;
while (itr.hasNext()) {
valeur2 = itr.next();
if (valeur2.charAt(0) == 2) {
achat = valeur2;
output = vins.toString() + achat.toString();
context.write(key, new Text(output));
}
context.write(key, new Text(output));
}
} else if (valeur1.charAt(0) == 2) {
achat = valeur1;
while (itr.hasNext()) {
valeur2 = itr.next();
if (valeur2.charAt(0) == 1) {
vins = valeur2;
output = vins.toString() + achat.toString();
System.out.println(key + "," + output);
}
context.write(key, new Text(output));
}
}
}
}
The only way your reducer can output anything is if your char comparisons are working. This is assuming you actually have records entering your reducer.
I would have a look at these lines: valeur1.charAt(0) == 1
You're comparing the first character with the integer 1, and I suspect you're looking for the printable character '1' (49 as an integer), so you probably want:
valeur1.charAt(0) == '1'
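A minimal demonstration of the difference, in plain Java with made-up data. Whether charAt returns a char (String.charAt) or an int code point (Hadoop's Text.charAt), the digit '1' has the value 49, so comparing against the literal 1 never matches. CharCompare and hasTagOne are hypothetical names.

```java
// Shows why charAt(0) == 1 never matches a record that starts with the digit 1.
class CharCompare {

    // True when the record starts with the character '1' (not code point 1).
    static boolean hasTagOne(String value) {
        return !value.isEmpty() && value.charAt(0) == '1';
    }

    public static void main(String[] args) {
        String record = "1,Merlot";                  // made-up tagged record
        System.out.println(record.charAt(0) == 1);   // false: compares with code 1
        System.out.println(record.charAt(0) == 49);  // true: '1' is code 49
        System.out.println(hasTagOne(record));       // true
    }
}
```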
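You're also assigning references a lot, e.g. vins = valeur1; which is going to cause problems because Hadoop reuses the Text object it hands you via the Iterable, so vins ends up holding whatever value is read next.
You should copy the value instead, e.g. vins = new Text(valeur1); (or initialize vins = new Text() once and then call vins.set(valeur1);).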
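Putting both fixes together, the join for one key can be sketched in plain Java: buffer copies of the '1'-tagged and '2'-tagged values, then emit every combination, so the result no longer depends on which value arrives first. This assumes, as in the question, that the first character of each value is the tag and that the output is the two records concatenated; JoinSketch and join are hypothetical names and the values in main are made up. In the Hadoop reducer the copies would be t.toString() or new Text(t).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java version of a reduce-side join for a single key.
class JoinSketch {

    static List<String> join(Iterable<String> values) {
        List<String> left = new ArrayList<>();   // records tagged '1'
        List<String> right = new ArrayList<>();  // records tagged '2'
        for (String v : values) {
            if (v.isEmpty()) continue;
            char tag = v.charAt(0);
            if (tag == '1') left.add(v);         // Strings are immutable, so this is a safe copy
            else if (tag == '2') right.add(v);
        }
        // One joined record per (left, right) pair, whatever the arrival order was.
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                out.add(l + r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up values for one key, arriving in arbitrary order.
        System.out.println(join(Arrays.asList("2,purchaseA", "1,wineX", "2,purchaseB")));
    }
}
```

Buffering one side in memory is the usual trade-off of a reduce-side join; it assumes the values for a single key fit in the reducer's heap.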
I have been trying to get the MapReduce sample code that comes with Cassandra running, but I get a runtime error.
Source code:
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.*;
import java.util.Map.Entry;
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlOutputFormat;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import java.nio.charset.CharacterCodingException;
/**
* This counts the occurrences of words in ColumnFamily
* cql3_worldcount ( user_id text,
* category_id text,
* sub_category_id text,
* title text,
* body text,
* PRIMARY KEY (user_id, category_id, sub_category_id))
*
* For each word, we output the total number of occurrences across all body texts.
*
* When outputting to Cassandra, we write the word counts to column family
* output_words ( row_id1 text,
* row_id2 text,
* word text,
* count_num text,
* PRIMARY KEY ((row_id1, row_id2), word))
* as a {word, count} to columns: word, count_num with a row key of "word sum"
*/
public class WordCount extends Configured implements Tool
{
private static final Logger logger = LoggerFactory.getLogger(WordCount.class);
static final String KEYSPACE = "cql3_worldcount";
static final String COLUMN_FAMILY = "inputs";
static final String OUTPUT_REDUCER_VAR = "output_reducer";
static final String OUTPUT_COLUMN_FAMILY = "output_words";
private static final String OUTPUT_PATH_PREFIX = "/tmp/word_count";
private static final String PRIMARY_KEY = "row_key";
public static void main(String[] args) throws Exception
{
// Let ToolRunner handle generic command-line options
ToolRunner.run(new Configuration(), new WordCount(), args);
System.exit(0);
}
public static class TokenizerMapper extends Mapper<Map<String, ByteBuffer>, Map<String, ByteBuffer>, Text, IntWritable>
{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
private ByteBuffer sourceColumn;
protected void setup(org.apache.hadoop.mapreduce.Mapper.Context context)
throws IOException, InterruptedException
{
}
public void map(Map<String, ByteBuffer> keys, Map<String, ByteBuffer> columns, Context context) throws IOException, InterruptedException
{
for (Entry<String, ByteBuffer> column : columns.entrySet())
{
if (!"body".equalsIgnoreCase(column.getKey()))
continue;
String value = ByteBufferUtil.string(column.getValue());
logger.debug("read {}:{}={} from {}",
new Object[] {toString(keys), column.getKey(), value, context.getInputSplit()});
StringTokenizer itr = new StringTokenizer(value);
while (itr.hasMoreTokens())
{
word.set(itr.nextToken());
context.write(word, one);
}
}
}
private String toString(Map<String, ByteBuffer> keys)
{
String result = "";
try
{
for (ByteBuffer key : keys.values())
result = result + ByteBufferUtil.string(key) + ":";
}
catch (CharacterCodingException e)
{
logger.error("Failed to print keys", e);
}
return result;
}
}
public static class ReducerToFilesystem extends Reducer<Text, IntWritable, Text, IntWritable>
{
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
{
int sum = 0;
for (IntWritable val : values)
sum += val.get();
context.write(key, new IntWritable(sum));
}
}
public static class ReducerToCassandra extends Reducer<Text, IntWritable, Map<String, ByteBuffer>, List<ByteBuffer>>
{
private Map<String, ByteBuffer> keys;
private ByteBuffer key;
protected void setup(org.apache.hadoop.mapreduce.Reducer.Context context)
throws IOException, InterruptedException
{
keys = new LinkedHashMap<String, ByteBuffer>();
String[] partitionKeys = context.getConfiguration().get(PRIMARY_KEY).split(",");
keys.put("row_id1", ByteBufferUtil.bytes(partitionKeys[0]));
keys.put("row_id2", ByteBufferUtil.bytes(partitionKeys[1]));
}
public void reduce(Text word, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
{
int sum = 0;
for (IntWritable val : values)
sum += val.get();
context.write(keys, getBindVariables(word, sum));
}
private List<ByteBuffer> getBindVariables(Text word, int sum)
{
List<ByteBuffer> variables = new ArrayList<ByteBuffer>();
keys.put("word", ByteBufferUtil.bytes(word.toString()));
variables.add(ByteBufferUtil.bytes(String.valueOf(sum)));
return variables;
}
}
public int run(String[] args) throws Exception
{
String outputReducerType = "filesystem";
if (args != null && args.length > 0 && args[0].startsWith(OUTPUT_REDUCER_VAR))
{
String[] s = args[0].split("=");
if (s != null && s.length == 2)
outputReducerType = s[1];
}
logger.info("output reducer type: " + outputReducerType);
Job job = new Job(getConf(), "wordcount");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
if (outputReducerType.equalsIgnoreCase("filesystem"))
{
job.setCombinerClass(ReducerToFilesystem.class);
job.setReducerClass(ReducerToFilesystem.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH_PREFIX));
}
else
{
job.setReducerClass(ReducerToCassandra.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Map.class);
job.setOutputValueClass(List.class);
job.setOutputFormatClass(CqlOutputFormat.class);
ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KEYSPACE, OUTPUT_COLUMN_FAMILY);
job.getConfiguration().set(PRIMARY_KEY, "word,sum");
String query = "UPDATE " + KEYSPACE + "." + OUTPUT_COLUMN_FAMILY +
" SET count_num = ? ";
CqlConfigHelper.setOutputCql(job.getConfiguration(), query);
ConfigHelper.setOutputInitialAddress(job.getConfiguration(), "localhost");
ConfigHelper.setOutputPartitioner(job.getConfiguration(), "Murmur3Partitioner");
}
job.setInputFormatClass(CqlPagingInputFormat.class);
ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160");
ConfigHelper.setInputInitialAddress(job.getConfiguration(), "localhost");
ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
ConfigHelper.setInputPartitioner(job.getConfiguration(), "Murmur3Partitioner");
CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), "3");
//this is the user defined filter clauses, you can comment it out if you want count all titles
CqlConfigHelper.setInputWhereClauses(job.getConfiguration(), "title='A'");
job.waitForCompletion(true);
return 0;
}
}
It compiles fine but I get this error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/cassandra/hadoop/cql3/CqlPagingInputFormat
at WordCount.run(WordCount.java:230)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at WordCount.main(WordCount.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.ClassNotFoundException: org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 8 more
I am using Hadoop 1.2.1 and Cassandra 2.0.4.
Any help with this error, or sample code or instructions for getting Hadoop MapReduce to work with Cassandra, would be appreciated.
To solve the problem, copy the Cassandra jar files to the Hadoop lib directory.
Alternatively, add them to the classpath by putting the following line in the /< hadoop path >/conf/hadoop-env.sh file:
export HADOOP_CLASSPATH=/< path to cassandra >/lib/*:$HADOOP_CLASSPATH
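To confirm the change took effect before resubmitting the job, a small diagnostic can check whether the missing class is loadable. This is a sketch; ClasspathCheck and isLoadable are hypothetical names, and running it through the same launcher as the driver (for example packaged in a jar and started with hadoop jar) is assumed to give it the same client-side classpath.

```java
// Reports whether a class can be loaded from the current classpath.
class ClasspathCheck {

    static boolean isLoadable(String className) {
        try {
            // initialize = false: only locate and load the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0]
                : "org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat";
        System.out.println(name + (isLoadable(name) ? " found" : " NOT found on classpath"));
    }
}
```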