I am new to Hadoop and I am trying to get the maximum salary of an employee. My data looks like this:
1231,"","","",4000
1232,"","","",5000
..................
This is my mapper class. Here, I am trying to emit the full tuple:
public class maxmap extends MapReduceBase implements Mapper<LongWritable,Text,Text,employee> {
@Override
public void map(LongWritable key, Text value,
OutputCollector<Text,employee> outputCollector, Reporter reporter)
throws IOException {
// TODO Auto-generated method stub
//TupleWritable tupleWritable= new TupleWritable(new Writable[]{new Text(value.toString().split(",")[1]),
//new Text(value.toString().split(",")[0])
//});
String employeeId = value.toString().split(",")[0];
int count =1231;
employee employee=new employee(Integer.parseInt(employeeId), "employeeName", "StemployeeDept", "employeeJoinDt",1231);
//tupleWritable.write();
outputCollector.collect(new Text("max salry"),employee);
}
}
This is my Reducer class
public class maxsalreduce extends MapReduceBase implements Reducer<Text,employee,Text,IntWritable> {
@Override
public void reduce(Text key, Iterator<employee> values,
OutputCollector<Text, IntWritable> collector, Reporter reporter)
throws IOException {
// TODO Auto-generated method stub
System.out.println("in reducer");
while(values.hasNext()){
employee employee=values.next();
System.out.println("employee id"+employee.employeeId);
}
collector.collect(new Text(""), new IntWritable(1));
}
}
This is my employee class
public class employee implements Writable{
public int employeeId;
private String employeeName;
private String employeeDept;
private String employeeJoinDt;
public employee(int employeeId,String employeeName,String employeeDept,String employeeJoinDt,int employeeSalary){
this.employeeId=employeeId;
System.out.println(this.employeeId);
this.employeeName=employeeName;
this.employeeDept=employeeDept;
this.employeeJoinDt=employeeJoinDt;
this.employeeSalary=employeeSalary;
}
public employee() {
// TODO Auto-generated constructor stub
}
public int getEmployeeId() {
return employeeId;
}
public void setEmployeeId(int employeeId) {
this.employeeId = employeeId;
}
public String getEmployeeName() {
return employeeName;
}
public void setEmployeeName(String employeeName) {
this.employeeName = employeeName;
}
public String getEmployeeDept() {
return employeeDept;
}
public void setEmployeeDept(String employeeDept) {
this.employeeDept = employeeDept;
}
public String getEmployeeJoinDt() {
return employeeJoinDt;
}
public void setEmployeeJoinDt(String employeeJoinDt) {
this.employeeJoinDt = employeeJoinDt;
}
public int getEmployeeSalary() {
return employeeSalary;
}
public void setEmployeeSalary(int employeeSalary) {
this.employeeSalary = employeeSalary;
}
private int employeeSalary;
@Override
public void readFields(DataInput input) throws IOException {
// TODO Auto-generated method stub
System.out.println("employee id is"+input.readInt());
//this.employeeId=input.readInt();
//this.employeeName=input.readUTF();
//this.employeeDept=input.readUTF();
//this.employeeJoinDt=input.readUTF();
//this.employeeSalary=input.readInt();
new employee(input.readInt(),input.readUTF(),input.readUTF(),input.readUTF(),input.readInt());
}
@Override
public void write(DataOutput output) throws IOException {
// TODO Auto-generated method stub
output.writeInt(this.employeeId);
output.writeUTF(this.employeeName);
output.writeUTF(this.employeeDept);
output.writeUTF(this.employeeJoinDt);
output.writeInt(this.employeeSalary);
}
}
This is my job runner
public class jobrunner {
public static void main(String[] args) throws IOException
{
JobConf jobConf = new JobConf(jobrunner.class);
jobConf.setJobName("Count no of employees");
jobConf.setMapperClass(maxmap.class);
jobConf.setReducerClass(maxsalreduce.class);
FileInputFormat.setInputPaths(jobConf, new Path("hdfs://localhost:9000/employee_data.txt"));
FileOutputFormat.setOutputPath(jobConf,new Path("hdfs://localhost:9000/dummy20.txt"));
jobConf.setOutputKeyClass(Text.class);
jobConf.setOutputValueClass(employee.class);
JobClient.runJob(jobConf);
}
}
This is the exception I am getting
java.lang.RuntimeException: problem advancing post rec#0
at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1214)
at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:249)
at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:245)
at tuplewritable.maxsalreduce.reduce(maxsalreduce.java:24)
at tuplewritable.maxsalreduce.reduce(maxsalreduce.java:1)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readUTF(DataInputStream.java:592)
at java.io.DataInputStream.readUTF(DataInputStream.java:547)
at tuplewritable.employee.readFields(employee.java:76)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1271)
at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1211)
... 7 more
13/08/17 20:44:14 INFO mapred.JobClient: map 100% reduce 0%
13/08/17 20:44:14 INFO mapred.JobClient: Job complete: job_local_0001
13/08/17 20:44:14 INFO mapred.JobClient: Counters: 21
13/08/17 20:44:14 INFO mapred.JobClient: File Input Format Counters
13/08/17 20:44:14 INFO mapred.JobClient: Bytes Read=123
13/08/17 20:44:14 INFO mapred.JobClient: FileSystemCounters
13/08/17 20:44:14 INFO mapred.JobClient: FILE_BYTES_READ=146
13/08/17 20:44:14 INFO mapred.JobClient: HDFS_BYTES_READ=123
13/08/17 20:44:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=39985
13/08/17 20:44:14 INFO mapred.JobClient: Map-Reduce Framework
13/08/17 20:44:14 INFO mapred.JobClient: Map output materialized bytes=270
13/08/17 20:44:14 INFO mapred.JobClient: Map input records=4
13/08/17 20:44:14 INFO mapred.JobClient: Reduce shuffle bytes=0
13/08/17 20:44:14 INFO mapred.JobClient: Spilled Records=4
13/08/17 20:44:14 INFO mapred.JobClient: Map output bytes=256
13/08/17 20:44:14 INFO mapred.JobClient: Total committed heap usage (bytes)=160763904
13/08/17 20:44:14 INFO mapred.JobClient: CPU time spent (ms)=0
13/08/17 20:44:14 INFO mapred.JobClient: Map input bytes=123
13/08/17 20:44:14 INFO mapred.JobClient: SPLIT_RAW_BYTES=92
13/08/17 20:44:14 INFO mapred.JobClient: Combine input records=0
13/08/17 20:44:14 INFO mapred.JobClient: Reduce input records=0
13/08/17 20:44:14 INFO mapred.JobClient: Reduce input groups=0
13/08/17 20:44:14 INFO mapred.JobClient: Combine output records=0
13/08/17 20:44:14 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
13/08/17 20:44:14 INFO mapred.JobClient: Reduce output records=0
13/08/17 20:44:14 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
13/08/17 20:44:14 INFO mapred.JobClient: Map output records=4
13/08/17 20:44:14 INFO mapred.JobClient: Job Failed: NA
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
at tuplewritable.jobrunner.main(jobrunner.java:30)
13/08/17 20:44:14 ERROR hdfs.DFSClient: Exception closing file /dummy20.txt/_temporary/_attempt_local_0001_r_000000_0/part-00000 : org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /dummy20.txt/_temporary/_attempt_local_0001_r_000000_0/part-00000 File does not exist. Holder DFSClient_1595916561 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1629)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1620)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1675)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1663)
at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /dummy20.txt/_temporary/_attempt_local_0001_r_000000_0/part-00000 File does not exist. Holder DFSClient_1595916561 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1629)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1620)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1675)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1663)
at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
at org.apache.hadoop.ipc.Client.call(Client.java:1066)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy1.complete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.complete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3894)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3809)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:1342)
at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:275)
at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:328)
at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1446)
at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:277)
at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:260)
Here is what is going wrong in your classes.
In the Employee class, remove:
System.out.println("employee id is"+input.readInt());
And,
new employee(input.readInt(),input.readUTF(),input.readUTF(),
input.readUTF(),input.readInt());
from,
@Override
public void readFields(DataInput input) throws IOException {
// TODO Auto-generated method stub
System.out.println("employee id is"+input.readInt());
//this.employeeId=input.readInt();
//this.employeeName=input.readUTF();
//this.employeeDept=input.readUTF();
//this.employeeJoinDt=input.readUTF();
//this.employeeSalary=input.readInt();
new employee(input.readInt(),input.readUTF(),input.readUTF(),input.readUTF(),input.readInt());
}
Reason: the System.out.println("employee id is"+input.readInt()); call already consumes the first int from the serialized record, so the subsequent reads start at the wrong offset and eventually hit end-of-file. As for the line new employee(....), constructing a fresh object inside readFields does not populate the fields of the current instance at all; readFields should assign each deserialized field to this.
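For reference, a corrected readFields would simply assign each field on the current instance, in the same order that write() serializes them. A minimal sketch based on the field layout above:
@Override
public void readFields(DataInput input) throws IOException {
// read the fields back in exactly the order write() emits them
this.employeeId = input.readInt();
this.employeeName = input.readUTF();
this.employeeDept = input.readUTF();
this.employeeJoinDt = input.readUTF();
this.employeeSalary = input.readInt();
}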
Next in the JobRunner class,
remove this line:
jobConf.setOutputValueClass(employee.class);
and add these lines:
jobConf.setMapOutputKeyClass(Text.class);
jobConf.setMapOutputValueClass(employee.class);
jobConf.setOutputKeyClass(Text.class);
jobConf.setOutputValueClass(IntWritable.class);
Addendum: please start class names with a capital letter; lowercase class names like employee and jobrunner break the Java naming convention.
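As a side note, once the serialization and job configuration are fixed, the reducer can compute the maximum salary instead of emitting a constant. A minimal sketch using the old-API classes from the question (not the author's exact fix, just an illustration):
public class maxsalreduce extends MapReduceBase implements Reducer<Text, employee, Text, IntWritable> {
@Override
public void reduce(Text key, Iterator<employee> values,
OutputCollector<Text, IntWritable> collector, Reporter reporter) throws IOException {
int maxSalary = Integer.MIN_VALUE;
// track the highest salary seen across all employees for this key
while (values.hasNext()) {
maxSalary = Math.max(maxSalary, values.next().getEmployeeSalary());
}
collector.collect(key, new IntWritable(maxSalary));
}
}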
Related
import java.io.*;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
public class DBInputWritable implements Writable, DBWritable
{
String symbol;
String date;
double open;
double high;
double low;
double close;
int volume;
double adjClose;
//private final static SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
public void readFields(DataInput in) throws IOException
{
symbol=in.readLine();
date=in.readLine();
open=in.readDouble();
high=in.readDouble();
low=in.readDouble();
close=in.readDouble();
volume=in.readInt();
adjClose=in.readDouble();
}
public void readFields(ResultSet rs) throws SQLException
{
symbol = rs.getString(2);
date = rs.getString(3);
open = rs.getDouble(4);
high = rs.getDouble(5);
low = rs.getDouble(6);
close = rs.getDouble(7);
volume = rs.getInt(8);
adjClose = rs.getDouble(9);
}
public void write(DataOutput out) throws IOException
{
}
public void write( PreparedStatement ps) throws SQLException
{
}
public String getSymbol()
{
return symbol;
}
public String getDate()
{
return date;
}
public double getOpen()
{
return open;
}
public double getHigh()
{
return high;
}
public double getLow()
{
return low;
}
public double getClose()
{
return close;
}
public int getVolume()
{
return volume;
}
public double getAdjClose()
{
return adjClose;
}
}
public class DBOutputWritable implements Writable, DBWritable
{
String symbol;
String date;
double open;
double high;
double low;
double close;
int volume;
double adjClose;
;
public DBOutputWritable(String symbol,String date,String open,String high,String low,String close,String volume,String adjClose)
{
this.symbol=symbol;
this.date=date;
this.open=Double.parseDouble(open);
this.high=Double.parseDouble(high);
this.low=Double.parseDouble(low);
this.close=Double.parseDouble(close);
this.volume=Integer.parseInt(volume);
this.adjClose=Double.parseDouble(adjClose);
}
public void readFields(DataInput in) throws IOException
{
}
public void readFields(ResultSet rs) throws SQLException
{
}
public void write(DataOutput out) throws IOException
{
out.writeChars(symbol);
out.writeChars(date);
out.writeDouble(open);
out.writeDouble(high);
out.writeDouble(low);
out.writeDouble(close);
out.writeInt(volume);
out.writeDouble(adjClose);
}
public void write(PreparedStatement ps) throws SQLException
{
ps.setString(1,symbol);
ps.setString(2,date);
ps.setDouble(3,open);
ps.setDouble(4,high);
ps.setDouble(5,low);
ps.setDouble(6,close);
ps.setInt(7,volume);
ps.setDouble(8,adjClose);
}
}
public class Map extends Mapper<LongWritable,DBInputWritable,Text,Text>
{
public void map(LongWritable key, DBInputWritable value, Context ctx)
{
try
{
Text set;
set= new Text(value.getDate());
String line = value.getSymbol()+","+value.getDate()+","+value.getOpen()+","+value.getHigh()+","+value.getLow()+","+value.getClose()+","+value.getVolume()+","+value.getAdjClose();
ctx.write(set,new Text(line));
}
catch(IOException e)
{
e.printStackTrace();
}
catch(InterruptedException e)
{
e.printStackTrace();
}
}
}
public class Reduce extends Reducer<Text, Text, DBOutputWritable, NullWritable>
{
public void reduce(Text key, Text value, Context ctx)
{
try
{
String []line= value.toString().split(",");
String sym=line[0];
String dt=line[1];
String opn=line[2];
String hgh=line[3];
String lw=line[4];
String cls=line[5];
String vlm=line[6];
String adcls=line[7];
ctx.write(new DBOutputWritable(sym,dt,opn,hgh,lw,cls,vlm,adcls),NullWritable.get());
}
catch(IOException e)
{
e.printStackTrace();
}
catch(InterruptedException e)
{
e.printStackTrace();
}
}
}
public class Main
{
public static void main(String [] args) throws Exception
{
Configuration conf = new Configuration();
DBConfiguration.configureDB(conf,
"com.mysql.jdbc.Driver", //Driver Class
"jdbc:mysql://192.168.198.128:3306/testDb", //DB URL
"sqoopuser", //USERNAME
"passphrase"); //PASSWORD
Job job = new Job(conf);
job.setJarByClass(Main.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(DBOutputWritable.class);
job.setOutputValueClass(NullWritable.class);
job.setInputFormatClass(DBInputFormat.class);
job.setOutputFormatClass(DBOutputFormat.class);
DBInputFormat.setInput(
job,
DBInputWritable.class,
"aapldata", //input table name
null,
null,
new String[] {"stock","symbol", "date" ,"open", "high", "low", "close", "volume", "adjClose"}
//Table Columns
);
DBOutputFormat.setOutput(
job,
"aapldatanew", //Output Table Name
new String[] {"symbol", "date" ,"open", "high", "low", "close", "volume", "adjClose"}
//Table Columns
);
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
I think the code is picture perfect, but I still encounter the error below:
14/11/26 22:09:47 INFO mapred.JobClient: map 100% reduce 0%
14/11/26 22:09:55 INFO mapred.JobClient: map 100% reduce 33%
14/11/26 22:09:58 INFO mapred.JobClient: Task Id : attempt_201411262208_0001_r_000000_2, Status : FAILED
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.mapreduce.lib.db.DBWritable
at org.apache.hadoop.mapreduce.lib.db.DBOutputFormat$DBRecordWriter.write(DBOutputFormat.java:66)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:586)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:156)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
I need your valuable insights.
In your map class, take the input value as Text instead of DBInputWritable:
public class Map extends Mapper<LongWritable, Text, Text, Text> {
public void map(LongWritable key, Text value, Context ctx)
I can find 2 problems:
The mapper's output key/value classes do not match the job configuration. Please check it.
Either correct the mapper or the job configuration, whichever reflects your intent. For the second problem I assume the mapper's key/value pair is the one you actually want.
You didn't override the reduce method!
As per your job configuration, the signature should be:
public void reduce(Text key, Iterable<Text> values, Context context){
//...your code
}
Explanation for your exception
Since you didn't override Reducer's reduce, the reducer falls back to the default implementation (the so-called identity reducer). Source code:
protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
) throws IOException, InterruptedException {
for(VALUEIN value: values) {
context.write((KEYOUT) key, (VALUEOUT) value);
}
}
As the source code shows, it simply iterates over the values and writes them out using the configured output key/value classes. But in your case the intermediate (Text, Text) pairs do not match the (DBOutputWritable, NullWritable) pairs that DBOutputFormat expects, hence the ClassCastException.
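For illustration, a reduce override matching this job's configuration might look like the sketch below; it parses each Text value produced by the mapper and writes a DBOutputWritable key (the comma-separated field order is assumed from the mapper above):
@Override
public void reduce(Text key, Iterable<Text> values, Context ctx)
throws IOException, InterruptedException {
for (Text value : values) {
// fields were joined with commas in the mapper
String[] line = value.toString().split(",");
ctx.write(new DBOutputWritable(line[0], line[1], line[2], line[3],
line[4], line[5], line[6], line[7]),
NullWritable.get());
}
}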
HTH !
I am facing a NullPointerException with the code below. It would be great if someone could review it and help me with the program.
The mapper runs fine, but I get an NPE when I try to split the value from the iterator. Please help me figure out my mistake. I have attached the mapper output below.
TopperMain.java
package TopperPackage;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class TopperMain {
//hadoop jar worcount.jar ars[0] args[1]
public static void main(String[] args) throws Exception {
Job myhadoopJob = new Job();
myhadoopJob.setJarByClass(TopperMain.class);
myhadoopJob.setJobName("Finding topper based on subject");
FileInputFormat.addInputPath(myhadoopJob, new Path(args[0]));
FileOutputFormat.setOutputPath(myhadoopJob, new Path(args[1]));
myhadoopJob.setInputFormatClass(TextInputFormat.class);
myhadoopJob.setOutputFormatClass(TextOutputFormat.class);
myhadoopJob.setMapperClass(TopperMapper.class);
myhadoopJob.setReducerClass(TopperReduce.class);
myhadoopJob.setMapOutputKeyClass(Text.class);
myhadoopJob.setMapOutputValueClass(Text.class);
myhadoopJob.setOutputKeyClass(Text.class);
myhadoopJob.setOutputValueClass(Text.class);
System.exit(myhadoopJob.waitForCompletion(true) ? 0 : 1);
}
}
TopperMapper.java
package TopperPackage;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
/*Surender,87,60,50,50,80
Raj,80,70,80,85,60
Anten,81,60,50,70,100
Dinesh,60,90,80,80,70
Priya,80,85,91,60,75
*/
public class TopperMapper extends Mapper<LongWritable, Text, Text, Text>
{
String temp,temp2;
protected void map(LongWritable key, Text value,Context context)
throws IOException, InterruptedException {
String record = value.toString();
String[] parts = record.split(",");
temp=parts[0];
temp2=temp+ "\t" + parts[1];
context.write(new Text("Tamil"),new Text(temp2));
temp2=temp+ "\t" + parts[2];
context.write(new Text("English"),new Text(temp2));
temp2=temp+ "\t" + parts[3];
context.write(new Text("Maths"),new Text(temp2));
temp2=temp+ "\t" + parts[4];
context.write(new Text("Science"),new Text(temp2));
temp2=temp+ "\t" + parts[5];
context.write(new Text("SocialScrience"),new Text(temp2));
}
}
TopperReduce.java
package TopperPackage;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class TopperReduce extends Reducer<Text, Text, Text, Text> {
int temp;
private String[] names;
private int[] marks;
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
String top = "";
int count =0,topmark;
marks = null;
String befsplit;
String[] parts=null;
names = null;
for (Text t : values)
{
befsplit= t.toString();
parts = befsplit.split("\t");
names[count]=parts[0];
marks[count]=Integer.parseInt(parts[1]);
count = count+1;
}
topmark=calcTopper(marks);
top=names[topmark]+ "\t"+marks[topmark] ;
context.write(new Text(key), new Text(top));
}
public int calcTopper(int[] marks)
{
int count=marks.length;
temp=((marks[1]));
int i=0;
for (i=1;i<=(count-2);i++)
{
if(temp < marks[i+1])
{
temp = marks[i+1];
}
}
return i;
}
}
The error is:
cloudera#cloudera-vm:~/Jarfiles$ hadoop jar TopperMain.jar /user/cloudera/inputfiles/topper/topperinput.txt /user/cloudera/outputfiles/topper/
14/08/24 23:17:07 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/24 23:17:08 INFO input.FileInputFormat: Total input paths to process : 1
14/08/24 23:17:09 INFO mapred.JobClient: Running job: job_201408241907_0012
14/08/24 23:17:10 INFO mapred.JobClient: map 0% reduce 0%
14/08/24 23:17:49 INFO mapred.JobClient: map 100% reduce 0%
14/08/24 23:18:03 INFO mapred.JobClient: Task Id : attempt_201408241907_0012_r_000000_0, Status : FAILED
java.lang.NullPointerException
at TopperPackage.TopperReduce.reduce(TopperReduce.java:25)
at TopperPackage.TopperReduce.reduce(TopperReduce.java:1)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:571)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
attempt_201408241907_0012_r_000000_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
attempt_201408241907_0012_r_000000_0: log4j:WARN Please initialize the log4j system properly.
14/08/24 23:18:22 INFO mapred.JobClient: Task Id : attempt_201408241907_0012_r_000000_1, Status : FAILED
java.lang.NullPointerException
at TopperPackage.TopperReduce.reduce(TopperReduce.java:25)
at TopperPackage.TopperReduce.reduce(TopperReduce.java:1)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:571)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
attempt_201408241907_0012_r_000000_1: log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
attempt_201408241907_0012_r_000000_1: log4j:WARN Please initialize the log4j system properly.
I am getting the expected output from the mapper, but the reducer throws an error when splitting the output and storing it in a variable.
The mapper output is:
Tamil Surender 87
English Surender 60
Maths Surender 50
Science Surender 50
SocialScrience Surender 80
Tamil Raj 80
English Raj 70
Maths Raj 80
Science Raj 85
SocialScrience Raj 60
Tamil Anten 81
English Anten 60
Maths Anten 50
Science Anten 70
SocialScrience Anten 100
Tamil Dinesh 60
English Dinesh 90
Maths Dinesh 80
Science Dinesh 80
SocialScrience Dinesh 70
Tamil Priya 80
English Priya 85
Maths Priya 91
Science Priya 60
SocialScrience Priya 75
Any advice to point out my mistake is appreciated.
The error is because you initialize the marks and names arrays to null and never allocate them. Please use the reducer class below.
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class TopperReduce extends Reducer<Text, Text, Text, Text> {
int temp;
private String[] names = new String[10];
private int[] marks = new int[10];
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
String top = "";
int count = 0, topmark;
String befsplit;
String[] parts = null;
for (Text t : values) {
befsplit = t.toString();
parts = befsplit.split("\t");
names[count] = parts[0];
marks[count] = Integer.parseInt(parts[1]);
count++;
}
topmark = calcTopper(marks);
top = names[topmark] + "\t" + marks[topmark];
context.write(new Text(key), new Text(top));
}
public int calcTopper(int[] marks) {
int count = marks.length;
int highestMark = Integer.MIN_VALUE;
int highestMarkIndex = 0;
for (int i = 0; i < count; i++) {
// remember both the highest mark and its index
if (marks[i] > highestMark) {
highestMark = marks[i];
highestMarkIndex = i;
}
}
return highestMarkIndex;
}
}
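One design note: the fixed-size arrays above assume at most 10 students per subject. If that assumption may not hold, collecting into lists (the ArrayList import is already there) avoids the limit. A sketch of the accumulation loop under that approach:
// grow dynamically instead of assuming a maximum number of entries
List<String> nameList = new ArrayList<String>();
List<Integer> markList = new ArrayList<Integer>();
for (Text t : values) {
String[] parts = t.toString().split("\t");
nameList.add(parts[0]);
markList.add(Integer.parseInt(parts[1]));
}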
You are referring to array variables (names and marks) that are still null, which is why you get this error.
Change your code as shown below and it should work:
public class TopperReduce extends Reducer<Text, Text, Text, Text> {
int temp;
private String[] names=new String[20];
private int[] marks= new int[20];
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
String top = "";
int count =0,topmark;
for (Text t : values)
{
String befsplit= t.toString();
String[] parts = befsplit.split("\t");
names[count]=parts[0];
marks[count]=Integer.parseInt(parts[1]);
count = count+1;
}
topmark=calcTopper(marks);
top=names[topmark]+ "\t"+marks[topmark] ;
context.write(new Text(key), new Text(top));
}
I am just a beginner in Hadoop. I am getting a NullPointerException while performing a secondary sort.
This is my mapper class:
public void map(LongWritable key, Text value,
OutputCollector<Text, Employee> outputCollector, Reporter reporter)
throws IOException {
// TODO Auto-generated method stub
String employeeId = value.toString().split(",")[0];
String employeeName= value.toString().split(",")[1];
String employeeDept= value.toString().split(",")[2];
String employeejoinDate= value.toString().split(",")[3];
String employeSalary= value.toString().split(",")[4];
//System.out.println(employeSalary);
Employee employee=new Employee(Integer.parseInt(employeeId),employeeName,employeeDept,employeejoinDate,Integer.parseInt(employeSalary));
outputCollector.collect(new Text(employeeName),employee);
}
This is my reducer
public void reduce(Text arg0, Iterator<Employee> arg1,
OutputCollector<NullWritable,IntWritable> arg2, Reporter arg3)
throws IOException {
// TODO Auto-generated method stub
System.out.println("inside reducer");
while(arg1.hasNext()){
arg2.collect(NullWritable.get(),new IntWritable(arg1.next().getEmployeeSalary()));
}
This is my Employee class:
public class Employee implements WritableComparable<Employee>{
private int employeeId;
private String employeeName;
private String employeeDept;
private String employeeJoinDt;
private int employeeSalary;
public Employee(int employeeId,String employeeName,String employeeDept,String employeeJoinDt,int employeeSalary){
this.employeeId=employeeId;
this.employeeName=employeeName;
this.employeeDept=employeeDept;
this.employeeJoinDt=employeeJoinDt;
this.employeeSalary=employeeSalary;
}
public int getEmployeeId() {
return employeeId;
}
public void setEmployeeId(int employeeId) {
this.employeeId = employeeId;
}
public String getEmployeeName() {
return employeeName;
}
public void setEmployeeName(String employeeName) {
this.employeeName = employeeName;
}
public String getEmployeeDept() {
return employeeDept;
}
public void setEmployeeDept(String employeeDept) {
this.employeeDept = employeeDept;
}
public String getEmployeeJoinDt() {
return employeeJoinDt;
}
public void setEmployeeJoinDt(String employeeJoinDt) {
this.employeeJoinDt = employeeJoinDt;
}
public int getEmployeeSalary() {
return employeeSalary;
}
public void setEmployeeSalary(int employeeSalary) {
this.employeeSalary = employeeSalary;
}
@Override
public void readFields(DataInput input) throws IOException {
// TODO Auto-generated method stubt
this.employeeId=input.readInt();
this.employeeName=input.readUTF();
this.employeeDept=input.readUTF();
this.employeeJoinDt=input.readUTF();
this.employeeSalary=input.readInt();
}
@Override
public void write(DataOutput output) throws IOException {
// TODO Auto-generated method stub
output.writeInt(this.employeeId);
output.writeUTF(this.employeeName);
output.writeUTF(this.employeeDept);
output.writeUTF(this.employeeJoinDt);
output.writeInt(this.employeeSalary);
}
public int compareTo(Employee employee) {
// TODO Auto-generated method stub
if(this.employeeSalary>employee.getEmployeeSalary())
return 1;
else if(this.employeeSalary<employee.getEmployeeSalary())
return -1;
else
return 0;
}
}
This is my sort comparator class:
public class SecondarySortComparator extends WritableComparator {
public SecondarySortComparator(){
super(Employee.class);
System.out.println("sort");
}
@Override
public int compare(WritableComparable a, WritableComparable b) {
// TODO Auto-generated method stub
Employee employee1 = (Employee)a;
Employee employee2 = (Employee)b;
int i = employee1.getEmployeeSalary()>employee2.getEmployeeSalary()?1:-1;
return i;
}
This is my group comparator class:
public class SecondarySortGroupingComparator extends WritableComparator{
public SecondarySortGroupingComparator(){
super(Employee.class,true);
System.out.println("group");
}
@Override
public int compare(WritableComparable a, WritableComparable b) {
// TODO Auto-generated method stub
Employee employee1 = (Employee)a;
Employee employee2 = (Employee)b;
return employee1.getEmployeeName().compareTo(employee2.getEmployeeName());
}
}
This is the error I am getting:
13/09/01 19:13:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/09/01 19:13:47 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/09/01 19:13:47 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/09/01 19:13:47 INFO mapred.FileInputFormat: Total input paths to process : 1
13/09/01 19:13:47 INFO mapred.JobClient: Running job: job_local_0001
13/09/01 19:13:47 INFO util.ProcessTree: setsid exited with exit code 0
13/09/01 19:13:47 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin#1b3f8f6
13/09/01 19:13:47 INFO mapred.MapTask: numReduceTasks: 1
13/09/01 19:13:47 INFO mapred.MapTask: io.sort.mb = 100
13/09/01 19:13:48 INFO mapred.JobClient: map 0% reduce 0%
13/09/01 19:13:48 INFO mapred.MapTask: data buffer = 79691776/99614720
sort13/09/01 19:13:48 INFO mapred.MapTask: record buffer = 262144/327680
1
1
1
1
13/09/01 19:13:49 INFO mapred.MapTask: Starting flush of map output
13/09/01 19:13:49 WARN mapred.LocalJobRunner: job_local_0001
java.lang.NullPointerException
at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:96)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1111)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:70)
at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1399)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
13/09/01 19:13:49 INFO mapred.JobClient: Job complete: job_local_0001
13/09/01 19:13:49 INFO mapred.JobClient: Counters: 0
13/09/01 19:13:49 INFO mapred.JobClient: Job Failed: NA
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
at secondarysort.JobRunner.main(JobRunner.java:31)
Any suggestions on how to solve this problem?
Thanks in advance.
This line seems to cause the problem:
outputCollector.collect(new Text(employeeName), employee);
You are emitting the employee object (of type Employee) as the value, not the key, while both SecondarySortComparator and SecondarySortGroupingComparator operate on keys, not values.
Hence the main problem is that you are passing a Text as the key, and that is what causes the issue. Consider passing the Employee object as the key instead of Text so that the two comparators actually take effect.
You might also want to put a default constructor in your Employee class -
public Employee() { }
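Putting those suggestions together, the map output and the job wiring might look roughly like the sketch below (old API, matching the classes above; the choice of NullWritable as the map output value is an assumption, and the reducer's declared input types would have to change accordingly):
// in the mapper: emit Employee as the key so the comparators operate on it
outputCollector.collect(employee, NullWritable.get());
// in the job runner, assuming a JobConf named jobConf:
jobConf.setMapOutputKeyClass(Employee.class);
jobConf.setMapOutputValueClass(NullWritable.class);
jobConf.setOutputKeyComparatorClass(SecondarySortComparator.class);
jobConf.setOutputValueGroupingComparator(SecondarySortGroupingComparator.class);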
Code
I tried to run the DataJoin example from the Hadoop in Action book.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
// import org.apache.commons.logging.Log;
// import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.contrib.utils.join.*;
public class MultiDataSetJoinMR extends Configured implements Tool
{
public static class MapClass extends DataJoinMapperBase
{
protected Text generateInputTag(String inputFile)
{
String datasource = inputFile.split("-")[0];
return new Text(datasource);
}
protected Text generateGroupKey(TaggedMapOutput aRecord)
{
String line = ((Text) aRecord.getData()).toString();
String[] tokens = line.split(",");
String groupKey = tokens[0];
return new Text(groupKey);
}
protected TaggedMapOutput generateTaggedMapOutput(Object value)
{
TaggedWritable retv = new TaggedWritable((Text) value);
retv.setTag(this.inputTag);
return retv;
}
}
public static class Reduce extends DataJoinReducerBase
{
protected TaggedMapOutput combine(Object[] tags, Object[] values)
{
if (tags.length < 2) return null;
String joinedStr = "";
for (int i=0; i<values.length; i++)
{
if (i > 0) joinedStr += ",";
TaggedWritable tw = (TaggedWritable) values[i];
String line = ((Text) tw.getData()).toString();
String[] tokens = line.split(",", 2);
joinedStr += tokens[1];
}
TaggedWritable retv = new TaggedWritable(new Text(joinedStr));
retv.setTag((Text) tags[0]);
return retv;
}
}
public static class TaggedWritable extends TaggedMapOutput
{
private Writable data;
public TaggedWritable() {
this.tag = new Text();
}
public TaggedWritable(Writable data)
{
this.tag = new Text("");
this.data = data;
}
public Writable getData()
{
return data;
}
public void write(DataOutput out) throws IOException
{
this.tag.write(out);
this.data.write(out);
}
public void readFields(DataInput in) throws IOException
{
this.tag.readFields(in);
this.data.readFields(in);
}
}
public int run(String[] args) throws Exception
{
Configuration conf = getConf();
JobConf job = new JobConf(conf, MultiDataSetJoinMR.class);
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2)
{
System.err.println("Usage: wordcount <in> <out>");
System.exit(2);
}
Path in = new Path(args[0]);
Path out = new Path(args[1]);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);
job.setJobName("DataJoin");
job.setMapperClass(MapClass.class);
job.setReducerClass(Reduce.class);
job.setInputFormat(TextInputFormat.class);
job.setOutputFormat(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(TaggedWritable.class);
job.set("mapred.textoutputformat.separator", ",");
JobClient.runJob(job);
return 0;
}
public static void main(String[] args) throws Exception
{
int res = ToolRunner.run(new Configuration(),
new MultiDataSetJoinMR(),
args);
System.exit(res);
}
}
Running Command
./hadoop jar MultiDataSetJoin.jar /home/project/dataset /home/project/out
Error
But I am facing the following issues.
15 Mar, 2013 4:29:45 PM org.apache.hadoop.metrics.jvm.JvmMetrics init
INFO: Initializing JVM Metrics with processName=JobTracker, sessionId=
15 Mar, 2013 4:29:45 PM org.apache.hadoop.mapred.JobClient configureCommandLineOptions
WARNING: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15 Mar, 2013 4:29:45 PM org.apache.hadoop.mapred.FileInputFormat listStatus
INFO: Total input paths to process : 2
15 Mar, 2013 4:29:45 PM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: Running job: job_local_0001
15 Mar, 2013 4:29:45 PM org.apache.hadoop.mapred.FileInputFormat listStatus
INFO: Total input paths to process : 2
15 Mar, 2013 4:29:45 PM org.apache.hadoop.mapred.MapTask runOldMapper
INFO: numReduceTasks: 1
15 Mar, 2013 4:29:45 PM org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init>
INFO: io.sort.mb = 100
15 Mar, 2013 4:29:45 PM org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init>
INFO: data buffer = 79691776/99614720
15 Mar, 2013 4:29:45 PM org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init>
INFO: record buffer = 262144/327680
15 Mar, 2013 4:29:45 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
WARNING: job_local_0001
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 5 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
... 10 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 13 more
Caused by: java.lang.NullPointerException
at MultiDataSetJoinMR$MapClass.generateInputTag(MultiDataSetJoinMR.java:31)
at org.apache.hadoop.contrib.utils.join.DataJoinMapperBase.configure(DataJoinMapperBase.java:60)
... 18 more
null15 Mar, 2013 4:29:46 PM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: map 0% reduce 0%
15 Mar, 2013 4:29:46 PM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: Job complete: job_local_0001
15 Mar, 2013 4:29:46 PM org.apache.hadoop.mapred.Counters log
INFO: Counters: 0
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
at MultiDataSetJoinMR.run(MultiDataSetJoinMR.java:123)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at MultiDataSetJoinMR.main(MultiDataSetJoinMR.java:128)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
From the log trace I can see that the inputFile variable gets a null value in the method below:
protected Text generateInputTag(String inputFile)
{
String datasource = inputFile.split("-")[0];
return new Text(datasource);
}
I don't know where it gets called from or how to fix it. Can anyone help me, please?
I am trying to write a simple MapReduce program to find the largest prime number using the new API (0.20.2). This is how my map and reduce classes look…
public class PrimeNumberMap extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
public void map (LongWritable key, Text Kvalue,Context context) throws IOException,InterruptedException
{
Integer value = new Integer(Kvalue.toString());
if(isNumberPrime(value))
{
context.write(new IntWritable(value), new IntWritable(new Integer(key.toString())));
}
}
boolean isNumberPrime(Integer number)
{
if (number == 1) return false;
if (number == 2) return true;
for (int counter =2; counter<(number/2);counter++)
{
if(number%counter ==0 )
return false;
}
return true;
}
}
public class PrimeNumberReduce extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
public void reduce ( IntWritable primeNo, Iterable<IntWritable> Values,Context context) throws IOException ,InterruptedException
{
int maxValue = Integer.MIN_VALUE;
for (IntWritable value : Values)
{
maxValue= Math.max(maxValue, value.get());
}
//output.collect(primeNo, new IntWritable(maxValue));
context.write(primeNo, new IntWritable(maxValue)); }
}
public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException{
if (args.length ==0)
{
System.err.println(" Usage:\n\tPrimenumber <input Directory> <output Directory>");
System.exit(-1);
}
Job job = new Job();
job.setJarByClass(Main.class);
job.setJobName("Prime");
// Creating job configuration object
FileInputFormat.addInputPath(job, new Path (args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(IntWritable.class);
String star ="*********************************************";
System.out.println(star+"\n Prime number computer \n"+star);
System.out.println(" Application started ... keeping fingers crossed :/ ");
System.exit(job.waitForCompletion(true)?0:1);
}
}
I am still getting an error about a key type mismatch from the map:
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1034)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:595)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:668)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:334)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1109)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
2012-06-13 14:27:21,116 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
Can someone please suggest what is wrong? I have tried everything I can think of.
You've not configured the mapper or reducer classes in your main block, so the default mapper is being used (known as the identity mapper): each pair it receives as input is written straight to the output, hence the LongWritable output key:
job.setMapperClass(PrimeNumberMap.class);
job.setReducerClass(PrimeNumberReduce.class);
The mapper should be defined as below,
public class PrimeNumberMap extends Mapper<IntWritable, Text, IntWritable, IntWritable> {
instead of
public class PrimeNumberMap extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
As mentioned in the comment before, you should have the mapper and reducer defined:
job.setMapperClass(PrimeNumberMap.class);
job.setReducerClass(PrimeNumberReduce.class);
Please refer to Hadoop: The Definitive Guide, 3rd edition, Chapter 2, page 24.
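For completeness, a driver with the mapper and reducer wired in might look like the sketch below; it is just the question's own main body plus the two missing lines, assuming the enclosing class is named Main as in the question:
Job job = new Job();
job.setJarByClass(Main.class);
job.setJobName("Prime");
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
// wire in the user-defined classes so the identity mapper/reducer are not used
job.setMapperClass(PrimeNumberMap.class);
job.setReducerClass(PrimeNumberReduce.class);
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(IntWritable.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);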
I am new to Hadoop MapReduce programming.
When mapping, I use IntWritable, but in the reducer I work with the values as IntWritable and convert the result to a double before writing it out as a DoubleWritable in context.write.
It fails when running.
My approach for handling the conversion from int in the map to double in the reduce is:
Mapper(LongWritable,Text,Text,DoubleWritable)
Reducer(Text,DoubleWritable,Text,DoubleWritable)
job.setOutputValueClass(DoubleWritable.class)
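For reference, a mapper consistent with the declared Mapper(LongWritable, Text, Text, DoubleWritable) signature would have to emit DoubleWritable values itself. A minimal sketch (the class name and the comma-separated input layout are hypothetical):
public class MyMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
// hypothetical record layout: "<textKey>,<intValue>"
String[] fields = value.toString().split(",");
// emit the numeric field as a DoubleWritable so it matches the declared output value type
context.write(new Text(fields[0]), new DoubleWritable(Double.parseDouble(fields[1])));
}
}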