I am just a beginner in Hadoop. I am getting a NullPointerException while performing a secondary sort.
This is my mapper class
public void map(LongWritable key, Text value,
OutputCollector<Text, Employee> outputCollector, Reporter reporter)
throws IOException {
// TODO Auto-generated method stub
String employeeId = value.toString().split(",")[0];
String employeeName= value.toString().split(",")[1];
String employeeDept= value.toString().split(",")[2];
String employeejoinDate= value.toString().split(",")[3];
String employeSalary= value.toString().split(",")[4];
//System.out.println(employeSalary);
Employee employee=new Employee(Integer.parseInt(employeeId),employeeName,employeeDept,employeejoinDate,Integer.parseInt(employeSalary));
outputCollector.collect(new Text(employeeName),employee);
}
This is my reducer
public void reduce(Text arg0, Iterator<Employee> arg1,
OutputCollector<NullWritable,IntWritable> arg2, Reporter arg3)
throws IOException {
// TODO Auto-generated method stub
System.out.println("inside reducer");
while(arg1.hasNext()){
arg2.collect(NullWritable.get(),new IntWritable(arg1.next().getEmployeeSalary()));
}
This is my Employee class
public class Employee implements WritableComparable<Employee>{
private int employeeId;
private String employeeName;
private String employeeDept;
private String employeeJoinDt;
private int employeeSalary;
public Employee(int employeeId,String employeeName,String employeeDept,String employeeJoinDt,int employeeSalary){
this.employeeId=employeeId;
this.employeeName=employeeName;
this.employeeDept=employeeDept;
this.employeeJoinDt=employeeJoinDt;
this.employeeSalary=employeeSalary;
}
public int getEmployeeId() {
return employeeId;
}
public void setEmployeeId(int employeeId) {
this.employeeId = employeeId;
}
public String getEmployeeName() {
return employeeName;
}
public void setEmployeeName(String employeeName) {
this.employeeName = employeeName;
}
public String getEmployeeDept() {
return employeeDept;
}
public void setEmployeeDept(String employeeDept) {
this.employeeDept = employeeDept;
}
public String getEmployeeJoinDt() {
return employeeJoinDt;
}
public void setEmployeeJoinDt(String employeeJoinDt) {
this.employeeJoinDt = employeeJoinDt;
}
public int getEmployeeSalary() {
return employeeSalary;
}
public void setEmployeeSalary(int employeeSalary) {
this.employeeSalary = employeeSalary;
}
@Override
public void readFields(DataInput input) throws IOException {
// TODO Auto-generated method stubt
this.employeeId=input.readInt();
this.employeeName=input.readUTF();
this.employeeDept=input.readUTF();
this.employeeJoinDt=input.readUTF();
this.employeeSalary=input.readInt();
}
@Override
public void write(DataOutput output) throws IOException {
// TODO Auto-generated method stub
output.writeInt(this.employeeId);
output.writeUTF(this.employeeName);
output.writeUTF(this.employeeDept);
output.writeUTF(this.employeeJoinDt);
output.writeInt(this.employeeSalary);
}
public int compareTo(Employee employee) {
// TODO Auto-generated method stub
if(this.employeeSalary>employee.getEmployeeSalary())
return 1;
else if(this.employeeSalary<employee.getEmployeeSalary())
return -1;
else
return 0;
}
}
This is my sort comparator class
public class SecondarySortComparator extends WritableComparator {
public SecondarySortComparator(){
super(Employee.class);
System.out.println("sort");
}
@Override
public int compare(WritableComparable a, WritableComparable b) {
// TODO Auto-generated method stub
Employee employee1 = (Employee)a;
Employee employee2 = (Employee)b;
int i = employee1.getEmployeeSalary()>employee2.getEmployeeSalary()?1:-1;
return i;
}
This is my group comparator class
public class SecondarySortGroupingComparator extends WritableComparator{
public SecondarySortGroupingComparator(){
super(Employee.class,true);
System.out.println("group");
}
@Override
public int compare(WritableComparable a, WritableComparable b) {
// TODO Auto-generated method stub
Employee employee1 = (Employee)a;
Employee employee2 = (Employee)b;
return employee1.getEmployeeName().compareTo(employee2.getEmployeeName());
}
}
This is the error I am getting
13/09/01 19:13:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/09/01 19:13:47 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/09/01 19:13:47 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/09/01 19:13:47 INFO mapred.FileInputFormat: Total input paths to process : 1
13/09/01 19:13:47 INFO mapred.JobClient: Running job: job_local_0001
13/09/01 19:13:47 INFO util.ProcessTree: setsid exited with exit code 0
13/09/01 19:13:47 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1b3f8f6
13/09/01 19:13:47 INFO mapred.MapTask: numReduceTasks: 1
13/09/01 19:13:47 INFO mapred.MapTask: io.sort.mb = 100
13/09/01 19:13:48 INFO mapred.JobClient: map 0% reduce 0%
13/09/01 19:13:48 INFO mapred.MapTask: data buffer = 79691776/99614720
sort13/09/01 19:13:48 INFO mapred.MapTask: record buffer = 262144/327680
1
1
1
1
13/09/01 19:13:49 INFO mapred.MapTask: Starting flush of map output
13/09/01 19:13:49 WARN mapred.LocalJobRunner: job_local_0001
java.lang.NullPointerException
at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:96)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1111)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:70)
at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1399)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
13/09/01 19:13:49 INFO mapred.JobClient: Job complete: job_local_0001
13/09/01 19:13:49 INFO mapred.JobClient: Counters: 0
13/09/01 19:13:49 INFO mapred.JobClient: Job Failed: NA
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
at secondarysort.JobRunner.main(JobRunner.java:31)
Any suggestions on how to solve this problem?
Thanks in advance.
This line seems to cause the problem.
outputCollector.collect(new Text(employeeName), employee);
You are emitting the employee object (of type Employee) as a value, not as a key, and both SecondarySortComparator and SecondarySortGroupingComparator operate on your keys, not your values.
Hence, the main problem is that you are passing a Text as the key, and that is what causes the issue. Consider passing the Employee object as the key instead of Text so the two comparators actually have something to work on.
You will also want to add a default (no-argument) constructor to your Employee class, since Hadoop instantiates it reflectively during deserialization -
public Employee() { }
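For illustration, here is a minimal, hedged sketch of how the mapper could emit the Employee object as the key with NullWritable as the value. This is not your exact code; it also assumes the driver is updated with setMapOutputKeyClass(Employee.class), setOutputKeyComparatorClass(SecondarySortComparator.class) and setOutputValueGroupingComparator(SecondarySortGroupingComparator.class).
// Sketch: Employee travels as the key, so the comparators (which only ever see keys) take effect
public void map(LongWritable key, Text value,
        OutputCollector<Employee, NullWritable> outputCollector, Reporter reporter)
        throws IOException {
    String[] fields = value.toString().split(",");
    Employee employee = new Employee(Integer.parseInt(fields[0]), fields[1],
            fields[2], fields[3], Integer.parseInt(fields[4]));
    outputCollector.collect(employee, NullWritable.get());
}
The no-argument constructor matters because the framework creates an Employee reflectively and then calls readFields() when deserializing the spilled map output.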
Related
Firstly, I am a newbie at Hadoop MapReduce. My reducer does not run, yet the job reports that it completed successfully. Below is my console output:
INFO mapreduce.Job: Running job: job_1418240815217_0015
INFO mapreduce.Job: Job job_1418240815217_0015 running in uber mode : false
INFO mapreduce.Job: map 0% reduce 0%
INFO mapreduce.Job: map 100% reduce 0%
INFO mapreduce.Job: Job job_1418240815217_0015 completed successfully
INFO mapreduce.Job: Counters: 30
The main class is:
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
@SuppressWarnings("deprecation")
Job job = new Job(conf,"NPhase2");
job.setJarByClass(NPhase2.class);
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(NPhase2Value.class);
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(MapClass.class);
job.setReducerClass(Reduce.class);
int numberOfPartition = 0;
List<String> other_args = new ArrayList<String>();
for(int i = 0; i < args.length; ++i)
{
try {
if ("-m".equals(args[i])) {
//conf.setNumMapTasks(Integer.parseInt(args[++i]));
++i;
} else if ("-r".equals(args[i])) {
job.setNumReduceTasks(Integer.parseInt(args[++i]));
} else if ("-k".equals(args[i])) {
int knn = Integer.parseInt(args[++i]);
conf.setInt("knn", knn);
System.out.println(knn);
} else {
other_args.add(args[i]);
}
job.setNumReduceTasks(numberOfPartition * numberOfPartition);
//conf.setNumReduceTasks(1);
} catch (NumberFormatException except) {
System.out.println("ERROR: Integer expected instead of " + args[i]);
} catch (ArrayIndexOutOfBoundsException except) {
System.out.println("ERROR: Required parameter missing from " + args[i-1]);
}
}
// Make sure there are exactly 2 parameters left.
if (other_args.size() != 2) {
System.out.println("ERROR: Wrong number of parameters: " +
other_args.size() + " instead of 2.");
}
FileInputFormat.setInputPaths(job, other_args.get(0));
FileOutputFormat.setOutputPath(job, new Path(other_args.get(1)));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
My mapper is:
public static class MapClass extends Mapper
{
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
String line = value.toString();
String[] parts = line.split("\\s+");
// key format <rid1>
IntWritable mapKey = new IntWritable(Integer.valueOf(parts[0]));
// value format <rid2, dist>
NPhase2Value np2v = new NPhase2Value(Integer.valueOf(parts[1]), Float.valueOf(parts[2]));
context.write(mapKey, np2v);
}
}
My reducer class is:
public static class Reduce extends Reducer<IntWritable, NPhase2Value, NullWritable, Text>
{
int numberOfPartition;
int knn;
class Record
{
public int id2;
public float dist;
Record(int id2, float dist)
{
this.id2 = id2;
this.dist = dist;
}
public String toString()
{
return Integer.toString(id2) + " " + Float.toString(dist);
}
}
class RecordComparator implements Comparator<Record>
{
public int compare(Record o1, Record o2)
{
int ret = 0;
float dist = o1.dist - o2.dist;
if (Math.abs(dist) < 1E-6)
ret = o1.id2 - o2.id2;
else if (dist > 0)
ret = 1;
else
ret = -1;
return -ret;
}
}
public void setup(Context context)
{
Configuration conf = new Configuration();
conf = context.getConfiguration();
numberOfPartition = conf.getInt("numberOfPartition", 2);
knn = conf.getInt("knn", 3);
}
public void reduce(IntWritable key, Iterator<NPhase2Value> values, Context context) throws IOException, InterruptedException
{
//initialize the pq
RecordComparator rc = new RecordComparator();
PriorityQueue<Record> pq = new PriorityQueue<Record>(knn + 1, rc);
// For each record we have a reduce task
// value format <rid1, rid2, dist>
while (values.hasNext())
{
NPhase2Value np2v = values.next();
int id2 = np2v.getFirst().get();
float dist = np2v.getSecond().get();
Record record = new Record(id2, dist);
pq.add(record);
if (pq.size() > knn)
pq.poll();
}
while(pq.size() > 0)
{
context.write(NullWritable.get(), new Text(key.toString() + " " + pq.poll().toString()));
//break; // only ouput the first record
}
} // reduce
}
This is my helper class:
public class NPhase2Value implements WritableComparable {
private IntWritable first;
private FloatWritable second;
public NPhase2Value() {
set(new IntWritable(), new FloatWritable());
}
public NPhase2Value(int first, float second) {
set(new IntWritable(first), new FloatWritable(second));
}
public void set(IntWritable first, FloatWritable second) {
this.first = first;
this.second = second;
}
public IntWritable getFirst() {
return first;
}
public FloatWritable getSecond() {
return second;
}
@Override
public void write(DataOutput out) throws IOException {
first.write(out);
second.write(out);
}
@Override
public void readFields(DataInput in) throws IOException {
first.readFields(in);
second.readFields(in);
}
@Override
public boolean equals(Object o) {
if (o instanceof NPhase2Value) {
NPhase2Value np2v = (NPhase2Value) o;
return first.equals(np2v.first) && second.equals(np2v.second);
}
return false;
}
@Override
public String toString() {
return first.toString() + " " + second.toString();
}
@Override
public int compareTo(NPhase2Value np2v) {
return 1;
}
}
The command line I use is:
hadoop jar knn.jar NPhase2 -m 1 -r 3 -k 4 phase1out phase2out
I am trying hard to figure out the error but am still not able to come up with a solution. Please help me with this, as I am running on a tight schedule.
That is because you have set the number of reduce tasks to 0. See this:
int numberOfPartition = 0;
//.......
job.setNumReduceTasks(numberOfPartition * numberOfPartition);
I don't see numberOfPartition being reset anywhere in your code. You should either set it where you parse the -r option, or remove the call to setNumReduceTasks shown above entirely, since you already set the reduce count while parsing -r.
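For clarity, here is a hedged sketch of the argument-parsing loop from your main() with that redundant call removed (everything else as in your question):
for (int i = 0; i < args.length; ++i) {
    try {
        if ("-m".equals(args[i])) {
            ++i; // map-task count ignored here
        } else if ("-r".equals(args[i])) {
            job.setNumReduceTasks(Integer.parseInt(args[++i]));
        } else if ("-k".equals(args[i])) {
            int knn = Integer.parseInt(args[++i]);
            conf.setInt("knn", knn);
        } else {
            other_args.add(args[i]);
        }
        // no extra job.setNumReduceTasks(numberOfPartition * numberOfPartition) here:
        // numberOfPartition is never updated, so that call would reset the count to 0
    } catch (NumberFormatException except) {
        System.out.println("ERROR: Integer expected instead of " + args[i]);
    } catch (ArrayIndexOutOfBoundsException except) {
        System.out.println("ERROR: Required parameter missing from " + args[i - 1]);
    }
}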
import java.io.*;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
public class DBInputWritable implements Writable, DBWritable
{
String symbol;
String date;
double open;
double high;
double low;
double close;
int volume;
double adjClose;
//private final static SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
public void readFields(DataInput in) throws IOException
{
symbol=in.readLine();
date=in.readLine();
open=in.readDouble();
high=in.readDouble();
low=in.readDouble();
close=in.readDouble();
volume=in.readInt();
adjClose=in.readDouble();
}
public void readFields(ResultSet rs) throws SQLException
{
symbol = rs.getString(2);
date = rs.getString(3);
open = rs.getDouble(4);
high = rs.getDouble(5);
low = rs.getDouble(6);
close = rs.getDouble(7);
volume = rs.getInt(8);
adjClose = rs.getDouble(9);
}
public void write(DataOutput out) throws IOException
{
}
public void write( PreparedStatement ps) throws SQLException
{
}
public String getSymbol()
{
return symbol;
}
public String getDate()
{
return date;
}
public double getOpen()
{
return open;
}
public double getHigh()
{
return high;
}
public double getLow()
{
return low;
}
public double getClose()
{
return close;
}
public int getVolume()
{
return volume;
}
public double getAdjClose()
{
return adjClose;
}
}
public class DBOutputWritable implements Writable, DBWritable
{
String symbol;
String date;
double open;
double high;
double low;
double close;
int volume;
double adjClose;
public DBOutputWritable(String symbol,String date,String open,String high,String low,String close,String volume,String adjClose)
{
this.symbol=symbol;
this.date=date;
this.open=Double.parseDouble(open);
this.high=Double.parseDouble(high);
this.low=Double.parseDouble(low);
this.close=Double.parseDouble(close);
this.volume=Integer.parseInt(volume);
this.adjClose=Double.parseDouble(adjClose);
}
public void readFields(DataInput in) throws IOException
{
}
public void readFields(ResultSet rs) throws SQLException
{
}
public void write(DataOutput out) throws IOException
{
out.writeChars(symbol);
out.writeChars(date);
out.writeDouble(open);
out.writeDouble(high);
out.writeDouble(low);
out.writeDouble(close);
out.writeInt(volume);
out.writeDouble(adjClose);
}
public void write(PreparedStatement ps) throws SQLException
{
ps.setString(1,symbol);
ps.setString(2,date);
ps.setDouble(3,open);
ps.setDouble(4,high);
ps.setDouble(5,low);
ps.setDouble(6,close);
ps.setInt(7,volume);
ps.setDouble(8,adjClose);
}
}
public class Map extends Mapper<LongWritable,DBInputWritable,Text,Text>
{
public void map(LongWritable key, DBInputWritable value, Context ctx)
{
try
{
Text set;
set= new Text(value.getDate());
String line = value.getSymbol()+","+value.getDate()+","+value.getOpen()+","+value.getHigh()+","+value.getLow()+","+value.getClose()+","+value.getVolume()+","+value.getAdjClose();
ctx.write(set,new Text(line));
}
catch(IOException e)
{
e.printStackTrace();
}
catch(InterruptedException e)
{
e.printStackTrace();
}
}
}
public class Reduce extends Reducer<Text, Text, DBOutputWritable, NullWritable>
{
public void reduce(Text key, Text value, Context ctx)
{
try
{
String []line= value.toString().split(",");
String sym=line[0];
String dt=line[1];
String opn=line[2];
String hgh=line[3];
String lw=line[4];
String cls=line[5];
String vlm=line[6];
String adcls=line[7];
ctx.write(new DBOutputWritable(sym,dt,opn,hgh,lw,cls,vlm,adcls),NullWritable.get());
}
catch(IOException e)
{
e.printStackTrace();
}
catch(InterruptedException e)
{
e.printStackTrace();
}
}
}
public class Main
{
public static void main(String [] args) throws Exception
{
Configuration conf = new Configuration();
DBConfiguration.configureDB(conf,
"com.mysql.jdbc.Driver", //Driver Class
"jdbc:mysql://192.168.198.128:3306/testDb", //DB URL
"sqoopuser", //USERNAME
"passphrase"); //PASSWORD
Job job = new Job(conf);
job.setJarByClass(Main.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(DBOutputWritable.class);
job.setOutputValueClass(NullWritable.class);
job.setInputFormatClass(DBInputFormat.class);
job.setOutputFormatClass(DBOutputFormat.class);
DBInputFormat.setInput(
job,
DBInputWritable.class,
"aapldata", //input table name
null,
null,
new String[] {"stock","symbol", "date" ,"open", "high", "low", "close", "volume", "adjClose"}
//Table Columns
);
DBOutputFormat.setOutput(
job,
"aapldatanew", //Output Table Name
new String[] {"symbol", "date" ,"open", "high", "low", "close", "volume", "adjClose"}
//Table Columns
);
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
I think the code is picture-perfect, but I still encounter the error below:
14/11/26 22:09:47 INFO mapred.JobClient: map 100% reduce 0%
14/11/26 22:09:55 INFO mapred.JobClient: map 100% reduce 33%
14/11/26 22:09:58 INFO mapred.JobClient: Task Id : attempt_201411262208_0001_r_000000_2, Status : FAILED
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.mapreduce.lib.db.DBWritable
at org.apache.hadoop.mapreduce.lib.db.DBOutputFormat$DBRecordWriter.write(DBOutputFormat.java:66)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:586)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:156)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
I need your valuable insights.
In your map class, get the input as Text instead of DBInputWritable:
public class Map extends Mapper {
public void map(LongWritable key,Text value, Context ctx)
I can find two problems:
The mapper output key/value classes do not match the job configuration; please check it and correct it as needed. I address the second problem below, assuming your mapper key/value pair is the intended choice.
You didn't override the reduce method!
As per your job configuration, the signature should be:
public void reduce(Text key, Iterable<Text> values, Context context){
//...your code
}
Explanation for your exception:
Since you didn't override the Reducer's reduce method, the reducer runs the default implementation (known as the identity reducer). Source code:
protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
) throws IOException, InterruptedException {
for(VALUEIN value: values) {
context.write((KEYOUT) key, (VALUEOUT) value);
}
}
As the source code shows, it simply iterates over the values and writes them out using the configured output key/value classes. But in your case the intermediate pairs (i.e. Text, Text) do not match the pairs expected by DBOutputFormat.
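To make that concrete, here is a hedged sketch of an overridden reduce() that matches the Reducer<Text, Text, DBOutputWritable, NullWritable> declaration from the question; it simply assumes every value carries the eight comma-separated fields the original code expects:
@Override
public void reduce(Text key, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
    for (Text value : values) {
        // same field layout the original reduce() assumed
        String[] line = value.toString().split(",");
        ctx.write(new DBOutputWritable(line[0], line[1], line[2], line[3],
                line[4], line[5], line[6], line[7]), NullWritable.get());
    }
}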
HTH !
I am new to Hadoop and I am trying to get the maximum salary of an employee. My data looks like
1231,"","","",4000
1232,"","","",5000
..................
..................
This is my mapper class; here I am trying to emit the full tuple.
public class maxmap extends MapReduceBase implements Mapper<LongWritable,Text,Text,employee> {
@Override
public void map(LongWritable key, Text value,
OutputCollector<Text,employee> outputCollector, Reporter reporter)
throws IOException {
// TODO Auto-generated method stub
//TupleWritable tupleWritable= new TupleWritable(new Writable[]{new Text(value.toString().split(",")[1]),
//new Text(value.toString().split(",")[0])
//});
String employeeId = value.toString().split(",")[0];
int count =1231;
employee employee=new employee(Integer.parseInt(employeeId), "employeeName", "StemployeeDept", "employeeJoinDt",1231);
//tupleWritable.write();
outputCollector.collect(new Text("max salry"),employee);
}
}
This is my Reducer class
public class maxsalreduce extends MapReduceBase implements Reducer<Text,employee,Text,IntWritable> {
@Override
public void reduce(Text key, Iterator<employee> values,
OutputCollector<Text, IntWritable> collector, Reporter reporter)
throws IOException {
// TODO Auto-generated method stub
System.out.println("in reducer");
while(values.hasNext()){
employee employee=values.next();
System.out.println("employee id"+employee.employeeId);
}
collector.collect(new Text(""), new IntWritable(1));
}
}
This is my employee class
public class employee implements Writable{
public int employeeId;
private String employeeName;
private String employeeDept;
private String employeeJoinDt;
public employee(int employeeId,String employeeName,String employeeDept,String employeeJoinDt,int employeeSalary){
this.employeeId=employeeId;
System.out.println(this.employeeId);
this.employeeName=employeeName;
this.employeeDept=employeeDept;
this.employeeJoinDt=employeeJoinDt;
this.employeeSalary=employeeSalary;
}
public employee() {
// TODO Auto-generated constructor stub
}
public int getEmployeeId() {
return employeeId;
}
public void setEmployeeId(int employeeId) {
this.employeeId = employeeId;
}
public String getEmployeeName() {
return employeeName;
}
public void setEmployeeName(String employeeName) {
this.employeeName = employeeName;
}
public String getEmployeeDept() {
return employeeDept;
}
public void setEmployeeDept(String employeeDept) {
this.employeeDept = employeeDept;
}
public String getEmployeeJoinDt() {
return employeeJoinDt;
}
public void setEmployeeJoinDt(String employeeJoinDt) {
this.employeeJoinDt = employeeJoinDt;
}
public int getEmployeeSalary() {
return employeeSalary;
}
public void setEmployeeSalary(int employeeSalary) {
this.employeeSalary = employeeSalary;
}
private int employeeSalary;
@Override
public void readFields(DataInput input) throws IOException {
// TODO Auto-generated method stubt
System.out.println("employee id is"+input.readInt());
//this.employeeId=input.readInt();
//this.employeeName=input.readUTF();
//this.employeeDept=input.readUTF();
//this.employeeJoinDt=input.readUTF();
//this.employeeSalary=input.readInt();
new employee(input.readInt(),input.readUTF(),input.readUTF(),input.readUTF(),input.readInt());
}
@Override
public void write(DataOutput output) throws IOException {
// TODO Auto-generated method stub
output.writeInt(this.employeeId);
output.writeUTF(this.employeeName);
output.writeUTF(this.employeeDept);
output.writeUTF(this.employeeJoinDt);
output.writeInt(this.employeeSalary);
}
}
This is my job runner
public class jobrunner {
public static void main(String[] args) throws IOException
{
JobConf jobConf = new JobConf(jobrunner.class);
jobConf.setJobName("Count no of employees");
jobConf.setMapperClass(maxmap.class);
jobConf.setReducerClass(maxsalreduce.class);
FileInputFormat.setInputPaths(jobConf, new Path("hdfs://localhost:9000/employee_data.txt"));
FileOutputFormat.setOutputPath(jobConf,new Path("hdfs://localhost:9000/dummy20.txt"));
jobConf.setOutputKeyClass(Text.class);
jobConf.setOutputValueClass(employee.class);
JobClient.runJob(jobConf);
}
}
This is the exception I am getting
java.lang.RuntimeException: problem advancing post rec#0
at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1214)
at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:249)
at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:245)
at tuplewritable.maxsalreduce.reduce(maxsalreduce.java:24)
at tuplewritable.maxsalreduce.reduce(maxsalreduce.java:1)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readUTF(DataInputStream.java:592)
at java.io.DataInputStream.readUTF(DataInputStream.java:547)
at tuplewritable.employee.readFields(employee.java:76)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1271)
at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1211)
... 7 more
13/08/17 20:44:14 INFO mapred.JobClient: map 100% reduce 0%
13/08/17 20:44:14 INFO mapred.JobClient: Job complete: job_local_0001
13/08/17 20:44:14 INFO mapred.JobClient: Counters: 21
13/08/17 20:44:14 INFO mapred.JobClient: File Input Format Counters
13/08/17 20:44:14 INFO mapred.JobClient: Bytes Read=123
13/08/17 20:44:14 INFO mapred.JobClient: FileSystemCounters
13/08/17 20:44:14 INFO mapred.JobClient: FILE_BYTES_READ=146
13/08/17 20:44:14 INFO mapred.JobClient: HDFS_BYTES_READ=123
13/08/17 20:44:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=39985
13/08/17 20:44:14 INFO mapred.JobClient: Map-Reduce Framework
13/08/17 20:44:14 INFO mapred.JobClient: Map output materialized bytes=270
13/08/17 20:44:14 INFO mapred.JobClient: Map input records=4
13/08/17 20:44:14 INFO mapred.JobClient: Reduce shuffle bytes=0
13/08/17 20:44:14 INFO mapred.JobClient: Spilled Records=4
13/08/17 20:44:14 INFO mapred.JobClient: Map output bytes=256
13/08/17 20:44:14 INFO mapred.JobClient: Total committed heap usage (bytes)=160763904
13/08/17 20:44:14 INFO mapred.JobClient: CPU time spent (ms)=0
13/08/17 20:44:14 INFO mapred.JobClient: Map input bytes=123
13/08/17 20:44:14 INFO mapred.JobClient: SPLIT_RAW_BYTES=92
13/08/17 20:44:14 INFO mapred.JobClient: Combine input records=0
13/08/17 20:44:14 INFO mapred.JobClient: Reduce input records=0
13/08/17 20:44:14 INFO mapred.JobClient: Reduce input groups=0
13/08/17 20:44:14 INFO mapred.JobClient: Combine output records=0
13/08/17 20:44:14 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
13/08/17 20:44:14 INFO mapred.JobClient: Reduce output records=0
13/08/17 20:44:14 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
13/08/17 20:44:14 INFO mapred.JobClient: Map output records=4
13/08/17 20:44:14 INFO mapred.JobClient: Job Failed: NA
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
at tuplewritable.jobrunner.main(jobrunner.java:30)
13/08/17 20:44:14 ERROR hdfs.DFSClient: Exception closing file /dummy20.txt/_temporary/_attempt_local_0001_r_000000_0/part-00000 : org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /dummy20.txt/_temporary/_attempt_local_0001_r_000000_0/part-00000 File does not exist. Holder DFSClient_1595916561 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1629)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1620)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1675)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1663)
at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /dummy20.txt/_temporary/_attempt_local_0001_r_000000_0/part-00000 File does not exist. Holder DFSClient_1595916561 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1629)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1620)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1675)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1663)
at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
at org.apache.hadoop.ipc.Client.call(Client.java:1066)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy1.complete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.complete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3894)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3809)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:1342)
at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:275)
at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:328)
at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1446)
at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:277)
at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:260)
There are a few things going wrong in your classes.
In the Employee class remove
System.out.println("employee id is"+input.readInt());
And,
new employee(input.readInt(),input.readUTF(),input.readUTF(),
input.readUTF(),input.readInt());
from,
@Override
public void readFields(DataInput input) throws IOException {
// TODO Auto-generated method stubt
System.out.println("employee id is"+input.readInt());
//this.employeeId=input.readInt();
//this.employeeName=input.readUTF();
//this.employeeDept=input.readUTF();
//this.employeeJoinDt=input.readUTF();
//this.employeeSalary=input.readInt();
new employee(input.readInt(),input.readUTF(),input.readUTF(),input.readUTF(),input.readInt());
}
Reason: the line System.out.println("employee id is"+input.readInt()); already consumes the first value from the input stream, so calling input.readInt() again afterwards is what causes the issue. As for the other line, new employee(....), you are probably well aware that it should not be used like this; at least I wouldn't.
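In other words, readFields() should populate the current object exactly once, in the same order that write() serializes the fields. A minimal sketch:
@Override
public void readFields(DataInput input) throws IOException {
    // read each field exactly once, in the order write() emits them
    this.employeeId = input.readInt();
    this.employeeName = input.readUTF();
    this.employeeDept = input.readUTF();
    this.employeeJoinDt = input.readUTF();
    this.employeeSalary = input.readInt();
}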
Next, in the JobRunner class, remove this line:
jobConf.setOutputValueClass(employee.class);
and add these lines:
jobConf.setMapOutputKeyClass(Text.class);
jobConf.setMapOutputValueClass(employee.class);
jobConf.setOutputKeyClass(Text.class);
jobConf.setOutputValueClass(IntWritable.class);
Addendum: please start class names with a capital letter; not doing so breaks the Java naming convention.
Can somebody give one good example link for MapReduce with HBase? My requirement is to run MapReduce on an HDFS file and store the reducer output in an HBase table. The mapper input will be an HDFS file and its output will be Text/IntWritable key-value pairs. The reducer output will be a Put object, i.e. the reducer sums its Iterable of IntWritable values and stores the result in the HBase table.
Here is the code that should solve your problem.
Driver
HBaseConfiguration conf = HBaseConfiguration.create();
Job job = new Job(conf,"JOB_NAME");
job.setJarByClass(yourclass.class);
job.setMapperClass(yourMapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
FileInputFormat.setInputPaths(job, new Path(inputPath));
TableMapReduceUtil.initTableReducerJob(TABLE,
yourReducer.class, job);
job.setReducerClass(yourReducer.class);
job.waitForCompletion(true);
Mapper & Reducer
class yourMapper extends Mapper<LongWritable, Text, Text,IntWritable> {
// @Override map()
}
class yourReducer
extends
TableReducer<Text, IntWritable,
ImmutableBytesWritable>
{
// @Override reduce()
}
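Since your requirement is to sum the IntWritable values and emit a Put, here is a hedged sketch of what the reduce() body might look like; the column family "cf" and qualifier "count" are placeholders you would replace with your own schema:
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

class yourReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // sum the IntWritable values for this key
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        // row key = the Text key; "cf"/"count" are placeholder column names
        Put put = new Put(Bytes.toBytes(key.toString()));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(sum));
        // with initTableReducerJob the row key argument may be left null
        context.write(null, put);
    }
}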
**Check the below code, which works fine for me with Phoenix, HBase, and MapReduce**
This program reads data from an HBase table and inserts the result into another table after the map-reduce job.
Tables: STOCK, STOCK_STATS
StockComputationJob.java
public static class StockMapper extends Mapper<NullWritable, StockWritable, Text , DoubleWritable> {
private Text stock = new Text();
private DoubleWritable price = new DoubleWritable ();
@Override
protected void map(NullWritable key, StockWritable stockWritable, Context context) throws IOException, InterruptedException {
double[] recordings = stockWritable.getRecordings();
final String stockName = stockWritable.getStockName();
System.out.println("Map-"+recordings);
double maxPrice = Double.MIN_VALUE;
for(double recording : recordings) {
System.out.println("M-"+key+"-"+recording);
if(maxPrice < recording) {
maxPrice = recording;
}
}
System.out.println(stockName+"--"+maxPrice);
stock.set(stockName);
price.set(maxPrice);
context.write(stock,price);
}
}
public static void main(String[] args) throws Exception {
final Configuration conf = new Configuration();
HBaseConfiguration.addHbaseResources(conf);
conf.set(HConstants.ZOOKEEPER_QUORUM, zkUrl);
final Job job = Job.getInstance(conf, "stock-stats-job");
// We can either specify a selectQuery or ignore it when we would like to retrieve all the columns
final String selectQuery = "SELECT STOCK_NAME,RECORDING_YEAR,RECORDINGS_QUARTER FROM STOCK ";
// StockWritable is the DBWritable class that enables us to process the Result of the above query
PhoenixMapReduceUtil.setInput(job,StockWritable.class,"STOCK",selectQuery);
// Set the target Phoenix table and the columns
PhoenixMapReduceUtil.setOutput(job, "STOCK_STATS", "STOCK_NAME,MAX_RECORDING");
job.setMapperClass(StockMapper.class);
job.setReducerClass(StockReducer.class);
job.setOutputFormatClass(PhoenixOutputFormat.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(DoubleWritable.class);
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(StockWritable.class);
TableMapReduceUtil.addDependencyJars(job);
job.waitForCompletion(true);
}
}
StockReducer.java
public class StockReducer extends Reducer<Text, DoubleWritable, NullWritable , StockWritable> {
protected void reduce(Text key, Iterable<DoubleWritable> recordings, Context context) throws IOException, InterruptedException {
double maxPrice = Double.MIN_VALUE;
System.out.println(recordings);
for(DoubleWritable recording : recordings) {
System.out.println("R-"+key+"-"+recording);
if(maxPrice < recording.get()) {
maxPrice = recording.get();
}
}
final StockWritable stock = new StockWritable();
stock.setStockName(key.toString());
stock.setMaxPrice(maxPrice);
System.out.println(key+"--"+maxPrice);
context.write(NullWritable.get(),stock);
}
}
StockWritable.java
public class StockWritable implements DBWritable,Writable {
private String stockName;
private int year;
private double[] recordings;
private double maxPrice;
public void readFields(DataInput input) throws IOException {
}
public void write(DataOutput output) throws IOException {
}
public void readFields(ResultSet rs) throws SQLException {
stockName = rs.getString("STOCK_NAME");
setYear(rs.getInt("RECORDING_YEAR"));
final Array recordingsArray = rs.getArray("RECORDINGS_QUARTER");
setRecordings((double[])recordingsArray.getArray());
}
public void write(PreparedStatement pstmt) throws SQLException {
pstmt.setString(1, stockName);
pstmt.setDouble(2, maxPrice);
}
public int getYear() {
return year;
}
public void setYear(int year) {
this.year = year;
}
public double[] getRecordings() {
return recordings;
}
public void setRecordings(double[] recordings) {
this.recordings = recordings;
}
public double getMaxPrice() {
return maxPrice;
}
public void setMaxPrice(double maxPrice) {
this.maxPrice = maxPrice;
}
public String getStockName() {
return stockName;
}
public void setStockName(String stockName) {
this.stockName = stockName;
}
}
I am trying to write a simple MapReduce program to find the largest prime number using the new API (0.20.2). This is how my Map and Reduce classes look:
public class PrimeNumberMap extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
public void map (LongWritable key, Text Kvalue,Context context) throws IOException,InterruptedException
{
Integer value = new Integer(Kvalue.toString());
if(isNumberPrime(value))
{
context.write(new IntWritable(value), new IntWritable(new Integer(key.toString())));
}
}
boolean isNumberPrime(Integer number)
{
if (number == 1) return false;
if (number == 2) return true;
for (int counter =2; counter<(number/2);counter++)
{
if(number%counter ==0 )
return false;
}
return true;
}
}
public class PrimeNumberReduce extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
public void reduce ( IntWritable primeNo, Iterable<IntWritable> Values,Context context) throws IOException ,InterruptedException
{
int maxValue = Integer.MIN_VALUE;
for (IntWritable value : Values)
{
maxValue= Math.max(maxValue, value.get());
}
//output.collect(primeNo, new IntWritable(maxValue));
context.write(primeNo, new IntWritable(maxValue)); }
}
public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException{
if (args.length ==0)
{
System.err.println(" Usage:\n\tPrimenumber <input Directory> <output Directory>");
System.exit(-1);
}
Job job = new Job();
job.setJarByClass(Main.class);
job.setJobName("Prime");
// Creating job configuration object
FileInputFormat.addInputPath(job, new Path (args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(IntWritable.class);
String star ="*********************************************";
System.out.println(star+"\n Prime number computer \n"+star);
System.out.println(" Application started ... keeping fingers crossed :/ ");
System.exit(job.waitForCompletion(true)?0:1);
}
}
I am still getting an error about a key type mismatch from the map:
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1034)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:595)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:668)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:334)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1109)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
2012-06-13 14:27:21,116 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
Can someone please suggest what is wrong? I have tried everything by hook or by crook.
You've not configured the mapper or reducer classes in your main block, so the default mapper is being used. It is known as the identity mapper: each pair it receives as input is written straight to the output (hence the LongWritable as the output key):
job.setMapperClass(PrimeNumberMap.class);
job.setReducerClass(PrimeNumberReduce.class);
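For context, a hedged sketch of where those two calls would slot into the existing main block (everything else unchanged from the question):
Job job = new Job();
job.setJarByClass(Main.class);
job.setJobName("Prime");
// register the custom mapper and reducer so the identity classes are not used
job.setMapperClass(PrimeNumberMap.class);
job.setReducerClass(PrimeNumberReduce.class);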
The mapper should be defined as below,
public class PrimeNumberMap extends Mapper<**IntWritable**, Text, IntWritable, IntWritable> {
instead of
public class PrimeNumberMap extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
As mentioned in the answer above, you should also have the mapper and reducer configured:
job.setMapperClass(PrimeNumberMap.class);
job.setReducerClass(PrimeNumberReduce.class);
Please refer to Hadoop: The Definitive Guide, 3rd edition, Chapter 2, page 24.
I am a fresh hand at Hadoop MapReduce programming.
When mapping, I use IntWritable, but in the reducer I process the values as IntWritable and convert the result to a double before writing it with DoubleWritable in context.write.
It fails when running.
My setup for handling the int-in-map to double-in-reduce conversion is:
Mapper(LongWritable,Text,Text,DoubleWritable)
Reducer(Text,DoubleWritable,Text,DoubleWritable)
job.setOutputValueClass(DoubleWritable.class)
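For reference, here is a minimal hedged sketch of signatures that keep the value types consistent with the setup described above; the input parsing (a comma-separated line with the integer in the second field) and the summing logic are purely assumptions for illustration:
public static class MyMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // assumed input layout: "label,intValue"
        String[] parts = value.toString().split(",");
        int intValue = Integer.parseInt(parts[1]);
        // convert to double here so the declared map output value class matches
        context.write(new Text(parts[0]), new DoubleWritable(intValue));
    }
}

public static class MyReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        double sum = 0;
        for (DoubleWritable value : values) {
            sum += value.get();
        }
        context.write(key, new DoubleWritable(sum));
    }
}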