I'm a beginner in Hadoop and am working with ArrayWritable in Hadoop MapReduce.
This is the Mapper code I'm using:
public class Base_Mapper extends Mapper<LongWritable, Text, Text, IntWritable> {
String currLine[] = new String[1000];
Text K = new Text();
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String line = value.toString();
currLine = line.split("");
int count = 0;
for (int i = 0; i < currLine.length; i++) {
String currToken = currLine[i];
count++;
K.set(currToken);
context.write(K, new IntWritable(count));
}
}
}
Reducer:
public class Base_Reducer extends Reducer<Text, IntWritable,Text, IntArrayWritable> {
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
IntArrayWritable finalArray = new IntArrayWritable();
IntWritable[] arr = new IntWritable[150]; // only 150 slots are filled below; null slots would make ArrayWritable.toStrings() throw
for (int i = 0; i < 150; i++)
arr[i] = new IntWritable(0);
int redCount = 0;
for (IntWritable val : values) {
int thisValue = val.get();
for (int i = 1; i <= 150; i++) {
if (thisValue == i)
arr[i - 1] = new IntWritable(redCount++);
}
}
finalArray.set(arr);
context.write(key, finalArray);
}
}
I'm using IntArrayWritable, a subclass of ArrayWritable, shown below:
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;
public class IntArrayWritable extends ArrayWritable {
public IntArrayWritable() {
super(IntWritable.class);
}
public IntArrayWritable(IntWritable[] values) {
super(IntWritable.class, values);
}
}
The intended output of the job is a set of bases as keys (which is correct) and an array of IntWritables as values.
But I'm getting this output instead:
com.feathersoft.Base.IntArrayWritable@30374534
A com.feathersoft.Base.IntArrayWritable@7ca071a6
C com.feathersoft.Base.IntArrayWritable@9858936
G com.feathersoft.Base.IntArrayWritable@1df33d1c
N com.feathersoft.Base.IntArrayWritable@4c3108a0
T com.feathersoft.Base.IntArrayWritable@272d6774
What changes do I have to make in order to resolve this issue?
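You need to override the default toString() method in your IntArrayWritable implementation. In the Hadoop version you're using, ArrayWritable does not override toString(). When TextOutputFormat writes the value, it therefore falls back to Object.toString(), which prints the class name, '@', and a hex hash code.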
Please try this:
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;
public class IntArrayWritable extends ArrayWritable {
public IntArrayWritable() {
super(IntWritable.class);
}
public IntArrayWritable(IntWritable[] values) {
super(IntWritable.class, values);
}
@Override
public String toString() {
// TextOutputFormat writes value.toString(), so this controls how the array appears in the output
return "[" + String.join(" ", toStrings()) + "]";
}
}
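The following plain-Java sketch shows why the override fixes the output. It has no Hadoop dependency. ArrayHolderModel and PrintableArrayHolder are hypothetical stand-ins for ArrayWritable and the fixed IntArrayWritable. Its toStrings() is assumed to behave like ArrayWritable.toStrings(), calling toString() on every element. Without an override, Java uses Object.toString(), which gives the same "ClassName@hash" shape seen in the job output.

```java
// Hypothetical stand-in for ArrayWritable: holds elements but does not override toString().
class ArrayHolderModel {
    protected final Object[] values;

    ArrayHolderModel(Object... values) {
        this.values = values;
    }

    // Assumed to mirror ArrayWritable.toStrings(): calls toString() on each element,
    // so a null slot throws NullPointerException.
    String[] toStrings() {
        String[] strings = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            strings[i] = values[i].toString();
        }
        return strings;
    }
}

// Hypothetical stand-in for the fixed IntArrayWritable.
class PrintableArrayHolder extends ArrayHolderModel {
    PrintableArrayHolder(Object... values) {
        super(values);
    }

    @Override
    public String toString() {
        return "[" + String.join(" ", toStrings()) + "]";
    }
}
```

The null case matters in practice: any slot left unset in the IntWritable[] passed to set() makes toStrings(), and therefore the overridden toString(), throw.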
I'm trying to compute Pearson's correlation for all pairs of columns.
This is my MapReduce code:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class Pearson
{
public static class MyMapper extends Mapper<LongWritable,Text,IndexPair,ValuePair>{
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String[] tokens = line.split(",");
double[] arr = toDouble(tokens);
for(int i=0; i < arr.length; i++) {
for(int j=i+1; j < arr.length; j++) {
IndexPair k2 = new IndexPair(i, j);
ValuePair v2 = new ValuePair(arr[i], arr[j]);
context.write(k2, v2);
}
}
}
public double[] toDouble(String[] tokens) {
double[] arr = new double[tokens.length];
for(int i=0; i < tokens.length; i++) {
arr[i] = Double.parseDouble(tokens[i]);
}
return arr;
}
}
public static class MyReduce extends Reducer<IndexPair,ValuePair,IndexPair,DoubleWritable>
{
public void reduce(IndexPair key, Iterable<ValuePair> values, Context context) throws IOException, InterruptedException {
double x = 0.0d;
double y = 0.0d;
double xx = 0.0d;
double yy = 0.0d;
double xy = 0.0d;
double n = 0.0d;
for(ValuePair pairs : values) {
x += pairs.v1;
y += pairs.v2;
xx += Math.pow(pairs.v1, 2.0d);
yy += Math.pow(pairs.v2, 2.0d);
xy += (pairs.v1 * pairs.v2);
n += 1.0d;
}
double numerator = xy - ((x * y) / n);
double denominator1 = xx - (Math.pow(x, 2.0d) / n);
double denominator2 = yy - (Math.pow(y, 2.0d) / n);
double denominator = Math.sqrt(denominator1 * denominator2);
double corr = numerator / denominator;
context.write(key, new DoubleWritable(corr));
}
}
public static void main(String[] args) throws Exception
{
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "Pearson's Correlation");
job.setJarByClass(Pearson.class);
job.setMapperClass(MyMapper.class);
job.setCombinerClass(MyReduce.class);
job.setReducerClass(MyReduce.class);
job.setMapOutputKeyClass(IndexPair.class);
job.setMapOutputValueClass(ValuePair.class);
job.setOutputKeyClass(IndexPair.class);
job.setOutputValueClass(DoubleWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
And the code for IndexPair is:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;
public class IndexPair implements WritableComparable<IndexPair>{
public static String[] labels
={"Year","Month","MEI","CO2","CH4","N2O","CFC-11","CFC-12","TSI","Aerosols","Temp"};
public long i,j;
public IndexPair()
{
}
public IndexPair(long i,long j) {
this.i=i;
this.j=j;
}
@Override
public void readFields(DataInput in) throws IOException {
i = in.readLong();
j = in.readLong();
}
@Override
public void write(DataOutput out) throws IOException {
out.writeLong(i);
out.writeLong(j);
}
@Override
public int compareTo(IndexPair o) {
Long i1 = i;
Long j1 = j;
Long i2 = o.i;
Long j2 = o.j;
int result = i1.compareTo(i2);
if (0 == result) {
return j1.compareTo(j2);
}
return result;
}
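@Override
public String toString()
{
return "Correlation between column "+labels[(int) i]+"-->"+ labels[(int)j];
}
// HashPartitioner uses hashCode(); without this override, equal keys created in different
// map tasks can be sent to different reducers when the job has more than one reducer
@Override
public int hashCode() {
return 31 * Long.hashCode(i) + Long.hashCode(j);
}
}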
And the code for ValuePair is:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;
public class ValuePair implements WritableComparable<ValuePair>{
public double v1,v2;
public ValuePair()
{
}
public ValuePair(double i,double j)
{
v1=i;
v2=j;
}
@Override
public void readFields(DataInput in) throws IOException {
v1=in.readDouble();
v2=in.readDouble();
}
@Override
public void write(DataOutput out) throws IOException {
out.writeDouble(v1);
out.writeDouble(v2);
}
@Override
public int compareTo(ValuePair o) {
// comparator for value pair is not required....
return 0;
}
}
But when I try to run it, I get the following error:
17/07/20 13:59:49 INFO mapreduce.Job: map 0% reduce 0%
17/07/20 13:59:53 INFO mapreduce.Job: Task Id : attempt_1500536519279_0007_m_000000_0, Status : FAILED
Error: java.io.IOException: wrong value class: class org.apache.hadoop.io.DoubleWritable is not class ValuePair
at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:194)
at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1411)
at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1728)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
at Pearson$MyReduce.reduce(Pearson.java:66)
at Pearson$MyReduce.reduce(Pearson.java:1)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1749)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1639)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1491)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
The problem is that you use the reducer as a combiner:
job.setCombinerClass(MyReduce.class);
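A combiner's output key and value types must match the mapper's output types (here IndexPair and ValuePair). This is because the combiner's output is fed to the reducer as if it came straight from the mapper. MyReduce emits DoubleWritable values, so when it runs as the combiner the framework rejects them with the "wrong value class" error. Remove the setCombinerClass line. Even with matching types, MyReduce could not serve as a combiner: a correlation computed from part of the data cannot be combined into the overall correlation.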
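A combiner is still possible here if it meets two conditions. It must emit the same value type the mapper emits, and its work must be safe to apply zero, one, or several times. A correlation does not meet the second condition, but the running sums that MyReduce accumulates do. This plain-Java sketch uses a hypothetical PearsonStats class to hold those sums as a mergeable value. In a real job, the mapper and combiner would both emit such a Writable (an assumption, not part of the original code), and only the reducer would call correlation().

```java
// Sufficient statistics for Pearson's r; merge() is associative and commutative,
// which is the property a combiner needs.
final class PearsonStats {
    double n, sx, sy, sxx, syy, sxy;

    void add(double x, double y) {
        n += 1;
        sx += x;
        sy += y;
        sxx += x * x;
        syy += y * y;
        sxy += x * y;
    }

    void merge(PearsonStats other) {
        n += other.n;
        sx += other.sx;
        sy += other.sy;
        sxx += other.sxx;
        syy += other.syy;
        sxy += other.sxy;
    }

    // Same formula as MyReduce: (Sxy - Sx*Sy/n) / sqrt((Sxx - Sx^2/n) * (Syy - Sy^2/n))
    double correlation() {
        double numerator = sxy - sx * sy / n;
        double denominator = Math.sqrt((sxx - sx * sx / n) * (syy - sy * sy / n));
        return numerator / denominator;
    }
}
```

Merging the stats of two halves of the data gives the same correlation as processing all of it at once. That is exactly what lets the framework run a combiner on partial map output.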
I have written the code below, but the if condition never matches; it keeps going into the else block.
Please take a look and let me know if you find any discrepancy.
public class ReduceIncurance extends Reducer<Text, Text, Text, IntWritable> {
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException
{
int sum = 0;
int count = 0;
String[] input = values.toString().split(",");
for (String val : input) {
System.out.println("first:" + val);
if (val.equalsIgnoreCase("Residential")) {
System.out.println(val);
count++;
sum += count;
} else {
System.out.println("into elsee part");
count++;
sum += count;
}
context.write(key, new IntWritable(sum));
}
}
}
Try this
public void reduce(Text key, Iterable<Text> values , Context context) throws IOException, InterruptedException
{
int count=0;
for (Text val : values)
{
if (val.toString().equalsIgnoreCase("Residential"))
{
count ++;
}
else
{
System.out.println("into elsee part");
}
}
context.write(key, new IntWritable(count));
}
This gives you the number of 'Residential' values (compared case-insensitively) under each key.
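The issue is this line: String[] input = values.toString().split(",");. Calling toString() on an Iterable<Text> does not give you the values joined together. It gives the iterable object's own string representation (typically its class name and hash code), so none of the split pieces ever equals "Residential".
For a given key you need to iterate over the values; you don't need to store them in a String[].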
Try this
public class ReduceIncurance extends Reducer<Text, Text, Text, IntWritable> {
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException
{
int sum = 0;
int count = 0;
for (Text val : values) {
String[] input = val.toString().split(",");
for (int i = 0; i < input.length; i++) {
if (input[i].equalsIgnoreCase("Residential")) {
System.out.println(val);
count++;
sum += count;
} else {
System.out.println("into elsee part");
count++;
sum += count;
}
}
context.write(key, new IntWritable(sum));
}
}
}
I still don't understand why you increment sum and count in both the if block and the else block.
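Without the Hadoop types, the fix is just a loop over the Iterable. This plain-Java sketch uses Iterable<String> in place of Iterable<Text> and a hypothetical ResidentialCounter class. Like the second answer, it splits each value on commas in case one value holds several fields.

```java
import java.util.Arrays;

final class ResidentialCounter {
    // Counts fields equal to "Residential", ignoring case, across all values for one key.
    static int countResidential(Iterable<String> values) {
        int count = 0;
        for (String val : values) {               // iterate the values; never call values.toString()
            for (String field : val.split(",")) {
                if (field.trim().equalsIgnoreCase("Residential")) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

In the reducer itself you would call val.toString() on each Text before splitting, then write the result with new IntWritable(count).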
I have spent two days on this issue. Thanks in advance if anyone can help! Here is the description:
The first mapper and reducer work well, and the output written with SequenceFileOutputFormat can be found in the output path.
First mapper:
public static class TextToRecordMapper
extends Mapper<Object, Text, Text, IntArrayWritable>{
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
}
}
First reducer:
public static class MacOneSensorSigCntReducer
extends Reducer<Text,IntArrayWritable,Text,IntArrayWritable> {
public void reduce(Text key, Iterable<IntArrayWritable> values,
Context context
) throws IOException, InterruptedException {
}
}
The Job part:
Job job = new Job(conf, "word count");
job.setJarByClass(RawInputText.class);
job.setMapperClass(TextToRecordMapper.class);
job.setReducerClass(MacOneSensorSigCntReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntArrayWritable.class);
job.setOutputFormatClass(SequenceFileOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
job.waitForCompletion(true);
This works well. Then I added a second mapper and reducer to process the output of the first job.
Second mapper:
public static class MacSensorsTimeLocMapper
extends Mapper<Text,IntArrayWritable,Text,IntWritable> {
private Text macInfo = new Text();
public void map(Text key, Iterable<IntArrayWritable> values,
Context context
) throws IOException, InterruptedException {
}
}
Second reducer:
public static class MacInfoTestReducer
extends Reducer<Text,IntWritable,Text,Text> {
public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
}
}
The Job part:
Job secondJob = new Job(conf, "word count 2");
secondJob.setJarByClass(RawInputText.class);
FileInputFormat.addInputPath(secondJob, new Path(otherArgs[1]));
secondJob.setInputFormatClass(SequenceFileInputFormat.class);
secondJob.setMapperClass(MacSensorsTimeLocMapper.class);
secondJob.setMapOutputKeyClass(Text.class);
secondJob.setMapOutputValueClass(IntArrayWritable.class);
//do not use test reducer to make things simple
//secondJob.setReducerClass(MacInfoTestReducer.class);
FileOutputFormat.setOutputPath(secondJob, new Path(otherArgs[2]));
System.exit(secondJob.waitForCompletion(true) ? 0 : 1);
When I run the code, the second mapper's map function is never called, and the output contains lines like the following:
00:08:CA:6C:A2:81 com.hicapt.xike.IntArrayWritable@234265
It seems the framework runs an identity mapper instead of mine. How do I get my mapper called when SequenceFileInputFormat is the input format?
Here is the full code:
import java.io.IOException;
import java.util.Collection;
import java.util.Hashtable;
import java.util.Iterator;
import java.util.TreeMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class RawInputText {
public static class TextToRecordMapper
extends Mapper<Object, Text, Text, IntArrayWritable>{
private Text word = new Text();
private IntArrayWritable mapv = new IntArrayWritable();
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
String line = value.toString();
String[] valArray = line.split(",");
if(valArray.length == 6){
IntWritable[] valInts = new IntWritable[2];
word.set(valArray[0]+"-"+valArray[1]);
valInts[0] = new IntWritable(Integer.parseInt(valArray[2]));
valInts[1] = new IntWritable(Integer.parseInt(valArray[4]));
mapv.set(valInts);
context.write(word, mapv);
}
}
}
public static class MacOneSensorSigCntReducer
extends Reducer<Text,IntArrayWritable,Text,IntArrayWritable> {
private Text macKey = new Text();
private IntArrayWritable macInfo = new IntArrayWritable();
public void reduce(Text key, Iterable<IntArrayWritable> values,
Context context
) throws IOException, InterruptedException {
String[] keyArray = key.toString().split("-");
if(keyArray.length < 2){
int a = 10;
a= 20;
}
String mac = keyArray[1];
String sen = keyArray[0];
Hashtable<Integer, MinuteSignalInfo> rssiTime = new Hashtable<Integer, MinuteSignalInfo>();
MinuteSignalInfo minSig;
int rssi = 0;
int ts = 0;
int i = 0;
for (IntArrayWritable val : values) {
i = 0;
for(Writable element : val.get()) {
IntWritable eleVal = (IntWritable)element;
if(i%2 == 0)
rssi = eleVal.get();
else
ts = eleVal.get()/60;
i++;
}
minSig = (MinuteSignalInfo)rssiTime.get(ts);
if(minSig == null){
minSig = new MinuteSignalInfo();
minSig.rssi = rssi;
minSig.count = 1;
}else{
minSig.rssi += rssi;
minSig.count += 1;
}
rssiTime.put(ts, minSig);
}
TreeMap<Integer, MinuteSignalInfo> treeMap = new TreeMap<Integer, MinuteSignalInfo>();
treeMap.putAll(rssiTime);
macKey.set(mac);
i = 0;
IntWritable[] valInts = new IntWritable[1+treeMap.size()*3];
valInts[i++] = new IntWritable(Integer.parseInt(sen));
Collection<Integer> macs = treeMap.keySet();
Iterator<Integer> it = macs.iterator();
while(it.hasNext()) {
int tsKey = it.next();
valInts[i++] = new IntWritable(tsKey);
valInts[i++] = new IntWritable(treeMap.get(tsKey).rssi);
valInts[i++] = new IntWritable(treeMap.get(tsKey).count);
}
macInfo.set(valInts);
context.write(macKey, macInfo);
}
}
public static class MacSensorsTimeLocMapper
extends Mapper<Text,IntArrayWritable,Text,IntWritable> {
private Text macInfo = new Text();
public void map(Text key, Iterable<IntArrayWritable> values,
Context context
) throws IOException, InterruptedException {
int i = 0;
int sensor = 0;
int ts = 0;
int rssi = 0;
int count = 0;
Hashtable<Integer, MinuteSignalInfo> rssiTime = new Hashtable<Integer, MinuteSignalInfo>();
MinuteSignalInfo minSig;
for (IntArrayWritable val : values) {
i = 0;
for(Writable element : val.get()) {
IntWritable eleVal = (IntWritable)element;
int valval = eleVal.get();
if(i == 0) {
sensor = valval;
}else if(i%3 == 1){
ts = valval;
}else if(i%3 == 2){
rssi = valval;
}else if(i%3 == 0){
count = valval;
minSig = (MinuteSignalInfo)rssiTime.get(ts);
if(minSig == null){
minSig = new MinuteSignalInfo();
minSig.rssi = rssi;
minSig.count = count;
minSig.sensor = sensor;
rssiTime.put(ts, minSig);
}else{
if((rssi/count) < (minSig.rssi/minSig.count)){
minSig.rssi = rssi;
minSig.count = count;
minSig.sensor = sensor;
rssiTime.put(ts, minSig);
}
}
}
i++;
}
}
TreeMap<Integer, MinuteSignalInfo> treeMap = new TreeMap<Integer, MinuteSignalInfo>();
treeMap.putAll(rssiTime);
String macLocs = "";
Collection<Integer> tss = treeMap.keySet();
Iterator<Integer> it = tss.iterator();
while(it.hasNext()) {
int tsKey = it.next();
macLocs += String.valueOf(tsKey) + ",";
macLocs += String.valueOf(treeMap.get(tsKey).sensor) + ";";
}
macInfo.set(macLocs);
context.write(key, new IntWritable(10));
//context.write(key, macInfo);
}
}
public static class MacSensorsTimeLocReducer
extends Reducer<Text,IntArrayWritable,Text,Text> {
private Text macInfo = new Text();
public void reduce(Text key, Iterable<IntArrayWritable> values,
Context context
) throws IOException, InterruptedException {
int i = 0;
int sensor = 0;
int ts = 0;
int rssi = 0;
int count = 0;
Hashtable<Integer, MinuteSignalInfo> rssiTime = new Hashtable<Integer, MinuteSignalInfo>();
MinuteSignalInfo minSig;
for (IntArrayWritable val : values) {
i = 0;
for(Writable element : val.get()) {
IntWritable eleVal = (IntWritable)element;
int valval = eleVal.get();
if(i == 0) {
sensor = valval;
}else if(i%3 == 1){
ts = valval;
}else if(i%3 == 2){
rssi = valval;
}else if(i%3 == 0){
count = valval;
minSig = (MinuteSignalInfo)rssiTime.get(ts);
if(minSig == null){
minSig = new MinuteSignalInfo();
minSig.rssi = rssi;
minSig.count = count;
minSig.sensor = sensor;
rssiTime.put(ts, minSig);
}else{
if((rssi/count) < (minSig.rssi/minSig.count)){
minSig.rssi = rssi;
minSig.count = count;
minSig.sensor = sensor;
rssiTime.put(ts, minSig);
}
}
}
i++;
}
}
TreeMap<Integer, MinuteSignalInfo> treeMap = new TreeMap<Integer, MinuteSignalInfo>();
treeMap.putAll(rssiTime);
String macLocs = "";
Collection<Integer> tss = treeMap.keySet();
Iterator<Integer> it = tss.iterator();
while(it.hasNext()) {
int tsKey = it.next();
macLocs += String.valueOf(tsKey) + ",";
macLocs += String.valueOf(treeMap.get(tsKey).sensor) + ";";
}
macInfo.set(macLocs);
context.write(key, macInfo);
}
}
public static class MacInfoTestReducer
extends Reducer<Text,IntArrayWritable,Text,Text> {
private Text macInfo = new Text();
public void reduce(Text key, Iterable<IntArrayWritable> values,
Context context
) throws IOException, InterruptedException {
String tmp = "";
for (IntArrayWritable val : values) {
for(Writable element : val.get()) {
IntWritable eleVal = (IntWritable)element;
int valval = eleVal.get();
tmp += String.valueOf(valval) + " ";
}
}
macInfo.set(tmp);
context.write(key, macInfo);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 3) {
System.err.println("Usage: wordcount <in> <out1> <out2>");
System.exit(2);
}
/*
Job job = new Job(conf, "word count");
job.setJarByClass(RawInputText.class);
job.setMapperClass(TextToRecordMapper.class);
job.setReducerClass(MacOneSensorSigCntReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntArrayWritable.class);
job.setOutputFormatClass(SequenceFileOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
job.waitForCompletion(true);
*/
Job secondJob = new Job(conf, "word count 2");
secondJob.setJarByClass(RawInputText.class);
FileInputFormat.addInputPath(secondJob, new Path(otherArgs[1]));
secondJob.setInputFormatClass(SequenceFileInputFormat.class);
secondJob.setMapperClass(MacSensorsTimeLocMapper.class);
//secondJob.setMapperClass(Mapper.class);
secondJob.setMapOutputKeyClass(Text.class);
secondJob.setMapOutputValueClass(IntArrayWritable.class);
secondJob.setReducerClass(MacInfoTestReducer.class);
//secondJob.setOutputKeyClass(Text.class);
//secondJob.setOutputValueClass(IntArrayWritable.class);
FileOutputFormat.setOutputPath(secondJob, new Path(otherArgs[2]));
System.exit(secondJob.waitForCompletion(true) ? 0 : 1);
}
}
package com.hicapt.xike;
public class MinuteSignalInfo {
public int sensor;
public int rssi;
public int count;
public MinuteSignalInfo() {
rssi = 0;
count = 0;
sensor = 0;
}
}
package com.hicapt.xike;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;
public class IntArrayWritable extends ArrayWritable {
public IntArrayWritable() {
super(IntWritable.class);
}
/*
public void readFields(DataInput in) throws IOException{
super.readFields(in);
}
public void write(DataOutput out) throws IOException{
super.write(out);
}*/
}
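A likely cause, judging from the code, is a signature mismatch. MacSensorsTimeLocMapper declares map(Text key, Iterable<IntArrayWritable> values, Context context), but Mapper<Text, IntArrayWritable, Text, IntWritable> calls map(Text, IntArrayWritable, Context). Because the parameter types differ, the method is an overload rather than an override. The framework therefore keeps calling the inherited map(), which writes each record through unchanged. Each map() call receives a single IntArrayWritable, so the loop over values is not needed.

Adding @Override to the real method would turn the mistake into a compile error. Once the signature is fixed, setMapOutputValueClass should also match what the mapper writes (IntWritable rather than IntArrayWritable). The plain-Java sketch below reproduces the effect with hypothetical classes; MapperModel stands in for Mapper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.mapreduce.Mapper.
class MapperModel<KIN, VIN> {
    final List<String> out = new ArrayList<>();

    // Default map(): identity, writes the input record unchanged.
    void map(KIN key, VIN value) {
        out.add(key + "\t" + value);
    }

    // Like Mapper.run(): the framework only ever calls map(KIN, VIN).
    void run(KIN key, VIN value) {
        map(key, value);
    }
}

// Mirrors the question: a different parameter type makes this an overload, so run() never reaches it.
class OverloadingMapper extends MapperModel<String, Integer> {
    void map(String key, Iterable<Integer> values) {
        out.add(key + "\tcustom");
    }
}

// Fixed: same parameter types as the base method, so @Override compiles and run() dispatches here.
class OverridingMapper extends MapperModel<String, Integer> {
    @Override
    void map(String key, Integer value) {
        out.add(key + "\tcustom " + value);
    }
}
```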
I've written a linear regression program in Java.
The input is:
2,21.05
3,23.51
4,24.23
5,27.71
6,30.86
8,45.85
10,52.12
11,55.98
I want to store the input in an array like x[] = {2, 3, ..., 11} before it is processed by the reduce task, and then pass that array to the reduce() function.
But my program only receives one value at a time. Here is my program:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Reducer.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class LinearRegression {
public static class RegressionMapper extends
Mapper<LongWritable, Text, Text, CountRegression> {
private Text id = new Text();
private CountRegression countRegression = new CountRegression();
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String tempString = value.toString();
String[] inputData = tempString.split(",");
String xVal = inputData[0];
String yVal = inputData[1];
countRegression.setxVal(Integer.parseInt(xVal));
countRegression.setyVal(Float.parseFloat(yVal));
id.set(xVal);
context.write(id, countRegression);
}
}
public static class RegressionReducer extends
Reducer<Text, CountRegression, Text, CountRegression> {
private CountRegression result = new CountRegression();
// static float meanX = 0;
// private float xValues[];
// private float yValues[];
static float xRed = 0.0f;
static float yRed = 0.3f;
static float sum = 0;
static ArrayList<Float> list = new ArrayList<Float>();
public void reduce(Text key, Iterable<CountRegression> values,
Context context) throws IOException, InterruptedException {
//float b = 0;
// while(values.iterator().hasNext())
// {
// xRed = xRed + values.iterator().next().getxVal();
// yRed = yRed + values.iterator().next().getyVal();
// }
for (CountRegression val : values) {
list.add(val.getxVal());
// list.add(val.getyVal());
// xRed += val.getxVal();
// yRed = val.getyVal();
// meanX += val.getxVal();
//xValues = val.getxVal();
}
for (int i=0; i< list.size(); i++) {
// a fresh listIterator() has previousIndex() == -1, so list.get(lastIndex) would throw; index by i instead
sum += list.get(i);
}
result.setxVal(sum);
result.setyVal(yRed);
context.write(key, result);
}
}
public static class CountRegression implements Writable {
private float xVal = 0;
private float yVal = 0;
public float getxVal() {
return xVal;
}
public void setxVal(float x) {
this.xVal = x;
}
public float getyVal() {
return yVal;
}
public void setyVal(float y) {
this.yVal = y;
}
@Override
public void readFields(DataInput in) throws IOException {
xVal = in.readFloat();
yVal = in.readFloat();
}
@Override
public void write(DataOutput out) throws IOException {
out.writeFloat(xVal);
out.writeFloat(yVal);
}
@Override
public String toString() {
return "y = "+xVal+" +"+yVal+" x" ;
}
}
public static void main(String[] args) throws Exception {
// Provides access to configuration parameters.
Configuration conf = new Configuration();
// Create a new Job It allows the user to configure the job, submit it, control its execution, and query the state.
Job job = new Job(conf);
//Set the user-specified job name.
job.setJobName("LinearRegression");
//Set the Jar by finding where a given class came from.
job.setJarByClass(LinearRegression.class);
// Set the Mapper for the job.
job.setMapperClass(RegressionMapper.class);
// Set the Combiner for the job.
job.setCombinerClass(RegressionReducer.class);
// Set the Reducer for the job.
job.setReducerClass(RegressionReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(CountRegression.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
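reduce() is called once per distinct key. RegressionMapper uses the x value as the key, so every call sees a single point. To get all points in one reduce() call, the mapper has to emit the same key for every record; a constant key is an assumption of this sketch, not part of the original code.

The plain-Java model below uses a hypothetical RegressionSketch class. It imitates the shuffle's grouping, then fits y = a + b*x by ordinary least squares over one group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RegressionSketch {
    // Simulates the shuffle: groups (key, point) pairs by key, as MapReduce does before reduce().
    static Map<String, List<double[]>> shuffle(List<String> lines, boolean constantKey) {
        Map<String, List<double[]>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            double x = Double.parseDouble(parts[0]);
            double y = Double.parseDouble(parts[1]);
            // Keying by x (as RegressionMapper does) puts every point in its own group.
            String key = constantKey ? "all" : parts[0];
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(new double[] {x, y});
        }
        return groups;
    }

    // Ordinary least squares over all points of one reduce group; returns {a, b} for y = a + b*x.
    static double[] fit(List<double[]> points) {
        double n = points.size(), sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (double[] p : points) {
            sx += p[0];
            sy += p[1];
            sxx += p[0] * p[0];
            sxy += p[0] * p[1];
        }
        double b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double a = (sy - b * sx) / n;
        return new double[] {a, b};
    }
}
```

Sending every record to one key means one reducer sees all the data. For large inputs, a more scalable design would have a combiner emit the partial sums (n, Sx, Sy, Sxx, Sxy) and let the reducer add them up before solving for a and b.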
I have this Hadoop MapReduce code that works on graph data in adjacency-list form. It is similar to algorithms that transform an in-adjacency list into an out-adjacency list. The main MapReduce task code is:
public class TestTask extends Configured
implements Tool {
public static class TTMapper extends MapReduceBase
implements Mapper<Text, TextArrayWritable, Text, NeighborWritable> {
@Override
public void map(Text key,
TextArrayWritable value,
OutputCollector<Text, NeighborWritable> output,
Reporter reporter) throws IOException {
int numNeighbors = value.get().length;
double weight = (double)1 / numNeighbors;
Text[] neighbors = (Text[]) value.toArray();
NeighborWritable me = new NeighborWritable(key, new DoubleWritable(weight));
for (int i = 0; i < neighbors.length; i++) {
output.collect(neighbors[i], me);
}
}
}
public static class TTReducer extends MapReduceBase
implements Reducer<Text, NeighborWritable, Text, Text> {
@Override
public void reduce(Text key,
Iterator<NeighborWritable> values,
OutputCollector<Text, Text> output,
Reporter arg3)
throws IOException {
ArrayList<NeighborWritable> neighborList = new ArrayList<NeighborWritable>();
while(values.hasNext()) {
neighborList.add(values.next());
}
NeighborArrayWritable neighbors = new NeighborArrayWritable
(neighborList.toArray(new NeighborWritable[0]));
Text out = new Text(neighbors.toString());
output.collect(key, out);
}
}
@Override
public int run(String[] arg0) throws Exception {
JobConf conf = Util.getMapRedJobConf("testJob",
SequenceFileInputFormat.class,
TTMapper.class,
Text.class,
NeighborWritable.class,
1,
TTReducer.class,
Text.class,
Text.class,
TextOutputFormat.class,
"test/in",
"test/out");
JobClient.runJob(conf);
return 0;
}
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new TestTask(), args);
System.exit(res);
}
}
The auxiliary code is as follows:
TextArrayWritable:
public class TextArrayWritable extends ArrayWritable {
public TextArrayWritable() {
super(Text.class);
}
public TextArrayWritable(Text[] values) {
super(Text.class, values);
}
}
NeighborWritable:
public class NeighborWritable implements Writable {
private Text nodeId;
private DoubleWritable weight;
public NeighborWritable(Text nodeId, DoubleWritable weight) {
this.nodeId = nodeId;
this.weight = weight;
}
public NeighborWritable () { }
public Text getNodeId() {
return nodeId;
}
public DoubleWritable getWeight() {
return weight;
}
public void setNodeId(Text nodeId) {
this.nodeId = nodeId;
}
public void setWeight(DoubleWritable weight) {
this.weight = weight;
}
@Override
public void readFields(DataInput in) throws IOException {
nodeId = new Text();
nodeId.readFields(in);
weight = new DoubleWritable();
weight.readFields(in);
}
@Override
public void write(DataOutput out) throws IOException {
nodeId.write(out);
weight.write(out);
}
public String toString() {
return "NW[nodeId=" + (nodeId != null ? nodeId.toString() : "(null)") +
",weight=" + (weight != null ? weight.toString() : "(null)") + "]";
}
public boolean equals(Object o) {
if (!(o instanceof NeighborWritable)) {
return false;
}
NeighborWritable that = (NeighborWritable)o;
return (nodeId.equals(that.getNodeId()) && (weight.equals(that.getWeight())));
}
}
and the Util class:
public class Util {
public static JobConf getMapRedJobConf(String jobName,
Class<? extends InputFormat> inputFormatClass,
Class<? extends Mapper> mapperClass,
Class<?> mapOutputKeyClass,
Class<?> mapOutputValueClass,
int numReducer,
Class<? extends Reducer> reducerClass,
Class<?> outputKeyClass,
Class<?> outputValueClass,
Class<? extends OutputFormat> outputFormatClass,
String inputDir,
String outputDir) throws IOException {
JobConf conf = new JobConf();
if (jobName != null)
conf.setJobName(jobName);
conf.setInputFormat(inputFormatClass);
conf.setMapperClass(mapperClass);
if (numReducer == 0) {
conf.setNumReduceTasks(0);
conf.setOutputKeyClass(outputKeyClass);
conf.setOutputValueClass(outputValueClass);
conf.setOutputFormat(outputFormatClass);
} else {
// may set actual number of reducers
// conf.setNumReduceTasks(numReducer);
conf.setMapOutputKeyClass(mapOutputKeyClass);
conf.setMapOutputValueClass(mapOutputValueClass);
conf.setReducerClass(reducerClass);
conf.setOutputKeyClass(outputKeyClass);
conf.setOutputValueClass(outputValueClass);
conf.setOutputFormat(outputFormatClass);
}
// delete the existing target output folder
FileSystem fs = FileSystem.get(conf);
fs.delete(new Path(outputDir), true);
// specify input and output DIRECTORIES (not files)
FileInputFormat.addInputPath(conf, new Path(inputDir));
FileOutputFormat.setOutputPath(conf, new Path(outputDir));
return conf;
}
}
My input is the following graph (stored in binary format; shown here as text):
1 2
2 1,3,5
3 2,4
4 3,5
5 2,4
According to the logic of the code the output should be:
1 NWArray[size=1,{NW[nodeId=2,weight=0.3333333333333333],}]
2 NWArray[size=3,{NW[nodeId=5,weight=0.5],NW[nodeId=3,weight=0.5],NW[nodeId=1,weight=1.0],}]
3 NWArray[size=2,{NW[nodeId=2,weight=0.3333333333333333],NW[nodeId=4,weight=0.5],}]
4 NWArray[size=2,{NW[nodeId=5,weight=0.5],NW[nodeId=3,weight=0.5],}]
5 NWArray[size=2,{NW[nodeId=2,weight=0.3333333333333333],NW[nodeId=4,weight=0.5],}]
But the actual output is:
1 NWArray[size=1,{NW[nodeId=2,weight=0.3333333333333333],}]
2 NWArray[size=3,{NW[nodeId=5,weight=0.5],NW[nodeId=5,weight=0.5],NW[nodeId=5,weight=0.5],}]
3 NWArray[size=2,{NW[nodeId=2,weight=0.3333333333333333],NW[nodeId=2,weight=0.3333333333333333],}]
4 NWArray[size=2,{NW[nodeId=5,weight=0.5],NW[nodeId=5,weight=0.5],}]
5 NWArray[size=2,{NW[nodeId=2,weight=0.3333333333333333],NW[nodeId=2,weight=0.3333333333333333],}]
I can't understand why I'm not getting the expected output. Any help would be appreciated.
Thanks.
You're falling foul of object reuse:
while(values.hasNext()) {
neighborList.add(values.next());
}
values.next() returns the same object reference on every call. The framework re-populates that object's contents on each iteration by calling its readFields() method. As a result, every element in neighborList points at the same object, which holds the last value read.
I suggest amending it as follows. ReflectionUtils.copy needs a Configuration conf; with the old API you can override configure(JobConf job) from MapReduceBase and keep the JobConf it receives in a field.
while(values.hasNext()) {
// copy each value: values.next() returns the same, re-populated object on every call
neighborList.add(ReflectionUtils.copy(conf, values.next(), new NeighborWritable()));
}
But I still don't understand why my unit test passed. Here is the code:
public class UWLTInitReducerTest {
private Text key;
private Iterator<NeighborWritable> values;
private NeighborArrayWritable nodeData;
private TTReducer reducer;
/**
* Set up the states for calling the map function
*/
@Before
public void setUp() throws Exception {
key = new Text("1001");
NeighborWritable[] neighbors = new NeighborWritable[4];
for (int i = 0; i < 4; i++) {
neighbors[i] = new NeighborWritable(new Text("300" + i), new DoubleWritable((double) 1 / (1 + i)));
}
values = Arrays.asList(neighbors).iterator();
nodeData = new NeighborArrayWritable(neighbors);
reducer = new TTReducer();
}
/**
* Test method for InitModelMapper#map - valid input
*/
@Test
public void testMapValid() {
// mock the output object
OutputCollector<Text, UWLTNodeData> output = mock(OutputCollector.class);
try {
// call the API
reducer.reduce(key, values, output, null);
// in order (sequential) verification of the calls to output.collect()
verify(output).collect(key, nodeData);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
Why didn't this code catch the bug?
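One likely reason the test passes is that Arrays.asList(neighbors).iterator() returns four distinct NeighborWritable objects. Hadoop's reduce-side iterator, as described in the answer, instead hands back one reused object that it refills with readFields(). A test built from distinct objects never exercises that reuse. Separately, the mock is typed OutputCollector<Text, UWLTNodeData> and verifies a NeighborArrayWritable, while TTReducer collects Text. It is worth checking that the test actually compiles and runs against the reducer shown.

The plain-Java sketch below uses hypothetical ReusedValue and ReusingIterator classes to reproduce both behaviors. A more faithful unit test would feed the reducer an iterator that, like ReusingIterator, returns the same instance every time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical mutable value, standing in for NeighborWritable.
final class ReusedValue {
    String nodeId;

    ReusedValue(String nodeId) {
        this.nodeId = nodeId;
    }
}

// Mimics the reduce-side value iterator: next() refills and returns the SAME object every time.
final class ReusingIterator implements Iterator<ReusedValue> {
    private final Iterator<String> source;
    private final ReusedValue reused = new ReusedValue(null);

    ReusingIterator(List<String> ids) {
        this.source = ids.iterator();
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public ReusedValue next() {
        reused.nodeId = source.next(); // like readFields(): overwrite in place
        return reused;
    }
}

final class ReuseDemo {
    // Buggy: stores references, so every entry aliases the one reused object.
    static List<String> collectReferences(Iterator<ReusedValue> values) {
        List<ReusedValue> list = new ArrayList<>();
        while (values.hasNext()) {
            list.add(values.next());
        }
        return ids(list);
    }

    // Fixed: copies each value before storing it (the role ReflectionUtils.copy plays in the answer).
    static List<String> collectCopies(Iterator<ReusedValue> values) {
        List<ReusedValue> list = new ArrayList<>();
        while (values.hasNext()) {
            list.add(new ReusedValue(values.next().nodeId));
        }
        return ids(list);
    }

    private static List<String> ids(List<ReusedValue> list) {
        List<String> ids = new ArrayList<>();
        for (ReusedValue v : list) {
            ids.add(v.nodeId);
        }
        return ids;
    }
}
```

With distinct objects, the buggy collectReferences gives the right answer. That is the situation the original unit test set up, which is why it could not catch the bug.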