Hadoop MapReduce programming: if condition

I have written the code below, but the if block never matches; it keeps going into the else block.
Please go through it and check whether you can find any discrepancy.
Please help with this.
public class ReduceIncurance extends Reducer<Text, Text, Text, IntWritable> {
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException
{
int sum = 0;
int count = 0;
String[] input = values.toString().split(",");
for (String val : input) {
System.out.println("first:" + val);
if (val.equalsIgnoreCase("Residential")) {
System.out.println(val);
count++;
sum += count;
} else {
System.out.println("into elsee part");
count++;
sum += count;
}
context.write(key, new IntWritable(sum));
}
}
}

Try this
public void reduce(Text key, Iterable<Text> values , Context context) throws IOException, InterruptedException
{
int count=0;
for (Text val : values)
{
if (val.toString().equalsIgnoreCase("Residential"))
{
count ++;
}
else
{
System.out.println("into elsee part");
}
}
context.write(key, new IntWritable(count));
}
This will give you the count of the value 'Residential' under each key.
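For example, with a hypothetical key whose values are Residential, Commercial, Residential, this reducer would emit that key with a count of 2.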
The issue is in this line: String[] input = values.toString().split(",");. An Iterable<Text> cannot be converted to a String[] like this.

For a given key you need to iterate through the values; you don't need to store them in a String[].
Try this
public class ReduceIncurance extends Reducer<Text, Text, Text, IntWritable> {
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException
{
int sum = 0;
int count = 0;
for (Text val : values) {
String[] input = val.toString().split(",");
for (int i = 0; i < input.length; i++) {
if (input[i].equalsIgnoreCase("Residential")) {
System.out.println(val);
count++;
sum += count;
} else {
System.out.println("into elsee part");
count++;
sum += count;
}
}
context.write(key, new IntWritable(sum));
}
}
}
I still don't understand why you are incrementing both sum and count in the if block and the else block.

Related

I want to write a Hadoop MapReduce Join in Java

I'm completely new to the Hadoop framework and I want to write a MapReduce program (HadoopJoin.java) that joins two tables R and S on the x attribute. The structure of the two tables is:
R (tag : char, x : int, y : varchar(30))
and
S (tag : char, x : int, z : varchar(30))
For example we have for R table :
r 10 r-10-0
r 11 r-11-0
r 12 r-12-0
r 21 r-21-0
And for S table :
s 11 s-11-0
s 21 s-41-0
s 21 s-41-1
s 12 s-31-0
s 11 s-31-1
The result should look like :
r 11 r-11-0 s 11 s-11-0
etc.
Can anyone help me please ?
It is difficult to describe a join in MapReduce to someone who is new to the framework, but here I provide a working implementation for your situation, and I definitely recommend reading section 9 of Hadoop: The Definitive Guide, 4th Edition, which describes how to implement joins in MapReduce very well.
First of all, you might consider using higher-level frameworks such as Pig, Hive, or Spark, because they provide join operations as part of their core implementation.
Secondly, there are many ways to implement a join in MapReduce depending on the nature of your data; these include the map-side join and the reduce-side join. In this answer I have implemented a reduce-side join.
Implementation:
First of all, we need a separate mapper for each dataset. Note that in your case the same mapper could be used for both datasets, but in many situations you need different mappers for different datasets, so I have defined two mappers to make this solution more general.
I have used a TextPair key with two attributes: one is the key used to join the data, and the other is a tag that specifies which dataset the record belongs to. If the record belongs to the first dataset the tag is 0; otherwise it is 1.
I have implemented TextPair.FirstComparator to ensure that, for each join key, the record from the first dataset is the first one received by the reducer, and all the records from the second dataset with that key arrive after it. This line of code does the trick for us:
job.setGroupingComparatorClass(TextPair.FirstComparator.class);
So in the reducer, the first record we receive is the record from dataset 1, and after that we receive the records from dataset 2. The only thing left to do is write out those records. Note also that the KeyPartitioner defined inside the reducer class partitions on the first field of TextPair only, so records from both datasets that share a join key are sent to the same reducer.
Mapper for dataset1:
public class JoinDataSet1Mapper
extends Mapper<LongWritable, Text, TextPair, Text> {
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String[] data = value.toString().split(" ");
context.write(new TextPair(data[1], "0"), value);
}
}
Mapper for DataSet2:
public class JoinDataSet2Mapper
extends Mapper<LongWritable, Text, TextPair, Text> {
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String[] data = value.toString().split(" ");
context.write(new TextPair(data[1], "1"), value);
}
}
Reducer:
public class JoinReducer extends Reducer<TextPair, Text, NullWritable, Text> {
public static class KeyPartitioner extends Partitioner<TextPair, Text> {
@Override
public int getPartition(TextPair key, Text value, int numPartitions) {
return (key.getFirst().hashCode() & Integer.MAX_VALUE) % numPartitions;
}
}
@Override
protected void reduce(TextPair key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
Iterator<Text> iter = values.iterator();
Text stationName = new Text(iter.next());
while (iter.hasNext()) {
Text record = iter.next();
Text outValue = new Text(stationName.toString() + "\t" + record.toString());
context.write(NullWritable.get(), outValue);
}
}
}
Custom key:
public class TextPair implements WritableComparable<TextPair> {
private Text first;
private Text second;
public TextPair() {
set(new Text(), new Text());
}
public TextPair(String first, String second) {
set(new Text(first), new Text(second));
}
public TextPair(Text first, Text second) {
set(first, second);
}
public void set(Text first, Text second) {
this.first = first;
this.second = second;
}
public Text getFirst() {
return first;
}
public Text getSecond() {
return second;
}
@Override
public void write(DataOutput out) throws IOException {
first.write(out);
second.write(out);
}
@Override
public void readFields(DataInput in) throws IOException {
first.readFields(in);
second.readFields(in);
}
@Override
public int hashCode() {
return first.hashCode() * 163 + second.hashCode();
}
@Override
public boolean equals(Object o) {
if (o instanceof TextPair) {
TextPair tp = (TextPair) o;
return first.equals(tp.first) && second.equals(tp.second);
}
return false;
}
@Override
public String toString() {
return first + "\t" + second;
}
@Override
public int compareTo(TextPair tp) {
int cmp = first.compareTo(tp.first);
if (cmp != 0) {
return cmp;
}
return second.compareTo(tp.second);
}
public static class FirstComparator extends WritableComparator {
private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();
public FirstComparator() {
super(TextPair.class);
}
@Override
public int compare(byte[] b1, int s1, int l1,
byte[] b2, int s2, int l2) {
try {
int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
return TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
} catch (IOException e) {
throw new IllegalArgumentException(e);
}
}
@Override
public int compare(WritableComparable a, WritableComparable b) {
if (a instanceof TextPair && b instanceof TextPair) {
return ((TextPair) a).first.compareTo(((TextPair) b).first);
}
return super.compare(a, b);
}
}
}
JobDriver:
public class JoinJob extends Configured implements Tool {
@Override
public int run(String[] args) throws Exception {
Job job = Job.getInstance(getConf(), "Join two DataSet");
job.setJarByClass(getClass());
Path ncdcInputPath = new Path(getConf().get("job.input1.path"));
Path stationInputPath = new Path(getConf().get("job.input2.path"));
Path outputPath = new Path(getConf().get("job.output.path"));
MultipleInputs.addInputPath(job, ncdcInputPath,
TextInputFormat.class, JoinDataSet1Mapper.class);
MultipleInputs.addInputPath(job, stationInputPath,
TextInputFormat.class, JoinDataSet2Mapper.class);
FileOutputFormat.setOutputPath(job, outputPath);
job.setPartitionerClass(JoinReducer.KeyPartitioner.class);
job.setGroupingComparatorClass(TextPair.FirstComparator.class);
job.setMapOutputKeyClass(TextPair.class);
job.setReducerClass(JoinReducer.class);
job.setOutputKeyClass(Text.class);
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new JoinJob(), args);
System.exit(exitCode);
}
}
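One way to run this driver (a sketch; the jar name and paths are placeholders) is to pass the three locations as -D properties, which ToolRunner's GenericOptionsParser copies into the Configuration read by run():
hadoop jar join-example.jar JoinJob -D job.input1.path=/data/R -D job.input2.path=/data/S -D job.output.path=/data/out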

Issue when working with ArrayWritables

I'm a beginner in Hadoop and was working with ArrayWritables in Hadoop MapReduce.
This is the mapper code I'm using:
public class Base_Mapper extends Mapper<LongWritable, Text, Text, IntWritable> {
String currLine[] = new String[1000];
Text K = new Text();
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String line = value.toString();
currLine = line.split("");
int count = 0;
for (int i = 0; i < currLine.length; i++) {
String currToken = currLine[i];
count++;
K.set(currToken);
context.write(K, new IntWritable(count));
}
}
}
Reducer :-
public class Base_Reducer extends Reducer<Text, IntWritable,Text, IntArrayWritable> {
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
IntArrayWritable finalArray = new IntArrayWritable();
IntWritable[] arr = new IntWritable[1000];
for (int i = 0; i < 150; i++)
arr[i] = new IntWritable(0);
int redCount = 0;
for (IntWritable val : values) {
int thisValue = val.get();
for (int i = 1; i <= 150; i++) {
if (thisValue == i)
arr[i - 1] = new IntWritable(redCount++);
}
}
finalArray.set(arr);
context.write(key, finalArray);
}
}
I'm using IntArrayWritable, a subclass of ArrayWritable, as shown below:
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;
public class IntArrayWritable extends ArrayWritable {
public IntArrayWritable() {
super(IntWritable.class);
}
public IntArrayWritable(IntWritable[] values) {
super(IntWritable.class, values);
}
}
My intended output of the job was a set of bases as keys (which is correct) and an array of IntWritables as values.
But I'm getting this output:
com.feathersoft.Base.IntArrayWritable@30374534
A com.feathersoft.Base.IntArrayWritable@7ca071a6
C com.feathersoft.Base.IntArrayWritable@9858936
G com.feathersoft.Base.IntArrayWritable@1df33d1c
N com.feathersoft.Base.IntArrayWritable@4c3108a0
T com.feathersoft.Base.IntArrayWritable@272d6774
What changes do I have to make in order to resolve this issue?
You need to override the default behavior of the toString() method in your IntArrayWritable implementation; the default Object.toString() is what produces output like com.feathersoft.Base.IntArrayWritable@30374534, because TextOutputFormat simply calls toString() on the value it writes.
Please try this:
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;
public class IntArrayWritable extends ArrayWritable {
public IntArrayWritable() {
super(IntWritable.class);
}
public IntArrayWritable(IntWritable[] values) {
super(IntWritable.class, values);
}
@Override
public String toString() {
StringBuilder sb = new StringBuilder("[");
for (String s : super.toStrings())
{
sb.append(s).append(" ");
}
sb.append("]");
return sb.toString();
}
}
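With this override in place, a line of output would look something like A [0 1 0 2 ] (hypothetical values) instead of A com.feathersoft.Base.IntArrayWritable@7ca071a6.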

Can I get the partition number in Hadoop?

I am a Hadoop newbie.
I want to get the partition number into the output file.
At first, I made a customized partitioner.
public static class MyPartitioner extends Partitioner<Text, LongWritable> {
public int getPartition(Text key, LongWritable value, int numReduceTasks) {
int numOfChars = key.toString().length();
return numOfChars % numReduceTasks;
}
}
It works. But I want to output the partition numbers 'visually' in the reducer.
How can I get the partition number?
Below is my reducer source.
public static class MyReducer extends Reducer<Text, LongWritable, Text, Text>{
private Text textList = new Text();
public void reduce(Text key, Iterable<LongWritable> values, Context context)
throws IOException, InterruptedException {
String list = new String();
for(LongWritable value: values) {
list = new String(list + "\t" + value.toString());
}
textList.set(list);
context.write(key, textList);
}
}
I want to append the partition number to 'list' for each key. It will be '0' or '1'.
list = new String(list + "\t" + value.toString() + "\t" + ??);
It would be great if someone could help me.
Update:
Thanks to the answer, I tried a solution, but it didn't work and I think I did something wrong.
Below is the modified MyPartitioner.
public static class MyPartitioner extends Partitioner {
public int getPartition(Text key, LongWritable value, int numReduceTasks) {
int numOfChars = key.toString().length();
return numOfChars % numReduceTasks;
private int bring_num = 0;
public void configure(JobConf job) {
bring_num = jobConf.getInt(numOfChars & numReduceTasks);
}
}
}
Add the code below to the Reducer class to get the partition number into a class variable, which can later be used in the reduce method.
String partition;
protected void setup(Context context) throws IOException,
InterruptedException {
Configuration conf = context.getConfiguration();
partition = conf.get("mapred.task.partition");
}
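A minimal sketch of how the reducer from the question could then append that partition number to each output line. This assumes the property read above is available on your Hadoop version; on newer releases the equivalent property is typically mapreduce.task.partition, and the task id from context.getTaskAttemptID().getTaskID().getId() should give the same number.
public static class MyReducer extends Reducer<Text, LongWritable, Text, Text> {
    private Text textList = new Text();
    private int partition = -1;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Partition number of this reduce task, or -1 if the property is missing.
        partition = context.getConfiguration().getInt("mapred.task.partition", -1);
    }

    @Override
    public void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        StringBuilder list = new StringBuilder();
        for (LongWritable value : values) {
            list.append("\t").append(value.get());
        }
        // Append the partition number ('0' or '1' with two reducers) to the line.
        list.append("\t").append(partition);
        textList.set(list.toString());
        context.write(key, textList);
    }
}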

Have two reducers with one map in an M/R program

I have a question: how can I have a MapReduce job with one mapper and two reducers, where both reducers' inputs come from the map output and each reducer has its own output?
And one other thing: can a mapper have two or more inputs?
public static class dpred extends Reducer<Text, DoubleWritable, Text, DoubleWritable>
{
public void reduce1(Text key, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException
{
double beta = 17.62;
DoubleWritable result1 = new DoubleWritable();
double mul = 1;
double res = 1;
for (DoubleWritable val : values){
// System.out.println(val.get());
mul *= val.get();
}
res = beta*mul;
result1.set(res);
context.write(key, result1);
}
///////////////////////////////////////////////////////////
public void reduce2(Text key, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException
{
double landa = 243.12;
double sum = 0;
double res = 0;
DoubleWritable result2 = new DoubleWritable();
for (DoubleWritable val : values){
// System.out.println(val.get());
landa += val.get();
}
// System.out.println(sum);
result2.set(landa);
context.write(key, result2);
}
}
If the operation is this simple, you could consider doing two context.write() calls in one reduce function (it's possible, with MultipleOutputs, to write them to different files if you want).
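A minimal sketch of that suggestion, assuming the new mapreduce API and org.apache.hadoop.mapreduce.lib.output.MultipleOutputs. The named outputs "beta" and "landa" are illustrative names and would also have to be registered in the driver, e.g. MultipleOutputs.addNamedOutput(job, "beta", TextOutputFormat.class, Text.class, DoubleWritable.class); and likewise for "landa".
public static class dpred extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    private MultipleOutputs<Text, DoubleWritable> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<Text, DoubleWritable>(context);
    }

    @Override
    public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        double beta = 17.62;
        double landa = 243.12;
        double mul = 1;
        for (DoubleWritable val : values) {
            mul *= val.get();     // product used by the first computation
            landa += val.get();   // running sum used by the second computation
        }
        // Write both results from the same reduce call, each to its own named output.
        mos.write("beta", key, new DoubleWritable(beta * mul));
        mos.write("landa", key, new DoubleWritable(landa));
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();
    }
}
Each named output then ends up in its own set of files in the job output directory, named something like beta-r-00000 and landa-r-00000.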

Why is Hadoop shuffle not working as expected?

I have this Hadoop MapReduce code that works on graph data (in adjacency-list form) and is similar to an in-adjacency-list to out-adjacency-list transformation. The main MapReduce task code is the following:
public class TestTask extends Configured
implements Tool {
public static class TTMapper extends MapReduceBase
implements Mapper<Text, TextArrayWritable, Text, NeighborWritable> {
@Override
public void map(Text key,
TextArrayWritable value,
OutputCollector<Text, NeighborWritable> output,
Reporter reporter) throws IOException {
int numNeighbors = value.get().length;
double weight = (double)1 / numNeighbors;
Text[] neighbors = (Text[]) value.toArray();
NeighborWritable me = new NeighborWritable(key, new DoubleWritable(weight));
for (int i = 0; i < neighbors.length; i++) {
output.collect(neighbors[i], me);
}
}
}
public static class TTReducer extends MapReduceBase
implements Reducer<Text, NeighborWritable, Text, Text> {
@Override
public void reduce(Text key,
Iterator<NeighborWritable> values,
OutputCollector<Text, Text> output,
Reporter arg3)
throws IOException {
ArrayList<NeighborWritable> neighborList = new ArrayList<NeighborWritable>();
while(values.hasNext()) {
neighborList.add(values.next());
}
NeighborArrayWritable neighbors = new NeighborArrayWritable
(neighborList.toArray(new NeighborWritable[0]));
Text out = new Text(neighbors.toString());
output.collect(key, out);
}
}
@Override
public int run(String[] arg0) throws Exception {
JobConf conf = Util.getMapRedJobConf("testJob",
SequenceFileInputFormat.class,
TTMapper.class,
Text.class,
NeighborWritable.class,
1,
TTReducer.class,
Text.class,
Text.class,
TextOutputFormat.class,
"test/in",
"test/out");
JobClient.runJob(conf);
return 0;
}
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new TestTask(), args);
System.exit(res);
}
}
The auxiliary code is the following:
TextArrayWritable:
public class TextArrayWritable extends ArrayWritable {
public TextArrayWritable() {
super(Text.class);
}
public TextArrayWritable(Text[] values) {
super(Text.class, values);
}
}
NeighborWritable:
public class NeighborWritable implements Writable {
private Text nodeId;
private DoubleWritable weight;
public NeighborWritable(Text nodeId, DoubleWritable weight) {
this.nodeId = nodeId;
this.weight = weight;
}
public NeighborWritable () { }
public Text getNodeId() {
return nodeId;
}
public DoubleWritable getWeight() {
return weight;
}
public void setNodeId(Text nodeId) {
this.nodeId = nodeId;
}
public void setWeight(DoubleWritable weight) {
this.weight = weight;
}
@Override
public void readFields(DataInput in) throws IOException {
nodeId = new Text();
nodeId.readFields(in);
weight = new DoubleWritable();
weight.readFields(in);
}
@Override
public void write(DataOutput out) throws IOException {
nodeId.write(out);
weight.write(out);
}
public String toString() {
return "NW[nodeId=" + (nodeId != null ? nodeId.toString() : "(null)") +
",weight=" + (weight != null ? weight.toString() : "(null)") + "]";
}
public boolean equals(Object o) {
if (!(o instanceof NeighborWritable)) {
return false;
}
NeighborWritable that = (NeighborWritable)o;
return (nodeId.equals(that.getNodeId()) && (weight.equals(that.getWeight())));
}
}
and the Util class:
public class Util {
public static JobConf getMapRedJobConf(String jobName,
Class<? extends InputFormat> inputFormatClass,
Class<? extends Mapper> mapperClass,
Class<?> mapOutputKeyClass,
Class<?> mapOutputValueClass,
int numReducer,
Class<? extends Reducer> reducerClass,
Class<?> outputKeyClass,
Class<?> outputValueClass,
Class<? extends OutputFormat> outputFormatClass,
String inputDir,
String outputDir) throws IOException {
JobConf conf = new JobConf();
if (jobName != null)
conf.setJobName(jobName);
conf.setInputFormat(inputFormatClass);
conf.setMapperClass(mapperClass);
if (numReducer == 0) {
conf.setNumReduceTasks(0);
conf.setOutputKeyClass(outputKeyClass);
conf.setOutputValueClass(outputValueClass);
conf.setOutputFormat(outputFormatClass);
} else {
// may set actual number of reducers
// conf.setNumReduceTasks(numReducer);
conf.setMapOutputKeyClass(mapOutputKeyClass);
conf.setMapOutputValueClass(mapOutputValueClass);
conf.setReducerClass(reducerClass);
conf.setOutputKeyClass(outputKeyClass);
conf.setOutputValueClass(outputValueClass);
conf.setOutputFormat(outputFormatClass);
}
// delete the existing target output folder
FileSystem fs = FileSystem.get(conf);
fs.delete(new Path(outputDir), true);
// specify input and output DIRECTORIES (not files)
FileInputFormat.addInputPath(conf, new Path(inputDir));
FileOutputFormat.setOutputPath(conf, new Path(outputDir));
return conf;
}
}
My input is the following graph (the actual input is in binary format; here I am giving the text form):
1 2
2 1,3,5
3 2,4
4 3,5
5 2,4
According to the logic of the code the output should be:
1 NWArray[size=1,{NW[nodeId=2,weight=0.3333333333333333],}]
2 NWArray[size=3,{NW[nodeId=5,weight=0.5],NW[nodeId=3,weight=0.5],NW[nodeId=1,weight=1.0],}]
3 NWArray[size=2,{NW[nodeId=2,weight=0.3333333333333333],NW[nodeId=4,weight=0.5],}]
4 NWArray[size=2,{NW[nodeId=5,weight=0.5],NW[nodeId=3,weight=0.5],}]
5 NWArray[size=2,{NW[nodeId=2,weight=0.3333333333333333],NW[nodeId=4,weight=0.5],}]
But the output is coming as:
1 NWArray[size=1,{NW[nodeId=2,weight=0.3333333333333333],}]
2 NWArray[size=3,{NW[nodeId=5,weight=0.5],NW[nodeId=5,weight=0.5],NW[nodeId=5,weight=0.5],}]
3 NWArray[size=2,{NW[nodeId=2,weight=0.3333333333333333],NW[nodeId=2,weight=0.3333333333333333],}]
4 NWArray[size=2,{NW[nodeId=5,weight=0.5],NW[nodeId=5,weight=0.5],}]
5 NWArray[size=2,{NW[nodeId=2,weight=0.3333333333333333],NW[nodeId=2,weight=0.3333333333333333],}]
I cannot understand the reason why the expected output is not coming out. Any help will be appreciated.
Thanks.
You're falling foul of object re-use
while(values.hasNext()) {
neighborList.add(values.next());
}
values.next() will return the same object reference, but the underlying contents of that object will change for each iteration (the readFields method is called to re-populate the contents)
I suggest you amend it to the following (you'll need to obtain the Configuration conf variable from a setup method, unless you can obtain it from the Reporter or OutputCollector; sorry, I don't use the old API):
while (values.hasNext()) {
    neighborList.add(
        ReflectionUtils.copy(conf, values.next(), new NeighborWritable()));
}
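The same copy can also be made with org.apache.hadoop.io.WritableUtils.clone, which instantiates and fills the new object for you (a minimal sketch, under the same assumption that a Configuration conf is available as above):
while (values.hasNext()) {
    // Clone the current value: the iterator reuses one object, so each stored
    // element needs its own copy of the contents.
    neighborList.add(WritableUtils.clone(values.next(), conf));
}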
But then I still can't understand why my unit test passed. Here is the code:
public class UWLTInitReducerTest {
private Text key;
private Iterator<NeighborWritable> values;
private NeighborArrayWritable nodeData;
private TTReducer reducer;
/**
* Set up the states for calling the map function
*/
@Before
public void setUp() throws Exception {
key = new Text("1001");
NeighborWritable[] neighbors = new NeighborWritable[4];
for (int i = 0; i < 4; i++) {
neighbors[i] = new NeighborWritable(new Text("300" + i), new DoubleWritable((double) 1 / (1 + i)));
}
values = Arrays.asList(neighbors).iterator();
nodeData = new NeighborArrayWritable(neighbors);
reducer = new TTReducer();
}
/**
* Test method for InitModelMapper#map - valid input
*/
@Test
public void testMapValid() {
// mock the output object
OutputCollector<Text, UWLTNodeData> output = mock(OutputCollector.class);
try {
// call the API
reducer.reduce(key, values, output, null);
// in order (sequential) verification of the calls to output.collect()
verify(output).collect(key, nodeData);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
Why didn't this code catch the bug?