Unable to group data in Reducer - hadoop

I am trying to write a MapReduce application in which the Mapper passes a set of values to the Reducer as follows:
Hello
World
Hello
Hello
World
Hi
Now these values are to be grouped and counted first and then some further processing is to be done. The code I wrote is:
public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    List<String> records = new ArrayList<String>();
    /* Collects all the records from the mapper into the list. */
    for (Text value : values) {
        records.add(value.toString());
    }
    /* Groups the values. */
    Map<String, Integer> groupedData = groupAndCount(records);
    Set<String> groupKeys = groupedData.keySet();
    /* Writes the grouped data. */
    for (String groupKey : groupKeys) {
        System.out.println(groupKey + ": " + groupedData.get(groupKey));
        context.write(NullWritable.get(), new Text(groupKey + groupedData.get(groupKey)));
    }
}
public Map<String, Integer> groupAndCount(List<String> records) {
    Map<String, Integer> groupedData = new HashMap<String, Integer>();
    String currentRecord = "";
    Collections.sort(records);
    for (String record : records) {
        System.out.println(record);
        if (!currentRecord.equals(record)) {
            currentRecord = record;
            groupedData.put(currentRecord, 1);
        } else {
            int currentCount = groupedData.get(currentRecord);
            groupedData.put(currentRecord, ++currentCount);
        }
    }
    return groupedData;
}
But in the output I get a count of 1 for all. The sysout statements are printed something like:
Hello
World
Hello: 1
World: 1
Hello
Hello: 1
Hello
World
Hello: 1
World: 1
Hi
Hi: 1
I cannot understand what the issue is, and why all the records are not received by the Reducer at once and passed to the groupAndCount method.

As you note in your comment, if each value has a different corresponding key then they will not be reduced in the same reduce call, and you'll get the output you're currently seeing.
Fundamental to Hadoop reducers is the notion that values will be collected and reduced for the same key; I suggest you re-read some of the Hadoop getting-started documentation, especially the Word Count example, which appears to be roughly what you are trying to achieve with your code.
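For illustration, a minimal word-count style pair of map and reduce methods emits each word as the key, so the framework groups all of its 1s into a single reduce call (class and field names here are just placeholders; imports from org.apache.hadoop.io and org.apache.hadoop.mapreduce omitted):

public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit (word, 1) so that the word itself becomes the reduce key.
        for (String token : value.toString().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            word.set(token);
            context.write(word, ONE);
        }
    }
}

public static class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // All values for one word arrive in this single call, so the grouping
        // is already done by the framework; just sum the counts.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}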

Related

Java 8 stream reduce Map

I have a LinkedHashMap which contains multiple entries. I'd like to reduce the multiple entries to a single one in the first step, and then map that to a single String.
For example:
I'm starting with a Map like this:
{"<a>"="</a>", "<b>"="</b>", "<c>"="</c>", "<d>"="</d>"}
And finally I want to get a String like this:
<a><b><c><d></d></c></b></a>
(In that case the String contains the keys in order, then the values in reverse order. But that doesn't really matter; I'd like a general solution.)
I think I need map.entrySet().stream().reduce(), but I have no idea what to write in the reduce method, and how to continue.
Since you're reducing entries by concatenating keys with keys and values with values, the identity you're looking for is an entry with empty strings for both key and value.
// uses java.util.Map.Entry and java.util.AbstractMap.SimpleImmutableEntry
String reduceEntries(LinkedHashMap<String, String> map) {
    Entry<String, String> entry =
        map.entrySet()
           .stream()
           .reduce(
               new SimpleImmutableEntry<>("", ""),
               (left, right) ->
                   new SimpleImmutableEntry<>(
                       left.getKey() + right.getKey(),
                       right.getValue() + left.getValue()
                   )
           );
    return entry.getKey() + entry.getValue();
}
Java 9 adds a static method Map.entry(key, value) for creating immutable entries.
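For completeness, a sketch of the same reduction on Java 9+ using that factory method instead of SimpleImmutableEntry:

String reduceEntries(LinkedHashMap<String, String> map) {
    Map.Entry<String, String> entry =
        map.entrySet()
           .stream()
           .reduce(
               Map.entry("", ""),                      // Java 9+ immutable entry as identity
               (left, right) -> Map.entry(
                   left.getKey() + right.getKey(),     // keys in iteration order
                   right.getValue() + left.getValue()  // values in reverse order
               )
           );
    return entry.getKey() + entry.getValue();
}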
Here is an example of how I would do it:
import java.util.LinkedHashMap;

public class Main {
    static String result = "";

    public static void main(String[] args) {
        LinkedHashMap<String, String> map = new LinkedHashMap<String, String>();
        map.put("<a>", "</a>");
        map.put("<b>", "</b>");
        map.put("<c>", "</c>");
        map.put("<d>", "</d>");
        map.keySet().forEach(s -> result += s);
        map.values().forEach(s -> result += s);
        System.out.println(result);
    }
}
Note: to get </d> first you can reverse the values, e.g. with ArrayUtils.reverse() on an array of them.
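A plain-JDK alternative (no Commons Lang), assuming the same map as above, is to copy the values into a list and reverse it:

import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;

public class ReversedValues {
    public static void main(String[] args) {
        LinkedHashMap<String, String> map = new LinkedHashMap<String, String>();
        map.put("<a>", "</a>");
        map.put("<b>", "</b>");
        map.put("<c>", "</c>");
        map.put("<d>", "</d>");

        StringBuilder result = new StringBuilder();
        map.keySet().forEach(result::append);        // <a><b><c><d>

        List<String> values = new ArrayList<String>(map.values());
        Collections.reverse(values);                 // </d></c></b></a>
        values.forEach(result::append);

        System.out.println(result);                  // <a><b><c><d></d></c></b></a>
    }
}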

MapReduce sort by value in descending order

I'm trying to write in pseudo code a MapReduce task that returns the items sorted in descending order. For example: for the wordcount task, instead of getting:
apple 1
banana 3
mango 2
I want the output to be:
banana 3
mango 2
apple 1
Any ideas how to do it? I know how to do it in ascending order (swap the key and value in the mapper) but not in descending order.
You can use the reducer code below to achieve sorting in descending order.
Assuming you have written the mapper and driver code, where the mapper produces output such as (banana, 1), etc.
In the reducer we sum all values for a particular key and put the result in a map, then sort that map by value and write the final output in the reducer's cleanup method.
Please see the code below:
import java.io.IOException;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Word_Reducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Change access modifier as per your need
    public Map<String, Integer> map = new LinkedHashMap<String, Integer>();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) {
        // Sum the values for this key and store the total in the map.
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        map.put(key.toString(), count);
    }

    @Override
    public void cleanup(Context context) throws IOException, InterruptedException {
        // cleanup is called once at the end of the reduce task;
        // here we sort the accumulated counts and write the final output.
        Map<String, Integer> sortedMap = sortMap(map);
        for (Map.Entry<String, Integer> entry : sortedMap.entrySet()) {
            context.write(new Text(entry.getKey()), new IntWritable(entry.getValue()));
        }
    }

    public Map<String, Integer> sortMap(Map<String, Integer> unsortMap) {
        Map<String, Integer> sortedMap = new LinkedHashMap<String, Integer>();
        int count = 0;
        List<Map.Entry<String, Integer>> list =
            new LinkedList<Map.Entry<String, Integer>>(unsortMap.entrySet());
        // Sort the list built from the unsorted map, in descending order of value.
        Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
                return o2.getValue().compareTo(o1.getValue());
            }
        });
        for (Map.Entry<String, Integer> entry : list) {
            // only keep the top 3 entries in the sorted map
            if (count > 2) {
                break;
            }
            sortedMap.put(entry.getKey(), entry.getValue());
            count++;
        }
        return sortedMap;
    }
}
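Note that this yields a globally sorted result only if all keys end up in a single reducer; with more than one reducer each task sorts just its own keys. A minimal driver sketch (Hadoop 2.x Job API; Word_Mapper and WordCountDriver are placeholder names, and the usual imports from org.apache.hadoop.conf, org.apache.hadoop.fs and org.apache.hadoop.mapreduce are omitted) that forces a single reducer could look like this inside main:

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count, sorted by value");
job.setJarByClass(WordCountDriver.class);     // placeholder driver class
job.setMapperClass(Word_Mapper.class);        // assumed mapper emitting (word, 1)
job.setReducerClass(Word_Reducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setNumReduceTasks(1);                     // single reducer => one globally sorted output file
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);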

Volley: param.put with two for loop not working correctly

I have two ArrayLists, like below:
final ArrayList<String> numbers = new ArrayList<String>();
final ArrayList<String> names = new ArrayList<String>();
I added values into the ArrayLists as below:
numbers.add(m, "" + phoneNumber);
names.add(m, name);
// m is an index starting from 0; I used a while loop for that
numbers contains 949 rows and names also contains 949 rows.
I am sending the request using StringRequest.
Following is my sending code; the request method is POST:
@Override
protected Map<String, String> getParams() throws AuthFailureError {
    Map<String, String> param = new HashMap<String, String>();
    for (int z = 0; z < names.size(); z++) {
        param.put("names[" + z + "]", names.get(z).toString());
    }
    for (int z1 = 0; z1 < numbers.size(); z1++) {
        param.put("numbers[" + z1 + "]", numbers.get(z1).toString());
    }
    return param;
}
From the PHP server, when I return the total row counts of numbers and names, for example:
echo count($numbers).":".count($names);
the response comes as
508:493
but when i use only one for loop like below and return param then response count is 949 as i expected,
for (int z = 0; z < names.size(); z++) {
    param.put("names[" + z + "]", names.get(z).toString());
}
But when I use both for loops at the same time, the data is again not sent correctly.
What is the problem here?

Merging two SortedMapWritable in Hadoop?

I have defined a class called EquivalenceClsAggValue which has an ArrayList data field (called aggValues).
public class EquivalenceClsAggValue extends Configured implements WritableComparable<EquivalenceClsAggValue> {
    public ArrayList<SortedMapWritable> aggValues;
It has a method which takes another object of type EquivalenceClsAggValue and merges its aggValues into this object's aggValues, as follows:
public void addEquivalenceCls(EquivalenceClsAggValue eq) {
    // eq contains only one entry, as it comes from the mapper
    if (this.aggValues.size() == 0) { // new line
        this.aggValues = eq.aggValues;
        return;
    }
    for (int i = 0; i < eq.aggValues.size(); i++) {
        SortedMapWritable cm = aggValues.get(i);    // cm: current map
        SortedMapWritable nm = eq.aggValues.get(i); // nm: new map
        Text nk = (Text) nm.firstKey();             // nk: new key
        if (cm.containsKey(nk)) { // increment the value
            IntWritable ovTmp = (IntWritable) cm.get(nk);
            int ov = ovTmp.get();
            cm.remove(nk);
            cm.put(nk, new IntWritable(ov + 1));
        } else { // add new entry
            cm.put(nk, new IntWritable(1));
        }
    }
}
But this function is not merging two aggValues. Could someone help me figure it out?
This is how I call this method:
public void reduce(IntWritable keyin, Iterator<EquivalenceClsAggValue> valuein,
        OutputCollector<IntWritable, EquivalenceClsAggValue> output, Reporter arg3) throws IOException {
    EquivalenceClsAggValue comOutput = valuein.next(); // initialize the output with the first input
    while (valuein.hasNext()) {
        EquivalenceClsAggValue e = valuein.next();
        comOutput.addEquivalenceCls(e);
    }
    output.collect(keyin, comOutput);
}
Looks like you're falling foul of object re-use. Hadoop re-uses the same object so each call to valuein.next() actually returns the same object reference, but the contents of that object are re-initialised via the readFields method.
Try changing as follows (create a new instance to aggregate into):
EquivalenceClsAggValue comOutput = new EquivalenceClsAggValue();
while (valuein.hasNext()) {
    EquivalenceClsAggValue e = valuein.next();
    comOutput.addEquivalenceCls(e);
}
output.collect(keyin, comOutput);
EDIT: and you probably need to update your aggregate method too (to be wary of object re-use):
public void addEquivalenceCls(EquivalenceClsAggValue eq) {
    // eq contains only one entry, as it comes from the mapper
    for (int i = 0; i < eq.aggValues.size(); i++) {
        SortedMapWritable cm = aggValues.get(i);    // cm: current map
        SortedMapWritable nm = eq.aggValues.get(i); // nm: new map
        Text nk = (Text) nm.firstKey();             // nk: new key
        if (cm.containsKey(nk)) { // increment the value
            // you don't need to remove and re-add, just update the IntWritable
            IntWritable ovTmp = (IntWritable) cm.get(nk);
            ovTmp.set(ovTmp.get() + 1);
        } else { // add new entry
            // be sure to create a copy of nk when you add it into the map
            cm.put(new Text(nk), new IntWritable(1));
        }
    }
}
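One more caveat about the "first value" shortcut in the original method: this.aggValues = eq.aggValues keeps a reference into the re-used object, so its contents will be overwritten on the next call to valuein.next(). A sketch of a safer variant, assuming the keys are Text and the values IntWritable as in the code above:

if (this.aggValues.isEmpty()) {
    for (SortedMapWritable nm : eq.aggValues) {
        SortedMapWritable copy = new SortedMapWritable();
        for (Map.Entry<WritableComparable, Writable> e : nm.entrySet()) {
            // copy key and value so we don't hold references into the re-used object
            copy.put(new Text(e.getKey().toString()),
                     new IntWritable(((IntWritable) e.getValue()).get()));
        }
        this.aggValues.add(copy);
    }
    return;
}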

Hadoop seems to modify my key object during an iteration over values of a given reduce call

Hadoop Version: 0.20.2 (On Amazon EMR)
Problem: I have a custom key that I write during the map phase, which I have added below. During the reduce call, I do some simple aggregation on the values for a given key. The issue I am facing is that during the iteration of values in the reduce call, my key changes and I get the values of that new key.
My key type:
class MyKey implements WritableComparable<MyKey>, Serializable {
    private MyEnum type; // MyEnum is a simple enumeration.
    private TreeMap<String, String> subKeys;

    MyKey() { subKeys = new TreeMap<String, String>(); } // for Hadoop

    public MyKey(MyEnum t, Map<String, String> sK) {
        type = t;
        subKeys = new TreeMap<String, String>(sK);
    }

    public void readFields(DataInput in) throws IOException {
        Text typeT = new Text();
        typeT.readFields(in);
        this.type = MyEnum.valueOf(typeT.toString());
        subKeys.clear();
        int i = WritableUtils.readVInt(in);
        while (0 != i--) {
            Text keyText = new Text();
            keyText.readFields(in);
            Text valueText = new Text();
            valueText.readFields(in);
            subKeys.put(keyText.toString(), valueText.toString());
        }
    }

    public void write(DataOutput out) throws IOException {
        new Text(type.name()).write(out);
        WritableUtils.writeVInt(out, subKeys.size());
        for (Entry<String, String> each : subKeys.entrySet()) {
            new Text(each.getKey()).write(out);
            new Text(each.getValue()).write(out);
        }
    }

    public int compareTo(MyKey o) {
        if (o == null) {
            return 1;
        }
        int typeComparison = this.type.compareTo(o.type);
        if (typeComparison == 0) {
            if (this.subKeys.equals(o.subKeys)) {
                return 0;
            }
            int x = this.subKeys.hashCode() - o.subKeys.hashCode();
            return (x != 0 ? x : -1);
        }
        return typeComparison;
    }
}
Is there anything wrong with this implementation of key? Following is the code where I am facing the mixup of keys in reduce call:
public void reduce(MyKey k, Iterable<MyValue> values, Context context) {
    Iterator<MyValue> iterator = values.iterator();
    int sum = 0;
    while (iterator.hasNext()) {
        MyValue value = iterator.next();
        // When I get here in the 2nd iteration, if I print k,
        // it is different from what it was in iteration 1.
        sum += value.getResult();
    }
    // write sum to context
}
Any help in this would be greatly appreciated.
This is expected behavior (with the new API at least).
When the next method on the underlying iterator of the values Iterable is called, the next key/value pair is read from the sorted mapper/combiner output, and it is checked whether that key is still part of the same group as the previous key.
Because Hadoop re-uses the objects passed to the reduce method (just calling the readFields method of the same object), the underlying contents of the key parameter 'k' will change with each iteration of the values Iterable.
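If you need the key's contents after advancing the values iterator, take a copy of it first. A sketch using org.apache.hadoop.io.WritableUtils.clone (the output key/value types below are only assumptions):

public void reduce(MyKey k, Iterable<MyValue> values, Context context)
        throws IOException, InterruptedException {
    // Deep-copy the key before touching the iterator; 'k' itself will be
    // overwritten in place as the framework reads the next key/value pair.
    MyKey frozenKey = WritableUtils.clone(k, context.getConfiguration());

    int sum = 0;
    for (MyValue value : values) {
        sum += value.getResult();
    }
    context.write(frozenKey, new IntWritable(sum)); // output types are an assumption
}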
