Merging two SortedMapWritable in Hadoop? - hadoop

I have defined a class called EquivalenceClsAggValue, which has a field called aggValues (an ArrayList of SortedMapWritable):
public class EquivalenceClsAggValue extends Configured implements WritableComparable<EquivalenceClsAggValue>{
public ArrayList<SortedMapWritable> aggValues;
It has a method that takes another EquivalenceClsAggValue object and merges its aggValues into this object's aggValues, as follows:
public void addEquivalenceCls(EquivalenceClsAggValue eq){
//comment: eq contains only one entry as it comes from the mapper
if (this.aggValues.size()==0){ //new line
this.aggValues = eq.aggValues;
return;
}
for(int i=0;i<eq.aggValues.size();i++){
SortedMapWritable cm = aggValues.get(i); //cm: current map
SortedMapWritable nm = eq.aggValues.get(i); //nm: new map
Text nk = (Text) nm.firstKey();//nk: new key
if(cm.containsKey(nk)){//increment the value
IntWritable ovTmp = (IntWritable) cm.get(nk);
int ov = ovTmp.get();
cm.remove(nk);
cm.put(nk, new IntWritable(ov+1));
}
else{//add new entry
cm.put(nk, new IntWritable(1));
}
}
}
But this method does not merge the two aggValues lists. Could someone help me figure out why?
This is how I call this method:
public void reduce(IntWritable keyin,Iterator<EquivalenceClsAggValue> valuein,OutputCollector<IntWritable, EquivalenceClsAggValue> output,Reporter arg3) throws IOException {
EquivalenceClsAggValue comOutput = valuein.next();//initialize the output with the first input
while(valuein.hasNext()){
EquivalenceClsAggValue e = valuein.next();
comOutput.addEquivalenceCls(e);
}
output.collect(keyin, comOutput);
}

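Looks like you're falling foul of object re-use. Hadoop re-uses the same object, so each call to valuein.next() returns the same object reference; only the contents of that object are re-initialised via its readFields method. In your reduce, comOutput and e are therefore the same instance, so addEquivalenceCls ends up merging the object with itself, and the contents of the earlier records are lost.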
Try changing as follows (create a new instance to aggregate into):
EquivalenceClsAggValue comOutput = new EquivalenceClsAggValue();
while(valuein.hasNext()){
EquivalenceClsAggValue e = valuein.next();
comOutput.addEquivalenceCls(e);
}
output.collect(keyin, comOutput);
EDIT: and you probably need to update your aggregate method too (to be wary of object re-use):
public void addEquivalenceCls(EquivalenceClsAggValue eq){
    //comment: eq contains only one entry as it comes from the mapper
    for(int i=0;i<eq.aggValues.size();i++){
        // comOutput now starts out empty, so create the i-th map the first time it is needed
        // (assumes the no-arg constructor initialises aggValues to an empty ArrayList)
        if (i >= aggValues.size()) {
            aggValues.add(new SortedMapWritable());
        }
        SortedMapWritable cm = aggValues.get(i); //cm: current map
        SortedMapWritable nm = eq.aggValues.get(i); //nm: new map
        Text nk = (Text) nm.firstKey();//nk: new key
        if(cm.containsKey(nk)){//increment the value
            // you don't need to remove and re-add, just update the IntWritable in place
            IntWritable ovTmp = (IntWritable) cm.get(nk);
            ovTmp.set(ovTmp.get() + 1);
        }
        else{//add new entry
            // be sure to copy nk when you add it to the map: nk belongs to the re-used eq object
            cm.put(new Text(nk), new IntWritable(1));
        }
    }
}
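To see the re-use effect without a cluster, here is a minimal plain-Java simulation (no Hadoop classes involved). MutableValue and ReusingIterator are hypothetical stand-ins: the iterator hands back one shared object and refills it on every next(), roughly the way Hadoop calls readFields on a re-used value. Keeping the returned references leaves every slot pointing at the last record, while copying each value first preserves them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReuseDemo {
    // Hypothetical stand-in for a Writable value: a mutable object refilled in place.
    static class MutableValue {
        String text;

        // Mimics readFields(): overwrites this object's state with the next record.
        void readFrom(String record) {
            this.text = record;
        }

        MutableValue copy() {
            MutableValue c = new MutableValue();
            c.text = this.text;
            return c;
        }
    }

    // Hypothetical iterator that returns the SAME object from every next() call,
    // like the values iterator handed to a Hadoop reducer.
    static class ReusingIterator implements Iterator<MutableValue> {
        private final Iterator<String> records;
        private final MutableValue shared = new MutableValue();

        ReusingIterator(List<String> records) {
            this.records = records.iterator();
        }

        @Override
        public boolean hasNext() {
            return records.hasNext();
        }

        @Override
        public MutableValue next() {
            shared.readFrom(records.next());
            return shared;
        }
    }

    // Keeps the returned references: every element is the one shared object.
    static List<String> collectReferences(List<String> input) {
        return collect(input, false);
    }

    // Copies each value before keeping it, so earlier records survive.
    static List<String> collectCopies(List<String> input) {
        return collect(input, true);
    }

    private static List<String> collect(List<String> input, boolean copy) {
        List<MutableValue> kept = new ArrayList<>();
        ReusingIterator it = new ReusingIterator(input);
        while (it.hasNext()) {
            MutableValue v = it.next();
            kept.add(copy ? v.copy() : v);
        }
        List<String> texts = new ArrayList<>();
        for (MutableValue v : kept) {
            texts.add(v.text);
        }
        return texts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c");
        System.out.println(collectReferences(input)); // [c, c, c]
        System.out.println(collectCopies(input));     // [a, b, c]
    }
}
```

This is why the fixed reduce aggregates into a fresh instance and why the merge method copies nk with new Text(nk) instead of storing the re-used key.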

Related

Where is TextFormFieldBuilder?

The code below, which is the last example on this page (https://kb.itextpdf.com/home/it7kb/examples/creating-form-fields), uses a class called TextFormFieldBuilder. This class doesn't seem to exist in the API, though (at least not for C#). I just downloaded the latest NuGet package, and since the link contains "it7kb" I assume this documentation is for iText 7.
What am I missing? What do I need to do to make the example work?
namespace iText.Samples.Sandbox.Events
{
public class GenericFields
{
public static readonly String DEST = "results/sandbox/events/generic_fields.pdf";
public static void Main(String[] args)
{
FileInfo file = new FileInfo(DEST);
file.Directory.Create();
new GenericFields().ManipulatePdf(DEST);
}
protected void ManipulatePdf(String dest)
{
PdfDocument pdfDoc = new PdfDocument(new PdfWriter(dest));
Document doc = new Document(pdfDoc);
Paragraph p = new Paragraph();
p.Add("The Effective Date is ");
Text day = new Text(" ");
day.SetNextRenderer(new FieldTextRenderer(day, "day"));
p.Add(day);
p.Add(" day of ");
Text month = new Text(" ");
month.SetNextRenderer(new FieldTextRenderer(month, "month"));
p.Add(month);
p.Add(", ");
Text year = new Text(" ");
year.SetNextRenderer(new FieldTextRenderer(year, "year"));
p.Add(year);
p.Add(" that this will begin.");
doc.Add(p);
doc.Close();
}
private class FieldTextRenderer : TextRenderer
{
protected String fieldName;
public FieldTextRenderer(Text textElement, String fieldName) : base(textElement)
{
this.fieldName = fieldName;
}
// If the renderer overflows into the next area, iText uses the GetNextRenderer() method to create a renderer for the overflow part.
// If GetNextRenderer() isn't overridden, the default method will be used, and thus a default rather than a custom
// renderer will be created
public override IRenderer GetNextRenderer()
{
return new FieldTextRenderer((Text) modelElement, fieldName);
}
public override void Draw(DrawContext drawContext)
{
PdfTextFormField field = new TextFormFieldBuilder(drawContext.GetDocument(), fieldName)
.SetWidgetRectangle(GetOccupiedAreaBBox()).CreateText();
PdfAcroForm.GetAcroForm(drawContext.GetDocument(), true)
.AddField(field);
}
}
}
}
EDIT: I tried the following line instead, as it seems to be equivalent logic, but when I run it I get a NullReferenceException. Specifically, the exception is thrown by the .AddField(field) call on the last line of the Draw method. On inspection nothing on that line is null, so the error must come from inside that method, and I can't tell what the issue is.
PdfTextFormField field = PdfTextFormField.CreateText(drawContext.GetDocument(), GetOccupiedAreaBBox());

Java 8 stream reduce Map

I have a LinkedHashMap which contains multiple entries. I'd like to reduce the entries to a single one in a first step, and then map that to a single String.
For example:
I'm starting with a Map like this:
{"<a>"="</a>", "<b>"="</b>", "<c>"="</c>", "<d>"="</d>"}
And finally I want to get a String like this:
<a><b><c><d></d></c></b></a>
(In this case the String contains the keys in order, then the values in reverse order. But that doesn't really matter; I'd like a general solution.)
I think I need map.entrySet().stream().reduce(), but I have no idea what to write in the reduce method, and how to continue.
Since you're reducing entries by concatenating keys with keys and values with values, the identity you're looking for is an entry with empty strings for both key and value.
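import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.LinkedHashMap;
import java.util.Map.Entry;

String reduceEntries(LinkedHashMap<String, String> map) {
    Entry<String, String> entry =
        map.entrySet()
            .stream()
            .reduce(
                new SimpleImmutableEntry<>("", ""),
                // keys are appended in order, values are prepended (reverse order)
                (left, right) ->
                    new SimpleImmutableEntry<>(
                        left.getKey() + right.getKey(),
                        right.getValue() + left.getValue()
                    )
            );
    return entry.getKey() + entry.getValue();
}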
Java 9 adds a static method Map.entry(key, value) for creating immutable entries.
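For comparison, the same reduction written with Map.entry, as a sketch assuming Java 9 or later. Unlike SimpleImmutableEntry, Map.entry rejects null keys and values, so this variant assumes the map contains no nulls. The class name ReduceEntries is only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReduceEntries {
    // Identity is an entry with two empty strings; each step appends the key
    // and prepends the value, so the values come out in reverse order.
    static String reduceEntries(LinkedHashMap<String, String> map) {
        Map.Entry<String, String> entry = map.entrySet()
                .stream()
                .reduce(
                        Map.entry("", ""),
                        (left, right) -> Map.entry(
                                left.getKey() + right.getKey(),
                                right.getValue() + left.getValue()));
        return entry.getKey() + entry.getValue();
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> map = new LinkedHashMap<>();
        map.put("<a>", "</a>");
        map.put("<b>", "</b>");
        map.put("<c>", "</c>");
        map.put("<d>", "</d>");
        System.out.println(reduceEntries(map)); // <a><b><c><d></d></c></b></a>
    }
}
```

The accumulator is associative (keys concatenate left to right, values right to left), which is what reduce requires.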
Here is an example of how I would do it:
import java.util.LinkedHashMap;
public class Main {
static String result = "";
public static void main(String [] args)
{
LinkedHashMap<String, String> map = new LinkedHashMap<String, String>();
map.put("<a>", "</a>");
map.put("<b>", "</b>");
map.put("<c>", "</c>");
map.put("<d>", "</d>");
map.keySet().forEach(s -> result += s);
map.values().forEach(s -> result += s);
System.out.println(result);
}
}
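Note: this appends the values in insertion order, so it prints <a><b><c><d></a></b></c></d>. To get </d> first, the values need to be reversed. values() returns a Collection, so either copy it into a List and use Collections.reverse(), or convert it to an array and use ArrayUtils.reverse() from Apache Commons Lang.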
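A sketch of that reversal using only the JDK: copy the values() view into an ArrayList and reverse it with Collections.reverse(). This version also replaces the static result field with a local StringBuilder; the class and method names are just for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;

public class ReverseValues {
    static String build(LinkedHashMap<String, String> map) {
        StringBuilder sb = new StringBuilder();
        // keys in insertion order
        for (String key : map.keySet()) {
            sb.append(key);
        }
        // values() is a view, not a List, so copy it before reversing
        List<String> values = new ArrayList<>(map.values());
        Collections.reverse(values);
        for (String value : values) {
            sb.append(value);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> map = new LinkedHashMap<>();
        map.put("<a>", "</a>");
        map.put("<b>", "</b>");
        map.put("<c>", "</c>");
        map.put("<d>", "</d>");
        System.out.println(build(map)); // <a><b><c><d></d></c></b></a>
    }
}
```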

How to return a value from tryAdvance in Java 8's Spliterator?

I am new to Java 8 and am trying to understand its Spliterator feature.
I have written the code below. My requirement is that whenever I call get(), it should return one value from itr3. Is that possible, and if so, how?
public class TestSplitIterator {
static List<Integer> list = new ArrayList<Integer>();
public static void main(String args[]) {
for (int i = 0; i < 100; i++) {
list.add(i);
}
// the call below should return only one value each time I call it
get(list);
}
private static int get(List<Integer> list) {
Collections.sort(list, Collections.reverseOrder());
System.out.println(list);
Spliterator<Integer> itr1 = list.spliterator();
Spliterator<Integer> itr2 = itr1.trySplit();
Spliterator<Integer> itr3 = itr2.trySplit();
// I want to return a value from itr3 whenever get(list) is called
}
}
If I understand you correctly, you need a holder object to collect the element that tryAdvance passes to its action. For example:
Integer[] collector = new Integer[1];
boolean exist = itr3.tryAdvance(value -> collector[0] = value);
System.out.println(collector[0]);
Or collect all of the elements of the spliterator into another List, for example:
List<Integer> collector = new ArrayList<>();
while (itr3.tryAdvance(collector::add)) ;
System.out.println(collector);
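Note that if get() builds a new spliterator on every call, it will always hand back the same first element. A sketch of one way to meet the "one value per call" requirement, assuming it is acceptable to create the spliterator once and keep it between calls: SpliteratorSource is a hypothetical wrapper, and returning null when itr3 is exhausted is a design choice (an OptionalInt or an exception would also work).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Spliterator;

public class SpliteratorSource {
    // Created once and kept as a field, so each get() resumes where the last one stopped.
    private final Spliterator<Integer> itr3;

    SpliteratorSource(List<Integer> list) {
        List<Integer> sorted = new ArrayList<>(list); // sort a copy, leave the caller's list alone
        sorted.sort(Collections.reverseOrder());
        Spliterator<Integer> itr1 = sorted.spliterator();
        // trySplit() may return null for very small inputs; the 100-element list splits fine
        Spliterator<Integer> itr2 = itr1.trySplit();
        this.itr3 = itr2.trySplit();
    }

    // Returns the next element covered by itr3, or null once it is exhausted.
    Integer get() {
        Integer[] holder = new Integer[1];
        boolean advanced = itr3.tryAdvance(value -> holder[0] = value);
        return advanced ? holder[0] : null;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            list.add(i);
        }
        SpliteratorSource source = new SpliteratorSource(list);
        System.out.println(source.get()); // 99
        System.out.println(source.get()); // 98
    }
}
```

Because the list's spliterator is ORDERED, trySplit() returns a prefix, so itr3 starts at the first (largest) element of the sorted list.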

Java 8 is not maintaining the order while grouping

I'm using Java 8 to group data, but the grouped results do not come out in the original order.
Map<GroupingKey, List<Object>> groupedResult = null;
if (!CollectionUtils.isEmpty(groupByColumns)) {
Map<String, Object> mapArr[] = new LinkedHashMap[mapList.size()];
if (!CollectionUtils.isEmpty(mapList)) {
int count = 0;
for (LinkedHashMap<String, Object> map : mapList) {
mapArr[count++] = map;
}
}
Stream<Map<String, Object>> people = Stream.of(mapArr);
groupedResult = people
.collect(Collectors.groupingBy(p -> new GroupingKey(p, groupByColumns), Collectors.mapping((Map<String, Object> p) -> p, toList())));
}
public static class GroupingKey {
private ArrayList<Object> keys;
public GroupingKey(Map<String, Object> map, List<String> cols) {
keys = new ArrayList<>();
for (String col : cols) {
keys.add(map.get(col));
}
}
// Add appropriate equals() and hashCode() ... your IDE should generate these
@Override
public boolean equals(Object obj) {
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
final GroupingKey other = (GroupingKey) obj;
if (!Objects.equals(this.keys, other.keys)) {
return false;
}
return true;
}
@Override
public int hashCode() {
int hash = 7;
hash = 37 * hash + Objects.hashCode(this.keys);
return hash;
}
@Override
public String toString() {
return keys + "";
}
public ArrayList<Object> getKeys() {
return keys;
}
public void setKeys(ArrayList<Object> keys) {
this.keys = keys;
}
}
Here I am using my GroupingKey class, with groupByColumns passed in dynamically from the UI. How can I get the results grouped by groupByColumns in order?
Not maintaining the order is a property of the Map that stores the result. If you need a specific Map behavior, you need to request a particular Map implementation. E.g. LinkedHashMap maintains the insertion order:
groupedResult = people.collect(Collectors.groupingBy(
p -> new GroupingKey(p, groupByColumns),
LinkedHashMap::new,
Collectors.mapping((Map<String, Object> p) -> p, toList())));
By the way, there is no reason to copy the contents of mapList into an array before creating the Stream. You may simply call mapList.stream() to get an appropriate Stream.
Further, Collectors.mapping((Map<String, Object> p) -> p, toList()) is redundant. p -> p is an identity mapping, so there’s no reason to request a mapping at all:
groupedResult = mapList.stream().collect(Collectors.groupingBy(
p -> new GroupingKey(p, groupByColumns), LinkedHashMap::new, toList()));
But even the GroupingKey is unnecessary. It basically wraps a List of values, so you could just use a List as the key in the first place. Lists implement hashCode and equals appropriately (but you must not modify these key Lists afterwards).
Map<List<Object>, List<Object>> groupedResult=
mapList.stream().collect(Collectors.groupingBy(
p -> groupByColumns.stream().map(p::get).collect(toList()),
LinkedHashMap::new, toList()));
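A small runnable sketch of this last version, with made-up sample rows: the row() helper and the "dept"/"name" column names are hypothetical. Group B appears first in the input, so it comes first in the result, which a plain HashMap would not guarantee.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupByColumns {
    // Groups rows by the values of the given columns, using a List of those values as the key.
    // LinkedHashMap::new keeps the groups in the order their keys were first encountered.
    static Map<List<Object>, List<Map<String, Object>>> group(
            List<Map<String, Object>> rows, List<String> groupByColumns) {
        return rows.stream().collect(Collectors.groupingBy(
                row -> groupByColumns.stream().map(row::get).collect(Collectors.toList()),
                LinkedHashMap::new,
                Collectors.toList()));
    }

    // Hypothetical helper that builds a row with two made-up columns, "dept" and "name".
    static Map<String, Object> row(String dept, String name) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("dept", dept);
        m.put("name", name);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = Arrays.asList(row("B", "x"), row("A", "y"), row("B", "z"));
        System.out.println(group(rows, Arrays.asList("dept")).keySet()); // [[B], [A]]
    }
}
```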
Based on @Holger's great answer, I'm posting this to help those who want to keep the order after grouping while also changing the mapping.
Let's simplify and suppose we have a list of persons (int age, String name, String address, etc.) and we want the names grouped by age while keeping the ages in order:
final LinkedHashMap<Integer, List<String>> map = myList
    .stream()
    .sorted(Comparator.comparing(p -> p.getAge())) //sort list by ages
    .collect(Collectors.groupingBy(p -> p.getAge(),
        LinkedHashMap::new, //keeps the order
        Collectors.mapping(p -> p.getName(), //map name
            Collectors.toList())));
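A self-contained version of the snippet above, assuming a hypothetical Person class with getAge() and getName(); it uses method references and Comparator.comparingInt instead of lambdas, which is equivalent. Stream.sorted is stable for ordered streams, so people with the same age keep their original relative order inside each group.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.stream.Collectors;

public class NamesByAge {
    // Hypothetical Person type standing in for the answer's "persons".
    static class Person {
        private final int age;
        private final String name;

        Person(int age, String name) {
            this.age = age;
            this.name = name;
        }

        int getAge() { return age; }
        String getName() { return name; }
    }

    static LinkedHashMap<Integer, List<String>> namesByAge(List<Person> people) {
        return people.stream()
                .sorted(Comparator.comparingInt(Person::getAge)) // ages ascending
                .collect(Collectors.groupingBy(
                        Person::getAge,
                        LinkedHashMap::new,                       // keeps the sorted order
                        Collectors.mapping(Person::getName, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person(30, "Bob"), new Person(25, "Ann"),
                new Person(30, "Cid"), new Person(25, "Dan"));
        System.out.println(namesByAge(people)); // {25=[Ann, Dan], 30=[Bob, Cid]}
    }
}
```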

Hadoop seems to modify my key object during an iteration over values of a given reduce call

Hadoop Version: 0.20.2 (On Amazon EMR)
Problem: I have a custom key (shown below) that I write during the map phase. During the reduce call, I do some simple aggregation on the values for a given key. The issue is that while iterating over the values in the reduce call, my key changes and I get values belonging to that new key.
My key type:
class MyKey implements WritableComparable<MyKey>, Serializable {
private MyEnum type; //MyEnum is a simple enumeration.
private TreeMap<String, String> subKeys;
MyKey() { subKeys = new TreeMap<>(); } //for hadoop: readFields() calls subKeys.clear()
public MyKey(MyEnum t, Map<String, String> sK) { type = t; subKeys = new TreeMap<>(sK); }
public void readFields(DataInput in) throws IOException {
Text typeT = new Text();
typeT.readFields(in);
this.type = MyEnum.valueOf(typeT.toString());
subKeys.clear();
int i = WritableUtils.readVInt(in);
while ( 0 != i-- ) {
Text keyText = new Text();
keyText.readFields(in);
Text valueText = new Text();
valueText.readFields(in);
subKeys.put(keyText.toString(), valueText.toString());
}
}
public void write(DataOutput out) throws IOException {
new Text(type.name()).write(out);
WritableUtils.writeVInt(out, subKeys.size());
for (Entry<String, String> each: subKeys.entrySet()) {
new Text(each.getKey()).write(out);
new Text(each.getValue()).write(out);
}
}
public int compareTo(MyKey o) {
if (o == null) {
return 1;
}
int typeComparison = this.type.compareTo(o.type);
if (typeComparison == 0) {
if (this.subKeys.equals(o.subKeys)) {
return 0;
}
int x = this.subKeys.hashCode() - o.subKeys.hashCode();
return (x != 0 ? x : -1);
}
return typeComparison;
}
}
Is there anything wrong with this key implementation? Here is the reduce code where the keys get mixed up:
protected void reduce(MyKey k, Iterable<MyValue> values, Context context) throws IOException, InterruptedException {
Iterator<MyValue> iterator = values.iterator();
int sum = 0;
while(iterator.hasNext()) {
MyValue value = iterator.next();
// in the 2nd iteration, printing k shows a different key than in iteration 1
sum += value.getResult();
}
//write sum to context
}
Any help in this would be greatly appreciated.
This is expected behavior (with the new API at least).
When the next method of the underlying iterator for the values Iterable is called, the next key/value pair is read from the sorted mapper/combiner output and checked to see whether the key is still part of the same group as the previous key.
Because Hadoop re-uses the objects passed to the reduce method (it just calls readFields on the same object), the contents of the key parameter k change with each iteration of the values Iterable.
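To make the mechanism concrete, a plain-Java simulation (no Hadoop classes): MutableKey and the {group, fullKey} record format are hypothetical, and the grouping check is assumed to look only at the group part of the key, the way a grouping comparator may. The same key object is refilled each time the values iterator advances, so inspecting k at each step shows a different key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeyReuseDemo {
    // Hypothetical mutable key that the framework refills in place, like readFields().
    static class MutableKey {
        String group; // the part the grouping check looks at
        String full;  // the complete key as deserialized

        void readFrom(String[] record) {
            group = record[0];
            full = record[1];
        }
    }

    // Simulates one reduce call over sorted records of the form {group, fullKey}.
    // The same key object is passed to reduce and refilled each time the values
    // iterator advances; iteration stops when the next record's group differs.
    static List<String> keysSeenInFirstGroup(List<String[]> sortedRecords) {
        MutableKey k = new MutableKey();
        List<String> seen = new ArrayList<>();
        int i = 0;
        k.readFrom(sortedRecords.get(i));
        String firstGroup = k.group;
        while (i < sortedRecords.size() && sortedRecords.get(i)[0].equals(firstGroup)) {
            k.readFrom(sortedRecords.get(i)); // advancing the iterator overwrites k
            seen.add(k.full);                 // what printing k would show at this step
            i++;
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
                new String[] {"A", "A#1"},
                new String[] {"A", "A#2"},
                new String[] {"B", "B#1"});
        System.out.println(keysSeenInFirstGroup(records)); // [A#1, A#2]
    }
}
```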

Resources