How to poll values from a PriorityQueue based on a condition - java-8

I have a Map<String, PriorityQueue<Data>> where each queue is ordered by score (descending). I populated the map from a List<Data>, with data.getGroup() as the key and the Data object itself as the value.
My use case is:
if the size of the map is <= 3, I just want to return the top Data object for each key, so I take (poll) the top value of each queue, and
if the size of the map is > 3, I need to get 3 values (one value per key) from the map based on the score.
For example:
// output should be just Data(17.0, "five", "D"), Data(4.0, "two", "A"), Data(3.0, "three", "B") even though there are 4 keys (A,B,C,D)
ArrayList<Data> dataList = new ArrayList<Data>();
dataList.add(new Data(1.0, "one", "A"));
dataList.add(new Data(4.0, "two", "A"));
dataList.add(new Data(3.0, "three", "B"));
dataList.add(new Data(2.0, "four", "C"));
dataList.add(new Data(7.0, "five", "D"));
dataList.add(new Data(17.0, "five", "D"));
// output should be just Data(5.0, "six", "A"), Data(3.14, "two", "B"), Data(3.14, "three", "C") as there will be only 3 keys (A,B,C)
ArrayList<Data> dataList2 = new ArrayList<Data>();
dataList2.add(new Data(3.0, "one", "A"));
dataList2.add(new Data(5.0, "six", "A"));
dataList2.add(new Data(3.14, "two", "B"));
dataList2.add(new Data(3.14, "three", "C"));
I tried the below, but is there a better/smarter (optimized) way to do it in Java?
// n = 3
// a static field cannot be declared inside a method, so the comparator lives at class level
private static final Comparator<Data> comparator = Comparator.comparing(Data::getScore).reversed();
public List<Data> getTopN(final List<Data> dataList, final int n) {
Map<String, PriorityQueue<Data>> map = Maps.newHashMap();
for (Data data : dataList) {
String key = data.getGroup();
if (key != null) {
if (!map.containsKey(key)) {
map.put(key, new PriorityQueue<>(comparator));
}
map.get(key).add(data);
}
}
if (map.size() <= n) {
List<Data> result = new ArrayList<Data>();
for (Map.Entry<String, PriorityQueue<Data>> entrySet: map.entrySet()){
PriorityQueue<Data> priorityQueue = entrySet.getValue();
result.add(priorityQueue.peek());
}
return result;
} else { // map.size() > n; a plain else also guarantees that every path returns
List<Data> result = new ArrayList<Data>();
for (Map.Entry<String, PriorityQueue<Data>> entrySet: map.entrySet()){
PriorityQueue<Data> priorityQueue = entrySet.getValue();
result.add(priorityQueue.peek());
}
return result.stream()
.sorted(Comparator.comparingDouble(Data::getScore).reversed())
.limit(n)
.collect(Collectors.toList());
}
}
The Data object looks like this:
public class Data {
double score;
String name;
String group;
public void setName(String name) {
this.name = name;
}
public void setGroup(String group) {
this.group = group;
}
public void setScore(double score) {
this.score = score;
}
public String getName() {
return name;
}
public String getGroup() {
return group;
}
public double getScore() {
return score;
}
}

Since your starting point is a List<Data>, there’s not much sense in adding the elements to a Map<String, PriorityQueue<Data>> when all you’re interested in is one value, i.e. the maximum value, per key. In that case, you can simply store the maximum value.
Further, it’s worth considering the differences between the map methods keySet(), values(), and entrySet(). Using the latter is only useful when you’re interested in both key and value within the loop’s body. Otherwise, use either keySet() or values() to simplify the operation.
Only when trying to get the top n values from the map, using a PriorityQueue may improve the performance:
private static final Comparator<Data> BY_SCORE = Comparator.comparing(Data::getScore);
private static final BinaryOperator<Data> MAX = BinaryOperator.maxBy(BY_SCORE);
public List<Data> getTopN(List<Data> dataList, int n) {
Map<String, Data> map = new HashMap<>();
for(Data data: dataList) {
String key = data.getGroup();
if(key != null) map.merge(key, data, MAX);
}
if(map.size() <= n) {
return new ArrayList<>(map.values());
}
else {
PriorityQueue<Data> top = new PriorityQueue<>(n, BY_SCORE);
for(Data d: map.values()) {
top.add(d);
if(top.size() > n) top.remove();
}
return new ArrayList<>(top);
}
}
Note that the BinaryOperator.maxBy(…) is using the ascending order as basis and also the priority queue now needs the ascending order, as we’re removing the smallest elements such that the top n remain in the queue for the result. Therefore, reversed() has been removed from the Comparator here.
Using a priority queue provides a benefit if n is small, especially in comparison to the map’s size. If n is rather large or expected to be close to the map’s size, it is likely more efficient to use
List<Data> top = new ArrayList<>(map.values());
top.sort(BY_SCORE.reversed());
top.subList(n, top.size()).clear();
return top;
which sorts all of the map’s values in descending order and removes the excess elements. This can be combined with the code handling the map.size() <= n scenario:
public List<Data> getTopN(List<Data> dataList, int n) {
Map<String, Data> map = new HashMap<>();
for(Data data: dataList) {
String key = data.getGroup();
if(key != null) map.merge(key, data, MAX);
}
List<Data> top = new ArrayList<>(map.values());
if(top.size() > n) {
top.sort(BY_SCORE.reversed());
top.subList(n, top.size()).clear();
}
return top;
}
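For comparison, the same idea (keep one maximum per group, then take the best n) can also be written with streams. This is only a sketch: the Data class below is a minimal stand-in whose constructor order (score, name, group) is assumed from the question's sample data, and the class name TopPerGroupSketch is made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

class TopPerGroupSketch {
    // Minimal stand-in for the question's Data class (constructor order is an assumption).
    static final class Data {
        private final double score;
        private final String name;
        private final String group;
        Data(double score, String name, String group) {
            this.score = score;
            this.name = name;
            this.group = group;
        }
        double getScore() { return score; }
        String getName() { return name; }
        String getGroup() { return group; }
    }

    static final Comparator<Data> BY_SCORE = Comparator.comparingDouble(Data::getScore);

    static List<Data> getTopN(List<Data> dataList, int n) {
        // Keep only the highest-scoring element per group; elements without a group are skipped.
        Map<String, Data> best = dataList.stream()
            .filter(d -> d.getGroup() != null)
            .collect(Collectors.toMap(Data::getGroup, Function.identity(),
                                      BinaryOperator.maxBy(BY_SCORE)));
        // Sort the per-group winners by descending score and keep at most n of them.
        return best.values().stream()
            .sorted(BY_SCORE.reversed())
            .limit(n)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Data> dataList = Arrays.asList(
            new Data(1.0, "one", "A"), new Data(4.0, "two", "A"),
            new Data(3.0, "three", "B"), new Data(2.0, "four", "C"),
            new Data(7.0, "five", "D"), new Data(17.0, "five", "D"));
        // prints: 17.0 five D, 4.0 two A, 3.0 three B (one per line)
        for (Data d : getTopN(dataList, 3)) {
            System.out.println(d.getScore() + " " + d.getName() + " " + d.getGroup());
        }
    }
}
```

Unlike the loop-based variants, this sketch always sorts, so the result is in descending score order even when the map has n or fewer entries.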

Related

Group the data into a Map<Long, List<Long>> where Lists need to be sorted

Assume I have the following domain object:
public class MyObj {
private Long id;
private Long relationId;
private Long seq;
// getters
}
There is a list List<MyObj> list. I want to create a Map by grouping the data by relationId (the key) and sorting the values (each value is a list of ids).
My code without sorting the values:
List<MyObj> list = getMyObjs();
// key: relationId, value: List<Long> ids (needs to be sorted)
Map<Long, List<Long>> map = list.stream()
.collect(Collectors.groupingBy(
MyObj::getRelationId,
Collectors.mapping(MyObj::getId, toList())
));
public class MyObjComparator{
public static Comparator<MyObj> compare() {
...
}
}
I have created the method MyObjComparator::compare. My question is how to sort this map's values in the above stream.
To obtain the Map having the sorted lists of id as Values, you can sort the stream elements by id before collecting them (as @shmosel has pointed out in the comment).
Collector groupingBy() will preserve the order of stream elements while storing them into Lists. In short, the only case when the order can break is while merging the partial results during parallel execution using an unordered collector (which has a leeway of combining results in arbitrary order). groupingBy() is not unordered, therefore the order of values in the list would reflect the initial order of elements in the stream. You can find a detailed explanation in this answer by @Holger.
You don't need a TreeMap (or a LinkedHashMap), unless you want the Entries of the Map to be sorted as well.
List<MyObj> list = getMyObjs();
// Key: relationId, Value: List<Long> ids (needs to be sorted)
Map<Long, List<Long>> map = list.stream()
.sorted(Comparator.comparing(MyObj::getId))
.collect(Collectors.groupingBy(
MyObj::getRelationId,
Collectors.mapping(MyObj::getId, Collectors.toList())
));
As @dan1st said, you can use a TreeMap if you want to sort the keys.
If you want to sort the values, sort the stream before grouping; the grouped lists then keep that order.
@Data
@AllArgsConstructor
public class MyObj {
private Long relationId;
private Long id;
static int comparing(MyObj obj,MyObj obj2){
return obj.getId().compareTo(obj2.getId());
}
public static void main(String[] args) {
List<MyObj> list = new ArrayList<>();
list.add(new MyObj(2L, 3L));
list.add(new MyObj(2L, 1L));
list.add(new MyObj(2L, 5L));
list.add(new MyObj(1L, 1L));
list.add(new MyObj(1L, 2L));
list.add(new MyObj(1L, 3L));
Map<Long, List<Long>> collect = list.stream()
// value sort
// .sorted(MyObj::comparing)
.collect(groupingBy(MyObj::getRelationId,
// key sort
// (Supplier<Map<Long, List<Long>>>) () -> new TreeMap<>(Long::compareTo),
mapping(MyObj::getId, toList())));
try {
// with the value sort above enabled: collect = {"1":[1,2,3],"2":[1,3,5]}
System.out.println("collect = " + new ObjectMapper().writeValueAsString(collect));
} catch (JsonProcessingException e) {
throw new RuntimeException(e);
}
}
}
It appears from the presented code that the resulting map should look like Map<Long, List<Long>> with the key - relationId and the value - list of id. Therefore custom comparator for List<Long> should be implemented like this:
Comparator<List<Long>> cmp = (a, b) -> {
for (int i = 0, n = Math.min(a.size(), b.size()); i < n; i++) {
int res = Long.compare(a.get(i), b.get(i));
if (res != 0) {
return res;
}
}
return Integer.compare(a.size(), b.size());
};
This comparator should be applied to the entry set of the map, however, either the map should be converted into a SortedSet of the map entries, or a LinkedHashMap needs to be recreated on the basis of the comparator:
Map<Long, List<Long>> map = list.stream()
.collect(groupingBy(
MyObj::getRelationId, Collectors.mapping(MyObj::getId, toList())
))
.entrySet()
.stream()
.sorted(Map.Entry.comparingByValue(cmp)) // cmp compares List<Long>, so wrap it to compare the entries by value
.collect(toMap(Map.Entry::getKey, Map.Entry::getValue, (a, b) -> a, LinkedHashMap::new));
It is not clear whether you want to sort by ids or to sort by elements and then extract the ids.
In the first case you can use a finishing operation after the collect to get a sorted list. As Collections.sort() can sort a list, you just have to call it at the appropriate place.
Map<Long, List<Long>> map = list.stream()
.collect(Collectors.groupingBy(
MyObj::getRelationId,
Collectors.collectingAndThen(Collectors.mapping(MyObj::getId,toList()),
l -> { Collections.sort(l, your_comparator); return l; })
));
In the second case you just need to sort the stream (if it is finite and not too big).
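A minimal sketch of that second case, assuming the elements should be ordered by seq (the question does not say what MyObjComparator.compare() actually orders by) and then reduced to their ids. The constructor order (id, relationId, seq) and the class name SortThenGroupSketch are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SortThenGroupSketch {
    // Stand-in for the question's MyObj; the constructor order is an assumption.
    static final class MyObj {
        private final Long id;
        private final Long relationId;
        private final Long seq;
        MyObj(Long id, Long relationId, Long seq) {
            this.id = id;
            this.relationId = relationId;
            this.seq = seq;
        }
        Long getId() { return id; }
        Long getRelationId() { return relationId; }
        Long getSeq() { return seq; }
    }

    static Map<Long, List<Long>> idsOrderedBySeq(List<MyObj> list) {
        return list.stream()
            // Sort the elements first; groupingBy keeps this encounter order inside each list.
            .sorted(Comparator.comparing(MyObj::getSeq))
            .collect(Collectors.groupingBy(
                MyObj::getRelationId,
                Collectors.mapping(MyObj::getId, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<MyObj> list = Arrays.asList(
            new MyObj(10L, 1L, 3L), new MyObj(11L, 1L, 1L),
            new MyObj(20L, 2L, 2L), new MyObj(21L, 2L, 1L));
        Map<Long, List<Long>> map = idsOrderedBySeq(list);
        System.out.println(map.get(1L)); // [11, 10]
        System.out.println(map.get(2L)); // [21, 20]
    }
}
```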

Java Map: group by key's attribute and max over value

I have an instance of Map<Reference, Double>. The challenge is that the key objects may contain a reference to the same object. I need to return a map of the same type as the input, but grouped by the key's attribute and retaining the max value.
I tried by using groupingBy and maxBy but I'm stuck.
private void run () {
Map<Reference, Double> vote = new HashMap<>();
Student s1 = new Student(12L);
vote.put(new Reference(s1), 66.5);
vote.put(new Reference(s1), 71.71);
Student s2 = new Student(44L);
vote.put(new Reference(s2), 59.75);
vote.put(new Reference(s2), 64.00);
// I need to have a Collection of Reference objs related to the max value of the "vote" map
Collection<Reference> maxVote = vote.entrySet().stream().collect(groupingBy(Map.Entry.<Reference, Double>comparingByKey(new Comparator<Reference>() {
@Override
public int compare(Reference r1, Reference r2) {
return r1.getObjId().compareTo(r2.getObjId());
}
}), maxBy(Comparator.comparingDouble(Map.Entry::getValue))));
}
class Reference {
private final Student student;
public Reference(Student s) {
this.student = s;
}
public Long getObjId() {
return this.student.getId();
}
}
class Student {
private final Long id;
public Student (Long id) {
this.id = id;
}
public Long getId() {
return id;
}
}
I have an error in the maxBy argument: Comparator.comparingDouble(Map.Entry::getValue) and I don't know how to fix it. Is there a way to achieve the expected result?
You can use Collectors.toMap to get the collection of Map.Entry<Reference, Double>:
Collection<Map.Entry<Reference, Double>> result = vote.entrySet().stream()
.collect(Collectors.toMap( a -> a.getKey().getObjId(), Function.identity(),
BinaryOperator.maxBy(Comparator.comparingDouble(Map.Entry::getValue)))).values();
Then stream over the values again to get a List<Reference>:
List<Reference> result = vote.entrySet().stream()
.collect(Collectors.toMap(a -> a.getKey().getObjId(), Function.identity(),
BinaryOperator.maxBy(Comparator.comparingDouble(Map.Entry::getValue))))
.values().stream().map(e -> e.getKey()).collect(Collectors.toList());
Using your approach of groupingBy and maxBy:
Comparator<Entry<Reference, Double>> c = Comparator.comparing(e -> e.getValue());
Map<Object, Optional<Entry<Reference, Double>>> map =
vote.entrySet().stream()
.collect(
Collectors.groupingBy
(
e -> ((Reference) e.getKey()).getObjId(),
Collectors.maxBy(c)));
// iterate to get the result (or invoke another stream)
for (Entry<Object, Optional<Entry<Reference, Double>>> obj : map.entrySet()) {
System.out.println("Student Id:" + obj.getKey() + ", " + "Max Vote:" + obj.getValue().get().getValue());
}
Output (For input in your question):
Student Id:12, Max Vote:71.71
Student Id:44, Max Vote:64.0
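Since the question asks for a map of the same type as the input, here is a sketch that goes one step further and rebuilds a Map<Reference, Double> from the per-student maxima. It reuses the question's Reference and Student classes as nested stand-ins; the class name MaxVoteSketch and the method name maxPerStudent are made up for illustration.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

class MaxVoteSketch {
    static final class Student {
        private final Long id;
        Student(Long id) { this.id = id; }
        Long getId() { return id; }
    }

    static final class Reference {
        private final Student student;
        Reference(Student s) { this.student = s; }
        Long getObjId() { return student.getId(); }
    }

    static Map<Reference, Double> maxPerStudent(Map<Reference, Double> vote) {
        // Declaring the comparator with an explicit type avoids type-inference trouble.
        Comparator<Map.Entry<Reference, Double>> byVote = Map.Entry.comparingByValue();
        // Step 1: keep the entry with the highest vote for each student id.
        Map<Long, Map.Entry<Reference, Double>> bestPerStudent = vote.entrySet().stream()
            .collect(Collectors.toMap(e -> e.getKey().getObjId(),
                                      Function.identity(),
                                      BinaryOperator.maxBy(byVote)));
        // Step 2: copy the winning entries back into a map of the input's type.
        Map<Reference, Double> result = new HashMap<>();
        for (Map.Entry<Reference, Double> e : bestPerStudent.values()) {
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Reference, Double> vote = new HashMap<>();
        Student s1 = new Student(12L);
        vote.put(new Reference(s1), 66.5);
        vote.put(new Reference(s1), 71.71);
        Student s2 = new Student(44L);
        vote.put(new Reference(s2), 59.75);
        vote.put(new Reference(s2), 64.00);
        maxPerStudent(vote).forEach((ref, v) ->
            System.out.println("Student Id:" + ref.getObjId() + ", Max Vote:" + v));
    }
}
```

The iteration order of the resulting HashMap is not specified, so the two lines may be printed in either order.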

How to group objects in Java 8

WalletCreditNoteVO a1 = new WalletCreditNoteVO(1L, 1L, "A", WalletCreditNoteStatus.EXPIRED, null, null, CreditNoteType.CAMPAIGN_VOUCHER, BigDecimal.ONE, BigDecimal.ONE, "GBP");
WalletCreditNoteVO a2 = new WalletCreditNoteVO(1L, 1L, "A", WalletCreditNoteStatus.EXPIRED, null, null, CreditNoteType.CAMPAIGN_VOUCHER, BigDecimal.ONE, BigDecimal.TEN, "GBP");
WalletCreditNoteVO a3 = new WalletCreditNoteVO(2L, 1L, "A", WalletCreditNoteStatus.EXPIRED, null, null, CreditNoteType.CAMPAIGN_VOUCHER, BigDecimal.ONE, BigDecimal.ONE, "GBP");
WalletCreditNoteVO a4 = new WalletCreditNoteVO(2L, 1L, "A", WalletCreditNoteStatus.EXPIRED, null, null, CreditNoteType.CAMPAIGN_VOUCHER, BigDecimal.ONE, BigDecimal.TEN, "GBP");
final List<WalletCreditNoteVO> walletCreditNoteVOs = Lists.newArrayList(a1, a2, a3, a4);
Map<WalletCreditNoteVO, BigDecimal> collect2 = walletCreditNoteVOs.stream().collect(
groupingBy(wr -> new WalletCreditNoteVO(wr.getCreditNoteId(), wr.getWalletCustomerId(), wr.getCreditNoteTitle(),
wr.getWalletCreditNoteStatus(), wr.getCreditNoteStartDate(), wr.getCreditNoteExpiryDate(), wr.getCreditNoteType(), wr.getCreditNoteValue(), wr.getCurrency()),
mapping(WalletCreditNoteVO::getAvailableBalance,
reducing(BigDecimal.ZERO, (sum, elem) -> sum.add(elem)))));
I want to introduce a condition so that the final reduction is either the sum (as written above) or the last value in the list of BigDecimal, based on the status returned by getWalletCreditNoteStatus.
Can someone please help?
Thanks @xiumeteo. Below is the improved solution:
Function<WalletCreditNoteVO, WalletCreditNoteVO> function = wr -> new WalletCreditNoteVO(wr.getCreditNoteId(), wr.getWalletCustomerId(), wr.getCreditNoteTitle(),
wr.getWalletCreditNoteStatus(), wr.getCreditNoteStartDate(), wr.getCreditNoteExpiryDate(), wr.getCreditNoteType(), wr.getCreditNoteValue(), wr.getCurrency());
final Map<WalletCreditNoteVO, BigDecimal> collectMap =
walletCreditNoteVOs.stream()
.collect(groupingBy(function, LinkedHashMap::new, Collectors.collectingAndThen(
toList(),
(list) -> {
final List<BigDecimal> availableBalances = list.stream().map(WalletCreditNoteVO::getAvailableBalance).collect(toList());
if (list.stream().allMatch(WalletCreditNoteVO::isStatusExpired)) {
return availableBalances.stream().filter(o -> o != null).reduce((a, b) -> b).map(BigDecimal::abs).orElse(null); // map before orElse avoids a NullPointerException when no balance is present
} else {
return availableBalances.stream().filter(o -> o != null).reduce(BigDecimal.ZERO, BigDecimal::add);
}
})));
List<WalletCreditNoteVO> walletCreditNoteVOGrouped = new ArrayList<>();
for(Map.Entry<WalletCreditNoteVO, BigDecimal> entry : collectMap.entrySet()){
WalletCreditNoteVO key = entry.getKey();
key.setAvailableBalance(entry.getValue());
walletCreditNoteVOGrouped.add(key);
}
I now want to remove the for loop; the stream logic should just give me one list of WalletCreditNoteVO instead of a Map with WalletCreditNoteVO as key and BigDecimal as value, with the value set directly in the WalletCreditNoteVO.
Thanks all again (I can't add code in my comments, so I am adding it here).
So I did a test for your case, I created a dummy class that resembles yours:
public static class Something{
private String name;
private Integer sum;
private boolean checker;
public Something(String name, Integer sum, boolean checker) {
this.name = name;
this.sum = sum;
this.checker = checker;
}
public String getName() {
return name;
}
public boolean isChecker() {
return checker;
}
public Integer getSum() {
return sum;
}
@Override
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}
Something something = (Something) o;
return new EqualsBuilder().append(getName(), something.getName()).append(getSum(), something.getSum()).isEquals();
}
@Override
public int hashCode() {
return new HashCodeBuilder(17, 37).append(getName()).append(getSum()).toHashCode();
}
}
And then I did this little test
List<Something> items = Arrays.asList(new Something("name", 10, false), new Something("name", 14, true), new Something("name", 11, false),
new Something("name", 11, false), new Something("noName", 12, false));
final Map<Something, Integer> somethingToSumOrLastElement =
items.stream()
.collect(Collectors.groupingBy(Function.identity(),
Collectors.collectingAndThen(
Collectors.toList(), // first we collect all your related items into a list
(list) -> { // collectingAndThen lets us supply a finisher, here a Function<List<Something>, Integer>; let's define it
final List<Integer> integerStream = list.stream().map(Something::getSum).collect(Collectors.toList());
if (list.stream().allMatch(Something::isChecker)) { // we check for the method you want to check
//you have to change this depending on required logic
//for this case if that's true for every element in the list, we do the reduce by summing
return integerStream.stream().reduce(0, (sum, next) -> sum + next);
}
//if not, we just get the last element of that list
return integerStream.stream().reduce(0, (sum, next) -> next);
})));
I think this is ok, but maybe someone has a better idea on how to handle your issue.
Ping me if you need clarification :)
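Regarding the follow-up request in the question (getting a list directly instead of a map plus a for loop), one possible sketch is to let the finisher build the collapsed object itself and then take the map's values. The Note class here is a hypothetical, much smaller stand-in for WalletCreditNoteVO; it groups by id only and assumes non-null balances, so it would need adapting to the real class.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupToListSketch {
    // Hypothetical stand-in for WalletCreditNoteVO: an id, an "expired" flag and a balance.
    static final class Note {
        private final Long id;
        private final boolean expired;
        private final BigDecimal balance;
        Note(Long id, boolean expired, BigDecimal balance) {
            this.id = id;
            this.expired = expired;
            this.balance = balance;
        }
        Long getId() { return id; }
        boolean isExpired() { return expired; }
        BigDecimal getBalance() { return balance; }
    }

    // Collapses one group: the last balance if every note is expired, otherwise the sum.
    static Note collapse(List<Note> group) {
        boolean allExpired = group.stream().allMatch(Note::isExpired);
        BigDecimal balance = allExpired
            ? group.get(group.size() - 1).getBalance()
            : group.stream().map(Note::getBalance).reduce(BigDecimal.ZERO, BigDecimal::add);
        return new Note(group.get(0).getId(), allExpired, balance);
    }

    static List<Note> groupAndCollapse(List<Note> notes) {
        // LinkedHashMap keeps the groups in the order in which their ids first appear.
        Map<Long, Note> byId = notes.stream()
            .collect(Collectors.groupingBy(Note::getId, LinkedHashMap::new,
                Collectors.collectingAndThen(Collectors.toList(), GroupToListSketch::collapse)));
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Note> notes = Arrays.asList(
            new Note(1L, true, BigDecimal.ONE), new Note(1L, true, BigDecimal.TEN),
            new Note(2L, false, BigDecimal.ONE), new Note(2L, true, BigDecimal.TEN));
        for (Note n : groupAndCollapse(notes)) {
            System.out.println(n.getId() + " -> " + n.getBalance()); // 1 -> 10, then 2 -> 11
        }
    }
}
```

groupingBy never produces an empty group, so group.get(0) is safe inside the finisher.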

Java 8 is not maintaining the order while grouping

I'm using Java 8 to group data, but the results do not keep the order in which they were formed.
Map<GroupingKey, List<Object>> groupedResult = null;
if (!CollectionUtils.isEmpty(groupByColumns)) {
Map<String, Object> mapArr[] = new LinkedHashMap[mapList.size()];
if (!CollectionUtils.isEmpty(mapList)) {
int count = 0;
for (LinkedHashMap<String, Object> map : mapList) {
mapArr[count++] = map;
}
}
Stream<Map<String, Object>> people = Stream.of(mapArr);
groupedResult = people
.collect(Collectors.groupingBy(p -> new GroupingKey(p, groupByColumns), Collectors.mapping((Map<String, Object> p) -> p, toList())));
public static class GroupingKey {
private ArrayList<Object> keys;
public GroupingKey(Map<String, Object> map, List<String> cols) {
keys = new ArrayList<>();
for (String col : cols) {
keys.add(map.get(col));
}
}
// Add an appropriate equals() ... your IDE should generate this
@Override
public boolean equals(Object obj) {
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
final GroupingKey other = (GroupingKey) obj;
if (!Objects.equals(this.keys, other.keys)) {
return false;
}
return true;
}
@Override
public int hashCode() {
int hash = 7;
hash = 37 * hash + Objects.hashCode(this.keys);
return hash;
}
@Override
public String toString() {
return keys + "";
}
public ArrayList<Object> getKeys() {
return keys;
}
public void setKeys(ArrayList<Object> keys) {
this.keys = keys;
}
}
Here I am using my GroupingKey class, whose columns are passed in dynamically from the UX. How can I get the result for these groupByColumns in sorted order?
Not maintaining the order is a property of the Map that stores the result. If you need a specific Map behavior, you need to request a particular Map implementation. E.g. LinkedHashMap maintains the insertion order:
groupedResult = people.collect(Collectors.groupingBy(
p -> new GroupingKey(p, groupByColumns),
LinkedHashMap::new,
Collectors.mapping((Map<String, Object> p) -> p, toList())));
By the way, there is no reason to copy the contents of mapList into an array before creating the Stream. You may simply call mapList.stream() to get an appropriate Stream.
Further, Collectors.mapping((Map<String, Object> p) -> p, toList()) is obsolete. p->p is an identity mapping, so there’s no reason to request mapping at all:
groupedResult = mapList.stream().collect(Collectors.groupingBy(
p -> new GroupingKey(p, groupByColumns), LinkedHashMap::new, toList()));
But even the GroupingKey is obsolete. It basically wraps a List of values, so you could just use a List as key in the first place. Lists implement hashCode and equals appropriately (but you must not modify these key Lists afterwards).
Map<List<Object>, List<Object>> groupedResult=
mapList.stream().collect(Collectors.groupingBy(
p -> groupByColumns.stream().map(p::get).collect(toList()),
LinkedHashMap::new, toList()));
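To make the List-as-key idea concrete, here is a small self-contained sketch. The column names ("city", "year", "v"), the row() helper and the class name ListKeyGroupingSketch are made up for illustration; the grouping call itself follows the snippet above, with the key function pulled into a typed variable for readability.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ListKeyGroupingSketch {
    // Hypothetical helper that builds one row with three made-up columns.
    static Map<String, Object> row(String city, int year, int v) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("city", city);
        m.put("year", year);
        m.put("v", v);
        return m;
    }

    static Map<List<Object>, List<Map<String, Object>>> group(
            List<Map<String, Object>> rows, List<String> groupByColumns) {
        // The key is simply the list of the row's values for the grouping columns.
        Function<Map<String, Object>, List<Object>> key =
            r -> groupByColumns.stream().map(r::get).collect(Collectors.toList());
        // LinkedHashMap keeps the groups in the order in which their keys first appear.
        return rows.stream()
            .collect(Collectors.groupingBy(key, LinkedHashMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = Arrays.asList(
            row("Paris", 2020, 1), row("Oslo", 2021, 2), row("Paris", 2020, 3));
        // prints [Paris, 2020] -> 2 rows, then [Oslo, 2021] -> 1 rows
        group(rows, Arrays.asList("city", "year"))
            .forEach((k, v) -> System.out.println(k + " -> " + v.size() + " rows"));
    }
}
```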
This is based on @Holger's great answer. I am posting it to help those who want to keep the order after grouping as well as change the mapping.
Let's simplify and suppose we have a list of persons (int age, String name, String address, etc.) and we want the names grouped by age while keeping the ages in order:
final LinkedHashMap<Integer, List<String>> map = myList
.stream()
.sorted(Comparator.comparing(p -> p.getAge())) //sort list by ages
.collect(Collectors.groupingBy(p -> p.getAge(),
LinkedHashMap::new, //keeps the order
Collectors.mapping(p -> p.getName(), //map name
Collectors.toList())));

Hadoop seems to modify my key object during an iteration over values of a given reduce call

Hadoop Version: 0.20.2 (On Amazon EMR)
Problem: I have a custom key that I write during the map phase (shown below). During the reduce call, I do some simple aggregation on the values for a given key. The issue I am facing is that while iterating over the values in the reduce call, my key changes and I get the values of that new key.
My key type:
class MyKey implements WritableComparable<MyKey>, Serializable {
private MyEnum type; //MyEnum is a simple enumeration.
private TreeMap<String, String> subKeys;
MyKey() {} //for hadoop
public MyKey(MyEnum t, Map<String, String> sK) { type = t; subKeys = new TreeMap<>(sK); }
public void readFields(DataInput in) throws IOException {
Text typeT = new Text();
typeT.readFields(in);
this.type = MyEnum.valueOf(typeT.toString());
subKeys.clear();
int i = WritableUtils.readVInt(in);
while ( 0 != i-- ) {
Text keyText = new Text();
keyText.readFields(in);
Text valueText = new Text();
valueText.readFields(in);
subKeys.put(keyText.toString(), valueText.toString());
}
}
public void write(DataOutput out) throws IOException {
new Text(type.name()).write(out);
WritableUtils.writeVInt(out, subKeys.size());
for (Entry<String, String> each: subKeys.entrySet()) {
new Text(each.getKey()).write(out);
new Text(each.getValue()).write(out);
}
}
public int compareTo(MyKey o) {
if (o == null) {
return 1;
}
int typeComparison = this.type.compareTo(o.type);
if (typeComparison == 0) {
if (this.subKeys.equals(o.subKeys)) {
return 0;
}
int x = this.subKeys.hashCode() - o.subKeys.hashCode();
return (x != 0 ? x : -1);
}
return typeComparison;
}
}
Is there anything wrong with this implementation of key? Following is the code where I am facing the mixup of keys in reduce call:
reduce(MyKey k, Iterable<MyValue> values, Context context) {
Iterator<MyValue> iterator = values.iterator();
int sum = 0;
while(iterator.hasNext()) {
MyValue value = iterator.next();
// when I get here in the 2nd iteration and print k, it is different from what it was in iteration 1
sum += value.getResult();
}
//write sum to context
}
Any help in this would be greatly appreciated.
This is expected behavior (with the new API at least).
When the next method for the underlying iterator of the values Iterable is called, the next key/value pair is read from the sorted mapper / combiner output, and checked that the key is still part of the same group as the previous key.
Because Hadoop re-uses the objects passed to the reduce method (just calling the readFields method of the same object), the underlying contents of the key parameter 'k' will change with each iteration of the values Iterable.
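The effect can be illustrated without any Hadoop dependency. The following is a plain-Java simulation, not Hadoop code: MutableKey stands in for a Writable key, and the hand-written iterator overwrites the same key object on every next(), which is the behaviour described above. All names are made up for illustration. If the key as it was at the start of the loop is needed later, take a copy before iterating.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class KeyReuseSketch {
    // Mutable key, standing in for a Writable whose fields are overwritten by readFields().
    static final class MutableKey {
        String value;
        MutableKey(String value) { this.value = value; }
        MutableKey copy() { return new MutableKey(value); }
    }

    // Iterator that refills the SAME key object on every step, as the framework is described to do.
    static Iterator<Integer> reusingIterator(MutableKey shared, List<String> keys, List<Integer> values) {
        return new Iterator<Integer>() {
            private int i = 0;
            @Override
            public boolean hasNext() { return i < values.size(); }
            @Override
            public Integer next() {
                shared.value = keys.get(i); // simulates readFields() on the reused key object
                return values.get(i++);
            }
        };
    }

    // Returns what a reference taken before the loop shows after each iteration.
    static List<String> observe(boolean copyFirst) {
        MutableKey shared = new MutableKey("a#1");
        MutableKey kept = copyFirst ? shared.copy() : shared;
        Iterator<Integer> it =
            reusingIterator(shared, Arrays.asList("a#1", "a#2"), Arrays.asList(10, 20));
        List<String> seen = new ArrayList<>();
        while (it.hasNext()) {
            it.next();
            seen.add(kept.value);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observe(false)); // [a#1, a#2]  the shared object changed under us
        System.out.println(observe(true));  // [a#1, a#1]  the copy stayed stable
    }
}
```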
