Java 8 stream grouping and sorting on aggregate sum

Given a Java class Something:
class Something {
    private int parentKey;
    private String parentName;
    private int childKey;
    private int noThings;
    public Something(int parentKey, String parentName, int childKey, int noThings) {
        this.parentKey = parentKey;
        this.parentName = parentName;
        this.childKey = childKey;
        this.noThings = noThings;
    }
    public int getParentKey() {
        return this.parentKey;
    }
    public int getNoThings() {
        return this.noThings;
    }
}
I have a list of Something objects:
List<Something> somethings = newArrayList(
    new Something(425, "Lemon", 44, 23),
    new Something(123, "Orange", 125, 66),
    new Something(425, "Lemon", 11, 62),
    new Something(123, "Orange", 126, 32),
    new Something(323, "Lime", 25, 101),
    new Something(123, "Orange", 124, 88)
);
I want to sort them so that they are ordered by the cumulative sum of noThings per parent object and then by noThings, so that I end up with:
List<Something> sortedSomethings = newArrayList(
    new Something(123, "Orange", 124, 88),
    new Something(123, "Orange", 125, 66),
    new Something(123, "Orange", 126, 32),
    new Something(323, "Lime", 25, 101),
    new Something(425, "Lemon", 11, 62),
    new Something(425, "Lemon", 44, 23)
);
I know that to map parentKey to the sum of noThings I can use:
Map<Integer, Integer> totalNoThings = somethings
    .stream()
    .collect(Collectors.groupingBy(
        Something::getParentKey,
        Collectors.summingInt(Something::getNoThings)));
I thought that maybe wrapping my Something class and keeping the total per parent key might work in some way:
class SomethingWrapper {
    private int totalNoThingsPerClient;
    private Something something;
}
But it seems like a lot of work and not very elegant.
Any observations or ideas would be gratefully appreciated.

Well, you already did the main work by collecting the aggregate information:
Map<Integer, Integer> totalNoThings = somethings.stream()
    .collect(Collectors.groupingBy(Something::getParentKey,
        Collectors.summingInt(Something::getNoThings)));
then all you need to do is use this information in a sort operation:
List<Something> sorted = somethings.stream()
    .sorted(Comparator.comparing((Something x) -> totalNoThings.get(x.getParentKey()))
        .thenComparing(Something::getNoThings).reversed())
    .collect(Collectors.toList());
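Note that .reversed() applies to the whole composed comparator, so both the group total and the per-item noThings end up descending, which matches the requested output. A sketch with the descending intent spelled out explicitly (same totalNoThings map as above):
Comparator<Something> byGroupTotalDesc =
    Comparator.comparingInt((Something s) -> totalNoThings.get(s.getParentKey())).reversed();
Comparator<Something> byNoThingsDesc =
    Comparator.comparingInt(Something::getNoThings).reversed();
// Same result as the chained form: descending group totals, then descending noThings.
List<Something> sorted = somethings.stream()
    .sorted(byGroupTotalDesc.thenComparing(byNoThingsDesc))
    .collect(Collectors.toList());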

Actually I had to make one small tweak: rather than totalNoThings.get, it was totalNoThings.indexOf. So the final solution was:
List<Integer> totalNoThings = somethings.stream()
    .collect(Collectors.groupingBy(Something::getParentKey,
        Collectors.summingInt(Something::getNoThings)))
    .entrySet().stream()
    .sorted(Map.Entry.comparingByValue())
    .map(Map.Entry::getKey)
    .collect(Collectors.toList());
List<Something> sorted = somethings.stream()
    .sorted(Comparator.comparing((Something obj) -> totalNoThings.indexOf(obj.getParentKey()))
        .thenComparing(Something::getNoThings).reversed())
    .collect(Collectors.toList());
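One caveat with this version: List.indexOf is a linear scan, so every comparison pays O(n) in the number of parent keys. If that ever matters, a sketch that precomputes each key's rank into a map first (a hypothetical rank helper, reusing the totalNoThings list above):
Map<Integer, Integer> rank = new HashMap<>();
for (int i = 0; i < totalNoThings.size(); i++) {
    rank.put(totalNoThings.get(i), i); // parentKey -> position in the total-ordered list
}
List<Something> sorted = somethings.stream()
    .sorted(Comparator.comparingInt((Something s) -> rank.get(s.getParentKey()))
        .thenComparingInt(Something::getNoThings).reversed())
    .collect(Collectors.toList());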

Related

Java 8 - How to filter in java 8 to get results from different categories of Dish Type?

I am new to Java 8 and looking to do something interesting here. I'm actually looking to get the highest-calorie Dish from each Dish.Type.
I tried something below, but it's giving me all the values from each Dish.Type.
@Builder
@Data
@AllArgsConstructor
public class Dish {
    public enum Type { MEAT, FISH, OTHER }
    private final String name;
    private final boolean vegetarian;
    private final int calories;
    private final Type type;
    public static final List<Dish> menu = Arrays.asList(
        new Dish("pork", false, 800, Dish.Type.MEAT),
        new Dish("beef", false, 700, Dish.Type.MEAT),
        new Dish("chicken", false, 400, Dish.Type.MEAT),
        new Dish("french fries", true, 530, Dish.Type.OTHER),
        new Dish("rice", true, 350, Dish.Type.OTHER),
        new Dish("season fruit", true, 120, Dish.Type.OTHER),
        new Dish("pizza", true, 550, Dish.Type.OTHER),
        new Dish("prawns", false, 400, Dish.Type.FISH),
        new Dish("salmon", false, 450, Dish.Type.FISH));
}
I'm looking to get this result: the highest-calorie Dish from each different Dish.Type.
I tried the code below, but it gives all elements of the same dish type. Any pointers?
List<Dish> truncatingStream = Dish.menu.stream().filter(d -> d.getCalories() > 300).limit(3).collect(toList());
truncatingStream.forEach(System.out::println);
Map<Dish.Type, Dish> map = Dish.menu.stream()
    .collect(Collectors.toMap(
        Dish::getType,
        Function.identity(),
        BinaryOperator.maxBy(Comparator.comparing(Dish::getCalories))));
You need to collect to a Map where the key is the Type and the value is the Dish. When you encounter two dishes of the same Type, you take the maximum according to Comparator.comparing(Dish::getCalories), meaning the one that has the most calories. This is what the merger BinaryOperator.maxBy(Comparator.comparing(Dish::getCalories)) is doing.
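If you prefer the groupingBy style, an equivalent sketch: group by type and reduce each group with maxBy. Collectors.maxBy yields an Optional<Dish>, so collectingAndThen unwraps it, which is safe here because every group produced by groupingBy is non-empty:
Map<Dish.Type, Dish> highestCaloric = Dish.menu.stream()
    .collect(Collectors.groupingBy(
        Dish::getType,
        Collectors.collectingAndThen(
            Collectors.maxBy(Comparator.comparingInt(Dish::getCalories)),
            Optional::get))); // every group has at least one dish, so get() is safe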

PercentileAggregation - Convert into HashMap

I am using PercentileAggregation in my code.
Results from _plugin/head:
"aggregations": {
"load_time_outlier": {
"values": {
"1.0": 35,
"1.0_as_string": "35.0",
"5.0": 35,
"5.0_as_string": "35.0",
"25.0": 35,
"25.0_as_string": "35.0",
"50.0": 35,
"50.0_as_string": "35.0",
"75.0": 35,
"75.0_as_string": "35.0",
"95.0": 36,
"95.0_as_string": "36.0",
"99.0": 36,
"99.0_as_string": "36.0"
}
}
}
Through the Java client (TCP), I am getting it as InternalPercentiles:
Aggregations aggregations = response.getAggregations();
if (aggregations.getAsMap().get(aggregationKey) instanceof InternalPercentiles) {
    InternalPercentiles intPercentiles =
        (InternalPercentiles) aggregations.getAsMap().get(aggregationKey);
    // My logic here
}
I want to write logic in the commented place, so that I get my result as a map:
Key: load_time_outlier
Value: a list containing a Map of [{"1.0": 35}, {"5.0": 35}, etc.]
Logic I tried:
Iterator<Percentile> iterator = intPercentiles.iterator();
Map<String, Object> aggregationTermsMap = new LinkedHashMap<String, Object>();
while (iterator.hasNext()) {
    Percentile percentile = iterator.next();
    aggregationTermsMap.put(new Double(percentile.getPercent()).toString(), percentile.getValue());
}
aggregationTermsList.add(aggregationTermsMap);
aggregationResults.put(aggregationKey, aggregationTermsList);
Inputs, please.
Got an answer: the class cast in ((InternalPercentiles) intPercentiles).iterator() was missing.
Iterator<Percentile> iterator = ((InternalPercentiles) intPercentiles).iterator();
Map<String, Object> aggregationTermsMap = new LinkedHashMap<String, Object>();
while (iterator.hasNext()) {
    Percentile percentile = iterator.next();
    aggregationTermsMap.put(new Double(percentile.getPercent()).toString(), percentile.getValue());
}
aggregationTermsList.add(aggregationTermsMap);
aggregationResults.put(aggregationKey, aggregationTermsList);
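For what it's worth, the same loop can be collapsed into a single stream. A sketch, assuming (as the snippets above imply) that InternalPercentiles is Iterable<Percentile> and that java.util.stream.StreamSupport is imported:
Map<String, Object> aggregationTermsMap =
    StreamSupport.stream(((InternalPercentiles) intPercentiles).spliterator(), false)
        .collect(Collectors.toMap(
            p -> Double.toString(p.getPercent()), // percentile key, e.g. "1.0"
            p -> (Object) p.getValue(),           // box to match the map's value type
            (a, b) -> a,                          // percents are unique, so this merger never fires
            LinkedHashMap::new));                 // keep the percentile order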

Linq subselect filter

Probably someone can help me with this (at least for me) complicated problem.
Let's say I have the following data (in the DB):
Tab1 (id_t1): Item
(1)
(2)
(3)
Tab2 (id_t2, id_t1): Group
(4, 1)
(5, 1)
(6, 2)
(7, 3)
Tab3 (id_t3, id_t2, v): GroupField
(10, 4, 100)
(11, 4, 300)
(12, 5, 200)
(13, 6, 100)
(14, 6, 200)
(15, 7, 100)
(16, 7, 300)
Now I'd like to select all Items that include all of some specific GroupFields.
E.g. I have v = list(100, 200) and I'd like to get back 1 and 2, but not 3:
1, because Group 4 holds Field 10 with v=100 and Group 5 holds Field 12 with v=200,
and 2, because Group 6 holds Field 13 with v=100 and Field 14 with v=200.
Is something like this possible in LINQ? (I already tried different ways (any/all) but without success so far.)
I don't see how to get past the fact that the matching fields can be spread across any of an Item's Groups rather than all sitting in one Group...
I don't even know how to do this in SQL in one command without using temp tables/cursors.
_rene
Try this:
var result = groups
    .Join(fields, o => o.Id, i => i.GroupId,
        (o, i) => new { Group = o, Field = i })
    .GroupBy(x => x.Group.ItemId)
    .Where(x => values.All(y => x.Any(z => z.Field.Value == y)))
    .Select(x => x.Key)
    .Distinct();
The following classes are used:
class Group
{
    public Group(int id, int itemId)
    {
        Id = id;
        ItemId = itemId;
    }
    public int Id { get; set; }
    public int ItemId { get; set; }
}
class GroupField
{
    public GroupField(int id, int groupId, int value)
    {
        Id = id;
        GroupId = groupId;
        Value = value;
    }
    public int Id { get; set; }
    public int GroupId { get; set; }
    public int Value { get; set; }
}
and the following initialization:
var groups = new[] { new Group(4, 1), new Group(5, 1),
                     new Group(6, 2), new Group(7, 3) };
var fields = new[] { new GroupField(10, 4, 100),
                     new GroupField(11, 4, 300),
                     new GroupField(12, 5, 200),
                     new GroupField(13, 6, 100),
                     new GroupField(14, 6, 200),
                     new GroupField(15, 7, 100),
                     new GroupField(16, 7, 300) };
var values = new[] { 100, 200 };

Can these row test style unit tests be improved to follow good TDD design practices?

Can the following unit test be improved to follow good TDD design practices (naming, using row tests, designing the classes) in any of the .NET TDD/BDD frameworks?
Also, is there a better way in any of the frameworks to have row tests where I can have an individual expectation for each row, just as I do in this (NUnit) example?
The system under test here is the Constraint class, which can have multiple ranges of valid integers. The tests exercise the NarrowDown method, which can make the valid ranges smaller based on another constraint.
[TestFixture]
internal class ConstraintTests
{
    [Test]
    public void NarrowDown_Works()
    {
        RowTest_NarrowDown(
            new Range[] { new Range(0, 10), new Range(20, 30), new Range(40, 50) },
            new Range[] { new Range(1, 9), new Range(21, 29), new Range(41, 49) },
            new Range[] { new Range(1, 9), new Range(21, 29), new Range(41, 49) });
        RowTest_NarrowDown(
            new Range[] { new Range(0, 10), new Range(20, 30), new Range(40, 50), new Range(60, 70) },
            new Range[] { new Range(1, 9), new Range(21, 29), new Range(41, 49) },
            new Range[] { new Range(1, 9), new Range(21, 29), new Range(41, 49) });
        RowTest_NarrowDown(
            new Range[] { new Range(0, 10), new Range(20, 30), new Range(40, 50) },
            new Range[] { new Range(1, 9), new Range(21, 29), new Range(41, 49), new Range(60, 70) });
    }
    private static void RowTest_NarrowDown(IEnumerable<Range> sut, IEnumerable<Range> context)
    {
        Constraint constraint = new Constraint(sut);
        Constraint result = constraint.NarrowDown(new Constraint(context));
        Assert.That(result, Is.Null);
    }
    private static void RowTest_NarrowDown(IEnumerable<Range> sut, IEnumerable<Range> context, IEnumerable<Range> expected)
    {
        Constraint constraint = new Constraint(sut);
        Constraint result = constraint.NarrowDown(new Constraint(context));
        Assert.That(result, Is.Not.Null);
        Assert.That(result.Bounds, Is.EquivalentTo(expected));
    }
}
First, you could improve the name of your unit test; NarrowDown_Works is extremely vague, and I can't tell what the class under test is supposed to be doing.
You have lots of assertions going on and lots of data, so I can't tell what is important. Try to break your test into smaller tests; it will be easier to name them as well. If possible, use one assertion per test.
Your construction of test data is quite complex; consider using matchers like NHamcrest to reduce the amount of assertion data you need instead of using Is.EquivalentTo.
You could also use a builder or factory methods to make the initialization of the Constraint class simpler, rather than passing in an array of Ranges.
You should use a data-driven approach with data factories (in NUnit-speak, they're called test case sources). This makes your tests a lot easier to read, understand, modify and maintain (or, more generally, a lot cleaner):
[TestFixture]
internal class ConstraintTests
{
    static object[] TwoRanges =
    {
        new object[]
        {
            new[] { new Range(0, 10), new Range(20, 30), new Range(40, 50) },
            new[] { new Range(1, 9), new Range(21, 29), new Range(41, 49), new Range(60, 70) }
        }
    };
    static object[] ThreeRanges =
    {
        new object[]
        {
            new[] { new Range(0, 10), new Range(20, 30), new Range(40, 50) },
            new[] { new Range(1, 9), new Range(21, 29), new Range(41, 49) },
            new[] { new Range(1, 9), new Range(21, 29), new Range(41, 49) }
        },
        new object[]
        {
            new[] { new Range(0, 10), new Range(20, 30), new Range(40, 50), new Range(60, 70) },
            new[] { new Range(1, 9), new Range(21, 29), new Range(41, 49) },
            new[] { new Range(1, 9), new Range(21, 29), new Range(41, 49) }
        }
    };
    [Test, TestCaseSource("TwoRanges")]
    public void NarrowDown_WhenCalledWithTwoRanges_GivesTheExpectedResult(IEnumerable<Range> sut, IEnumerable<Range> context)
    {
        Constraint constraint = new Constraint(sut);
        Constraint result = constraint.NarrowDown(new Constraint(context));
        Assert.That(result, Is.Null);
    }
    [Test, TestCaseSource("ThreeRanges")]
    public void NarrowDown_WhenCalledWithThreeRanges_GivesTheExpectedResult(IEnumerable<Range> sut, IEnumerable<Range> context, IEnumerable<Range> expected)
    {
        Constraint constraint = new Constraint(sut);
        Constraint result = constraint.NarrowDown(new Constraint(context));
        Assert.That(result, Is.Not.Null);
        Assert.That(result.Bounds, Is.EquivalentTo(expected));
    }
}
See how much simpler your test methods have become? Also, each set of data from the test case source now runs as a separate test, so the whole thing won't fail just because one set of data causes a failure. Remember: a test should assert only one thing.
HTH!

C# LINQ - sort and group a Dictionary<string,DateTime> by date with maximum group size

I am looking to create batches from a Dictionary<string, DateTime> with the following constraints:
All items in a batch must share the same date.
There can be no more than X items in a single batch. If there are more items with the same date, another batch must be created.
I have worked out the following logic, but was wondering if there is a more succinct way of doing this with just LINQ.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace dictionary_sort_by_value_test
{
    class Program
    {
        static void Main(string[] args)
        {
            int maxBatchSize = 3;
            Dictionary<string, DateTime> secs = new Dictionary<string, DateTime>();
            secs.Add("6571 JT", new DateTime(2011, 1, 10));
            secs.Add("6572 JT", new DateTime(2011, 1, 12));
            secs.Add("6573 JT", new DateTime(2011, 1, 12));
            secs.Add("6574 JT", new DateTime(2011, 1, 12));
            secs.Add("6575 JT", new DateTime(2011, 1, 10));
            secs.Add("6576 JT", new DateTime(2011, 1, 11));
            secs.Add("6577 JT", new DateTime(2011, 1, 11));
            secs.Add("6578 JT", new DateTime(2011, 1, 11));
            secs.Add("6579 JT", new DateTime(2011, 1, 11));
            var sorted = secs.OrderBy(o => o.Value).GroupBy(o => o.Value);
            foreach (var date in sorted)
            {
                Console.Write("\nNew batch at {0} \n", date.Key);
                int batchsize = 0;
                foreach (var sec in date)
                {
                    if (batchsize < maxBatchSize)
                    {
                        Console.Write(" {0} {1} \n", sec.Key, sec.Value);
                        batchsize++;
                    }
                    else
                    {
                        Console.Write("\nNew batch at {0} \n", date.Key);
                        Console.Write(" {0} {1} \n", sec.Key, sec.Value);
                        batchsize = 1;
                    }
                }
            }
        }
    }
}
You group by your key, then inside the result you group by the item index divided by the desired chunk size.
var chunkSize = 3;
var sorted = secs
    .OrderBy(kv => kv.Value)
    .GroupBy(o => o.Value)
    .Select(g => new { Chunks = g.Select((o, i) => new { Val = o, Index = i })
                                 .GroupBy(item => item.Index / chunkSize) });
And displaying it:
foreach (var item in sorted.SelectMany(item => item.Chunks))
{
    Console.WriteLine("New batch at " + item.First().Val.Value);
    foreach (var element in item)
        Console.WriteLine(element.Val.Key);
}
Not strictly using LINQ to solve your problem, but a more succinct way of handling the iteration:
static void Main(string[] args)
{
    int maxBatchSize = 3;
    Dictionary<string, DateTime> secs = new Dictionary<string, DateTime>();
    secs.Add("6571 JT", new DateTime(2011, 1, 10));
    secs.Add("6572 JT", new DateTime(2011, 1, 12));
    secs.Add("6573 JT", new DateTime(2011, 1, 12));
    secs.Add("6574 JT", new DateTime(2011, 1, 12));
    secs.Add("6575 JT", new DateTime(2011, 1, 10));
    secs.Add("6576 JT", new DateTime(2011, 1, 11));
    secs.Add("6577 JT", new DateTime(2011, 1, 11));
    secs.Add("6578 JT", new DateTime(2011, 1, 11));
    secs.Add("6584 JT", new DateTime(2011, 1, 11));
    secs.Add("6579 JT", new DateTime(2011, 1, 11));
    secs.Add("6580 JT", new DateTime(2011, 1, 11));
    secs.Add("6581 JT", new DateTime(2011, 1, 11));
    secs.Add("6582 JT", new DateTime(2011, 1, 11));
    secs.Add("6583 JT", new DateTime(2011, 1, 11));
    secs.OrderBy(o => o.Value).GroupBy(o => o.Value).ToList().ForEach(date =>
    {
        Console.Write("\nNew batch at {0} \n", date.Key);
        int batchsize = 0;
        foreach (var sec in date)
        {
            if (batchsize >= maxBatchSize)
            {
                Console.Write("\nNew batch at {0} \n", date.Key);
                batchsize = 0;
            }
            Console.Write(" {0} {1} \n", sec.Key, sec.Value);
            batchsize++;
        }
    });
    Console.ReadLine();
}
You can do it with two GroupBys: first group by DateTime, and then group by page. I had to specify the generic arguments explicitly, because the compiler was picking the wrong overload, and that made the query code longer.
var groups = secs.GroupBy<KeyValuePair<string, DateTime>, DateTime, string, Group>(
    p => p.Value,
    p => p.Key,
    (d, g) => new Group
    {
        Date = d,
        Pages = g.Select((s, i) => new KeyValuePair<string, int>(s, i / maxBatchSize))
                 .GroupBy<KeyValuePair<string, int>, int, string, Page>(
                     p => p.Value,
                     p => p.Key,
                     (p, g2) => new Page { Id = p, Items = g2.ToList() })
    });
foreach (var group in groups)
{
    Console.WriteLine("Date: {0}", group.Date);
    foreach (var page in group.Pages)
    {
        Console.WriteLine("Page: {0}", page.Id);
        foreach (var key in page.Items)
            Console.WriteLine(key);
    }
}
As you can see, I had to define two classes because, as I said, I had to specify the generic arguments; using anonymous types made overload resolution pick another overload.
class Group
{
    public DateTime Date;
    public IEnumerable<Page> Pages;
}
class Page
{
    public int Id;
    public IEnumerable<string> Items;
}
Hope this helps.
