Java 8 nested grouping with time intervals

I have a list in the following format:
// MyObject(time, name, status)
List<MyObject> myObj = Arrays.asList(
    new MyObject("2017-05-09T15:37:51.896+00:00", "123", 200),
    new MyObject("2017-05-09T15:37:57.090+00:00", "ABC", 200),
    new MyObject("2017-05-09T15:37:59.733+00:00", "ABC", 200),
    new MyObject("2017-05-09T15:39:57.883+00:00", "ABC", 200),
    new MyObject("2017-05-09T15:40:00.862+00:00", "ABC", 200),
    new MyObject("2017-05-09T15:40:04.659+00:00", "ABC", 200),
    new MyObject("2017-05-09T15:40:05.114+00:00", "ABC", 500),
    new MyObject("2017-05-09T15:45:58.796+00:00", "XYZ", 200),
    new MyObject("2017-05-09T15:46:00.562+00:00", "XYZ", 200),
    new MyObject("2017-05-09T15:48:04.144+00:00", "ABC", 200),
    new MyObject("2017-05-09T15:48:04.364+00:00", "ABC", 200),
    new MyObject("2017-05-09T15:48:04.750+00:00", "ABC", 200),
    new MyObject("2017-05-09T15:48:07.052+00:00", "XYZ", 202)
);
I want to iterate through that list and perform grouping in 1-minute intervals, producing a result like this:
ABC
-> 15:37
-> 200 -> 2
-> 202 -> 0
-> 500 -> 0
-> 15:38
-> 200 -> 0
-> 202 -> 0
-> 500 -> 0
-> 15:39
-> 200 -> 1
-> 202 -> 0
-> 500 -> 0
-> 15:40
-> 200 -> 2
-> 202 -> 0
-> 500 -> 1
What I've tried so far is:
myObj.stream()
.collect(Collectors.groupingBy(MyObject::getName,
Collectors.groupingBy(MyObject::getTime)));
But this groups by name and then by the exact time string, whereas I want to group into 1-minute intervals and then group by status as well.
I need help here since I'm a novice with streams and lambdas in Java.
EDIT: Please note that getTime returns a String, not a Date.

This could be achieved via:
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("HH:mm");
Map<String, Map<String, Map<Integer, Integer>>> collect = myObj.stream()
.collect(Collectors.groupingBy(MyObject::getName, TreeMap::new,
Collectors.groupingBy(
myObject -> ZonedDateTime.parse(myObject.getTime()).format(formatter),
TreeMap::new,
Collectors.groupingBy(MyObject::getStatus, TreeMap::new,
Collectors.summingInt(i -> 1)
)
)
));
Outcome:
{
123={
15:37={200=1}
},
ABC={
15:37={200=2}, 15:39={200=1}, 15:40={200=2, 500=1}, 15:48={200=3}
},
XYZ={
15:45={200=1}, 15:46={200=1}, 15:48={202=1}
}
}
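As a variation on the snippet above, Collectors.counting() can replace summingInt(i -> 1) if Long counts are acceptable. A self-contained sketch (the MyObject record below is a minimal hypothetical stand-in for the asker's class, trimmed to the fields used here):

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupingDemo {
    // Minimal stand-in for the asker's MyObject (field names assumed).
    record MyObject(String time, String name, int status) {
        String getTime() { return time; }
        String getName() { return name; }
        int getStatus() { return status; }
    }

    public static void main(String[] args) {
        List<MyObject> myObj = List.of(
            new MyObject("2017-05-09T15:37:51.896+00:00", "ABC", 200),
            new MyObject("2017-05-09T15:37:57.090+00:00", "ABC", 200),
            new MyObject("2017-05-09T15:40:05.114+00:00", "ABC", 500));

        DateTimeFormatter formatter = DateTimeFormatter.ofPattern("HH:mm");
        // counting() yields Long counts instead of summingInt(i -> 1)'s Integer
        Map<String, Map<String, Map<Integer, Long>>> collect = myObj.stream()
            .collect(Collectors.groupingBy(MyObject::getName, TreeMap::new,
                Collectors.groupingBy(
                    o -> ZonedDateTime.parse(o.getTime()).format(formatter),
                    TreeMap::new,
                    Collectors.groupingBy(MyObject::getStatus, TreeMap::new,
                        Collectors.counting()))));
        System.out.println(collect);
    }
}
```

The TreeMap::new factories keep names, minutes and status codes sorted in the printed output.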

I understand that you require “empty” entries, both for the minutes where there are no objects for the name in question, and also for the statuses that don’t appear within that minute. Streams are great for processing the elements that are in the stream, but not good at inventing elements that are not already there.
My suggestion is you build the map structure from the elements you have using streams, and then fill in the missing empty entries afterward. The first part is quite like Flown’s answer.
First, I suggest fitting your MyObject class with an extra getter in addition to the standard ones, if you can:
public LocalTime getWholeMinute() {
return ZonedDateTime.parse(time)
.withZoneSameInstant(ZoneOffset.UTC)
.toLocalTime()
.truncatedTo(ChronoUnit.MINUTES);
}
This will return the whole minute of the time. The method can return LocalDateTime instead if you need to distinguish times across days (you can always format it later to print only the time). If you cannot modify MyObject, you may instead declare a static method that takes a MyObject, extracts the time string, performs the above calculation and returns the LocalTime or LocalDateTime. Either of the suggested methods will come in handy next:
final int[] statuses = { 200, 202, 500 };
if (! myObj.isEmpty()) {
LocalTime minTime = myObj.stream()
.map(MyObject::getWholeMinute)
.min(Comparator.naturalOrder())
.get();
LocalTime maxTime = myObj.stream()
.map(MyObject::getWholeMinute)
.max(Comparator.naturalOrder())
.get();
Map<String, Map<LocalTime, Map<Integer, Long>>> counts = myObj.stream()
.collect(Collectors.groupingBy(MyObject::getName,
Collectors.groupingBy(MyObject::getWholeMinute,
Collectors.groupingBy(MyObject::getStatus,
Collectors.counting()))));
for (Map.Entry<String, Map<LocalTime, Map<Integer, Long>>> outerEntry :
counts.entrySet()) {
Map<LocalTime, Map<Integer, Long>> middleMap = outerEntry.getValue();
LocalTime currentMinute = minTime;
while (! currentMinute.isAfter(maxTime)) {
// fill in missing map
Map<Integer, Long> innerMap
= middleMap.computeIfAbsent(currentMinute, t -> new HashMap<>(4));
// fill in missing counts
for (int status : statuses) {
innerMap.putIfAbsent(status, 0L);
}
currentMinute = currentMinute.plusMinutes(1);
}
}
// ...
}
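If you cannot modify MyObject, the static-method alternative mentioned above might look like this (the MyObject record is a minimal hypothetical stand-in; only the time string matters here):

```java
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

public class WholeMinuteDemo {
    // Minimal stand-in for the asker's class.
    record MyObject(String time) {
        String getTime() { return time; }
    }

    // Static alternative to the getter, for when MyObject can't be modified.
    static LocalTime wholeMinute(MyObject o) {
        return ZonedDateTime.parse(o.getTime())
                .withZoneSameInstant(ZoneOffset.UTC)
                .toLocalTime()
                .truncatedTo(ChronoUnit.MINUTES);
    }

    public static void main(String[] args) {
        MyObject o = new MyObject("2017-05-09T15:37:51.896+00:00");
        // Use as a method reference: groupingBy(WholeMinuteDemo::wholeMinute, ...)
        System.out.println(wholeMinute(o));
    }
}
```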
The other approach I was thinking of was creating the map structure filled with zero counts first, and then iterating over the list and increasing the relevant count for each object encountered. I think it takes only a few more lines of code and may be clearer, since it avoids the deeply nested collectors.
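A sketch of that alternative, under the assumption that the names, the minute range and the status set are known up front (the Event record and sample data below are hypothetical stand-ins for MyObject):

```java
import java.time.LocalTime;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PrefillDemo {
    public static void main(String[] args) {
        // Assumed inputs: names, minute range and fixed status set.
        List<String> names = List.of("ABC", "XYZ");
        LocalTime minTime = LocalTime.of(15, 37);
        LocalTime maxTime = LocalTime.of(15, 40);
        int[] statuses = {200, 202, 500};

        // 1) Build the fully zero-filled structure up front.
        Map<String, Map<LocalTime, Map<Integer, Long>>> counts = new TreeMap<>();
        for (String name : names) {
            Map<LocalTime, Map<Integer, Long>> middle = new TreeMap<>();
            for (LocalTime t = minTime; !t.isAfter(maxTime); t = t.plusMinutes(1)) {
                Map<Integer, Long> inner = new TreeMap<>();
                for (int s : statuses) inner.put(s, 0L);
                middle.put(t, inner);
            }
            counts.put(name, middle);
        }

        // 2) One pass over the data, incrementing the matching slot.
        record Event(String name, LocalTime minute, int status) {}
        List<Event> events = List.of(
                new Event("ABC", LocalTime.of(15, 37), 200),
                new Event("ABC", LocalTime.of(15, 37), 200),
                new Event("ABC", LocalTime.of(15, 40), 500));
        for (Event e : events) {
            counts.get(e.name()).get(e.minute()).merge(e.status(), 1L, Long::sum);
        }
        System.out.println(counts.get("ABC").get(LocalTime.of(15, 37)));
    }
}
```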


Filtering with comparing each element of a Flux to a single Mono

I am trying to use a Mono of a username to filter out every element of a Flux (the Flux holding multiple courses), and I am using Cassandra as the backend. Here is the schema:
CREATE TABLE main.courses_by_user (
course_creator text PRIMARY KEY,
courseid timeuuid,
description text,
enrollmentkey text
) WITH additional_write_policy = '99PERCENTILE'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair = 'BLOCKING'
AND speculative_retry = '99PERCENTILE';
course_creator | courseid | description | enrollmentkey
----------------+--------------------------------------+-------------+---------------
rascall | b7757e80-0c24-11ed-aec5-23fe9d87e512 | Cosmology | hubble
dibiasky | b7757e81-0c24-11ed-aec5-23fe9d87e512 | astronomy | thebigbang
michaeljburry | b7753060-0c24-11ed-aec5-23fe9d87e512 | Lol | enter
Noam Chomsky | 6c1a4800-09ac-11ed-ada9-83d934863d60 | Hi | Bye
I am using zipWith to pair the Flux of courses with the Mono of the user. Here is the code:
public Flux<CourseByCreator> getMyCourses(Principal placeholder){
Mono<User> principal = Mono.just(new User("dibiasky", "whatifitsnotequal","Kate" ,"Dibiasky"));
return this.courseByCreatorRepository.findAll()
.zipWith(principal)
.flatMap(tuple -> {
if(tuple.getT1().getCourseCreator().equals(tuple.getT2().getUsername())){
System.out.println(tuple.getT2().getUsername());
return Flux.just(tuple.getT1());
}else{
return Flux.empty();
}
}).log();
}
For some reason I am not getting the expected result, even though the user has one matching username and the courses contain one course with that creator.
What am I doing wrong here?
From the documentation of Flux.zipWith:
Zip this Flux with another Publisher source, that is to say wait for both to emit one element and combine these elements once into a Tuple2. The operator will continue doing so until any of the sources completes.
The mono will emit one element and then complete, so the zipped flux completes after the first pair.
I'd rather first resolve the principal, then filter the courses:
return Mono
.just(new User("dibiasky", "whatifitsnotequal","Kate" ,"Dibiasky"))
.flatMapMany(principal -> {
return courseByCreatorRepository
.findAll()
.filter(course -> course.getCourseCreator().equals(principal.getUsername()));
})
.log();
Alternatively, the same check can be expressed with filterWhen:
public Flux<CourseByCreator> getMyCourses(Principal placeholder){
Mono<User> principal = Mono.just(new User("dibiasky", "whatifitsnotequal","Kate" ,"Dibiasky"));
return this.courseByCreatorRepository.findAll()
.filterWhen(course -> principal
.filter(user -> user.getUsername().equals(course.getCourseCreator()))
.hasElement()
)
.log();
}

Sorting a map of >50k entries takes too long. Is there a quicker way to sort a Map in Dart?

I have probability values returned from a neural network. The returned list has 50,257 entries, so there are a lot of values. The list looks like [-126.32508850097656, -126.77257537841797, -127.69950866699219, -129.98387145996094, ...].
I need the top K values and their indices, so I converted the list to a Map:
final temp = outputLogits.asMap();
and then sorted them using:
var sortedKeys = temp.keys.toList(growable: false)
..sort((k1, k2) => temp[k2].compareTo(temp[k1]));
It produces the desired result, but it takes way too long.
Am I doing this wrong? Is there a more efficient way to get the same result?
further details:
The unsorted list looks like this:
[-126.32508850097656, -126.77257537841797, -127.69950866699219, -129.98387145996094, -128.03782653808594, -128.08395385742188, -126.33218383789062, -126.6927261352539, -127.6688232421875, -126.58303833007812, -127.32843017578125, -126.1390380859375, -126.54962158203125, -126.38087463378906, -127.82595825195312, -126.3281021118164, -125.81211853027344, -126.20887756347656, -125.95697784423828, -126.07755279541016, -126.35894012451172, -126.70021057128906, -127.03215026855469, -126.67304992675781, -126.92938995361328, -126.64434814453125, -128.20814514160156, -127.24195861816406, -128.25816345214844, -126.73397827148438, -127.62574768066406, -128.8334197998047, -124.46258544921875, -126.03125762939453, -126.18477630615234, -125.85749053955078, -126.11980438232422, -125.64325714111328, -126.06704711914062, -126.35154724121094, -124.83910369873047, -126.90412902832031, -126.02999877929688, -126.60641479492188, -125.97348022460938, -126.56074523925781, -126.58230590820312, -126.49268341064453, -128.5759735107422,
I need to find the top 40 probabilities and their indices, which I achieve using:
final temp = outputLogits.asMap(); // converts the above list to a Map<int, double>
// sort the map values descending
// then take the largest 40 values
var sortedKeys = temp.keys.toList(growable: false)
..sort((k1, k2) => temp[k2].compareTo(temp[k1]));
final Map<int, double> sortedMap = {};
for (final key in sortedKeys.take(40)) {
sortedMap[key] = temp[key];
}
after sorting this is what sortedMap looks like:
{198: -117.52079772949219, 383: -118.29053497314453, 887: -119.25838470458984, 1119: -119.66973876953125, 632: -119.74752807617188, 628: -119.87970733642578, 554: -119.88958740234375, 1081: -119.9058837890625, 843: -120.10496520996094, 317: -120.21776580810547, 2102: -120.23406982421875, 770: -120.31946563720703, 2293: -120.40717315673828, 1649: -120.44376373291016, 366: -120.47624969482422, 2080: -120.4794921875, 2735: -120.74302673339844, 3244: -120.89102935791016, 2893: -120.97686004638672, 314: -120.98660278320312, 5334: -121.00469970703125, 1318: -121.03706359863281, 679: -121.12769317626953, 1881: -121.14120483398438, 1629: -121.18737030029297, 50256: -121.19244384765625, 357: -121.22344207763672, 1550: -121.27531433105469, 775: -121.31112670898438, 7486: -121.3316421508789, 921: -121.37474060058594, 1114: -121.43411254882812, 2312: -121.43602752685547, 1675: -121.51364135742188, 4874: -121.5697021484375, 1867: -121.57322692871094, 1439: -121.60330963134766, 8989: -121.60348510742188, 1320: -121.604621
I need the top values and their respective indices; that's why I converted the list to a Map.
Try the following:
void main() {
final temp = [
-126.32508850097656,
-126.77257537841797,
-127.69950866699219,
-129.98387145996094,
-128.03782653808594,
-128.08395385742188,
-126.33218383789062,
-126.6927261352539,
-127.6688232421875,
-126.58303833007812,
-127.32843017578125,
];
final filteredLogitsWithIndexes = Map.fromEntries(
(temp.asMap().entries.toList(growable: false)
..sort((e1, e2) => e2.value.compareTo(e1.value)))
.take(5));
print(filteredLogitsWithIndexes);
// {0: -126.32508850097656, 6: -126.33218383789062, 9: -126.58303833007812,
// 7: -126.6927261352539, 1: -126.77257537841797}
}
This should save you a lot of time, since we no longer do a map lookup for each comparison (a MapEntry carries both key and value).
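A further speed-up, in any language, is to avoid the full O(n log n) sort entirely: a bounded min-heap keeps only the top K of n elements in O(n log K). A sketch of the idea in Java (Dart's package:collection provides a HeapPriorityQueue that can play the same role):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopKDemo {
    // Returns indices of the k largest values in O(n log k) via a bounded min-heap.
    static List<Integer> topKIndices(double[] values, int k) {
        // Min-heap by value: the smallest of the current top-k sits on top.
        PriorityQueue<Integer> heap =
                new PriorityQueue<>(Comparator.comparingDouble(i -> values[i]));
        for (int i = 0; i < values.length; i++) {
            heap.offer(i);
            if (heap.size() > k) heap.poll(); // evict the smallest
        }
        List<Integer> result = new ArrayList<>(heap);
        // Only k elements left, so this final sort is cheap.
        result.sort(Comparator.comparingDouble((Integer i) -> values[i]).reversed());
        return result;
    }

    public static void main(String[] args) {
        double[] logits = {-126.3, -126.7, -127.6, -129.9, -128.0, -125.8};
        System.out.println(topKIndices(logits, 3)); // indices of the 3 largest values
    }
}
```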

F# Type Provider slows down intellisense in Visual Studio 2017

I have a very simple type provider; all types are erased, and the provided type has 2000 int read-only properties Tag1..Tag2000:
let ns = "MyNamespace"
let asm = Assembly.GetExecutingAssembly()
let private newProperty t name getter isStatic = ProvidedProperty(name, t, getter, isStatic = isStatic)
let private newStaticProperty t name getter = newProperty t name (fun _ -> getter) true
let private newInstanceProperty t name getter = newProperty t name (fun _ -> getter) false
let private addStaticProperty t name getter (``type``:ProvidedTypeDefinition) = ``type``.AddMember (newStaticProperty t name getter); ``type``
let private addInstanceProperty t name getter (``type``:ProvidedTypeDefinition) = ``type``.AddMember (newInstanceProperty t name getter); ``type``
[<TypeProvider>]
type TypeProvider(config : TypeProviderConfig) as this =
inherit TypeProviderForNamespaces(config)
let provider = ProvidedTypeDefinition(asm, ns, "Provider", Some typeof<obj>, hideObjectMethods = true)
let tags = ProvidedTypeDefinition(asm, ns, "Tags", Some typeof<obj>, hideObjectMethods = true)
do [1..2000] |> Seq.iter (fun i -> addInstanceProperty typeof<int> (sprintf "Tag%d" i) <@@ i @@> tags |> ignore)
do provider.DefineStaticParameters([ProvidedStaticParameter("Host", typeof<string>)], fun name args ->
let provided = ProvidedTypeDefinition(asm, ns, name, Some typeof<obj>, hideObjectMethods = true)
addStaticProperty tags "Tags" <@@ obj() @@> provided |> ignore
provided
)
do this.AddNamespace(ns, [provider; tags])
Then a test project with two modules in separate files:
module Common
open MyNamespace
type Provided = Provider<"">
let providedTags = Provided.Tags
type LocalTags() =
member this.Tag1 with get() : int = 1
member this.Tag2 with get() : int = 2
.
.
member this.Tag1999 with get() : int = 1999
member this.Tag2000 with get() : int = 2000
let localTags = LocalTags()
module Tests
open Common
open Xunit
[<Fact>]
let ProvidedTagsTest () =
Assert.Equal<int>(providedTags.Tag1001, 1001)
[<Fact>]
let LocalTagsTest () =
Assert.Equal<int>(localTags.Tag100, 100)
Everything works as expected, including test execution. The problem I have is with the design-time behavior inside Visual Studio while I write code. I expect some overhead due to the type provider, but the slowness seems frankly excessive. The times reported below are in seconds, measured from pressing the dot (.) key until the IntelliSense property list appears on screen:
providedTags. -> 15
localTags. -> 5
If I comment out or remove the first test's code lines (eliminating any reference to the provided types), then I get:
localTags. -> immediate
If the number of properties is greater, the time seems to increase exponentially rather than linearly, so that at 10000 properties it takes minutes.
Questions are:
Is this normal or am I doing something wrong?
Are there guidelines to achieve a faster response?
If someone is curious about why I need so many properties: I am trying to give data analysts a tool so that they can write F# scripts and get data out of a historian database with more than 10000 tags in its schema.
The issue has been fixed by Don Syme; see
https://github.com/fsprojects/FSharp.TypeProviders.SDK/issues/220
and
https://github.com/fsprojects/FSharp.TypeProviders.SDK/pull/229

Joining twice the same stream

I would like to use the joining collector twice on the same stream to produce a string like Tea:5 - Coffee:3 - Money:10.
Drink is an enum with a BigDecimal attribute (price).
Currently I do it like this:
Map<Drink, Long> groupByDrink = listOfDrinks.stream().collect(groupingBy(identity(),counting()));
String acc = groupByDrink.entrySet().stream().map(ite -> join(":", ite.getKey().code(), ite.getValue().toString())).collect(joining(" - "));
acc += " - Money:" + groupByDrink.entrySet().stream().map(ite -> ite.getKey().price().multiply(valueOf(ite.getValue()))).reduce(ZERO, BigDecimal::add);
I think you are overusing new features.
join(":", ite.getKey().code(), ite.getValue().toString())
bears no advantage over the classical
ite.getKey().code()+":"+ite.getValue()
Besides that, I'm not sure what you mean by “use the joining collector twice on the same stream”. If you want to use the joining collector for the summary element as well, you have to concatenate it as a stream before collecting:
String acc = Stream.concat(
groupByDrink.entrySet().stream()
.map(ite -> ite.getKey().code()+":"+ite.getValue()),
Stream.of("Money:" + groupByDrink.entrySet().stream()
.map(ite -> ite.getKey().price().multiply(valueOf(ite.getValue())))
.reduce(ZERO, BigDecimal::add).toString())
).collect(joining(" - "));
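For reference, a runnable version of that snippet with a hypothetical Drink enum (the code() and price() accessors are assumed from the question; a TreeMap keeps the demo's output order deterministic):

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;
import static java.util.function.Function.identity;
import static java.util.stream.Collectors.*;

public class JoinTwiceDemo {
    // Hypothetical enum matching the accessors used in the question.
    enum Drink {
        TEA("Tea", "1.00"), COFFEE("Coffee", "1.50");
        private final String code;
        private final BigDecimal price;
        Drink(String code, String price) { this.code = code; this.price = new BigDecimal(price); }
        String code() { return code; }
        BigDecimal price() { return price; }
    }

    public static void main(String[] args) {
        List<Drink> listOfDrinks = List.of(Drink.TEA, Drink.TEA, Drink.COFFEE);
        // TreeMap orders entries by enum declaration order for a stable demo.
        Map<Drink, Long> groupByDrink = listOfDrinks.stream()
                .collect(groupingBy(identity(), TreeMap::new, counting()));
        String acc = Stream.concat(
                groupByDrink.entrySet().stream()
                        .map(e -> e.getKey().code() + ":" + e.getValue()),
                Stream.of("Money:" + groupByDrink.entrySet().stream()
                        .map(e -> e.getKey().price().multiply(BigDecimal.valueOf(e.getValue())))
                        .reduce(BigDecimal.ZERO, BigDecimal::add))
        ).collect(joining(" - "));
        System.out.println(acc);
    }
}
```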

How can I create a Bacon.Property representing a property in a referenced object?

I have started playing with Bacon.js, and I have come across a problem I can't find any example for.
I have a set of nodes, which may reference other nodes.
It looks like this in imperative code:
alice = { name: "Alice" }
bob = { name: "Bob" }
carol = { name: "Carol" }
alice.likes = bob
bob.likes = carol
alice.likes.name //=> "Bob"
bob.name = "Bobby"
alice.likes.name //=> "Bobby"
alice.likes = carol
alice.likes.name //=> "Carol"
Now, I would like to have a Property for alice.likes.name, which has to change
whenever alice.likes points to a different object, or the name property of the
current object in alice.likes changes.
I have come up with the code below (LiveScript syntax), which correctly logs 3 messages: Bob, Bobby, Carol.
I'm using Bus for testing purposes.
mkBus = (initialValue) ->
bus = new Bacon.Bus()
property = bus.toProperty(initialValue)
property.push = (newValue) -> bus.push(newValue)
property
alice = { pName: mkBus("Alice"), pLikes: mkBus(null) }
bob = { pName: mkBus("Bob"), pLikes: mkBus(null) }
carol = { pName: mkBus("Carol"), pLikes: mkBus(null) }
alice.pLikes.onValue (person) ->
if person
person.pName.onValue (vName) ->
console.log vName
alice.pLikes.push(bob)
bob.pLikes.push(carol)
# Change name
bob.pName.push("Bobby")
# Change reference
alice.pLikes.push(carol)
Question: How can I make a Property that represents the name of alice.likes.name?
I.e:
nameOfAlicesFavourite.onValue (vName) ->
console.log vName
I'm new to FRP, so let me know if I'm doing something horribly wrong.
Thanks @Bergi for pointing me to flatMap.
flatMap creates a new stream for every event in the source. I used flatMapLatest, so that only the name changes from the latest liked person are transmitted on the output stream.
Here's the code:
nameOfAlicesFavourite = alice.pLikes.flatMapLatest (person) ->
if person
person.pName
else
Bacon.never!
