Java 8 - Multiple Group By Into Map of Collection

I'm trying to do a groupingBy on two attributes of an object with Java streams. That's easy enough as has been documented by some answers:
products.stream().collect(
Collectors.groupingBy(Product::getUpc,
Collectors.groupingBy(Product::getChannelIdentifier)));
For example, the snippet above produces a map of maps of the form
Map<String, Map<String, List<Product>>>
where the outer map is keyed by UPC code and each value is a map keyed by channel identifier, which in turn references a list of products.
That's cool, but what if I don't need the nested value to be a map? That is to say, I want to organize the nested collection by ChannelIdentifier, but I only care about the .values() of the map, not the map itself. Is there a way to get a result that matches the following?
Map<String, List<List<Product>>>
Lists or collections... it doesn't matter. Thanks!

The grouping operation unavoidably needs to maintain a Map as it has to track the key values for the grouping. But you can use the values() view directly:
Map<String, Collection<List<Product>>> m=products.stream().collect(
Collectors.groupingBy(Product::getUpc, Collectors.collectingAndThen(
Collectors.groupingBy(Product::getChannelIdentifier), Map::values)));
If the resulting map will have a longer lifetime and you want to reduce the required storage space, or if you need a List, you may copy the view into a list during that step:
Map<String, List<List<Product>>> map=products.stream().collect(
Collectors.groupingBy(Product::getUpc, Collectors.collectingAndThen(
Collectors.groupingBy(Product::getChannelIdentifier),
m -> new ArrayList<>(m.values()) )));
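For anyone who wants to try this out, here is a minimal self-contained sketch of the second variant. The Product class below is a hypothetical stand-in with only the two getters used in the question, and the class and method names (GroupByUpcSketch, groupByUpcThenChannel) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupByUpcSketch {
    // Hypothetical stand-in for the question's Product class.
    static final class Product {
        private final String upc;
        private final String channelIdentifier;

        Product(String upc, String channelIdentifier) {
            this.upc = upc;
            this.channelIdentifier = channelIdentifier;
        }

        String getUpc() { return upc; }
        String getChannelIdentifier() { return channelIdentifier; }
    }

    // Groups by UPC, then by channel, and keeps only the per-channel lists.
    static Map<String, List<List<Product>>> groupByUpcThenChannel(List<Product> products) {
        return products.stream().collect(
            Collectors.groupingBy(Product::getUpc,
                Collectors.collectingAndThen(
                    Collectors.groupingBy(Product::getChannelIdentifier),
                    byChannel -> new ArrayList<>(byChannel.values()))));
    }

    public static void main(String[] args) {
        List<Product> products = Arrays.asList(
            new Product("u1", "web"),
            new Product("u1", "web"),
            new Product("u1", "store"),
            new Product("u2", "web"));

        Map<String, List<List<Product>>> grouped = groupByUpcThenChannel(products);
        // Print how many channel groups each UPC ended up with.
        grouped.forEach((upc, lists) ->
            System.out.println(upc + " -> " + lists.size() + " channel group(s)"));
    }
}
```

The order of the inner lists follows the iteration order of the intermediate map, so it should not be relied on.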


Data Structure - Abstract data type VS Concrete data type

Abstract data type (ADT): organized data and the operations on that data.
Examples: Stack, Queue.
What is the meaning of concrete data type (CDT)?
Please explain with examples.
One way to understand it is that an ADT is a specification of an object with certain methods.
For example if we talk about a List we are referring to an object that performs list operations such as:
add to the beginning
add to the end
insert at position
size
etc..
A "concrete data type" in this context would refer to the actual Data Structure you use to implement the list.
For example, one implementation of a List is to create nodes with a value and next pointer to point to the next node in the list.
Another is to have a value array, and a next array to tell you where the next node is (this is a more popular implementation for parallelism).
And yet another is to have a dynamic array (known as an ArrayList in Java), where you use an array until it fills up, then allocate a larger one (commonly double the size) and copy the values into the new array.
So the concrete data type refers to the data structure actually being used, whereas the ADT is the abstract concept like List, Dictionary, Stack, Queue, Graph, etc..
There are many ways to implement an ADT.
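To make the distinction concrete, here is a rough sketch in Java: one interface plays the role of the ADT and two classes are concrete data types behind it, one built from linked nodes and one from a doubling array. All names (IntSequence, LinkedIntSequence, ArrayIntSequence) are invented for this illustration and the operations are cut down to three.

```java
import java.util.Arrays;

// The ADT: only says which operations exist, not how they are stored.
interface IntSequence {
    void addLast(int value);
    int get(int index);
    int size();
}

// Concrete data type 1: nodes with a value and a next pointer.
final class LinkedIntSequence implements IntSequence {
    private static final class Node {
        final int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    private Node head;
    private Node tail;
    private int size;

    public void addLast(int value) {
        Node node = new Node(value);
        if (head == null) head = node; else tail.next = node;
        tail = node;
        size++;
    }

    public int get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        Node current = head;
        for (int i = 0; i < index; i++) current = current.next; // walk the chain
        return current.value;
    }

    public int size() { return size; }
}

// Concrete data type 2: a dynamic array that doubles when it fills up.
final class ArrayIntSequence implements IntSequence {
    private int[] values = new int[2];
    private int size;

    public void addLast(int value) {
        if (size == values.length) values = Arrays.copyOf(values, values.length * 2);
        values[size++] = value;
    }

    public int get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        return values[index];
    }

    public int size() { return size; }
}

class AdtDemo {
    public static void main(String[] args) {
        // Calling code depends only on the ADT, so either implementation fits.
        for (IntSequence seq : new IntSequence[] { new LinkedIntSequence(), new ArrayIntSequence() }) {
            for (int i = 0; i < 5; i++) seq.addLast(i * 10);
            System.out.println(seq.getClass().getSimpleName() + ": size=" + seq.size() + ", get(3)=" + seq.get(3));
        }
    }
}
```

The two classes behave the same from the outside; they differ in cost (indexed access is a walk in the linked version and a direct lookup in the array version).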

How to achieve dynamic custom fields of different data type using gRPC proto

I'm looking for a way in gRPC/protobuf to implement dynamic fields of different data types for a multi-tenant application.
There can also be any number of dynamic fields, depending on the tenant.
Using map in proto, I can define a separate map for each data type. Is there a more optimized way to achieve this?
Any help on this is appreciated.
There are a few different ways of transferring dynamic content in protobuf. Which is ideal varies depending on your use case. The options are ordered by their dynamism. Less dynamic options normally have better performance.
Use google.protobuf.Any in proto3. This is useful when you want to store arbitrary protobuf messages and is commonly used to provide extension points. It replaces extensions from proto2. Any has a child message and its type, so your application can check at runtime if it understands the type. If your application does not know the type, then it can copy the Any but can't decode its contents. Any cannot directly hold scalar types (like int32), but each scalar has a wrapper message that can be used instead. Because each Any includes the type of the message as a string, it is poorly suited if you need lots of them with small contents.
Use the JSON mapping message google.protobuf.Value. This is useful when you want to store arbitrary schemaless JSON data. Because it does not need to store the full type of its contents, a Value holding a ListValue of number_values (doubles) will be more compact on-the-wire than repeated Any. But if a schema is available, an Any containing a message with repeated double will be more compact on-the-wire than Value.
Use a oneof that contains each permitted type. Commonly a new message type is needed to hold the oneof. This is useful when you can restrict the schema but values have a relationship, like if the position of each value in a list is important and the types in the list are mixed. This is similar to Value but lets you choose your own types. While technically more powerful than Value it is typically used to produce a more constrained data structure. It is equal to or more compact on-the-wire than Value. This requires knowing the needed types ahead-of-time. Example: map<string, MyValue>, where MyValue is:
message MyValue {
oneof kind {
int32 int_value = 1;
string string_value = 2;
}
}
Use a separate field/collection for each type. For each type you can have a separate field in a protobuf message. This is the approach you were considering. This is the most compact on-the-wire and most efficient in memory. You must know the types you are interested in storing ahead of time. Example: map<string, int32> int_values = 1; map<string, string> string_values = 2.

How to join two publishers based on a common attribute and construct a single publisher out of it, in Spring reactor/ web flux?

Suppose I have two fluxes Flux<Class1> and Flux<Class2> and both Class1 and Class2 have a common attribute, say "id".
The use case is to join the two fluxes based on the common attribute "id" and construct a single Flux<Tuple<Class1, Class2>>, similar to joining two sql tables.
-There will always be a 1 to 1 match, for the attribute id, between the two fluxes.
-The fluxes won't contain more than 100 objects.
-The fluxes are not ordered by id.
How do I achieve this in Project Reactor/Spring web flux?
Assuming that:
both collections aren't very big (you can hold them in memory without risking OOM issues)
they're not sorted by id
each element in a collection has its counterpart in the other
First, you should make those Class1, Class2 implement Comparable or at least prepare a comparator implementation that you can use to sort them by their id.
Then you can use the zip operator for that:
Flux<Class1> flux1 = ...
Flux<Class2> flux2 = ...
Flux<Tuple2<Class1,Class2>> zipped = Flux.zip(flux1.sort(comparator1), flux2.sort(comparator2));
Tuple2 is a Reactor core class that lets you access each element of the Tuple like this
Tuple2<Class1,Class2> tuple = ...
Class1 klass1 = tuple.getT1();
Class2 klass2 = tuple.getT2();
In this case, sort will buffer all elements and this might cause memory/latency issues if the collections are large. Depending on how the ordering is done in those collections (let's say the ordering is not guaranteed, but those were batch inserted), you could also buffer some of them (using window) and do the sorting on each window (with sort).
Of course, ideally, being able to fetch both already sorted would avoid buffering data and would improve backpressure support in your application.
I think this should work with the following constraints:
the 2nd Flux needs to emit the same elements to all subscribers since it gets subscribed to over and over again.
this is basically the equivalent of a nested loop join so highly inefficient for large fluxes.
every element of the first Flux has a matching element in the second one.
flux1.flatMap(f1 ->
    flux2.filter(f2 -> f2.id.equals(f1.id))
        .take(1) // take the first element with a matching id
        .map(f2 -> Tuples.of(f1, f2))); // pair them up (reactor.util.function.Tuples)
Written without an IDE; treat it as pseudocode.

java customize a hashmap values

I am working on using a real time application in java, I have a data structure that looks like this.
HashMap<Integer, Object> myMap;
Now, this works really well for storing the data that I need, but it kills me on getting data out. The underlying problem I run into is that if I call
Collection<Object> myObjects = myMap.values();
Iterator<Object> it = myObjects.iterator();
while (it.hasNext()) { Object o = it.next(); }
I declare the iterator and collection as variables in my class and assign them on each iteration, but iterating over the collection is very slow. This is a real-time application, so I need to iterate at least 25 times per second.
Looking at the profiler I see that there is a new instance of the iterator being created every update.
I was thinking of a few ways of changing the HashMap that might fix my problem:
1. cache the iterator somehow although i'm not sure if that's possible.
2. possibly changing the return type of hashmap.values() to return a list instead of a collection
3. use a different data structure but I don't know what I could use.
If this is still open, consider the Google Guava collections. They have things like Multimap for the structures you are describing. OK, these might not be an exact replacement, but they are close:
From the website here: https://code.google.com/p/guava-libraries/wiki/NewCollectionTypesExplained
Every experienced Java programmer has, at one point or another, implemented a Map<K, List<V>> or Map<K, Set<V>>, and dealt with the awkwardness of that structure. For example, Map<K, Set<V>> is a typical way to represent an unlabeled directed graph. Guava's Multimap framework makes it easy to handle a mapping from keys to multiple values. A Multimap is a general way to associate keys with arbitrarily many values.

What is the best data structure for this in-memory lookup table?

I need to store a lookup table as an instance member in one of my classes. The table will be initialized when the object is constructed. Each "row" will have 3 "columns":
StringKey (e.g., "car")
EnumKey (e.g., LookupKeys.Car)
Value (e.g., "This is a car.")
I want to pick the data structure that will yield the best performance for doing lookups either by the StringKey or the EnumKey.
It's kind of awkward having 2 keys for the same dictionary value. I've never encountered this before, so I'm wondering what the norm is for this type of thing.
I could make a Key/Value/Value structure instead of Key/Key/Value, but I'm wondering what type of performance impact that would have.
Am I thinking about this all wrong?
Well ... "Wrong" is a harsh way of putting it. I think that because the most common dictionary is "single key to value", and a lot of effort goes into providing efficient data structures for that (maps), it's often best to just use two of those, sharing the memory for the values if at all possible.
You have two hashmaps.
One from StringKey to value.
One from EnumKey to value.
You do not have to duplicate all the Value instances, those objects can be shared between the two hashmaps.
If it's a LOT of items, you might want to use two treemaps instead of two hashmaps. But the essential principle ("Share the Values") applies to both structures. One set of Values with two maps.
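A rough sketch of the "one set of values, two maps" idea, written in Java (the question's platform may differ; there the same shape would use that platform's dictionary type). The names DualKeyLookup and LookupKey are invented for the example, and EnumMap is just one reasonable choice for the enum-keyed side.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

class DualKeyLookup {
    // Hypothetical enum standing in for the question's LookupKeys.
    enum LookupKey { CAR, BOAT }

    private final Map<String, String> byString = new HashMap<>();
    private final Map<LookupKey, String> byEnum = new EnumMap<>(LookupKey.class);

    // Registers one "row"; both maps point at the same value object.
    void put(String stringKey, LookupKey enumKey, String value) {
        byString.put(stringKey, value);
        byEnum.put(enumKey, value);
    }

    String get(String stringKey) { return byString.get(stringKey); }

    String get(LookupKey enumKey) { return byEnum.get(enumKey); }

    public static void main(String[] args) {
        DualKeyLookup lookup = new DualKeyLookup();
        lookup.put("car", LookupKey.CAR, "This is a car.");
        System.out.println(lookup.get("car"));
        System.out.println(lookup.get(LookupKey.CAR));
    }
}
```

Only the references are stored twice, so the extra memory is the second map's bookkeeping rather than a second copy of every value.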
Is it really necessary to key into the same structure with both types of key? You probably don't need to rebuild a complex data structure yourself. You could do some sort of encapsulation for the lookup table so that you really have two lookup tables if memory is not an issue. You could use this encapsulating structure to simulate being able to pull out the value from the "same" structure with either type of key.
OR
If there is some way to map between the enum value and the string key you could go that route with only having one type of lookup table.
LINQ's ILookup<TKey, TElement> interface may help. Assuming your Dictionary is something like:
Dictionary<carKey, carValue> cars;
You could use:
ILookup<carValue, carKey> lookup = cars.ToLookup(x => x.Value, x => x.Key);
(...actually I think I might have slightly misread the question. An ILookup might still fit the bill, but the key/value pair might need to be the string key and the enum.)
If every value is guaranteed to be accessible by both types of keys, another idea would be to convert one type of key to another. For example:
public Value getValue(String key)
{
    return dictionary.get(key); // normal way
}
public Value getValue(Enum enumKey)
{
    String realKey = toKey(enumKey); // convert the enum to its String key
    return getValue(realKey); // delegate to the String-key lookup
}
You could have your Enum implement a toKey() method that returns their String key, or maybe have another dictionary that maps Enum values to the String counterparts.
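As a sketch of the first option, the enum below carries its own String key through a toKey() method, so a single String-keyed map serves both kinds of lookup. Everything here (EnumKeyedLookup, LookupKey, the "car"/"boat" keys) is assumed for illustration, and the values are plain Strings to keep it short.

```java
import java.util.HashMap;
import java.util.Map;

class EnumKeyedLookup {
    // Hypothetical enum; each constant knows its String counterpart.
    enum LookupKey {
        CAR("car"),
        BOAT("boat");

        private final String key;

        LookupKey(String key) { this.key = key; }

        String toKey() { return key; }
    }

    private final Map<String, String> dictionary = new HashMap<>();

    void put(LookupKey enumKey, String value) {
        dictionary.put(enumKey.toKey(), value);
    }

    // Normal lookup by String key.
    String getValue(String key) {
        return dictionary.get(key);
    }

    // Enum lookup: convert to the String key, then reuse the lookup above.
    String getValue(LookupKey enumKey) {
        return getValue(enumKey.toKey());
    }

    public static void main(String[] args) {
        EnumKeyedLookup lookup = new EnumKeyedLookup();
        lookup.put(LookupKey.CAR, "This is a car.");
        System.out.println(lookup.getValue("car"));
        System.out.println(lookup.getValue(LookupKey.CAR));
    }
}
```

This keeps one map at the cost of an extra method call per enum lookup, and it only works if every enum constant has exactly one String key.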
