Is it possible to know the size of a stream without using a terminal operation? - java-8

I have 3 interfaces
public interface IGhOrg {
    int getId();
    String getLogin();
    String getName();
    String getLocation();
    Stream<IGhRepo> getRepos();
}
public interface IGhRepo {
    int getId();
    int getSize();
    int getWatchersCount();
    String getLanguage();
    Stream<IGhUser> getContributors();
}
public interface IGhUser {
    int getId();
    String getLogin();
    String getName();
    String getCompany();
    Stream<IGhOrg> getOrgs();
}
and I need to implement Optional<IGhRepo> highestContributors(Stream<IGhOrg> organizations).
This method should return the IGhRepo with the most contributors (as reported by getContributors()).
I tried this:
Optional<IGhRepo> highestContributors(Stream<IGhOrg> organizations){
    return organizations
            .flatMap(IGhOrg::getRepos)
            .max((repo1,repo2)-> (int)repo1.getContributors().count() - (int)repo2.getContributors().count() );
}
but it gives me the
java.lang.IllegalStateException: stream has already been operated upon or closed
I understand that count() is a terminal operation in Stream but I can't solve this problem, please help!
thanks

Is it possible to know the size of a stream without using a terminal operation?
No, it's not, because streams can be infinite or generate their output on demand. They are not necessarily backed by collections.
but it gives me the
java.lang.IllegalStateException: stream has already been operated upon or closed
That's because you are returning the same stream instance on each method invocation. You should return a new Stream each time instead.
I understand that count() is a terminal operation in Stream but I can't solve this problem, please help!
IMHO you are misusing streams here. In terms of both performance and simplicity, it's much better to return some Collection<XXX> instead of Stream<XXX>.

NO.
It is not possible to know the size of a stream in Java without running a terminal operation.
As mentioned in the Java 8 stream docs:
No storage. A stream is not a data structure that stores elements;
instead, it conveys elements from a source such as a data structure,
an array, a generator function, or an I/O channel, through a pipeline
of computational operations.
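To illustrate the "generator function" case, here is a minimal sketch (the class and method names are made up for this example). The stream is produced on demand by Stream.iterate and has no size of its own; it only becomes countable or collectable once it is bounded with limit:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OnDemandStream {
    // Takes the first n powers of two from an endless, generated stream.
    static List<Integer> firstPowersOfTwo(int n) {
        // No backing collection: elements are computed only when requested.
        Stream<Integer> endless = Stream.iterate(1, x -> x * 2);
        return endless.limit(n) // bound the stream before the terminal operation
                      .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstPowersOfTwo(5)); // prints [1, 2, 4, 8, 16]
    }
}
```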

You don't specify this, but it looks like some or possibly all of the interface methods that return Stream<...> values don't return a fresh stream each time they are called.
This seems problematic to me from an API point of view, as it means each of these streams, and a fair chunk of the object's functionality can be used at most once.
You may be able to solve the particular problem you are having by ensuring that the stream from each object is used only once in the method, something like this:
Optional<IGhRepo> highestContributors(Stream<IGhOrg> organizations) {
    return organizations
            .flatMap(IGhOrg::getRepos)
            .distinct()
            .map(repo -> new AbstractMap.SimpleEntry<>(repo, repo.getContributors().count()))
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
}
Unfortunately it looks like you will now be stuck if you want to (for example) print a list of the contributors, as the stream returned from getContributors() for the returned IGhRepo has already been consumed.
You might want to consider having your implementation objects return a fresh stream each time a stream returning method is called.
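A minimal sketch of that last suggestion, assuming the data lives in in-memory lists. The ListBackedRepo and ListBackedOrg classes are hypothetical, the interfaces are trimmed to the methods used here, and contributors are plain Strings to keep it short. Because every getter call opens a new stream, the comparator may count contributors as often as it needs, and the winning repo's contributors can still be streamed afterwards:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

public class FreshStreams {
    // Trimmed versions of the question's interfaces (assumption: String contributors).
    interface IGhRepo { int getId(); Stream<String> getContributors(); }
    interface IGhOrg { Stream<IGhRepo> getRepos(); }

    // Hypothetical list-backed implementations: each getter call opens a NEW stream.
    static final class ListBackedRepo implements IGhRepo {
        private final int id;
        private final List<String> contributors;
        ListBackedRepo(int id, List<String> contributors) {
            this.id = id;
            this.contributors = contributors;
        }
        public int getId() { return id; }
        public Stream<String> getContributors() { return contributors.stream(); }
    }

    static final class ListBackedOrg implements IGhOrg {
        private final List<IGhRepo> repos;
        ListBackedOrg(List<IGhRepo> repos) { this.repos = repos; }
        public Stream<IGhRepo> getRepos() { return repos.stream(); }
    }

    static Optional<IGhRepo> highestContributors(Stream<IGhOrg> organizations) {
        return organizations
                .flatMap(IGhOrg::getRepos)
                // comparingLong avoids the overflow-prone "a - b" comparator
                .max(Comparator.comparingLong(repo -> repo.getContributors().count()));
    }

    // Builds two sample repos and returns the id of the one with the most contributors.
    static int winnerId() {
        IGhRepo small = new ListBackedRepo(1, Arrays.asList("ann"));
        IGhRepo big = new ListBackedRepo(2, Arrays.asList("bob", "cy", "di"));
        IGhOrg org = new ListBackedOrg(Arrays.asList(small, big));
        IGhRepo winner = highestContributors(Stream.of(org))
                .orElseThrow(IllegalStateException::new);
        // The contributors can still be streamed after the search.
        winner.getContributors().forEach(name -> { });
        return winner.getId();
    }

    public static void main(String[] args) {
        System.out.println(winnerId()); // prints 2
    }
}
```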

You could keep a counter that is incremented per "iteration" using peek. In the example below, the counter is incremented before each item is processed with doSomeLogic:
final var counter = new AtomicInteger();
getStream().peek(item -> counter.incrementAndGet()).forEach(this::doSomeLogic);
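Here is a self-contained version of the same idea, as a sketch: the source list and the sink list are stand-ins for getStream() and doSomeLogic, and the class name is made up. Note that the counter is only meaningful once the terminal operation has finished, so this still does not give you the size up front:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class PeekCounter {
    // Processes every item and returns how many items went through the pipeline.
    static int processAndCount(List<String> items, List<String> sink) {
        AtomicInteger counter = new AtomicInteger();
        items.stream()
             .peek(item -> counter.incrementAndGet()) // runs once per element reaching this stage
             .forEach(sink::add);                     // stand-in for doSomeLogic
        return counter.get(); // read only after the terminal operation has completed
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        System.out.println(processAndCount(Arrays.asList("a", "b", "c"), sink)); // prints 3
    }
}
```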

Related

Spring Webflux: efficiently using Flux and/or Mono stream multiple times (possible?)

I have the method below, where I am calling several ReactiveMongoRepositories in order to receive and process certain documents. Since I am kind of new to Webflux, I am learning as I go.
The code below doesn't feel very efficient to me, as I am opening multiple streams at the same time. This non-blocking way of writing code makes it somewhat complicated to get a value from a stream and reuse that value in the cascaded flatMaps down the line.
In the example below I have to call the userRepository twice, since I want the user at the beginning and then again later. Is there a way to do this more efficiently with Webflux?
public Mono<Guideline> addGuideline(Guideline guideline, String keycloakUserId) {
    Mono<Guideline> guidelineMono = userRepository.findByKeycloakUserId(keycloakUserId)
            .flatMap(user -> {
                return teamRepository.findUserInTeams(user.get_id());
            }).zipWith(instructionRepository.findById(guideline.getInstructionId()))
            .zipWith(userRepository.findByKeycloakUserId(keycloakUserId))
            .flatMap(objects -> {
                User user = objects.getT2();
                Instruction instruction = objects.getT1().getT2();
                Team team = objects.getT1().getT1();
                if (instruction.getTeamId().equals(team.get_id())) {
                    guideline.setAddedByUser(user.get_id());
                    guideline.setTeamId(team.get_id());
                    guideline.setDateAdded(new Date());
                    guideline.setGuidelineStatus(GuidelineStatus.ACTIVE);
                    guideline.setGuidelineSteps(Arrays.asList());
                    return guidelineRepository.save(guideline);
                } else {
                    return Mono.error(new InstructionDoesntBelongOrExistException("Unable to add, since this Instruction does not belong to you or doesn't exist anymore!"));
                }
            });
    return guidelineMono;
}
I'll post my earlier comment as an answer. If anyone feels like writing the actual code for it, go ahead.
I don't have access to an IDE at the moment, so I can't write an example, but you could start by fetching the Instruction from the database.
Keep that Mono<Instruction>. Then fetch your User, flatMap the User to fetch the Team from the database, and flatMap the Team to build a tuple, giving you a Mono<Tuple<User, Team>>.
After that, take your two Monos, use zipWith with a combinator function, and build a Mono<Tuple<User, Team, Instruction>> that you can flatMap over.
So basically: fetch one item, then fetch two items, then combine them into three items. You can create tuples using the Tuples.of(...) function.

How to Extract the String value from MONO/FLUX -

I am new to Reactor programming and need some help with Mono/Flux.
I have a POJO class:
Employee.java
class Employee {
    String name;
}
A service call returns a Mono<Employee> m, and I need to extract the name from that Mono as a String.
Mono<String> name = m.map(value -> value.getName());
But this again returns a Mono, not a String. I need to extract the String value from this Mono.
You should do something like this:
m.block().getName();
This solution doesn't include a null check.
A standard approach would be:
Employee e = m.block();
if (null != e) {
    e.getName();
}
But in reactive style you would rather supply a fallback for the empty case, something like this:
Mono.just(new Employee("Kill"))
        .switchIfEmpty(Mono.defer(() -> Mono.just(new Employee("Bill"))))
        .block()
        .getName();
Keep in mind that blocking operations should be avoided if possible: they block the flow.
You should be avoiding block() because it will block indefinitely until a next signal is received.
You should not think of the reactive container as something that is going to provide your program with an answer. Instead, you need to give it whatever you want to do with that answer. For example:
employeeMono.subscribe(value -> whatYouWantToDoWithName(value.getName()));

There is no way to create a reference to stream & it’s not possible to reuse the same stream multiple times

I was reading an article about Java 8 streams and found:
Java Streams are consumable, so there is no way to create a reference
to stream for future usage. Since the data is on-demand, it’s not
possible to reuse the same stream multiple times.
and, in the same article:
//sequential stream
Stream<Integer> sequentialStream = myList.stream();
//parallel stream
Stream<Integer> parallelStream = myList.parallelStream();
What does "there is no way to create a reference to stream for future usage" mean? Aren't sequentialStream and parallelStream references to streams?
Also, what does "it’s not possible to reuse the same stream multiple times" mean?
What it means is that every time you need to operate on a stream, you must make a new one.
So you cannot, for example, have something like:
class Person {
    private Stream<String> phoneNumbers;
    Stream<String> getPhoneNumbers() {
        return phoneNumbers;
    }
}
and just reuse that one stream whenever you like. Instead, you must have something like:
class Person {
    private List<String> phoneNumbers;
    Stream<String> getPhoneNumbers() {
        return phoneNumbers.stream(); // make a NEW stream over the same data
    }
}
The code snippet you included does just that: it makes two different streams over the same data.
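To see the difference in practice, here is a small sketch (class and method names are made up for the example). Running a second terminal operation on the same Stream instance throws IllegalStateException, while asking the list for a new stream each time works any number of times:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class StreamReuse {
    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondUseFails(List<String> data) {
        Stream<String> stream = data.stream();
        stream.count();         // first terminal operation consumes the stream
        try {
            stream.count();     // second use of the same instance
            return false;
        } catch (IllegalStateException e) {
            return true;        // the stream has already been operated upon
        }
    }

    // Creating a new stream per operation works as often as needed.
    static long countTwice(List<String> data) {
        return data.stream().count() + data.stream().count();
    }

    public static void main(String[] args) {
        List<String> numbers = Arrays.asList("555-0100", "555-0101");
        System.out.println(secondUseFails(numbers)); // prints true
        System.out.println(countTwice(numbers));     // prints 4
    }
}
```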

Java8 streams map - check if all map operations succeeded?

I am trying to map one list to another using streams.
Some elements of the original list fail to map. That is, the mapping function may not be able to find an appropriate new value.
I want to know if any of the mappings has failed. Ideally I would also like to stop the processing once a failure happened.
What I am currently doing is:
The mapping function returns null if there's no mapped value
I filter() to remove nulls from the stream
I collect(), and then
I compare the size of the result to the size of the original list.
For example:
List<String> func(List<String> old, Map<String, String> oldToNew)
{
    List<String> holger = old.stream()
            .map(oldToNew::get)
            .filter(Objects::nonNull)
            .collect(Collectors.toList());
    if (holger.size() < old.size()) {
        // ... appropriate error handling code ...
    }
    else {
        return holger;
    }
}
This is not very elegant. Also, everything is processed even when the whole thing should fail.
Suggestions for a better way of doing it?
Or maybe I should ditch streams altogether and use good old loops?
There is no best solution because that heavily depends on the use case. E.g. if lookup failures are expected to be unlikely or the error handling implies throwing an exception anyway, just throwing an exception at the first failed lookup within the mapping function might indeed be a good choice. Then, no follow-up code has to care about error conditions.
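A minimal sketch of that first option, assuming a failed lookup should abort everything (the class name and the choice of NoSuchElementException are illustrative, not prescribed). The exception is thrown from inside the mapping function, so no further elements are processed after the first failure:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.stream.Collectors;

public class FailFastMapping {
    static List<String> func(List<String> old, Map<String, String> oldToNew) {
        return old.stream()
                .map(key -> {
                    String mapped = oldToNew.get(key);
                    if (mapped == null) {
                        // Aborts the whole pipeline at the first failed lookup.
                        throw new NoSuchElementException("No mapping for key: " + key);
                    }
                    return mapped;
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> oldToNew = new HashMap<>();
        oldToNew.put("a", "A");
        oldToNew.put("b", "B");
        System.out.println(func(Arrays.asList("a", "b"), oldToNew)); // prints [A, B]
    }
}
```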
Another way of handling it might be:
List<String> func(List<String> old, Map<String, String> oldToNew) {
    Map<Boolean,List<String>> map=old.stream()
            .map(oldToNew::get)
            .collect(Collectors.partitioningBy(Objects::nonNull));
    List<String> failed=map.get(false);
    if(!failed.isEmpty())
        throw new IllegalStateException(failed.size()+" lookups failed");
    return map.get(true);
}
This can still be considered optimized for the successful case, as it collects a mostly meaningless list of null values for the failures. But it has the advantage of being able to report the number of failures (unlike a throwing mapping function).
If a detailed error analysis has a high priority, you may use a solution like this:
List<String> func(List<String> old, Map<String, String> oldToNew) {
    Map<Boolean,List<String>> map=old.stream()
            .map(s -> new AbstractMap.SimpleImmutableEntry<>(s, oldToNew.get(s)))
            .collect(Collectors.partitioningBy(e -> e.getValue()!=null,
                    Collectors.mapping(e -> Optional.ofNullable(e.getValue()).orElse(e.getKey()),
                            Collectors.toList())));
    List<String> failed=map.get(false);
    if(!failed.isEmpty())
        throw new IllegalStateException("The following key(s) failed: "+failed);
    return map.get(true);
}
It collects two meaningful lists, containing the failed keys for failed lookups and a list of successfully mapped values. Note that both lists could be returned.
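You could replace your filter with a map(Objects::requireNonNull) step (filter itself needs a predicate, so Objects::requireNonNull cannot be passed to it directly) and catch the NullPointerException outside the stream.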
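A sketch of the requireNonNull idea, using a map step and catching the exception around the whole pipeline. The class name and the Optional return type are choices made for this example. Be aware that the catch block would also swallow a NullPointerException coming from anywhere else in the pipeline:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.stream.Collectors;

public class RequireNonNullMapping {
    // Returns the mapped list, or an empty Optional if any lookup failed.
    static Optional<List<String>> tryMap(List<String> old, Map<String, String> oldToNew) {
        try {
            return Optional.of(old.stream()
                    .map(oldToNew::get)
                    .map(Objects::requireNonNull) // throws NullPointerException on the first null
                    .collect(Collectors.toList()));
        } catch (NullPointerException e) {
            return Optional.empty(); // at least one key had no mapping
        }
    }

    public static void main(String[] args) {
        Map<String, String> oldToNew = new HashMap<>();
        oldToNew.put("a", "A");
        System.out.println(tryMap(Arrays.asList("a"), oldToNew).isPresent());      // prints true
        System.out.println(tryMap(Arrays.asList("a", "x"), oldToNew).isPresent()); // prints false
    }
}
```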

how to choose a field value from a specific stream in storm

public void execute(Tuple input) {
    Object value = input.getValueByField(FIELD_NAME);
    ...
}
When calling getValueByField, how do I specify a particular stream name emitted by previous Bolt/Spout so that particular FIELD_NAME is coming from that stream?
I need to know this because I'm facing the following exception:
InvalidTopologyException(msg:Component: [bolt2-name] subscribes from non-existent stream: [default] of component [bolt1-name])
So, I want to specify a particular stream while calling getValueBy... methods.
I don't remember a way of doing it on a tuple, but you can get the information of who sent you the tuple:
String sourceComponent = tuple.getSourceComponent();
String streamId = tuple.getSourceStreamId();
Then you can use a classic switch/case in java to call a specific method that will know which fields are available.
You can also iterate through fields included in your tuple to check if the field is available but I find this way dirty.
for (String field : tuple.getFields()) {
    // Check something on field...
}
Just found out that the binding to a specific stream could be done while building topology.
The Spout could declare fields to a stream (in declareOutputFields method)
declarer.declareStream(streamName, new Fields(field1, field2));
...and emit value to the stream
collector.emit(streamName, new Values(value1, value2...), msgID);
When a Bolt is added to the topology, it can subscribe to a specific stream from a preceding spout or bolt as follows:
topologyBuilder.setBolt(boltId, new BoltClass(), parallelismLevel)
        .localOrShuffleGrouping(spoutORBoltID, streamID);
The overloaded version of the method localOrShuffleGrouping provides an option to specify streamID as last argument.
