Java 8 streams map - check if all map operations succeeded?

I am trying to map one list to another using streams.
Some elements of the original list fail to map. That is, the mapping function may not be able to find an appropriate new value.
I want to know if any of the mappings failed. Ideally, I would also like to stop processing as soon as a failure occurs.
What I am currently doing is:
The mapping function returns null if there's no mapped value
I filter() to remove nulls from the stream
I collect(), and then
I compare the size of the result to the size of the original list.
For example:
List<String> func(List<String> old, Map<String, String> oldToNew)
{
    List<String> holger = old.stream()
        .map(oldToNew::get)
        .filter(Objects::nonNull)
        .collect(Collectors.toList());
    if (holger.size() < old.size()) {
        // ... appropriate error handling code ...
        throw new IllegalStateException("some mappings failed"); // placeholder
    }
    else {
        return holger;
    }
}
This is not very elegant. Also, the entire list gets processed even when the whole thing is already bound to fail.
Suggestions for a better way of doing it?
Or maybe I should ditch streams altogether and use good old loops?

There is no best solution because that heavily depends on the use case. E.g. if lookup failures are expected to be unlikely or the error handling implies throwing an exception anyway, just throwing an exception at the first failed lookup within the mapping function might indeed be a good choice. Then, no follow-up code has to care about error conditions.
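As an illustration, a minimal sketch of that fail-fast variant (the exception type is my assumption; use whatever fits the caller):

List<String> func(List<String> old, Map<String, String> oldToNew) {
    return old.stream()
        .map(s -> {
            String mapped = oldToNew.get(s);
            if (mapped == null) // abort on the first failed lookup
                throw new IllegalStateException("lookup failed for: " + s);
            return mapped;
        })
        .collect(Collectors.toList());
}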
Another way of handling it might be:
List<String> func(List<String> old, Map<String, String> oldToNew) {
    Map<Boolean, List<String>> map = old.stream()
        .map(oldToNew::get)
        .collect(Collectors.partitioningBy(Objects::nonNull));
    List<String> failed = map.get(false);
    if (!failed.isEmpty())
        throw new IllegalStateException(failed.size() + " lookups failed");
    return map.get(true);
}
This can still be considered optimized for the successful case, as it collects a mostly meaningless list of null values for the failures. But it has the advantage of being able to report the number of failures (unlike a throwing map function).
If a detailed error analysis has a high priority, you may use a solution like this:
List<String> func(List<String> old, Map<String, String> oldToNew) {
    Map<Boolean, List<String>> map = old.stream()
        .map(s -> new AbstractMap.SimpleImmutableEntry<>(s, oldToNew.get(s)))
        .collect(Collectors.partitioningBy(e -> e.getValue() != null,
            Collectors.mapping(e -> Optional.ofNullable(e.getValue()).orElse(e.getKey()),
                Collectors.toList())));
    List<String> failed = map.get(false);
    if (!failed.isEmpty())
        throw new IllegalStateException("The following key(s) failed: " + failed);
    return map.get(true);
}
It collects two meaningful lists, containing the failed keys for failed lookups and a list of successfully mapped values. Note that both lists could be returned.

You could change your filter to Objects::requireNonNull and catch a NullPointerException outside the stream.
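Since Objects::requireNonNull returns its argument rather than a boolean, it actually fits a map step rather than filter; a hedged sketch of the idea:

List<String> func(List<String> old, Map<String, String> oldToNew) {
    try {
        return old.stream()
            .map(oldToNew::get)
            .map(Objects::requireNonNull) // throws NPE on the first failed lookup
            .collect(Collectors.toList());
    } catch (NullPointerException e) {
        // ... appropriate error handling code ...
        throw new IllegalStateException("at least one lookup failed", e);
    }
}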

Related

Spring Webflux: efficiently using Flux and/or Mono stream multiple times (possible?)

I have the method below, where I am calling several ReactiveMongoRepositories in order to receive and process certain documents. Since I am kind of new to Webflux, I am learning as I go.
To my feeling, the code below isn't very efficient, as I am opening multiple streams at the same time. This non-blocking way of writing code somehow makes it complicated to get a value from a stream and reuse that value in the cascaded flatMaps down the line.
In the example below I have to call the userRepository twice, since I want the user at the beginning and then again later. Is there a possibility to do this more efficiently with Webflux?
public Mono<Guideline> addGuideline(Guideline guideline, String keycloakUserId) {
    Mono<Guideline> guidelineMono = userRepository.findByKeycloakUserId(keycloakUserId)
        .flatMap(user -> {
            return teamRepository.findUserInTeams(user.get_id());
        }).zipWith(instructionRepository.findById(guideline.getInstructionId()))
        .zipWith(userRepository.findByKeycloakUserId(keycloakUserId))
        .flatMap(objects -> {
            User user = objects.getT2();
            Instruction instruction = objects.getT1().getT2();
            Team team = objects.getT1().getT1();
            if (instruction.getTeamId().equals(team.get_id())) {
                guideline.setAddedByUser(user.get_id());
                guideline.setTeamId(team.get_id());
                guideline.setDateAdded(new Date());
                guideline.setGuidelineStatus(GuidelineStatus.ACTIVE);
                guideline.setGuidelineSteps(Arrays.asList());
                return guidelineRepository.save(guideline);
            } else {
                return Mono.error(new InstructionDoesntBelongOrExistException("Unable to add, since this Instruction does not belong to you or doesn't exist anymore!"));
            }
        });
    return guidelineMono;
}
I'll post my earlier comment as an answer. If anyone feels like writing the correct code for it, go ahead.
I don't have access to an IDE at the moment, so I can't write an example, but you could start by fetching the Instruction from the database.
Keep that Mono<Instruction>; then fetch your User, flatMap it, and fetch the Team from the database. Then flatMap the Team and build a Mono<Tuple2<User, Team>>.
After that, take your two Monos and use zipWith with a combinator function to build a Mono<Tuple3<User, Team, Instruction>> that you can flatMap over.
So basically: fetch one item, then fetch two items, then combine into three items. You can create tuples using the Tuples.of(...) function.
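An untested sketch of that recipe (assuming the repositories and domain types from the question, plus Reactor's Tuples/Tuple2/Tuple3 helpers); note the user is fetched only once:

public Mono<Guideline> addGuideline(Guideline guideline, String keycloakUserId) {
    // 1) fetch one item: the instruction
    Mono<Instruction> instructionMono =
        instructionRepository.findById(guideline.getInstructionId());
    // 2) fetch the user once, then derive the team from it, keeping both
    Mono<Tuple2<User, Team>> userAndTeamMono =
        userRepository.findByKeycloakUserId(keycloakUserId)
            .flatMap(user -> teamRepository.findUserInTeams(user.get_id())
                .map(team -> Tuples.of(user, team)));
    // 3) combine into three items and flatMap over the result
    return userAndTeamMono
        .zipWith(instructionMono, (userAndTeam, instruction) ->
            Tuples.of(userAndTeam.getT1(), userAndTeam.getT2(), instruction))
        .flatMap(t -> {
            User user = t.getT1();
            Team team = t.getT2();
            Instruction instruction = t.getT3();
            if (!instruction.getTeamId().equals(team.get_id())) {
                return Mono.error(new InstructionDoesntBelongOrExistException(
                    "Unable to add, since this Instruction does not belong to you or doesn't exist anymore!"));
            }
            guideline.setAddedByUser(user.get_id());
            guideline.setTeamId(team.get_id());
            guideline.setDateAdded(new Date());
            guideline.setGuidelineStatus(GuidelineStatus.ACTIVE);
            guideline.setGuidelineSteps(Arrays.asList());
            return guidelineRepository.save(guideline);
        });
}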

Extracting all children belonging to a specific parent in GraphQL

I am using GraphQL and Java. I need to extract all the children belonging to a specific parent. I have used the approach below, but it fetches only the parent and none of the children.
schema {
    query: Query
}

type LearningResource {
    id: ID
    name: String
    type: String
    children: [LearningResource]
}

type Query {
    fetchLearningResource: LearningResource
}
@Component
public class LearningResourceDataFetcher implements DataFetcher {
    @Override
    public LearningResource get(DataFetchingEnvironment dataFetchingEnvironment) {
        LearningResource lr3 = new LearningResource();
        lr3.setId("id-03");
        lr3.setName("Resource-3");
        lr3.setType("Book");
        LearningResource lr2 = new LearningResource();
        lr2.setId("id-02");
        lr2.setName("Resource-2");
        lr2.setType("Paper");
        LearningResource lr1 = new LearningResource();
        lr1.setId("id-01");
        lr1.setName("Resource-1");
        lr1.setType("Paper");
        List<LearningResource> learningResources = new ArrayList<>();
        learningResources.add(lr2);
        learningResources.add(lr3);
        lr1.setChildren(learningResources);
        return lr1;
    }
}
return RuntimeWiring.newRuntimeWiring().type("Query", typeWiring -> typeWiring.dataFetcher("fetchLearningResource", learningResourceDataFetcher)).build();
My controller endpoint:
@RequestMapping(value = "/queryType", method = RequestMethod.POST)
public ResponseEntity query(@RequestBody String query) {
    System.out.println(query);
    ExecutionResult result = graphQL.execute(query);
    System.out.println(result.getErrors());
    System.out.println(result.getData().toString());
    return ResponseEntity.ok(result.getData());
}
My request would be like below:
{
    fetchLearningResource
    {
        name
    }
}
Can anybody please help me sort this out?
Because I get asked this question a lot in real life, I'll answer it in detail here so people have an easier time googling (and I have something to point at).
As noted in the comments, the selection for each level has to be explicit and there is no notion of an infinitely recursive query like get everything under a node to the bottom (or get all children of this parent recursively to the bottom).
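For example, fetching two levels of children against the schema above means spelling out each level in the query:

{
    fetchLearningResource {
        name
        children {
            name
            children {
                name
            }
        }
    }
}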
The reason is mostly that allowing such queries could easily put you in a dangerous situation: a user would be able to request the entire object graph from the server in one easy go! For any non-trivial data size, this would kill the server and saturate the network in no time. Additionally, what would happen once a recursive relationship is encountered?
Still, there is a semi-controlled escape-hatch you could use here. If the scope in which you need everything is limited (and it really should be), you could map the output type of a specific query as a (complex) scalar.
In your case, this would mean mapping LearningResource as a scalar. Then, fetchLearningResource would effectively be returning a JSON blob, where the blob would happen to be all the children and their children recursively. Query resolution doesn't descend deeper once a scalar field is reached, as scalars are leaf nodes, so it can't keep resolving the children level-by-level. This means you'd have to recursively fetch everything in one go, by yourself, as the GraphQL engine can't help you here. It also means sub-selections become impossible (as scalars can't have sub-selections - again, they're leaf nodes), so the client would always get all the children and all the fields from each child back. If you still need the ability to limit the selection in certain cases, you can expose 2 different queries e.g. fetchLearningResource and fetchAllLearningResources, where the former would be mapped as it is now, and the latter would return the scalar as explained.
An object scalar implementation is provided by the graphql-java ExtendedScalars project.
The schema could then look like:
schema {
    query: Query
}

scalar Object

type Query {
    fetchLearningResource: Object
}
And you'd use the method above to produce the scalar implementation:
RuntimeWiring.newRuntimeWiring()
    .scalar(ExtendedScalars.Object) // register the scalar implementation
    .type("Query", typeWiring -> typeWiring.dataFetcher("fetchLearningResource", learningResourceDataFetcher))
    .build();
Depending on how you process the results of this query, the DataFetcher for fetchLearningResource may need to turn the resulting object into a map-of-maps (JSON-like object) before returning to the client. If you simply JSON-serialize the result anyway, you can likely skip this. Note that you're side-stepping all safety mechanisms here and must take care not to produce enormous results. By extension, if you need this in many places, you're very likely using a completely wrong technology for your problem.
I have not tested this with your code myself, so I might have skipped something important, but this should be enough to get you (or anyone googling) onto the right track (if you're sure this is the right track).
UPDATE: I've seen someone implement a custom Instrumentation that rewrites the query immediately after it's parsed, and adds all fields to the selection set if no field had already been selected, recursively. This effectively allows them to select everything implicitly.
In graphql-java v11 and prior, you could mutate the parsed query (represented by the Document class); as of v12 this is no longer possible, but instrumentations in turn gain the ability to replace the Document explicitly via the new instrumentDocument method.
Of course, this only makes sense if your schema is such that it cannot be exploited, or you fully control the client so there's no danger. You could also do it only selectively for some types, but that would be extremely confusing to use.

Java 8 JPA Repository Stream produce two (or more) results?

I have a Java 8 stream being returned by a Spring Data JPA repository. I don't think my use case is all that unusual; there are two (actually three in my case) collections I would like to gather from the resulting stream.
Set<Long> ids = // initialized
try (Stream<SomeDatabaseEntity> someDatabaseEntityStream =
         someDatabaseEntityRepository.findSomeDatabaseEntitiesStream(ids)) {
    Set<Long> theAlphaComponentIds = someDatabaseEntityStream
        .map(v -> v.getAlphaComponentId())
        .collect(Collectors.toSet());
    // operations on 'theAlphaComponentIds' here
}
I need to pull out the 'Beta' objects and do some work on those too. So I think I had to repeat the code, which seems completely wrong:
try (Stream<SomeDatabaseEntity> someDatabaseEntityStream =
         someDatabaseEntityRepository.findSomeDatabaseEntitiesStream(ids)) {
    Set<BetaComponent> theBetaComponents = someDatabaseEntityStream
        .map(v -> v.getBetaComponent())
        .collect(Collectors.toSet());
    // operations on 'theBetaComponents' here
}
These two code blocks occur serially in the processing. Is there a clean way to get both Sets from processing the Stream only once? Note: I do not want some kludgy solution that makes up a wrapper class for the Alphas and Betas, as they don't really belong together.
You can always refactor code by putting the common parts into a method and turning the uncommon parts into parameters. E.g.
public <T> Set<T> getAll(Set<Long> ids, Function<SomeDatabaseEntity, T> f)
{
    try (Stream<SomeDatabaseEntity> someDatabaseEntityStream =
             someDatabaseEntityRepository.findSomeDatabaseEntitiesStream(ids)) {
        return someDatabaseEntityStream.map(f).collect(Collectors.toSet());
    }
}
usable via
Set<Long> theAlphaComponentIds = getAll(ids, v -> v.getAlphaComponentId());
// operations on 'theAlphaComponentIds' here
and
Set<BetaComponent> theBetaComponents = getAll(ids, v -> v.getBetaComponent());
// operations on 'theBetaComponents' here
Note that this pulls the “operations on … here” parts out of the try block, which is a good thing, as it implies that the associated resources are released earlier. This requires that BetaComponent can be processed independently of the Stream’s underlying resources (otherwise, you shouldn’t collect it into a Set anyway). For the Longs, we know for sure that they can be processed independently.
Of course, you could process the result out of the try block even without moving the common code into a method. Whether the original code bears a duplication that requires this refactoring is debatable. Actually, the operation consists of a single statement within a try block that only looks big due to the verbose identifiers. Ask yourself whether you would still deem the refactoring necessary if the code looked like
Set<Long> alphaIDs, ids = // initialized
try (Stream<SomeDatabaseEntity> s = repo.findSomeDatabaseEntitiesStream(ids)) {
    alphaIDs = s.map(v -> v.getAlphaComponentId()).collect(Collectors.toSet());
}
// operations on 'alphaIDs' here
Well, different developers may come to different conclusions…
If you want to reduce the number of repository queries, you can simply store the result of the query:
List<SomeDatabaseEntity> entities;
try (Stream<SomeDatabaseEntity> someDatabaseEntityStream =
         someDatabaseEntityRepository.findSomeDatabaseEntitiesStream(ids)) {
    entities = someDatabaseEntityStream.collect(Collectors.toList());
}
Set<Long> theAlphaComponentIds = entities.stream()
    .map(v -> v.getAlphaComponentId()).collect(Collectors.toSet());
// operations on 'theAlphaComponentIds' here
Set<BetaComponent> theBetaComponents = entities.stream()
    .map(v -> v.getBetaComponent()).collect(Collectors.toSet());
// operations on 'theBetaComponents' here
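If a single pass matters more than reusing the entity list, a hedged variant (same assumed repository API) fills both sets in one forEach:

Set<Long> theAlphaComponentIds = new HashSet<>();
Set<BetaComponent> theBetaComponents = new HashSet<>();
try (Stream<SomeDatabaseEntity> s =
         someDatabaseEntityRepository.findSomeDatabaseEntitiesStream(ids)) {
    s.forEach(v -> {
        theAlphaComponentIds.add(v.getAlphaComponentId()); // collect both in one pass
        theBetaComponents.add(v.getBetaComponent());
    });
}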

Is it possible to know the size of a stream without using a terminal operation?

I have 3 interfaces
public interface IGhOrg {
    int getId();
    String getLogin();
    String getName();
    String getLocation();
    Stream<IGhRepo> getRepos();
}

public interface IGhRepo {
    int getId();
    int getSize();
    int getWatchersCount();
    String getLanguage();
    Stream<IGhUser> getContributors();
}

public interface IGhUser {
    int getId();
    String getLogin();
    String getName();
    String getCompany();
    Stream<IGhOrg> getOrgs();
}
and I need to implement Optional<IGhRepo> highestContributors(Stream<IGhOrg> organizations).
This method returns the IGhRepo with the most contributors (per getContributors()).
I tried this
Optional<IGhRepo> highestContributors(Stream<IGhOrg> organizations) {
    return organizations
        .flatMap(IGhOrg::getRepos)
        .max((repo1, repo2) -> (int) repo1.getContributors().count() - (int) repo2.getContributors().count());
}
but it gives me the
java.lang.IllegalStateException: stream has already been operated upon or closed
I understand that count() is a terminal operation on Stream, but I can't solve this problem. Please help!
thanks
Is it possible to know the size of a stream without using a terminal operation?
No, it's not, because streams can be infinite or generate output on demand. They are not necessarily backed by collections.
but it gives me the
java.lang.IllegalStateException: stream has already been operated upon or closed
That's because you are returning the same stream instance on each method invocation. You should return a new Stream instead.
I understand that count() is a terminal operation in Stream but I can't solve this problem, please help!
IMHO you are misusing streams here. Performance- and simplicity-wise, it's much better to return some Collection<XXX> instead of Stream<XXX>.
No. It is not possible to know the size of a stream in Java.
As mentioned in the Java 8 Stream docs:
No storage. A stream is not a data structure that stores elements; instead, it conveys elements from a source such as a data structure, an array, a generator function, or an I/O channel, through a pipeline of computational operations.
You don't specify this, but it looks like some or possibly all of the interface methods that return Stream<...> values don't return a fresh stream each time they are called.
This seems problematic to me from an API point of view, as it means each of these streams, and a fair chunk of the object's functionality, can be used at most once.
You may be able to solve the particular problem you are having by ensuring that the stream from each object is used only once in the method, something like this:
Optional<IGhRepo> highestContributors(Stream<IGhOrg> organizations) {
    return organizations
        .flatMap(IGhOrg::getRepos)
        .distinct()
        .map(repo -> new AbstractMap.SimpleEntry<>(repo, repo.getContributors().count()))
        .max(Map.Entry.comparingByValue())
        .map(Map.Entry::getKey);
}
Unfortunately it looks like you will now be stuck if you want to (for example) print a list of the contributors, as the stream returned from getContributors() for the returned IGhRepo has already been consumed.
You might want to consider having your implementation objects return a fresh stream each time a stream returning method is called.
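A hedged sketch of such an implementation (trimmed to the members the IGhRepo interface above requires; the backing list and the addContributor helper are assumptions):

class GhRepo implements IGhRepo {
    private final int id, size, watchersCount;
    private final String language;
    private final List<IGhUser> contributors = new ArrayList<>();

    GhRepo(int id, int size, int watchersCount, String language) {
        this.id = id;
        this.size = size;
        this.watchersCount = watchersCount;
        this.language = language;
    }

    void addContributor(IGhUser user) { contributors.add(user); }

    @Override public int getId() { return id; }
    @Override public int getSize() { return size; }
    @Override public int getWatchersCount() { return watchersCount; }
    @Override public String getLanguage() { return language; }

    @Override
    public Stream<IGhUser> getContributors() {
        return contributors.stream(); // a fresh Stream on every invocation
    }
}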
You could keep a counter that is incremented per "iteration" using peek. In the example below the counter is incremented before every item is processed with doSomeLogic
final var counter = new AtomicInteger();
getStream().peek(item -> counter.incrementAndGet()).forEach(this::doSomeLogic);

Best practice for incorrect parameters on a remove method

So I have an abstract data type called RegionModel with a series of values (Region), each mapped to an index. It's possible to remove a number of regions by calling:
regionModel.removeRegions(index, numberOfRegionsToRemove);
My question is what's the best way to handle a call when the index is valid:
(between 0 (inclusive) and the number of Regions in the model (exclusive))
but the numberOfRegionsToRemove is invalid:
(index + numberOfRegionsToRemove > the number of regions in the model)
Is it best to throw an exception like IllegalArgumentException or just to remove as many Regions as I can (all the regions from index to the end of the model)?
Sub-question: if I throw an exception what's the recommended way to unit test that the call threw the exception and left the model untouched (I'm using Java and JUnit here but I guess this isn't a Java specific question).
Typically, for structures like this, you have a remove method which takes an index, and if that index is outside the bounds of the items in the structure, an exception is thrown.
That being said, you should be consistent with whatever the remove method that takes a single index does. If it simply ignores incorrect indexes, then ignore yours too if the range exceeds (or even starts before) the indexes of the items in your structure.
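For the exception route, a hedged sketch of the up-front validation (assuming the model is backed by a List<Region> named regions):

public void removeRegions(int index, int numberOfRegionsToRemove) {
    if (index < 0 || index >= regions.size())
        throw new IndexOutOfBoundsException("index: " + index);
    if (numberOfRegionsToRemove < 0
            || index + numberOfRegionsToRemove > regions.size())
        throw new IllegalArgumentException(
            "numberOfRegionsToRemove: " + numberOfRegionsToRemove);
    // validate everything before mutating, so a failed call leaves the model untouched
    regions.subList(index, index + numberOfRegionsToRemove).clear();
}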
I agree with Mitchel and casperOne -- an Exception makes the most sense.
As far as unit testing is concerned, JUnit 4 allows you to test for expected exceptions directly:
http://www.ibm.com/developerworks/java/library/j-junit4.html
You would need only to pass parameters which are guaranteed to cause the exception, and add the correct annotation (@Test(expected=IllegalArgumentException.class)) to the JUnit test method.
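For instance (a hedged sketch against the removeRegions signature from the question, with regionModel assumed to be instantiated):

@Test(expected = IllegalArgumentException.class)
public void removeRegionsRejectsTooLargeCount() {
    regionModel.removeRegions(0, regionModel.length() + 1); // count extends past the end
}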
Edit: As Tom Martin mentioned, JUnit 4 is a decent-sized step away from JUnit 3. It is, however, possible to also test exceptions using JUnit 3. It's just not as easy.
One of the ways I've tested exceptions is by using a try/catch block within the class itself, and embedding Assert statements within it.
Here's a simple example -- it's not complete (e.g. regionModel is assumed to be instantiated), but it should get the idea across:
public void testRemoveRegionsInvalidInputs() {
    int originalSize = regionModel.length();
    int index = 0;
    int numberOfRegionsToRemove = 1000; // > regionModel's current size
    try {
        regionModel.removeRegions(index, numberOfRegionsToRemove);
        // Since the exception jumps straight into the 'catch' block, this code
        // only runs if the IllegalArgumentException wasn't thrown
        Assert.assertTrue("Exception not thrown!", false);
    }
    catch (IllegalArgumentException e) {
        Assert.assertTrue("Exception thrown, but regionModel was modified",
            regionModel.length() == originalSize);
    }
    catch (Exception e) {
        Assert.assertTrue("Incorrect exception thrown", false);
    }
}
I would say that an exception such as IllegalArgumentException would be the best way to go here. If the calling code is not passing workable values, you wouldn't necessarily want to trust that it really meant to remove what it asked for.
