Java-8 parallelStream(...) -> fill ArrayList - java-8

I have tried this code:
final List<ScheduleContainer> scheduleContainers = new ArrayList<>();
scheduleResponseContent.getSchedules().parallelStream().forEach(s -> scheduleContainers.addAll(s));
With parallelStream I get either an ArrayIndexOutOfBoundsException or a NullPointerException because some entries in scheduleContainers are null.
With ... .stream()... everything works fine.
My question now is whether there is a possibility to fix this, or did I misuse parallelStream?

Yes, you are misusing parallelStream. First of all, as you have already said twice in your previous question, you should use stream(), and not parallelStream(), by default. Going parallel has an intrinsic cost that usually makes things less efficient than a simple sequential stream, unless you have a massive amount of data to process and the processing of each element takes time. You should have a performance problem, and measure whether a parallel stream solves it, before using one. There's also a much bigger chance of screwing up with a parallel stream, as your post shows.
Read Should I always use a parallel stream when possible? for more arguments.
Second, this code is not thread-safe at all, since it uses several concurrent threads to add to a thread-unsafe ArrayList. It becomes safe if you let collect() create the final list for you instead of using forEach() and adding things to the list yourself.
The code should be
List<ScheduleContainer> scheduleContainers =
scheduleResponseContent.getSchedules()
.stream()
.flatMap(s -> s.stream())
.collect(Collectors.toList());

Not sure about the cause of the error, but there are better ways to use the Stream API to create a List from multiple input Lists.
final List<ScheduleContainer> scheduleContainers =
scheduleResponseContent.getSchedules()
.parallelStream()
.flatMap(s->s.stream()) // assuming getSchedules() returns some
// Collection<ScheduleContainer>, based
// on your use of addAll(s)
.collect(Collectors.toList());
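
As a minimal sketch of why collect() is the safe choice, assuming plain Integer lists in place of ScheduleContainer (the class name FlattenDemo, the helper flatten and the sample data are made up for illustration), the following flattens nested lists on a parallel stream without sharing a mutable ArrayList between threads:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FlattenDemo {

    // Flattens nested lists. collect() lets each worker thread fill its own
    // intermediate container and merges them afterwards, so no shared
    // ArrayList is mutated concurrently.
    static <T> List<T> flatten(List<List<T>> nested) {
        return nested.parallelStream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for scheduleResponseContent.getSchedules()
        List<List<Integer>> schedules = Arrays.asList(
                Arrays.asList(1, 2, 3),
                Arrays.asList(4, 5),
                Arrays.asList(6));
        System.out.println(flatten(schedules)); // prints [1, 2, 3, 4, 5, 6]
    }
}
```

Because Collectors.toList() is an ordered, non-concurrent collector, the result keeps the encounter order of the source lists even when the stream runs in parallel.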

Related

Leveraging spring to reduce DB calls

I have a data piece that is:
foo{
string: one
string: two
list<string>: listOne
list<string>: listTwo
}
such that in the DB one is associated with multiple entries of listOne.
not much background, I'm at a loss as to where to even look for answers. I received feedback to try to eliminate a jdbcTemplate.query during a code review with a "there may be a way to reduce this using @Autowired".
no code to share, I just need a place to start looking for answers. I've been on the spring website and I don't see anything that looks like I can use it. and I didn't see any google results that resemble what I'm looking for.
I should probably preface this with the fact that I'm a new dev so even a simple answer is likely not something I've tried. so this came about because my queries for listOne and listTwo are returning columns. so I first tried using a mapper with the jdbcTemplate.query() that returned a string. but jdbc didn't like that. so I ended up returning a list from the mapper. then jdbc turns those answers into a list<list<string>>, I then afterwards loop through those list<list<string>> to convert them to a list<string> and store them in foo. in my mind an ideal solution allows me to combine the two queries and the mapper looks like (pseudo code):
public foo fooMapper implements<RowMapper>(){
foo.one = resultSet.get("thingOne")
foo.two = resultSet.get("thingTwo")
foo.listOne = resultSet.get("[a portion of the column]listThingOne")
foo.listTwo = resultSet.get("[a portion of the column]listThingTwo")
return foo;
}
it should be noted that the result set is mono-directional, I found out when I tried using a string[] instead of a list.

How to return a viewEntryCollection in random order

I have the following code
var vec:ViewEntryCollection = database.getView("view").getAllEntriesByKey("Mykey",true)
how can I make "vec" in random order using SSJS (or java) so that I get a new order every time?
How about having a secondary sort column on the view with a formula of @Unique? You would need to refresh the view each time, and performance may not be great if the view is big.
Considering the average collection size, I would loop through the collection and add each item to a Java list or a JavaScript array.
If you go Java you can use Collections.shuffle.
If you go JavaScript you can use well-established shuffle functions/algorithms.
For better performance, do NOT keep collection entries in memory. First, make list/array of UNIDs from your view. That will be the slowest part. Then pick any random number and pick desired number of UNIDs from the list/array. Call getDocumentByUnid or initialize (say 10) datasources.
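
A minimal sketch of the Java route, assuming the UNIDs have already been read from the view into a list of strings (the class ShuffleDemo, the helper pickRandom and the sample UNIDs are hypothetical, and count is assumed to be non-negative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ShuffleDemo {

    // Returns up to 'count' UNIDs in random order; the input list is left untouched.
    static List<String> pickRandom(List<String> unids, int count) {
        List<String> copy = new ArrayList<>(unids);
        Collections.shuffle(copy); // new random order on every call
        return new ArrayList<>(copy.subList(0, Math.min(count, copy.size())));
    }

    public static void main(String[] args) {
        // Hypothetical UNIDs; in practice they would be collected from the view entries
        List<String> unids = Arrays.asList("UNID-A", "UNID-B", "UNID-C", "UNID-D");
        System.out.println(pickRandom(unids, 2));
    }
}
```

Each returned UNID can then be passed to getDocumentByUnid, so only the documents that are actually shown need to be loaded.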

Get last value from incomplete observable

There is an incomplete observable which can have or not have a replay of n values. I would like to get the last value from it - or just the next one if there is none yet.
This works for first available value with first() and take(1) (example):
possiblyReplayedIncomplteObservable.first().toPromise().then(val => ...);
But for the last value both last() and takeLast(1) wait for observable completion - not the desirable behaviour here.
How can this be solved? Is there a specific operator for that?
I had a solution for ReplaySubject(2) that 'drains' the sequence to get the latest element and if the sequence is empty simply takes the last element, yet, it was cumbersome and did not scale well (for example, if you decide to increase the replay size to 3). I then remembered that Replay/Behavior subjects tend to be hard to manage when they are piped. The simplest solution to that is to create a 'shadow' sequence and pipe your ReplaySubject into it (instead of creating it by transformation/operation on your ReplaySubject), hence:
var subject$ = new Rx.ReplaySubject(3);
var lastValue$ = new Rx.ReplaySubject(1);
subject$.subscribe(lastValue$); // short hand for subject$.subscribe(v => lastValue$.next(v))
lastValue$.take(1).toPromise().then(...);
========== Old solutions, ignoring the ReplaySubject(2) =================
After reading the comment below, the correct code is:
Rx.Observable.combineLatest(possiblyReplayedIncomplteObservable).take(1).subscribe(...)
and not
Rx.Observable.combineLatest(possiblyReplayedIncomplteObservable).subscribe(...)
This is due to the fact the promise is a "one time" observable. I think the toPromise() code resolves the result only on completion.
The take(1) will not affect your original stream since it operates on the new stream which is created by combineLatest.
And actually, the simplest way is:
possiblyReplayedIncomplteObservable.take(1).toPromise().then(...)

groupingBy operation in Java-8

I'm trying to rewrite the well-known example of Spark's text classification (http://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/) in Java 8.
I have a problem - in this code I'm making some data preparations for getting idfs of all words in all files:
termDocsRdd.collect().stream().flatMap(doc -> doc.getTerms().stream()
.map(term -> new ImmutableMap.Builder<String, String>()
.put(doc.getName(),term)
.build())).distinct()
And I'm stuck on the groupBy operation. (I need to group this by term, so each term must be a key and the value must be a sequence of documents).
In Scala this operation looks very simple - .groupBy(_._2).
But how can I do this in Java?
I tried to write something like:
.groupingBy(term -> term, mapping((Document) d -> d.getDocNameContainsTerm(term), toList()));
but it's incorrect...
Does anybody know how to write it in Java?
Thank You very much.
If I understand you correctly, you want to do something like this:
(import static java.util.stream.Collectors.*;)
Map<Term, Set<Document>> collect = termDocsRdd.collect().stream().flatMap(
doc -> doc.getTerms().stream().map(term -> new AbstractMap.SimpleEntry<>(doc, term)))
.collect(groupingBy(Map.Entry::getValue, mapping(Map.Entry::getKey, toSet())));
The use of Map.Entry/ AbstractMap.SimpleEntry is due to the absence of a standard Pair<K,V> class in Java-8. Map.Entry implementations can fulfill this role but at the cost of having unintuitive and verbose type and method names (regarding the task of serving as Pair implementation).
If you are using the current Eclipse version (I tested with LunaSR1 20140925) with its limited type inference, you have to help the compiler a little bit:
Map<Term, Set<Document>> collect = termDocsRdd.collect().stream().flatMap(
doc -> doc.getTerms().stream().<Map.Entry<Document,Term>>map(term -> new AbstractMap.SimpleEntry<>(doc, term)))
.collect(groupingBy(Map.Entry::getValue, mapping(Map.Entry::getKey, toSet())));
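
To try the grouping without Spark, here is a self-contained sketch. It assumes a hypothetical Doc class with a name and a list of String terms in place of the question's document and term types, and it maps each term to the names of the documents containing it:

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;

import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.toSet;

class GroupByTermDemo {

    // Hypothetical stand-in for the question's document type
    static final class Doc {
        final String name;
        final List<String> terms;

        Doc(String name, String... terms) {
            this.name = name;
            this.terms = Arrays.asList(terms);
        }
    }

    // Builds (document name, term) pairs, then groups them by term.
    static Map<String, Set<String>> docsByTerm(List<Doc> docs) {
        return docs.stream()
                .flatMap(doc -> doc.terms.stream()
                        .<Map.Entry<String, String>>map(
                                term -> new AbstractMap.SimpleEntry<>(doc.name, term)))
                .collect(groupingBy(Map.Entry::getValue,
                        mapping(Map.Entry::getKey, toSet())));
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc("a.txt", "spark", "java"),
                new Doc("b.txt", "java"));
        // "java" maps to both documents, "spark" only to a.txt
        System.out.println(docsByTerm(docs));
    }
}
```

Collecting into a Set removes duplicate document names per term, which also makes the distinct() call from the question unnecessary here.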

What kind of data structure will be best for storing a key-value pair where the value will be a String for some key and a List<String> for some keys?

For example, key 1 will have values "A","B","C" but key 2 will have value "D". If I use
Map<String, List<String>>
I need to populate the List<String> even when I have only single String value.
What data structure should be used in this case?
Map<String,List<String>> would be the standard way to do it (using a size-1 list when there is only a single item).
You could also have something like Map<String, Object> (which should work in either Java or presumably C#, to name two), where the value is either List<String> or String, but this would be fairly bad practice: there are readability issues (you can't tell what Object represents just from seeing the type) and the casts are only checked at runtime, which isn't ideal, among other things.
It does, however, depend on what type of queries you plan to run. Map<String,Set<String>> might be a good idea if you plan on doing existence checks on the values and there can be many of them. Set<StringPair> (where StringPair is a class with 2 String members) is another consideration if there are plenty of keys with only 1 mapped value. There are plenty of solutions which would be more appropriate under various circumstances - it basically comes down to looking at the type of queries you want to perform and picking an appropriate structure according to that.
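
A minimal sketch of the standard Map<String, List<String>> approach, using Map.computeIfAbsent from Java 8 so that single values and multiple values are stored the same way (the class MultiValueDemo and the helper put are made up for illustration):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MultiValueDemo {

    // Adds a value under a key, creating the backing list on first use.
    static void put(Map<String, List<String>> map, String key, String value) {
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        Map<String, List<String>> map = new LinkedHashMap<>();
        put(map, "1", "A");
        put(map, "1", "B");
        put(map, "1", "C");
        put(map, "2", "D"); // a single value simply ends up in a size-1 list
        System.out.println(map); // prints {1=[A, B, C], 2=[D]}
    }
}
```

With the helper in place, callers never have to create or check for the list themselves, which removes most of the inconvenience of the size-1 case.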
