Java-8 sort a collection - java-8

Is there a better way of sorting a collection in Java 8 without first checking whether the collection is empty or null?
if (institutions != null && !institutions.isEmpty()) {
Collections.sort(institutions);
}

Though the question is old, just adding another way of doing it.
First of all, the collection shouldn't be null. Assuming it isn't, you can simply write:
institutions.sort(Comparator.comparing(Institutions::getId));
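As a minimal sketch of the point above: sorting an empty list is a no-op, so only the null case ever needs special handling. The `Institution` class with its `getId` accessor is a hypothetical stand-in for the asker's type, and `sortedById` is an illustrative helper name. Treating null as an empty list is an assumption, not something the question requires.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortByIdSketch {
    // Hypothetical stand-in for the asker's element type.
    static class Institution {
        private final int id;
        Institution(int id) { this.id = id; }
        int getId() { return id; }
    }

    // Returns a sorted copy; a null input is treated as an empty list (an assumption).
    static List<Institution> sortedById(List<Institution> institutions) {
        List<Institution> copy = new ArrayList<>();
        if (institutions != null) {
            copy.addAll(institutions);
        }
        // No isEmpty() check needed: sorting an empty list does nothing.
        copy.sort(Comparator.comparingInt(Institution::getId));
        return copy;
    }

    public static void main(String[] args) {
        List<Institution> list = new ArrayList<>();
        list.add(new Institution(3));
        list.add(new Institution(1));
        list.add(new Institution(2));
        for (Institution i : sortedById(list)) {
            System.out.println(i.getId());
        }
        System.out.println(sortedById(null).size());
    }
}
```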

I can only think of 3 (4) ways:
Use a SortedSet (e.g. TreeSet) and insert the elements there. They will be kept sorted at all times, but every insertion costs O(log n). Also, a set cannot hold equal elements (e.g. 3x 1), so it might not be the best solution.
Then there is the normal Collections.sort(). You don't have to check that your list is empty, however you do have to make sure it is not null. Frankly though, do you ever have a use case where your list is null and you want to sort it? This sounds like it might be a bit of a design issue.
Finally you can use streams to return sorted streams. I wrote up a little test that measures the time of this:
public static void main(String[] args) {
List<Integer> t1 = new ArrayList<>();
List<Integer> t2 = new ArrayList<>();
List<Integer> t3 = new ArrayList<>();
Random random = new Random(); // create the generator once, not on every iteration
for (int i = 0; i < 10_000_000; i++) {
int tmp = random.nextInt();
t1.add(tmp);
t2.add(tmp);
t3.add(tmp);
}
long start = System.currentTimeMillis();
t1.sort(null); // equivalent to Collections.sort() - in place sort
System.out.println("T1 Took: " + (System.currentTimeMillis() - start));
start = System.currentTimeMillis();
List<Integer> sortedT2 = t2.stream().sorted().collect(Collectors.toList());
System.out.println("T2 Took: " + (System.currentTimeMillis() - start));
start = System.currentTimeMillis();
List<Integer> sortedT3 = t3.parallelStream().sorted().collect(Collectors.toList());
System.out.println("T3 Took: " + (System.currentTimeMillis() - start));
}
Sorting random integers gives the following times in milliseconds (on my box, obviously):
Collections.sort() -> 4163
stream.sorted() -> 4485
parallelStream().sorted() -> 1620
A few points:
Collections.sort() and List#sort will sort the existing list in place. The streaming API (both parallel and normal) will create new sorted lists.
Again, the source list can be empty, but it can't be null. Parallel streams appear to be the quickest here; however, you have to keep the pitfalls of parallel streams in mind. For some background, see e.g.: Should I always use a parallel stream when possible?
Finally, if you want to check for null before, you can write your own static helper, for example:
public static <T extends Comparable<? super T>> void safeSort(final List<T> myList) {
    if (myList != null) {
        myList.sort(null); // a null comparator means natural ordering
    }
}
public static <T> void safeSort(final List<T> myList, Comparator<? super T> comparator) {
    if (myList != null) {
        myList.sort(comparator);
    }
}
I hope that helps!
Edit: Another Java8 advantage for sorting is to supply your comparator as lambda:
List<Integer> test = Arrays.asList(4,2,1,3);
test.sort((i1, i2) -> i1.compareTo(i2));
test.forEach(System.out::println);
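To make the earlier points of this answer concrete, here is a small sketch contrasting the three approaches on the same input: an in-place sort, a sorted copy via a stream, and a TreeSet that drops equal elements. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

class SortWaysSketch {
    // Sorts the given list in place and returns the same instance.
    static List<Integer> inPlace(List<Integer> list) {
        list.sort(null); // null comparator means natural ordering
        return list;
    }

    // Leaves the source untouched and returns a new sorted list.
    static List<Integer> sortedCopy(List<Integer> list) {
        return list.stream().sorted().collect(Collectors.toList());
    }

    // A TreeSet keeps elements sorted but silently drops equal elements.
    static List<Integer> viaTreeSet(List<Integer> list) {
        return new ArrayList<>(new TreeSet<>(list));
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>(Arrays.asList(3, 1, 1, 2));
        System.out.println(sortedCopy(source)); // new list, source still unsorted
        System.out.println(viaTreeSet(source)); // duplicates removed
        System.out.println(inPlace(source));    // source itself is now sorted
    }
}
```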

Related

Redis Bulk Fetch of 5-10 MB From HMSET

Use Case: our data structure is like below:
tp1 "i1" : {object hash}, "i2" : {object hash}
tp2 "i3" : {object hash}, "i4" : {object hash}
tp1 and tp2 are hash keys written with HMSET; we refer to them as tp keys.
Each tp key can have 100-200 records in it. And each hash has a size of 1-1.5 KB.
Below is our implementation with spring data:
public Map<String, Map<String, T>> getAllMulti(List<String> keys) {
long start = System.currentTimeMillis();
log.info("Redis pipeline fetch started with keys size :{}", keys.size());
Map<String, Map<String, T>> responseMap = new HashMap<>();
if (CollectionUtils.isNotEmpty(keys)) {
List<Object> resultSet = redisTemplate.executePipelined((RedisCallback<T>) connection -> {
for (String key : keys) {
connection.hGetAll(key.getBytes());
}
return null;
});
responseMap = IntStream.range(0, keys.size())
.boxed()
.collect(Collectors.toMap(keys::get, i -> (Map<String, T>) resultSet.get(i)));
}
long timeTaken = System.currentTimeMillis() - start;
log.info("Time taken in redis pipeline fetch: {}", timeTaken);
return responseMap;
}
Objective: we want to load the hashes of around 500-600 tp keys, and we chose a Redis pipeline for this. However, as the number of tp keys grows, the response time increases significantly, and it is also inconsistent.
To improve response time we tried compression and MessagePack, with no benefit.
We also tried splitting the tp keys into multiple partitions and running the implementation above on them in parallel. We observed that when the total number of tp keys is small, each batch is fast; as the total grows, the time taken per batch increases even though each batch holds the same number of keys.
Any help/lead will be appreciated. Thanks
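For readers who want to reproduce the partitioned variant described above, here is a rough sketch of the batching logic only. It does not talk to Redis: `fetchBatch` is a hypothetical function standing in for one pipelined call such as `getAllMulti`, and the batch size and the use of the common fork-join pool are assumptions rather than the asker's actual setup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class BatchFetchSketch {
    // Splits keys into consecutive batches of at most batchSize elements.
    static List<List<String>> partition(List<String> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int from = 0; from < keys.size(); from += batchSize) {
            int to = Math.min(from + batchSize, keys.size());
            batches.add(new ArrayList<>(keys.subList(from, to)));
        }
        return batches;
    }

    // Runs the hypothetical fetchBatch once per batch in parallel and merges the results.
    static <V> Map<String, V> fetchAll(List<String> keys, int batchSize,
                                       Function<List<String>, Map<String, V>> fetchBatch) {
        List<CompletableFuture<Map<String, V>>> futures = new ArrayList<>();
        for (List<String> batch : partition(keys, batchSize)) {
            futures.add(CompletableFuture.supplyAsync(() -> fetchBatch.apply(batch)));
        }
        Map<String, V> merged = new HashMap<>();
        for (CompletableFuture<Map<String, V>> future : futures) {
            merged.putAll(future.join()); // waits for this batch to finish
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("tp1", "tp2", "tp3", "tp4", "tp5");
        // Stand-in for the real pipelined Redis call: maps each key to its length.
        Function<List<String>, Map<String, Integer>> fakeFetch = batch -> {
            Map<String, Integer> out = new HashMap<>();
            for (String key : batch) {
                out.put(key, key.length());
            }
            return out;
        };
        System.out.println(partition(keys, 2));
        System.out.println(fetchAll(keys, 2, fakeFetch).size());
    }
}
```

Timing each batch separately in a harness like this may help show whether the slowdown comes from the server, the network, or deserialization on the client.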

How to get new userinput in a stream while its running using Java8

I need to validate user input, and if it doesn't meet the conditions I need to replace it with correct input. So far I am stuck on two parts. I'm fairly new to Java 8 and not so familiar with all the libraries, so if you can give me advice on where to read up more on these I would appreciate it.
List<String> input = Arrays.asList(args);
List<String> validatedinput = input.stream()
.filter(p -> {
if (p.matches("[0-9, /,]+")) {
return true;
}
System.out.println("The value has to be a positive number and not a character");
//Does the new input actually get saved here?
sc.nextLine();
return false;
}) //And here I am not really sure how to map the String object
.map(String::)
.validatedinput(Collectors.toList());
This type of logic shouldn't be done with streams; a while loop is a better candidate for it.
First, let's partition the data into two lists, one list representing the valid inputs and the other representing invalid inputs:
Map<Boolean, List<String>> resultSet =
Arrays.stream(args)
.collect(Collectors.partitioningBy(s -> s.matches(yourRegex),
Collectors.toCollection(ArrayList::new)));
Then create the while loop to ask the user to correct all their invalid inputs:
int i = 0;
List<String> invalidInputs = resultSet.get(false);
final int size = invalidInputs.size();
while (i < size){
System.out.println("The value --> " + invalidInputs.get(i) +
" has to be positive number and not a character");
String temp = sc.nextLine();
if(temp.matches(yourRegex)){
resultSet.get(true).add(temp);
i++;
}
}
Now, you can collect the list of all the valid inputs and do what you like with it:
List<String> result = resultSet.get(true);
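Putting the pieces above together, a self-contained sketch might look like the following. The pattern `[0-9]+` is an assumption standing in for `yourRegex`, and the `Scanner` is fed from a fixed string in `main` only so that the example runs without a console; in real use it would wrap `System.in`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.stream.Collectors;

class ValidateInputSketch {
    // Assumed pattern: one or more digits. Replace with the real validation regex.
    static final String DIGITS = "[0-9]+";

    static List<String> validate(String[] args, Scanner sc) {
        // Partition into valid (true) and invalid (false) inputs.
        Map<Boolean, List<String>> parts = Arrays.stream(args)
                .collect(Collectors.partitioningBy(s -> s.matches(DIGITS),
                        Collectors.toCollection(ArrayList::new)));
        List<String> valid = parts.get(true);
        for (String bad : parts.get(false)) {
            String replacement;
            do {
                // Keep asking until this invalid value has a valid replacement.
                System.out.println("The value --> " + bad + " has to be a positive number");
                replacement = sc.nextLine();
            } while (!replacement.matches(DIGITS));
            valid.add(replacement);
        }
        return valid;
    }

    public static void main(String[] args) {
        // Simulated user typing "abc" (rejected) and then "7" (accepted).
        Scanner sc = new Scanner("abc\n7\n");
        System.out.println(validate(new String[] {"12", "x", "3"}, sc));
    }
}
```

Note that the replacements are appended after the originally valid values, so the original argument order is not preserved.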

how to convert forEach to lambda

Iterator<Rate> rateIt = rates.iterator();
int lastRateOBP = 0;
while (rateIt.hasNext())
{
Rate rate = rateIt.next();
int currentOBP = rate.getPersonCount();
if (currentOBP == lastRateOBP)
{
rateIt.remove();
continue;
}
lastRateOBP = currentOBP;
}
How can I convert the code above to a lambda using Java 8 streams, e.g. list.stream().filter()...? Note that I need to operate on the list itself.
The simplest solution is
Set<Integer> seen = new HashSet<>();
rates.removeIf(rate -> !seen.add(rate.getPersonCount()));
It utilizes the fact that Set.add returns false if the value is already in the Set, i.e. has already been encountered. Since these are the elements you want to remove, all you have to do is negate the result.
If keeping an arbitrary Rate instance for each group with the same person count is sufficient, there is no sorting needed for this solution.
Like with your original Iterator-based solution, it relies on the mutability of your original Collection.
If you really want distinct and sorted as you say in your comments, then it is as simple as:
TreeSet<Rate> sorted = rates.stream()
.collect(Collectors.toCollection(() ->
new TreeSet<>(Comparator.comparing(Rate::getPersonCount))));
But notice that in your example with an iterator you are not removing all duplicates, only duplicates that are consecutive (I've exemplified that in the comment to your question).
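The difference between the two behaviours is easy to see in a small sketch. For brevity it works on plain person counts instead of `Rate` objects, which is a simplification; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class DedupSketch {
    // Mirrors the iterator loop from the question: a value is dropped only
    // when it equals the value that was kept immediately before it.
    static List<Integer> removeConsecutive(List<Integer> counts) {
        List<Integer> copy = new ArrayList<>(counts);
        Iterator<Integer> it = copy.iterator();
        int last = 0; // same starting value as the original loop
        while (it.hasNext()) {
            int current = it.next();
            if (current == last) {
                it.remove();
                continue;
            }
            last = current;
        }
        return copy;
    }

    // The removeIf approach: a value is dropped if it was seen anywhere before.
    static List<Integer> removeAllDuplicates(List<Integer> counts) {
        List<Integer> copy = new ArrayList<>(counts);
        Set<Integer> seen = new HashSet<>();
        copy.removeIf(c -> !seen.add(c));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> counts = Arrays.asList(1, 1, 2, 1, 3, 3);
        System.out.println(removeConsecutive(counts));   // [1, 2, 1, 3]
        System.out.println(removeAllDuplicates(counts)); // [1, 2, 3]
    }
}
```

The two only agree when equal values are always adjacent, for example when the list is already sorted by person count.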
EDIT
It seems that you want distinct by a Function; or in simpler words you want distinct elements by personCount, but in case of a clash you want to take the max pos.
Such a thing is not yet available in jdk. But it might be, see this.
Since you want them sorted and distinct by key, we can emulate that with:
Collection<Rate> sorted = rates.stream()
.collect(Collectors.toMap(Rate::getPersonCount,
Function.identity(),
(left, right) -> {
return left.getLos() > right.getLos() ? left : right;
},
TreeMap::new))
.values();
System.out.println(sorted);
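A runnable version of this idea, under the assumption that `Rate` exposes `getPersonCount()` and `getLos()` as in the snippet above, could look like this. The `Rate` class here is a hypothetical minimal stand-in, not the asker's real type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class DistinctByKeySketch {
    // Hypothetical minimal Rate with only the two properties used here.
    static class Rate {
        private final int personCount;
        private final int los;
        Rate(int personCount, int los) {
            this.personCount = personCount;
            this.los = los;
        }
        int getPersonCount() { return personCount; }
        int getLos() { return los; }
        @Override public String toString() { return "Rate(" + personCount + ", " + los + ")"; }
    }

    // One Rate per person count, keeping the one with the larger los on a clash,
    // returned in ascending person-count order (the TreeMap's key order).
    static List<Rate> distinctByPersonCount(List<Rate> rates) {
        Map<Integer, Rate> byCount = rates.stream()
                .collect(Collectors.toMap(Rate::getPersonCount,
                        Function.identity(),
                        (left, right) -> left.getLos() > right.getLos() ? left : right,
                        TreeMap::new));
        return new ArrayList<>(byCount.values());
    }

    public static void main(String[] args) {
        List<Rate> rates = Arrays.asList(
                new Rate(2, 5), new Rate(1, 3), new Rate(2, 9), new Rate(1, 1));
        System.out.println(distinctByPersonCount(rates)); // [Rate(1, 3), Rate(2, 9)]
    }
}
```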
On the other hand, if you absolutely need to return a TreeSet to actually denote that these are unique and sorted elements:
TreeSet<Rate> sorted = rates.stream()
.collect(Collectors.collectingAndThen(
Collectors.toMap(Rate::getPersonCount,
Function.identity(),
(left, right) -> {
return left.getLos() > right.getLos() ? left : right;
},
TreeMap::new),
map -> {
TreeSet<Rate> set = new TreeSet<>(Comparator.comparing(Rate::getPersonCount));
set.addAll(map.values());
return set;
}));
This should work if your Rate type has natural ordering (i.e. implements Comparable):
List<Rate> l = rates.stream()
.distinct()
.sorted()
.collect(Collectors.toList());
If not, use a lambda as a custom comparator:
List<Rate> l = rates.stream()
.distinct()
.sorted( (r1,r2) -> ...some code to compare two rates... )
.collect(Collectors.toList());
It may be possible to remove the call to sorted if you just need to remove duplicates.
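One caveat worth spelling out: `distinct()` compares elements with `equals()`, so it only removes duplicates by person count if `Rate` defines equality that way. The sketch below assumes a hypothetical `Rate` whose `equals`, `hashCode` and `compareTo` are all based on `personCount`; the real class may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DistinctSortedSketch {
    // Hypothetical Rate whose equality and natural order both use personCount.
    static final class Rate implements Comparable<Rate> {
        private final int personCount;
        Rate(int personCount) { this.personCount = personCount; }
        int getPersonCount() { return personCount; }

        @Override public boolean equals(Object o) {
            return o instanceof Rate && ((Rate) o).personCount == personCount;
        }
        @Override public int hashCode() { return Integer.hashCode(personCount); }
        @Override public int compareTo(Rate other) {
            return Integer.compare(personCount, other.personCount);
        }
    }

    // Removes equal rates, sorts the rest, and returns their person counts.
    static List<Integer> distinctSortedCounts(List<Rate> rates) {
        return rates.stream()
                .distinct() // relies on equals()/hashCode()
                .sorted()   // relies on compareTo()
                .map(Rate::getPersonCount)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Rate> rates = Arrays.asList(
                new Rate(3), new Rate(1), new Rate(3), new Rate(2), new Rate(1));
        System.out.println(distinctSortedCounts(rates)); // [1, 2, 3]
    }
}
```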

Sorting for Azure DocumentDB

I want to use DocumentDB to store roughly 200.000 documents of the same type. The documents each get an integer id field and I would like to retrieve them paged, in reverse order (highest id first).
So recently I found out there is no sorting for DocumentDB (see also DocumentDB - query result order). Perhaps it is better to go for a different database (such as RavenDB) however, time is pressing and I want to avoid the cost of switching to another database.
The question:
I have been looking at implementing my own sorted index of the documents on the client side (ASP Web API 2). I was thinking of creating a SortedList of key(id) and value(document.selflink). Then I could create a Getter with parameters for count, offset and a predicate to filter the documents. Below I added a quick example.
I just have the feeling this is a bad idea; either slow, costing too many resources or can be better done another way. So I am open for implementation suggestions...
public class SortableDocumentDbRepository
{
private SortedList _sorted = new SortedList();
private readonly string _sortedPropertyName;
private DocumentCollection ReadOrCreateCollection(string databaseLink) {
DocumentCollection col = base.ReadOrCreateCollection(databaseLink);
var docs = Client.CreateDocumentQuery(Collection.DocumentsLink)
.AsEnumerable();
lock (_sorted.SyncRoot) {
foreach (Document doc in docs) {
var propVal = doc.GetPropertyValue<string>(_sortedPropertyName);
if (propVal != null) {
_sorted.Add(propVal, doc.SelfLink);
}
}
}
return col;
}
public List<T> GetItems<T>(int count, int offset, Expression<Func<T, bool>> predicate) {
List<T> result = new List<T>();
lock (_sorted.SyncRoot) {
var values = _sorted.GetValueList();
for (int i = offset; i < _sorted.Count; i++) {
var queryable = predicate != null ?
Client.CreateDocumentQuery<T>(values[i].ToString()).Where(predicate) :
Client.CreateDocumentQuery<T>(values[i].ToString());
T item = queryable.AsEnumerable().FirstOrDefault();
if (item == null || item.Equals(default(T))) continue;
result.Add(item);
if (result.Count >= count) return result;
}
}
return result;
}
}
Microsoft has implemented Sorting:
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-sql-query-reference#bk_orderby_clause
Example: SELECT * FROM c ORDER BY c._ts DESC
As you mentioned, order by unfortunately isn't implemented yet.
Your approach looks reasonable to me.
I see you are using a predicate to narrow the query result set (pulling 200,000 records for any DB will be costly).
Since it looks like you want to order by id, you can also look into setting up a range index on id, which allows you to perform range queries (e.g. < and >) on the id and further narrow the query result set. There is also a range index included by default on the _ts (timestamp) system property on documents that may be helpful in this context.
See: http://azure.microsoft.com/en-us/documentation/articles/documentdb-indexing-policies/
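To illustrate how a range condition on id can drive reverse paging, here is a sketch of the idea in plain Java. It does not call DocumentDB at all: a `TreeSet` of ids stands in for the indexed collection, and `headSet(cursor, false)` plays the role of a range filter such as id < cursor. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class ReversePagingSketch {
    // Returns up to pageSize ids strictly below the cursor, highest first.
    // A null cursor means "start from the highest id".
    static List<Integer> nextPage(NavigableSet<Integer> ids, Integer cursor, int pageSize) {
        NavigableSet<Integer> candidates = (cursor == null) ? ids : ids.headSet(cursor, false);
        List<Integer> page = new ArrayList<>();
        Iterator<Integer> it = candidates.descendingIterator();
        while (it.hasNext() && page.size() < pageSize) {
            page.add(it.next());
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableSet<Integer> ids = new TreeSet<>();
        for (int i = 1; i <= 10; i++) {
            ids.add(i);
        }
        List<Integer> first = nextPage(ids, null, 3);
        System.out.println(first); // [10, 9, 8]
        // The last id of a page becomes the cursor for the next one.
        Integer cursor = first.get(first.size() - 1);
        System.out.println(nextPage(ids, cursor, 3)); // [7, 6, 5]
    }
}
```

Paging by "last id seen" rather than by numeric offset means each page only needs the ids below the cursor, which is what a range index can narrow down.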

How to sort IEnumerable with limited result count? (another implementation of .OrderBy.Take)

I have a binary file which contains more than 100 million objects. I read the file using BinaryReader and yield each object (the file reader and IEnumerable implementation are here: Performance comparison of IEnumerable and raising event for each item in source? )
One of object's properties indicates the object rank (like A5). Assume that I want to get sorted top n objects based on the property.
I saw the code for OrderBy function: it uses QuickSort algorithm. I tried to sort the IEnumerable result with OrderBy and Take(n) function together, but I got OutOfMemory exception, because OrderBy function creates an array with size of total objects count to implement Quicksort.
Actually, the total memory I need is n so there is no need to create a big array. For instance, if I get Take(1000) it will return only 1000 objects and it doesn't depend on the total count of whole objects.
How can I get the result of OrderBy combined with Take? In other words, I need a bounded sorted list whose capacity is defined by the end user.
If you want the top N items from an ordered source using the default LINQ operators, the only option is to load all items into memory, sort them, and select the first N results:
items.OrderBy(condition).Take(N) // Out of memory
If you only want to sort the first N items, simply take them first and then sort them:
items.Take(N).OrderBy(condition)
UPDATE: you can use a buffer that keeps the N largest items in order:
public static IEnumerable<T> TakeOrdered<T, TKey>(
this IEnumerable<T> source, int count, Func<T, TKey> keySelector)
{
Comparer<T, TKey> comparer = new Comparer<T,TKey>(keySelector);
List<T> buffer = new List<T>();
using (var iterator = source.GetEnumerator())
{
while (iterator.MoveNext())
{
T current = iterator.Current;
if (buffer.Count == count)
{
// skip the current item unless it is greater than the smallest buffered item
if (comparer.Compare(current, buffer[0]) <= 0)
continue;
buffer.RemoveAt(0); // remove the minimal item
}
// find index of current item
int index = buffer.BinarySearch(current, comparer);
buffer.Insert(index >= 0 ? index : ~index, current);
}
}
return buffer;
}
This solution also uses custom comparer for items (to compare them by keys):
public class Comparer<T, TKey> : IComparer<T>
{
private readonly Func<T, TKey> _keySelector;
private readonly Comparer<TKey> _comparer = Comparer<TKey>.Default;
public Comparer(Func<T, TKey> keySelector)
{
_keySelector = keySelector;
}
public int Compare(T x, T y)
{
return _comparer.Compare(_keySelector(x), _keySelector(y));
}
}
Sample usage:
string[] items = { "b", "ab", "a", "abcd", "abc", "bcde", "b", "abc", "d" };
var top5byLength = items.TakeOrdered(5, s => s.Length);
var top3byValue = items.TakeOrdered(3, s => s);
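For comparison, the same bounded-buffer idea can be sketched in Java with a min-heap instead of a sorted list. This is a different data structure from the one used in the C# code above (a `PriorityQueue` rather than binary search plus insert), and the names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopNHeapSketch {
    // Returns the n largest items of source in ascending order,
    // holding at most n items in memory at any time.
    static <T> List<T> topN(Iterable<T> source, int n, Comparator<? super T> order) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        PriorityQueue<T> heap = new PriorityQueue<>(n, order); // smallest kept item at the head
        for (T item : source) {
            if (heap.size() < n) {
                heap.add(item);
            } else if (order.compare(item, heap.peek()) > 0) {
                heap.poll();    // evict the current smallest
                heap.add(item);
            }
        }
        List<T> result = new ArrayList<>(heap);
        result.sort(order); // heap iteration order is not sorted
        return result;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("b", "ab", "a", "abcd", "abc", "bcde", "b", "abc", "d");
        System.out.println(topN(items, 3, Comparator.<String>naturalOrder())); // [b, bcde, d]
    }
}
```

Each element costs at most O(log n) heap work, so the whole pass is O(total * log n) time with O(n) memory.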
LINQ does not have a built-in class that lets you take the top n elements without loading the whole collection into memory, but you can definitely build it yourself.
One simple approach would be using a SortedDictionary of lists: keep adding elements to it until you hit the limit of n. After that, compare each element that you are about to add against the smallest element that you have found so far (i.e. dict.Keys.First()). If the new element is smaller, discard it; otherwise, remove the smallest element and add the new one.
At the end of the loop your sorted dictionary will have at most n elements, and they would be sorted according to the comparator that you set on the dictionary.
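A rough Java analogue of this approach uses a `TreeMap` from key to a list of items, which plays the role of the SortedDictionary of lists. This is a sketch under assumptions: the names are made up, and an item whose key merely equals the current smallest key is discarded rather than swapped in, which is one reasonable reading of the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Function;

class TopNTreeMapSketch {
    // Keeps at most n items, grouped by key in a sorted map, and returns
    // them in ascending key order.
    static <T, K extends Comparable<K>> List<T> topN(Iterable<T> source, int n, Function<T, K> key) {
        List<T> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        TreeMap<K, List<T>> buckets = new TreeMap<>();
        int size = 0;
        for (T item : source) {
            K k = key.apply(item);
            if (size == n) {
                if (k.compareTo(buckets.firstKey()) <= 0) {
                    continue; // not larger than the current smallest key: discard
                }
                // Evict one item from the bucket with the smallest key.
                List<T> smallest = buckets.firstEntry().getValue();
                smallest.remove(smallest.size() - 1);
                if (smallest.isEmpty()) {
                    buckets.pollFirstEntry();
                }
                size--;
            }
            buckets.computeIfAbsent(k, unused -> new ArrayList<>()).add(item);
            size++;
        }
        for (List<T> bucket : buckets.values()) {
            result.addAll(bucket);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("b", "ab", "a", "abcd", "abc", "bcde", "b", "abc", "d");
        Function<String, Integer> byLength = String::length;
        System.out.println(topN(items, 5, byLength)); // [ab, abc, abc, abcd, bcde]
    }
}
```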

Resources