Comparator.compareBoolean() the same as Comparator.compare()? - java-8

How can I write this
Comparator<Item> sort = (i1, i2) -> Boolean.compare(i2.isOpen(), i1.isOpen());
to something like this (code does not work):
Comparator<Item> sort = Comparator.comparing(Item::isOpen).reversed();
The comparing methods have nothing like Comparator.comparingBool(). Comparator.comparing returns int and not "Item".

Why can't you write it like this?
Comparator<Item> sort = Comparator.comparing(Item::isOpen);
Under the hood, Boolean.compareTo is called, which in turn delegates to Boolean.compare:
public static int compare(boolean x, boolean y) {
return (x == y) ? 0 : (x ? 1 : -1);
}
And this statement, 'Comparator.comparing returns int and not "Item"', makes little sense. Comparator.comparing must return a Comparator<T>; in your case it correctly returns a Comparator<Item>.

The overloads comparingInt, comparingLong, and comparingDouble exist for performance reasons only. They are semantically identical to the unspecialized comparing method, so using comparing instead of comparingXXX has the same outcome but might incur boxing overhead; the actual implications depend on the particular execution environment.
In the case of boolean values, we can predict that the overhead will be negligible, as the method Boolean.valueOf always returns either Boolean.TRUE or Boolean.FALSE and never creates new instances. So even if a particular JVM fails to inline the entire code, it does not depend on the presence of Escape Analysis in the optimizer.
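The caching claim is easy to check. The sketch below (the class name is illustrative only) relies on the fact that autoboxing a boolean goes through Boolean.valueOf, which always hands back Boolean.TRUE or Boolean.FALSE, so boxing the key extracted by Item::isOpen never allocates:

```java
// Illustrative check (hypothetical class name): boxing a boolean never creates a new object.
class BooleanBoxingDemo {
    static boolean boxesToCanonical(boolean b) {
        Boolean boxed = b;                                    // autoboxing calls Boolean.valueOf(b)
        Boolean canonical = b ? Boolean.TRUE : Boolean.FALSE;
        return boxed == canonical;                            // identity, not equals(), on purpose
    }

    public static void main(String[] args) {
        System.out.println(boxesToCanonical(true));  // true
        System.out.println(boxesToCanonical(false)); // true
    }
}
```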
As you already figured out, reversing a comparator is implemented by swapping the argument internally, just like you did manually in your lambda expression.
Note that it is still possible to create a comparator fusing the reversal and an unboxed comparison without having to repeat the isOpen() expression:
Comparator<Item> sort = Comparator.comparingInt(i -> i.isOpen()? 0: 1);
but, as said, it’s unlikely to have a significantly higher performance than the Comparator.comparing(Item::isOpen).reversed() approach.
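To check that the three forms discussed here really agree, a minimal sketch can compare them on every open/closed combination. The Item class is a hypothetical stand-in that only has the isOpen() flag:

```java
import java.util.Comparator;

class OpenFirstComparators {
    // Hypothetical stand-in for the question's Item: only isOpen() matters.
    static final class Item {
        private final boolean open;
        Item(boolean open) { this.open = open; }
        boolean isOpen() { return open; }
    }

    // Hand-written version from the question: arguments swapped so open items sort first.
    static final Comparator<Item> MANUAL = (i1, i2) -> Boolean.compare(i2.isOpen(), i1.isOpen());
    // Boxed key extraction; reversed() swaps the arguments internally.
    static final Comparator<Item> REVERSED = Comparator.comparing(Item::isOpen).reversed();
    // Unboxed key with the reversal folded in: open -> 0, closed -> 1.
    static final Comparator<Item> AS_INT = Comparator.comparingInt(i -> i.isOpen() ? 0 : 1);

    // Returns true if all three comparators give the same sign for every pair of flags.
    static boolean allAgree() {
        boolean[] flags = {false, true};
        for (boolean a : flags) {
            for (boolean b : flags) {
                Item x = new Item(a);
                Item y = new Item(b);
                int expected = Integer.signum(MANUAL.compare(x, y));
                if (expected != Integer.signum(REVERSED.compare(x, y))
                        || expected != Integer.signum(AS_INT.compare(x, y))) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allAgree()); // true
    }
}
```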
But note that if you have a boolean sort criterion and care about maximum performance, you may consider replacing the general-purpose sort algorithm with a bucket sort variant. For example,
if you have a Stream, replace
List<Item> result = /* stream of Item */
.sorted(Comparator.comparing(Item::isOpen).reversed())
.collect(Collectors.toList());
with
Map<Boolean,List<Item>> map = /* stream of Item */
.collect(Collectors.partitioningBy(Item::isOpen,
Collectors.toCollection(ArrayList::new)));
List<Item> result = map.get(true);
result.addAll(map.get(false));
or, if you have a List, replace
list.sort(Comparator.comparing(Item::isOpen).reversed());
with
ArrayList<Item> temp = new ArrayList<>(list.size());
list.removeIf(item -> !item.isOpen() && temp.add(item));
list.addAll(temp);
etc.
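Because Stream.sorted is documented to be stable for ordered streams, the bucket variants above should produce exactly the same order as the sort they replace. A small self-contained check, using a hypothetical Item with a name so the order is visible:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitionVsSort {
    // Hypothetical item with a name (to make the order visible) and an open flag.
    static final class Item {
        final String name;
        final boolean open;
        Item(String name, boolean open) { this.name = name; this.open = open; }
        boolean isOpen() { return open; }
        @Override public String toString() { return name; }
    }

    static List<Item> sample() {
        return new ArrayList<>(Arrays.asList(new Item("a", false), new Item("b", true),
                new Item("c", false), new Item("d", true)));
    }

    // General-purpose sort; Stream.sorted is stable for ordered streams.
    static List<Item> bySort(List<Item> items) {
        return items.stream()
                .sorted(Comparator.comparing(Item::isOpen).reversed())
                .collect(Collectors.toList());
    }

    // Bucket variant for streams: one pass into two lists, then concatenate.
    static List<Item> byPartition(List<Item> items) {
        Map<Boolean, List<Item>> map = items.stream()
                .collect(Collectors.partitioningBy(Item::isOpen,
                        Collectors.toCollection(ArrayList::new)));
        List<Item> result = map.get(true);
        result.addAll(map.get(false));
        return result;
    }

    // Bucket variant for lists: move closed items to a temporary list, then append them.
    static void inPlace(List<Item> list) {
        ArrayList<Item> temp = new ArrayList<>(list.size());
        list.removeIf(item -> !item.isOpen() && temp.add(item));
        list.addAll(temp);
    }

    public static void main(String[] args) {
        System.out.println(bySort(sample()));      // [b, d, a, c]
        System.out.println(byPartition(sample())); // [b, d, a, c]
        List<Item> list = sample();
        inPlace(list);
        System.out.println(list);                  // [b, d, a, c]
    }
}
```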

Use the comparing overload that takes a key extractor and a key comparator:
Comparator<Item> comparator =
Comparator.comparing(Item::isOpen, Boolean::compare).reversed();
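A short usage sketch of that overload, with a hypothetical Item class. It also shows Comparator.reverseOrder() as the key comparator, an alternative way to get open items first without calling reversed(); both variants still box the key, so the performance remarks about comparing apply here too:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class KeyComparatorDemo {
    // Hypothetical item with a name (to make the order visible) and an open flag.
    static final class Item {
        final String name;
        final boolean open;
        Item(String name, boolean open) { this.name = name; this.open = open; }
        boolean isOpen() { return open; }
        @Override public String toString() { return name; }
    }

    // The answer's form: compare the extracted keys with Boolean::compare, then reverse.
    static final Comparator<Item> WITH_COMPARE =
            Comparator.comparing(Item::isOpen, Boolean::compare).reversed();
    // Alternative: let the key comparator do the reversal.
    static final Comparator<Item> WITH_REVERSE_ORDER =
            Comparator.comparing(Item::isOpen, Comparator.reverseOrder());

    static List<Item> sortedCopy(List<Item> items, Comparator<Item> order) {
        List<Item> copy = new ArrayList<>(items);
        copy.sort(order); // List.sort is stable, so equal keys keep their input order
        return copy;
    }

    static List<Item> sample() {
        return Arrays.asList(new Item("a", false), new Item("b", true),
                new Item("c", false), new Item("d", true));
    }

    public static void main(String[] args) {
        System.out.println(sortedCopy(sample(), WITH_COMPARE));       // [b, d, a, c]
        System.out.println(sortedCopy(sample(), WITH_REVERSE_ORDER)); // [b, d, a, c]
    }
}
```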


Is there any performance benefit of using Arrays.stream() over iterating on an array?

I need to iterate over all the enum values, check whether they were used to construct an int (called input), and if so, add them to a Set (called usefulEnums). I can either use the Streams API or iterate over all the enums to do this task. Is there any benefit of using Arrays.stream() over the traditional approach of iterating over the values() array?
enum TestEnum { VALUE1, VALUE2, VALUE3 };
Set<TestEnum> usefulEnums = new HashSet<>();
Arrays.stream(TestEnum.values())
.filter(t -> (input & t.getValue()) != 0)
.forEach(usefulEnums::add);
for (TestEnum t : TestEnum.values()) {
if ((input & t.getValue()) != 0) {
usefulEnums.add(t);
}
}
If you care about efficiency, you should consider:
Set<TestEnum> usefulEnums = EnumSet.allOf(TestEnum.class);
usefulEnums.removeIf(t -> (input & t.getValue()) == 0);
Note that when you have to iterate over all enum constants of a type, using EnumSet.allOf(EnumType.class).stream() avoids the array creation of EnumType.values() entirely. However, most enum types don’t have enough constants for this to make a difference. Further, the JVM’s optimizer may remove the temporary array creation anyway.
But for this specific task, where the result is supposed to be a Set<TestEnum>, using an EnumSet instead of a HashSet may even improve subsequent operations working with the Set. Creating an EnumSet holding all constants and removing unintended constants, as in the solution above, means just initializing a long with 0b111, followed by clearing the bits of nonmatching elements.
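Here is a runnable version of that comparison. The question never shows how getValue() is defined, so the enum below assumes (hypothetically) one bit per constant; the last line of main also illustrates that values() hands out a fresh array on each call:

```java
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;

class EnumFlagsDemo {
    // Hypothetical bit values: the question does not show how getValue() is defined.
    enum TestEnum {
        VALUE1(1), VALUE2(2), VALUE3(4);

        private final int value;
        TestEnum(int value) { this.value = value; }
        int getValue() { return value; }
    }

    // The question's loop, collecting into a HashSet.
    static Set<TestEnum> withLoop(int input) {
        Set<TestEnum> result = new HashSet<>();
        for (TestEnum t : TestEnum.values()) {
            if ((input & t.getValue()) != 0) {
                result.add(t);
            }
        }
        return result;
    }

    // EnumSet approach: start with all constants (one bit each), then clear the non-matching ones.
    static Set<TestEnum> withEnumSet(int input) {
        Set<TestEnum> result = EnumSet.allOf(TestEnum.class);
        result.removeIf(t -> (input & t.getValue()) == 0);
        return result;
    }

    public static void main(String[] args) {
        int input = 0b101; // VALUE1 | VALUE3 under the assumed bit values
        System.out.println(withEnumSet(input));                         // [VALUE1, VALUE3]
        System.out.println(withLoop(input).equals(withEnumSet(input))); // true
        // Each values() call returns a new array copy.
        System.out.println(TestEnum.values() != TestEnum.values());     // true
    }
}
```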
For this short operation, the for loop is going to be faster (by nanoseconds), but to me the stream operation is more expressive: it tells exactly what is being done here. It's like reading diagonally.
Also, you could collect directly into a HashSet:
Set<TestEnum> usefulEnums = Arrays.stream(TestEnum.values())
.filter(t -> (input & t.getValue()) != 0)
.collect(Collectors.toCollection(HashSet::new));
Valuable input from Holger as usual makes this even nicer:
EnumSet<TestEnum> filtered = EnumSet.allOf(TestEnum.class).stream()
.filter(t -> (input & t.getValue()) != 0)
.collect(Collectors.toCollection(() -> EnumSet.noneOf(TestEnum.class)));

performance of many if statements/switch cases

If I had literally thousands of simple if statements or switch statements,
e.g.:
if 'a':
return 1
if 'b':
return 2
if 'c':
return 3
...
...
Would a long series of trivial if statements perform better than searching a list for the value? I assumed that because every if statement must be tested until the desired output is found (worst case O(n)), it would perform the same as searching through a list. This is just an assumption; I have no evidence to prove it, and I am curious to know.
You could put these things into delegates stored in a map whose key is the input you've specified.
C# Example:
// declare a map. The input(key) is a char, and we have a function that will return an
// integer based on that char. The function may do something more complicated.
var map = new Dictionary<char, Func<char, int>>();
// Add some:
map['a'] = (c) => { return 1; };
map['b'] = (c) => { return 2; };
map['c'] = (c) => { return 3; };
// etc... ad infinitum.
Now that we have this map, we can quite cleanly return something based on the input:
public int Test(char c)
{
Func<char, int> func;
if(map.TryGetValue(c, out func))
return func(c);
return 0;
}
In the above code, we can call Test and it will find the appropriate function to call (if present). This approach is better (imho) than a list as you'd have to potentially search the entire list to find the desired input.
This depends on the language and the compiler/interpreter you use. In many interpreted languages, the performance will be the same; in other languages, the switch statement gives the compiler crucial additional information that it can use to optimize the code.
In C, for instance, I expect a long switch statement like the one you present to use a lookup table under the hood, avoiding explicit comparison with all the different values. With that, your switch decision takes the same time, no matter how many cases you have. A compiler might also hardcode a binary search for the matching case. These optimizations are typically not performed when evaluating a long else if() ladder.
In any case, I repeat, it depends on the interpreter/compiler: if your compiler optimized else if() ladders but not switch statements, what it could do with a switch statement would be quite irrelevant. However, for mainstream languages, you should be able to expect all constructs to be optimized.
Apart from that, I advise using a switch statement wherever applicable; it carries a lot more semantic information to the reader than an equivalent else if() ladder.
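For comparison, a Java sketch of both options (method names and values are illustrative only). For dense case labels like these, javac typically emits a tableswitch instruction, i.e. a jump table, which corresponds to the lookup-table behavior described above for C; the map variant is the Java counterpart of the C# Dictionary answer:

```java
import java.util.HashMap;
import java.util.Map;

class DispatchDemo {
    // Dense char labels: javac typically compiles this to a tableswitch (jump table).
    static int viaSwitch(char c) {
        switch (c) {
            case 'a': return 1;
            case 'b': return 2;
            case 'c': return 3;
            default:  return 0;
        }
    }

    // Hash-based dispatch: average O(1) lookup, independent of the number of entries.
    private static final Map<Character, Integer> TABLE = new HashMap<>();
    static {
        TABLE.put('a', 1);
        TABLE.put('b', 2);
        TABLE.put('c', 3);
    }

    static int viaMap(char c) {
        return TABLE.getOrDefault(c, 0); // 0 for unknown input, like the switch's default
    }

    public static void main(String[] args) {
        for (char c = 'a'; c <= 'd'; c++) {
            System.out.println(c + " -> " + viaSwitch(c) + " / " + viaMap(c));
        }
    }
}
```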

Good algorithm to turn stl map into sorted list of the keys based on a numeric value

I have an STL map of type:
map<Object*, baseObject*>
where
class baseObject{
int ID;
//other stuff
};
If I wanted to return a list of objects (std::list< Object* >), what's the best way to sort it by baseObject.ID?
Am I just stuck looking through for every number or something? I'd prefer not to change the map to a boost map, although I wouldn't necessarily be against doing something that's self-contained within a function like
void GetObjectList(std::list<Object*> &objects)
{
//sort the map into the list
}
Edit: maybe I should iterate through and copy each obj->baseobj into a map of baseobj.ID->obj?
What I'd do is first extract the keys (since you only want to return those) into a vector, and then sort that:
// extract the keys (Object*), not the mapped baseObject*
std::vector<Object*> out;
std::transform(myMap.begin(), myMap.end(), std::back_inserter(out), [](const std::pair<Object* const, baseObject*>& p) { return p.first; });
// order the keys by the ID of the baseObject each one maps to
std::sort(out.begin(), out.end(), [&myMap](Object* lhs, Object* rhs) { return myMap[lhs]->ID < myMap[rhs]->ID; });
If your compiler doesn't support lambdas, just rewrite them as free functions or function objects. I just used lambdas for conciseness.
For performance, I'd probably reserve enough room in the vector initially, instead of letting it gradually expand.
(Also note that I haven't tested the code, so it might need a little bit of fiddling)
Also, I don't know what this map is supposed to represent, but holding a map where both key and value types are pointers really sets my "bad C++" sense tingling. It smells of manual memory management and muddled (or nonexistent) ownership semantics.
You mentioned getting the output in a list, but a vector is almost certainly a better performing option, so I used that. The only situation where a list is preferable is really when you have no intention of ever iterating over it, and if you need the guarantee that pointers and iterators stay valid after modification of the list.
The first thing is that I would not use a std::list, but rather a std::vector. As for the particular problem, you need to perform two operations: generate the container, then sort it by whatever your criterion is.
// Extract the data:
std::vector<Object*> v;
v.reserve( m.size() );
std::transform( m.begin(), m.end(),
std::back_inserter(v),
[]( const std::map<Object*, baseObject*>::value_type& p ) {
return p.first;
} );
// Order according to the values in the map
std::sort( v.begin(), v.end(),
[&m]( Object* lhs, Object* rhs ) {
return m[lhs]->ID < m[rhs]->ID;
} );
Without C++11, you will need to create functors instead of the lambdas, and if you insist on returning a std::list, then you should use std::list<>::sort( Comparator ). Note that this is probably inefficient. If performance is an issue (after you get this working, profile it, and know that this is actually a bottleneck), you might want to consider using an intermediate map<int,Object*>:
std::map<int,Object*> mm;
for ( auto it = m.begin(); it != m.end(); ++it ) {
mm[ it->second->ID ] = it->first;
}
std::vector<Object*> v;
v.reserve( mm.size() ); // mm might have less elements than m!
std::transform( mm.begin(), mm.end(),
std::back_inserter(v),
[]( const std::map<int, Object*>::value_type& p ) {
return p.second;
} );
Again, this might be faster or slower than the original version... profile.
I think you'll do fine with:
void GetObjectList(std::list<Object*> &objects)
{
std::vector <Object*> vec;
vec.reserve(map.size());
for(auto it = map.begin(), it_end = map.end(); it != it_end; ++it)
vec.push_back(it->first); // the keys are the Object* values to return
// order the keys by the ID of the baseObject each one maps to
std::sort(vec.begin(), vec.end(), [&](Object* a, Object* b) { return map.at(a)->ID < map.at(b)->ID; });
objects.assign(vec.begin(), vec.end());
}
Here's how to do what you said, "sort it in order of the baseObject.ID's":
typedef std::map<Object*, baseObject*> MapType;
MapType mymap; // don't care how this is populated
// except that it must not contain null baseObject* values.
struct CompareByMappedId {
const MapType &map;
CompareByMappedId(const MapType &map) : map(map) {}
bool operator()(Object *lhs, Object *rhs) {
return map.find(lhs)->second->ID < map.find(rhs)->second->ID;
}
};
void GetObjectList(std::list<Object*> &objects) {
assert(objects.empty()); // pre-condition, or could clear it
// or for that matter return a list by value instead.
// copy keys into list
for (MapType::const_iterator it = mymap.begin(); it != mymap.end(); ++it) {
objects.push_back(it->first);
}
// sort the list
objects.sort(CompareByMappedId(mymap));
}
This isn't desperately efficient: it does more looking up in the map than is strictly necessary, and manipulating list nodes in std::list::sort is likely a little slower than std::sort would be at manipulating a random-access container of pointers. But then, std::list itself isn't very efficient for most purposes, so you expect it to be expensive to set one up.
If you need to optimize, you could create a vector of pairs of (int, Object*), so that you only have to iterate over the map once, no need to look things up. Sort the pairs, then put the second element of each pair into the list. That may be a premature optimization, but it's an effective trick in practice.
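The pairs trick (decorate, sort, undecorate) is language-independent; here is a minimal Java sketch of it, where the hypothetical Base class stands in for baseObject and the map's keys stand in for the Object* pointers:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SortKeysByMappedId {
    // Hypothetical stand-in for baseObject: only the ID matters here.
    static final class Base {
        final int id;
        Base(int id) { this.id = id; }
    }

    static <K> List<K> keysOrderedById(Map<K, Base> map) {
        // Decorate: a single pass over the map builds (id, key) pairs.
        List<Map.Entry<Integer, K>> pairs = new ArrayList<>(map.size());
        for (Map.Entry<K, Base> e : map.entrySet()) {
            pairs.add(new AbstractMap.SimpleEntry<>(e.getValue().id, e.getKey()));
        }
        // Sort on the precomputed ids; no map lookups inside the comparator.
        pairs.sort(Map.Entry.comparingByKey());
        // Undecorate: keep the second element of each pair.
        List<K> result = new ArrayList<>(pairs.size());
        for (Map.Entry<Integer, K> p : pairs) {
            result.add(p.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Base> map = new LinkedHashMap<>();
        map.put("x", new Base(3));
        map.put("y", new Base(1));
        map.put("z", new Base(2));
        System.out.println(keysOrderedById(map)); // [y, z, x]
    }
}
```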
I would create a new map that had a sort criterion that used the component id of your objects. Populate the second map from the first map (just iterate through or std::copy in). Then you can read this map in order using the iterators.
This has a slight insertion overhead compared to using a vector or list (log(n) time instead of constant time), but it avoids the need to sort after you've created the vector or list, which is nice.
Also, you'll be able to add more elements to it later in your program and it will maintain its order without need of a resort.
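As a rough Java analogue of keeping a second container ordered by ID, a TreeMap keyed by ID stays sorted as entries are inserted (O(log n) each). One caveat, also noted in the intermediate map<int,Object*> answer above, is that entries sharing an ID would overwrite each other, so this sketch (with a hypothetical Base class standing in for baseObject) keeps a list of keys per ID:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class OrderedById {
    // Hypothetical stand-in for baseObject.
    static final class Base {
        final int id;
        Base(int id) { this.id = id; }
    }

    // Build an index ordered by id; keys that share an id are grouped instead of overwritten.
    static <K> TreeMap<Integer, List<K>> indexById(Map<K, Base> map) {
        TreeMap<Integer, List<K>> index = new TreeMap<>();
        for (Map.Entry<K, Base> e : map.entrySet()) {
            index.computeIfAbsent(e.getValue().id, id -> new ArrayList<>()).add(e.getKey());
        }
        return index;
    }

    // Iterating the TreeMap visits ids in ascending order, so no separate sort is needed.
    static <K> List<K> keysInOrder(TreeMap<Integer, List<K>> index) {
        List<K> result = new ArrayList<>();
        for (List<K> bucket : index.values()) {
            result.addAll(bucket);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Base> map = new LinkedHashMap<>();
        map.put("x", new Base(3));
        map.put("y", new Base(1));
        map.put("z", new Base(3));
        TreeMap<Integer, List<String>> index = indexById(map);
        System.out.println(keysInOrder(index)); // [y, x, z]
        // Later insertions keep the order without a re-sort.
        index.computeIfAbsent(0, id -> new ArrayList<>()).add("w");
        System.out.println(keysInOrder(index)); // [w, y, x, z]
    }
}
```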
I'm not sure I completely understand what you're trying to store in your map, but perhaps look here
The third template argument of a std::map is a comparison functor (std::less by default). Perhaps you can use it to sort the data stored in the map on insertion. Then populating a list is a straightforward loop over a map iterator.

Filtering subsets using Linq

Imagine I have a very long enumeration, too big to reasonably convert to a list. Imagine also that I want to remove duplicates from it. Lastly, imagine that I know that only a small subset of the initial enumeration could possibly contain duplicates. The last point makes the problem practical.
Basically I want to filter out the list based on some predicate and only call Distinct() on that subset, but also recombine with the enumeration where the predicate returned false.
Can anyone think of a good idiomatic Linq way of doing this? I suppose the question boils down to the following:
With Linq how can you perform selective processing on a predicated enumeration and recombine the result stream with the rejected cases from the predicate?
You can do it by traversing the list twice, once to apply the predicate and dedup, and a second time to apply the negation of the predicate. Another solution is to write your own variant of the Where extension method that pushes non-matching entries into a buffer on the side:
// extension method: must be declared inside a static class
public static IEnumerable<T> WhereTee<T>(this IEnumerable<T> input, Predicate<T> pred, List<T> buffer)
{
foreach (T t in input)
{
if (pred(t))
{
yield return t;
}
else
{
buffer.Add(t);
}
}
}
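The first option mentioned above (two traversals) can be sketched with Java streams, assuming the very long source can be re-created, modelled here as a Supplier<Stream<T>> (a hypothetical shape, not something from the question). Only the matching subset goes through distinct(), so only that subset is held in memory; note that the deduplicated elements come out first, followed by the untouched rejected ones, rather than in the original interleaved order:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SelectiveDistinct {
    // Dedupe only the elements matching pred; pass the others through unchanged.
    // The source is traversed twice, so it must be re-creatable (hence the Supplier).
    static <T> Stream<T> distinctWhere(Supplier<Stream<T>> source, Predicate<T> pred) {
        Stream<T> deduped = source.get().filter(pred).distinct(); // only this subset is tracked
        Stream<T> rejected = source.get().filter(pred.negate());  // second traversal
        return Stream.concat(deduped, rejected);
    }

    public static void main(String[] args) {
        Predicate<Integer> isEven = n -> n % 2 == 0;
        List<Integer> result = distinctWhere(() -> Stream.of(1, 1, 2, 2, 3), isEven)
                .collect(Collectors.toList());
        System.out.println(result); // [2, 1, 1, 3]
    }
}
```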
Can you give a little more detail on how you would like to recombine the elements?
One way I can think of solving this problem is by using the Zip operator of .NET 4.0, like this:
var initialList = new List<int>();
var rejectedElements = initialList.Where(x => !aPredicate(x));
var acceptedElements = initialList.Where(x => aPredicate(x));
var result = acceptedElements.Zip(rejectedElements, (accepted, rejected) => new { accepted, rejected });
This will create a sequence of pairs of accepted and rejected elements. However, its length will be limited by the shorter of the two inputs.

An efficient technique to replace an occurrence in a sequence with mutable or immutable state

I am searching for an efficient technique to find a sequence of Op occurrences in a Seq[Op]. Once an occurrence is found, I want to replace it with a defined replacement and run the same search again until the list stops changing.
Scenario:
I have three types of Op case classes: Pop() extends Op, Push() extends Op and Nop() extends Op. I want to replace each occurrence of Push(), Pop() with Nop(). Basically, the code could look like seq.replace(Push() ~ Pop() ~> Nop()).
Problem:
Now, when I call seq.replace(...), I will have to search the sequence for an occurrence of Push(), Pop(). So far so good: I find the occurrence. But then I have to splice the occurrence out of the list and insert the replacement.
Now there are two options: my list could be mutable or immutable. If I use an immutable list, I am worried about performance because those sequences are usually 500+ elements in size. If I replace a lot of occurrences like A ~ B ~ C ~> D ~ E, I will create a lot of new objects, if I am not mistaken. However, I could also use a mutable sequence like ListBuffer[Op].
Coming from a linked-list background, I would just do some pointer-bending, and after a total of four operations I would be done with the replacement without creating new objects. That is why I am now concerned about performance, especially since this is a performance-critical operation for me.
Question:
How would you implement the replace() method in a Scala fashion and what kind of data structure would you use keeping in mind that this is a performance-critical operation?
I am happy with answers that point me in the right direction or pseudo code. No need to write a full replace method.
Thank you.
Ok, some considerations to be made. First, recall that, on lists, tail does not create objects, and prepending (::) only creates one object for each prepended element. That's pretty much as good as you can get, generally speaking.
One way of doing this would be this:
def myReplace(input: List[Op], pattern: List[Op], replacement: List[Op]) = {
// This function should be part of a KMP algorithm instead, for performance
def compare(pattern: List[Op], list: List[Op]): Boolean = (pattern, list) match {
case (x :: xs, y :: ys) if x == y => compare(xs, ys)
case (Nil, _) => true // the whole pattern matched; the list may continue after it
case _ => false
}
var processed: List[Op] = Nil
var unprocessed: List[Op] = input
val patternLength = pattern.length
val reversedReplacement = replacement.reverse
// Do this until we finish processing the whole sequence
while (unprocessed.nonEmpty) {
// This inside algorithm would be better if replaced by KMP
// Quickly process non-matching sequences
while (unprocessed.nonEmpty && unprocessed.head != pattern.head) {
processed ::= unprocessed.head
unprocessed = unprocessed.tail
}
if (unprocessed.nonEmpty) {
if (compare(pattern, unprocessed)) {
processed :::= reversedReplacement
unprocessed = unprocessed drop patternLength
} else {
processed ::= unprocessed.head
unprocessed = unprocessed.tail
}
}
}
processed.reverse
}
You may gain speed by using KMP, particularly if the pattern searched for is long.
Now, what is the problem with this algorithm? The problem is that it won't test if the replaced pattern causes a match before that position. For instance, if I replace ACB with C, and I have an input AACBB, then the result of this algorithm will be ACB instead of C.
To avoid this problem, you should add backtracking. First, find at which position in your pattern the replacement may occur:
val positionOfReplacement = pattern.indexOfSlice(replacement)
Then, modify the replacement part of the algorithm like this:
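if (compare(pattern, unprocessed)) {
if (positionOfReplacement >= 0) {
// drop the match and put the replacement back on the unprocessed side
unprocessed = replacement ::: (unprocessed drop patternLength)
// step back (processed is stored reversed) so a match starting before the replacement can be found
unprocessed = (processed take positionOfReplacement).reverse ::: unprocessed
processed = processed drop positionOfReplacement
} else {
processed :::= reversedReplacement
unprocessed = unprocessed drop patternLength
}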
} else {
This will backtrack enough to solve the problem.
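The same backtracking can also be organized around a stack, which re-examines every replacement against what has already been processed. A minimal sketch in Java (a swapped-in technique, not the answer's Scala code): the output list acts as a stack, a deque holds the unprocessed elements, and a new match can only end at the element just pushed. It assumes non-null elements, a non-empty pattern, and a replacement shorter than the pattern, so every rewrite shrinks the sequence and the loop terminates. The Op enum is a hypothetical stand-in for the question's case classes:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StackRewrite {
    // Hypothetical stand-in for the question's Push(), Pop() and Nop() case classes.
    enum Op { PUSH, POP, NOP }

    // Rewrites input until it contains no occurrence of pattern.
    // Assumes non-null elements, a non-empty pattern and replacement.size() < pattern.size().
    static <T> List<T> replaceAll(List<T> input, List<T> pattern, List<T> replacement) {
        int k = pattern.size();
        List<T> out = new ArrayList<>();                 // processed part, used as a stack
        ArrayDeque<T> pending = new ArrayDeque<>(input); // unprocessed part
        while (!pending.isEmpty()) {
            out.add(pending.pollFirst());
            int n = out.size();
            // out never contains a match, so a new one must end at the element just added.
            if (n >= k && out.subList(n - k, n).equals(pattern)) {
                out.subList(n - k, n).clear();           // splice the match out
                // Feed the replacement back in order, so matches it forms with earlier output are found.
                for (int i = replacement.size() - 1; i >= 0; i--) {
                    pending.addFirst(replacement.get(i));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The tricky case from the answer: replacing ACB with C in AACBB should give C.
        System.out.println(replaceAll(Arrays.asList('A', 'A', 'C', 'B', 'B'),
                Arrays.asList('A', 'C', 'B'), Arrays.asList('C')));      // [C]
        System.out.println(replaceAll(Arrays.asList(Op.PUSH, Op.PUSH, Op.POP, Op.POP),
                Arrays.asList(Op.PUSH, Op.POP), Arrays.asList(Op.NOP))); // [PUSH, NOP, POP]
    }
}
```

Because the mutable ArrayList is only ever modified at its tail, each splice is cheap, which is close to the pointer-bending the question describes.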
This algorithm won't, however, deal efficiently with multiple patterns at the same time, which I guess is where you are going. For that, you'll probably need some adaptation of KMP to do it efficiently, or otherwise use a DFA to control possible matchings. It gets even worse if you want to match both AB and ABC.
In practice, the full-blown problem is equivalent to regex match & replace, where the replacement is a function of the match. Which means, of course, you may want to start looking into regex algorithms.
EDIT
I forgot to complete my reasoning. If that technique doesn't work for some reason, then my advice is to go with an immutable tree-based vector. Tree-based vectors enable replacement of partial sequences with a low amount of copying.
And if that doesn't do, then the solution is doubly linked lists. And pick one from a library with slice replacement -- otherwise you may end up spending way too much time debugging a known but tricky algorithm.
