Faster implementation of Option.isEmpty? - performance

The implementation of isEmpty in Option is straightforward - here's a sketch:
abstract class Option[+A] { def isEmpty:Boolean }
object None extends Option[Nothing] { def isEmpty=true }
final class Some extends Option[+A] { def isEmpty=false }
isEmpty is used extremely heavily, including inside Option itself, so its performance is significant, even though it is so trivial.
I suspect it would be faster to implement it as:
abstract class Option[+A] { final def isEmpty = this eq None }
This implementation shouldn't require dereferencing the option or calling any methods on it, AFAIK - just a straightforward reference comparison.
I'd performance-test this, but JVM microbenchmarks are so tricky that I really have no confidence in my ability to create a meaningful result.
Are there any factors I'm overlooking?

Actually, you might be right. Using the following code:
sealed abstract class Opshun[+A] {
final def isEmpty = this eq Nun
def get: A
}
object Nun extends Opshun[Nothing] { def get = ??? }
case class Summ[+A](get: A) extends Opshun[A] {}
on the simplest possible test case (array of Option or Opshun), if all you do is test isEmpty, the pattern you suggested is 5x (!) faster, and you can verify this if you manually replace .isEmpty with eq None or a pattern match picking out None.
If you move to a more complex case where you test not-isEmpty and then get a stored value, the difference is less impressive (a third faster).
So the suggestion has merit; it's worth testing in a more official setting.
Note added in edit: this is with arrays large enough so that not everything fits in L2 cache. Either way is equally fast when it fits in L2.

(Based on Eugenes Zhulenev's comment)
It seems HotSpot compiler will do this optimization automatically, as there are only two subclasses of Option:
Polymorphism Performance Mysteries Explained:
The Server HotSpot compiler, according to Cliff Click, deals with bi-morphism as a special case of poly-morphism: "Where the server compiler can prove only two classes reach a call site, it will insert a type-check and then statically call both targets (which may then further inline, etc)."

Related

JAVA 8 Extract predicates as fields or methods?

What is the cleaner way of extracting predicates which will have multiple uses. Methods or Class fields?
The two examples:
1.Class Field
void someMethod() {
IntStream.range(1, 100)
.filter(isOverFifty)
.forEach(System.out::println);
}
private IntPredicate isOverFifty = number -> number > 50;
2.Method
void someMethod() {
IntStream.range(1, 100)
.filter(isOverFifty())
.forEach(System.out::println);
}
private IntPredicate isOverFifty() {
return number -> number > 50;
}
For me, the field way looks a little bit nicer, but is this the right way? I have my doubts.
Generally you cache things that are expensive to create and these stateless lambdas are not. A stateless lambda will have a single instance created for the entire pipeline (under the current implementation). The first invocation is the most expensive one - the underlying Predicate implementation class will be created and linked; but this happens only once for both stateless and stateful lambdas.
A stateful lambda will use a different instance for each element and it might make sense to cache those, but your example is stateless, so I would not.
If you still want that (for reading purposes I assume), I would do it in a class Predicates let's assume. It would be re-usable across different classes as well, something like this:
public final class Predicates {
private Predicates(){
}
public static IntPredicate isOverFifty() {
return number -> number > 50;
}
}
You should also notice that the usage of Predicates.isOverFifty inside a Stream and x -> x > 50 while semantically the same, will have different memory usages.
In the first case, only a single instance (and class) will be created and served to all clients; while the second (x -> x > 50) will create not only a different instance, but also a different class for each of it's clients (think the same expression used in different places inside your application). This happens because the linkage happens per CallSite - and in the second case the CallSite is always different.
But that is something you should not rely on (and probably even consider) - these Objects and classes are fast to build and fast to remove by the GC - whatever fits your needs - use that.
To answer, it's better If you expand those lambda expressions for old fashioned Java. You can see now, these are two ways we used in our codes. So, the answer is, it all depends how you write a particular code segment.
private IntPredicate isOverFifty = new IntPredicate<Integer>(){
public void test(number){
return number > 50;
}
};
private IntPredicate isOverFifty() {
return new IntPredicate<Integer>(){
public void test(number){
return number > 50;
}
};
}
1) For field case you will have always allocated predicate for each new your object. Not a big deal if you have a few instances, likes, service. But if this is a value object which can be N, this is not good solution. Also keep in mind that someMethod() may not be called at all. One of possible solution is to make predicate as static field.
2) For method case you will create the predicate once every time for someMethod() call. After GC will discard it.

Avoiding duplicate code when performing operation on different object properties

I have recently run into a problem which has had me thinking in circles. Assume that I have an object of type O with properties O.A and O.B. Also assume that I have a collection of instances of type O, where O.A and O.B are defined for each instance.
Now assume that I need to perform some operation (like sorting) on a collection of O instances using either O.A or O.B, but not both at any given time. My original solution is as follows.
Example -- just for demonstration, not production code:
public class O {
int A;
int B;
}
public static class Utils {
public static void SortByA (O[] collection) {
// Sort the objects in the collection using O.A as the key. Note: this is custom sorting logic, so it is not simply a one-line call to a built-in sort method.
}
public static void SortByB (O[] collection) {
// Sort the objects in the collection using O.B as the key. Same logic as above.
}
}
What I would love to do is this...
public static void SortAgnostic (O[] collection, FieldRepresentation x /* some non-bool, non-int variable representing whether to chose O.A or O.B as the sorting key */) {
// Sort by whatever "x" represents...
}
... but creating a new, highly-specific type that I will have to maintain just to avoid duplicating a few lines of code seems unnecessary to me. Perhaps I am incorrect on that (and I am sure someone will correct me if that statement is wrong :D), but that is my current thought nonetheless.
Question: What is the best way to implement this method? The logic that I have to implement is difficult to break down into smaller methods, as it is already fairly optimized. At the root of the issue is the fact that I need to perform the same operation using different properties of an object. I would like to stay away from using codes/flags/etc. in the method signature if possible so that the solution can be as robust as possible.
Note: When answering this question, please approach it from an algorithmic point of view. I am aware that some language-specific features may be suitable alternatives, but I have encountered this problem before and would like to understand it from a relatively language-agnostic viewpoint. Also, please do not constrain responses to sorting solutions only, as I have only chosen it as an example. The real question is how to avoid code duplication when performing an identical operation on two different properties of an object.
"The real question is how to avoid code duplication when performing an identical operation on two different properties of an object."
This is a very good question as this situation arises all the time. I think, one of the best ways to deal with this situation is to use the following pattern.
public class O {
int A;
int B;
}
public doOperationX1() {
doOperationX(something to indicate which property to use);
}
public doOperationX2() {
doOperationX(something to indicate which property to use);
}
private doOperationX(input ) {
// actual work is done here
}
In this pattern, the actual implementation is performed in a private method, which is called by public methods, with some extra information. For example, in this case, it can be
doOperationX(A), or doOperationX(B), or something like that.
My Reasoning: In my opinion this pattern is optimal as it achieves two main requirements:
It keeps the public interface descriptive and clear, as it keeps operations separate, and avoids flags etc that you also mentioned in your post. This is good for the client.
From the implementation perspective, it prevents duplication, as it is in one place. This is good for the development.
A simple way to approach this I think is to internalize the behavior of choosing the sort field to the class O itself. This way the solution can be language-agnostic.
The implementation in Java could be using an Abstract class for O, where the purpose of the abstract method getSortField() would be to return the field to sort by. All that the invocation logic would need to do is to implement the abstract method to return the desired field.
O o = new O() {
public int getSortField() {
return A;
}
};
The problem might be reduced to obtaining the value of the specified field from the given object so it can be use for sorting purposes, or,
TField getValue(TEntity entity, string fieldName)
{
// Return value of field "A" from entity,
// implementation depends on language of choice, possibly with
// some sort of reflection support
}
This method can be used to substitute comparisons within the sorting algorithm,
if (getValue(o[i], "A")) > getValue(o[j], "A"))
{
swap(i, j);
}
The field name can then be parametrized, as,
public static void SortAgnostic (O[] collection, string fieldName)
{
if (getValue(collection[i], fieldName)) > getValue(collection[j], fieldName))
{
swap(i, j);
}
...
}
which you can use like SortAgnostic(collection, "A").
Some languages allow you to express the field in a more elegant way,
public static void SortAgnostic (O[] collection, Expression fieldExpression)
{
if (getValue(collection[i], fieldExpression)) >
getValue(collection[j], fieldExpression))
{
swap(i, j);
}
...
}
which you can use like SortAgnostic(collection, entity => entity.A).
And yet another option can be passing a pointer to a function which will return the value of the field needed,
public static void SortAgnostic (O[] collection, Function getValue)
{
if (getValue(collection[i])) > getValue(collection[j]))
{
swap(i, j);
}
...
}
which given a function,
TField getValueOfA(TEntity entity)
{
return entity.A;
}
and passing it like SortAgnostic(collection, getValueOfA).
"... but creating a new, highly-specific type that I will have to maintain just to avoid duplicating a few lines of code seems unnecessary to me"
That is why you should use available tools like frameworks or other typo of code libraries that provide you requested solution.
When some mechanism is common that mean it can be moved to higher level of abstraction. When you can not find proper solution try to create own one. Think about the result of operation as not part of class functionality. The sorting is only a feature, that why it should not be part of your class from the beginning. Try to keep class as simple as possible.
Do not worry premature about the sense of having something small just because it is small. Focus on the final usage of it. If you use very often one type of sorting just create a definition of it to reuse it. You do not have to necessary create a utill class and then call it. Sometimes the base functionality enclosed in utill class is fair enough.
I assume that you use Java:
In your case the wheal was already implemented in person of Collection#sort(List, Comparator).
To full fill it you could create a Enum type that implement Comparator interface with predefined sorting types.

Does Java optimize new HashSet(someHashSet).contains() to be O(1)?

This class is an example of where the issue arises:
public class ContainsSet {
private static HashSet<E> myHashSet;
[...]
public static Set<E> getMyHashSet() {
return new HashSet<E>(myHashSet);
}
public static boolean doesMyHashSetContain(E e) {
return myHashSet.contains(e);
}
}
Now imagine two possible functions:
boolean method1() {
return ContainsSet.getMyHashSet().contains(someE);
}
boolean method2() {
return ContainsSet.doesMyHashSetContain(someE);
}
Now is my question whether or not method 1 will have the same time complexity after Java optimization as method 2.
(I used HashSet instead of just Set to emphasize that myHashSet.contains(someE) has complexity O(1).)
Without optimization it would not. Although .contains() has complexity O(1), the new HashSet<E>(myHashSet) has complexity O(n), which would give method 1 a complexity of O(n) + O(1) = O(n), which is horrible compared to the beloved O(1).
The reason why I this issue is imported is because I am teached not to return lists or sets if you will not allow an external class to change the contents of it. Returning a copy is an obvious solution, but it can be really time-consuming.
No, javac does not (and cannot) optimize this away. It's required to emit the byte code you describe in your source to this level. And the JVM will not be nearly intelligent enough to optimize this away. It's way too likely to have side effects to prove.
Don't return a copy of the HashSet if you want immutability. Wrap it in an unmodifiable wrapper: Collections.unmodifiableSet(myHashSet)
What can I say here but that creating a new HashSet and populating it via the constructor is expensive!
Java will not "optimize away" this work: Even though you and I know it would give the same result as "passing through" the contains() call, java can not know this.
No. That is beyond optimization. You returned a new object and you could use it in somewhere else, Java is not supposed to omit it. A new HashSet will be created.
This is not a good practice to return a copy. It's not only time consuming but also space consuming. As Sean said, you can wrap with unmodifiableSet or you can wrap it in your own class.
You can try this:
public static Set<E> getMyHashSet() {
return Collection.unmodifiableSortedSet(myHashSet);
}
Note: use that method will return a view of your set, not a copy.

What are the drawbacks of writing an algorithm in a class or in an object (in Scala)?

class A {
def algorithmImplementation (...) = { ... }
}
object A {
def algorithmImplementation (...) = { ... }
}
In which circumstances should the class be used and in which should the object be used (for implementing an algorithm, e.g. Dijkstra-Algorithm, as shown above) ?
Which criterias should be considered when making such a decision ?
At the moment, I can not really see what the beneftis of using a class are.
If you only have one implementation, this can largely be a judgement call. You mentioned Dijkstra's algorithm, which runs on a graph. Now you can write that algorithm to take a graph object as an explicit parameter. In that case, the algorithm would presumably appear in the Graph singleton object. Then it might be called as something like Graph.shortestPath(myGraph,fromNode,toNode).
Or you can write the algorithm in the Graph class, in which case it no longer takes the graph as an explicit parameter. Now it is called as something like myGraph.shortestPath(fromNode,toNode).
The latter case probably makes more sense when there is one main argument (eg, the graph), especially one that serves as a kind of context for the algorithm. But it may come down to which syntax you prefer.
However, if you have multiple implementations, the balance tips more toward the class approach, especially when the choice of which implementation is better depends on the choice of representation. For example, you might have two different implementations of shortestPath, one that works better on adjacency matrices and one that works better on adjacency lists. With the class approach, you can easily have two different graph classes for the two different representations, and each can have its own implementation of shortestPath. Then, when you call myGraph.shortestPath(fromNode,toNode), you automatically get the right implementation, even if you don't know whether myGraph uses adjacency matrices or adjacency lists. (This is kind of the whole point of OO.)
Classes can have subclasses which override implementation, objects cannot be subclassed.
Classes can also be type parametric where objects cannot be.
There's only ever one instance of the object, or at least one instance per container. A class has multiple instances. That means that the class can be parameterized with values
class A(param1 : int, param2 : int) {
def algorithmImplementation(arg : List[String]) = // use arg and params
}
And that can be reused like
val A42_13 = new A(42, 13)
val result1 = A42_13.algorithmImplementation(List("hello", "world"))
val result2 = A42_13.algorithmImplementation(List("goodbye", "cruel", "world"))
To bring all this home relative to your example of Djikstra's algorithm: imagine you want to write one implementation of the algorithm that is reusable across multiple node types. Then you might want to parameterize by the Node type, the type of metric used to measure distance, and the function used to calculate distance.
val Djikstra[Node, Metric <: Comparable[Metric]](distance : (Node, Node) => Metric) {
def compute(node : Node, nodes : Seq[Node]) : Seq[Metric] = {...}
}
You create one instance of Djikstra per distinct type of node/metric/distance function that you use in your program and reuse that instance without having to pass all that information in everytime you compute Djikstra.
In summary, classes are more flexible. Use them when you need the flexibility. Otherwise objects are fine.

What's so great about Func<> delegate?

Sorry if this is basic but I was trying to pick up on .Net 3.5.
Question: Is there anything great about Func<> and it's 5 overloads? From the looks of it, I can still create a similar delgate on my own say, MyFunc<> with the exact 5 overloads and even more.
eg: public delegate TResult MyFunc<TResult>() and a combo of various overloads...
The thought came up as I was trying to understand Func<> delegates and hit upon the following scenario:
Func<int,int> myDelegate = (y) => IsComposite(10);
This implies a delegate with one parameter of type int and a return type of type int. There are five variations (if you look at the overloads through intellisense). So I am guessing that we can have a delegate with no return type?
So am I justified in saying that Func<> is nothing great and just an example in the .Net framework that we can use and if needed, create custom "func<>" delegates to suit our own needs?
Thanks,
The greatness lies in establishing shared language for better communication.
Instead of defining your own delegate types for the same thing (delegate explosion), use the ones provided by the framework. Anyone reading your code instantly grasps what you are trying to accomplish.. minimizes the time to 'what is this piece of code actually doing?'
So as soon as I see a
Action = some method that just does something and returns no output
Comparison = some method that compares two objects of the same type and returns an int to indicate order
Converter = transforms Obj A into equivalent Obj B
EventHandler = response/handler to an event raised by some object given some input in the form of an event argument
Func = some method that takes some parameters, computes something and returns a result
Predicate = evaluate input object against some criteria and return pass/fail status as bool
I don't have to dig deeper than that unless it is my immediate area of concern. So if you feel the delegate you need fits one of these needs, use them before rolling your own.
Disclaimer: Personally I like this move by the language designers.
Counter-argument : Sometimes defining your delegate may help communicate intent better. e.g. System.Threading.ThreadStart over System.Action. So it’s a judgment call in the end.
The Func family of delegates (and their return-type-less cousins, Action) are not any greater than anything else you'd find in the .NET framework. They're just there for re-use so you don't have to redefine them. They have type parameters to keep things generic. E.g., a Func<T0,bool> is the same as a System.Predicate<T> delegate. They were originally designed for LINQ.
You should be able to just use the built-in Func delegate for any value-returning method that accepts up to 4 arguments instead of defining your own delegate for such a purpose unless you want the name to reflect your intention, which is cool.
Cases where you would absolutely need to define your delegate types include methods that accept more than 4 arguments, methods with out, ref, or params parameters, or recursive method signatures (e.g., delegate Foo Foo(Foo f)).
In addition to Marxidad's correct answer:
It's worth being aware of Func's related family, the Action delegates. Again, these are types overloaded by the number of type parameters, but declared to return void.
If you want to use Func/Action in a .NET 2.0 project but with a simple route to upgrading later on, you can cut and paste the declarations from my version comparison page. If you declare them in the System namespace then you'll be able to upgrade just by removing the declarations later - but then you won't be able to (easily) build the same code in .NET 3.5 without removing the declarations.
Decoupling dependencies and unholy tie-ups is one singular thing that makes it great. Everything else one can debate and claim to be doable in some home-grown way.
I've been refactoring slightly more complex system with an old and heavy lib and got blocked on not being able to break compile time dependency - because of the named delegate lurking on "the other side". All assembly loading and reflection didn't help - compiler would refuse to just cast a delegate() {...} to object and whatever you do to pacify it would fail on the other side.
Delegate type comparison which is structural at compile time turns nominal after that (loading, invoking). That may seem OK while you are thinking in terms of "my darling lib is going to be used forever and by everyone" but it doesn't scale to even slightly more complex systems. Fun<> templates bring a degree of structural equivalence back into the world of nominal typing . That's the aspect you can't achieve by rolling out your own.
Example - converting:
class Session (
public delegate string CleanBody(); // tying you up and you don't see it :-)
public static void Execute(string name, string q, CleanBody body) ...
to:
public static void Execute(string name, string q, Func<string> body)
Allows completely independent code to do reflection invocation like:
Type type = Type.GetType("Bla.Session, FooSessionDll", true);
MethodInfo methodInfo = type.GetMethod("Execute");
Func<string> d = delegate() { .....} // see Ma - no tie-ups :-)
Object [] params = { "foo", "bar", d};
methodInfo.Invoke("Trial Execution :-)", params);
Existing code doesn't notice the difference, new code doesn't get dependence - peace on Earth :-)
One thing I like about delegates is that they let me declare methods within methods like so, this is handy when you want to reuse a piece of code but you only need it within that method. Since the purpose here is to limit the scope as much as possible Func<> comes in handy.
For example:
string FormatName(string pFirstName, string pLastName) {
Func<string, string> MakeFirstUpper = (pText) => {
return pText.Substring(0,1).ToUpper() + pText.Substring(1);
};
return MakeFirstUpper(pFirstName) + " " + MakeFirstUpper(pLastName);
}
It's even easier and more handy when you can use inference, which you can if you create a helper function like so:
Func<T, TReturn> Lambda<T, TReturn>(Func<T, TReturn> pFunc) {
return pFunc;
}
Now I can rewrite my function without the Func<>:
string FormatName(string pFirstName, string pLastName) {
var MakeFirstUpper = Lambda((string pText) => {
return pText.Substring(0,1).ToUpper() + pText.Substring(1);
});
return MakeFirstUpper(pFirstName) + " " + MakeFirstUpper(pLastName);
}
Here's the code to test the method:
Console.WriteLine(FormatName("luis", "perez"));
Though it is an old thread I had to add that func<> and action<> also help us use covariance and contra variance.
http://msdn.microsoft.com/en-us/library/dd465122.aspx

Resources