Performance of thenComparing vs thenComparingInt - which to use? - sorting

I have a question, if I'm comparing ints, is there a performance difference in calling thenComparingInt(My::intMethod) vs thenComparing(My::intMethod), in other words, if I'm comparing differemt types, both reference and primitive, e.g. String, int, etc. Part of me just wants to say comparing().thenComparing().thenComparing() etc, but should I do comparing.thenComparing().thenComparingInt() if the 3rd call was comparing an int or Integer value?
I am assuming that comparing() and thenComparing() use the compareTo method to compare any given type behind the scenes or possibly for ints, the Integer.compare? I'm also assuming the answer to my original question may involve performance in that thenComparingInt would know an int is being passed in, whereas, thenComparing would have to autobox int to Integer then maybe cast to Object?
Also, another question whilst I think of it - is there a way of chaining method references, e.g. Song::getArtist::length where getArtist returns a string? Reason is I wanted to do something like this:
songlist.sort(
Comparator.comparing((Song s) -> s.getArtist().length()));
songlist.sort(
Comparator.comparing(Song::getArtist,
Comparator.comparingInt(String::length)));
songlist.sort(
Comparator.comparing(Song::getArtist, String::length));
Of the 3 examples, the top two compile but the bottom seems to throw a compilation error in Eclipse, I would have thought the 2nd argument of String::length was valid? But maybe not as it's expecting a Comparator not a function?

Question 1
I would think thenComparingInt(My::intMethod) might be better since it should avoid boxing, but you would have to try out both versions to see if it really makes a difference.
Question 2
songlist.sort(
Comparator.comparing(Song::getArtist, String::length));
Is invalid because the 2nd parameter should be a Comparator not a method that returns int.

Related

Refactoring Business Rule, Function Naming, Width, Height, Position X & Y

I am refactoring some business rule functions to provide a more generic version of the function.
The functions I am refactoring are:
DetermineWindowWidth
DetermineWindowHeight
DetermineWindowPositionX
DetermineWindowPositionY
All of them do string parsing, as it is a string parsing business rules engine.
My question is what would be a good name for the newly refactored function?
Obviously I want to shy away from a function name like:
DetermineWindowWidthHeightPositionXPositionY
I mean that would work, but it seems unnecessarily long when it could be something like:
DetermineWindowMoniker or something to that effect.
Function objective: Parse an input string like 1280x1024 or 200,100 and return either the first or second number. The use case is for data-driving test automation of a web browser window, but this should be irrelevant to the answer.
Question objective: I have the code to do this, so my question is not about code, but just the function name. Any ideas?
There are too little details, you should have specified at least the parameters and returns of the functions.
Have I understood correctly that you use strings of the format NxN for sizes and N,N for positions?
And that this generic function will have to parse both (and nothing else), and will return either the first or second part depending on a parameter of the function?
And that you'll then keep the various DetermineWindow* functions but make them all call this generic function?
If so:
Without knowing what parameters the generic function has it's even harder to help, but it's most likely impossible to give it a simple name.
Not all batches of code can be described by a simple name.
You'll most likely need to use a different construction if you want to have clear names. Here's an idea, in pseudo code:
ParseSize(string, outWidth, outHeight) {
ParsePair(string, "x", outWidht, outHeight)
}
ParsePosition(string, outX, outY) {
ParsePair(string, ",", outX, outY)
}
ParsePair(string, separator, outFirstItem, outSecondItem) {
...
}
And the various DetermineWindow would call ParseSize or ParsePosition.
You could also use just ParsePair, directly, but I thinks it's cleaner to have the two other functions in the middle.
Objects
Note that you'd probably get cleaner code by using objects rather than strings (a Size and a Position one, and probably a Pair one too).
The ParsePair code (adapted appropriately) would be included in a constructor or factory method that gives you a Pair out of a string.
---
Of course you can give other names to the various functions, objects and parameters, here I used the first that came to my mind.
It seems this question-answer provides a good starting point to answer this question:
Appropriate name for container of position, size, angle
A search on www.thesaurus.com for "Property" gives some interesting possible answers that provide enough meaningful context to the usage:
Aspect
Character
Characteristic
Trait
Virtue
Property
Quality
Attribute
Differentia
Frame
Constituent
I think ConstituentProperty is probably the most apt.

Understanding immutable composite types with fields of mutable types in Julia

Initial note: I'm working in Julia, but this question probably applies to many languages.
Setup: I have a composite type as follows:
type MyType
x::Vector{String}
end
I write some methods to act on MyType. For example, I write a method that allows me to insert a new element in x, e.g. function insert!(d::MyType, itemToInsert::String).
Question: Should MyType be mutable or immutable?
My understanding: I've read the Julia docs on this, as well as more general (and highly upvoted) questions on Stackoverflow (e.g. here or here), but I still don't really have a good handle on what it means to be mutable/immutable from a practical perspective (especially for the case of an immutable composite type, containing a mutable array of immutable types!)
Nonetheless, here is my attempt: If MyType is immutable, then it means that the field x must always point to the same object. That object itself (a vector of Strings) is mutable, so it is perfectly okay for me to insert new elements into it. What I am not allowed to do is try and alter MyType so that the field x points to an entirely different object. For example, methods that do the following are okay:
MyType.x[1] = "NewValue"
push!(MyType.x, "NewElementToAdd")
But methods that do the following are not okay:
MyType.x = ["a", "different", "string", "array"]
Is this right? Also, is the idea that the object that an immutable types field values are locked to are those that are created within the constructor?
Final Point: I apologise if this appears to duplicate other questions on SO. As stated, I have looked through them and wasn't able to get the understanding that I was after.
So here is something mind bending to consider (at least to me):
julia> immutable Foo
data::Vector{Float64}
end
julia> x = Foo([1.0, 2.0, 4.0])
Foo([1.0,2.0,4.0])
julia> append!(x.data, x.data); pointer(x.data)
Ptr{Float64} #0x00007ffbc3332018
julia> append!(x.data, x.data); pointer(x.data)
Ptr{Float64} #0x00007ffbc296ac28
julia> append!(x.data, x.data); pointer(x.data)
Ptr{Float64} #0x00007ffbc34809d8
So the data address is actually changing as the vector grows and needs to be reallocated! But - you can't change data yourself, as you point out.
I'm not sure there is a 100% right answer is really. I primarily use immutable for simple types like the Complex example in the docs in some performance critical situations, and I do it for "defensive programming" reasons, e.g. the code has no need to write to the fields of this type so I make it an error to do so. They are a good choice IMO whenever the type is a sort of an extension of a number, e.g. Complex, RGBColor, and I use them in place of tuples, as a kind of named tuple (tuples don't seem to perform well with Julia right now anyway, wheres immutable types perform excellently).

Variables Naming Conventions- var1,var2 or firstVar, secondVar?

In cases where you have 2 variables from the same type that play a similar roll, like for example a merge function of 2 arrays:
IntArray merge(IntArray array1,IntArray array2);
What do you think is the best (most readable, least error-prone) way to name the variables? var1,var2 or firstVar,secondVar? or maybe a different way?
If you express your opinion, I would be glad to hear the rational behind, especially regarding which is less error-prone.
Thanks in advance.
For C/C++, I think the following would be preferred:
IntArray merge(IntArray a, IntArray b);
Because it's simple, short and avoids using integers in the variable names.
Or the 'operator form':
IntArray merge(IntArray lhs, IntArray rhs);
(for left-hand side and right-hand side)
(alike:
IntArray operator+(IntArray lhs, IntArray rhs);
)
The 'numeric' variants, and camelCase are more like ECMAScript than C. But it's mostly a 'standard' created by people who simply used C/C++ and used naming like that (see plus template in C++ standard library for an example).
But you can't get much rationale behind it because it's simply a matter of taste. As long as the variables are equally important (i.e. merge(a, b) == merge(b, a)), it's just how they will referred from the implementation code.
For clarity, I use purpose of the variable as name instead of numeric number. It instantly tell me that the variable count is finite. For example:
function predicate(int subject, int object)
function merge_array(array firstArray, array secondArray)
Using (array1, array2) sounds wrong for me because it seems like there is array3, array4, ... while we know the variable/parameter count is fixed.
The arguments of a function should always be talking about who they are and about what they should be used for. If there's non difference between the two arrays (as the source and the destination, for example), maybe you should think about passing a list of arrays (named arraysToMerge or something like that).
Variable's name has to explain variable's purpose, always. Because code is (or at least should be) our real documentation.
In your case I would use something along the lines of var and anotherVar.
Such as:
IntArray merge(IntArray array, IntArray anotherArray);
Variables and parameters should be self-explanatory. I think the situation is described more clearly this way.

Initialize member variables in a method and not the constructor

I have a public method which uses a variable (only in the scope of the public method) I pass as a parameter we will call A, this method calls a private method multiple times which also requires the parameter.
At present I am passing the parameter every time but it looks weird, is it bad practice to make this member variable of the class or would the uncertainty about whether it is initialized out way the advantages of not having to pass it?
Simplified pseudo code:
public_method(parameter a)
do something with a
private_method(string_a, a)
private_method(string_b, a)
private_method(string_c, a)
private_method(String, parameter a)
do something with String and a
Additional information: parameter a is a read only map with over 100 entries and in reality I will be calling private_method about 50 times
I had this same problem myself.
I implemented it differently in 3 different contexts to see hands-on what are result using 3 different strategies, see below.
Note that I am type of programmer that makes many changes to the code always trying to improve it. Thus I settle only for the code that is amenable to changes, readbale, would you call this "flexible" code. I settle only for very clear code.
After experimentation, I came to these results:
Passing a as parameter is perfectly OK if you have one or two - short number - of such values. Passing in parmeters has very good visibility, clarity, clear passing lines, well visible lifetime (initialization points, destruction points), amenable to changes, easy to track.
If number of such values begin to grow to >= 5-6 values, I swithc to approach #3 below.
Passing values through class members -- did not do good to clarity of my code, eventually I got rid of it. It makes for less clear code. Code becomes muddled. I did not like it. It had no advantages.
As alternative to (1) and (2), I adopted Inner class approach, in cases when amount of such values is > 5 (which makes for too long argument list).
I pack those values into small Inner class and pass such object by reference as argument to all internal members.
Public function of a class usually creates an object of Inner class (I call is Impl or Ctx or Args) and passes it down to private functions.
This combines clarity of arg passing with brevity. It's perfect.
Good luck
Edit
Consider preparing array of strings and using a loop rather than writing 50 almost-identical calls. Something like char *strings[] = {...} (C/C++).
This really depends on your use case. Does 'a' represent a state that your application/object care about? Then you might want to make it a member of your object. Evaluate the big picture, think about maintenance, extensibility when designing structures.
If your parameter a is a of a class of your own, you might consider making the private_method a public method for the variable a.
Otherwise, I do not think this looks weird. If you only need a in just 1 function, making it a private variable of your class would be silly (at least to me). However, if you'd need it like 20 times I would do so :P Or even better, just make 'a' an object of your own that has that certain function you need.
A method should ideally not pass more than 7 parameters. Using the number of parameters more than 6-7 usually indicates a problem with the design (do the 7 parameters represent an object of a nested class?).
As for your question, if you want to make the parameter private only for the sake of passing between private methods without the parameter having anything to do with the current state of the object (or some information about the object), then it is not recommended that you do so.
From a performance point of view (memory consumption), reference parameters can be passed around as method parameters without any significant impact on the memory consumption as they are passed by reference rather than by value (i.e. a copy of the data is not created). For small number of parameters that can be grouped together you can use a struct. For example, if the parameters represent x and y coordinates of a point, then pass them in a single Point structure.
Bottomline
Ask yourself this question, does the parameter that you are making as a members represent any information (data) about the object? (data can be state or unique identification information). If the answer to his question is a clear no, then do not include the parameter as a member of the class.
More information
Limit number of parameters per method?
Parameter passing in C#

XPath: opposite of string() function?

In XPath it is possible to convert an object to string using the string() function. Now I want to convert the string back to an object.
I do understand it is not possible in some cases (for example for elements), because some information was lost. But it should be possible for simple types, like int or boolean.
I know, for numbers I can use number() function, but I want general mechanism which will work for any simple type variable.
Going to string is easy, because you've told it that you want a string.
Similarly, going to number is easy, because you've told it that you want a number.
But there is no generic way to say 'turn it back into x', because you haven't told it what x is.
(In other words, string() is like a cast like Java/C/C++/C# have. But there is no uncast.)
string() isn't an object serializer, so you can't deserialize.
Why do you want this? Perhaps there is another way of solving your problem.
If your object $x is the number 1234, then string($x) will be the string "1234".
If your object $x is a nodeset of 1000 XML elements, the first one being
<wibble><wobble>1<ping/>2</wobble>34</wibble>
then string($x) will be the string "1234".
The function is not a bijection, you can't have an inverse as many different values map to the same string.
In no language (that I know of) you can cast A to B and then call a magical function that reverts it back to whatever it was before you casted it.
The process of converting some data type into something else is always an unidirectional one - you lose the information what type it was before. That's because the new data type has no way of storing what it was before.
So, what are you trying to do? I strongly suspect that you ask this question because you are tackling a problem from the wrong end.

Resources