performance issue variable assignment - performance

I am wondering which scenario would be best.
(Please bear with my examples; these are just small illustrations of the situation in question. I know you could have the exact same function without a result variable.)
A)
public String doSomething() {
    String result;
    if (condition) {
        result = "Option A";
    } else {
        result = "Option B";
    }
    return result;
}
B)
public String doSomething() {
    String result = "Option B";
    if (condition) {
        result = "Option A";
    }
    return result;
}
Because in scenario B, if the condition is met, you end up assigning result a value twice.
Yet in code, I keep seeing scenario A.

Actually, the overhead here is minimal, if any, given compiler optimisations. You would not care about it in a professional coding environment unless you were writing a compiler yourself.
What matters more, under (modern) programming paradigms, is code style and readability.
Example A is far more readable, as it presents a clear reason-outcome hierarchy. This is especially important in big methods, where it saves the programmer a lot of analysis time.
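As the asker already hints, when the method really does nothing but pick between two constants, both scenarios also collapse into a single expression (a trivial sketch of the same toy example):
public String doSomething() {
    return condition ? "Option A" : "Option B";
}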


Do Java 8 streams produce slower code than plain imperative loops?

There is a lot of hype about functional programming, and in particular the new Java 8 streams API. It is advertised as a good replacement for good old loops and the imperative paradigm.
Indeed, it can sometimes look nice and do the job well. But what about performance?
For example, here is a good article about it: Java 8: No more loops
Using a loop you can do all the work in one iteration. But with the new stream API you chain multiple loops, which makes it much slower (is that right?).
Look at their first sample. The loop will not even walk through the whole array in most cases. However, to do the filtering with the new stream API you have to cycle through the whole array to filter out all candidates, and only then can you get the first one.
The article mentions some laziness:
We first use the filter operation to find all articles that have the Java tag, then used the findFirst() operation to get the first occurrence. Since streams are lazy and filter returns a stream, this approach only processes elements until it finds the first match.
What does the author mean by that laziness?
I did a simple test and it shows that the good old loop solution runs about 10x faster than the stream approach.
public void test() {
    List<String> list = Arrays.asList(
            "First string",
            "Second string",
            "Third string",
            "Good string",
            "Another",
            "Best",
            "Super string",
            "Light",
            "Better",
            "For string",
            "Not string",
            "Great",
            "Super change",
            "Very nice",
            "Super cool",
            "Nice",
            "Very good",
            "Not yet string",
            "Let's do the string",
            "First string",
            "Low string",
            "Big bunny",
            "Superstar",
            "Last");

    long start = System.currentTimeMillis();
    for (int i = 0; i < 100000000; i++) {
        getFirstByLoop(list);
    }
    long end = System.currentTimeMillis();
    System.out.println("Loop: " + (end - start));

    start = System.currentTimeMillis();
    for (int i = 0; i < 100000000; i++) {
        getFirstByStream(list);
    }
    end = System.currentTimeMillis();
    System.out.println("Stream: " + (end - start));
}

public String getFirstByLoop(List<String> list) {
    for (String s : list) {
        if (s.endsWith("string")) {
            return s;
        }
    }
    return null;
}

public Optional<String> getFirstByStream(List<String> list) {
    return list.stream().filter(s -> s.endsWith("string")).findFirst();
}
The results were:
Loop: 517
Stream: 5790
BTW, if I use a String[] instead of a List, the difference is even bigger! Almost 100x!
QUESTION: Should I use the old imperative loop approach if I'm looking for the best performance? Is the FP paradigm just about making code "more concise and readable", but not about performance?
OR
Is there something I missed, and could the new stream API be at least as efficient as the imperative loop approach?
QUESTION: Should I use the old imperative loop approach if I'm looking for the best performance?
Right now, probably yes. Various benchmarks seem to suggest that streams are slower than loops for most tests. Though not catastrophically slower.
Counter examples:
In some cases, parallel streams can give a useful speed up.
Lazy streams can provide performance benefits for some problems; see http://java.amitph.com/2014/01/java-8-streams-api-laziness.html
(It is possible to do equivalent things with loops, but you can't do it with just plain loops; you have to hand-code the laziness yourself.)
But the bottom line is that performance is complicated and streams are not (yet) a magic bullet for speeding up your code.
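As a rough illustration of the parallel-stream point above, here is a minimal sketch. Whether parallel() actually wins depends on the data size, the per-element cost and the number of cores, so it is something to measure rather than assume:
import java.util.stream.LongStream;

public class ParallelSumDemo {
    public static void main(String[] args) {
        long n = 2_000_000L;

        // Same computation, run sequentially and in parallel; the results are
        // identical, only the wall-clock time may differ on a multi-core machine.
        long sequential = LongStream.rangeClosed(1, n).map(x -> x * x).sum();
        long parallel   = LongStream.rangeClosed(1, n).parallel().map(x -> x * x).sum();

        System.out.println(sequential == parallel); // true
    }
}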
Is the FP paradigm just about making code "more concise and readable", but not about performance?
Not exactly. It is certainly true that the FP paradigm is more concise and (to someone who is familiar with it) more readable.
However, by expressing the computation using the FP paradigm, you are also expressing it in a way that could potentially be optimized in ways that are much harder to achieve with code expressed using loops and assignment. FP code is also more amenable to formal methods, i.e. formal proof of correctness.
(In the context of this discussion of streams, "could be optimized" means in some future Java release.)
Laziness is about how elements are taken from the source of the stream - that is, on demand. If more elements are needed, they will be taken; if not, they won't. Here is an example:
Arrays.asList(1, 2, 3, 4, 5)
      .stream()
      .peek(x -> System.out.println("before filter : " + x))
      .filter(x -> x > 2)
      .peek(System.out::println)
      .anyMatch(x -> x > 3);
Notice how each element goes through the entire pipeline of stages; that is, filter is applied to one element at a time - not to all of them at once - which is why filter can return a Stream<Integer>. This allows the stream to short-circuit: anyMatch does not even process 5, since there is no need to.
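For reference, running that pipeline prints the following; 5 is never pulled from the source because anyMatch has already succeeded on 4:
before filter : 1
before filter : 2
before filter : 3
3
before filter : 4
4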
Just notice that not all intermediate operations are lazy. For example, sorted and distinct are not - these are called stateful intermediate operations. Think about it this way: to actually sort the elements you need to traverse the entire source. One more example that is not intuitive is flatMap, but that behaviour is not guaranteed and is seen more as a bug; more to read here.
As for the speed: it is all about how you measure. Writing micro-benchmarks in Java is not easy, and the de facto tool for that is JMH - you can try it out. There are numerous posts here on SO showing that streams are indeed slower (which is normal - they carry some extra infrastructure), but the difference is not big enough to actually care about.
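For completeness, here is a minimal JMH sketch of the same comparison. This assumes the org.openjdk.jmh dependencies and annotation processing are set up in the usual way (e.g. via the JMH Maven archetype) and reuses only a few of the sample strings:
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class LoopVsStreamBenchmark {

    private final List<String> list = Arrays.asList(
            "First string", "Second string", "Third string", "Good string", "Another");

    @Benchmark
    public String loop() {
        for (String s : list) {
            if (s.endsWith("string")) {
                return s;
            }
        }
        return null;
    }

    @Benchmark
    public Optional<String> stream() {
        return list.stream().filter(s -> s.endsWith("string")).findFirst();
    }
}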

Inherent math operation in Java

I'm writing essentially a small calculator where you input two numbers and an operator like "+, -, *, /", etc., and it will perform that operation.
My initial thought has been to just have the user enter the values, then compute them as entered. I would like to do it this way so as to avoid writing several if-else statements and to keep things a bit cleaner. But I can't find anything explaining how to inherently compute the values and the entered operator character.
package hw2p1;

import java.util.Scanner;
import javax.script.*;

public class Calculator {

    public static void PrintCalculator() {
        Scanner input = new Scanner(System.in);
        double num1;          // 1st entered number
        double num2;          // 2nd entered number
        String val1;          // math operator like + - * /
        double math1;         // the results
        String equation = "";

        System.out.print("Enter the first number:");
        num1 = input.nextDouble();
        System.out.print("Enter the second number:");
        num2 = input.nextDouble();
        System.out.print("Enter an operator:");
        val1 = input.next();
I know this is incomplete, but I don't know/can't find the logic on how to string the three inputted values together and compute them.
The simplest solution would be
public static double calculate(double a, String op, double b) {
    switch (op) {
        case "+": return a + b;
        case "-": return a - b;
        case "*": return a * b;
        case "/": return a / b;
        case "%": return a % b;
        case "^": return Math.pow(a, b);
        default: throw new IllegalArgumentException("no such operator '" + op + "'");
    }
}
which is semantically equivalent to a sequence of if-else statements, but clearly showing the intention and potentially more efficient, though there’s not much worth in speculating about performance here.
It can be invoked like
double math1 = calculate(num1, val1, num2); //the results.
but you really should rethink your variable naming scheme. If you have to append a comment explaining the purpose behind every declaration, you're clearly doing something wrong. Why not name the variables, e.g., firstInputNumber, secondInputNumber, operator and result in the first place? Then you don't need comments explaining each variable's purpose.
I could not remove the if-else/switch entirely, but maybe you can try something like this:
import java.util.function.IntBinaryOperator;

public static void main(String[] args) {
    int result = operate("+").applyAsInt(10, 20);
    System.out.println(result);
}

private static IntBinaryOperator operate(String op) {
    switch (op) {
        case "+": return Math::addExact;
        case "-": return Math::subtractExact;
        // other cases
    }
    throw new RuntimeException("incorrect operator");
}
One issue though: Math does not have such methods for double.
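For double operands, the same idea works with DoubleBinaryOperator and plain lambdas instead of the Math:: method references - a small sketch:
import java.util.function.DoubleBinaryOperator;

public class DoubleOperators {

    private static DoubleBinaryOperator operate(String op) {
        switch (op) {
            case "+": return (a, b) -> a + b;
            case "-": return (a, b) -> a - b;
            case "*": return (a, b) -> a * b;
            case "/": return (a, b) -> a / b;
            default: throw new IllegalArgumentException("no such operator '" + op + "'");
        }
    }

    public static void main(String[] args) {
        double result = operate("+").applyAsDouble(1.5, 2.25);
        System.out.println(result); // 3.75
    }
}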
tl;dr
You have to explicitly state that "+" means addition.
Long Answer
When the user types something when your program runs, all of it is just characters. It's up to your program to interpret those characters to do something meaningful. Scanner can be helpful with this because it knows how to interpret the characters 1234 as an int. However, beyond data for the built-in types, Scanner is completely dumb.
In building your own calculator program, you are basically writing a simplified version of the Java compiler. The compiler is a program which reads characters from a file and converts them into something that can execute on a Java Virtual Machine. You will do much of the same thing in your own calculator program. This means that you have to explicitly write the rules that interpret characters that represent mathematical operators into the operations that they actually perform.
If you are interested in learning more about this topic, you can do some research on compilers and interpreters. They are generally made of two pieces: the lexical analyzer, or lexer, and the parser. This is all especially necessary if you want to allow the user to type in a complete mathematical expression like 2 + 3 rather than prompting them for each piece of the expression individually.
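To make that concrete, here is a deliberately tiny sketch for a single whitespace-separated expression such as 2 + 3 - no precedence, no error recovery - just to show that your program itself has to assign meaning to the + character:
import java.util.Scanner;

public class TinyExpression {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.print("Enter an expression like 2 + 3: ");

        // "Lexing": split the raw characters into tokens.
        String[] tokens = in.nextLine().trim().split("\\s+");
        double a = Double.parseDouble(tokens[0]);
        String op = tokens[1];
        double b = Double.parseDouble(tokens[2]);

        // "Interpreting": map the operator token onto an actual operation.
        double result;
        switch (op) {
            case "+": result = a + b; break;
            case "-": result = a - b; break;
            case "*": result = a * b; break;
            case "/": result = a / b; break;
            default: throw new IllegalArgumentException("no such operator '" + op + "'");
        }
        System.out.println(result);
    }
}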
I hope this gives you some terminology that you can use in a Google search.

Why can't dead code detection be fully solved by a compiler?

The compilers I've been using in C or Java have dead code prevention (warning when a line won't ever be executed). My professor says that this problem can never be fully solved by compilers though. I was wondering why that is. I am not too familiar with the actual coding of compilers as this is a theory-based class. But I was wondering what they check (such as possible input strings vs acceptable inputs, etc.), and why that is insufficient.
The dead code problem is related to the Halting problem.
Alan Turing proved that it is impossible to write a general algorithm that will be given a program and be able to decide whether that program halts for all inputs. You may be able to write such an algorithm for specific types of programs, but not for all programs.
How does this relate to dead code?
The Halting problem is reducible to the problem of finding dead code. That is, if you find an algorithm that can detect dead code in any program, then you can use that algorithm to test whether any program will halt. Since that has been proven to be impossible, it follows that writing an algorithm for dead code is impossible as well.
How do you turn an algorithm for detecting dead code into an algorithm for the Halting problem?
Simple: you add a line of code after the end of the program you want to check for halting. If your dead-code detector detects that this line is dead, then you know that the program does not halt. If it doesn't, then you know that the program halts (it gets to the last line, and then to your added line of code).
Compilers usually check for things that can be proven at compile-time to be dead. For example, blocks that are dependent on conditions that can be determined to be false at compile time. Or any statement after a return (within the same scope).
These are specific cases, and therefore it's possible to write an algorithm for them. It may be possible to write algorithms for more complicated cases (like an algorithm that checks whether a condition is syntactically a contradiction and therefore will always return false), but still, that wouldn't cover all possible cases.
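In Java terms, those two easy cases look roughly like this (a small sketch; the exact diagnostics depend on the compiler and IDE):
public class DeadCodeExamples {

    static int example() {
        int x = 1;
        if (false) {   // condition known to be false at compile time;
            x = 2;     // javac still accepts it (the spec permits if(false) for
        }              // conditional compilation), but IDEs typically flag the body as dead code
        return x;
        // System.out.println("after return"); // uncommenting this line is a
        //                                     // compile-time error: "unreachable statement"
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}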
Well, let's take the classical proof of the undecidability of the halting problem and change the halting-detector to a dead-code detector!
C# program
using System;
using YourVendor.Compiler;

class Program
{
    static void Main(string[] args)
    {
        string quine_text = @"using System;
using YourVendor.Compiler;
class Program
{{
    static void Main(string[] args)
    {{
        string quine_text = @{0}{1}{0};
        quine_text = string.Format(quine_text, (char)34, quine_text);
        if (YourVendor.Compiler.HasDeadCode(quine_text))
        {{
            System.Console.WriteLine({0}Dead code!{0});
        }}
    }}
}}";
        quine_text = string.Format(quine_text, (char)34, quine_text);
        if (YourVendor.Compiler.HasDeadCode(quine_text))
        {
            System.Console.WriteLine("Dead code!");
        }
    }
}
If YourVendor.Compiler.HasDeadCode(quine_text) returns false, then the line System.Console.WriteLine("Dead code!"); will never be executed, so this program actually does have dead code, and the detector was wrong.
But if it returns true, then the line System.Console.WriteLine("Dead code!"); will be executed, and since there is no more code in the program, there is no dead code at all, so again the detector was wrong.
So there you have it, a dead-code detector that returns only "There is dead code" or "There is no dead code" must sometimes yield wrong answers.
If the halting problem is too obscure, think of it this way.
Take a mathematical problem that is believed to be true for all positive integers n, but hasn't been proven to be true for every n. A good example is Goldbach's conjecture: that every even integer greater than two can be represented as the sum of two primes. Then (with an appropriate bigint library) run this program (pseudocode follows):
for (BigInt n = 4; ; n += 2) {
    if (!isGoldbachsConjectureTrueFor(n)) {
        print("Conjecture is false for at least one value of n\n");
        exit(0);
    }
}
The implementation of isGoldbachsConjectureTrueFor() is left as an exercise for the reader, but for this purpose it could be a simple iteration over all primes less than n.
Now, logically the above must either be the equivalent of:
for (; ;) {
}
(i.e. an infinite loop) or
print("Conjecture is false for at least one value of n\n");
as Goldbach's conjecture must either be true or not true. If a compiler could always eliminate dead code, there would definitely be dead code to eliminate here, in one case or the other. However, to do so, your compiler would at the very least need to solve arbitrarily hard problems. We could provide problems that are provably hard (e.g. NP-complete problems) which it would have to solve to determine which bit of code to eliminate. For instance, if we take this program:
String target = "f3c5ac5a63d50099f3b5147cabbbd81e89211513a92e3dcd2565d8c7d302ba9c";
for (BigInt n = 0; n < 2**2048; n++) {
    String s = n.toString();
    if (sha256(s).equals(target)) {
        print("Found SHA value\n");
        exit(0);
    }
}
print("Not found SHA value\n");
we know that the program will either print "Found SHA value" or "Not found SHA value" (bonus points if you can tell me which one is true). However, for a compiler to be able to reasonably optimise that, it would take on the order of 2^2048 iterations. It would in fact be a great optimisation, as I predict the above program would (or might) run until the heat death of the universe rather than print anything, if left unoptimised.
I don't know if C++ or Java have an Eval-type function, but many languages do allow you to call methods by name. Consider the following (contrived) VBA example.
Dim methodName As String
If foo Then
    methodName = "Bar"
Else
    methodName = "Qux"
End If
Application.Run(methodName)
The name of the method to be called is impossible to know until runtime. Therefore, by definition, the compiler cannot know with absolute certainty that a particular method is never called.
Actually, given the example of calling a method by name, the branching logic isn't even necessary. Simply saying
Application.Run("Bar")
is more than the compiler can determine. When the code is compiled, all the compiler knows is that a certain string value is being passed to that method. It doesn't check whether that method exists until runtime. If the method isn't called elsewhere through more conventional means, an attempt to find dead methods can return false positives. The same issue exists in any language that allows code to be called via reflection.
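The same situation exists in Java via reflection; here is a minimal sketch in which the method name only exists as a string until runtime, so the compiler cannot prove that either method is dead:
import java.lang.reflect.Method;

public class CallByName {

    public static void bar() { System.out.println("Bar was called"); }
    public static void qux() { System.out.println("Qux was called"); }

    public static void main(String[] args) throws Exception {
        // The name could just as well come from a config file or user input.
        String methodName = args.length > 0 ? args[0] : "bar";
        Method m = CallByName.class.getMethod(methodName);
        m.invoke(null); // static method, so no receiver object is needed
    }
}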
Unconditional dead code can be detected and removed by advanced compilers.
But there is also conditional dead code. That is code whose reachability cannot be known at compile time and can only be determined at runtime. For example, software may be configurable to include or exclude certain features depending on user preference, making certain sections of code seemingly dead in particular scenarios. That is not real dead code.
There are specific tools that can do testing, resolve dependencies, remove conditional dead code and recombine the useful code at runtime for efficiency. This is called dynamic dead code elimination. But as you can see it is beyond the scope of compilers.
A simple example:
int readValueFromPort(const unsigned int portNum);

int x = readValueFromPort(0x100); // just an example, nothing meaningful
if (x < 2)
{
    std::cout << "Hey! X < 2" << std::endl;
}
else
{
    std::cout << "X is too big!" << std::endl;
}
Now assume that the port 0x100 is designed to return only 0 or 1. In that case the compiler cannot figure out that the else block will never be executed.
However in this basic example:
bool boolVal = /* anything boolean */;
if (boolVal)
{
    // Do A
}
else if (!boolVal)
{
    // Do B
}
else
{
    // Do C
}
Here the compiler can work out that the final else block is dead code.
So the compiler can warn about dead code only if it has enough data to figure out that a given block is dead, and it also has to know how to apply that data.
EDIT
Sometimes the data is simply not available at compile time:
// File a.cpp
bool boolMethod();

bool boolVal = boolMethod();
if (boolVal)
{
    // Do A
}
else
{
    // Do B
}

//............

// File b.cpp
bool boolMethod()
{
    return true;
}
While compiling a.cpp the compiler cannot know that boolMethod always returns true.
The compiler will always lack some context information. E.g. you might know that a double value never exceeds 2, because that is a property of the mathematical function you use from a library. The compiler does not even see the code in the library, and it can never know all the properties of all mathematical functions, or detect all the weird and complicated ways they can be implemented.
The compiler doesn't necessarily see the whole program. I could have a program that calls a shared library, which calls back into a function in my program which isn't called directly.
So a function which is dead with respect to the library it's compiled against could become alive if that library was changed at runtime.
If a compiler could eliminate all dead code accurately, it would be called an interpreter.
Consider this simple scenario:
if (my_func()) {
    am_i_dead();
}
my_func() can contain arbitrary code and in order for the compiler to determine whether it returns true or false, it will either have to run the code or do something that is functionally equivalent to running the code.
The idea of a compiler is that it only performs a partial analysis of the code, thus simplifying the job of a separate running environment. If you perform a full analysis, that isn't a compiler any more.
If you consider the compiler as a function c(), where c(source)=compiled code, and the running environment as r(), where r(compiled code)=program output, then to determine the output for any source code you have to compute the value of r(c(source code)). If calculating c() requires the knowledge of the value of r(c()) for any input, there is no need for a separate r() and c(): you can just derive a function i() from c() such that i(source)=program output.
Others have commented on the halting problem and so forth. These generally apply to portions of functions. However it can be hard/impossible to know whether even an entire type (class/etc) is used or not.
In .NET/Java/JavaScript and other runtime driven environments there's nothing stopping types being loaded via reflection. This is popular with dependency injection frameworks, and is even harder to reason about in the face of deserialisation or dynamic module loading.
The compiler cannot know whether such types would be loaded. Their names could come from external config files at runtime.
You might like to search around for "tree shaking", which is a common term for tools that attempt to safely remove unused subgraphs of code.
Take a function
void DoSomeAction(int actnumber)
{
    switch (actnumber)
    {
        case 1: Action1(); break;
        case 2: Action2(); break;
        case 3: Action3(); break;
    }
}
Can you prove that actnumber will never be 2 so that Action2() is never called...?
I disagree about the halting problem. I wouldn't call such code dead even though in reality it will never be reached.
Instead, let's consider:
for (int N = 3; ; N++)
    for (int A = 2; A < int.MaxValue; A++)
        for (int B = 2; B < int.MaxValue; B++)
        {
            int Square = Math.Pow(A, N) + Math.Pow(B, N);
            float Test = Math.Sqrt(Square);
            if (Test == Math.Trunc(Test))
                FermatWasWrong();
        }

private void FermatWasWrong()
{
    Press.Announce("Fermat was wrong!");
    Nobel.Claim();
}
(Ignore the type and overflow errors) Dead code?
Look at this example:
public boolean isEven(int i) {
    if (i % 2 == 0)
        return true;
    if (i % 2 == 1)
        return false;
    return false;
}
The compiler can't know that an int is always either even or odd. To detect that, it would have to understand the semantics of your code - and how would that be implemented? The compiler can't ensure that the lowest return will never be executed, and therefore it can't detect the dead code.

If/ Else, test true first or false first

I have a rather specific question.
Say I am at the end of a function, and am determining whether to return true or false.
I would like to do this using an if/else statement, and have two options: (examples are in pseudocode)
1) Check if worked first:
if (resultVar != error) {
    return true;
} else {
    return false;
}
2) Check if it failed first:
if (resultVar == error) {
    return false;
} else {
    return true;
}
My question is simple: Which case is better (faster? cleaner?)?
I am really looking at the if/else itself, disregarding that the example is returning (but thanks for the answers)
The function is more likely to want to return true than false.
I realize that these two cases do the exact same thing, but are just 'reversed' in the order of which they do things. I would like to know if either one has any advantage over the other, whether one is ever so slightly faster, or follows a convention more closely, etc.
I also realize that this is extremely nitpicky, I just didn't know if there is any difference, and which would be best (if it matters).
Clarifications:
A comparison needs to be done to return a boolean. What the examples return is less relevant than how the comparison happens.
This is by far the cleanest:
return resultvar != error;
The only difference between the two examples may be the implementation of the operator. A != operator inverts the result of the comparison, so it adds a tiny overhead; == is a straight comparison.
But it depends on what you plan to do in the if/else: if you are simply assigning a value to a variable, the conditional ternary operator (?:) is faster. For complex multi-value decisions, a switch/case is more flexible, but slower.
This will be faster in your case:
return (resultVar == error) ? false : true;
This will depend entirely on the language and the compiler. There is no specific answer. In C for instance, both of these would be encoded rather like:
return (resultVar!=error);
by any decent compiler.
Put true first, because it potentially eliminates a JUMP instruction in the generated assembly. However, the difference is negligible, since it's an else rather than an else if. There may technically be a difference, but you will see no performance difference in this case.

performance of many if statements/switch cases

If I had literally 1000s of simple if statements or switch statements
ex:
if 'a':
return 1
if 'b':
return 2
if 'c':
return 3
...
...
Would creating trivial if statements perform faster than searching a list for something? I imagined that, because every if statement must be tested until the desired output is found (worst case O(n)), it would have the same performance as searching through a list. This is just an assumption; I have no evidence to prove it. I am curious to know.
You could potentially put these things into delegates that are then stored in a map, where the key is the input you've specified.
C# Example:
// declare a map. The input(key) is a char, and we have a function that will return an
// integer based on that char. The function may do something more complicated.
var map = new Dictionary<char, Func<char, int>>();
// Add some:
map['a'] = (c) => { return 1; };
map['b'] = (c) => { return 2; };
map['c'] = (c) => { return 3; };
// etc... ad infinitum.
Now that we have this map, we can quite cleanly return something based on the input
public int Test(char c)
{
    Func<char, int> func;
    if (map.TryGetValue(c, out func))
        return func(c);
    return 0;
}
In the above code, we can call Test and it will find the appropriate function to call (if present). This approach is better (imho) than a list as you'd have to potentially search the entire list to find the desired input.
This depends on the language and the compiler/interpreter you use. In many interpreted languages the performance will be the same; in other languages the switch statement gives the compiler crucial additional information that it can use to optimize the code.
In C, for instance, I expect a long switch statement like the one you present to use a lookup table under the hood, avoiding explicit comparison with all the different values. With that, your switch decision takes the same time, no matter how many cases you have. A compiler might also hardcode a binary search for the matching case. These optimizations are typically not performed when evaluating a long else if() ladder.
In any case, I repeat, it depends on the interpreter/compiler: if your compiler optimizes else if() ladders but not switch statements, then what it could do with a switch statement is quite irrelevant. However, for mainstream languages, you should be able to expect all constructs to be optimized.
Apart from that, I advise using a switch statement wherever applicable; it carries a lot more semantic information to the reader than an equivalent else if() ladder.
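The same thing happens in Java at the bytecode level; here is a small sketch you can inspect with javap (which instruction the compiler picks depends on how dense the case values are):
public class SwitchLookup {

    // A dense switch like this typically compiles to a single "tableswitch"
    // instruction (a jump table); sparse case values produce "lookupswitch"
    // (a sorted key search). Check with: javap -c SwitchLookup
    static int lookup(char c) {
        switch (c) {
            case 'a': return 1;
            case 'b': return 2;
            case 'c': return 3;
            default:  return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup('b')); // prints 2
    }
}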
