Running time of logical operator versus boolean operation

return true;
return i == 1;
Do they have the same running time?
Or does return i == 1; take extra time?

It depends on what you want to return. For a boolean method, it is preferable to return true. i == 1 is much less clear, especially since you perform an unnecessary check.

There really isn't a big difference; both execute in essentially the same time.
In a small project, returning either true or false or 1 or 0 is totally fine and will execute equally fast. Even if there were a difference, it would be far too small to notice.
In a large project the same holds. A difference could only show up if the return sits inside a massive loop that iterates many thousands of times, and even then you would have to measure it to see it.
In C, true and false from <stdbool.h> are macros that expand to 1 and 0, so returning true or false is just a more human-readable way of writing return 1 or return 0. In other languages it depends on how booleans are represented.

Related

Modulus optimization in a program

I have seen that many people prefer to use in code:
while(i<1000000){
ret+=a[i];
i++;
if(ret >= MOD)
ret -= MOD;
}
instead of computing ret % MOD in the final step.
What is the difference between the two, and how are they equivalent?
How does this optimize the code?
Basically you can't tell without trying. There are two possible outcomes (considering my note further down below):
The compiler optimizes the code in some way that both solutions use either a conditional jump or a modulo operation. This does not only depend on how "bright" the compiler is, but it also has to consider the target architecture's available instruction set (but to be honest, it would be odd not having a modulo operation).
The compiler doesn't optimize the code (most probable for non-optimizing debug builds).
The basic difference is that, as mentioned already, the solution with the if() uses one conditional jump, which, again depending on your architecture, might slow you down a bit, since the CPU can't prefetch the next instruction without evaluating the jump condition first.
One further note:
Using a modulo operation and using your if() statement are not actually equivalent (depending on the actual values), because ret % MOD corresponds to the following code:
while (ret >= MOD)
ret -= MOD;
Imagine a[i] being bigger than MOD and the new sum being bigger than two times MOD. In that case you'd end up with a ret bigger than MOD, something that won't happen when using modulo.
Take an example:
13 MOD 10
What it actually does is give you the remainder after dividing 13 by 10,
that is: 13 - (10 * (int)(13/10)) = 13 - (10 * 1) = 3
So if a[i] <= mod it will work fine, but if a[i] > mod, see what happens:
let a[]= {15,15,15}
mod=7
in first step
ret = 0 + 15
ret = 15 - 7 = 8
2nd step
ret = 8 + 15 = 23
ret = 23 - 7 = 16
3rd step
ret = 16 + 15
ret = 31 - 7 = 24
So your final result is 24, but it should be 3.
You have to do:
while (ret >= MOD)
ret -= MOD;
if you want to use subtraction instead of mod.
And subtraction is usually better than mod with respect to time, because mod is really time consuming :(
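The worked example above can be checked with a short sketch. It is written in Python purely for illustration (the question's code is C-like), and the function names are made up for this example. It compares the single if, the while loop, and a plain modulo on the same input.

```python
def sum_single_if(values, mod):
    """Running sum with one conditional subtraction per step (the pattern from the question)."""
    ret = 0
    for v in values:
        ret += v
        if ret >= mod:
            ret -= mod
    return ret


def sum_while(values, mod):
    """Running sum that keeps subtracting until the result is below mod."""
    ret = 0
    for v in values:
        ret += v
        while ret >= mod:
            ret -= mod
    return ret


def sum_modulo(values, mod):
    """Plain sum reduced once at the end with the % operator."""
    return sum(values) % mod


values = [15, 15, 15]
mod = 7
print(sum_single_if(values, mod))  # 24
print(sum_while(values, mod))      # 3
print(sum_modulo(values, mod))     # 3
```

When every element is at most mod, the three versions agree, which is the case where the single if is safe.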
It is best not to try to optimise code unless you have a performance problem. Then find out where it is actually having the problems.
And to answer your question: the two are the same, but you need to check with the particular hardware/compiler to be sure.
The conditional test and subtraction is typically less expensive than a modulus operation, especially if the sum does not frequently exceed MOD. A modulus operation is effectively an integer division instruction, which typically has a latency which is an order of magnitude greater than that of compare/subtract. Having said that, unless the loop is a known performance bottleneck then you should just code for clarity and robustness.
Modulo requires integer division which is usually the slowest integer math operation on a CPU. Long ago before pipelines and branch prediction, this code was probably reliably faster than modulo. Nowadays branches can be very slow so its benefit is far from certain. If the values in a are always much smaller than MOD, it's probably still a win because the branch will be skipped most iterations and the branch predictor will mostly guess right. If they are not smaller, it's uncertain. You would need to benchmark both.
If you can write the program such that MOD is always a power of 2, you could use bit masking which is much faster than either.
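As a rough illustration of the bit-masking idea, here is a minimal Python sketch. The helper name mod_power_of_two is hypothetical, and it assumes a non-negative x and a modulus that is a positive power of two. It shows the equivalence only, not the speed difference, which would have to be measured in the target language.

```python
def mod_power_of_two(x, mod):
    """Reduce a non-negative x modulo a power of two using a bit mask."""
    # A power of two has exactly one bit set, so mod & (mod - 1) is 0.
    if mod <= 0 or mod & (mod - 1) != 0:
        raise ValueError("mod must be a positive power of two")
    # mod - 1 has all the lower bits set, so the AND keeps only the remainder.
    return x & (mod - 1)


print(mod_power_of_two(13, 8))  # 5, the same as 13 % 8
```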
If I saw this pattern in code that wasn't 1) from 1978 or 2) accompanied by a comment explaining how the author benchmarked it and found it was faster than modulo on the current compiler, typical user CPU, and a realistic data input, I'd roll my eyes hard.
Yes, both compute the same thing, but:
the % operation needs integer division, which is more costly than - and if,
but on modern machines with parallel execution (meaning more pipelines, not more cores)
the CPU does several things at once unless they depend on each other or branching occurs.
That is why on modern machines the % variant is usually faster (the if stalls the pipelines).
There are still platforms where the -= and if variant is faster,
such as MCUs, so when you know you have just a single CPU/MCU pipeline
or a very slow division, use this variant.
You should always measure the resulting times during the optimization process.
In your case you want to call just a single mod for the whole loop, so it should be faster, but check the later text ...
Compilers
Modern compilers optimize code for your target platform and usually detect this and make the right choice,
so you should not get consumed by low-level optimization instead of programming the task's functionality.
But not all compilers are like that; for many platforms older compilers are still in use.
Also, in some rare cases it is preferable to turn optimizations off,
because they could destroy specific desired timing, instruction patterns, or even the functionality of the task ...
In such cases there is no choice, and this knowledge suddenly comes in handy.
Now the differences between your cases from the algorithmic side:
while(i<1000000){ ret+=a[i]; i++; if(ret>=MOD) ret-=MOD; }
The sub-result stays around modulo MOD,
which means you do not need more bits than are used for max(a[i])+MOD*N, where N depends on a[i].
If sum(a[i]) would grow into bignums, this will be faster because there is no need to increase the sub-result bit-width.
while(i<1000000){ ret+=a[i]; i++; } ret%=MOD;
This could overflow if the variable ret cannot hold the non-modulo result.
while(i<1000000){ ret+=a[i]; i++; ret%=MOD; }
This is how it should be done for bigger non-modulo results.
if (ret>=MOD) ret-=MOD; is not a modulo operation;
it is just a single iteration of it.
Safer is while (ret>=MOD) ret-=MOD;
but if you know that the sub-result is not increasing too much (so it will not overflow within a few iterations), then if is OK,
but in that case you should add a while or a modulo after the loop to ensure a correct result.
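The overflow concern can be sketched as follows. Python integers never overflow, so this example imitates an unsigned 32-bit variable by masking with 0xFFFFFFFF; that masking, the function names, and the sample values are all assumptions made for illustration, not part of the original code.

```python
MASK32 = 0xFFFFFFFF  # keeps only the low 32 bits, imitating an unsigned 32-bit variable


def sum_then_mod_32bit(values, mod):
    """Accumulate in a simulated 32-bit variable and reduce once at the end."""
    ret = 0
    for v in values:
        ret = (ret + v) & MASK32  # wraps around like unsigned overflow
    return ret % mod


def mod_every_step_32bit(values, mod):
    """Reduce after every addition so the running value stays small."""
    ret = 0
    for v in values:
        ret = ((ret + v) & MASK32) % mod
    return ret


values = [2_000_000_000] * 3
MOD = 1_000_000_007
exact = sum(values) % MOD  # reference result using Python's unbounded integers

print(sum_then_mod_32bit(values, MOD) == exact)    # False: the sum wrapped before the final %
print(mod_every_step_32bit(values, MOD) == exact)  # True
```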

Faster way of testing a condition in MATLAB

I need to run many, many tests of the form v<0, where v is a vector (a relatively short one). I am currently doing it with
all(v<0)
Is there a faster way?
Not sure which one will be faster (that may depend on the machine and Matlab version), but here are some alternatives to all(v<0):
~any(v>=0)
nnz(v>=0)==0 %// Or ~nnz(v>=0)
sum(v>=0)==0 %// Or ~sum(v>=0)
isempty(find(v>=0, 1)) %// Or isempty(find(v>=0))
I think the issue is that the conditional is executed on all elements of the array first, then the condition is tested... That is, for the test "any(v<0)", matlab does the following I believe:
Step 1: compute v<0 for every element of v
Step 2: search through the results of step 1 for a true value
So even if the first element of v is less than zero, the conditional was first computed for all elements, hence wasting a lot of time. I think this is also true for any of the alternative solutions offered above.
I don't know of a faster way to do it easily, but wish I did. In some cases, breaking the array v up into smaller chunks and testing incrementally could speed things up, particularly if the condition is common. For example:
function result = anyLessThanZero(v);
w = v(:);
result = true;
for i=1:numel(w)
if ( w(i) < 0 )
return;
end
end
result = false;
end
but that can be very inefficient if the condition is rare. (If you were to really do this, there is probably a better way than I illustrate above to handle any condition, not just <0, but I show it this way to make it clear).
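The difference between the two strategies described above can be sketched by counting comparisons. This is a Python illustration rather than MATLAB, the function names are invented for the example, and it says nothing about actual MATLAB timings.

```python
def any_negative_full(v):
    """Build the whole list of comparison results first, then search it."""
    flags = [x < 0 for x in v]  # every element is compared
    return any(flags), len(flags)


def any_negative_short_circuit(v):
    """Stop at the first negative element and report how many were checked."""
    checked = 0
    for x in v:
        checked += 1
        if x < 0:
            return True, checked
    return False, checked


v = [-1.0] + [2.0] * 999
print(any_negative_full(v))           # (True, 1000)
print(any_negative_short_circuit(v))  # (True, 1)
```

If no element is negative, both versions check every element, and the explicit loop only adds overhead.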

Do speeds of if statements in a repetitive loop affect overall performance?

If I have code that will take a while to execute, printing out results every iteration will slow down the program a lot. To still receive occasional output to check on the progress of the code, I might have:
if (i % 10000 == 0) {
# print progress here
}
Does the if statement checking every time slow it down at all? Should I just not put output and just wait, will that make it noticeably faster at all?
Also, is it faster to do: (i % 10000 == 0) or (i == 10000)?
Is checking equality or modulus faster?
In the general case, it won't matter at all.
A slightly longer answer: It won't matter unless the loop is run millions of times and the other statement in it is actually less demanding than an if statement (for example, a simple multiplication etc.). In that case, you might see a slight performance drop.
Regarding (i % 10000 == 0) vs. (i == 10000), the latter is obviously faster, because it only compares, whereas the former does a (fairly costly) modulus and a comparison. Note that the two are not interchangeable, though: the equality test is true only once, while the modulus test is true every 10000 iterations.
That said, neither an if statement nor a modulus will make any difference unless your loop takes up 90 % of the program's running time, which is usually the case only at school :). You probably spent a lot more time asking this question than you would have saved by not printing anything. For development and debugging, this is not a bad way to go.
The golden rule for these kinds of decisions:
Write the most readable and explicit code you can imagine to do the
thing you want it to do. If you have a performance problem, look at
wrong data structures and algorithmic choices first. If you have done
all those and need a really quick program, profile it to see which
part takes most time. After all those, you're allowed to do this kind
of low-level guesses.
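To make the behaviour of the two checks from the question concrete, here is a small Python sketch with made-up function names. It also shows one possible alternative, a countdown counter that avoids the modulus; that variant is an assumption added for illustration, not something proposed in the answers, and whether it is any faster would need profiling.

```python
def progress_with_modulus(n, every):
    """Report whenever i is a multiple of 'every'."""
    reported = []
    for i in range(n):
        if i % every == 0:
            reported.append(i)
    return reported


def progress_with_equality(n, every):
    """Report only when i equals 'every', which happens at most once."""
    reported = []
    for i in range(n):
        if i == every:
            reported.append(i)
    return reported


def progress_with_countdown(n, every):
    """Report every 'every' iterations using a countdown instead of a modulus."""
    reported = []
    remaining = 0  # zero means "report on this iteration"
    for i in range(n):
        if remaining == 0:
            reported.append(i)
            remaining = every
        remaining -= 1
    return reported


print(progress_with_modulus(25, 10))    # [0, 10, 20]
print(progress_with_equality(25, 10))   # [10]
print(progress_with_countdown(25, 10))  # [0, 10, 20]
```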

Replicate() versus a for loop?

Does anyone know how the replicate() function works in R and how efficient it is relative to using a for loop?
For example, is there any efficiency difference between...
means <- replicate(100000, mean(rnorm(50)))
And...
means <- c()
for(i in 1:100000) {
means <- c(means, mean(rnorm(50)))
}
(I may have typed something slightly off above, but you get the idea.)
You can just benchmark the code and get your answer empirically. Note that I also added a second for loop flavor which circumvents the growing vector problem by preallocating the vector.
library(rbenchmark)  # provides benchmark()
repl_function = function(no_rep) means <- replicate(no_rep, mean(rnorm(50)))
for_loop = function(no_rep) {
means <- c()
for(i in 1:no_rep) {
means <- c(means, mean(rnorm(50)))
}
means
}
for_loop_prealloc = function(no_rep) {
means <- vector(mode = "numeric", length = no_rep)
for(i in 1:no_rep) {
means[i] <- mean(rnorm(50))
}
means
}
no_loops = 50e3
benchmark(repl_function(no_loops),
for_loop(no_loops),
for_loop_prealloc(no_loops),
replications = 3)
                         test replications elapsed relative user.self sys.self user.child sys.child
2          for_loop(no_loops)            3  18.886    6.274    17.803    0.894          0         0
3 for_loop_prealloc(no_loops)            3   3.209    1.066     3.189    0.000          0         0
1     repl_function(no_loops)            3   3.010    1.000     2.997    0.000          0         0
Looking at the relative column, the un-preallocated for loop is about 6.3 times slower. However, the preallocated for loop is nearly as fast as replicate.
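Why the growing vector is so much slower can be sketched with a simple counting model. The Python code below is only an analogy: it assumes that each c(means, x) call builds a new vector and copies the existing elements into it, and it counts those copies instead of measuring time. The function names are invented for the example.

```python
def grow_by_concatenation(n):
    """Append by building a new list each time, similar in spirit to means <- c(means, x)."""
    result = []
    copied = 0
    for i in range(n):
        copied += len(result)  # the old elements are copied into the new list
        result = result + [i]
    return result, copied


def fill_preallocated(n):
    """Allocate the full list once and write each slot in place."""
    result = [0] * n
    copied = 0  # no existing elements are ever copied
    for i in range(n):
        result[i] = i
    return result, copied


grown, grow_copies = grow_by_concatenation(1000)
filled, fill_copies = fill_preallocated(1000)
print(grow_copies)  # 499500
print(fill_copies)  # 0
```

Under this model the number of copied elements grows quadratically with the number of iterations, while the preallocated version copies nothing.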
replicate is a wrapper for sapply, which itself is a wrapper for lapply. lapply is ultimately an .Internal function that is written in C and performs the looping in an optimised way, rather than through the interpreter. Its main advantage is efficient memory management, especially compared to the highly inefficient vector growing method you present above.
I have a very different experience with replicate, which also confuses me. R often crashes and my laptop hangs when I use replicate rather than for. This surprises me because, for the reasons mentioned above, I also expected a C-written function to outperform the for loop. For example, if you execute the code below, you'll see that the for loop is faster than replicate:
system.time(for (i in 1:10) runif(1e7))
# user system elapsed
# 3.340 0.218 3.558
system.time(replicate(10, runif(1e7)))
# user system elapsed
# 4.622 0.484 5.109
So with 10 replicates, the for loop is clearly faster. If you repeat it for 100 replicates you get similar results. So I wonder if anyone can come up with an example that shows its practical advantages over for.
PS I also wrapped runif(1e7) in a function and that made no difference in the comparison. Basically I failed to come up with any example that shows the advantage of replicate.
Vectorization is the key difference between them. I will try to explain this point. R is a high-level, interpreted computer language. It takes care of many basic computer tasks for you. When you write
x <- 2.0
you don’t have to tell your computer that
“2.0” is a floating-point number;
“x” should store numeric-type data;
it has to find a place in memory to put “2.0”;
it has to register “x” as a pointer to a certain place in memory.
R figures these things by itself.
But this convenience comes at a price: R is slower than lower-level languages.
In C or FORTRAN, much of this "test if" checking is done during compilation, not during program execution. Programs are translated into binary machine code after they are written, but before they are run. This allows the compiler to organize the machine code in an optimal way for the computer to execute.
What does this have to do with vectorization in R? Well, many R functions are actually written in a compiled language, such as C, C++, or FORTRAN, and have a small R “wrapper”. This is the difference between your two approaches: for loops add further "test if" operations that the machine has to perform on the data, making them slower.

Which is faster? Comparison or assignment?

I'm doing a bit of coding, where I have to write this sort of code:
if( array[i]==false )
array[i]=true;
I wonder if it should be re-written as
array[i]=true;
This raises the question: are comparisons faster than assignments?
What about differences from language to language? (The contrast between Java and C++, e.g.)
NOTE: I've heard that "premature optimization is the root of all evil." I don't think that applies here :)
This isn't just premature optimization, this is micro-optimization, which is an irrelevant distraction.
Assuming your array is of boolean type then your comparison is unnecessary, which is the only relevant observation.
Well, since you say you're sure that this matters you should just write a test program and measure to find the difference.
Comparison can be faster if this code is executed on multiple variables allocated at scattered addresses in memory. With comparison you will only read data from memory into the processor cache, and if you don't change the variable's value, then when the cache decides to flush the line it will see that the line was not changed and there's no need to write it back to memory. This can speed up execution.
Edit: I wrote a script in PHP. I just noticed that there was a glaring error in it meaning the best-case runtime was being calculated incorrectly (scary that nobody else noticed!)
Best case just beats outright assignment but worst case is a lot worse than plain assignment. Assignment is likely fastest in terms of real-world data.
Output:
assignment in 0.0119960308075 seconds
worst case comparison in 0.0188510417938 seconds
best case comparison in 0.0116770267487 seconds
Code:
<?php
// Reset every element to false. Passed by reference so the caller's array changes.
function reset_arr(&$arr) {
    for ($i = 0; $i < 10000; $i++)
        $arr[$i] = false;
}

$arr = array();

// Plain assignment: write true to every element without looking at it first.
reset_arr($arr);
$start = microtime(true);
for ($i = 0; $i < 10000; $i++)
    $arr[$i] = true;
echo "assignment in " . (microtime(true) - $start) . " seconds<br />";

// Worst case: every element is false, so each comparison is followed by an assignment.
reset_arr($arr);
$start = microtime(true);
for ($i = 0; $i < 10000; $i++)
    if (!$arr[$i])
        $arr[$i] = true;
echo "worst case comparison in " . (microtime(true) - $start) . " seconds<br />";

// Best case: every element is already true, so no assignment happens.
$start = microtime(true);
for ($i = 0; $i < 10000; $i++)
    if (!$arr[$i])
        $arr[$i] = true;
echo "best case comparison in " . (microtime(true) - $start) . " seconds<br />";
I believe that if the comparison and assignment statements are both atomic (i.e. one processor instruction each) and the loop executes n times, then in the worst case comparing and then assigning would require n+1 executions (comparing on every iteration plus setting the assignment), whereas constantly assigning the bool would require n executions. Therefore the second one is more efficient.
Depends on the language. However, looping through arrays can be costly as well. If the array is in consecutive memory, the fastest approach is to write 1 bits (255s) across the entire array with memset, assuming your language/compiler can do this.
That performs 0 reads and 1 write in total, instead of reading and writing the loop variable and array variable (2 reads/2 writes each loop) several hundred times.
I really wouldn't expect any noticeable performance difference for something as trivial as this, so surely it comes down to what gives you clearer, more readable code. In my opinion that would be always assigning true.
Might give this a try:
if(!array[i])
array[i]=true;
But really the only way to know for sure is to profile. I'm sure pretty much any compiler would see the comparison to false as unnecessary and optimize it out.
It all depends on the data type. Assigning booleans is faster than first comparing them. But that may not be true for larger value-based datatypes.
As others have noted, this is micro-optimization.
(In politics or journalism, this is known as navel-gazing ;-)
Is the program large enough to have more than a couple layers of function/method/subroutine calls?
If so, it probably has some avoidable calls, and those can waste hundreds of times as much time as low-level inefficiencies.
On the assumption that you have removed those (which few people do), then by all means run it 10^9 times under a stopwatch, and see which is faster.
Why would you even write the first version? What's the benefit of checking to see if something is false before setting it true? If you are always going to set it true, then always set it true.
When you have a performance bottleneck that you've traced back to setting a single boolean value unnecessarily, come back and talk to us.
I remember that in one book about assembly language the author claimed that if conditions should be avoided where possible.
An if is much slower when the condition is false and execution has to jump to another line, which considerably slows down performance. Also, since programs are executed as machine code, I think 'if' is slower in every (compiled) language, unless its condition is true almost all the time.
If you just want to flip the values, then do:
array[i] = !array[i];
Performance using this is actually worse though: instead of doing a single check for a true/false value and then setting it, it checks twice.
If you declare a 1000000-element array with a true, false, true, false pattern, the comparison is slower. (var b = !b) essentially does a check twice instead of once.

Resources