I'm trying to implement UTF-8 decoding in OCaml as a learning project. To check the performance, I'm benchmarking against the Go standard library.
This is the Go code:
package main

import (
    "fmt"
    "time"
    "unicode/utf8"
)

func main() {
    start := time.Now()
    for i := 0; i < 1000000000; i++ {
        utf8.ValidRune(23450)
    }
    elapsed := time.Since(start)
    fmt.Println(elapsed)
}
When I run it, I get:
go build b.go
./b
344.979492ms
I decided to write an equivalent in OCaml:
let min = 0x0000
let max = 0x10FFFF
let surrogateMin = 0xD800
let surrogateMax = 0xDFFF

let validUchar c =
  if (0 <= c && c < surrogateMin) then
    true
  else if (surrogateMax < c && c <= max) then
    true
  else
    false
let time f x =
  let t = Sys.time () in
  let _ = f x in
  let t2 = Sys.time () in
  let diff = (t2 -. t) *. 1000. in
  print_endline ((string_of_float diff) ^ "ms")

let test () =
  for i = 0 to 1000000000 do
    let _ = validUchar 23450 in
    ()
  done

let () = time test ()
Output:
ocamlopt bMl.ml -o bMl
./bMl
2041.075ms
The OCaml equivalent basically copies the implementation of the Go stdlib from https://golang.org/src/unicode/utf8/utf8.go#L517
Why is the OCaml code so much slower?
As observed, you should be using Unix.gettimeofday to measure wall-clock time. You can also use Sys.opaque_identity to prevent OCaml from optimizing useless operations away, and ignore to 'return unit' instead of the usual value of an expression. Altogether:
let time f x =
  let t = Unix.gettimeofday () in
  ignore (Sys.opaque_identity (f x));
  let t2 = Unix.gettimeofday () in
  ...

let test () =
  for i = 1 to 1_000_000_000 do
    ignore (Sys.opaque_identity (validUchar 23450));
  done
Note the i = 1, which you want if you want exactly one billion iterations (a figure I couldn't tell was one billion before adding the underscores, which OCaml allows). Previously you were measuring one billion and one iterations. Not that that was the difference.
Your verbose definition of validUchar doesn't help its performance either; write a microbenchmark and confirm that for yourself.
Finally, after making the changes suggested above and writing validUchar in a more natural manner, I get an OCaml runtime that's identical to the Go runtime ... after adding -O3 to the ocamlopt arguments. And it's easy to confirm that this isn't because the compiler 'optimized the operations away': commenting out the f x call in time results in runtimes of 0 or near-0 values like 1.19e-06.
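For reference, here is a minimal sketch of what a more natural definition might look like, reusing the constants already defined in the question:
(* Same check as validUchar, written as a single boolean expression *)
let validUchar c =
  (0 <= c && c < surrogateMin) || (surrogateMax < c && c <= max)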
Don't be discouraged by the responses you got to this question, but do expect that any "why does this benchmark give this result?" question on a programming forum will be answered similarly.
Sys.time shouldn't be used for time measurements, as it returns processor time, not real (wall-clock) time. The Unix.gettimeofday function is a much better candidate. Alternatively, you can time your program from the shell using the time command.
As a side note, benchmarking is hard, and it is very easy to get misleading results. In your particular case, if you turn on optimizations, both compilers will remove the computations, since their results are not used, and will produce code that does nothing and is thus rather fast :)
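One practical note if you do switch to Unix.gettimeofday: it lives in the unix library, which has to be linked explicitly. With findlib installed, something along these lines should work (the exact invocation depends on your OCaml installation):
ocamlfind ocamlopt -package unix -linkpkg bMl.ml -o bMl
./bMl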
Related
I am experimenting a bit with Julia, since I've heard that it is suitable for scientific computing and its syntax is reminiscent of Python. I tried to write and run a program that counts the prime numbers below a certain n, but the performance is not what I hoped for.
Here is my code, with the disclaimer that I literally started Julia programming yesterday and I am almost sure that something is wrong:
n = 250000
counter = 0

function countPrime(counter)
    for i = 1:n
        # print("begin counter= ", counter, "\n")
        isPrime = true
        # print("i= ", i, "\n")
        for j = 2:(i-1)
            if (i%j) == 0
                isPrime = false
                # print("j= ", j, "\n")
                break
            end
        end
        (isPrime==true) ? counter += 1 : counter
        # print("Counter= ", counter, "\n")
    end
    return counter
end

println(countPrime(counter))
The thing is that the same program ported to C runs in about 5 seconds, while this one in Julia takes about 3 minutes and 50 seconds, which sounds odd to me since I thought Julia is a compiled language. What's happening?
Here is how I would change it:
function countPrime(n)
    counter = 0
    for i in 1:n
        isPrime = true
        for j in 2:i-1
            if i % j == 0
                isPrime = false
                break
            end
        end
        isPrime && (counter += 1)
    end
    return counter
end
This code runs in about 5 seconds on my laptop. Apart from stylistic changes, the major change is that you should pass n as a parameter to your function and define the counter variable inside your function.
The changes follow one of the first pieces of advice in the Performance Tips section of the Julia Manual.
The point is that when you use a global variable, the Julia compiler is not able to make assumptions about the type of this variable (it might change after the function was compiled), so it defensively assumes that it might be anything, which slows things down.
As for stylistic changes note that (isPrime==true) ? counter += 1 : counter can be written just as isPrime && (counter += 1) as you want to increment the counter if isPrime is true. Using the ternary operator ? : is not needed here.
To give an MWE of the problem with using global variables in functions:
julia> x = 10
10

julia> f() = x
f (generic function with 1 method)

julia> @code_warntype f()
MethodInstance for f()
  from f() in Main at REPL[2]:1
Arguments
  #self#::Core.Const(f)
Body::Any
1 ─ return Main.x
You can see that inside the function f you refer to the global variable x. Therefore, when Julia compiles f it must assume that the value of x can have any type (which is called Any in Julia). Working with such values is slow, as the compiler cannot use any optimizations that would take advantage of a more specific type of the value being processed.
How efficient are Fortran's (90+) intrinsic (math) functions? I especially care about tanh and sech but am interested in the other Fortran intrinsic functions as well.
By "how efficient" I mean that if it is very hard to come up with a faster method then the intrinsics are efficient but if it is very easy to come up with a faster method then the intrinsics are inefficient.
Here is an MWE, in which my change to try to make it faster actually made it slower, suggesting the intrinsics are efficient.
program main
    implicit none
    integer, parameter :: n = 10000000
    integer :: i
    real :: x, var
    real :: t1, t2, t3, t4

    !! Intrinsic first
    call cpu_time(t1)
    do i = 1, n
        x = REAL(i)/300.0
        var = tanh(x)
    end do
    call cpu_time(t2)
    write(*,*) "Elapsed CPU Time = ", t2 - t1
    write(*,*) var

    !! Intrinsic w/ small change
    call cpu_time(t3)
    do i = 1, n
        x = REAL(i)/300.0
        if (x > 10.0) then
            var = 1.0
        else
            var = tanh(x)
        end if
    end do
    call cpu_time(t4)
    write(*,*) "Elapsed CPU Time = ", t4 - t3
    write(*,*) var
end program main
Note that Fortran 90 seems to be lazy; if I don't include the write(*,*) var statements, it says elapsed CPU time = 0.0.
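As an aside on the x > 10 cutoff in the modified loop: it is safe in single precision, because tanh saturates very quickly,

\tanh(x) = 1 - \frac{2}{e^{2x} + 1} \approx 1 - 2e^{-2x},

so for x > 10 the difference from 1 is below 10^{-8}, smaller than single-precision machine epsilon (about 1.2 \times 10^{-7}).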
The following function finds a number n such that 1^3 + 2^3 + ... + (n-1)^3 + n^3 = m. Is there any chance this function can be optimized for speed?
findNb :: Integer -> Integer
findNb m = findNb' 1 0
  where
    findNb' n m' =
      if m' == m then n - 1
      else if m' < m then findNb' (n + 1) (m' + n^3)
      else -1
I know there is a faster solution by using a math formula.
The reason I'm asking is that a similar implementation in JavaScript / C# seems far faster than in Haskell. I'm just curious if it can be optimized. Thanks.
EDIT 1: Adding more evidence on the run time.
Haskell Version:
With main = print (findNb2 152000000000000000000000):
Compile with -O2 and profiling: ghc -o testo2.exe -O2 -prof -fprof-auto -rtsopts pileofcube.hs. Here is total time from profiling report:
total time = 0.19 secs (190 milliseconds) (190 ticks @ 1000 us, 1 processor)
Compile with -O2 but no profiling: ghc -o testo22.exe -O2 pileofcube.hs. Run it with Measure-Command {./testo22.exe} in powershell. The result is:
Milliseconds : 157
JavaScript Version:
Code:
function findNb(m) {
    let n = 0;
    let sum = 0;
    while (sum < m) {
        n++;
        sum += Math.pow(n, 3);
    }
    return sum === m ? n : -1;
}
var d1 = new Date();
findNb(152000000000000000000000);
console.log(new Date() - d1);
Result: 45 milliseconds running in Chrome on the same machine
EDIT 2: Adding a C# version
As @Berji and @Bakuriu commented, comparing against the JavaScript version above is not fair, as it uses double-precision floating-point numbers under the hood and could not even give the correct answer. So I implemented it in C#; here are the code and result:
static void Main(string[] args)
{
    BigInteger m = BigInteger.Parse("152000000000000000000000");
    var s = new Stopwatch();
    s.Start();
    long n = 0;
    BigInteger sum = 0;
    while (sum < m)
    {
        n++;
        sum += BigInteger.Pow(n, 3);
    }
    Console.WriteLine(sum == m ? n : -1);
    s.Stop();
    Console.WriteLine($"Escaped Time: {s.ElapsedMilliseconds} milliseconds.");
}
Result: Escaped Time: 457 milliseconds.
Conclusion
The Haskell version is faster than the C# one...
I was wrong at the start because I didn't realize JavaScript uses double-precision floating-point numbers under the hood, due to my poor JavaScript knowledge.
At this point it seems the question doesn't make sense anymore...
Haskell too can use Double to get the wrong answer in less time:
% time ./so
./so 0.03s user 0.00s system 95% cpu 0.038 total
And JavaScript too can get the correct result by npm-installing big-integer and using bigInt everywhere instead of Double:
% node so.js
^C
node so.js 35.62s user 0.30s system 93% cpu 38.259 total
... or maybe it isn't as trivial as that.
EDIT: I realized afterward that this is not what the author of the question wanted. I'll keep it here in case someone wants to know the formula in question, but otherwise please disregard.
There is indeed a formula that lets you compute this in constant time (rather than n iterations). Since I couldn't remember the exact formula from school, I did a bit of searching, and here it is: https://proofwiki.org/wiki/Sum_of_Sequence_of_Cubes.
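For reference, the identity is

\sum_{k=1}^{n} k^3 = \left( \frac{n(n+1)}{2} \right)^2 = \frac{n^2 (n+1)^2}{4}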
In Haskell code, that would translate to
findNb n = n ^ 2 * (n + 1) ^ 2 `div` 4
which I believe should be much faster.
Not sure if this wording of that algorithm is faster, but try this?
import Data.List (genericLength)  -- genericLength returns the count as an Integer rather than an Int

findNb :: Integer -> Integer
findNb m = genericLength $ takeWhile (<= m) $ scanl1 (+) [n ^ 3 | n <- [1 ..]]
(This has different semantics in the undefined case, though.)
I'm playing Codewars to sharpen my Haskell skills, and I'm running into a problem that I haven't had in imperative languages.
Let's say I'm writing a function foo() in JavaScript which takes an int, adds two, squares it, subtracts one, and returns the square root of that number.
var foo = function(n) {
    n += 2;
    n = n * n;
    n -= 1;
    n = Math.sqrt(n);
    return n;
}
I want to check on the state of the data being processed in the function at various points to help me troubleshoot/revise/debug code, so I will insert console.log() statements whenever I want to see where I'm at. For example, am I, in fact, squaring the sum of n+2 correctly halfway through the function? Let's see...
var foo = function(n) {
    n += 2;
    n = n * n;
    console.log("n = " + n);
    n -= 1;
    n = Math.sqrt(n);
    return n;
}
While this example should be simple enough for a Haskeller to write in one line, if you have a complex function and want to check the state at different points, how do Haskellers do it? Is there a standard practice using the IO() monad? Do they get around it some other way?
GHCi has a fancy debugger that lets you step through your code and evaluate it line by line, checking its state and intermediary results.
But, of course, there is also the printf-style debugging that you are asking for, using the trace function from Debug.Trace. Nothing wrong with using that for small scripts imho, but it's generally discouraged.
trace has the type String -> a -> a, so you pass a string that gets printed (via unsafePerformIO) and any argument that gets simply returned.
In your example we could use it as follows. This is your function translated to Haskell:
foo x = sqrt $ (x+2)**2 - 1
Now we can just add trace and the string we want to see, e.g. "Debug Me: " ++ (show ((x+2)**2)). First import Debug.Trace, though:
import Debug.Trace
foo x = sqrt $ (trace ("Debug Me: " ++ (show ((x+2)**2))) ((x+2)**2)) - 1
A bit ugly? Following David Young's comment below, it is better to use traceShowId :: Show a => a -> a if what we want to output is identical to the intermediary result (converted to a String, of course):
import Debug.Trace
foo x = sqrt $ (traceShowId ((x+2)**2)) - 1
See here for a summary of debugging options.
So, I want to know whether making the code easier to read slows down performance in Matlab.
function V = example(t, I)
    a = 10;
    b = 20;
    c = 0.5;
    V = zeros(1, length(t));
    V(1) = 0;
    delta_t = t(2) - t(1);
    for i = 1:length(t)-1
        V(i+1) = V(i) + delta_t*feval(@V_prime, a, b, c, t(i));
    end;
So, this function is just an example of an Euler method. The idea is that I name the constants a, b, c and define a function for the derivative. This basically makes the code easier to read. What I want to know is whether declaring a, b, c slows down my code. Also, for a performance improvement, would it be better to put the expression for the derivative (V_prime) directly into the equation instead of calling a function?
Following this mindset, the code would look something like this:
function V = example(t, I)
    V = zeros(1, length(t));
    V(1) = 0;
    delta_t = t(2) - t(1);
    for i = 1:length(t)-1
        V(i+1) = V(i) + delta_t*(((10 + t(i)*3)/20) + 0.5);
    end;
Also, from what I've read, Matlab performs better when the code is vectorized; would that be the case for my code?
EDIT:
So, here is my actual code that I am working on:
function [V, u] = Izhikevich_CA1_Imp(t, I_amp, t_inj)
    vr = -61.8;    % resting potential (mV)
    vt = -57.0;    % threshold potential (mV)
    c = -65.8;     % reset membrane potential (mV)
    vpeak = 22.6;  % membrane voltage cutoff
    khigh = 3.3;   % nS/mV
    klow = 0.1;    % nS/mV
    C = 115;       % Membrane capacitance (pA)
    a = 0.0012;    % 1/ms
    b = 3;         % nS
    d = 10;        % pA

    V = zeros(1, length(t));
    V(1) = vr; u = 0;  % initial values
    span = length(t)-1;
    delta_t = t(2) - t(1);

    for i = 1:span
        if (V(i) <= vt)
            k = klow;
        else
            k = khigh;
        end;
        if ((t(i) >= t_inj(1)) && (t(i) <= t_inj(2)))
            I_inj = I_amp;
        else
            I_inj = 0;
        end;
        V(i+1) = V(i) + delta_t*((k*(V(i)-vr)*(V(i)-vt)-u(i)+I_inj)/C);
        u(i+1) = u(i) + delta_t*(a*(b*(V(i)-vr)-u(i)));
        if (V(i+1) >= vpeak)
            V(i+1) = c;
            V(i) = vpeak;
            u(i+1) = u(i+1) + d;
        end;
    end;

    plot(t,V);
Since I didn't have any formal training in Matlab (I learned by trial and error), I have a C mindset of programming, and from what I understand, Matlab code should be vectorized.
Eventually I will start working with bigger functions, so performance will be a concern. For now my goal is to vectorize this code.
Usually it is faster.
Especially if you replace looped function calls (like plot()), you will see a significant increase in performance.
In one of my past projects, I had to optimize a program that was written using regular programming constructs (for, while, etc.). Using vectorization, I reached a 10-times increase in performance, which is quite notable.
I would suggest using vectorization instead of loops most of the time.
In Matlab you should basically forget the mindset that comes from low-level C programming.
In my experience, the first rule for achieving performance in Matlab is to avoid loops and use built-in vectorized functions as much as possible. In general, you should try to avoid direct access to array elements like array(i).
Implementing your own ODE solver inevitably leads to very slow execution, because in that case there is really no way to avoid the things mentioned above, even if your implementation is fine per se (as in your case). I strongly advise relying on Matlab's ODE solvers, which are highly optimized blocks of compiled code and much faster than any interpreted Matlab code you can write.
In my opinion this goes along with readability as well, at least for the trivial reason that you get shorter code... but I guess it is also a matter of personal taste.