Scala performance on primes algorithm

I'm quite new to Scala, so in order to start writing some code I've implemented this simple program:
package org.primes.sim

object Primes {
  def is_prime(a: Int): Boolean = {
    val l = Stream.range(3, a, 2) filter { e => a % e == 0 }
    l.size == 0
  }

  def gen_primes(m: Int) =
    2 #:: Stream.from(3, 2) filter { e => is_prime(e) } take m

  def primes(m: Int) = {
    gen_primes(m) foreach println
  }

  def main(args: Array[String]) {
    if (args.size == 0)
      primes(10)
    else
      primes(args(0).toInt)
  }
}
It generates the first n primes starting from 2. Then I implemented the same algorithm in C++11 using Eric Niebler's range-v3 library. This is the code:
#include <iostream>
#include <vector>
#include <string>
#include <range/v3/all.hpp>

using namespace std;
using namespace ranges;

inline bool is_even(unsigned int n) { return n % 2 == 0; }

inline bool is_prime(unsigned int n)
{
    if (n == 2)
        return true;
    else if (n == 1 || is_even(n))
        return false;
    else
        return ranges::any_of(
            view::iota(3, n) | view::remove_if(is_even),
            [n](unsigned int e) { return n % e == 0; }
        ) == false;
}

void primes(unsigned int n)
{
    auto rng = view::ints(2) | view::filter(is_prime);
    ranges::for_each(view::take(rng, n), [](unsigned int e) { cout << e << '\n'; });
}

int main(int argc, char* argv[])
{
    if (argc == 1)
        primes(100);
    else if (argc > 1)
        primes(std::stoi(argv[1]));
}
As you can see, the code looks very similar, but the performance is very different:
For n = 5000, the C++ version completes in 0.265 s whereas the Scala version takes 24.314 s!
So, from this test, Scala seems about 100x slower than C++11.
What is the problem with the Scala code? Could you give me some hints for better usage of scalac?
Note: I've compiled the C++ code using gcc 4.9.2 and -O3 opt.
Thanks

The main speed problem lies with your is_prime implementation.
First of all, you filter a Stream to find all divisors, and only then check that there were none (l.size == 0). But it's faster to return false as soon as the first divisor is found:
def is_prime(a: Int): Boolean =
  Stream.range(3, a, 2).find(a % _ == 0).isEmpty
This decreased runtime from 22 seconds to 5 seconds for primes(5000) on my machine.
The second problem is Stream itself. Scala Streams are slow, and using them for simple number crunching is huge overkill. Replacing Stream with Range decreased runtime further, to 1.2 seconds:
def is_prime(a: Int): Boolean =
  3.until(a, 2).find(a % _ == 0).isEmpty
That's decent: 5x slower than C++. Usually I'd stop here, but it is possible to decrease the running time a bit more by removing the higher-order function find.
While nice-looking and functional, find also induces some overhead. A plain loop (basically replacing find with foreach) further decreased runtime to 0.45 seconds, which is less than 2x slower than C++ (and already on the order of JVM overhead):
def is_prime(a: Int): Boolean = {
  for (e <- 3.until(a, 2)) if (a % e == 0) return false
  true
}
There's another Stream in gen_primes, so doing something with it may improve the run time further, but in my opinion that's not necessary. At that level of performance, I think it would be better to switch to some other algorithm for generating primes: e.g., using only primes, instead of all odd numbers, to look for divisors, or using the Sieve of Eratosthenes.
All in all, functional abstractions in Scala are implemented with actual objects on the heap, which carry overhead that the JIT compiler can't always eliminate. The selling point of C++, by contrast, is zero-cost abstraction: everything that can be is expanded at compile time through templates and constexpr and then aggressively optimized by the compiler.
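The algorithmic suggestion above (test divisibility only by previously found primes, and only up to the square root of the candidate) can be sketched as follows. This is a Python illustration of the idea, not a port of the Scala code above:

```python
def gen_primes(m):
    """Generate the first m primes, dividing each candidate only by
    the primes already found, and only up to sqrt(candidate)."""
    primes = []
    candidate = 2
    while len(primes) < m:
        is_prime = True
        for p in primes:
            if p * p > candidate:
                break              # no divisor <= sqrt(candidate): it's prime
            if candidate % p == 0:
                is_prime = False   # found a divisor
                break
        if is_prime:
            primes.append(candidate)
        candidate += 1 if candidate == 2 else 2  # after 2, test odd numbers only

    return primes

print(gen_primes(10))  # -> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Dividing only by primes up to the square root shrinks the inner loop from O(n) odd numbers to O(sqrt(n) / log n) primes, which matters far more than any Stream-vs-Range difference.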

Related

Why is iterating over a vector of integers slower in Rust than in Python, C# and C++?

I'm learning Rust right now and I'm using this simple Sieve of Eratosthenes implementation:
fn get_primes(known_primes: &Vec<i64>, start: i64, stop: i64) -> Vec<i64> {
    let mut new_primes = Vec::new();
    for number in start..stop {
        let mut is_prime = true;
        let limit = (number as f64).sqrt() as i64;
        for prime in known_primes {
            if number % prime == 0 {
                is_prime = false;
                break;
            }
            if *prime > limit {
                break;
            }
        }
        if is_prime {
            new_primes.push(number);
        }
    }
    return new_primes;
}
I'm comparing it to virtually the same code (modulo syntax) in Python (with numba), C#, and C++ (gcc/clang). All of them are about 3x faster than this implementation on my machine.
I am compiling in release mode. To be exact, I've added this to my Cargo.toml, which seems to have the same effect:
[profile.dev]
opt-level = 3
I've also checked the toolchain, there is a slight (15% or so) difference between MSVC and GNU, but nothing that would explain this gap.
Am I getting something wrong here? Am I making a copy somewhere?
Is this code equivalent to the following C++ code?
vector<int> getPrimes(vector<int> &knownPrimes, int start, int stop) {
    vector<int> newPrimes;
    for (int number = start; number < stop; number += 1) {
        bool isPrime = true;
        int limit = (int)sqrt(number);
        for (auto& prime : knownPrimes) {
            if (number % prime == 0) {
                isPrime = false;
                break;
            }
            if (prime > limit)
                break;
        }
        if (isPrime) {
            newPrimes.push_back(number);
        }
    }
    return newPrimes;
}
The size of a C++ int depends on the target architecture, compiler options, etc. In the Rust code you explicitly use a 64-bit integer (i64), so you may be comparing code that operates on different underlying type sizes; 64-bit division and modulo are noticeably slower than 32-bit on many targets.

Performance of F# Array.reduce

I noticed while doing some F# experiments that if I write my own reduce function for Array, it performs much better than the built-in reduce. For example:
type Array with
    static member inline fastReduce f (values : 'T[]) =
        let mutable result = Unchecked.defaultof<'T>
        for i in 0 .. values.Length-1 do
            result <- f result values.[i]
        result
This seems to behave identically to the built-in Array.reduce but is ~2x faster for a simple f.
Is the built-in one more flexible in some way?
By looking at the generated IL code it's easier to understand what's happening.
Using the built-in Array.reduce:
let reducer (vs : int []) : int = Array.reduce (+) vs
Gives the following equivalent C# (reverse-engineered from the IL code using ILSpy):
public static int reducer(int[] vs)
{
    return ArrayModule.Reduce<int>(new Program.BuiltIn.reducer#31(), vs);
}
Array.reduce looks like this:
public static T Reduce<T>(FSharpFunc<T, FSharpFunc<T, T>> reduction, T[] array)
{
    if (array == null)
    {
        throw new ArgumentNullException("array");
    }
    int num = array.Length;
    if (num == 0)
    {
        throw new ArgumentException(LanguagePrimitives.ErrorStrings.InputArrayEmptyString, "array");
    }
    OptimizedClosures.FSharpFunc<T, T, T> fSharpFunc = OptimizedClosures.FSharpFunc<T, T, T>.Adapt(reduction);
    T t = array[0];
    int num2 = 1;
    int num3 = num - 1;
    if (num3 >= num2)
    {
        do
        {
            t = fSharpFunc.Invoke(t, array[num2]);
            num2++;
        }
        while (num2 != num3 + 1);
    }
    return t;
}
Notice that invoking the reducer function f is a virtual call, which the JIT:er typically struggles to inline.
Compare to your fastReduce function:
let reducer (vs : int []) : int = Array.fastReduce (+) vs
The reverse-engineered C# code:
public static int reducer(int[] vs)
{
    int num = 0;
    for (int i = 0; i < vs.Length; i++)
    {
        num += vs[i];
    }
    return num;
}
A lot more efficient, as the virtual call is now gone. It seems that in this case F# inlines both the code for fastReduce and (+).
There's some kind of cut-off in F#: more complex reducer functions won't be inlined. I am unsure of the exact details.
Hope this helps
A side note: Unchecked.defaultof returns null for class types in .NET, such as string. I prefer LanguagePrimitives.GenericZero.
PS. A common trick for the truly performance-hungry is to loop down towards 0. In F# that doesn't pay off for for-expressions because of a slight performance bug in how for-expressions are generated. In those cases you can try implementing the loop using tail recursion.
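The same effect shows up in other managed languages: a built-in higher-order reduce pays a per-element call through a function object that a hand-written loop avoids. A rough Python analogue of the fastReduce-vs-Array.reduce comparison (illustrative only; Python's call overhead is interpreter dispatch, not .NET virtual dispatch):

```python
from functools import reduce
import operator

def fast_reduce(f, values):
    """Manual reduce over a non-empty sequence: same result as
    functools.reduce, but with the loop structure written out,
    mirroring the F# fastReduce above."""
    it = iter(values)
    result = next(it)        # seed with the first element
    for v in it:
        result = f(result, v)
    return result

data = list(range(1, 101))
# Both give the same answer; timing them (e.g. with timeit) shows the
# per-call overhead of the higher-order path.
assert fast_reduce(operator.add, data) == reduce(operator.add, data) == 5050
```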

Optimized way to find if a number is a perfect square

I had a question in my assignment: find whether a number is a perfect square or not.
Perfect square is an element of algebraic structure which is equal to
the square of another element.
For example: 4, 9, 16 etc.
What my friends did is: if n is the number, they looped up to n - 1 times, calculating i * i:
// just a general gist
int is_square = 0;
for (int i = 2; i < n; i++)
{
    if ((i * i) == n)
    {
        std::cout << "Yes, it is";
        is_square = 1;
        break;
    }
}
if (is_square == 0)
{
    std::cout << "No, it is not";
}
I came up with a solution as shown below:
if (ceil(sqrt(n)) == floor(sqrt(n)))
{
    std::cout << "Yes, it is";
}
else
{
    std::cout << "No, it is not";
}
And it works properly.
Can mine be called a more optimized solution than theirs?
The tried and true remains:
double sqrt(double x); // from the math library

bool is_sqr(long n) {
    long root = sqrt(n);
    return root * root == n;
}
You would need to know the complexity of the sqrt(x) function on your computer to compare it against other methods. By the way, you are calling sqrt(n) twice; consider storing it in a variable (even if the compiler probably does this for you).
If using something like Newton's method, the complexity of sqrt(x) is in O(M(d)) where M(d) measures the time required to multiply two d-digits numbers. Wikipedia
Your friend's method does n - 2 multiplications in the worst case, so its complexity is like O(n * M(x)), where x is a growing number.
Your version only uses sqrt() (ceil and floor can be ignored because of their constant complexity), which makes it O(M(n)).
O(M(n)) < O(n * M(x)): your version is more optimized than your friend's, but it is still not the most efficient. Have a look at interjay's link for a better approach.
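In languages that provide an exact integer square root, the floating-point concerns discussed in this thread disappear entirely. A sketch in Python using math.isqrt (available since Python 3.8), which computes floor(sqrt(n)) in pure integer arithmetic:

```python
from math import isqrt

def is_perfect_square(n: int) -> bool:
    """Exact check: isqrt never touches floating point, so there is
    no +/- epsilon rounding to guard against, even for huge n."""
    if n < 0:
        return False
    r = isqrt(n)
    return r * r == n

assert is_perfect_square(2025)       # 45 * 45
assert not is_perfect_square(2026)
```

The multiply-back check (`r * r == n`) is the same idea as the `is_sqr` answer above, but with the root computed exactly rather than via `double sqrt`.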
#include <iostream>
using namespace std;

void isPerfect(int n) {
    int ctr = 0, i = 1;
    while (n > 0) {
        n -= i;
        i += 2;
        ctr++;
    }
    if (!n) cout << "\nSquare root = " << ctr;
    else    cout << "\nNot a perfect square";
}

int main() {
    isPerfect(3);
    isPerfect(2025);
    return 0;
}
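The subtraction loop above relies on the identity 1 + 3 + 5 + ... + (2k-1) = k^2: subtracting successive odd numbers lands exactly on zero if and only if the number is a perfect square. The same idea in Python (a hypothetical helper, not from the original post), returning the root on success:

```python
def odd_subtraction_root(n):
    """Subtract successive odd numbers from n; n is a perfect square
    exactly when the subtraction hits zero, and the count of
    subtractions is then the square root."""
    i, count = 1, 0
    while n > 0:
        n -= i
        i += 2
        count += 1
    return count if n == 0 else None  # None means: not a perfect square

assert odd_subtraction_root(2025) == 45
assert odd_subtraction_root(3) is None
```

Note this takes O(sqrt(n)) iterations, so it is slower than a sqrt-based check, but it uses only addition and subtraction.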
I don't know what limitations you have, but the definition of a perfect square number is clear:
Another way of saying that a (non-negative) number is a square number,
is that its square roots are again integers
in wikipedia
IF SQRT(n) == FLOOR(SQRT(n)) THEN
WRITE "Yes it is"
ELSE
WRITE "No it isn't"
examples: sqrt(9) == floor(sqrt(9)), but sqrt(10) != floor(sqrt(10))
I'll recommend this one
if (sqrt(Number) == (double)(unsigned long long)sqrt(Number)) // sqrt() from math library
{
    // Code goes here
}
It works because... the cast to unsigned long long truncates the fractional part,
so the comparison holds only when the square root is a whole number. Hence the solution.
You can also run a benchmark test, like I did with the following code in MSVC 2012:
#include <iostream>
#include <conio.h>
#include <time.h>
#include <math.h>
using namespace std;

void IsPerfect(unsigned long long Number);

void main()
{
    clock_t Initial, Final;
    unsigned long long Counter = 0;
    unsigned long long Start = Counter;
    Start--;
    cout << Start << endl;
    Initial = clock();
    for (Counter = 0; Counter <= 100000000; Counter++)
    {
        IsPerfect(Counter);
    }
    Final = clock();
    float Time((float)Final - (float)Initial);
    cout << "Calculations Done in " << Time;
    getch();
}

void IsPerfect(unsigned long long Number)
{
    if (ceil(sqrt(Number)) == floor(sqrt(Number)))
    //if(((unsigned long long)sqrt(Number))%1==0) // sqrt() from math library
    {
    }
}
Your code took 13590 time units; mine just 10049.
Moreover, I'm using a few extra steps, namely the type conversion:
(unsigned long long)sqrt(Number)
Without this it could do still better.
I hope it helps.
Have a nice day.
Your solution is more optimized, but it may not work. Since sqrt(x) may return the true square root +/- epsilon, 3 candidate roots must be tested:
bool isPerfect(long x) {
    double k = round(sqrt(x));
    return (x == (k - 1) * (k - 1)) || (x == k * k) || (x == (k + 1) * (k + 1));
}
This is simple Python code for finding whether a number is a perfect square or not:
import math

n = int(input())
giv = str(math.sqrt(n))
if len(giv.split('.')[1]) == 1:
    print("yes")
else:
    print("No")

Get the last 1000 digits of 5^1234566789893943

I saw the following interview question on some online forum. What is a good solution for this?
Get the last 1000 digits of 5^1234566789893943
Simple algorithm:
1. Maintain a 1000-digits array which will have the answer at the end
2. Implement a multiplication routine like you do in school. It is O(d^2).
3. Use modular exponentiation by squaring.
Iterative exponentiation:
array ans = 1;
int a = 5;
while (p > 0) {
    if (p & 1) {
        ans = multiply(ans, a);
    }
    a = multiply(a, a);
    p = p >> 1;
}
multiply: multiplies two large numbers using the school method and returns the last 1000 digits.
Time complexity: O(d^2 * log p), where d is the number of last digits needed and p is the power.
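This scheme can be rendered directly in Python, using its built-in big integers in place of the digit array; reducing mod 10^k after every multiply plays the role of the school-method multiply that discards all but the last k digits:

```python
def last_digits_of_power(base, exp, k):
    """Exponentiation by squaring, reducing mod 10**k after every
    multiplication so intermediate results never exceed k digits."""
    mod = 10 ** k
    ans = 1
    a = base % mod
    while exp > 0:
        if exp & 1:            # current bit of the exponent is set
            ans = (ans * a) % mod
        a = (a * a) % mod      # square the base for the next bit
        exp >>= 1
    return ans

# Matches Python's built-in three-argument pow:
assert last_digits_of_power(5, 1234566789893943, 1000) == pow(5, 1234566789893943, 10**1000)
```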
A typical solution for this problem would be to use modular arithmetic and exponentiation by squaring to compute the remainder of 5^1234566789893943 when divided by 10^1000. In your case this would take about 1000 * log(1234566789893943) operations, which is not too much, but I will propose a more general approach that works for even greater values of the exponent.
You will have to use a bit more complicated number theory. You can use Euler's theorem to get the remainder of 5^1234566789893943 modulo 2^1000 a lot more efficiently. Denote that remainder r. It is also obvious that 5^1234566789893943 is divisible by 5^1000.
After that you need to find a number d such that 5^1000 * d = r (mod 2^1000). To solve this equation you should compute 5^1000 (mod 2^1000). After that all that is left is division modulo 2^1000. Using Euler's theorem again, this can be done efficiently, using that x^(phi(2^1000) - 1) * x = 1 (mod 2^1000). This approach is way faster for much larger exponents.
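This number-theoretic route can be sketched concretely in Python. We want x = 5^e mod 10^1000; since 10^1000 = 2^1000 * 5^1000 and e >= 1000, x is divisible by 5^1000, and modulo 2^1000 the exponent can be reduced mod phi(2^1000) = 2^999 (a no-op for this particular e, but decisive for astronomically larger ones). The modular inverse is written with Python's pow(a, -1, m), available since 3.8:

```python
e = 1234566789893943
m2, m5 = 2**1000, 5**1000

# Residue mod 2**1000, exponent reduced via Euler's theorem:
# phi(2**1000) = 2**999, and gcd(5, 2**1000) = 1.
r = pow(5, e % 2**999, m2)

# 5**e = 5**1000 * d for some d; solve 5**1000 * d = r (mod 2**1000)
# by multiplying by the modular inverse of 5**1000 mod 2**1000.
d = (r * pow(m5 % m2, -1, m2)) % m2

# Reconstruct the residue mod 10**1000 (Chinese Remainder Theorem:
# x = 0 mod 5**1000 and x = r mod 2**1000).
x = (m5 * d) % (m2 * m5)
assert x == pow(5, e, 10**1000)
```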
The key phrase is "modular exponentiation". Python has that built in:
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> help(pow)
Help on built-in function pow in module builtins:
pow(...)
pow(x, y[, z]) -> number
With two arguments, equivalent to x**y. With three arguments,
equivalent to (x**y) % z, but may be more efficient (e.g. for ints).
>>> digits = pow(5, 1234566789893943, 10**1000)
>>> len(str(digits))
1000
>>> digits
4750414775792952522204114184342722049638880929773624902773914715850189808476532716372371599198399541490535712666678457047950561228398126854813955228082149950029586996237166535637925022587538404245894713557782868186911348163750456080173694616157985752707395420982029720018418176528050046735160132510039430638924070731480858515227638960577060664844432475135181968277088315958312427313480771984874517274455070808286089278055166204573155093723933924226458522505574738359787477768274598805619392248788499020057331479403377350096157635924457653815121544961705226996087472416473967901157340721436252325091988301798899201640961322478421979046764449146045325215261829432737214561242087559734390139448919027470137649372264607375942527202021229200886927993079738795532281264345533044058574930108964976191133834748071751521214092905298139886778347051165211279789776682686753139533912795298973229094197221087871530034608077419911440782714084922725088980350599242632517985214513078773279630695469677448272705078125
>>>
The technique we need to know is exponentiation by squaring and modulus. We also need to use BigInteger in Java.
Simple code in Java:
BigInteger m = BigInteger.TEN.pow(1000); // BigInteger of 10^1000

BigInteger pow(BigInteger a, long b) {
    if (b == 0) {
        return BigInteger.ONE;
    }
    BigInteger val = pow(a, b / 2);
    if (b % 2 == 0)
        return (val.multiply(val)).mod(m);
    else
        return (val.multiply(val).multiply(a)).mod(m);
}
In Java, the function modPow does it all for you (thanks, Java).
Use congruence and apply modular arithmetic.
Square and multiply algorithm.
If you divide any number in base 10 by 10, the remainder is the last digit, i.e. 23422222 = 2342222 * 10 + 2.
So we know:
5 = 5 (mod 10)
5^2 = 25 = 5 (mod 10)
5^4 = (5^2) * (5^2) = 5 * 5 = 5 (mod 10)
5^8 = (5^4) * (5^4) = 5 * 5 = 5 (mod 10)
... and keep going until you get to that exponent
OR, you can realize that as we keep going you keep getting 5 as your remainder.
Convert the number to a string.
Loop on the string, starting at the last index up to 1000.
Then reverse the result string.
I posted a solution based on some hints here.
#include <vector>
#include <iostream>
using namespace std;

vector<char> multiplyArrays(const vector<char> &data1, const vector<char> &data2, int k) {
    int sz1 = data1.size();
    int sz2 = data2.size();
    vector<char> result(sz1 + sz2, 0);
    for (int i = sz1 - 1; i >= 0; --i) {
        char carry = 0;
        for (int j = sz2 - 1; j >= 0; --j) {
            char value = data1[i] * data2[j] + result[i + j + 1] + carry;
            carry = value / 10;
            result[i + j + 1] = value % 10;
        }
        result[i] = carry;
    }
    if (sz1 + sz2 > k) {
        vector<char> lastKElements(result.begin() + (sz1 + sz2 - k), result.end());
        return lastKElements;
    }
    else
        return result;
}

vector<char> calculate(unsigned long m, unsigned long n, int k) {
    if (n == 0) {
        return vector<char>(1, 1);
    } else if (n % 2) { // odd exponent
        vector<char> tmp(1, m);
        vector<char> result1 = calculate(m, n - 1, k);
        return multiplyArrays(result1, tmp, k);
    } else {
        vector<char> result1 = calculate(m, n / 2, k);
        return multiplyArrays(result1, result1, k);
    }
}

int main(int argc, char const *argv[]) {
    vector<char> v = calculate(5, 8, 1000);
    for (auto c : v) {
        cout << static_cast<unsigned>(c);
    }
}
I don't know if Windows can show such a big number (or if my computer is fast enough to compute it), but I guess you COULD use code like this as an algorithm:
ulong x = 5; // There are a lot of libraries for languages like C/C++ that support super big numbers. Here I'm using C#'s default `UInt64`, which will overflow long before this power is reached.
for (ulong i = 1; i < 1234566789893943; i++)
{
    x = x * 5; // multiply by 5 each time to raise the power (x * x would square, overshooting the exponent)
}
string term = x.ToString(); // Store the number as a string. An integer string has no decimal point, so the digits start at index 0.
char[] number = term.ToCharArray(); // Array of all the digits
string thousandDigits = ""; // Here I will store the digits.
for (int i = 0; i < 1000 && i < number.Length; i++)
{
    thousandDigits += number[i]; // Storing digits
}
Using this as a reference, I guess if you want the LAST 1000 characters of this array, change the loop in the above code to this:
string thousandDigits = "";
for (int i = 0; i < 1000 && i < number.Length; i++)
{
    thousandDigits += number[number.Length - 1 - i]; // walk the array backwards from the last digit
}
As I don't work with super super looooong numbers, I don't know if my computer can handle those. I tried the code and it works, but when I try to show the result in the console it just leaves the cursor flickering xD Guess it's still working. Don't have a pro processor. Try it if you want :P

Is there some way to speed up recursion by remembering child nodes?

For example,
Look at the code that calculates the n-th Fibonacci number:
int fib(int n)
{
    if (n == 0 || n == 1)
        return 1;
    return fib(n - 1) + fib(n - 2);
}
The problem with this code is that it will generate stack overflow error for any number greater than 15 (in most computers).
Assume that we are calculating fib(10). In this process, say fib(5) is calculated a lot of times. Is there some way to store this in memory for fast retrieval and thereby increase the speed of recursion?
I am looking for a generic technique that can be used in almost all problems.
Yes, your insight is correct.
This is called dynamic programming. It is usually a common memory-for-runtime trade-off.
In the case of fib, you don't even need to cache everything:
[edit]
The author of the question seems to be looking for a general method to cache rather than a method to compute Fibonacci. Search Wikipedia or look at the code of the other posters to get this answer. Those answers are linear in time and memory.
Here is a linear-time algorithm, O(n), constant in memory:
in OCaml:
let fibo n =
  let rec aux = function
    | 0 -> (1, 1)
    | n -> let (cur, prec) = aux (n - 1) in (cur + prec, cur)
  in
  let (_, prec) = aux n in prec;;
in C++:
int fibo(int n) {
    if (n == 0) return 1;
    if (n == 1) return 1;
    int p = 1; // fibo(0)
    int c = 1; // fibo(1)
    int buff = 0;
    for (int i = 1; i < n; ++i) {
        buff = c;
        c = p + c;
        p = buff;
    }
    return c;
}
This performs in linear time. But logarithmic time is actually possible!
Roo's program is linear too, but way slower, and uses memory.
Now for the O(log n) algorithm (way, way faster), here is the method:
If you know u(n) and u(n-1), computing u(n+1) and u(n) can be done by applying a matrix:
| u(n+1) |   | 1 1 |   | u(n)   |
| u(n)   | = | 1 0 | * | u(n-1) |
So that you have:
| u(n)   |   | 1 1 |^(n-1)   | u(1) |   | 1 1 |^(n-1)   | 1 |
| u(n-1) | = | 1 0 |       * | u(0) | = | 1 0 |       * | 1 |
Computing the exponential of the matrix has a logarithmic complexity.
Just implement the idea recursively:
M^0      = Id
M^(2p+1) = (M^(2p)) * M
M^(2p)   = (M^p) * (M^p)   // of course, don't compute M^p twice here
You can also just diagonalize it (not too difficult); you will find the golden ratio and its conjugate among its eigenvalues, and the result will give you an EXACT mathematical formula for u(n). It contains powers of those eigenvalues, so the complexity will still be logarithmic.
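The matrix recurrence above can be implemented directly; here is a compact Python sketch, indexed so that fib(0) = fib(1) = 1, matching the other code in this thread:

```python
def mat_mul(A, B):
    """2x2 integer matrix product."""
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

def fib(n):
    """fib via powers of [[1,1],[1,0]], computed by squaring:
    O(log n) matrix multiplications."""
    result = [[1, 0], [0, 1]]            # identity = M^0
    M = [[1, 1], [1, 0]]
    while n > 0:
        if n & 1:
            result = mat_mul(result, M)  # M^(2p+1) = M^(2p) * M
        M = mat_mul(M, M)                # square for the next bit
        n >>= 1
    # M^n = [[F(n+1), F(n)], [F(n), F(n-1)]] in the standard indexing,
    # so result[0][0] gives the thread's fib(n) with fib(0) = fib(1) = 1.
    return result[0][0]

assert [fib(i) for i in range(7)] == [1, 1, 2, 3, 5, 8, 13]
```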
Fibo is often taken as an example to illustrate Dynamic Programming, but as you see, it is not really pertinent.
#John:
I don't think it has anything to do with hashes.
#John2:
A map is a bit general, don't you think? For the Fibonacci case, all the keys are contiguous, so a vector is appropriate; once again, there are much faster ways to compute the fibo sequence, see my code sample over there.
This is called memoization, and there is a very good article about memoization that Matthew Podwysocki posted recently. It uses Fibonacci to exemplify it, and shows the code in C# as well. Read it here.
If you're using C#, and can use PostSharp, here's a simple memoization aspect for your code:
[Serializable]
public class MemoizeAttribute : PostSharp.Laos.OnMethodBoundaryAspect, IEqualityComparer<Object[]>
{
    private Dictionary<Object[], Object> _Cache;

    public MemoizeAttribute()
    {
        _Cache = new Dictionary<object[], object>(this);
    }

    public override void OnEntry(PostSharp.Laos.MethodExecutionEventArgs eventArgs)
    {
        Object[] arguments = eventArgs.GetReadOnlyArgumentArray();
        if (_Cache.ContainsKey(arguments))
        {
            eventArgs.ReturnValue = _Cache[arguments];
            eventArgs.FlowBehavior = FlowBehavior.Return;
        }
    }

    public override void OnExit(MethodExecutionEventArgs eventArgs)
    {
        if (eventArgs.Exception != null)
            return;
        _Cache[eventArgs.GetReadOnlyArgumentArray()] = eventArgs.ReturnValue;
    }

    #region IEqualityComparer<object[]> Members

    public bool Equals(object[] x, object[] y)
    {
        if (Object.ReferenceEquals(x, y))
            return true;
        if (x == null || y == null)
            return false;
        if (x.Length != y.Length)
            return false;
        for (Int32 index = 0, len = x.Length; index < len; index++)
            if (Comparer.Default.Compare(x[index], y[index]) != 0)
                return false;
        return true;
    }

    public int GetHashCode(object[] obj)
    {
        Int32 hash = 23;
        foreach (Object o in obj)
        {
            hash *= 37;
            if (o != null)
                hash += o.GetHashCode();
        }
        return hash;
    }

    #endregion
}
Here's a sample Fibonacci implementation using it:
[Memoize]
private Int32 Fibonacci(Int32 n)
{
    if (n <= 1)
        return 1;
    else
        return Fibonacci(n - 2) + Fibonacci(n - 1);
}
Quick and dirty memoization in C++:
Any recursive method type1 foo(type2 bar) { ... } is easily memoized with map<type2, type1> M.
// your original method
int fib(int n)
{
    if (n == 0 || n == 1)
        return 1;
    return fib(n - 1) + fib(n - 2);
}

// with memoization
map<int, int> M = map<int, int>();

int fib(int n)
{
    if (n == 0 || n == 1)
        return 1;
    // only compute the value for fib(n) if we haven't before
    if (M.count(n) == 0)
        M[n] = fib(n - 1) + fib(n - 2);
    return M[n];
}
EDIT: #Konrad Rudolph
Konrad points out that std::map is not the fastest data structure we could use here. That's true: a vector<something> should be faster than a map<int, something> (though it might require more memory if the inputs to the recursive calls were not consecutive integers like they are in this case), but maps are convenient to use in general.
According to Wikipedia, Fib(0) should be 0, but it does not matter here.
Here is a simple C# solution with a for loop:
ulong Fib(int n)
{
    ulong fib = 1;  // value of fib(i)
    ulong fib1 = 1; // value of fib(i-1)
    ulong fib2 = 0; // value of fib(i-2)
    for (int i = 0; i < n; i++)
    {
        fib = fib1 + fib2;
        fib2 = fib1;
        fib1 = fib;
    }
    return fib;
}
It is a pretty common trick to convert recursion to tail recursion and then to a loop. For more detail see, for example, this lecture (ppt).
What language is this? It doesn't overflow anything in C...
Also, you can try creating a lookup table on the heap, or use a map.
Caching is generally a good idea for this kind of thing. Since Fibonacci numbers are constant, you can cache the result once you have calculated it. A quick C-like pseudocode example:
class fibstorage {
    bool has-result(int n) { return fibresults.contains(n); }
    int get-result(int n) { return fibresults.find(n).value; }
    void add-result(int n, int v) { fibresults.add(n, v); }
    map<int, int> fibresults;
}

fib(int n) {
    if (n == 0 || n == 1)
        return 1;
    if (fibstorage.has-result(n)) {
        return fibstorage.get-result(n);
    }
    return ( (fibstorage.has-result(n-1) ? fibstorage.get-result(n-1) : fib(n-1) ) +
             (fibstorage.has-result(n-2) ? fibstorage.get-result(n-2) : fib(n-2) ) );
}

calcfib(n) {
    v = fib(n);
    fibstorage.add-result(n, v);
}
This would be quite slow, as every recursion results in 3 lookups; however, this should illustrate the general idea.
Is this a deliberately chosen example (e.g. an extreme case you're wanting to test)?
As it's currently O(1.6^n), I just want to make sure you're looking for answers on handling the general case of this problem (caching values, etc.) and not just accidentally writing poor code :D
Looking at this specific case you could have something along the lines of:
var cache = [];
function fib(n) {
    if (n < 2) return 1;
    if (cache.length > n) return cache[n];
    var result = fib(n - 2) + fib(n - 1);
    cache[n] = result;
    return result;
}
Which degenerates to O(n) in the worst case :D
[Edit: * does not equal + :D ]
[Yet another edit: the Haskell version (because i'm a masochist or something)
fibs = 1:1:(zipWith (+) fibs (tail fibs))
fib n = fibs !! n
]
Try using a map, n is the key and its corresponding Fibonacci number is the value.
#Paul
Thanks for the info. I didn't know that. From the Wikipedia link you mentioned:
This technique of saving values that have already been calculated is called memoization
Yeah I already looked at the code (+1). :)
#ESRogs:
std::map lookup is O(log n) which makes it slow here. Better use a vector.
vector<unsigned int> fib_cache;
fib_cache.push_back(1);
fib_cache.push_back(1);

unsigned int fib(unsigned int n) {
    if (fib_cache.size() <= n)
        fib_cache.push_back(fib(n - 1) + fib(n - 2));
    return fib_cache[n];
}
Others have answered your question well and accurately - you're looking for memoization.
Programming languages with tail-call optimization (mostly functional languages) can do certain cases of recursion without stack overflow. It doesn't directly apply to your definition of Fibonacci, though there are tricks.
The phrasing of your question made me think of an interesting idea: avoiding stack overflow of a pure recursive function by storing only a subset of the stack frames and rebuilding them when necessary. Only really useful in a few cases: if your algorithm only conditionally relies on the context as opposed to the return value, and/or you're optimizing for memory rather than speed.
Mathematica has a particularly slick way to do memoization, relying on the fact that hashes and function calls use the same syntax:
fib[0] = 1;
fib[1] = 1;
fib[n_] := fib[n] = fib[n-1] + fib[n-2]
That's it. It caches (memoizes) fib[0] and fib[1] off the bat and caches the rest as needed. The rules for pattern-matching function calls are such that it always uses a more specific definition before a more general definition.
One more excellent resource for C# programmers on recursion, partial application, currying, memoization, and their ilk is Wes Dyer's blog, though he hasn't posted in a while. He explains memoization well, with solid code examples, here:
http://blogs.msdn.com/wesdyer/archive/2007/01/26/function-memoization.aspx
The problem with this code is that it will generate stack overflow error for any number greater than 15 (in most computers).
Really? What computer are you using? It's taking a long time at 44, but the stack is not overflowing. In fact, you're going to get a value bigger than an integer can hold (~4 billion unsigned, ~2 billion signed) before the stack overflows (Fibonacci(46)).
This would work for what you want to do, though (runs wicked fast):
class Program
{
    public static readonly Dictionary<int, int> Items = new Dictionary<int, int>();

    static void Main(string[] args)
    {
        Console.WriteLine(Fibbonacci(46).ToString());
        Console.ReadLine();
    }

    public static int Fibbonacci(int number)
    {
        if (number == 1 || number == 0)
        {
            return 1;
        }

        var minus2 = number - 2;
        var minus1 = number - 1;

        if (!Items.ContainsKey(minus2))
        {
            Items.Add(minus2, Fibbonacci(minus2));
        }
        if (!Items.ContainsKey(minus1))
        {
            Items.Add(minus1, Fibbonacci(minus1));
        }

        return (Items[minus2] + Items[minus1]);
    }
}
If you're using a language with first-class functions like Scheme, you can add memoization without changing the initial algorithm:
(define (memoize fn)
  (letrec ((get (lambda (query) '(#f)))
           (set (lambda (query value)
                  (let ((old-get get))
                    (set! get (lambda (q)
                                (if (equal? q query)
                                    (cons #t value)
                                    (old-get q))))))))
    (lambda args
      (let ((val (get args)))
        (if (car val)
            (cdr val)
            (let ((ret (apply fn args)))
              (set args ret)
              ret))))))

(define fib (memoize (lambda (x)
                       (if (< x 2) x
                           (+ (fib (- x 1)) (fib (- x 2)))))))
The first block provides a memoization facility and the second block is the fibonacci sequence using that facility. This now has an O(n) runtime (as opposed to O(2^n) for the algorithm without memoization).
Note: the memoization facility provided uses a chain of closures to look for previous invocations. At worst case this can be O(n). In this case, however, the desired values are always at the top of the chain, ensuring O(1) lookup.
As other posters have indicated, memoization is a standard way to trade memory for speed, here is some pseudo code to implement memoization for any function (provided the function has no side effects):
Initial function code:
function (parameters)
    body (with recursive calls to calculate result)
    return result
This should be transformed to
function (parameters)
    key = serialized parameters to string
    if (cache[key] does not exist) {
        body (with recursive calls to calculate result)
        cache[key] = result
    }
    return cache[key]
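The pseudocode transformation above is exactly what a memoizing decorator does in Python (the standard library even ships one as functools.lru_cache); a minimal hand-rolled version for comparison:

```python
import functools

def memoize(fn):
    """Wrap fn so results are cached keyed by the argument tuple,
    as in the pseudocode: run the body only on a cache miss."""
    cache = {}
    @functools.wraps(fn)
    def wrapper(*args):
        if args not in cache:          # cache[key] does not exist
            cache[args] = fn(*args)    # body (with recursive calls)
        return cache[args]
    return wrapper

@memoize
def fib(n):
    if n == 0 or n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

assert fib(40) == 165580141
```

Because the recursive calls go through the wrapper, each fib(k) is computed once, turning the O(2^n) recursion into O(n). Note this simple version only handles positional, hashable arguments, which is why the pseudocode serializes the parameters into a key.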
By the way Perl has a memoize module that does this for any function in your code that you specify.
# Compute Fibonacci numbers
sub fib {
    my $n = shift;
    return $n if $n < 2;
    fib($n - 1) + fib($n - 2);
}
In order to memoize this function all you do is start your program with
use Memoize;
memoize('fib');
# Rest of the fib function just like the original version.
# Now fib is automagically much faster ;-)
#lassevk:
This is awesome; exactly what I had been thinking about after reading about memoization in Higher Order Perl. Two things which I think would be useful additions:
An optional parameter to specify a static or member method that is used for generating the key to the cache.
An optional way to change the cache object so that you could use a disk or database backed cache.
Not sure how to do this sort of thing with attributes (or if it is even possible with this sort of implementation), but I plan to try to figure it out.
(Off topic: I was trying to post this as a comment, but I didn't realize that comments have such a short allowed length so this doesn't really fit as an 'answer')
