SCIP infeasibility detection with a MINLP


I'm using SCIPAMPL to solve mixed integer nonlinear programming problems (MINLPs). For the most part it's been working well, but I found an instance where the solver detects infeasibility erroneously.
set K default {};
var x integer >= 0;
var y integer >= 0;
var z;
var v1{K} binary;
param yk{K} integer default 0;
param M := 300;
param eps := 0.5;
minimize upperobjf:
16*x^2 + 9*y^2;
subject to
ll1: 4*x + y <= 50;
ul1: -4*x + y <= 0;
vf1{k in K}: z + eps <= (x + yk[k] - 20)^4 + M*(1 - v1[k]);
vf2: z >= (x + y - 20)^4;
aux1{k in K}: -(4*x + yk[k] - 50) <= M*v1[k] - eps;
# fix1: x = 4;
# fix2: y = 12;
let K := {1,2,3,4,5,6,7,8,9,10,11};
for {k in K} let yk[k] := k - 1;
solve;
display x,y,z,v1;
The solver detects infeasibility in the presolve phase. However, if you uncomment the two constraints that fix x and y to 4 and 12, the solver works and outputs the correct v1 and z values.
I'm curious about why this might be happening and whether I can formulate the problem in a different way to avoid it. One suggestion I got was that infeasibility detection is usually not very good with non-convex problems.
Edit: I should mention that this isn't just a SCIP issue; SCIP just hits it with this particular set K. For instance, with Bonmin, another MINLP solver, I can solve the problem for this K, but if K is expanded to go up to 15, Bonmin reports infeasibility even though the problem remains feasible. For that K, I have yet to find a solver that works. I've also tried MINLP solvers based on FILTER. I have yet to try BARON, since it only takes GAMS input.
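As an independent sanity check that the instance really is feasible, the constraints can be evaluated by hand at x = 4, y = 12. Working through aux1 shows that every v1[k] must be 1 (because 34 - yk[k] > -0.5), after which vf1 and vf2 only require 256 <= z <= 1295.5, so z = 256 is one admissible choice (worked out from the constraints, not taken from the solver output). The sketch below is a plain Python restatement of the AMPL model; the function names and the tolerance are hypothetical choices, and this is not SCIP's own feasibility check.

```python
# Plain-Python restatement of the AMPL model, only for checking a candidate point.
K = range(1, 12)                 # K := {1, ..., 11}
YK = {k: k - 1 for k in K}       # yk[k] := k - 1
M, EPS = 300, 0.5


def objective(x, y):
    return 16 * x ** 2 + 9 * y ** 2


def violations(x, y, z, v1, tol=1e-6):
    """Return {constraint name: amount} for each constraint violated by more than tol.

    Every constraint is written as lhs <= rhs; v1 maps k -> 0 or 1.
    """
    bad = {}

    def check(name, lhs, rhs):
        if lhs - rhs > tol:
            bad[name] = lhs - rhs

    check("ll1", 4 * x + y, 50)
    check("ul1", -4 * x + y, 0)
    check("vf2", (x + y - 20) ** 4, z)            # z >= (x + y - 20)^4
    for k in K:
        check(f"vf1[{k}]", z + EPS, (x + YK[k] - 20) ** 4 + M * (1 - v1[k]))
        check(f"aux1[{k}]", -(4 * x + YK[k] - 50), M * v1[k] - EPS)
    return bad


if __name__ == "__main__":
    ones = {k: 1 for k in K}
    print(violations(4, 12, 256, ones))   # {} -> no constraint violated
    print(objective(4, 12))               # 1552
```

Setting any v1[k] to 0 at this point makes aux1[k] fail, which is a quick way to see how tightly the big-M constraints pin down the binaries.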

There are very good remarks about modeling issues regarding, e.g., big-M constraints in the comments to your original question. Numerical issues can indeed cause troubles, especially when nonlinear constraints are present.
Depending on how deep you would like to dive into that matter, I see 3 options for you:
You can loosen the numerical tolerances by tuning the parameters numerics/epsilon, numerics/sumepsilon, numerics/feastol, and numerics/lpfeastol. Save the following lines in a file "scip.set" in the working directory from which you call scipampl:
# absolute values smaller than this are considered zero
# [type: real, range: [1e-20,0.001], default: 1e-09]
numerics/epsilon = 1e-07
# absolute values of sums smaller than this are considered zero
# [type: real, range: [1e-17,0.001], default: 1e-06]
numerics/sumepsilon = 1e-05
# feasibility tolerance for constraints
# [type: real, range: [1e-17,0.001], default: 1e-06]
numerics/feastol = 1e-05
# primal feasibility tolerance of LP solver
# [type: real, range: [1e-17,0.001], default: 1e-06]
numerics/lpfeastol = 1e-05
You can now test different numerical tolerances within scipampl by modifying the file scip.set.
Save the solution you obtain by fixing your x and y variables. If you pass this solution to the model without the fixings, you get a message about what caused the infeasibility. Usually, the message says that some variable bound or constraint is violated slightly beyond the tolerance.
If you want to know precisely which presolver makes a solution infeasible, or if the previous approach does not show any violation, SCIP can read in a debug solution. Specify the solution file "debug.sol" by uncommenting the following line in src/scip/debug.h:
/* #define SCIP_DEBUG_SOLUTION "debug.sol" */
and recompile SCIP and SCIPAmpl by using
make DBG=true
SCIP checks the debug-solution against every presolving reduction and outputs the presolver which causes the trouble.
I hope this is useful for you.

Looking deeper into this instance, SCIP seems to do something wrong in presolve.
In cons_nonlinear.c:7816 (function consPresolNonlinear), remove the line
if( nrounds == 0 )
so that SCIPexprgraphPropagateVarBounds is executed in any case.
That seems to fix the issue.

Related

Understanding a FastICA implementation

I'm trying to implement FastICA (independent component analysis) for blind signal separation of images, but first I thought I'd take a look at some examples from GitHub that produce good results. I'm trying to compare the main loop of one of them with the algorithm's steps in Wikipedia's FastICA article, and I'm having quite a bit of difficulty seeing how they're actually the same.
They look very similar, but there are a few differences that I don't understand. It looks like this implementation is similar to (or the same as) the "Multiple component extraction" version on Wikipedia.
Would someone please help me understand what's going on in the four or so lines having to do with the nonlinearity function with its first and second derivatives, and the first line of updating the weight vector? Any help is greatly appreciated!
Here's the implementation with the variables changed to mirror the Wiki more closely:
% X is sized (NxM, 3x50K) mixed image data matrix (one row for each mixed image)
C=3; % number of components to separate
W=zeros(numofIC,VariableNum); % weights matrix
for p=1:C
% initialize random weight vector of length N
wp = rand(C,1);
wp = wp / norm(wp);
% like do:
i = 1;
maxIterations = 100;
while i <= maxIterations+1
% until mat iterations
if i == maxIterations
fprintf('No convergence: ', p,maxIterations);
break;
end
wp_old = wp;
% this is the main part of the algorithm and where
% I'm confused about the particular implementation
u = 1;
t = X'*b;
g = t.^3;
dg = 3*t.^2;
wp = ((1-u)*t'*g*wp+u*X*g)/M-mean(dg)*wp;
% 2nd and 3rd wp update steps make sense to me
wp = wp-W*W'*wp;
wp = wp / norm(wp);
% or until w_p converges
if abs(abs(b'*bOld)-1)<1e-10
W(:,p)=b;
break;
end
i=i+1;
end
end
And the Wiki algorithms for quick reference:

First, I don't understand why the term that is always zero remains in the code:
wp = ((1-u)*t'*g*wp+u*X*g)/M-mean(dg)*wp;
The above can be simplified into:
wp = X*g/M-mean(dg)*wp;
This also removes u, since it is always 1.
Second, I believe the following line is wrong:
t = X'*b;
The correct expression is:
t = X'*wp;
Now let's go through each variable. Let's refer to
$w \leftarrow E\{X\,g(w^T X)^T\} - E\{g'(w^T X)\}\,w$
as the iteration equation.
X is your input data, i.e. $X$ in the iteration equation.
wp is the weight vector, i.e. $w$ in the iteration equation. Its initial value is randomised.
g is the first derivative of a nonquadratic nonlinear function, evaluated at every sample, i.e. $g(w^T X)$ in the iteration equation.
dg is the first derivative of g, i.e. $g'(w^T X)$ in the iteration equation.
M is not defined in the code you provide, but I think it should be the number of samples, i.e. the number of columns of X.
Knowing what each variable means, we can now walk through the code.
t = X'*wp;
The above line computes $w^T X$ (as a column vector, one entry per sample).
g = t.^3;
The above line computes $g(w^T X) = (w^T X)^3$. Note that g(u) can be any function as long as f(u), where g(u) = df(u)/du, is nonlinear and nonquadratic.
dg = 3*t.^2;
The above line computes the derivative of g.
wp = X*g/M-mean(dg)*wp;
X*g computes $X\,g(w^T X)^T$, and dividing by M averages it over the samples, which gives $E\{X\,g(w^T X)^T\}$.
mean(dg) is $E\{g'(w^T X)\}$, which is then multiplied by wp (w in the equation).
That is everything needed for the Newton-Raphson step.
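To make the walk-through concrete, here is a minimal pure-Python sketch of the same deflation loop, using g(u) = u^3 and assuming X is already centred and whitened (the snippet in the question doesn't show that preprocessing). The names fastica_update and fastica_deflation are hypothetical; in MATLAB or NumPy you would vectorize the sums instead of looping.

```python
import math
import random


def fastica_update(X, w, W_prev):
    """One fixed-point step for a single unit, with g(u) = u**3.

    X      -- N rows x M samples (assumed centred and whitened)
    w      -- current unit weight vector, length N
    W_prev -- weight vectors already found (used for deflation)
    """
    N, M = len(X), len(X[0])
    t = [sum(w[i] * X[i][j] for i in range(N)) for j in range(M)]   # t = w^T X
    g = [tj ** 3 for tj in t]                                        # g(w^T X)
    mean_dg = sum(3 * tj ** 2 for tj in t) / M                       # E{g'(w^T X)}
    # w+ = E{X g(w^T X)} - E{g'(w^T X)} w, i.e. X*g/M - mean(dg)*wp
    w_new = [sum(X[i][j] * g[j] for j in range(M)) / M - mean_dg * w[i]
             for i in range(N)]
    # Deflation, like wp - W*W'*wp: remove components along earlier vectors
    for v in W_prev:
        proj = sum(a * b for a, b in zip(w_new, v))
        w_new = [a - proj * b for a, b in zip(w_new, v)]
    norm = math.sqrt(sum(a * a for a in w_new))
    return [a / norm for a in w_new]                                 # wp / norm(wp)


def fastica_deflation(X, C, max_iter=200, tol=1e-10, seed=0):
    """Estimate C unit weight vectors one after another."""
    rng = random.Random(seed)
    W = []
    for _ in range(C):
        w = [rng.random() for _ in range(len(X))]
        n = math.sqrt(sum(a * a for a in w))
        w = [a / n for a in w]
        for _ in range(max_iter):
            w_old = w
            w = fastica_update(X, w, W)
            # converged when w and w_old are parallel or anti-parallel
            if abs(abs(sum(a * b for a, b in zip(w, w_old))) - 1) < tol:
                break
        W.append(w)
    return W
```

The returned list plays the role of the columns W(:,p) in the question's code; the entries are unit vectors, each orthogonal to the ones found before it.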

Matlab - Least Squares data fitting - Cost function with extra constraint

I am currently working on some MATLAB code to fit experimental data to a sum of exponentials following a method described in this paper.
According to the paper, the data has to follow the following equation (written in pseudo-code):
y = sum(v(i)*exp(-x/tau(i)),i=1..n)
Here tau(i) is a set of n predefined constants. The number of constants also determines the size of the summation, and hence the size of v. For example, we can try to fit a sum of 100 exponentials, each with a different tau(i) to our data. However, due to the nature of the fitting and the exponential sum, we need to add another constraint to the problem, and hence to the cost function of the least-squares method used.
Normally, the cost function of the least-squares method is given by (summed over all data points):
(y_data - sum(v(i)*exp(-x/tau(i)),i=1..n))^2
And this has to be minimized. However, to prevent over-fitting that would make the time-constant spectrum extremely noisy, the paper adds the following constraint to the cost function:
|v(i) - v(i+1)|^2
Because of this extra constraint, as far as I know, the regular algorithms like lsqcurvefit aren't usable any longer, and I have to use fminsearch to search for the minimum of my least-squares cost function with a constraint. The function that has to be minimized, according to me, is the following:
(y_data - sum(v(i)*exp(-x/tau(i)),i=1..n))^2 + sum(|v(j) - v(j+1)|^2,j=1..n-1)
My attempt to code this in MATLAB is the following. First we define the function in a function file, then we use fminsearch to actually minimize the function and get values for v.
function res = funcost( v )
%FUNCOST Definition of the function that has to be minimised
%We define a function yvalues with 2 exponentials with known time-constants
% so we know the result that should be given by minimising.
xvalues = linspace(0,50,10000);
yvalues = 3-2*exp(-xvalues/1)-exp(-xvalues/10);
%Definition of 30 equidistant point in the logarithmic scale
terms = 30;
termsvector = [1:terms];
tau = termsvector;
for i = 1:terms
tau(i) = 10^(-1+3/terms*i);
end
%Definition of the regular function
res_1 = 3;
for i=1:terms
res_1 =res_1+ v(i).*exp(-xvalues./tau(i));
end
res_1 = res_1-yvalues;
%Added constraint
k=1;
res_2=0;
for i=1:terms-1
res_2 = res_2 + (v(i)-v(i+1))^2;
end
res=sum(res_1.*res_1) + k*res_2;
end
fminsearch(@funcost,zeros(30,1),optimset('MaxFunEvals',1000000,'MaxIter',1000000))
However, this code is giving me inaccurate results (no error, just inaccurate results), which leads me to believe I either made a mistake in the coding or in the interpretation of the added constraint for the least-squares method.

I would try to introduce the additional constraint in the following way:
res_2 = max((v(1:(end-1))-v(2:end)).^2);
i.e. instead of minimizing an integrated (summed-up) error, it does minimax.
You may also make this constraint stiff by
if res_2 > some_number
k = a_very_big_number;
else
k=0; % or k = a_small_number
end;
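An alternative worth knowing about (not part of the answer above): because both the data term and the smoothness term are quadratic in v, the cost is still an ordinary, Tikhonov-regularized linear least-squares problem, so a general search like fminsearch isn't needed. Setting the gradient to zero gives the normal equations (A^T A + k D^T D) v = A^T y, where column i of A is exp(-x/tau(i)) and D is the first-difference matrix; in MATLAB the same fit can be written as the stacked system [A; sqrt(k)*D] \ [y; zeros(n-1,1)]. Any fixed offset in the model (the 3 in funcost) should be subtracted from y first. A minimal pure-Python sketch, with hypothetical function names and a small Gaussian-elimination solver so it runs without NumPy:

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                          # back substitution
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_smooth_spectrum(x, y, tau, k):
    """Minimise sum_j (y_j - sum_i v_i exp(-x_j/tau_i))^2 + k * sum_i (v_i - v_{i+1})^2."""
    n = len(tau)
    A = [[math.exp(-xj / ti) for ti in tau] for xj in x]    # basis functions
    # Normal matrix A^T A ...
    N = [[sum(row[p] * row[q] for row in A) for q in range(n)] for p in range(n)]
    # ... plus k * D^T D, where each row of D is (v_i - v_{i+1})
    for i in range(n - 1):
        N[i][i] += k
        N[i + 1][i + 1] += k
        N[i][i + 1] -= k
        N[i + 1][i] -= k
    rhs = [sum(row[p] * yj for row, yj in zip(A, y)) for p in range(n)]   # A^T y
    return solve_linear(N, rhs)


if __name__ == "__main__":
    xs = [0.1 * j for j in range(501)]                       # 0 .. 50
    ys = [2 * math.exp(-t) + math.exp(-t / 10) for t in xs]
    print(fit_smooth_spectrum(xs, ys, [1.0, 10.0], 0.0))   # approximately [2.0, 1.0]
```

With k = 0 this is the plain least-squares fit; increasing k pulls neighbouring v values together, which is exactly the smoothing the extra term is meant to enforce.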

How is the Random module tested in OCaml?

OCaml has a Random module, and I am wondering how it is tested for randomness. I don't have a clue what exactly they are doing. I understand it runs a chi-square test plus two dependency tests. Here is the code for the testing part:
chi-square test
(* Return the sum of the squares of v[i0,i1[ *)
let rec sumsq v i0 i1 =
if i0 >= i1 then 0.0
else if i1 = i0 + 1 then Pervasives.float v.(i0) *. Pervasives.float v.(i0)
else sumsq v i0 ((i0+i1)/2) +. sumsq v ((i0+i1)/2) i1
;;
let chisquare g n r =
if n <= 10 * r then invalid_arg "chisquare";
let f = Array.make r 0 in
for i = 1 to n do
let t = g r in
f.(t) <- f.(t) + 1
done;
let t = sumsq f 0 r
and r = Pervasives.float r
and n = Pervasives.float n in
let sr = 2.0 *. sqrt r in
(r -. sr, (r *. t /. n) -. n, r +. sr)
;;
Q1: Why do they write the sum of squares like that?
It seems to just sum up all the squares. Why not write it like this:
let rec sumsq v i0 i1 =
if i0 >= i1 then 0.0
else Pervasives.float v.(i0) *. Pervasives.float v.(i0) +. (sumsq v (i0+1) i1)
Q2: Why do they seem to use a different formula for the chi-square statistic?
From the Wikipedia article on the chi-squared test, the formula is $\chi^2 = \sum_{i=1}^{r} \frac{(O_i - E_i)^2}{E_i}$.
But they seem to be using a different formula. What is going on behind the scenes?
The other two dependency tests
(* This is to test for linear dependencies between successive random numbers.
*)
let st = ref 0;;
let init_diff r = st := int r;;
let diff r =
let x1 = !st
and x2 = int r
in
st := x2;
if x1 >= x2 then
x1 - x2
else
r + x1 - x2
;;
let st1 = ref 0
and st2 = ref 0
;;
(* This is to test for quadratic dependencies between successive random
numbers.
*)
let init_diff2 r = st1 := int r; st2 := int r;;
let diff2 r =
let x1 = !st1
and x2 = !st2
and x3 = int r
in
st1 := x2;
st2 := x3;
(x3 - x2 - x2 + x1 + 2*r) mod r
;;
Q3: I don't know these two tests; can someone enlighten me?

Q1:
It's a question of memory usage. You will notice that for large arrays, your implementation of sumsq will fail with "Stack overflow during evaluation" (on my laptop, it fails for r = 200000). This is because before adding Pervasives.float v.(i0) *. Pervasives.float v.(i0) to (sumsq v (i0+1) i1), you have to compute the latter. So it's not until you have computed the result of the last call of sumsq that you can start "going up the stack" and adding everything. Clearly, sumsq is going to be called r times in your case, so you will have to keep track of r calls.
By contrast, with their approach they only have to keep track of about log(r) calls, because once sumsq has been computed for half the array, you only need the result of the corresponding call (you can forget about all the other calls that you had to make to compute it).
However, there are other ways of achieving this result and I'm not sure why they chose this one (maybe somebody will be able to tell?). If you want to know more about the problems linked to recursion and memory, you should probably check the Wikipedia article on tail recursion. If you want to know more about the technique they used here, check the Wikipedia article on divide-and-conquer algorithms; be careful though, because here we are talking about memory, and the Wikipedia article will probably talk a lot about time complexity (speed).
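The difference in stack depth can be made visible with a small Python sketch that mirrors both recursions and also reports the deepest chain of nested calls. These are illustrative ports, not OCaml code; Python also limits recursion depth (the default limit is 1000 frames), so the naive version fails for large arrays for the same reason.

```python
def sumsq_linear(v, i0, i1):
    """Naive version: one nested call per element, so depth grows like n."""
    if i0 >= i1:
        return 0.0, 1
    rest, depth = sumsq_linear(v, i0 + 1, i1)
    return float(v[i0]) ** 2 + rest, depth + 1


def sumsq_halves(v, i0, i1):
    """Divide-and-conquer version as in OCaml's sumsq: depth grows like log2(n)."""
    if i0 >= i1:
        return 0.0, 1
    if i1 == i0 + 1:
        return float(v[i0]) ** 2, 1
    mid = (i0 + i1) // 2
    left, d1 = sumsq_halves(v, i0, mid)
    right, d2 = sumsq_halves(v, mid, i1)
    return left + right, max(d1, d2) + 1


if __name__ == "__main__":
    v = list(range(500))
    print(sumsq_linear(v, 0, len(v)))   # (41541750.0, 501)
    print(sumsq_halves(v, 0, len(v)))   # (41541750.0, 10)
```

Both return the same sum, but the naive version needs 501 nested frames for 500 elements while the halving version needs 10.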
Q2:
You should look more closely at both expressions. Here, all the E_i's are equal to n/r. If you substitute this into the expression you gave, you will find the same expression that they use: (r *. t /. n) -. n. I didn't check the values of the bounds, but since you have a chi-squared distribution with r - 1 (or r - 2) degrees of freedom, and r is quite large, it's not surprising to see them use this kind of confidence interval. The Wikipedia article you mentioned should help you figure out fairly easily exactly which confidence interval they use.
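Spelled out, with $t = \sum_i O_i^2$ (what sumsq computes over the counts f) and $\sum_i O_i = n$:

$$
\chi^2 = \sum_{i=1}^{r} \frac{(O_i - n/r)^2}{n/r}
       = \frac{r}{n}\sum_{i=1}^{r}\Bigl(O_i^2 - \frac{2n}{r}\,O_i + \frac{n^2}{r^2}\Bigr)
       = \frac{r\,t}{n} - 2n + n
       = \frac{r\,t}{n} - n
$$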
Good luck!
Edit: Oops, I forgot about Q3. I don't know these tests either, but I'm sure you should be able to find more about them by googling something like "linear dependency between consecutive numbers" or something. =)
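For Q3, here is a rough Python port of the three functions, assuming (this is not shown in the snippet) that diff and diff2 are meant to be passed to chisquare in place of g, since they have the same shape: take r and return a value in [0, r). If successive outputs were linearly or quadratically related, the first or second differences mod r would pile up in a few bins and the statistic would land outside its interval. Python's random.Random stands in for OCaml's Random.int, and the helper names bin_counts and pearson are hypothetical.

```python
import random


def bin_counts(g, n, r):
    """Draw n values from g(r) and count how often each of 0..r-1 occurs."""
    f = [0] * r
    for _ in range(n):
        f[g(r)] += 1
    return f


def chisquare(g, n, r):
    """Port of OCaml's chisquare: returns (low bound, statistic, high bound)."""
    if n <= 10 * r:
        raise ValueError("chisquare")
    f = bin_counts(g, n, r)
    t = sum(c * c for c in f)                 # sumsq f 0 r
    sr = 2.0 * r ** 0.5
    return r - sr, r * t / n - n, r + sr


def pearson(f, n):
    """Textbook statistic sum (O_i - E_i)^2 / E_i with E_i = n / r."""
    e = n / len(f)
    return sum((o - e) ** 2 / e for o in f)


rng = random.Random(42)      # stand-in for OCaml's Random
st = [0]                     # state for diff
st1, st2 = [0], [0]          # state for diff2


def init_diff(r):
    st[0] = rng.randrange(r)


def diff(r):
    """First difference of successive draws, mod r (linear dependencies)."""
    x1, x2 = st[0], rng.randrange(r)
    st[0] = x2
    return x1 - x2 if x1 >= x2 else r + x1 - x2


def init_diff2(r):
    st1[0] = rng.randrange(r)
    st2[0] = rng.randrange(r)


def diff2(r):
    """Second difference x3 - 2*x2 + x1, mod r (quadratic dependencies)."""
    x1, x2, x3 = st1[0], st2[0], rng.randrange(r)
    st1[0], st2[0] = x2, x3
    return (x3 - x2 - x2 + x1 + 2 * r) % r
```

For example, init_diff(100) followed by chisquare(diff, 100_000, 100) runs the linear-dependency test; the middle value of the returned triple is the statistic to compare against the two bounds.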
Edit 2: In reply to Jackson Tale's June 29 question about the confidence interval:
They should indeed test it against the Chi-squared distribution -- or, rather, use the Chi-squared distribution to find a confidence interval. However, because of the central limit theorem, the Chi-squared distribution with k degrees of freedom converges to a normal law with mean k and variance 2k. A classical result is that the 95% confidence interval for the normal law is approximately [μ - 1.96 σ, μ + 1.96 σ], where μ is the mean and σ the standard deviation -- so that's roughly the mean ± twice the standard deviation. Here, the number of degrees of freedom is (I think) r - 1 ~ r (because r is large) so that's why I said I wasn't surprised by a confidence interval of the form [r - 2 sqrt(r), r + 2 sqrt(r)]. Nevertheless, now that I think about it I can't see why they don't use ± 2 sqrt(2 r)... But I might have missed something. And anyway, even if I was correct, since sqrt(2) > 1, they get a more stringent confidence interval, so I guess that's not really a problem. But they should document what they're doing a bit more... I mean, the tests that they're using are probably pretty standard so most likely most people reading their code will know what they're doing, but still...
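In numbers, using the normal approximation with about r degrees of freedom (so $\sigma = \sqrt{2r}$):

$$
\chi^2_r \approx \mathcal{N}(r,\,2r), \qquad
r \pm 1.96\sqrt{2r} \approx r \pm 2\sqrt{2r}\ \text{(95\%)}, \qquad
r \pm 2\sqrt{r} = r \pm \sqrt{2}\,\sigma\ \text{(about 84\%)}
$$

So, under that approximation, the interval in the OCaml code is narrower than a 95% interval: a perfectly good generator would land outside it roughly 16% of the time, consistent with the remark that it is more stringent.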
Also, you should note that, as is often the case, this kind of test is not conclusive. Generally, you want to show that something has some kind of effect. So you formulate two hypotheses: the null hypothesis, "there is no effect", and the alternative hypothesis, "there is an effect". Then you show that, if the null hypothesis held, the data you observed would be very unlikely. So you conclude that the alternative hypothesis is (most likely) true, i.e. that there is some kind of effect. This is conclusive. Here, what we would like to show is that the random number generator is good. So we don't want to show that the numbers it produces differ from some law, but that they conform to it. The only way to do that is to perform as many tests as possible showing that the numbers produced have the same properties as truly random ones. But the only conclusion we can draw is "we were not able to find a difference between the actual data and what we would have observed, had they really been randomly generated". This is not a lack of rigor from the OCaml developers: people always do that (e.g., a lot of tests require, say, normality. So before performing these tests, you try to find a test which would show that your variable is not normally distributed. And when you can't find any, you say "Oh well, the normality of this variable is probably sufficient for my subsequent tests to hold"), simply because there is no other way to do it...
Anyway, I'm no statistician and the considerations above are simply my two cents, so you should be careful. For instance, I'm sure there is a better reason why they're using this particular confidence interval. I also think you should be able to figure it out if you write everything down carefully to make sure about what they're doing exactly.

Finding the best scale/shift between two vectors

I have two vectors: one represents a function f(x), and the other f(ax+b), i.e. a scaled and shifted version of f(x). I would like to find the best scale and shift factors.
*best: by means of least-squares error, maximum likelihood, etc.
Any ideas?
for example:
f1 = [0;0.450541598502498;0.0838213779969326;0.228976968716819;0.91333736150167;0.152378018969223;0.825816977489547;0.538342435260057;0.996134716626885;0.0781755287531837;0.442678269775446;0];
f2 = [-0.029171964726699;-0.0278570165494982;0.0331454732535324;0.187656956432487;0.358856370923984;0.449974662483267;0.391341738643094;0.244800719791534;0.111797007617227;0.0721767235173722;0.0854437239807415;0.143888234591602;0.251750993723227;0.478953530572365;0.748209818420035;0.908044924557262;0.811960826711455;0.512568916956487;0.22669198638799;0.168136111568694;0.365578085161896;0.644996661336714;0.823562159983554;0.792812945867018;0.656803251999341;0.545799498053254;0.587013303815021;0.777464637372241;0.962722388208354;0.980537136457874;0.734416947254272;0.375435649393553;0.106489547770962;0.0892376361668696;0.242467741982851;0.40610516900965;0.427497319032133;0.301874099075184;0.128396341665384;0.00246347624097456;-0.0322120242872125]
*Note that f(x) may not be invertible...
Thanks,
Ohad

For each f(x), take the absolute value of f(x) and normalize it so that it can be treated as a probability mass function over its support. Calculate the expected value E[x] and the variance Var[x]. Then we have
E[a x + b] = a E[x] + b
Var[a x + b] = a^2 Var[x]
Use the above equations and the known values of E[x] and Var[x] to calculate a and b. Taking your values of f1 and f2 from your example, the following Octave script performs this procedure:
% Octave script
% f1, f2 are defined as given in your example
f1 = [zeros(length(f2) - length(f1), 1); f1];
save_f1 = f1; save_f2 = f2;
f1 = abs( f1 ); f2 = abs( f2 );
f1 = f1 ./ sum( f1 ); f2 = f2 ./ sum( f2 );
pmf_mean = @(x) sum((1:length(x))' .* x);
pmf_var = @(x) sum((((1:length(x))' - pmf_mean(x)).^2) .* x);
m1 = pmf_mean(f1); m2 = pmf_mean(f2);
v1 = pmf_var(f1); v2 = pmf_var(f2);
a = sqrt( v2 / v1 ); b = m2 - a * m1;
plot( a .* (1:length( save_f1 )) + b, save_f1, ...
1:length( save_f2 ), save_f2 );
xlim([0 length( save_f1 )]);
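The same moment-matching idea as a minimal Python sketch, with hypothetical function names. It follows the answer's convention: positions are the 1-based sample indices, and the returned (a, b) map positions of the first vector onto positions of the second, so that E2 = a*E1 + b and Var2 = a^2*Var1. The check uses a synthetic Gaussian bump rather than the data from the question.

```python
import math


def position_moments(f):
    """Mean and variance of the sample index, treating |f| as a probability mass."""
    w = [abs(v) for v in f]
    total = sum(w)
    p = [v / total for v in w]
    mean = sum(i * pi for i, pi in enumerate(p, start=1))
    var = sum((i - mean) ** 2 * pi for i, pi in enumerate(p, start=1))
    return mean, var


def scale_shift_by_moments(f1, f2):
    """Estimate (a, b) such that E2 = a*E1 + b and Var2 = a^2 * Var1."""
    m1, v1 = position_moments(f1)
    m2, v2 = position_moments(f2)
    a = math.sqrt(v2 / v1)
    return a, m2 - a * m1


if __name__ == "__main__":
    def bump(x):
        return math.exp(-(x - 10) ** 2 / 8)              # centre 10, standard deviation 2

    f1 = [bump(i) for i in range(1, 61)]
    f2 = [bump((j - 5) / 2) for j in range(1, 101)]      # stretched by a = 2, shifted by b = 5
    print(scale_shift_by_moments(f1, f2))                # close to (2.0, 5.0)
```

Because it only uses the first two moments, this works best when |f| is a single compact bump; for multi-peaked signals the moments can match even when the shapes do not line up.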

Here's a simple, effective, but perhaps somewhat naive approach.
First make sure you make a generic interpolator through both functions. That way you can evaluate both functions in between the given data points. I used a cubic-splines interpolator, since that seems general enough for the type of smooth functions you provided (and does not require additional toolboxes).
Then evaluate the source function ("original") at a large number of points. Use this number also as a parameter in an anonymous function that takes as input X, where
X = [a b]
(as in ax+b). For any input X, this function computes
the function values of the target function at the same x-locations, but scaled and offset by a and b, respectively, and then
the sum of the squared differences between those values and the source-function values computed earlier.
Use this function in fminsearch with some initial estimate (one obtained visually or by automatic means). For the example you provided, I used a few random ones, which all converged to near-optimal fits.
All of the above in code:
function s = findScaleOffset
%% initialize
f2 = [0;0.450541598502498;0.0838213779969326;0.228976968716819;0.91333736150167;0.152378018969223;0.825816977489547;0.538342435260057;0.996134716626885;0.0781755287531837;0.442678269775446;0];
f1 = [-0.029171964726699;-0.0278570165494982;0.0331454732535324;0.187656956432487;0.358856370923984;0.449974662483267;0.391341738643094;0.244800719791534;0.111797007617227;0.0721767235173722;0.0854437239807415;0.143888234591602;0.251750993723227;0.478953530572365;0.748209818420035;0.908044924557262;0.811960826711455;0.512568916956487;0.22669198638799;0.168136111568694;0.365578085161896;0.644996661336714;0.823562159983554;0.792812945867018;0.656803251999341;0.545799498053254;0.587013303815021;0.777464637372241;0.962722388208354;0.980537136457874;0.734416947254272;0.375435649393553;0.106489547770962;0.0892376361668696;0.242467741982851;0.40610516900965;0.427497319032133;0.301874099075184;0.128396341665384;0.00246347624097456;-0.0322120242872125];
figure(1), clf, hold on
h(1) = subplot(2,1,1); hold on
plot(f1);
legend('Original')
h(2) = subplot(2,1,2); hold on
plot(f2);
linkaxes(h)
axis([0 max(length(f1),length(f2)), min(min(f1),min(f2)),max(max(f1),max(f2))])
%% make cubic interpolators and test points
pp1 = spline(1:numel(f1), f1);
pp2 = spline(1:numel(f2), f2);
maxX = max(numel(f1), numel(f2));
N = 100 * maxX;
x2 = linspace(1, maxX, N);
y1 = ppval(pp1, x2);
%% search for parameters
s = fminsearch(@(X) sum( (y1 - ppval(pp2,X(1)*x2+X(2))).^2 ), [0 0])
%% plot results
y2 = ppval( pp2, s(1)*x2+s(2));
figure(1), hold on
subplot(2,1,2), hold on
plot(x2,y2, 'r')
legend('before', 'after')
end
Results:
s =
2.886234493867320e-001 3.734482822175923e-001
Note that this computes the opposite transformation from the one you generated the data with. Reversing the numbers:
>> 1/s(1)
ans =
3.464721948700991e+000 % seems pretty decent
>> -s(2)
ans =
-3.734482822175923e-001 % hmmm...rather different from 7/11!
(I'm not sure about the 7/11 value you provided; using the exact values you gave to make a plot results in a less accurate approximation to the source function...Are you sure about the 7/11?)
Accuracy can be improved by
using a different optimizer (fmincon, fminunc, etc.)
demanding a higher accuracy from fminsearch through optimset
having more sample points in both f1 and f2 to improve the quality of the interpolations
using a better initial estimate
Anyway, this approach is pretty general and gives nice results. It also requires no toolboxes.
It has one major drawback, though: the solution found may not be the global optimizer, so the quality of the outcome can be quite sensitive to the initial estimate you provide. So always make a (difference) plot to make sure the final solution is accurate, or, if you have a large number of such fits to do, compute some sort of quality factor upon which you decide to restart the optimization with a different initial estimate.
It is of course very possible to use the results of the Fourier+Mellin transforms (as suggested by chaohuang below) as an initial estimate to this method. That might be overkill for the simple example you provide, but I can easily imagine situations where this could indeed be very useful.

For the scale factor a, you can estimate it by computing the ratio of the amplitude spectra of the two signals, since the magnitude of the Fourier transform is invariant to shifts.
Similarly, you can estimate the shift factor b by using the Mellin transform, whose magnitude is scale invariant.
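A quick numerical illustration of the shift half of that argument, as a sketch only: it uses a naive O(n^2) DFT and a circular shift, whereas real data would normally be zero-padded and windowed. The DFT magnitudes of a signal and of a circularly shifted copy agree up to rounding error, which is why the amplitude spectrum can be used for the scale without knowing b. The function name and test signal are hypothetical.

```python
import cmath


def dft_magnitudes(x):
    """|X_k| of the naive discrete Fourier transform of the sequence x."""
    n = len(x)
    return [abs(sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n)))
            for k in range(n)]


if __name__ == "__main__":
    x = [0.0, 0.2, 0.9, 0.4, 0.1, 0.0, 0.3, 0.7]
    shifted = x[-3:] + x[:-3]                      # circular shift by 3 samples
    print(dft_magnitudes(x))
    print(dft_magnitudes(shifted))                 # same values up to rounding
```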

Here's a super simple approach to estimate the scale a that works on your example data:
a = length(f2) / length(f1)
This gives 3.4167 which is close to your stated value of 3.4. If that estimate is good enough, you can use correlation to estimate the shift.
I realize that this is not exactly what you asked, but it may be an acceptable alternative depending on the data.
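For the "use correlation to estimate the shift" step, a minimal sketch: once the second signal has been resampled with the estimated scale (not shown here), an integer shift can be found by maximizing the raw cross-correlation over a range of lags. best_lag is a hypothetical helper; for signals with a large constant offset you would normally subtract the mean or normalize each overlap first.

```python
def best_lag(ref, sig, max_lag):
    """Return the integer lag L in [-max_lag, max_lag] maximising sum_i ref[i] * sig[i + L]."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(ref[i] * sig[i + lag]
                    for i in range(len(ref)) if 0 <= i + lag < len(sig))
        if score > best_score:
            best, best_score = lag, score
    return best


if __name__ == "__main__":
    ref = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
    sig = [0, 0, 0, 0, 0, 1, 3, 1, 0, 0]    # ref moved 3 samples to the right
    print(best_lag(ref, sig, 5))            # 3
```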

Both Rody Oldenhuis's and jstarr's answers are correct. I'm adding my own answer just to sum things up and connect them.
I've messed up Rody's code a little bit and ended up with the following:
function findScaleShift
load f1f2
x0 = [length(f1)/length(f2) 0]; %initial guess, can do better
n=length(f1);
costFunc = @(z) sum((eval_f1(z,f2,n)-f1).^2);
opt.TolFun = eps;
xopt=fminsearch(costFunc,x0,opt);
f1r=eval_f1(xopt,f2,n);
subplot(211);
plot(1:n,f1,1:n,f1r,'--','linewidth',5)
title(xopt);
subplot(212);
plot(1:n,(f1-f1r).^2);
title('squared error')
end
function y = eval_f1(x,f2,n)
t = maketform('affine',[x(1) 0 x(2); 0 1 0 ; 0 0 1]');
y=imtransform(f2',t,'cubic','xdata',[1 n ],'ydata',[1 1])';
end
This fits f1 with essentially zero error.
This method is accurate but computationally expensive, and may take some time. Another disadvantage is that it only finds a local minimum, and may give false results if the initial guess (x0) is far off.
On the other hand, jstarr's method gave the following result:
xopt = [ 3.49655562549115 -0.676062367063033]
which deviates by about 10% from the correct answer. It is a fast solution, though not as accurate as I requested, but it is still worth noting.
I think that to get the best results, jstarr's method should be used as an initial guess for the method proposed by Rody, giving an accurate solution.
Ohad

Calculate the cosine of a sequence

I have to calculate the following:
float2 y = CONSTANT;
for (int i = 0; i < totalN; i++)
h[i] = cos(y*i);
totalN is a large number, so I would like to do this more efficiently. Is there any way to improve this? I suspect there is, because, after all, we know the values of cos(n) for n=1..N, so maybe there's some theorem that allows me to compute this faster. I would really appreciate any hint.
Thanks in advance,
Federico

Using one of the most beautiful formulas of mathematics, Euler's formula
exp(i*x) = cos(x) + i*sin(x),
substituting x := n * phi:
cos(n*phi) = Re( exp(i*n*phi) )
sin(n*phi) = Im( exp(i*n*phi) )
exp(i*n*phi) = exp(i*phi) ^ n
Power ^n is n repeated multiplications.
Therefore you can calculate cos(n*phi) and simultaneously sin(n*phi) by repeated complex multiplication by exp(i*phi) starting with (1+i*0).
Code examples:
Python:
from math import cos, sin, pi
DEG2RAD = pi/180.0  # conversion factor degrees --> radians
phi = 10*DEG2RAD  # constant, e.g. 10 degrees
c = cos(phi) + 1j*sin(phi)  # = exp(1j*phi)
h = 1 + 0j
for i in range(1, 10):
    h = h*c  # h = exp(1j*i*phi)
    print("%d %8.3f" % (i, h.real))
or C:
#include <stdio.h>
#include <math.h>
// number of values to calculate:
#define N 10
// conversion factor degrees --> radians:
#define DEG2RAD (3.14159265358979323846/180.0)
// e.g. constant is 10 degrees:
#define PHI (10*DEG2RAD)
typedef struct
{
    double re, im;
} complex_t;
int main(void)
{
    complex_t c;
    complex_t h[N];
    int index;
    c.re = cos(PHI);
    c.im = sin(PHI);
    h[0].re = 1.0;
    h[0].im = 0.0;
    for (index = 1; index < N; index++)
    {
        // complex multiplication h[index] = h[index-1] * c;
        h[index].re = h[index-1].re*c.re - h[index-1].im*c.im;
        h[index].im = h[index-1].re*c.im + h[index-1].im*c.re;
        printf("%d: %8.3f\n", index, h[index].re);
    }
    return 0;
}

I'm not sure what kind of accuracy vs. performance compromises you're willing to make, but there are extensive discussions of various sinusoid approximation techniques at these links:
Fun with Sinusoids - http://www.audiomulch.com/~rossb/code/sinusoids/
Fast and accurate sine/cosine - http://www.devmaster.net/forums/showthread.php?t=5784
Edit (I think this is the "Don Cross" link that's broken on the "Fun with Sinusoids" page):
Optimizing Trig Calculations - http://groovit.disjunkt.com/analog/time-domain/fasttrig.html

Maybe the simplest formula is
cos(a + y) = 2cos(a)cos(y) - cos(a - y).
If you precompute the constant 2*cos(y), then, taking a = (i-1)*y, each value cos(i*y) can be computed from the previous two values with a single multiplication and one subtraction.
I.e., in pseudocode
h[0] = 1.0
h[1] = cos(y)
m = 2*h[1]
for (int i = 2; i < totalN; ++i)
h[i] = m*h[i-1] - h[i-2]
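A direct Python port of that pseudocode, with a check against math.cos; the function name cos_sequence is hypothetical.

```python
import math


def cos_sequence(y, total_n):
    """h[i] = cos(i*y) for i < total_n, via h[i] = 2*cos(y)*h[i-1] - h[i-2]."""
    h = [0.0] * total_n
    if total_n > 0:
        h[0] = 1.0
    if total_n > 1:
        h[1] = math.cos(y)
    m = 2.0 * math.cos(y)                 # precomputed constant 2*cos(y)
    for i in range(2, total_n):
        h[i] = m * h[i - 1] - h[i - 2]    # one multiplication, one subtraction
    return h


if __name__ == "__main__":
    h = cos_sequence(math.radians(10), 1000)
    print(max(abs(h[i] - math.cos(i * math.radians(10))) for i in range(1000)))
```

The printed maximum deviation shows how much rounding error the recurrence has accumulated over 1000 terms.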

Here's a method, but it uses a little bit of memory for the sin. It uses the trig identities:
cos(a + b) = cos(a)cos(b)-sin(a)sin(b)
sin(a + b) = sin(a)cos(b)+cos(a)sin(b)
Then here's the code:
h[0] = 1.0;
double g1 = sin(y);
double glast = g1;
h[1] = cos(y);
for (int i = 2; i < totalN; i++){
h[i] = h[i-1]*h[1]-glast*g1;
glast = glast*h[1]+h[i-1]*g1;
}
If I didn't make any errors then that should do it. Of course there could be round-off problems so be aware of that. I implemented this in Python and it is quite accurate.

There are some good answers here, but they all use recurrences. A recurrence will not work for the cosine function when using floating-point arithmetic; you will invariably get rounding errors that quickly compound.
Consider y = 45 degrees and totalN = 10 000. You won't end up with 1 as the final result.

To address Kirk's concerns: all of the solutions based on the recurrence for cos and sin boil down to computing
x(k) = R x(k - 1),
where R is the matrix that rotates by y and x(0) is the unit vector (1, 0). If the true result for k - 1 is x'(k - 1) and the true result for k is x'(k), then the error goes from e(k - 1) = x(k - 1) - x'(k - 1) to e(k) = R x(k - 1) - R x'(k - 1) = R e(k - 1) by linearity. Since R is what's called an orthogonal matrix, R e(k - 1) has the same norm as e(k - 1), and the error grows very slowly. (The reason it grows at all is due to round-off; the computer representation of R is in general almost, but not quite orthogonal, so it will be necessary to restart the recurrence using the trig operations from time to time depending on the accuracy required. This is still much, much faster than using the trig ops to compute each value.)

You can do this using complex numbers.
If you define x = cos(y) + i sin(y), then cos(n*y) is the real part of x^n.
You can compute this for all n iteratively. A complex multiplication costs four real multiplications and two additions.

Knowing cos(n) doesn't help; your math library already does these kinds of trivial things for you.
Knowing that cos((i+1)y) = cos(iy+y) = cos(iy)cos(y) - sin(iy)sin(y) can help, if you precompute cos(y) and sin(y) and keep track of both cos(iy) and sin(iy) along the way. It may result in some loss of precision, though; you'll have to check.

How accurate do you need the resulting cos(x) to be? If you can live with some error, you could create a lookup table, sampling the unit circle at 2*PI/N intervals, and then interpolate between two adjacent points. N would be chosen to achieve the desired level of accuracy.
What I don't know is whether an interpolation is actually less costly than computing a cosine. Since that is usually done in microcode in modern CPUs, it may not be.
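A minimal sketch of the lookup-table idea in Python; the class name and the table size of 1024 are arbitrary choices. With linear interpolation the error is bounded by h^2/8 for a step h = 2*PI/N, about 5e-6 for N = 1024. This Python version only demonstrates the accuracy; whether a table beats the library cos in speed has to be measured in the target language on the target machine.

```python
import math


class CosTable:
    """cos(x) from a table of `size` samples over one period, with linear interpolation."""

    def __init__(self, size=1024):
        self.size = size
        self.step = 2 * math.pi / size
        # one extra entry so the last interval needs no wrap-around
        self.table = [math.cos(k * self.step) for k in range(size + 1)]

    def __call__(self, x):
        u = (x % (2 * math.pi)) / self.step    # position in table units
        k = int(u)
        if k >= self.size:                      # guard against rounding at u == size
            k = self.size - 1
        frac = u - k
        return self.table[k] + frac * (self.table[k + 1] - self.table[k])


if __name__ == "__main__":
    tab = CosTable(1024)
    print(tab(1.0), math.cos(1.0))
```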
