euclidean distance C#

euclidean distance C# - for-loop

I'm working on a C# project to find the euclidean distance between 2 points.
Right now I have 2 for loops like this:
for (int i = 0; i < pregcount; i ++)
{
double dist_minima = double.MaxValue;
for (int j = 0; j < poiscount; j ++)
{
double distancia_cuadratica = Math.Pow(pois[j, 0] - preg[i, 0], 2) + Math.Pow(pois[j, 1] - preg[i, 1], 2) + Math.Pow(pois[j, 2] - preg[i, 2], 2);
}
}
preg and pois are array matrix of n elements (preg is 250 elements and pois is 900,000 elements
Is there a way to make this faster? like a function in C# or a library that would just calculate the distance more quickly?
I take almost 2 minutes to complete the whole thing. The calculation inside the second loop is what takes all the time.

For starters, don't use Math.Pow.
Find x = pois[j, 0] - preg[i, 0]; etc.
Then distancia_cuadratica = x * x + y * y + z * z;
You also don't appear to be just finding the distance between two points - you appear to want to find the CLOSEST point from preg to your pois values. Is this correct? It may have an impact on the total algorithm we'll suggest to you.
Other than that, try to avoid declaring variables inside a tight loop, if you can. Prefer ++j to j++.
Do you really need double accuracy? If not, you may be able to switch to float and save some speed.
.Net languages are arguably not ideal for calculations like this. C++ may be a better choice. If you use C++ for instance there are vectorizing intrinsics you can use to do operations like this much more quickly. I used to use Intel Performance Primitives. Now you can use things like CUDA to do these calculations on the GPU.
You could try something new and experimental like CUDA Sharp, if you want to.

Related

Algorithm for downsampling array of intervals

I have a sorted array of N intervals of different length. I am plotting these intervals with alternating colors blue/green.
I am trying to find a method or algorithm to "downsample" the array of intervals to produce a visually similar plot, but with less elements.
Ideally I could write some function where I can pass the target number of output intervals as an argument. The output length only has to come close to the target.
input = [
[0, 5, "blue"],
[5, 6, "green"],
[6, 10, "blue"],
// ...etc
]
output = downsample(input, 25)
// [[0, 10, "blue"], ... ]
Below is a picture of what I am trying to accomplish. In this example the input has about 250 intervals, and the output about ~25 intervals. The input length can vary a lot.

Update 1:
Below is my original post which I initially deleted, because there were issues with displaying the equations and also I wasn't very confident if it really makes sense. But later, I figured that the optimisation problem that I described can be actually solved efficiently with DP (Dynamic programming).
So I did a sample C++ implementation. Here are some results:
Here is a live demo that you can play with in your browser (make sure browser support WebGL2, like Chrome or Firefox). It takes a bit to load the page.
Here is the C++ implementation: link
Update 2:
Turns out the proposed solution has the following nice property - we can easily control the importance of the two parts F1 and F2 of the cost function. Simply change the cost function to F(α)=F1 + αF2, where α >= 1.0 is a free parameter. The DP algorithm remains the same.
Here are some result for different α values using the same number of intervals N:
Live demo (WebGL2 required)
As can be seen, higher α means it is more important to cover the original input intervals even if this means covering more of the background in-between.
Original post
Even-though some good algorithms have already been proposed, I would like to propose a slightly unusual approach - interpreting the task as an optimisation problem. Although, I don't know how to efficiently solve the optimisation problem (or even if it can be solved in reasonable time at all), it might be useful to someone purely as a concept.
First, without loss of generality, lets declare the blue color to be background. We will be painting N green intervals on top of it (N is the number provided to the downsample() function in OP's description). The ith interval is defined by its starting coordinate 0 <= xi < xmax and width wi >= 0 (xmax is the maximum coordinate from the input).
Lets also define the array G(x) to be the number of green cells in the interval [0, x) in the input data. This array can easily be pre-calculated. We will use it to quickly calculate the number of green cells in arbitrary interval [x, y) - namely: G(y) - G(x).
We can now introduce the first part of the cost function for our optimisation problem:
The smaller F1 is, the better our generated intervals cover the input intervals, so we will be searching for xi, wi that minimise it. Ideally we want F1=0 which would mean that the intervals do not cover any of the background (which of course is not possible because N is less than the input intervals).
However, this function is not enough to describe the problem, because obviously we can minimise it by taking empty intervals: F1(x, 0)=0. Instead, we want to cover as much as possible from the input intervals. Lets introduce the second part of the cost function which corresponds to this requirement:
The smaller F2 is, the more input intervals are covered. Ideally we want F2=0 which would mean that we covered all of the input rectangles. However, minimising F2 competes with minimising F1.
Finally, we can state our optimisation problem: find xi, wi that minimize F=F1 + F2
How to solve this problem? Not sure. Maybe use some metaheuristic approach for global optimisation such as Simulated annealing or Differential evolution. These are typically easy to implement, especially for this simple cost function.
Best case would be to exist some kind of DP algorithm for solving it efficiently, but unlikely.

I would advise you to use Haar wavelet. That is a very simple algorithm which was often used to provide the functionality of progressive loading for big images on websites.
Here you can see how it works with 2D function. That is what you can use. Alas, the document is in Ukrainian, but code in C++, so readable:)
This document provides an example of 3D object:
Pseudocode on how to compress with Haar wavelet you can find in Wavelets for Computer Graphics: A Primer Part 1y.

You could do the following:
Write out the points that divide the whole strip into intervals as the array [a[0], a[1], a[2], ..., a[n-1]]. In your example, the array would be [0, 5, 6, 10, ... ].
Calculate double-interval lengths a[2]-a[0], a[3]-a[1], a[4]-a[2], ..., a[n-1]-a[n-3] and find the least of them. Let it be a[k+2]-a[k]. If there are two or more equal lengths having the lowest value, choose one of them randomly. In your example, you should get the array [6, 5, ... ] and search for the minimum value through it.
Swap the intervals (a[k], a[k+1]) and (a[k+1], a[k+2]). Basically, you need to assign a[k+1]=a[k]+a[k+2]-a[k+1] to keep the lengths, and to remove the points a[k] and a[k+2] from the array after that because two pairs of intervals of the same color are now merged into two larger intervals. Thus, the numbers of blue and green intervals decreases by one each after this step.
If you're satisfied with the current number of intervals, end the process, otherwise go to the step 1.
You performed the step 2 in order to decrease "color shift" because, at the step 3, the left interval is moved a[k+2]-a[k+1] to the right and the right interval is moved a[k+1]-a[k] to the left. The sum of these distances, a[k+2]-a[k] can be considered a measure of change you're introducing into the whole picture.
Main advantages of this approach:
It is simple.
It doesn't give a preference to any of the two colors. You don't need to assign one of the colors to be the background and the other to be the painting color. The picture can be considered both as "green-on-blue" and "blue-on-green". This reflects quite common use case when two colors just describe two opposite states (like the bit 0/1, "yes/no" answer) of some process extended in time or in space.
It always keeps the balance between colors, i.e. the sum of intervals of each color remains the same during the reduction process. Thus the total brightness of the picture doesn't change. It is important as this total brightness can be considered an "indicator of completeness" at some cases.

Here's another attempt at dynamic programming that's slightly different than Georgi Gerganov's, although the idea to try and formulate a dynamic program may have been inspired by his answer. Neither the implementation nor the concept is guaranteed to be sound but I did include a code sketch with a visual example :)
The search space in this case is not reliant on the total unit width but rather on the number of intervals. It's O(N * n^2) time and O(N * n) space, where N and n are the target and given number of (green) intervals, respectively, because we assume that any newly chosen green interval must be bound by two green intervals (rather than extend arbitrarily into the background).
The idea also utilises the prefix sum idea used to calculate runs with a majority element. We add 1 when we see the target element (in this case green) and subtract 1 for others (that algorithm is also amenable to multiple elements with parallel prefix sum tracking). (I'm not sure that restricting candidate intervals to sections with a majority of the target colour is always warranted but it may be a useful heuristic depending on the desired outcome. It's also adjustable -- we can easily adjust it to check for a different part than 1/2.)
Where Georgi Gerganov's program seeks to minimise, this dynamic program seeks to maximise two ratios. Let h(i, k) represent the best sequence of green intervals up to the ith given interval, utilising k intervals, where each is allowed to stretch back to the left edge of some previous green interval. We speculate that
h(i, k) = max(r + C*r1 + h(i-l, k-1))
where, in the current candidate interval, r is the ratio of green to the length of the stretch, and r1 is the ratio of green to the total given green. r1 is multiplied by an adjustable constant to give more weight to the volume of green covered. l is the length of the stretch.
JavaScript code (for debugging, it includes some extra variables and log lines):
function rnd(n, d=2){
let m = Math.pow(10,d)
return Math.round(m*n) / m;
}
function f(A, N, C){
let ps = [[0,0]];
let psBG = [0];
let totalG = 0;
A.unshift([0,0]);
for (let i=1; i<A.length; i++){
let [l,r,c] = A[i];
if (c == 'g'){
totalG += r - l;
let prevI = ps[ps.length-1][1];
let d = l - A[prevI][1];
let prevS = ps[ps.length-1][0];
ps.push(
[prevS - d, i, 'l'],
[prevS - d + r - l, i, 'r']
);
psBG[i] = psBG[i-1];
} else {
psBG[i] = psBG[i-1] + r - l;
}
}
//console.log(JSON.stringify(A));
//console.log('');
//console.log(JSON.stringify(ps));
//console.log('');
//console.log(JSON.stringify(psBG));
let m = new Array(N + 1);
m[0] = new Array((ps.length >> 1) + 1);
for (let i=0; i<m[0].length; i++)
m[0][i] = [0,0];
// for each in N
for (let i=1; i<=N; i++){
m[i] = new Array((ps.length >> 1) + 1);
for (let ii=0; ii<m[0].length; ii++)
m[i][ii] = [0,0];
// for each interval
for (let j=i; j<m[0].length; j++){
m[i][j] = m[i][j-1];
for (let k=j; k>i-1; k--){
// our anchors are the right
// side of each interval, k's are the left
let jj = 2*j;
let kk = 2*k - 1;
// positive means green
// is a majority
if (ps[jj][0] - ps[kk][0] > 0){
let bg = psBG[ps[jj][1]] - psBG[ps[kk][1]];
let s = A[ps[jj][1]][1] - A[ps[kk][1]][0] - bg;
let r = s / (bg + s);
let r1 = C * s / totalG;
let candidate = r + r1 + m[i-1][j-1][0];
if (candidate > m[i][j][0]){
m[i][j] = [
candidate,
ps[kk][1] + ',' + ps[jj][1],
bg, s, r, r1,k,m[i-1][j-1][0]
];
}
}
}
}
}
/*
for (row of m)
console.log(JSON.stringify(
row.map(l => l.map(x => typeof x != 'number' ? x : rnd(x)))));
*/
let result = new Array(N);
let j = m[0].length - 1;
for (let i=N; i>0; i--){
let [_,idxs,w,x,y,z,k] = m[i][j];
let [l,r] = idxs.split(',');
result[i-1] = [A[l][0], A[r][1], 'g'];
j = k - 1;
}
return result;
}
function show(A, last){
if (last[1] != A[A.length-1])
A.push(last);
let s = '';
let j;
for (let i=A.length-1; i>=0; i--){
let [l, r, c] = A[i];
let cc = c == 'g' ? 'X' : '.';
for (let j=r-1; j>=l; j--)
s = cc + s;
if (i > 0)
for (let j=l-1; j>=A[i-1][1]; j--)
s = '.' + s
}
for (let j=A[0][0]-1; j>=0; j--)
s = '.' + s
console.log(s);
return s;
}
function g(A, N, C){
const ts = f(A, N, C);
//console.log(JSON.stringify(ts));
show(A, A[A.length-1]);
show(ts, A[A.length-1]);
}
var a = [
[0,5,'b'],
[5,9,'g'],
[9,10,'b'],
[10,15,'g'],
[15,40,'b'],
[40,41,'g'],
[41,43,'b'],
[43,44,'g'],
[44,45,'b'],
[45,46,'g'],
[46,55,'b'],
[55,65,'g'],
[65,100,'b']
];
// (input, N, C)
g(a, 2, 2);
console.log('');
g(a, 3, 2);
console.log('');
g(a, 4, 2);
console.log('');
g(a, 4, 5);

I would suggest using K-means it is an algorithm used to group data(a more detailed explanation here: https://en.wikipedia.org/wiki/K-means_clustering and here https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html)
this would be a brief explanation of how the function should look like, hope it is helpful.
from sklearn.cluster import KMeans
import numpy as np
def downsample(input, cluster = 25):
# you will need to group your labels in a nmpy array as shown bellow
# for the sake of example I will take just a random array
X = np.array([[1, 2], [1, 4], [1, 0],[4, 2], [4, 4], [4, 0]])
# n_clusters will be the same as desired output
kmeans = KMeans(n_clusters= cluster, random_state=0).fit(X)
# then you can iterate through labels that was assigned to every entr of your input
# in our case the interval
kmeans_list = [None]*cluster
for i in range(0, X.shape[0]):
kmeans_list[kmeans.labels_[i]].append(X[i])
# after that you will basicly have a list of lists and every inner list will contain all points that corespond to a
# specific label
ret = [] #return list
for label_list in kmeans_list:
left = 10001000 # a big enough number to exced anything that you will get as an input
right = -left # same here
for entry in label_list:
left = min(left, entry[0])
right = max(right, entry[1])
ret.append([left,right])
return ret

Combinations of integers in OpenCL

I have a bunch of vectors (~500). I need to find triple products of all the combinations of the vectors in OpenCL. There are plenty of combination algorithms (r out of n things) in C++ but I am yet to find any implemented for GPU. I have seen quite a few parallel permutation algorithms in Cuda but I just want to know if there are any viable combination algorithms present?

I'll need to guess a bit here and there to answer your question.
I suppose you have an array V of n (~500) vectors. These vectors are all of same dimensionality m (probably m=3).
What you want is the component wise product of each 3 vectors vi, vj, vk where i,j,k in {0,..,n-1}.
Simple 3-dimensional example:
result[idx].x = V[i].x * V[j].x * V[k].x;
result[idx].y = V[i].y * V[j].y * V[k].y;
result[idx].z = V[i].z * V[j].z * V[k].z;
Now maybe your vectors are not 3-dimensional and maybe you don't want the component wise product but the sum of it (like in dot product), but I'm sure you're able to djust the code accordingly.
The real question here is how to compute all possible i,j,k and idx. Correct?
Now with CUDA you are in a very fortunate position. You can just launch n*n*n threads in a grid and therefore get i,j,k for free without having to think about ways to compute combinations or permutations at all. Just do the following:
dim3 grid, block;
block.x = n;
block.y = 1;
block z = 1;
grid.x = n;
grid.y = n;
grid.z = 1;
compute_product_kernel<<<grid, block>>>( V, result );
This way you'll launch n*n blocks of n threads. Computing i,j,k becomes trivial, computing idx is easy:
__device__ void compute_product_kernel( myVector* V, myVector* result)
{
int i = blockIdx.x;
int j = blockIdx.y;
int k = threadIdx.x;
int idx = i * gridDim.y * blockDim.x + j * blockDim.x + k;
...
}
Of course all of this only works because your n is within the limits of CUDA's block and grid range.
Two more things though:
Maybe you want permutations instead of combinations. You could do that by skipping every combination where any two of i,j,k are the same. But I'd recommend keeping them anyway because computing when to skip is probably more expensive that doing the actual work. Also I'd advise against using the permutation to save memory for result because it would save you less that 1% and make the calculation much more complex.
Are you sure you've got enough memory to actually do this? Storing the result requires n*n*n*m*sizeof(float) bytes. With n=500 and m=3 that would already be 1.5 GB. Is that really what you are looking for? Maybe the next step of your processing can be combined into the calculation so that storing the intermediate result is not neccessary.

finding all first consecutive prime factors and find max of that by Mathematica

Let
2|n, 3|n,..., p_i|n, p_ j|n,..., p_k|n
p_i < p_ j< ... < p_k
where all primes up to p_i divide n and
j > i+1
I want to write a code in Mathematica to find p_i and determine {2,3,5,...,p_i}.
thanks.
B = {};
n = 2^6 * 3^8 * 5^3 * 7^2 * 11 * 23 * 29;
For[i = 1, i <= k, i++,
If[Mod[n, Prime[i]] == 0, AppendTo[B, Prime[i]]
If[Mod[n, Prime[i + 1]] > 0, Break[]]]];
mep1= Max[B];
B
mep1
result is
{2,3,5,7,11}
11
I would like to write the code instead of B to get B[n], since I need to draw the graph of mep1[n] for given n.

If I understand your question and code correctly you want a list of prime factors of the integer n but only the initial part of that list which matches the initial part of the list of all prime numbers.
I'll first observe that what you've posted looks much more like C or one of its relatives than like Mathematica. In fact you don't seem to have used any of the power of Mathematica's in-built functions at all. If you want to really use Mathematica you need to start familiarising yourself with these functions; if that doesn't appeal stick to C and its ilk, it's a fairly useful programming language.
The first step I'd take is to get the prime factors of n like this:
listOfFactors = Transpose[FactorInteger[n]][[1]]
Look at the documentation for the details of what FactorInteger returns; here I'm using transposition and part to get only the list of prime factors and to drop their coefficients. You may not notice the use of the Part function, the doubled square brackets are the usual notation. Note also that I don't have Mathematica on this machine so my syntax may be a bit awry.
Next, you want only those elements of listOfFactors which match the corresponding elements in the list of all prime numbers. Do this in two steps. First, get the integers from 1 to k at which the two lists match:
matches = TakeWhile[Range[Length[listOfFactors]],(listOfFactors[[#]]==Prime[#])&]
and then
listOfFactors[[matches]]
I'll leave it to you to:
assemble these fragments into the function you want;
correct the syntactical errors I have made; and
figured out exactly what is going on in each (sub-)expression.
I make no warranty that this approach is the best approach in any general sense, but it makes much better use of Mathematica's intrinsic functionality than your own first try and will, I hope, point you towards better use of the system in future.

What's a good way to add a large number of small floats together?

Say you have 100000000 32-bit floating point values in an array, and each of these floats has a value between 0.0 and 1.0. If you tried to sum them all up like this
result = 0.0;
for (i = 0; i < 100000000; i++) {
result += array[i];
}
you'd run into problems as result gets much larger than 1.0.
So what are some of the ways to more accurately perform the summation?

Sounds like you want to use Kahan Summation.
According to Wikipedia,
The Kahan summation algorithm (also known as compensated summation) significantly reduces the numerical error in the total obtained by adding a sequence of finite precision floating point numbers, compared to the obvious approach. This is done by keeping a separate running compensation (a variable to accumulate small errors).
In pseudocode, the algorithm is:
function kahanSum(input)
var sum = input[1]
var c = 0.0 //A running compensation for lost low-order bits.
for i = 2 to input.length
y = input[i] - c //So far, so good: c is zero.
t = sum + y //Alas, sum is big, y small, so low-order digits of y are lost.
c = (t - sum) - y //(t - sum) recovers the high-order part of y; subtracting y recovers -(low part of y)
sum = t //Algebraically, c should always be zero. Beware eagerly optimising compilers!
next i //Next time around, the lost low part will be added to y in a fresh attempt.
return sum

Make result a double, assuming C or C++.

If you can tolerate a little extra space (in Java):
float temp = new float[1000000];
float temp2 = new float[1000];
float sum = 0.0f;
for (i=0 ; i<1000000000 ; i++) temp[i/1000] += array[i];
for (i=0 ; i<1000000 ; i++) temp2[i/1000] += temp[i];
for (i=0 ; i<1000 ; i++) sum += temp2[i];
Standard divide-and-conquer algorithm, basically. This only works if the numbers are randomly scattered; it won't work if the first half billion numbers are 1e-12 and the second half billion are much larger.
But before doing any of that, one might just accumulate the result in a double. That'll help a lot.

If in .NET using the LINQ .Sum() extension method that exists on an IEnumerable. Then it would just be:
var result = array.Sum();

The absolutely optimal way is to use a priority queue, in the following way:
PriorityQueue<Float> q = new PriorityQueue<Float>();
for(float x : list) q.add(x);
while(q.size() > 1) q.add(q.pop() + q.pop());
return q.pop();
(this code assumes the numbers are positive; generally the queue should be ordered by absolute value)
Explanation: given a list of numbers, to add them up as precisely as possible you should strive to make the numbers close, t.i. eliminate the difference between small and big ones. That's why you want to add up the two smallest numbers, thus increasing the minimal value of the list, decreasing the difference between the minimum and maximum in the list and reducing the problem size by 1.
Unfortunately I have no idea about how this can be vectorized, considering that you're using OpenCL. But I am almost sure that it can be. You might take a look at the book on vector algorithms, it is surprising how powerful they actually are: Vector Models for Data-Parallel Computing

Calculate the cosine of a sequence

I have to calculate the following:
float2 y = CONSTANT;
for (int i = 0; i < totalN; i++)
h[i] = cos(y*i);
totalN is a large number, so I would like to make this in a more efficient way. Is there any way to improve this? I suspect there is, because, after all, we know what's the result of cos(n), for n=1..N, so maybe there's some theorem that allows me to compute this in a faster way. I would really appreciate any hint.
Thanks in advance,
Federico

Using one of the most beautiful formulas of mathematics, Euler's formula
exp(i*x) = cos(x) + i*sin(x),
substituting x := n * phi:
cos(n*phi) = Re( exp(i*n*phi) )
sin(n*phi) = Im( exp(i*n*phi) )
exp(i*n*phi) = exp(i*phi) ^ n
Power ^n is n repeated multiplications.
Therefore you can calculate cos(n*phi) and simultaneously sin(n*phi) by repeated complex multiplication by exp(i*phi) starting with (1+i*0).
Code examples:
Python:
from math import *
DEG2RAD = pi/180.0 # conversion factor degrees --> radians
phi = 10*DEG2RAD # constant e.g. 10 degrees
c = cos(phi)+1j*sin(phi) # = exp(1j*phi)
h=1+0j
for i in range(1,10):
h = h*c
print "%d %8.3f"%(i,h.real)
or C:
#include <stdio.h>
#include <math.h>
// numer of values to calculate:
#define N 10
// conversion factor degrees --> radians:
#define DEG2RAD (3.14159265/180.0)
// e.g. constant is 10 degrees:
#define PHI (10*DEG2RAD)
typedef struct
{
double re,im;
} complex_t;
int main(int argc, char **argv)
{
complex_t c;
complex_t h[N];
int index;
c.re=cos(PHI);
c.im=sin(PHI);
h[0].re=1.0;
h[0].im=0.0;
for(index=1; index<N; index++)
{
// complex multiplication h[index] = h[index-1] * c;
h[index].re=h[index-1].re*c.re - h[index-1].im*c.im;
h[index].im=h[index-1].re*c.im + h[index-1].im*c.re;
printf("%d: %8.3f\n",index,h[index].re);
}
}

I'm not sure what kind of accuracy vs. performance compromises you're willing to make, but there are extensive discussions of various sinusoid approximation techniques at these links:
Fun with Sinusoids - http://www.audiomulch.com/~rossb/code/sinusoids/
Fast and accurate sine/cosine - http://www.devmaster.net/forums/showthread.php?t=5784
Edit (I think this is the "Don Cross" link that's broken on the "Fun with Sinusoids" page):
Optimizing Trig Calculations - http://groovit.disjunkt.com/analog/time-domain/fasttrig.html

Maybe the simplest formula is
cos(n+y) = 2cos(n)cos(y) - cos(n-y).
If you precompute the constant 2*cos(y) then each value cos(n+y) can be computed from the previous 2 values with one single multiplication and one subtraction.
I.e., in pseudocode
h[0] = 1.0
h[1] = cos(y)
m = 2*h[1]
for (int i = 2; i < totalN; ++i)
h[i] = m*h[i-1] - h[i-2]

Here's a method, but it uses a little bit of memory for the sin. It uses the trig identities:
cos(a + b) = cos(a)cos(b)-sin(a)sin(b)
sin(a + b) = sin(a)cos(b)+cos(a)sin(b)
Then here's the code:
h[0] = 1.0;
double g1 = sin(y);
double glast = g1;
h[1] = cos(y);
for (int i = 2; i < totalN; i++){
h[i] = h[i-1]*h[1]-glast*g1;
glast = glast*h[1]+h[i-1]*g1;
}
If I didn't make any errors then that should do it. Of course there could be round-off problems so be aware of that. I implemented this in Python and it is quite accurate.

There are some good answers here but they are all recursive. Recursive calculation will not work for cosine function when using floating point arithmetic; you will invariably get rounding errors which quickly compound.
Consider calculation y = 45 degrees, totalN 10 000. You won't end up with 1 as the final result.

To address Kirk's concerns: all of the solutions based on the recurrence for cos and sin boil down to computing
x(k) = R x(k - 1),
where R is the matrix that rotates by y and x(0) is the unit vector (1, 0). If the true result for k - 1 is x'(k - 1) and the true result for k is x'(k), then the error goes from e(k - 1) = x(k - 1) - x'(k - 1) to e(k) = R x(k - 1) - R x'(k - 1) = R e(k - 1) by linearity. Since R is what's called an orthogonal matrix, R e(k - 1) has the same norm as e(k - 1), and the error grows very slowly. (The reason it grows at all is due to round-off; the computer representation of R is in general almost, but not quite orthogonal, so it will be necessary to restart the recurrence using the trig operations from time to time depending on the accuracy required. This is still much, much faster than using the trig ops to compute each value.)

You can do this using complex numbers.
if you define x = sin(y) + i cos(y), cos(y*i) will be the real part of x^i.
You can compute for all i iteratively. Complex multiply is 2 multiplies plus two adds.

Knowing cos(n) doesn't help -- your math library already does these kind of trivial things for you.
Knowing that cos((i+1)y)=cos(iy+y)=cos(iy)cos(y)-sin(iy)sin(y) can help, if you precompute cos(y) and sin(y), and keep track of both cos(iy) and sin(i*y) along the way. It may result in some loss of precision, though - you'll have to check.

How accurate do you need the resulting cos(x) to be? If you can live with some, you could create a lookup table, sampling the unit circle at 2*PI/N intervals and then interpolate between two adjacent points. N would be chosen to achieve some desired level of accuracy.
What I don't know is whether an interpolation is actually less costly than computing a cosine. Since its usually done in microcode in modern CPUs, it may not be.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

euclidean distance C# - for-loop

Related

Algorithm for downsampling array of intervals

Combinations of integers in OpenCL

finding all first consecutive prime factors and find max of that by Mathematica

What's a good way to add a large number of small floats together?

Calculate the cosine of a sequence

Categories

Resources