Algorithms - Calculating Sums, Variance, and Efficiency

I am in a class on computer algorithms; I'm having trouble and the teacher is trying to help me. I am doing some problems in the book, but I just can't wrap my head around them.
I know how to read the notation that Wolfram Alpha uses for these.
So for instance I have this: "Compute the following sum: Sum_(i=1)^n 1/(i(i+1))".
I am completely bewildered as to what I have to do to calculate such a sum. It has been explained to me a few times already, so the most detailed and coherent explanation would be greatly appreciated.
Another problem asks me to find and compare the number of divisions, multiplications, additions and subtractions required for computing the variance with the following formulas:
Formula 1: (Sum_(i=1)^n (x_i - y)^2) / (n-1), where y = (Sum_(i=1)^n x_i) / n
Formula 2: ((Sum_(i=1)^n x_i^2) - (Sum_(i=1)^n x_i)^2 / n) / (n-1)
I am still lost entirely as to what I need to do.
I appreciate any and all help, even links to good material to read. I've read my textbook and quite a bit online.

For Sum_(i=1)^n 1/(i(i+1)), the underscore _ is what goes under the ∑ symbol, and ^ is what goes above it. In other words,
n
∑ 1/(i(i+1))
i=1
I'm assuming you mean i(i+1) to be in the denominator. This sum is equal to
1/(1(1+1)) + 1/(2(2+1)) + 1/(3(3+1)) + ... + 1/(n(n+1))
Make sure you can see the pattern here. Notice that this only depends on n, and not anything else (like i).
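If you also need a closed form (the exercise may only ask for the numeric value, but this is the usual trick), the partial-fraction identity makes the sum telescope:
1/(i(i+1)) = 1/i - 1/(i+1)
so
Sum_(i=1)^n 1/(i(i+1)) = (1 - 1/2) + (1/2 - 1/3) + ... + (1/n - 1/(n+1)) = 1 - 1/(n+1) = n/(n+1)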
In C/C++, this would look like
double f(int n)
{
    double sum = 0;
    for (int i = 1; i <= n; ++i)
        sum += 1.0 / (i * (i + 1));
    return sum;
}
Hope that helps.

Related

DTW algorithm: simple implementation - Verification

I have tried to make a simple implementation of the DTW algorithm in C, without using any substantial optimization techniques. I am trying to use this implementation for some simple sketch recognition, which is to say finding the k closest neighbors of a given sketch within a set. I have gotten some results that seem weird to me and I would like to know if this is because of my DTW implementation. I need someone to verify my algorithm.
As I said, I am trying to find the k closest neighbors, so the only 'optimization' I have implemented to make calculations faster is this: if the minimum cost over a given row of the cost matrix becomes greater than the maximum distance among the k sketches currently considered the closest neighbors, I stop calculating and return +inf.
Here is the corresponding algorithm:
(returnValue totalCost) dtw(sketch1, sketch2, curMaxDist){
    distMatrix = 'empty matrix of size (sketch1.size) x (sketch2.size)'
    totalCostMatrix = 'empty matrix of size (sketch1.size) x (sketch2.size)'
    for(i = 0 to sketch1.size - 1){
        for(j = 0 to sketch2.size - 1){
            distMatrix[i][j] = euclidianDistance(sketch1.point[i], sketch2.point[j])
            totalCostMatrix[i][j] = +inf
        }
    }
    // I am forcing the first points of each sketch to correspond to one another
    // and continue applying the algorithm from the next points.
    for(i = 1 to sketch1.size - 1){
        curMinDist = +inf
        for(j = 1 to sketch2.size - 1){
            totalCostMatrix[i][j] = min(totalCostMatrix[i-1][j-1],
                                        totalCostMatrix[i-1][j],
                                        totalCostMatrix[i][j-1]) + distMatrix[i][j]
            if(totalCostMatrix[i][j] < curMinDist)
                curMinDist = totalCostMatrix[i][j]
        }
        if(curMinDist > curMaxDist)
            return +inf
    }
    return totalCostMatrix[sketch1.size - 1][sketch2.size - 1]
}
I am sure there is nothing wrong with the implementation as far as the syntax, the C language, etc. are concerned, since I have checked that and I always get the expected results. I was just wondering if there is something wrong with the reasoning behind the algorithm. I am asking because it is a really well-known algorithm and a really simple implementation, so maybe it is easy for someone to spot an error there.

How is the Random module tested in OCaml?

OCaml has a Random module, and I am wondering how it tests itself for randomness. However, I don't have a clue what exactly they are doing. I understand it runs a chi-square test along with two more dependency tests. Here is the code for the testing part:
chi-square test
(* Return the sum of the squares of v[i0,i1[ *)
let rec sumsq v i0 i1 =
  if i0 >= i1 then 0.0
  else if i1 = i0 + 1 then Pervasives.float v.(i0) *. Pervasives.float v.(i0)
  else sumsq v i0 ((i0+i1)/2) +. sumsq v ((i0+i1)/2) i1
;;

let chisquare g n r =
  if n <= 10 * r then invalid_arg "chisquare";
  let f = Array.make r 0 in
  for i = 1 to n do
    let t = g r in
    f.(t) <- f.(t) + 1
  done;
  let t = sumsq f 0 r
  and r = Pervasives.float r
  and n = Pervasives.float n in
  let sr = 2.0 *. sqrt r in
  (r -. sr, (r *. t /. n) -. n, r +. sr)
;;
Q1: Why do they write the sum of squares like that?
It seems it is just summing up all the squares. Why not write it like this:
let rec sumsq v i0 i1 =
  if i0 >= i1 then 0.0
  else Pervasives.float v.(i0) *. Pervasives.float v.(i0) +. (sumsq v (i0+1) i1)
Q2: Why do they seem to use a different formula for chisquare?
From the chi-squared test wiki, the formula is
chi^2 = Sum_i (O_i - E_i)^2 / E_i
But it seems they are using a different formula; what's going on behind the scenes?
The other two dependency tests
(* This is to test for linear dependencies between successive random numbers.
*)
let st = ref 0;;
let init_diff r = st := int r;;
let diff r =
  let x1 = !st
  and x2 = int r
  in
  st := x2;
  if x1 >= x2 then
    x1 - x2
  else
    r + x1 - x2
;;

let st1 = ref 0
and st2 = ref 0
;;

(* This is to test for quadratic dependencies between successive random
   numbers.
*)
let init_diff2 r = st1 := int r; st2 := int r;;
let diff2 r =
  let x1 = !st1
  and x2 = !st2
  and x3 = int r
  in
  st1 := x2;
  st2 := x3;
  (x3 - x2 - x2 + x1 + 2*r) mod r
;;
Q3: I don't really know these two tests; can someone enlighten me?
Q1:
It's a question of memory usage. You will notice that for large arrays, your implementation of sumsq will fail with "Stack overflow during evaluation" (on my laptop, it fails for r = 200000). This is because before adding Pervasives.float v.(i0) *. Pervasives.float v.(i0) to (sumsq v (i0+1) i1), you have to compute the latter. So it's not until you have computed the result of the last call of sumsq that you can start "going up the stack" and adding everything. Clearly, sumsq is going to be called r times in your case, so you will have to keep track of r calls.
By contrast, with their approach they only have to keep track of log(r) calls, because once sumsq has been computed for half the array, you only need to keep the result of the corresponding call (you can forget about all the other calls that you had to do to compute it).
However, there are other ways of achieving this result and I'm not sure why they chose this one (maybe somebody will be able to tell?). If you want to know more about the problems linked to recursion and memory, you should probably check the Wikipedia article on tail recursion. If you want to know more about the technique that they used here, you should check the Wikipedia article on divide-and-conquer algorithms -- be careful though, because here we are talking about memory and the Wikipedia article will probably talk a lot about temporal complexity (speed).
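To make the depth difference concrete, here is the same contrast sketched in C rather than OCaml (my own illustration, not part of the library code): the head-recursive version keeps all n frames alive at once, while the halving version only needs about log2(n) frames.
#include <stdio.h>

/* Head-recursive sum of squares: the addition happens only after the
   recursive call returns, so all n frames are alive at once (depth n). */
static double sumsq_linear(const int *v, int i0, int i1)
{
    if (i0 >= i1) return 0.0;
    return (double)v[i0] * v[i0] + sumsq_linear(v, i0 + 1, i1);
}

/* Divide-and-conquer sum of squares: each call splits the range in half,
   so the recursion depth is only about log2(i1 - i0). */
static double sumsq_halving(const int *v, int i0, int i1)
{
    if (i0 >= i1) return 0.0;
    if (i1 == i0 + 1) return (double)v[i0] * v[i0];
    int mid = i0 + (i1 - i0) / 2;
    return sumsq_halving(v, i0, mid) + sumsq_halving(v, mid, i1);
}

int main(void)
{
    int v[] = {1, 2, 3, 4};
    printf("%f %f\n", sumsq_linear(v, 0, 4), sumsq_halving(v, 0, 4)); /* 30 30 */
    return 0;
}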
Q2:
You should look more closely at both expressions. Here, all the E_i's are equal to n/r. If you replace this in the expression you gave, you will find the same expression that they use: (r *. t /. n) -. n. I didn't check the values of the bounds, but since you have a chi-squared distribution with r minus one or two degrees of freedom, and r is quite large, it's not surprising to see them use this kind of confidence interval. The Wikipedia article you mentioned should help you figure out fairly easily what confidence interval they use exactly.
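Spelling out that substitution, with O_i the observed count f.(i), t = Sum_i O_i^2 as returned by sumsq, and Sum_i O_i = n:
Sum_i (O_i - n/r)^2 / (n/r)
  = (r/n) * Sum_i O_i^2 - 2 * Sum_i O_i + n
  = (r*t/n) - 2n + n
  = (r*t/n) - n
which is exactly the (r *. t /. n) -. n term computed in the code.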
Good luck!
Edit: Oops, I forgot about Q3. I don't know these tests either, but I'm sure you should be able to find more about them by googling something like "linear dependency between consecutive numbers" or something. =)
Edit 2: In reply to Jackson Tale's June 29 question about the confidence interval:
They should indeed test it against the Chi-squared distribution -- or, rather, use the Chi-squared distribution to find a confidence interval. However, because of the central limit theorem, the Chi-squared distribution with k degrees of freedom converges to a normal law with mean k and variance 2k. A classical result is that the 95% confidence interval for the normal law is approximately [μ - 1.96 σ, μ + 1.96 σ], where μ is the mean and σ the standard deviation -- so that's roughly the mean ± twice the standard deviation. Here, the number of degrees of freedom is (I think) r - 1 ~ r (because r is large) so that's why I said I wasn't surprised by a confidence interval of the form [r - 2 sqrt(r), r + 2 sqrt(r)]. Nevertheless, now that I think about it I can't see why they don't use ± 2 sqrt(2 r)... But I might have missed something. And anyway, even if I was correct, since sqrt(2) > 1, they get a more stringent confidence interval, so I guess that's not really a problem. But they should document what they're doing a bit more... I mean, the tests that they're using are probably pretty standard so most likely most people reading their code will know what they're doing, but still...
Also, you should note that, as is often the case, this kind of test is not conclusive: generally, you want to show that something has some kind of effect. So you formulate two hypotheses: the null hypothesis, "there is no effect", and the alternative hypothesis, "there is an effect". Then you show that, given your data, the probability that the null hypothesis holds is very low. So you conclude that the alternative hypothesis is (most likely) true -- i.e. that there is some kind of effect. This is conclusive. Here, what we would like to show is that the random number generator is good. So we don't want to show that the numbers it produces differ from some law, but that they conform to it. The only way to do that is to perform as many tests as possible showing that the numbers produced have the same properties as randomly generated ones. But the only conclusion we can draw is "we were not able to find a difference between the actual data and what we would have observed had they really been randomly generated". This is not a lack of rigor from the OCaml developers, though: people always do this (e.g. a lot of tests require normality, so before performing those tests you try to find a test which would show that your variable is not normally distributed, and when you can't find any, you say "oh well, the normality of this variable is probably sufficient for my subsequent tests to hold") -- simply because there is no other way to do it...
Anyway, I'm no statistician and the considerations above are simply my two cents, so you should be careful. For instance, I'm sure there is a better reason why they're using this particular confidence interval. I also think you should be able to figure it out if you write everything down carefully to make sure about what they're doing exactly.

Finding all initial consecutive prime factors and their maximum in Mathematica

Let
2|n, 3|n, ..., p_i|n, p_j|n, ..., p_k|n,   with   p_i < p_j < ... < p_k,
where all primes up to p_i divide n and j > i+1 (so p_(i+1) does not divide n).
I want to write code in Mathematica to find p_i and determine {2, 3, 5, ..., p_i}.
Thanks.
B = {};
n = 2^6 * 3^8 * 5^3 * 7^2 * 11 * 23 * 29;
For[i = 1, i <= k, i++,
  If[Mod[n, Prime[i]] == 0, AppendTo[B, Prime[i]];
    If[Mod[n, Prime[i + 1]] > 0, Break[]]]];
mep1 = Max[B];
B
mep1
The result is
{2, 3, 5, 7, 11}
11
I would like to rewrite the code so that instead of B I get B[n], since I need to plot mep1[n] for given n.
If I understand your question and code correctly you want a list of prime factors of the integer n but only the initial part of that list which matches the initial part of the list of all prime numbers.
I'll first observe that what you've posted looks much more like C or one of its relatives than like Mathematica. In fact you don't seem to have used any of the power of Mathematica's in-built functions at all. If you want to really use Mathematica you need to start familiarising yourself with these functions; if that doesn't appeal stick to C and its ilk, it's a fairly useful programming language.
The first step I'd take is to get the prime factors of n like this:
listOfFactors = Transpose[FactorInteger[n]][[1]]
Look at the documentation for the details of what FactorInteger returns; here I'm using transposition and part to get only the list of prime factors and to drop their coefficients. You may not notice the use of the Part function, the doubled square brackets are the usual notation. Note also that I don't have Mathematica on this machine so my syntax may be a bit awry.
Next, you want only those elements of listOfFactors which match the corresponding elements in the list of all prime numbers. Do this in two steps. First, get the integers from 1 to k at which the two lists match:
matches = TakeWhile[Range[Length[listOfFactors]],(listOfFactors[[#]]==Prime[#])&]
and then
listOfFactors[[matches]]
I'll leave it to you to:
assemble these fragments into the function you want;
correct the syntactical errors I have made; and
figure out exactly what is going on in each (sub-)expression.
I make no warranty that this approach is the best approach in any general sense, but it makes much better use of Mathematica's intrinsic functionality than your own first try and will, I hope, point you towards better use of the system in future.

Algorithm to check if a number is a perfect number

I am looking for an algorithm to find if a given number is a perfect number.
The most simple that comes to my mind is :
Find all the factors of the number
Get the prime factors [except the number itself, if it is prime] and add them up to check if it is a perfect number.
Is there a better way to do this?
On searching, some of Euclid's work came up, but I didn't find any good algorithm. Also, this golfscript wasn't helpful: https://stackoverflow.com/questions/3472534/checking-whether-a-number-is-mathematically-a-perfect-number .
The numbers can be cached in real-world usage [not that I know where perfect numbers are actually used :)].
However, since this is being asked in interviews, I am assuming there should be a "derivable" way of optimizing it.
Thanks!
If the input is even, see if it is of the form 2^(p-1)*(2^p-1), with p and 2^p-1 prime.
If the input is odd, return "false". :-)
See the Wikipedia page for details.
(Actually, since there are only 47 perfect numbers with fewer than 25 million digits, you might start with a simple table of those. Ask the interviewer if you can assume you are using 64-bit numbers, for instance...)
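If the interviewer wants actual code, a minimal C sketch of that even-number test might look like the following (the function names are mine; it assumes the input fits in 64 bits and uses naive trial division for the primality check, which is fine at this scale):
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Naive trial-division primality test; adequate for the small Mersenne
   candidates that fit in 64 bits. */
static bool is_prime(uint64_t m)
{
    if (m < 2) return false;
    for (uint64_t d = 2; d * d <= m; d++)
        if (m % d == 0) return false;
    return true;
}

/* Euclid-Euler: an even n is perfect iff n = 2^(p-1) * (2^p - 1)
   with 2^p - 1 prime. */
static bool is_even_perfect(uint64_t n)
{
    if (n == 0 || n % 2 != 0) return false;
    unsigned p = 1;
    while (n % 2 == 0) {                 /* strip the 2^(p-1) factor */
        n /= 2;
        p++;
    }
    if (p >= 64) return false;           /* shift below would overflow */
    uint64_t mersenne = (UINT64_C(1) << p) - 1;
    return n == mersenne && is_prime(mersenne);  /* odd part must be a Mersenne prime */
}

int main(void)
{
    printf("%d %d %d\n", is_even_perfect(28), is_even_perfect(496),
           is_even_perfect(100));        /* expect 1 1 0 */
    return 0;
}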
Edit: Dang, I failed the interview! :-(
In my over-zealous attempt at finding tricks or heuristics to improve upon the "factorize + enumerate divisors + sum them" approach, I failed to note that being 1 modulo 9 is merely a necessary, and certainly not a sufficient, condition for a number (other than 6) to be perfect...
Duh... with on average 1 in 9 even numbers satisfying this condition, my algorithm would surely find a few too many perfect numbers ;-).
To redeem myself, I'll persist and maintain the suggestion of using the digital root, but only as a filter, to avoid the more expensive factorization in most cases.
[Original attempt: hall of shame]
If the number is even,
    compute its digital root.
    if the digital root is 1, the number is perfect, otherwise it isn't.
If the number is odd...
    there are no shortcuts, other than...
    "Not perfect" if the number is smaller than 10^300
    For bigger values, one would then need to run the algorithm described in
    the question, possibly with a few twists typically driven by heuristics
    that prove that the sum of divisors will be lacking when the number
    doesn't have some of the low prime factors.
My reason for suggesting the digital root trick for even numbers is that it can be computed without the help of an arbitrary-length arithmetic library (like GMP). It is also much less computationally expensive than the decomposition into prime factors and/or checking the 2^(p-1) * ((2^p)-1) form. Therefore, if the interviewer were satisfied with a "Not perfect" response for odd numbers, the solution would be both very efficient and codable in most computer languages.
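A small C sketch of that filter (my own code, assuming the input fits in 64 bits; for arbitrary-precision inputs you would sum decimal digits instead):
#include <stdint.h>
#include <stdio.h>

/* Digital root via casting out nines: dr(n) = 1 + (n - 1) % 9 for n > 0. */
static unsigned digital_root(uint64_t n)
{
    return n == 0 ? 0 : (unsigned)(1 + (n - 1) % 9);
}

/* Cheap filter: every even perfect number other than 6 has digital root 1,
   so an even number failing this test cannot be perfect.  Passing the test
   proves nothing; the full divisor-sum check is still required. */
static int might_be_even_perfect(uint64_t n)
{
    return n == 6 || (n % 2 == 0 && digital_root(n) == 1);
}

int main(void)
{
    printf("%d %d %d\n",
           might_be_even_perfect(28),   /* 1: digital root of 28 is 1 */
           might_be_even_perfect(496),  /* 1 */
           might_be_even_perfect(12));  /* 0: digital root of 12 is 3 */
    return 0;
}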
[Second and third attempt...]
If the number is even,
    if it is 6
        The number is PERFECT
    otherwise compute its digital root.
        if the digital root is _not_ 1
            The number is NOT PERFECT
        else
            Compute the prime factors
            Enumerate the divisors, sum them
            if the sum of these divisors equals 2 * the number
                it is PERFECT
            else
                it is NOT PERFECT
If the number is odd...
    same as previously
On this relatively odd interview question...
I second andrewdski's comment to another response in this post, that this particular question is rather odd in the context of an interview for a general purpose developer. As with many interview questions, it can be that the interviewer isn't seeking a particular solution, but rather is providing an opportunity for the candidate to demonstrate his/her ability to articulate the general pros and cons of various approaches. Also, if the candidate is offered an opportunity to look-up generic resources such as MathWorld or Wikipedia prior to responding, this may also be a good test of his/her ability to quickly make sense of the info offered there.
Here's a quick algorithm just for fun, in PHP, using just a simple for loop. You can easily port it to other languages:
function isPerfectNumber($num) {
    $out = false;
    if ($num % 2 == 0) {
        $divisors = array(1);
        for ($i = 2; $i < $num; $i++) {
            if ($num % $i == 0)
                $divisors[] = $i;
        }
        if (array_sum($divisors) == $num)
            $out = true;
    }
    return $out ? 'It\'s perfect!' : 'Not a perfect number.';
}
Hope this helps, not sure if this is what you're looking for.
#include <stdio.h>
#include <stdlib.h>

int sumOfFactors(int);

int main() {
    int x, start, end;

    printf("Enter start of the range:\n");
    scanf("%d", &start);
    printf("Enter end of the range:\n");
    scanf("%d", &end);

    for (x = start; x <= end; x++) {
        /* x > 1 guard: sumOfFactors(1) returns 1, which would falsely match */
        if (x > 1 && x == sumOfFactors(x)) {
            printf("The number %d is a perfect number\n", x);
        }
    }
    return 0;
}

/* Sum of the proper divisors of x (1 is included, x itself is not). */
int sumOfFactors(int x) {
    int sum = 1, j;
    for (j = 2; j <= x / 2; j++) {
        if (x % j == 0)
            sum += j;
    }
    return sum;
}

Does a range of integers contain at least one perfect square?

Given two integers a and b, is there an efficient way to test whether there is another integer n such that a ≤ n^2 < b?
I do not need to know n, only whether at least one such n exists or not, so I hope to avoid computing square roots of any numbers in the interval.
Although testing whether an individual integer is a perfect square is faster than computing the square root, the range may be large and I would also prefer to avoid performing this test for every number within the range.
Examples:
intervalContainsSquare(2, 3) => false
intervalContainsSquare(5, 9) => false (note: 9 is outside this interval)
intervalContainsSquare(9, 9) => false (this interval is empty)
intervalContainsSquare(4, 9) => true (4 is inside this interval)
intervalContainsSquare(5, 16) => true (9 is inside this interval)
intervalContainsSquare(1, 10) => true (1, 4 and 9 are all inside this interval)
Computing whether or not a number is a square isn't really faster than computing its square root in hard cases, as far as I know. What is true is that you can do a precomputation to know that it isn't a square, which might save you time on average.
Likewise for this problem, you can do a precomputation to determine that sqrt(b)-sqrt(a) >= 1, which then means that a and b are far enough apart that there must be a square between them. With some algebra, this inequality is equivalent to the condition that (b-a-1)^2 >= 4*a, or if you want it in a more symmetric form, that (a-b)^2+1 >= 2*(a+b). So this precomputation can be done with no square roots, only with one integer product and some additions and subtractions.
If a and b are almost exactly the same, then you can still use the trick of looking at low order binary digits as a precomputation to know that there isn't a square between them. But they have to be so close together that this precomputation might not be worth it.
If these precomputations are inconclusive, then I can't think of anything other than everyone else's solution, a <= ceil(sqrt(a))^2 < b.
Since there was a question of doing the algebra right:
sqrt(b)-sqrt(a) >= 1
sqrt(b) >= 1+sqrt(a)
b >= 1+2*sqrt(a)+a
b-a-1 >= 2*sqrt(a)
(b-a-1)^2 >= 4*a
Also: Generally when a is a large number, you would compute sqrt(a) with Newton's method, or with a lookup table followed by a few Newton's method steps. It is faster in principle to compute ceil(sqrt(a)) than sqrt(a), because the floating point arithmetic can be simplified to integer arithmetic, and because you don't need as many Newton's method steps to nail down high precision that you're just going to throw away. But in practice, a numerical library function can be much faster if it uses square roots implemented in microcode. If for whatever reason you don't have that microcode to help you, then it might be worth it to hand-code ceil(sqrt(a)). Maybe the most interesting case would be if a and b are unbounded integers (like, a thousand digits). But for ordinary-sized integers on an ordinary non-obsolete computer, you can't beat the FPU.
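For concreteness, the integer-only prefilter described above could be sketched in C roughly like this (my own code; it assumes the inputs are small enough, say below 2^31, that the products fit in 64 bits):
#include <stdbool.h>
#include <stdint.h>

/* Cheap prefilter: if this returns true, sqrt(b) - sqrt(a) >= 1, so the
   interval [a, b) certainly contains a perfect square.  A false result is
   inconclusive; the interval may still contain one. */
static bool definitely_contains_square(int64_t a, int64_t b)
{
    if (a < 0) a = 0;              /* squares are never negative */
    if (b <= a) return false;      /* empty interval */
    int64_t d = b - a - 1;         /* nonnegative because b > a */
    return d * d >= 4 * a;         /* (b - a - 1)^2 >= 4*a */
}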
Get the square root of the lower number. If this is an integer then you are done.
Otherwise round up and square the number. If this is less than b then it is true.
You only need to compute one square root this way.
To avoid a problem when a is equal to b, you should check that case first, as it is always false.
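A rough C version of that recipe (the names are my own; it assumes 64-bit inputs and nudges the floating-point square root to guard against rounding error):
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Is there an integer n with a <= n*n < b?  One square root: take
   r = ceil(sqrt(a)), correct it for floating-point error, and check
   whether r*r is still below b. */
static bool interval_contains_square(int64_t a, int64_t b)
{
    if (a >= b || b <= 0) return false;  /* empty interval, or nothing below b */
    if (a < 0) a = 0;                    /* 0 = 0^2 is the smallest square */
    int64_t r = (int64_t)ceil(sqrt((double)a));
    while (r > 0 && (r - 1) * (r - 1) >= a) r--;  /* rounded too high */
    while (r * r < a) r++;                        /* rounded too low */
    return r * r < b;
}

int main(void)
{
    printf("%d %d %d %d\n",
           interval_contains_square(2, 3),    /* 0 */
           interval_contains_square(5, 9),    /* 0 */
           interval_contains_square(4, 9),    /* 1 */
           interval_contains_square(5, 16));  /* 1 */
    return 0;
}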
If you will accept calculating two square roots, then because of monotonicity you have this inequality, which is equivalent to your starting one:
sqrt(a) <= n < sqrt(b)
Thus such an n exists exactly when floor(sqrt(b - 1)) >= ceil(sqrt(a)); in that case floor(sqrt(b - 1)) itself is such an n.
1. Get the square root of the lower number and round it up.
2. Get the square root of the higher number and round it down.
3. If the result of step 1 is less than or equal to the result of step 2, there will be a perfect square.
Find the integral part of sqrt(a) and sqrt(b), say sa and sb.
If sa^2 = a, then output yes.
If sb^2 = b and sa = sb - 1, then output no.
If sa < sb, output yes.
Else output no.
You can optimize the above to get rid of the computation of sqrt(b) (similar to JDunkerly's answer).
Or did you want to avoid computing square roots of a and b too?
You can avoid computing square roots completely by using a method similar to binary search.
You start with a guess n = 1 and compute n^2.
If a <= n^2 < b, you can stop.
If n^2 < a < b, you double your guess n.
If a < b <= n^2, you set n close to the average of the current and previous guesses.
This will take O(log b) time.
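A sketch of that idea in C (my own variable names; it finds the smallest n with n^2 >= a by doubling and then bisecting, never taking a square root; it assumes a < 2^62 so the intermediate squares fit in 64 bits):
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Is there an integer n with a <= n*n < b?  No square roots: grow n by
   doubling until n*n >= a, then bisect for the smallest such n. */
static bool contains_square_no_sqrt(int64_t a, int64_t b)
{
    if (a >= b || b <= 0) return false;
    if (a <= 0) return true;                 /* 0 = 0^2 lies in [a, b) */
    int64_t hi = 1;
    while (hi * hi < a) hi *= 2;             /* exponential phase: hi*hi >= a */
    int64_t lo = hi / 2;                     /* lo*lo < a (or lo == 0) */
    while (lo + 1 < hi) {                    /* bisect: smallest n with n*n >= a */
        int64_t mid = lo + (hi - lo) / 2;
        if (mid * mid < a) lo = mid; else hi = mid;
    }
    return hi * hi < b;                      /* is that smallest square below b? */
}

int main(void)
{
    printf("%d %d %d\n",
           contains_square_no_sqrt(5, 9),    /* 0 */
           contains_square_no_sqrt(4, 9),    /* 1 */
           contains_square_no_sqrt(1, 10));  /* 1 */
    return 0;
}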
In addition to JDunkerley's nice solution (+1), a possible improvement, which would need to be tested, is to compute the integer square roots directly with integer arithmetic instead of going through floating point.
Why are you hoping to avoid square roots entirely? Even before you get to the most efficient way of solving this, you have seen methods that call for only 2 square roots. That's done in O(1) time, so it seems to me that any improvement you could hope to make would take more time to think about than it would EVER save you computing time. Am I wrong?
One way is to use Newton's method to find the integer square root for b. Then you can check if that number falls in the range. I doubt that it is faster than simply calling the square root function, but it is certainly more interesting:
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
    int a, b;
    double xk = 0, xk1;
    int root;
    int iter = 0;

    if (argc < 3) return 1;       /* expects a and b on the command line */
    a = atoi(argv[1]);
    b = atoi(argv[2]);

    xk1 = b;                      /* initial guess for sqrt(b) */
    while (fabs(xk1 - xk) >= .5) {
        xk = xk1;
        xk1 = (xk + b / xk) / 2.; /* Newton step for x^2 - b = 0 */
        printf("%d) xk = %f\n", ++iter, xk1);
    }
    root = (int)xk1;

    // If b is a perfect square, then this finds that root, so it also
    // needs to check if (root-1)^2 falls in the range.
    // And this does a lot more multiplications than it needs.
    if ((root * root >= a && root * root < b) ||
        ((root - 1) * (root - 1) >= a && (root - 1) * (root - 1) < b))
        printf("Contains perfect square\n");
    else
        printf("Does not contain perfect square\n");

    return 0;
}
