Correct implementation of the Johnson-Lindenstrauss lemma - algorithm

I am trying to implement the Johnson-Lindenstrauss lemma. I have searched for pseudocode here but could not find any.
I don't know whether I have implemented it correctly. I would like someone who understands the lemma to check my code and advise me on the correct MATLAB implementation.
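n = 2;          % number of points
d = 4;          % original dimension
k = 2;          % target dimension
G = rand(d,n);  % one point per column (d x n)
epsilon = sqrt(log(n)/k);
% Projection in dim k << d
% Defining P (k x d)
P = randn(k,d);
% Projecting down to k-dim (matrix product P*G, not element-wise P.*G)
proj = P*G;
u = proj(:,1);
v = proj(:,2);
% u = P * G(:,5);
% v = P * G(:,36);
% the middle value should lie between the other two
norm(G(:,1)-G(:,2))^2 * k * (1-epsilon)
norm(u - v)^2
norm(G(:,1)-G(:,2))^2 * k * (1+epsilon)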

For the first part, to find epsilon you need to solve a polynomial equation.
n = 2;
k = 2;
pol1 = [-1/3 1/2 0 -4*log2(n)/k];   % -e^3/3 + e^2/2 - 4*log2(n)/k
c = roots(pol1)
1.4654 + 1.4304i
1.4654 - 1.4304i
-1.4308 + 0.0000i
Then you need to remove the complex roots and keep the real one:
epsilon = c(imag(c)==0);
% if more than one root has an imaginary part equal to 0, select the smallest one.
Now you know that epsilon should be equal to or greater than the result.
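A Python/NumPy sketch of the same root-finding step. It assumes the condition being solved is epsilon^2/2 - epsilon^3/3 >= 4*log2(n)/k, i.e. the polynomial with a negative constant term (that sign is the one that reproduces the roots printed above). On (0, 1) the left-hand side increases from 0 to 1/6, so the smallest admissible epsilon is the unique real root in that interval, and there is none when 4*log2(n)/k >= 1/6 (the case n = 2, k = 2). Restricting to (0, 1), instead of taking the smallest real root, also avoids the negative root that this cubic always has. The function name `min_epsilon` is hypothetical.

```python
import math
import numpy as np

def min_epsilon(n, k):
    """Smallest eps in (0, 1) with eps^2/2 - eps^3/3 >= 4*log2(n)/k, or None if none exists."""
    rhs = 4 * math.log2(n) / k
    # coefficients of -eps^3/3 + eps^2/2 + 0*eps - rhs, highest degree first (like pol1)
    roots = np.roots([-1 / 3, 1 / 2, 0, -rhs])
    # keep only real roots strictly inside (0, 1); eps^2/2 - eps^3/3 is increasing there
    candidates = [z.real for z in roots if abs(z.imag) < 1e-12 and 0 < z.real < 1]
    return min(candidates) if candidates else None

print(min_epsilon(2, 2))    # None: 4*log2(2)/2 = 2 exceeds the maximum 1/6 on (0, 1)
print(min_epsilon(2, 200))  # a value slightly above 0.21
```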

For any set of m points in R^N, with k = 20*log(m)/epsilon^2 and epsilon < 1/2, projecting with the random matrix
1/sqrt(k).*randn(k,N)
succeeds with probability Pr[success] >= 1 - 2m^(5*epsilon-3).
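A NumPy sketch of the statement above: it builds the projection 1/sqrt(k)*randn(k,N), projects m random points, and collects the squared-distance ratios ||f(u)-f(v)||^2 / ||u-v||^2 over all pairs. It assumes `log` is the natural logarithm and that "success" means every ratio lies in (1-epsilon, 1+epsilon); the helper names, the point set and the seed are made up for illustration.

```python
import numpy as np

def jl_project(points, eps, rng):
    """Project the rows of `points` (m x N) to k = ceil(20*ln(m)/eps^2) dimensions."""
    m, n_dim = points.shape
    k = int(np.ceil(20 * np.log(m) / eps**2))
    # R = randn(k, N) / sqrt(k); each point (a row) is mapped to R @ point
    R = rng.standard_normal((k, n_dim)) / np.sqrt(k)
    return points @ R.T

def squared_ratios(points, projected):
    """||f(u)-f(v)||^2 / ||u-v||^2 for every pair of points."""
    ratios = []
    m = points.shape[0]
    for i in range(m):
        for j in range(i + 1, m):
            orig = np.sum((points[i] - points[j]) ** 2)
            proj = np.sum((projected[i] - projected[j]) ** 2)
            ratios.append(proj / orig)
    return np.array(ratios)

rng = np.random.default_rng(0)
X = rng.random((50, 2000))   # m = 50 points in R^2000
eps = 0.2                    # must be < 1/2 for the stated bound
Y = jl_project(X, eps, rng)
ratios = squared_ratios(X, Y)
print(Y.shape, ratios.min(), ratios.max())
```

For m = 50 and epsilon = 0.2 the stated bound gives Pr[success] >= 1 - 2*50^(-2), so a failing run is possible in principle but unlikely.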

An R package, RandPro, is available to perform random projection using the Johnson-Lindenstrauss lemma.

Related

Number of N-digit numbers that are divisible by given two numbers

One of my friends got this question in a Google coding contest. Here is the question:
Find the number of N-digit numbers that are divisible by both X and Y.
Since the answer can be very large, print the answer modulo 10^9 + 7.
Note: 0 is not considered a single-digit number.
Input: N, X, Y.
Constraints:
1 <= N <= 10000
1 <= X,Y <= 20
Eg-1 :
N = 2, X = 5, Y = 7
output : 2 (35 and 70 are the required numbers)
Eg-2 :
N = 1, X = 2, Y = 3
output : 1 (6 is the required number)
If the constraints on N were smaller, then it would be easy (ans = 10^N / LCM(X,Y) - 10^(N-1) / LCM(X,Y)).
But N is up to 10000, hence I am unable to solve it.
This question looks like it was intended to be more difficult, but I would do it pretty much the way you said:
ans = floor((10^N - 1)/LCM(X,Y)) - floor((10^(N-1) - 1)/LCM(X,Y))
The trick is to calculate the terms quickly.
Let M = LCM(X,Y), and say we have:
10^a = M*q_a + r_a, and
10^b = M*q_b + r_b
Then we can easily calculate:
10^(a+b) = M*(M*q_a*q_b + r_a*q_b + r_b*q_a + floor(r_a*r_b/M)) + (r_a*r_b % M)
With that formula, we can calculate the quotient and remainder of 10^N/M in just 2 log N steps using exponentiation by squaring: https://en.wikipedia.org/wiki/Exponentiation_by_squaring
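A Python sketch of the exponentiation-by-squaring idea above. It assumes the answer is wanted modulo 10^9 + 7, so quotients are kept modulo MOD while remainders stay exact (they are below M <= 380). All function names are hypothetical. The count uses floor((10^N - 1)/M) - floor((10^(N-1) - 1)/M), and (10^N - 1)/M has the same quotient as 10^N/M unless M divides 10^N exactly.

```python
import math

MOD = 10**9 + 7

def combine(qa, ra, qb, rb, m):
    """Given 10^a = m*qa + ra and 10^b = m*qb + rb, return (q, r) for 10^(a+b)."""
    q = (m * qa * qb + ra * qb + rb * qa + (ra * rb) // m) % MOD
    r = (ra * rb) % m
    return q, r

def pow10_divmod(n, m):
    """(floor(10^n / m) mod MOD, 10^n mod m) by exponentiation by squaring."""
    result = divmod(1, m)   # 10^0
    base = divmod(10, m)    # 10^1
    while n > 0:
        if n & 1:
            result = combine(*result, *base, m)
        base = combine(*base, *base, m)
        n >>= 1
    return result

def floor_pow10_minus_1(n, m):
    """floor((10^n - 1) / m) mod MOD."""
    q, r = pow10_divmod(n, m)
    # if m divides 10^n exactly, subtracting 1 lowers the quotient by one
    return q if r > 0 else (q - 1) % MOD

def count_n_digit_multiples(n, x, y):
    m = x * y // math.gcd(x, y)  # LCM(X, Y)
    return (floor_pow10_minus_1(n, m) - floor_pow10_minus_1(n - 1, m)) % MOD

print(count_n_digit_multiples(2, 5, 7))  # expected 2 (35 and 70)
print(count_n_digit_multiples(1, 2, 3))  # expected 1 (6)
```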
The following Python works for this question (math.lcm needs Python 3.9+):
import math
MOD = 1000000007
def sub(x, y):
    return (x - y + MOD) % MOD
def mul(x, y):
    return (x * y) % MOD
def power(x, y):
    # x^y mod MOD by exponentiation by squaring
    res = 1
    x %= MOD
    while y != 0:
        if y & 1:
            res = mul(res, x)
        y >>= 1
        x = mul(x, x)
    return res
def mod_inv(n):
    # MOD is prime, so n^(MOD-2) is the inverse of n (Fermat's little theorem)
    return power(n, MOD - 2)
x, y = [int(i) for i in input().split()]
m = math.lcm(x, y)
n = int(input())
total = 1
for i in range(n - 1):
    total = (total * 10) % m
b = (total - 1) % m   # (10^(n-1) - 1) mod m
total = (total * 10) % m
a = (total - 1) % m   # (10^n - 1) mod m; using 10^n mod m would wrongly count 10^n itself
l = power(10, n - 1)
r = power(10, n)
ans = sub(sub(r, l), sub(a, b))
ans = mul(ans, mod_inv(m))
print(ans)
The approach for this question is pretty straightforward.
Let m = lcm(x,y), and write
10^n - 1 = m*q1 + a
10^(n-1) - 1 = m*q2 + b
q1 and q2 count the multiples of m in [1, 10^n - 1] and [1, 10^(n-1) - 1], so from these two equations the answer is
(q1 - q2) % MOD.
So,
(q1 - q2) = ((10^n - 10^(n-1)) - (a - b)) / m
where a = (10^n - 1) % m and b = (10^(n-1) - 1) % m.
Using simple modular arithmetic rules we can calculate a and b in O(n) time.
For the subtraction and division in the formula we can use modular subtraction and modular division respectively.
Note: (u/v) % MOD = (u * mod_inverse(v, MOD)) % MOD, which holds when v divides u exactly and gcd(v, MOD) = 1.

How can I descale x by n/d, when x*n overflows?

My problem is limited to unsigned integers of 256 bits.
I have a value x, and I need to descale it by the ratio n / d, where n < d.
The simple solution is of course x * n / d, but the problem is that x * n may overflow.
I am looking for any arithmetic trick which may help in reaching a result as accurate as possible.
Dividing each of n and d by gcd(n, d) before calculating x * n / d does not guarantee success.
Is there any process (iterative or other) which I can use to solve this problem?
Note that I am willing to settle on an inaccurate solution, but I'd need to be able to estimate the error.
NOTE: all divisions below are integer divisions.
Let us suppose
x = ad + b
n = cd + e
Then find a,b,c,e as follows:
a = x/d
b = x%d
c = n/d
e = n%d
Then,
nx/d = acd + ae + bc + be/d
CALCULATING be/d
1. Represent e in binary form.
2. Find the quotients and remainders of b/d, 2b/d, 4b/d, 8b/d, ..., (2^255)b/d (one pair per bit of e).
3. be/d = (sum of the quotients for the set bits of e) + (sum of the corresponding remainders)/d
Example:
e = 101 in binary = 4+1
be/d = (b/d + 4b/d) + (b%d + 4b%d)/d
FINDING b/d, 2b/d, ..., (2^255)b/d
For each power of two i:
quotient(2*i*b/d) = 2*quotient(i*b/d) + (2*remainder(i*b/d))/d
remainder(2*i*b/d) = (2*remainder(i*b/d)) % d
This executes in O(number of bits).
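A Python sketch of this decomposition; the function names are hypothetical. Python integers never overflow, so it only demonstrates the arithmetic: floor(x*n/d) is assembled as a*c*d + a*e + b*c + floor(b*e/d), and floor(b*e/d) is built from the doubled quotient/remainder pairs with a running remainder that is renormalized after every addition so it stays below d. In a fixed 256-bit type the doubled remainder 2*r can still reach 2d, so a d close to 2^256 would need a carry check at that step.

```python
def floor_be_over_d(b, e, d):
    """floor(b*e/d) from the quotient/remainder pairs of (2^i * b) / d."""
    q, r = divmod(b, d)       # pair for 2^0 * b
    total_q, total_r = 0, 0   # invariant: total_q*d + total_r = b * (low bits of e processed)
    while e:
        if e & 1:
            total_q += q
            total_r += r
            if total_r >= d:  # total_r < 2d here, so one subtraction renormalizes
                total_r -= d
                total_q += 1
        # 2^(i+1)*b = (2q + (2r)//d)*d + (2r) % d
        q, r = 2 * q + (2 * r) // d, (2 * r) % d
        e >>= 1
    return total_q

def mul_div(x, n, d):
    """floor(x*n/d) via x = a*d + b and n = c*d + e, without forming x*n."""
    a, b = divmod(x, d)
    c, e = divmod(n, d)       # c == 0 when n < d, as in the question
    return a * c * d + a * e + b * c + floor_be_over_d(b, e, d)

print(mul_div(10, 3, 4))      # floor(30/4) = 7
```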

Speeding up program in matlab

I have 2 functions:
ccexpan - calculates the coefficients of the interpolating polynomial of a function f with N nodes in the basis of Chebyshev polynomials of the first kind.
csum - evaluates the expansion at the arguments t using the coefficients c from ccexpan (via the Clenshaw algorithm).
This is what I have written so far:
function c = ccexpan(f,N)
    z = zeros(1,N+1);
    s = zeros(1,N+1);
    for i = 1:(N+1)
        z(i) = pi*(i-1)/N;
    end
    t = f(cos(z));
    for k = 1:(N+1)
        s(k) = sum(t.*cos(z.*(k-1)));
        s(k) = s(k)-(f(1)+f(-1)*cos(pi*(k-1)))/2;
    end
    c = s.*2/N;
and:
function y = csum(t,c)
    M = length(t);
    N = length(c);
    y = t;
    b = zeros(1,N+2);
    for k = 1:M
        for i = N:-1:1
            b(i) = c(i)+2*t(k)*b(i+1)-b(i+2);
        end
        y(k) = (b(1)-b(3))/2;
    end
Unfortunately these programs are very slow, and also slightly inaccurate. Please give me some tips on how to speed them up and how to improve accuracy.
Where possible, try to get away from looping structures. At first blush, I would trade out your first for loop
for i = 1:(N+1)
    z(i) = pi*(i-1)/N;
end
and replace it with
i = 1:(N+1);
z = pi*(i-1)/N;
I did not check the rest of your code, but vectorizing loops like this one should speed it up. A second strategy is to combine loops when possible.
Martin,
Consider the following strategy.
% create hypothetical N and f
N = 3
f = @(x) 1./(1+15*x.*x)
% calculate z and t
i=1:(N+1)
z = pi*(i-1)/N
t = f(cos(z))
% make a column vector of k's
k = (1:(N+1))'
% do this: s(k) = sum(t.*cos(z.*(k-1)))
s1 = t.*cos(z.*(k-1)) % should be a matrix with one row for each row of k
% via implicit expansion
s2 = sum(s1,2) % row sum, i.e., one value for each row of k
% do this: s(k) = s(k)-(f(1)+f(-1)*cos(pi*(k-1)))/2
s3 = s2 - (f(1)+f(-1)*cos(pi*(k-1)))/2
% calculate c
c = s3 .* 2/N
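The answers vectorize ccexpan; csum can be vectorized in the same spirit by running the Clenshaw recurrence on all evaluation points at once, so only the loop over coefficients remains. Below is a NumPy sketch of that idea (written in Python to keep the new examples in one language; it mirrors the question's csum, including halving the first coefficient, but the translation is my own).

```python
import numpy as np

def csum(t, c):
    """Vectorized Clenshaw: sum_k c[k]*T_k(t) with the k = 0 term halved, for all t at once."""
    t = np.asarray(t, dtype=float)
    c = np.asarray(c, dtype=float)
    b1 = np.zeros_like(t)  # plays the role of b(i+1) in the MATLAB loop
    b2 = np.zeros_like(t)  # plays the role of b(i+2)
    # recurrence b(i) = c(i) + 2*t*b(i+1) - b(i+2), from the highest degree down to degree 1,
    # updating every evaluation point in one array operation instead of looping over t
    for ck in c[:0:-1]:
        b1, b2 = ck + 2 * t * b1 - b2, b1
    # (b(1) - b(3))/2 in the MATLAB code equals c(1)/2 + t*b(2) - b(3)
    return c[0] / 2 + t * b1 - b2

t = np.linspace(-1, 1, 5)
print(csum(t, [1.0, -0.5, 0.25, 2.0]))
```

The result can be cross-checked against numpy.polynomial.chebyshev.chebval after halving the first coefficient.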

Formula for calculating distance with decaying velocity

I have a moving graphic whose velocity decays geometrically every frame. I want to find the initial velocity that will make the graphic travel a desired distance in a given number of frames.
Using these variables:
v initial velocity
r rate
d distance
I can come up with d = v * (r^0 + r^1 + r^2 + ...)
So if I want to find the v to travel 200 pixels in 3 frames with a decay rate of 90%, I would adapt to:
d = 200
r = .9
v = d / (r^0 + r^1 + r^2)
That doesn't translate well to code, since I have to edit the expression if the number of frames changes. The only solution I can think of is this (in no specific language):
r = .9
numFrames = 3
d = 200
sum = 1
for (i = 1; i < numFrames; i++) {
sum = sum + power(r, i);
}
v = d / sum;
Is there a better way to do this without using a loop?
(I wouldn't be surprised if there is a mistake in there somewhere... today is just one of those days..)
What you have here is a geometric sequence. See the link:
http://www.mathsisfun.com/algebra/sequences-sums-geometric.html
To find the sum of a geometric sequence, you use this formula:
sum = a * ((1 - r^n) / (1 - r))
Since you are looking for a, the initial velocity, move the terms around:
a = sum * ((1-r) / (1 - r^n))
In Java:
int distanceInPixels = SOME_INTEGER;
double decayRate = SOME_DECIMAL;   // e.g. 0.9; the formula requires decayRate != 1
int numberOfFrames = SOME_INTEGER;
double initialVelocity; // this is what we need to find
initialVelocity = distanceInPixels * ((1 - decayRate) / (1 - Math.pow(decayRate, numberOfFrames)));
Using this formula you can get any one of the four variables if you know the values of the other three. Enjoy!
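A Python sketch of the closed form, checked against a frame-by-frame simulation of the decaying velocity. `initial_velocity` is a hypothetical helper; it treats rate == 1 (no decay) separately because the formula divides by 1 - r^n.

```python
def initial_velocity(distance, rate, frames):
    """Initial velocity v with v*(r^0 + r^1 + ... + r^(frames-1)) == distance."""
    if rate == 1:
        return distance / frames  # no decay: the series is just `frames` ones
    # geometric series: sum_{i=0}^{n-1} r^i = (1 - r^n) / (1 - r)
    return distance * (1 - rate) / (1 - rate ** frames)

v = initial_velocity(200, 0.9, 3)
# simulate: move by the current velocity each frame, then decay it
traveled, speed = 0.0, v
for _ in range(3):
    traveled += speed
    speed *= 0.9
print(v, traveled)
```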
According to http://mikestoolbox.com/powersum.html, you should be able to reduce your for loop to:
F(x) = (x^n - 1)/(x-1)

Algorithm to find closest integer values that meet certain criteria

Edited to clarify the application by adding units (ml) and explaining the difficulty of measuring wet reagents in units of 1/26. The word 'solution' was ambiguous because it was used to mean both a chemical solution and the solution to the problem.
Added results based on Edward's reply
The real world application is that I am trying to determine the closest "convenient" volumes to use when mixing reagents A and B to create a solution (in the wet chemistry sense) that best approximates a specific A:B ratio. Let's define "convenient" as divisible by 5.
Example
Given:
1. X = A/(A+B) * C
2. Y = B/(A+B) * C
3. X + Y = C
4. A, B, C always positive integer
// e.g. a 500ml solution (wet chemistry sense) C with a 1:25 ratio of A and B
A = 1
B = 25
C = 500
This gives the volumes to use of X and Y to create the solution (wet chemistry sense) with the proper A:B ratio.
X = 500/26 = ~19.23ml
Y = 12500/26 = ~480.77ml
C = 13000/26 = 500ml
These exact volumes create a total volume of 500ml, but measuring reagent volumes in units of 1/26ml is a challenge.
How can I find "convenient" values (integers divisible by 5) for X, Y, and C that best approximate the exact values of X, Y, and C, which would be multiples of 1/26? In this case I found the closest "convenient" values for X, Y, C to be:
X = 20ml
Y = 500ml
C = 520ml
C in this case (520ml) is more than the required volume of 500ml, but it is more practical to physically measure volumes of 20ml and 500ml than to measure reagent volumes in 1/26ths. The extra 20ml is discarded; that is the cost of using nice values.
RESULTS BASED ON EDWARD'S ANSWER
A   B    C    ->  X    Y     C2
1   25   500  ->  20   500   520
1   20   500  ->  25   500   525
1   100  500  ->  5    500   505
1   75   500  ->  10   750   760
1   50   900  ->  20   1000  1020
One way to approach this would be to adjust C so that it absorbs the factor A+B. Then the ratio of A to B would be exact, and X, Y, and C would all be integers. Let D = 5*(A+B), C2 = ceiling(C/((double)D)) * D (round up so you get enough C), X = C2/(A+B)*A, Y = C2/(A+B)*B. If you want the closest value of C, use C2 = round(C/((double)D))*D instead.
If you're mixing chemicals, you probably want to round up rather than round to closest so you'll have enough with a little waste left over, which is better than not having enough.
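A small Python sketch of this recipe, using integer arithmetic instead of the (double) casts. The function name and the `step`/`round_up` parameters are my own; with the default round-up behaviour it reproduces every row of the results table in the question.

```python
def convenient_volumes(a, b, c, step=5, round_up=True):
    """Return (X, Y, C2) where C2 is a multiple of step*(a+b) near c."""
    d = step * (a + b)
    if round_up:
        multiples = -(-c // d)              # ceiling(c / d) in integer arithmetic
    else:
        multiples = (2 * c + d) // (2 * d)  # nearest multiple, halves rounded up
    c2 = multiples * d
    x = c2 // (a + b) * a
    y = c2 // (a + b) * b
    return x, y, c2

for a, b, c in [(1, 25, 500), (1, 20, 500), (1, 100, 500), (1, 75, 500), (1, 50, 900)]:
    print(a, b, c, convenient_volumes(a, b, c))
```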
You can phrase this as an optimization problem with an L1 (absolute value) objective function. (This is using a cannon to swat a mosquito, but I did it because I wanted to figure out about the L1 optimization.) I used the program glpsol from the GLPK package (open source). Here is my program:
param A, integer, >= 0;
param B, integer, >= 0;
param C, integer, >= 0;
var x, integer, >= 0;
var y, integer, >= 0;
var e1x, >= 0;
var e1y, >= 0;
minimize e1 : e1x + e1y;
subject to
c1 : (5*x - (C*A)/(A + B)) <= e1x;
c2 : ((C*A)/(A + B) - 5*x) <= e1x;
c3 : (5*y - (C*B)/(A + B)) <= e1y;
c4 : ((C*B)/(A + B) - 5*y) <= e1y;
solve;
printf "x=%g, y=%g, error=%g\n", x, y, e1;
data;
param A := 1;
param B := 25;
param C := 500;
Here is the output:
$ glpsol --model find_nice_integers.mod
[... snip ...]
x=4, y=96, error=1.53846
Here are some notes about how to handle absolute values in optimization problems.
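Because the objective in the model above is a sum of two independent absolute values, each term is minimized on its own by the multiple of 5 nearest to its target, so the glpsol result can be checked without an LP solver. A Python sketch of that check (the function name is hypothetical; x and y are counted in units of 5, as in the model):

```python
def nearest_multiples(a, b, c, step=5):
    """Minimize |step*x - c*a/(a+b)| + |step*y - c*b/(a+b)| over non-negative integers x, y."""
    target_x = c * a / (a + b)
    target_y = c * b / (a + b)

    def best(target):
        # the optimum is one of the two multiples of `step` around the target
        lo = int(target // step)
        return min((lo, lo + 1), key=lambda m: abs(step * m - target))

    x, y = best(target_x), best(target_y)
    error = abs(step * x - target_x) + abs(step * y - target_y)
    return x, y, error

print(nearest_multiples(1, 25, 500))  # x=4, y=96, error of about 1.53846, matching glpsol
```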
So, you are given an integer number C and the ratio p:q between two other integer numbers A and B (i.e., A/B = p/q).
I will interpret your definition of convenient as requiring that X and Y are both multiples of 5, where
X = A / (A+B) * C'
Y = B / (A+B) * C'
C' is close to C
Replacing A/B with p/q we get
X = p / (p+q) * C'
Y = q / (p+q) * C'
Now, in order for X and Y to be integers, p * C' and q * C' must both be multiples of (p+q). Since we can assume that p:q is irreducible (i.e., p and q have no common factor), gcd(p, p+q) = gcd(q, p+q) = gcd(p, q) = 1, so C' must be divisible by p+q. In addition, at most one of p and q can be a multiple of 5, so for both X and Y to be multiples of 5, C'/(p+q) must itself be a multiple of 5. So, C' must be a multiple of 5*(p+q).
The multiple of 5*(p+q) that is closest to C is:
C' := round(C/(5*(p+q)))*5*(p+q)
Now we can calculate:
X := p/(p+q)*C'
Y := q/(p+q)*C'
and they are indeed multiple of 5 because C'/(p+q) is.
Let's see how this behaves with your example:
Inputs:
p = 1
q = 25
C = 500
Then
C' := round(500/(5*(1+25)))*5*(1+25) = round(100/26)*5*26 = 4*5*26 = 520
Hence
X := p/(p+q)*C' = 1/(1+25)*4*5*26 = 1/26*4*5*26 = 4*5 = 20
Y := q/(p+q)*C' = 25/(1+25)*4*5*26 = 25/26*4*5*26 = 25*4*5 = 500.
Voila!
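A Python sketch of this approach, including the step the answer assumes: reducing A:B to an irreducible p:q with gcd first. The function name is hypothetical. The reduction matters when A and B share a factor; for A = 2, B = 4, C = 100 the reduced ratio 1:2 gives C' = 105, which is closer to C than the 90 obtained from the unreduced multiple 5*(2+4) = 30.

```python
from math import gcd

def convenient_total(a, b, c, step=5):
    """C' = round(C / (step*(p+q))) * step*(p+q) with p:q = A:B in lowest terms."""
    g = gcd(a, b)
    p, q = a // g, b // g        # irreducible ratio p:q
    unit = step * (p + q)
    c_prime = round(c / unit) * unit  # note: Python's round() sends exact halves to the even neighbour
    x = p * c_prime // (p + q)
    y = q * c_prime // (p + q)
    return c_prime, x, y

print(convenient_total(1, 25, 500))  # (520, 20, 500), as in the worked example
print(convenient_total(2, 4, 100))   # (105, 35, 70)
```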
Let's first calculate the optimal (floating-point) A and B.
It can be observed that the optimal integer solutions are either {floor(A), ceiling(B)} or {ceiling(A), floor(B)}. So we simply try both and choose the answer with the smaller error.
