Algorithm to find closest integer values that meet certain criteria

Edited to clarify the application by adding units (ml) and explaining the difficulty to measure wet reagents by units of 1/26. The word 'solution' was ambiguous because it was used to mean both a chemical solution as well as the solution to the problem.
Added results based on Edward's reply
The real world application is that I am trying to determine the closest "convenient" volumes to use when mixing reagents A and B to create a solution (in the wet chemistry sense) that best approximates a specific A:B ratio. Let's define "convenient" as divisible by 5.
Example
Given:
1. X = A/(A+B) * C
2. Y = B/(A+B) * C
3. X + Y = C
4. A, B, C always positive integer
// e.g. a 500ml solution (wet chemistry sense) C with a 1:25 ratio of A and B
A = 1
B = 25
C = 500
This gives the volumes to use of X and Y to create the solution (wet chemistry sense) with the proper A:B ratio.
X = 500/26 = ~19.23ml
Y = 12500/26 = ~480.77ml
C = 13000/26 = 500ml
These are the exact volumes that create a total volume of 500ml, but trying to measure reagent volumes in units of 1/26ml is a challenge.
How can I find "convenient" values (integers divisible by 5) for X, Y, and C that best approximate the exact values of X, Y, and C, which are multiples of 1/26? In this case I found these to be the closest "convenient" values for X, Y, C:
X = 20ml
Y = 500ml
C = 520ml
C in this case (520ml) is more than the required volume of 500ml, but it is more practical to physically measure the volumes of 20mL and 500mL than it would be to measure reagent volumes in 1/26ths. The extra 20mL is discarded, the cost for using nice values.
RESULTS BASED ON EDWARD'S ANSWER
A=1 B=25 C=500
X=20 Y=500 C2=520
A=1 B=20 C=500
X=25 Y=500 C2=525
A=1 B=100 C=500
X=5 Y=500 C2=505
A=1 B=75 C=500
X=10 Y=750 C2=760
A=1 B=50 C=900
X=20 Y=1000 C2=1020

One way to approach this would be to adjust C so that it absorbs the factor A+B. Then the ratio of A to B would be exact, and X, Y, and C would all be integers. Let D = 5*(A+B), C2 = ceiling(C/((double)D)) * D (round up so you get enough C), X = C2/(A+B)*A, Y = C2/(A+B)*B. If you want the closest value of C, use C2 = round(C/((double)D))*D instead.
If you're mixing chemicals, you probably want to round up rather than round to closest so you'll have enough with a little waste left over, which is better than not having enough.
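The recipe above can be sketched in Python. This is only an illustration of the idea: the function name `convenient_volumes` and the `step` and `round_up` parameters are made up for this example, and integer arithmetic is used in place of the `(double)` cast.

```python
def convenient_volumes(a, b, c, step=5, round_up=True):
    """Return (x, y, c2) where x:y equals a:b exactly and x, y are multiples of step."""
    d = step * (a + b)                    # c2 has to be a multiple of D = 5*(A+B)
    if round_up:
        units = -(-c // d)                # integer ceiling of c / d
    else:
        units = (2 * c + d) // (2 * d)    # nearest multiple, halves rounded up
    c2 = units * d
    x = c2 // (a + b) * a
    y = c2 // (a + b) * b
    return x, y, c2


print(convenient_volumes(1, 25, 500))     # the 1:25 example from the question
```

With `round_up=True` this reproduces the rows of the results table in the question, for example (20, 500, 520) for A=1, B=25, C=500.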

You can phrase this as an optimization problem with an L1 (absolute value) objective function. (This is using a cannon to swat a mosquito, but I did it because I wanted to figure out about the L1 optimization.) I used the program glpsol from the GLPK package (open source). Here is my program:
param A, integer, >= 0;
param B, integer, >= 0;
param C, integer, >= 0;
var x, integer, >= 0;
var y, integer, >= 0;
var e1x, >= 0;
var e1y, >= 0;
minimize e1 : e1x + e1y;
subject to
c1 : (5*x - (C*A)/(A + B)) <= e1x;
c2 : ((C*A)/(A + B) - 5*x) <= e1x;
c3 : (5*y - (C*B)/(A + B)) <= e1y;
c4 : ((C*B)/(A + B) - 5*y) <= e1y;
solve;
printf "x=%g, y=%g, error=%g\n", x, y, e1;
data;
param A := 1;
param B := 25;
param C := 500;
Here is the output:
$ glpsol --model find_nice_integers.mod
[... snip ...]
x=4, y=96, error=1.53846
Here are some notes about how to handle absolute values in optimization problems.
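As a cross-check that does not need a solver: the objective above is a sum of two independent absolute errors, so each variable can be rounded on its own. The sketch below is not part of the GLPK model; the function name `nearest_multiples` is made up, and exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction


def nearest_multiples(a, b, c, step=5):
    """Minimise |step*x - exact_x| + |step*y - exact_y| over integers x, y."""
    exact_x = Fraction(c * a, a + b)
    exact_y = Fraction(c * b, a + b)
    # The two error terms do not interact, so round each one separately.
    x = round(exact_x / step)
    y = round(exact_y / step)
    error = abs(step * x - exact_x) + abs(step * y - exact_y)
    return x, y, error


x, y, error = nearest_multiples(1, 25, 500)
print(x, y, float(error))
```

For A=1, B=25, C=500 this gives x=4 and y=96 with an error of 20/13, which agrees with the solver output. Note that this formulation keeps the volumes close to the exact ones but does not keep the A:B ratio exact (20:480 is 1:24).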

So, you are given an integer number C and the ratio p:q between two other integer numbers A and B (i.e., A/B = p/q).
I will interpret your definition of convenient as requiring that X and Y are both multiples of 5 where
X = A / (A+B) * C'
Y = B / (A+B) * C'
C' is close to C
Replacing A/B with p/q we get
X = p / (p+q) * C'
Y = q / (p+q) * C'
Now, for X and Y to be integers, p * C' and q * C' must both be multiples of (p+q). Since we can assume that p:q is irreducible (i.e., p and q have no common factor greater than 1), p+q shares no factor with p or with q either, which means that C' must be divisible by p+q. In addition, C'/(p+q) must be a multiple of 5. So, C' must be a multiple of 5*(p+q).
The multiple of 5*(p+q) that is closest to C is:
C' := round(C/(5*(p+q)))*5*(p+q)
Now we can calculate:
X := p/(p+q)*C'
Y := q/(p+q)*C'
and they are indeed multiples of 5 because C'/(p+q) is.
Let's see how this behaves with your example:
Inputs:
p = 1
q = 25
C = 500
Then
C' := round(500/(5*(1+25)))*5*(1+25) = round(100/26)*5*26 = 4*5*26 = 520
Hence
X := p/(p+q)*C' = 1/(1+25)*4*5*26 = 1/26*4*5*26 = 4*5 = 20
Y := q/(p+q)*C' = 25/(1+25)*4*5*26 = 25/26*4*5*26 = 25*4*5 = 500.
Voila!
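The derivation relies on p:q being in lowest terms, so a sketch of it should reduce A:B first. The following is one possible way to write it; the name `closest_mix` and the `step` parameter are made up here, and rounding is done in integers with halves going up.

```python
from math import gcd


def closest_mix(a, b, c, step=5):
    """Return (x, y, c_prime) with c_prime the multiple of step*(p+q) nearest to c."""
    g = gcd(a, b)
    p, q = a // g, b // g                       # reduce the ratio so gcd(p, q) == 1
    unit = step * (p + q)
    multiples = (2 * c + unit) // (2 * unit)    # round(c / unit), halves up
    c_prime = multiples * unit
    return p * c_prime // (p + q), q * c_prime // (p + q), c_prime


print(closest_mix(1, 25, 500))
print(closest_mix(2, 50, 400))   # same ratio as 1:25, given unreduced
```

The reduction matters when the ratio is not given in lowest terms: for 2:50 and C=400 the unreduced step would be 5*52=260, while the reduced step 5*26=130 allows C'=390.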

Let's first calculate the optimal (floating-point) A and B.
It can be observed that the optimal integer solutions are either {floor(A), ceiling(B)} or {ceiling(A), floor(B)}, so we simply try both and choose the answer with the smaller error.

Related

How can I descale x by n/d, when x*n overflows?

My problem is limited to unsigned integers of 256 bits.
I have a value x, and I need to descale it by the ratio n / d, where n < d.
The simple solution is of course x * n / d, but the problem is that x * n may overflow.
I am looking for any arithmetic trick which may help in reaching a result as accurate as possible.
Dividing each of n and d by gcd(n, d) before calculating x * n / d does not guarantee success.
Is there any process (iterative or other) which I can use in order to solve this problem?
Note that I am willing to settle on an inaccurate solution, but I'd need to be able to estimate the error.
NOTE: Using integer division instead of normal division
Let us suppose
x = ad + b
n = cd + e
Then find a,b,c,e as follows:
a = x/d
b = x%d
c = n/d
e = n%d
Then,
nx/d = acd + ae + bc + be/d
CALCULATING be/d
1. Represent e in binary form
2. Find b/d, 2b/d, 4b/d, 8b/d, ... 256b/d and their remainders
3. Find be/d = b*binary terms + their remainders
Example:
e = 101 in binary = 4+1
be/d = (b/d + 4b/d) + (b%d + 4b%d)/d
FINDING b/d, 2b/d, ... 256b/d
quotient(2*ib/d) = 2*quotient(ib /d) + (2*remainder(ib /d))/d
remainder(2*ib/d) = (2*remainder(ib/d))%d
Executes in O(number of bits)
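The bookkeeping above can be sketched as follows. Python integers never overflow, so this only illustrates the steps rather than a fixed-width implementation; the name `scaled_floor` is made up. In a real 256-bit setting the doubling `2*r` may need one extra bit when d exceeds 2^255, and that case would have to be handled separately (an assumption of this sketch).

```python
def scaled_floor(x, n, d):
    """Return floor(x*n/d) without ever forming the product x*n."""
    a, b = divmod(x, d)          # x = a*d + b
    c, e = divmod(n, d)          # n = c*d + e
    result = a * c * d + a * e + b * c

    # floor(b*e/d): walk over the bits of e, doubling b's quotient/remainder.
    q, r = 0, b                  # (2^0 * b) = q*d + r, and b < d
    quo_acc, rem_acc = 0, 0
    while e:
        if e & 1:
            quo_acc += q
            rem_acc += r
            if rem_acc >= d:     # carry accumulated remainders into the quotient
                rem_acc -= d
                quo_acc += 1
        q = 2 * q + (2 * r) // d
        r = (2 * r) % d
        e >>= 1
    return result + quo_acc


print(scaled_floor(10, 3, 7))    # floor(30/7)
```

Because the remainders are carried into the quotient as they accumulate, the result is the exact floor of x*n/d, so the error relative to the true ratio is less than 1.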

Writing a vector sum in MATLAB

Suppose I have a function phi(x1,x2)=k1*x1+k2*x2 which I have evaluated over a grid where the grid is a square having boundaries at -100 and 100 in both x1 and x2 axis with some step size say h=0.1. Now I want to calculate this sum over the grid with which I'm struggling:
What I was trying :
clear all
close all
clc
D=1; h=0.1;
D1 = -100;
D2 = 100;
X = D1 : h : D2;
Y = D1 : h : D2;
[x1, x2] = meshgrid(X, Y);
k1=2;k2=2;
phi = k1.*x1 + k2.*x2;
figure(1)
surf(X,Y,phi)
m1=-500:500;
m2=-500:500;
[M1,M2,X1,X2]=ndgrid(m1,m2,X,Y)
sys=@(m1,m2,X,Y) (k1*h*m1+k2*h*m2).*exp((-([X Y]-h*[m1 m2]).^2)./(h^2*D))
sum1=sum(sys(M1,M2,X1,X2))
Matlab says error in ndgrid, any idea how I should code this?
MATLAB shows:
Error using repmat
Requested 10001x1001x2001x2001 (298649.5GB) array exceeds maximum array size preference. Creation of arrays greater
than this limit may take a long time and cause MATLAB to become unresponsive. See array size limit or preference
panel for more information.
Error in ndgrid (line 72)
varargout{i} = repmat(x,s);
Error in new_try1 (line 16)
[M1,M2,X1,X2]=ndgrid(m1,m2,X,Y)
Judging by your comments and your code, it appears as though you don't fully understand what the equation is asking you to compute.
To obtain the value M(x1,x2) at some given (x1,x2), you have to compute that sum over Z2. Of course, using a numerical toolbox such as MATLAB, you could only ever hope to compute over some finite range of Z2. In this case, since (x1,x2) covers the range [-100,100] x [-100,100], and h=0.1, it follows that m covers the range [-1000, 1000] x [-1000, 1000]. Example: m = (-1000, -1000) gives you mh = (-100, -100), which is the bottom-left corner of your domain. So really, phi(mh) is just phi(x1,x2) evaluated on all of your discretised points.
As an aside, since you need to compute |x-hm|^2, you can treat x = x1 + i x2 as a complex number to make use of MATLAB's abs function. If you were strictly working with vectors, you would have to use norm, which is OK too, but a bit more verbose. Thus, for some given x=(x10, x20), you would compute x-hm over the entire discretised plane as (x10 - x1) + i (x20 - x2).
Finally, you can compute 1 term of M at a time:
D=1; h=0.1;
D1 = -100;
D2 = 100;
X = (D1 : h : D2); % X is in rows (dim 2)
Y = (D1 : h : D2)'; % Y is in columns (dim 1)
k1=2;k2=2;
phi = k1*X + k2*Y;
M = zeros(length(Y), length(X));
for j = 1:length(X)
for i = 1:length(Y)
% treat (x - hm) as a complex number
x_hm = (X(j)-X) + 1i*(Y(i)-Y); % this computes x-hm for all m
M(i,j) = 1/(pi*D) * sum(sum(phi .* exp(-abs(x_hm).^2/(h^2*D)), 1), 2);
end
end
By the way, this computation takes quite a long time. You can consider either increasing h, reducing D1 and D2, or changing all three of them.
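To see what a single term of M looks like without the cost of the full grid, here is a small plain-Python sketch of the same sum as in the loop above, on a much smaller domain. The function name `gaussian_sum` and the reduced range [-2, 2] are choices made for this illustration only.

```python
import math


def gaussian_sum(phi, x, grid, h, D=1.0):
    """1/(pi*D) * sum over nodes hm of phi(hm) * exp(-|x - hm|^2 / (h^2 * D))."""
    total = 0.0
    for g1 in grid:
        for g2 in grid:
            dist2 = (x[0] - g1) ** 2 + (x[1] - g2) ** 2
            total += phi(g1, g2) * math.exp(-dist2 / (h * h * D))
    return total / (math.pi * D)


h = 0.1
grid = [i * h for i in range(-20, 21)]      # nodes hm covering [-2, 2]


def phi(x1, x2):
    return 2 * x1 + 2 * x2                  # k1 = k2 = 2, as in the question


print(gaussian_sum(phi, (0.5, 0.5), grid, h))
```

For this linear phi and a point well inside the grid, the result comes out close to phi(0.5, 0.5) = 2, which is a quick sanity check on the weights and the 1/(pi*D) factor.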

Convert Integer to Generic Base Matlab

I'm trying to convert a base-10 integer k into a base-q integer, but not in the standard way. Firstly, I'd like my result to be a vector (or a string 'a,b,c,...' so that it can be converted to a vector, but not 'abc...'). Most importantly, I'd like each 'digit' to be in base-10. As an example, suppose I have the number 23 (in base-10) and I want to convert it to base-12. This would be 1B in the standard 1,...,9,A,B notation; however, I want it to come out as [1, 11]. I'm only interested in numbers k with 0 \le k \le q^n - 1, where n is fixed in advance.
Put another way, I wish to find coefficients a(r) such that
k = \sum_{r=0}^{n-1} a(r) q^r
where each a(r) is in base-10. (Note that 0 \le a(r) \le q-1.)
I know I could do this with a for-loop -- struggling to get the exact formula at the moment! -- but I want to do it vectorised, or with a fast internal function.
However, I want to be able to take n to be large, so would prefer a faster way than this. (Of course, I could change this to a parfor-loop or do it on the GPU; these aren't practical for my current situation, so I'd prefer a more direct version.)
I've looked at stuff like dec2base, num2str, str2num, base2dec and so on, but with no luck. Any suggestion would be most appreciated.
Regarding speed and space, any preallocation for integers in the range [0, q-1] or similar would also be good.
To be clear, I am looking for an algorithm that works for any q and n, converting any number in the range [0,q^n - 1].
You can use dec2base and replace the characters by numbers:
x = 23;
b = 12;
[~, result] = ismember(dec2base(x,b), ['0':'9' 'A':'Z']);
result = result -1;
gives
>> result
result =
1 11
This works for base up to 36 only, due to dec2base limitations.
For any base (possibly above 36) you need to do the conversion manually. I once wrote a base2base function to do that (it's essentially long division). The number should be input as a vector of digits in the origin base, so you need dec2base(...,10) first. For example:
x = 125;
b = 6;
result = base2base(dec2base(x,10), '0':'9', b); % origin number, origin base, target base
gives
result =
3 2 5
Or if you need to specify the number of digits:
x = 125;
b = 6;
d = 5;
result = base2base(dec2base(x,10), '0':'9', b, d)
result =
0 0 3 2 5
EDIT (August 15, 2017): Corrected two bugs: handling of input consisting of all "zeros" (thanks to @Sanchises for noticing), and properly left-padding the output with "zeros" if needed.
function Z = base2base(varargin)
% Three inputs: origin array, origin base, target base
% If a base is specified by a number, say b, the digits are [0,1,...,b-1].
% The base can also be directly an array with the digits
% Fourth input, optional: how many digits the output should have as a
% minimum (padding with leading zeros, i.e with the first digit)
% Non-valid digits in origin array are discarded.
% It works with cell arrays. In this case it gives a matrix in which each
% row is padded with leading zeros if needed
% If the base is specified as a number, digits are numbers, not
% characters as in `dec2base` and `base2dec`
if ~iscell(varargin{1}), varargin{1} = varargin(1); end
if numel(varargin{2})>1, ax = varargin{2}; bx=numel(ax); else bx = varargin{2}; ax = 0:bx-1; end
if numel(varargin{3})>1, az = varargin{3}; bz=numel(az); else bz = varargin{3}; az = 0:bz-1; end
Z = cell(size(varargin{1}));
for c = 1:numel(varargin{1})
x = varargin{1}{c}; [valid, x] = ismember(x,ax); x = x(valid)-1;
if ~isempty(x) && ~any(x) % Non-empty input, all zeros
z = 0;
elseif ~isempty(x) % Non-empty input, at least a nonzero
z = NaN(1,ceil(numel(x)*log2(bx)/log2(bz))); done_outer = false;
n = 0;
while ~done_outer
n = n + 1;
x = [0 x(find(x,1):end)];
y = NaN(size(x)); done_inner = false;
m = 0;
while ~done_inner
m = m + 1;
t = x(1)*bx+x(2);
r = mod(t, bz); q = (t-r)/bz;
y(m) = q; x = [r x(3:end)];
done_inner = numel(x) < 2;
end
y = y(1:m);
z(n) = r; x = y; done_outer = ~any(x);
end
z = z(n:-1:1);
else % Empty input
z = []; % output will be empty (unless user has required left-padding) with the
% appropriate class
end
if numel(varargin)>=4 && numel(z)<varargin{4}, z = [zeros(1,varargin{4}-numel(z)) z]; end
% left-pad if required by user
Z{c} = z;
end
L = max(cellfun(@numel, Z));
Z = cellfun(@(x) [zeros(1, L-numel(x)) x], Z, 'uniformoutput', false); % left-pad so that
% result will be a matrix
Z = vertcat(Z{:});
Z = az(Z+1);
Matlab's internal dec2base command contains essentially what you are asking for.
It actually creates an array of base-10 digits before they are converted to a character array of '0'-'9' and 'A'-'Z' which is the reason for its limitation to bases <= 36.
So removing the last step of character conversion from dec2base and modifying the error checking accordingly gives the function dec2basevect you were asking for.
The result will be a base-10 vector and you are no longer limited to bases <= 36. The most significant digit will be in index one of this vector. If you need it the other way round, i.e. least significant digit in index one, just do a fliplr to the result.
Due to copyrights by MathWorks, you have to make the necessary modifications to dec2base on your own.
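The manual conversion both answers refer to is repeated division by the base. A minimal sketch of it in Python, for any base q >= 2 (the name `to_base_digits` and the optional padding argument `n` are made up for this example):

```python
def to_base_digits(k, q, n=None):
    """Digits of k in base q, most significant first, each digit a plain integer."""
    if k < 0 or q < 2:
        raise ValueError("need k >= 0 and q >= 2")
    digits = []
    while k:
        k, r = divmod(k, q)      # peel off the least significant digit
        digits.append(r)
    if not digits:
        digits = [0]
    if n is not None and len(digits) < n:
        digits.extend([0] * (n - len(digits)))   # becomes left-padding after reversal
    return digits[::-1]


print(to_base_digits(23, 12))      # the base-12 example from the question
print(to_base_digits(125, 6, 5))   # padded to 5 digits
```

The loop runs once per output digit, so for k below q^n it performs at most n divisions.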

Correct implementation of The Johnson-Lindenstrauss lemma

I am trying to implement the Johnson-Lindenstrauss lemma. I have searched for the pseudocode here but could not find any.
I don't know if I have implemented it correctly or not. I would like those of you who understand the lemma to please check my code and advise me on the correct MATLAB implementation.
n = 2;
d = 4;
k = 2;
G = rand(n,d);
epsilon = sqrt(log(n)/k);
% Projection in dim k << d
% Defining P (k x d)
P = randn(k,d);
% Projecting down to k-dim
proj = P.*G;
u = proj(:,1);
v = proj(:,2);
% u = P * G(:,5);
% v = P * G(:,36);
norm(G(:,1)-G(:,2))^2 * k * (1-epsilon);
norm(u - v)^2;
norm(G(:,1)-G(:,2))^2 * k * (1+epsilon);
For the first part, to find epsilon you need to solve a polynomial equation.
n = 2;
k = 2;
pol1 = [-1/3 1/2 0 -4*log2(n)/k]; % eps^2/2 - eps^3/3 - 4*log2(n)/k = 0
c = roots(pol1)
1.4654 + 1.4304i
1.4654 - 1.4304i
-1.4308 + 0.0000i
Then you need to remove the complex roots and keep the real one:
epsilon = c(imag(c)==0);
% if more than one root has an imaginary part equal to 0, select the smaller one.
Now you know that epsilon should be equal to or greater than the result.
For any set of m points in R^N, with k = 20*log(m)/epsilon^2 and epsilon < 1/2, the projection matrix
1/sqrt(k).*randn(k,N)
gives Pr[success] >= 1 - 2*m^(5*epsilon-3).
The R package RandPro is available to perform random projection using the Johnson-Lindenstrauss lemma.
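As a rough illustration of the scaled Gaussian projection 1/sqrt(k)*randn(k,N) mentioned above, here is a plain-Python sketch. The function name `random_projection` and the fixed seed are choices made for this example; each input point is a list of N coordinates and is multiplied by the k x N matrix as a vector.

```python
import math
import random


def random_projection(points, k, seed=0):
    """Project N-dimensional points to k dimensions with a scaled Gaussian matrix."""
    rng = random.Random(seed)
    n_dims = len(points[0])
    scale = 1.0 / math.sqrt(k)
    # P is k x N with entries drawn from N(0, 1) and scaled by 1/sqrt(k).
    P = [[scale * rng.gauss(0.0, 1.0) for _ in range(n_dims)] for _ in range(k)]
    # Each projected point is the matrix-vector product P * x.
    return [[sum(p * xj for p, xj in zip(row, x)) for row in P] for x in points]


points = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
projected = random_projection(points, k=2)
print(projected)
```

Note that the projection is a true matrix product applied to each point, not the element-wise product `P.*G` used in the question's code; squared distances between projected points can then be compared with the original ones within the (1 - epsilon, 1 + epsilon) band.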

Pseudo number generation

The following text is from Data Structures and Algorithm Analysis by Mark Allen Weiss.
Following x(i+1) should be read as x subscript of i+1, and x(i) should be
read as x subscript i.
x(i + 1) = (a*x(i))mod m.
It is also common to return a random real number in the open interval
(0, 1) (0 and 1 are not possible values); this can be done by
dividing by m. From this, a random number in any closed interval [a,
b] can be computed by normalizing.
The problem with this routine is that the multiplication could
overflow; although this is not an error, it affects the result and
thus the pseudo-randomness. Schrage gave a procedure in which all of
the calculations can be done on a 32-bit machine without overflow. We
compute the quotient and remainder of m/a and define these as q and
r, respectively.
In our case for M=2,147,483,647 A =48,271, q = 127,773, r = 2,836, and r < q.
We have
x(i + 1) = (a*x(i))mod m.---------------------------> Eq 1.
= ax(i) - m (floorof(ax(i)/m)).------------> Eq 2
The author also mentions:
x(i) = q(floor of(x(i)/q)) + (x(i) mod q).--->Eq 3
My question
What does the author mean by saying a random number is computed by normalizing?
How did the author get Eq 2 from Eq 1?
How did the author arrive at Eq 3?
Normalizing means if you have X ∈ [0,1] and you need to get Y ∈ [a, b] you can compute
Y = a + X * (b - a)
EDIT:
2. Let's suppose
a = 3, x = 5, m = 9
Then we have ax = [ax/m]*m + (ax mod m),
where [ax/m] means the integer part of ax/m.
So we have 15 = [ax/m]*m + 6
We need to get 6. 15 - [ax/m]*m = 6 => ax - [ax/m]*m = 6 => x(i+1) = ax(i) - [ax(i)/m]*m
If you have a random number in the range [0,1], you can get a number in the range [2,5] (for example) by multiplying by 3 and adding 2.
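To make the pieces concrete, here is a small sketch of the recurrence computed with Schrage's rearrangement, followed by the normalization step. The function names are made up for this example. It computes q and r directly from m and a; for a = 48,271 that division gives q = 44,488 and r = 3,399, while the pair 127,773 and 2,836 quoted above is what the same division gives for the multiplier 16,807.

```python
M = 2_147_483_647
A = 48_271
Q, R = divmod(M, A)          # quotient and remainder of m/a; requires R < Q


def next_state(x):
    """(A * x) mod M without any intermediate value reaching M (Schrage)."""
    t = A * (x % Q) - R * (x // Q)
    return t if t >= 0 else t + M


def to_unit_interval(x):
    """Map a state in 1..M-1 into the open interval (0, 1)."""
    return x / M


def normalize(u, lo, hi):
    """Stretch u from the unit interval to the interval between lo and hi."""
    return lo + u * (hi - lo)


state = next_state(1)
print(state, normalize(to_unit_interval(state), 2, 5))
```

Starting from x = 1 the next state is simply A, and both terms of `t` stay below M, which is the point of splitting x into x mod q and x div q as in Eq 3.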

Resources